
Your AI Coding Tool Doesn't Have Opinions. That's the Problem.

AI-Generated Code: The Hidden Cost for Businesses

May 12, 2026

~13 min read · AI coding tools are excellent at generating working code. They have no opinions about how the pieces should fit together. For app owners, that's where the hidden cost lives — and where the next conversation has to start.

By 2026, almost every app being built uses AI coding tools somewhere in the stack. This isn't a controversial statement; it's just how software is made now. The developers your business hires use them. The agencies you might evaluate use them. The freelancers you might work with use them. Even the developers who claim they don't use them mostly do, just in smaller doses.

This is not, on its own, a bad thing. AI tools have made certain kinds of work dramatically faster and cheaper. A login screen that took two days to write five years ago takes two hours now. The same is true for a dozen kinds of standard features that used to require careful hand-work.

But there's a part of this that's getting written about poorly, and it's the part that matters most to app owners. The thing AI coding tools are good at — generating local, plausible solutions to bounded problems — is not the same thing as building software that holds together over time.

Pro Tip
The pieces are getting cheaper. The thing that connects the pieces hasn't.

This article is about what AI coding tools do well, what they don't do, and what that means for someone paying for an app to be built. The short version is in the title: AI tools don't have opinions, and a lot of what makes good software good is exactly the opinions that get omitted.

The Compliment First

Let's start with what's true about modern AI coding tools, because if we don't, the rest of this article will sound like a complaint about a technology that has, in fact, changed the field for the better.

AI coding tools are very good at local pattern matching. Asked to write a function that does a specific, well-defined thing, they write it competently — often better than a developer would write it on a tired Friday afternoon. They handle the kinds of cases that human developers forget about. They use standard idioms. They produce code that, if you reviewed each piece in isolation, you would approve of.

They're also extremely good at boilerplate. The kind of code that used to require hours of careful typing — setting up a form, wiring up an API call, scaffolding a screen — can now be generated in a moment. This is real productivity gain, not a marketing claim. Developers who use these tools well genuinely ship more code per day than they used to.

For some kinds of work, this is enough. A simple website. A small internal tool. A landing page. A weekend prototype to test an idea. For these, the local-pattern-matching strength of AI tools is everything you need.

The trouble starts when the project is bigger than the AI's window of attention — which is to say, when the project is most of what businesses actually need to build.

What Opinions Are, in Code

The word "opinion", when it comes to software, means something specific. It doesn't mean a developer's personal preference. It means a decision the codebase has made about how it works — a rule that's been encoded into the structure of the project and that subsequent features are expected to follow.

A simple example: an app might have the opinion that all authentication happens in one specific layer. Every API call goes through that layer; no other part of the app knows anything about tokens, sessions, or user identity. This is an opinion. Someone decided it. The decision is written into the structure.
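
To make that concrete, here's a minimal sketch of what such an opinion might look like in code, assuming a TypeScript client. The module and function names (apiRequest, setAccessToken) are invented for this illustration, not taken from any real project.

```typescript
// auth/apiClient.ts: illustrative only. This is the one module allowed to know about tokens.
let accessToken: string | null = null;

export function setAccessToken(token: string): void {
  accessToken = token;
}

// Every feature calls this instead of fetch(); no screen ever touches a token.
export async function apiRequest<T>(path: string, init: RequestInit = {}): Promise<T> {
  const headers = new Headers(init.headers);
  if (accessToken) {
    headers.set("Authorization", `Bearer ${accessToken}`);
  }
  const response = await fetch(`https://api.example.com${path}`, { ...init, headers });
  if (!response.ok) {
    throw new Error(`Request to ${path} failed with status ${response.status}`);
  }
  return (await response.json()) as T;
}
```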

Another example: an app might have the opinion that errors always have the same shape — a code, a human-readable message, and a category. Every endpoint returns errors in this shape; every screen knows how to handle this shape; nothing else is allowed. This is also an opinion.
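
In code, that opinion often ends up as a single shared type that every endpoint and every screen imports. Here's a hedged sketch in TypeScript, with invented field and file names:

```typescript
// errors/AppError.ts: the one error shape the whole app agrees on (illustrative).
export type ErrorCategory = "validation" | "auth" | "not_found" | "server";

export interface AppError {
  code: string;     // machine-readable, e.g. "EMAIL_ALREADY_REGISTERED"
  message: string;  // human-readable, safe to show to a user
  category: ErrorCategory;
}

// A small helper so every endpoint produces errors the same way.
export function appError(code: string, message: string, category: ErrorCategory): AppError {
  return { code, message, category };
}
```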

Or: an app might have the opinion that the rules of the business live in one place, separately from the screens. The screens display things; the rules decide what's allowed; the database stores what happened. This is, in fact, the opinion we describe in the three apps inside your app — a foundational opinion that shapes almost every subsequent decision.
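
Here's a compressed, single-file sketch of that separation, with an invented refund rule standing in for a real business rule; in a real project the two functions would live in separate layers.

```typescript
// Rules layer: the business decision lives here, and only here.
export function canRequestRefund(purchasedAt: Date, now: Date = new Date()): boolean {
  const daysSincePurchase =
    (now.getTime() - purchasedAt.getTime()) / (1000 * 60 * 60 * 24);
  return daysSincePurchase <= 30; // "30 days" is the business's rule, encoded exactly once
}

// Screen layer: displays the answer, never re-derives the rule.
export function refundButtonLabel(purchasedAt: Date): string {
  return canRequestRefund(purchasedAt)
    ? "Request a refund"
    : "Refund window closed";
}
```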

Opinions, in this sense, are guardrails. They prevent the next feature from being written in a way that contradicts the last one. They make it harder to do the wrong thing and easier to do the right one. The more opinionated a codebase is — within reason — the less likely it is to drift into the tangled state where every new feature breaks an old one.

The opinions are what make a codebase a codebase, not just a pile of code.

What Happens When There Are No Opinions

Now consider what happens when each piece of code is generated independently, by a tool that doesn't know — and isn't asked to know — what opinions the project has.

The first feature is generated. It works. It has, implicitly, some idea of how authentication happens, how errors work, where the rules live. These ideas weren't decided by anyone; they were just whatever the AI produced first.

The second feature is generated. It works too. It has, also implicitly, some idea of how authentication happens, how errors work, where the rules live. These ideas weren't decided by anyone either, and they're not quite the same as the first feature's ideas.

The first feature stored the token in one place; the second feature stored it in a slightly different place. The first feature returned errors as one shape; the second feature returned them as a slightly different shape. The first feature handled the rules in the screen itself; the second feature, generated separately, also handled the rules in the screen — but with slightly different rules.
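
To make the drift concrete, here is an invented, side-by-side illustration of what "slightly different shapes" can look like. Neither snippet is wrong on its own; the cost is in the translation code that every screen touching both now has to carry.

```typescript
// Feature one, generated in March: its implicit idea of an error.
type SignupError = { errorCode: number; message: string };

// Feature two, generated in June: its implicit idea of an error.
type CheckoutError = { code: string; description: string; retryable: boolean };

// Any screen that handles both needs translation code like this,
// written again for every new divergent shape that appears.
function toDisplayMessage(error: SignupError | CheckoutError): string {
  return "message" in error ? error.message : error.description;
}
```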

This continues. Five features in, the codebase has five subtly different ideas about how authentication, errors, and rules work. None of them are wrong. None of them are inconsistent enough to break anything immediately. But the opinions of the codebase — the rules that keep features from contradicting each other — are not there. There's nothing to keep features in line, because nothing decided what "in line" means.

Ten features in, the divergence has compounded. The developer is now spending time reconciling what each feature thinks authentication is, manually, every time they touch anything that depends on it. The codebase has no spine.

This is not because the AI is bad at writing code. The AI is good at writing code. The AI is not good at deciding what opinions the codebase should hold, because that's not what it's optimized to do.

It's optimized to produce the most likely next piece of code given the context — which is usually fine for the piece in question, and slowly catastrophic for the project as a whole.

A human architect's job — and this is the part of software work that has not gotten easier — is to hold the opinions of the codebase in mind. To say "no, this feature has to go through the auth layer", even when the AI would happily generate one that doesn't. To say "errors always look like this" even when the AI would happily generate ones that don't. To say "the rules go here, not in the screens" even when the AI would gladly put them wherever they're convenient.

Pro Tip
Without that architect, the AI doesn't fail at any single step. It just produces a codebase that, over time, has no opinions at all.

The 30-to-50 Feature Wall

We've seen this pattern enough times now to give it a name.

Apps that are built primarily with AI tools, by a developer who isn't actively holding the architecture in their head, tend to work fine for the first 10 to 20 features. Often impressively well. The team feels fast. The buyer feels lucky. The roadmap is moving.

Somewhere between feature 30 and feature 50, velocity falls off a cliff.

It isn't gradual. It's a wall. The reason is the geometry we described in the piece on technical debt. The number of possible feature-to-feature interactions grows roughly with the square of the feature count. Up to a point, the implicit divergences between features stay small enough not to matter. Past that point, every change touches three things, and two of them are unexpectedly connected to a fourth.
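
A back-of-the-envelope way to see that geometry, not a precise model, is to count the possible pairs of features:

```typescript
// Possible feature-to-feature interactions grow roughly as n * (n - 1) / 2.
const pairs = (n: number): number => (n * (n - 1)) / 2;

console.log(pairs(10)); // 45
console.log(pairs(30)); // 435
console.log(pairs(50)); // 1225, roughly 27x the pairs you had at 10 features
```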

The 30-to-50 number isn't magic. It depends on the kind of app, the developer's skill, how clean the original opinions were, and how disciplined the AI usage has been. But the shape of the curve — fine, then still fine, then a sudden, unexpected drop — is consistent enough that we now look for it specifically when we evaluate someone's app.

If your app is past feature 30 and you've started hearing "we need to refactor that part first" before most new requests, you're at the wall. The wall doesn't mean the app is broken. It means the implicit opinions of the codebase have diverged enough that the next phase of growth requires making the opinions explicit — which is the kind of work AI tools cannot do for you, because making opinions explicit is the part where someone has to decide.

What This Looks Like From the Buyer's Seat

You don't see any of this directly. What you see are symptoms — the same ones we cover in the technical-debt piece, but with a specific AI-coding-tool flavor to them.

You see estimates creeping up, even on features that seem like they should be similar to features you've already paid for.

We already built notifications; why does the new kind of notification cost almost as much as the first one?

The answer, often, is that the first notification was generated with one set of implicit assumptions, and the new one would need to reconcile its assumptions with the first one's — which costs almost as much as building from scratch.

You see bugs that are hard to reproduce. A user reports something broken; the developer can't reproduce it on their machine; eventually it turns out to depend on the order in which two features interact, which neither feature was designed to be aware of.

You see "we need to refactor X before we can do Y" appearing in conversations more often. Each refactor sounds reasonable in isolation, but they're cumulative. The developer is doing the work of making opinions explicit, one piece at a time, while also trying to add features.

You see your developer apologizing more, even though they're working harder. They're not less competent than they were. They're working in a codebase whose opinions are starting to contradict each other, and reconciling those contradictions is becoming a significant fraction of the work.

If any of this sounds familiar, the framing in this article is probably the missing piece. It isn't your developer. It isn't even the AI tools. It's the structural cost of producing a codebase without explicit opinions, finally arriving.

The Mixed Model That Actually Works

This is not an article that ends with "stop using AI tools". That advice would be both wrong and impossible to follow. The tools are too useful, and they have become too thoroughly woven into how software is made.

The right model — the one we use ourselves and the one we recommend — is to use AI tools heavily for the parts they're good at, while making the opinions of the codebase explicit and human-held.

In practice, this looks like the following:

A human architect — sometimes the same person who'll write the code, sometimes a senior developer overseeing several juniors — decides the opinions of the project before any code is generated. Authentication happens in this layer. Errors always look like this. The rules of the business live here, not in the screens. The data has this shape and lives in this place.

These decisions are written down somewhere, even if briefly. They're the project's spine.

Then AI tools are used to generate the code that fits into this structure. The architect reviews what's generated not just for whether it works, but for whether it honors the opinions. A login screen that puts the auth logic in the screen itself gets rejected, even if it runs, because the project's opinion is that auth lives elsewhere.
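
For illustration only, with invented names: the difference the architect is looking for can be as small as this. Both versions "work"; only one honors the project's opinion about where auth lives.

```typescript
import { logIn } from "./auth/session"; // hypothetical module that owns tokens

// Rejected: it runs, but the screen itself fetches and stores the token,
// which contradicts the opinion that auth lives in one layer.
async function submitLogin_rejected(email: string, password: string): Promise<void> {
  const res = await fetch("https://api.example.com/login", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ email, password }),
  });
  const { token } = await res.json();
  localStorage.setItem("token", token); // auth knowledge leaking into a screen
}

// Accepted: the same feature, but the screen delegates to the auth layer.
async function submitLogin(email: string, password: string): Promise<void> {
  await logIn(email, password); // the screen never sees a token
}
```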

The AI is happy to regenerate.

Over time, the opinions become so encoded into the structure of the project that the AI's generations naturally drift toward them. Each new feature has examples to follow. The opinions self-reinforce.

This is dramatically faster than building everything by hand, and dramatically more durable than letting the AI generate without oversight. It's also harder than either pure approach, because it requires a kind of judgment that neither pure-human nor pure-AI work needs. Holding opinions and reviewing AI output against them is its own skill.

What this means for a buyer: the developers who do this well are not the cheapest, and not the fastest in a short demo. But they produce apps that don't hit the 30-to-50 feature wall. The work they're doing in months one through three is partly invisible — it's the holding of opinions — but it pays for itself many times over by month eight.

We've written about what these opinions look like at the architecture level in our piece on clean architecture, and at the level of modeling the business itself in the introduction to domain-driven design. Both articles are, in a sense, about the same thing this one is: the value of explicit opinions in a world where it's increasingly easy to ship without any.

What to Ask the Developer Using These Tools

The question that separates a developer-with-AI from an AI-with-a-typist is short and uncomfortable.

Who decides what the contracts between features are?

If the answer is "we figure it out as we go", you're getting an AI-with-a-typist. If the answer is "I do, and here's what they are", you're getting a developer-with-AI.

Variations of this question work too. Where in the code are the rules of our business? If I asked you to add a new feature today, would you generate it first and then check it against the rest of the app, or check it against the rest of the app and then generate it? Do you have a written description of how authentication works in this project, or does it live in everyone's head?

The answers won't always be neat. Real engineering involves compromises and judgment calls. But a developer who's thinking architecturally will have answers, and a developer who isn't will have deflections. The texture of the conversation tells you which one you're dealing with.

This isn't about AI being bad. AI tools are a real and durable improvement in how software is built. They just don't, and probably won't anytime soon, hold opinions about your project.

That part remains a human job — and the developers who do it well are the ones whose work still holds together two years in.

On Owning Code That Was Largely Generated

There's one more wrinkle worth naming, because it affects the answer to "what did I actually buy?":

When a developer hands you a codebase that was largely generated, the question of ownership gets quietly complicated. The license terms of the tool that generated the code, the assumptions the developer made about who owns the output, the documentation of how it was put together — all of these can matter later, especially if you need to hand the project to someone else or if the code has to be audited.

We touch on this in our piece on what owning your code actually buys you. It's worth a separate read if you're paying for an AI-heavy build. The legal answer is usually clear in your favor, but the practical answer — can someone else pick this up — depends entirely on whether the opinions of the codebase were ever made explicit.

A generated codebase with no explicit opinions is hard to inherit, even when you legally own every line of it. Ownership and usability are not the same thing.

The Honest Close

It's tempting, in 2026, to imagine that AI coding tools have made the architecture conversation obsolete. They haven't. They've made it more important.

Pro Tip
When code was expensive to write, the architecture was implicit in the cost — there were so few features, you could see all of them in your head. When code is cheap to generate, the architecture has to be deliberate, because the cost of generating yet another inconsistent piece is now zero.

The discipline that used to be enforced by the difficulty of writing code now has to be enforced by the developer making the calls.

If your app was built mostly by AI tools, with no one holding the opinions, you may already be heading for the wall — or have hit it. The good news is that the wall is recoverable. The work that makes it recoverable is the same work that should have happened up front: deciding the opinions of the codebase, writing them into the structure, and bringing the divergent pieces into line.

It's not a rewrite. It's the architectural pass that wasn't done the first time. Done well, it transforms the app's velocity — usually within weeks, not months.

You don't need to know the engineering. You need to know that the opinions are missing, and that the absence of them is the cost you're paying. Once you know that, the path forward becomes a real choice instead of a guess.

Related Topics

ai generated code quality · ai coding tool opinions · claude code business owner · ai mobile app development risks · copilot generated code quality · ai code architecture · ai generated app problems · ai coding limitations business · cursor generated code maintenance · building app with ai cost · ai code review business owners · ai coding tool architecture problems
Flutter & Node.js

Ready to build your app?

Flutter apps built on Clean Architecture — documented, tested, and yours to own. See which plan fits your project.

Clean Architecture on every tier
iOS + Android, source code included
From $4,900 — no monthly lock-in