There's a name for how most teams are using AI right now.
Vibe coding.
You type something. It gives you something back. You look at it and think: "This is wrong." So you fix it. Then you do it again. And again.
That loop feels like progress. It isn't. It's rework.
The uncomfortable truth
The problem usually isn't the code. It's the specification.
AI doesn't misunderstand requirements. It fills in gaps. Silently. Confidently. At scale.
When the output is wrong, most teams do the obvious thing: they fix the code. The teams pulling ahead do something different. They fix the spec.
That is the shift: from reacting to output to controlling it.
Specifications fail in predictable ways
Bad AI output is not a mystery. It is a diagnostic signal. The agent did exactly what the context told it to do. The context was incomplete, ambiguous, or over-prescriptive. There are three failure modes, and after you have seen them named you will recognize them on every team you work with.
Ambiguity. A sentence in the specification can be read two ways. The agent picks one. You intended the other. The code compiles, the tests you wrote pass, and the behavior is simply not what you expected. Words like should, appropriate, handle, and properly are the giveaway — each one delegates a decision to the agent without acknowledging that the decision is being delegated. "The function should parse the date string and return a date object" is the canonical example: which formats, which library, what happens on invalid input? The agent guesses. You lose.
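To see how that one sentence delegates decisions, here is a sketch of two implementations that both satisfy it. The function names, the chosen formats, and the invalid-input behavior are all hypothetical — they are exactly the decisions the spec never made:

```python
from datetime import date, datetime

def parse_date_a(s: str) -> date:
    # Reading 1: month-first US format; invalid input raises ValueError.
    return datetime.strptime(s, "%m/%d/%Y").date()

def parse_date_b(s: str):
    # Reading 2: ISO 8601 only; invalid input returns None.
    try:
        return date.fromisoformat(s)
    except ValueError:
        return None
```

For the input "03/04/2024", the first returns March 4, 2024 and the second returns None. Neither is wrong, because the specification never chose.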
Underspecification. The specification never mentions a requirement at all. The agent does not misinterpret intent; it never receives it. The happy path works. The first edge case you try explodes. "Read a CSV file, return a list of dictionaries" sounds complete right up until the file is empty, or the row has an extra column, or a field contains a quoted comma. Underspecification is the failure mode you only notice at 10:47 PM on a Tuesday, because the missing requirement only matters when someone hits it.
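The CSV sentence behaves the same way. Below is a sketch of the implementation an agent plausibly produces from it (read_rows is a hypothetical name), followed by the behavior the sentence never pinned down:

```python
import csv
import io

def read_rows(text: str) -> list[dict]:
    # Faithful to "read a CSV file, return a list of dictionaries" --
    # and silent about everything the sentence never mentioned.
    return list(csv.DictReader(io.StringIO(text)))

# Happy path: exactly what you pictured.
read_rows("name,age\nAda,36\n")      # [{'name': 'Ada', 'age': '36'}]

# Unmentioned cases: the agent's guesses, not your decisions.
read_rows("")                        # empty file -> [] (or should that be an error?)
read_rows("name,age\nAda,36,extra")  # extra column -> stashed under the key None
```

Every comment in the second half is a requirement the spec owed you and never delivered.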
Overspecification. The specification dictates implementation instead of behavior. "Use a dictionary called _cache. Iterate with a for loop. Check the cache before reading the file." The agent follows the instructions faithfully and produces code that is correct, fragile, and locked into decisions that should have stayed flexible. The damage is invisible until you try to modify, port, or extend the system. Then you discover that the spec constrained the mechanism, not the outcome.
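Rewritten as a behavioral constraint ("repeated reads of the same path must not touch the disk again"), the same requirement leaves the mechanism open. One sketch under that constraint; functools.lru_cache here is an implementation choice, and the hand-rolled _cache dictionary the overspecified version demanded would pass the same acceptance test:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def read_config(path: str) -> str:
    # Behavior: the first call reads the file; repeated calls with the
    # same path return the cached content without re-reading the disk.
    with open(path, encoding="utf-8") as f:
        return f.read()
```

A spec written this way is checked by observation: read a file, overwrite it on disk, read again, and assert the cached content comes back. Which data structure held the cache was never the requirement.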
There is a fourth failure mode that only appears over time: specification drift. You edit the generated code to fix a bug but forget to update the specification. Or the agent produces output beyond what the spec asked for, and you accept it. The specification and the implementation gradually diverge. The next time you regenerate, a feature you had come to rely on quietly disappears — because the spec, the document that is supposed to be the source of truth, never knew about it.
The diagnostic sequence
Here is the loop the teams pulling ahead run every time validation fails:
- Identify the failing behavior. What did the output do wrong? What did it omit? Be specific — "it didn't handle empty input" is a fact, "it felt off" is not.
- Locate the specification gap. No relevant sentence exists? Underspecification. A sentence exists but reads two ways? Ambiguity. A sentence dictates a mechanism? Overspecification. The gap classifies the fix.
- Classify and fix the spec. Add precision for ambiguity. Add the missing requirement for underspecification. Replace an implementation detail with a behavioral constraint for overspecification.
- Regenerate and revalidate. Confirm the fix resolved the failure without breaking anything that already passed.
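The revalidate step is mechanical enough to automate. A minimal sketch (the check names and messages are illustrative): run every acceptance check, collect the failures, and compare the list before and after the spec fix.

```python
def revalidate(checks):
    # Run each (name, check) pair; a check raises AssertionError on failure.
    failures = []
    for name, check in checks:
        try:
            check()
        except AssertionError as exc:
            failures.append((name, str(exc)))
    return failures

def check_happy_path():
    assert True  # stands in for a real acceptance criterion

def check_empty_input():
    raise AssertionError("empty input not rejected")  # the observed failing behavior

failures = revalidate([("happy path", check_happy_path),
                       ("empty input", check_empty_input)])
# failures -> [("empty input", "empty input not rejected")]
```

A spec fix that clears "empty input" from the failure list while "happy path" stays out of it is confirmed; any previously passing check that reappears is a regression.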
This is deterministic. You compare observed behavior against the specification, the gap tells you what to fix, and the fix goes into the document — not into the code. After a handful of cycles, your first-pass specifications become substantially more complete, and the rework loop collapses.
What a real specification looks like
A specification that a coding agent can implement reliably has a specific structure. It is not a paragraph of wishes. It is a contract: five components, every one of them present.
- Input contract — what the component receives. Types, formats, valid ranges, source. If the input is a file, the file format. If the input is a function argument, the type and constraints.
- Output contract — what the component produces. Return types, structure of output files, format of displayed information, success vs. failure behavior. "Returns a list of results" is weak. "Returns a JSON array where each element has id (string), score (float, 0.0–1.0), and label (string)" is strong.
- Acceptance criteria — testable assertions. "Given input X, the output must be Y." "When the input is empty, exit code 1 and print 'Error: empty input' to stderr." These are the bridge between the specification and the automated validation pipeline that gates the merge.
- Constraints — the boundaries the implementation must respect. Performance budgets. Dependency restrictions. Style requirements. Compatibility requirements. The things that are invisible in the functional description but non-negotiable in production.
- Examples — concrete input-output pairs. Examples disambiguate natural language and double as test cases. They are the single fastest way to tell the agent what you actually mean.
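Put together for the CSV reader described earlier, a five-component specification might read like this (read_rows is a hypothetical name; the exact messages and limits are illustrative, not prescriptive):

```text
Specification: read_rows(text)

Input contract
- text: a UTF-8 string containing RFC 4180 CSV with a header row.

Output contract
- Returns a list of dictionaries, one per data row, keyed by the header names.
  All values are strings.

Acceptance criteria
- Given "name,age\nAda,36\n", returns [{"name": "Ada", "age": "36"}].
- Given an empty string, raises ValueError with the message "empty input".
- Given a row with more fields than the header, raises ValueError naming the row number.

Constraints
- Standard library only; no pandas.
- Quoted fields containing commas must be parsed per RFC 4180.

Examples
- 'a,b\n"x,y",2\n' -> [{"a": "x,y", "b": "2"}]
```

Every line is either a testable assertion or a boundary. Nothing dictates the mechanism.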
A 32-line specification can produce hundreds of lines of correct implementation. That ratio — short specification, long implementation — is why this operating model wins.
What to do this week
- Take one piece of code your team hand-edited after an AI generated it. Find the spec that produced it. If there isn't one, that is the finding. If there is, classify the failure: ambiguity, underspecification, overspecification, drift. You now know what to change — in the document, not the code.
- Add an "edge cases" section to every spec template on your team. Empty input. Oversized input. Malformed input. Missing fields. Extra fields. The failure mode you don't think about is the one the agent will fill in wrong.
- Write the next requirement change as a spec diff, not a prompt. When someone says "it should also do X," update the specification, then regenerate. If you patch the code directly, you have just planted the seed of specification drift.
- Make the rule explicit: we do not hand-edit generated code. The specification is the artifact under version control. The code is derivative. Every accepted change lives in the spec first. Every regeneration is a diff against the previous generation, reviewed for silent expansion.
The shift, one more time
Vibe coding feels fast. Reliable systems are fast — because they aren't spending the week re-prompting the same problem.
When output is wrong, fix the spec. Not the code.
Be honest: how much of your team's AI usage is still just fixing output?