The Strange Side-Effect of AI Coding
We Suddenly Know the Cost of a Feature With Ridiculous Precision
One of the weirdest things about using AI to write software isn’t the speed, the convenience, or the “oh wow it actually worked” moments. It’s the fact that, for the first time ever, we get a shockingly clear picture of what it costs to build something.
In a traditional setup, you ask a team to add a feature like “generate a social graph image that combines multiple product photos into one.” The team looks at it, throws some points on the ticket, and off you go. Maybe it’s an 8-pointer. If you’re lucky, you’ve got at least three months of stable velocity with a consistent group of developers so you can say, “Yeah, 8 points usually takes us a week of dev time and three days of QA.” You take an engineer’s fully-loaded cost, do the same with QA, plug in the percentages of time, and you get… a ballpark. A ballpark with dents in the walls and bushes growing in left field.
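That back-of-the-envelope math looks something like the sketch below. Every number in it (the velocity, the hourly rates, the QA ratio) is an illustrative assumption, not a figure from any real team:

```python
# Rough traditional cost estimate for an "8-point" feature.
# All numbers below are illustrative assumptions.

points = 8
points_per_dev_week = 8        # assumed stable velocity: 8 points ~ 1 dev week
dev_hourly_rate = 120          # assumed fully-loaded engineer cost, $/hour
qa_hourly_rate = 90            # assumed fully-loaded QA cost, $/hour

dev_hours = (points / points_per_dev_week) * 40   # one 40-hour dev week
qa_hours = 3 * 8                                  # ~3 days of QA

estimate = dev_hours * dev_hourly_rate + qa_hours * qa_hourly_rate
print(f"Ballpark: ${estimate:,.2f}")
```

The output is a single number, but every input feeding it is a guess, which is exactly why the result is a ballpark rather than a price.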
Because even if your velocity tracking is pristine, the reality is messy. Humans are variable. Day to day, week to week, sprint to sprint - life happens. Systems are unpredictable. Dependencies change. Someone disappears into a rabbit hole. Someone gets sick. Someone realizes the system doesn’t behave the way the documentation hinted it did. Even the best teams get blindsided by things outside their control.
With AI coding, though, something strange shows up: a predictable cost curve.
When the AI Does the Heavy Lifting
Let’s assume most of the coding and test generation is now done through a tool like Cursor or Factory.ai. You start noticing a pattern: the majority of the work can be quantified by the number of tokens burned.
That’s wild because tokens have a literal price. So instead of “8 points = usually a week,” you’re suddenly looking at:
- AI Developer → 80% of effort
- AI SDET/Test Generator → 10%
- Human Developer (prompting, reviewing, fixing) → 5%
- Human QA (validation, sanity checks) → 5%
There’s no velocity guessing. No conversion of points to time. No wondering how much energy someone has this week. The model tells you exactly how many tokens it used, the costs are stable, and the process becomes way more measurable.
You still need the humans, but the human slice is small enough that the unpredictability shrinks dramatically. Even if their time varies by 50%, it doesn’t swing the whole project the way it used to.
Suddenly you can look at a past feature and say, with a straight face:
“This cost $27.42 in model usage, plus about 4 hours of human review. Cool.”
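A minimal sketch of that kind of token-metered accounting, assuming a flat blended per-million-token price. The price and the token count are made-up illustrations, chosen only so the model spend lands on the $27.42 from the example above:

```python
# Token-metered cost for a single feature.
# Price and token count are illustrative assumptions.

PRICE_PER_MTOK = 15.00  # assumed blended $ per million tokens

def model_cost(tokens_used: int) -> float:
    """Exact model spend for a feature, given total tokens burned."""
    return tokens_used / 1_000_000 * PRICE_PER_MTOK

def feature_cost(tokens_used: int, human_hours: float, human_rate: float) -> float:
    """Model spend plus the (now small) human review slice."""
    return model_cost(tokens_used) + human_hours * human_rate

# e.g. ~1.83M tokens, plus 4 hours of human review at an assumed $120/hour
spend = feature_cost(1_828_000, human_hours=4, human_rate=120)
print(f"${model_cost(1_828_000):.2f} in model usage")
print(f"${spend:.2f} all-in")
```

The point isn’t the specific numbers; it’s that the model half of the equation is metered to the token, so the only fuzzy term left is the small human slice.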
That level of clarity has never really existed in software development.
But Then Reality Shows Up
Of course, the whole thing isn’t magic. There are new sources of drift:
1. Models Change
New models appear, older ones get smarter or dumber, token prices shift, safety rails get tighter or looser. The same prompt today may not behave like it does next quarter.
2. Harnesses Change
Your agent, your prompt templates, your retrieval pipeline: whatever scaffolding you’re using also evolves. If your harness gets better at isolating the task, token costs drop. If it gets worse, they spike.
3. Prompts Still Matter
Bad prompts are expensive. Not only in money, though that’s part of it, but in wasted cycles. A vague or incomplete prompt leads to wrong results, extra iterations, more edits, more tests, and more human review. Same as with humans: garbage in → garbage out.
4. Context Still Matters
This is the one people underestimate. AI is incredibly fast, but it’s not omniscient. If the system it’s modifying is a hairball of legacy decisions and context nobody documented, the human with 10 years of scars is still the one who saves you from disaster. Those people shortcut the whole process because they simply know things the model doesn’t.
Experienced engineers don’t just write code. They prune ambiguity. They clarify intent. They narrow the possible solution space. That stuff drastically reduces the number of AI iterations needed - and, ironically, reduces the token bill.
So yes, the model does the heavy coding. But the “human in the loop” is still the one who stops the wheel from wobbling off.
The Twist
AI development gives us this illusion of precision - a number we can measure, track, and compare. But the accuracy still depends on all the messy human inputs that have always mattered: clarity, context, communication, and experience.
What’s different now is this:
Human variability used to be 100% of the cost.
Now it’s closer to 10%.
And that shift alone could change how we estimate, how we budget, and how we think about engineering work over the next decade.
If nothing else, it’s pretty wild that our new “developer” comes with a built-in odometer.