The Adoption War Is Over. The Quality War Is Being Lost.

Ninety-two percent of US developers use AI coding tools. Forty-six percent of new code on GitHub is AI-assisted. And across a year of model releases, the security pass rate of that code has stayed flat while the coding benchmarks climbed. The adoption question is settled. The governing question is now the only one that matters, and it resolves at the gate.

Two facts establish where the software industry actually is in 2026, and they should be read together. The first is that the adoption war is over. GitHub's widely cited figure puts AI coding tool use among US developers at 92 percent, and the share of code on the platform that is now AI-assisted has reached 46 percent, with credible projections of 60 percent by year end. There is no longer a meaningful debate about whether AI writes enterprise code. It does, at close to half of all new volume, and the curve is still steepening. The second fact is the one the first one makes urgent: the code is not getting safer as fast as it is getting more plentiful, and by at least one rigorous measure it is not getting safer at all.

Veracode has been running the same controlled test across more than a hundred large language models since 2025: each model completes security-sensitive coding tasks in Java, Python, C#, and JavaScript, and static analysis then checks whether the generated code is vulnerable. The headline result has barely moved: roughly 45 percent of AI-generated samples fail the security tests, and on cross-site scripting specifically the failure rate on susceptible tasks runs near 86 percent. The detail that matters most is the trajectory. Across the testing period, the overall security pass rate stayed flat at around 55 percent, even as the coding-capability benchmarks the same models are measured against climbed steadily. The models got measurably better at writing code that works. They got no better at writing code that is safe.

The historical frame

Every general-purpose technology goes through a phase where adoption outruns governance, and the gap between the two is where the cost accumulates. The automobile arrived decades before the traffic signal, the driver's license, and the crash standard. The consumer internet reached ubiquity before anyone had built the security, privacy, or content-moderation apparatus it required. The pattern is not that the technology is bad. The pattern is that adoption is driven by the benefit, which is immediate and visible, while governance is driven by the cost, which is delayed and diffuse. Adoption always wins the first lap, and governance has to catch up under pressure once the cost becomes undeniable.

AI-generated code is at exactly that point in the cycle. The benefit, velocity, is immediate, visible, and now thoroughly proven; 46 percent of new code did not become AI-assisted because engineers were skeptical of the speed. The cost, the security and architectural debt riding along with the velocity, is delayed and diffuse, surfacing weeks or months later as an incident, an audit finding, or a CVE. The benefit closed the adoption question in about three years. The cost is now forcing the governing question open.

What changed: passing the test stopped meaning what it used to mean

The flat security pass rate alongside rising capability benchmarks is the most important single finding in the recent data, because it severs a connection most engineering organizations still rely on. The implicit assumption underneath most CI pipelines is that a model good enough to write working code is a model whose code is good enough to ship. The benchmark scores and the test suite are treated as proxies for shippability. The Veracode trajectory says the proxy is broken: capability is climbing while safety stays flat, and the gap between them widens as the models get better at the thing the benchmarks measure.

This is the same severance N° 008 identified between SWE-bench scores and what actually merges, now confirmed on the security axis with a year of controlled data behind it. A model can pass the functional test, satisfy the linter, clear the benchmark, and still produce a cross-site scripting vulnerability the overwhelming majority of the time on susceptible tasks. The test suite was never designed to catch that, and making the model better at the benchmark does not make the test suite better at catching it. Passing the test and being safe to ship have come apart, and they are not converging.

The principle: when velocity is free, the gate is the binding constraint

Economics has a precise term for this situation. In any production system, output is limited by whichever input is scarcest: the binding constraint. For two decades the binding constraint on shipping software was the speed of writing it. Generating code was expensive, so tools, headcount, and process were all optimized to produce more of it faster. AI has now driven the cost of generating code toward zero. When the formerly scarce input becomes abundant, the binding constraint does not disappear. It moves to whatever input is now scarcest.

The scarcest input in an AI-saturated pipeline is no longer the writing. It is the judgment about whether what was written should ship: whether it is safe, whether it fits, whether the organization can be accountable for it in production. That judgment lives at the gate: the review and merge decision where a candidate change becomes the company's code. Velocity being free has not relieved pressure on the gate. It has concentrated all of the remaining pressure there, because the gate is now the one place in the pipeline where the scarce input, judgment, is still required and still cannot be faked by a passing test.
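
The shift is easy to see in a toy model. The sketch below treats a pipeline's throughput as the minimum of its stage capacities. The stage names and the weekly numbers are invented for illustration and are not drawn from any dataset.

```python
# Toy model of a binding constraint: throughput equals the scarcest stage.
# Stage names and capacities (changes per week) are hypothetical.

def binding_constraint(capacities: dict[str, float]) -> tuple[str, float]:
    """Return the lowest-capacity stage and the throughput it allows."""
    stage = min(capacities, key=capacities.get)
    return stage, capacities[stage]

# Before AI assistance: writing is the slowest stage.
before_ai = {"write": 20.0, "test": 200.0, "review": 60.0}

# After: writing capacity grows a hundredfold, nothing else changes.
after_ai = {"write": 2000.0, "test": 200.0, "review": 60.0}

print(binding_constraint(before_ai))  # ('write', 20.0)
print(binding_constraint(after_ai))   # ('review', 60.0)
```

In this model a hundredfold increase in writing capacity lifts total throughput only from 20 to 60, the point at which review binds. Any further gain has to come from the review stage.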

This is why the merge gate is the control plane for AI-generated code, a claim this publication has made on architectural grounds and that the market data now makes on economic grounds. The architecture said the gate is the only point with full context that is still upstream of consequence. The economics say the gate is the binding constraint on the entire system's output and risk. Both lines arrive at the same place. When two independent arguments converge on the same point, the point is usually load-bearing.

The implications

For engineering leaders, the data settles a prioritization question that has been contested in many organizations. Investment in generating more code faster now has sharply diminishing returns, because generation is no longer the constraint. Investment in the gate, in making the merge decision faster, better-informed, and properly recorded, now has increasing returns, because the gate is where the unrelieved pressure has concentrated. The organization that keeps optimizing generation while leaving the gate as a tired human glancing at a diff is pouring resources into the input that is already abundant and starving the one that is now scarce.

For security teams, the flat pass rate is the end of a hope that has quietly shaped a lot of planning: that the model vendors would solve this upstream, that the next release would be the one where the security numbers caught up to the capability numbers. A year of controlled data says that is not happening on the timeline the velocity requires. The XSS failure rate near 86 percent on susceptible tasks is not an artifact of an early model generation; it has persisted across releases that improved on every capability axis. Waiting for the model to get safe is not a strategy. Governing the output is.

For the board and the auditor, the convergence of the adoption data and the quality data is the disclosure-relevant fact. The enterprise is now shipping AI-assisted code at close to half of new volume, the security properties of that code are not improving, and the regulatory frameworks arriving this year presume a defensible record of human judgment behind production changes. The exposure is the product of those three facts, and it is growing on the adoption curve. The question a board should be asking is not whether the organization uses AI to write code (that question is closed at 92 percent) but whether it can produce, on demand, the record of who decided each of those changes was safe to ship, and on what basis.
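
What such a record might contain can be sketched in a few lines. This is a hypothetical shape, not a format that any regulatory framework prescribes: the class name MergeDecision, the helper to_audit_line, and every field are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def _utc_now() -> str:
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class MergeDecision:
    """Hypothetical record of one merge-gate decision (all fields assumed)."""
    change_id: str          # identifier of the change under review
    decided_by: str         # the accountable human, not a bot account
    decision: str           # "approve" or "reject"
    basis: tuple[str, ...]  # evidence the reviewer relied on
    ai_assisted: bool       # whether the change contained AI-generated code
    decided_at: str = field(default_factory=_utc_now)

    def __post_init__(self) -> None:
        if self.decision not in ("approve", "reject"):
            raise ValueError("decision must be 'approve' or 'reject'")
        if not self.basis:
            raise ValueError("a decision needs at least one stated basis")


def to_audit_line(record: MergeDecision) -> str:
    """Serialize a decision as one JSON line for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)


record = MergeDecision(
    change_id="change-001",
    decided_by="alice",
    decision="approve",
    basis=("static analysis clean", "manual review of output encoding"),
    ai_assisted=True,
)
print(to_audit_line(record))
```

The record refuses to exist without a named decider and a stated basis, which are the two things the board's question asks for. A passing test suite can appear in the basis, but it cannot stand in for the decider.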

The closing observation

The adoption war is the one everyone watched, and it ended the way such wars always end: the benefit won. The quality war is the one that matters now, and it is being lost in the specific, measurable sense that the safety of the code is not keeping pace with the volume or with the capability. That gap is not a temporary artifact of immature tooling. It is the structural signature of a technology whose benefit is immediate and whose cost is deferred, at the point in the cycle where the cost is coming due.

Governance always catches up to adoption eventually, because the cost eventually becomes undeniable and forces the catch-up. The only variable is how much accumulates in the gap before it does. For AI-generated code, the gap is the difference between a passing test and a safe ship, multiplied by 46 percent of all new code, compounding on a steepening curve. The place that difference gets resolved, or doesn't, is the gate. Everything the velocity made cheap flows through it, and it is the last place a human judgment is still required before the code becomes the company's problem.
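
The multiplication can be made concrete as a back-of-envelope figure. The sketch below assumes that the roughly 45 percent failure rate Veracode measured on security-sensitive benchmark tasks carries over to AI-assisted code in general. That is an illustrative assumption, and it very likely overstates the rate for code that is not security-sensitive. The function name exposed_share is hypothetical.

```python
# Back-of-envelope estimate only. ASSUMPTION: the benchmark failure rate
# applies uniformly to all AI-assisted code, which real codebases will not match.

def exposed_share(ai_share: float, failure_rate: float) -> float:
    """Share of all new code that is both AI-assisted and failing."""
    for name, value in (("ai_share", ai_share), ("failure_rate", failure_rate)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return ai_share * failure_rate

FAILURE_RATE = 0.45  # roughly 45 percent of samples fail the security tests

today = exposed_share(0.46, FAILURE_RATE)      # current AI-assisted share
projected = exposed_share(0.60, FAILURE_RATE)  # projected year-end share

print(f"today: {today:.1%}, projected: {projected:.1%}")
# today: 20.7%, projected: 27.0%
```

With the failure rate held flat, the estimate grows purely because the AI-assisted share grows. That is the sense in which the gap compounds on the adoption curve.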

When writing code became free, the binding constraint moved to the decision about whether it should ship. That decision is the gate. It is the only scarce thing left in the pipeline.

End N° 013