Script from my podcast at – AI – AI Development Life Cycle
Teams using AI are shipping more than they ever have. They’re also breaking things more than they ever have. And the burnout that all this speed was supposed to fix? That still exists.
The 2025 DORA report notes that AI adoption among software development professionals is at ninety percent now. That’s not a trend anymore; that’s the baseline. The same report found that higher AI adoption correlates with increased delivery throughput, but it also correlates with increased delivery instability. Output is up. But the health of the system around that output hasn’t caught up. Speed rose; the structure around it didn’t.
Why? Here’s my read. We’re heading toward fully AI-powered development — agents that don’t just suggest the next line of code, but plan, build, test, and deploy entire applications. Humans don’t disappear from that picture, but the role shifts. Instead of writing the software, you’re governing the outcome. You’re the decision-maker in the loop, not the person doing every step.
The problem is, the process we’re running all this on was never designed for that world. The traditional SDLC — the Software Development Life Cycle most teams run on — assumes humans doing every phase. Drop AI agents into that model and some things get faster. But the structure itself doesn’t change. And that mismatch is exactly what’s showing up in the data.
That’s the gap AIDLC — the AI Development Life Cycle — is trying to close. Not another tool layered on top of an old process. A rethink of the lifecycle itself, for the world we’re actually moving into.
WHERE MOST COMPANIES ACTUALLY ARE
In a recent book excerpt published as part of their AI Revolution in Software Development work, McKinsey laid out a framework that’s worth using as a map here. They describe the progress of AI in software as four levels of developer support.
- Level 1 is developing without AI at all. Here the developer writes every line.
- Level 2 is where most companies are right now: AI speeds up individual tasks. You write a few lines, the AI suggests the next ten lines. A meaningful productivity boost, but the human’s still driving everything.
- Level 3 is where things start to shift. A developer describes a new feature in plain English, and the AI generates the first version of the code, the tests, and the documentation. McKinsey notes Level 3 is increasingly being adopted as models have evolved from simple inline tools into systems that can run long, multi-file tasks on their own.
- Level 4 is where AIDLC becomes the necessary operating model. At Level 4, a small team guides a coordinated system of AI agents that can deliver an entire application end to end — design, code, testing, integration — raising only the decisions that genuinely require human judgment. McKinsey calls Level 4 largely experimental as of the time of writing, but with promising developments already emerging.
That one word — coordinated — is the operative one. At Levels 2 and 3, AI is still a tool attached to an individual developer. At Level 4, you’re orchestrating multiple agents across the full lifecycle. And that requires something qualitatively different: a lifecycle model designed for it. That’s AIDLC.
Here’s the challenge, though. Most companies are trying to capture Level 4 outcomes while still running Level 2 workflows. They’ve added agents to an SDLC that was built for humans doing every step. The delivery gains that should follow from coordinated agents don’t materialize, because the coordination layer (the governance, the shared context, the human approval gates) was never built.
And that gap is where the enterprise knowledge problem lives. For complex systems built over decades, the harder constraint comes earlier than coding speed. No single person knows the full depth of the system. Architecture decisions were made by people who’ve left. Requirements were never cleanly documented. The true subject matter experts rotate out, and the tribal knowledge they carried goes with them. Forrester has a phrase for the data version of this problem: data without meaning is unusable for autonomous systems. The same holds for the knowledge that never made it into a data system at all — the undocumented requirements, the implicit design decisions, the understanding that exists only in people’s heads.
In that environment — and McKinsey’s modernization research makes the same point — the bottleneck is rarely writing the code. The slow part is the understanding work that comes before it: the loops of discovery, mapping, and reconciliation. Aligning with business and product. Reverse-engineering the existing system deeply enough to know what’s safe to change — the kind of archaeology a subject matter expert might need weeks, sometimes months, to complete.
And here’s why that’s a Level 4 problem specifically. At Levels 2 and 3, a human developer is still in the driver’s seat, and that developer carries the context — they know what the system does and what’s safe to touch, so the AI just has to be fast. Level 4 is the case where no one carries that context anymore. You can’t hand a coordinated system of agents a goal and expect them to deliver an application end to end if the knowledge they’d need to act on it lives only in people’s heads — or in people who’ve already left. So before the agents can build anything, that understanding has to be reconstructed and turned into something they can actually use. That reconstruction is the real starting line — the understanding work I’ll call Discovery for the rest of this episode.
And this isn’t just my framing. Remember that Forrester line about data without meaning? It comes from their Top 10 Emerging Technologies for 2026, and that same report reaches a similar conclusion about the lifecycle itself, from the outside. It ranks each technology by how soon it pays off, and it places agentic software development (agents that generate and refine software across the lifecycle) in the medium-term bucket: real and coming, but a few years away from delivering significant benefits. The revealing part is why. The reasons they give are that agent coordination needs to mature and stronger guardrails need to be in place. Notice what’s not on that list: the raw capability of the models. The blockers are coordination and governance: operating-model problems, not technology problems.
WHAT LEVEL 4 LOOKS LIKE IN PRACTICE
So what does a Level 4 lifecycle look like when a serious vendor tries to operationalize it? AWS has published one — they call it the AI-Driven Development Lifecycle. It’s open source on GitHub, and it runs across Kiro, Claude Code, Cursor, Copilot, and other agents.
The AWS framework organizes all work into three phases: Inception, Construction, and Operations. What I’ve been calling Discovery maps onto Inception — where AI turns business intent into requirements, user stories, and units of work. Construction is where the domain models, architecture, code, and tests get produced. Operations covers deployment, monitoring, and incident management.
Rather than walking you through every stage, here’s the shape of the thing — three principles.
First: rigor adapts to complexity. A simple bug fix compresses most of the process and goes nearly straight to code generation. A complex system migration runs the full sequence — requirements analysis, application design, decomposition into units of work. The framework decides how much process the request deserves.
Second: nothing moves without a human gate. At every stage, the agent asks clarifying questions, produces a plan, and waits for human approval before proceeding. Construction works incrementally — unit by approved unit — rather than generating the whole solution in one pass. And every decision, input, and approval is logged with timestamps as the work happens, building a complete audit trail.
Third, and this is a detail that matters to enterprise teams dealing with years-old legacy code: Inception includes a reverse engineering stage for brownfield work, meaning legacy systems where the documentation doesn’t exist or can’t be trusted. The framework’s documented position is that understanding has to be generated from the code itself before requirements can be reliably developed with the business. That’s the Discovery problem, named and addressed directly inside a Level 4 framework.
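To make the first two principles concrete, here is a minimal Python sketch. It is not code from the AWS framework. The stage names echo the description above, while `Request`, `AuditLog`, `plan_stages`, `run_with_gates`, and the two complexity flags are hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable

# Stage names echo the text above; everything else is a hypothetical illustration.
FULL_SEQUENCE = [
    "requirements_analysis",
    "application_design",
    "units_of_work",
    "code_generation",
]


@dataclass
class Request:
    description: str
    is_simple_fix: bool   # assumed signal; a real framework would infer this
    is_brownfield: bool   # legacy code whose documentation cannot be trusted


@dataclass
class AuditLog:
    entries: list = field(default_factory=list)

    def record(self, stage: str, event: str) -> None:
        # Each decision is logged with a UTC timestamp as it happens.
        self.entries.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "stage": stage,
            "event": event,
        })


def plan_stages(req: Request) -> list:
    """Rigor adapts to complexity: a simple fix skips most of the process."""
    if req.is_simple_fix:
        return ["code_generation"]
    stages = list(FULL_SEQUENCE)
    if req.is_brownfield:
        # Understanding is generated from the code before requirements work.
        stages.insert(0, "reverse_engineering")
    return stages


def run_with_gates(req: Request, approve: Callable[[str], bool], log: AuditLog) -> list:
    """Run stages in order and stop at the first one a human does not approve."""
    completed = []
    for stage in plan_stages(req):
        log.record(stage, "plan_proposed")
        if not approve(stage):
            log.record(stage, "rejected")
            break
        log.record(stage, "approved")
        completed.append(stage)
    return completed
```

In this sketch `approve` stands in for the human gate: it could be a CLI prompt or a ticket approval, and nothing past a rejected stage runs. The log grows as the work proceeds, which is the audit-trail idea in miniature.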
One point to note: in the open-source release today, Operations is essentially a placeholder, a phase reserved for future expansion rather than fully built out. The richer Operations vision AWS describes in its methodology writing (deployment planning and execution, monitoring and observability that feed incident response, and change management) is intent, not functionality.
THE FULL AIDLC, END TO END: MY VIEW
Now let me set the published frameworks aside and describe the lifecycle the way I think about it — independent of any one vendor’s model. If you’re building an AIDLC for real, here’s what it has to handle, stage by stage.
#1 Discovery. This is where you establish shared, machine-usable context — what the system already does, what the new business needs are, and the undocumented knowledge that lives only in people’s heads. For brownfield systems, that means archaeology on the existing code. In every case, it means turning ambiguity into specification the agents can act on. Get this wrong and every downstream stage inherits the error. It’s where the leverage is highest — and where most organizations underinvest.
#2 Solution design. Intent becomes architecture. The distinctive AIDLC move here is decomposition — breaking the work into small, well-scoped units with clear inputs, outputs, and acceptance criteria, so a coordinated set of agents can work them in parallel without drifting. This is also where you set the guardrails: security boundaries, non-functional requirements, what “good” looks like. Design is where you decide what the agents are allowed to do.
#3 Implementation. Agents generate the code, the tests, and the documentation per unit; humans review. The shift is from “write, then review” to “review continuously, at volume” — which makes review capacity, not coding speed, the real constraint. It’s the constraint most teams forget to staff for. And there’s a subtle trap: when the same agent writes the code and the tests, a passing test is weaker evidence than it used to be. Someone has to check that the tests actually test the right thing — verifying the verifier becomes part of the work. That can be a combination of a second agent and a human reviewer.
#4 Deployment. Once the units are built and reviewed, they still have to ship. Releases run through automated pipelines and infrastructure-as-code, but governed by risk: routine, reversible changes flow through automatically, while material changes still get a human gate. The aim is pipelines fast enough to keep up with agent throughput, with oversight proportional to risk — a gate that protects you without becoming the bottleneck.
#5 Operational support. Shipping isn’t the finish line. The system’s live, and agents can help here too — triaging alerts, investigating incidents, surfacing root causes. But that only works if observability keeps pace with the increased volume of change. Without it, speed just produces instability faster.
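Two of the stages above lend themselves to a small sketch: a unit of work with explicit inputs, outputs, and acceptance criteria, and a deployment gate whose strictness depends on risk. The Python below is one possible shape under stated assumptions. `UnitOfWork`, `Risk`, `classify`, `release_decision`, and the rule that irreversible or security-relevant changes count as material are all hypothetical choices, not part of any published framework.

```python
from dataclasses import dataclass
from enum import Enum


class Risk(Enum):
    ROUTINE = "routine"     # reversible, low blast radius
    MATERIAL = "material"   # needs a human decision before release


@dataclass(frozen=True)
class UnitOfWork:
    unit_id: str
    inputs: tuple
    outputs: tuple
    acceptance_criteria: tuple
    reversible: bool
    touches_security_boundary: bool


def classify(unit: UnitOfWork) -> Risk:
    # Assumed rule of thumb: irreversible or security-relevant means material.
    if not unit.reversible or unit.touches_security_boundary:
        return Risk.MATERIAL
    return Risk.ROUTINE


def release_decision(unit: UnitOfWork, reviewed: bool, human_approved: bool = False) -> str:
    """Return what the pipeline should do with a finished unit."""
    if not unit.acceptance_criteria:
        # A unit without acceptance criteria gives agents nothing to be judged against.
        return "blocked: no acceptance criteria"
    if not reviewed:
        return "blocked: awaiting review"
    if classify(unit) is Risk.MATERIAL and not human_approved:
        return "hold: human gate"
    return "deploy"
```

A routine, reviewed unit flows straight to `"deploy"`, while a schema migration marked `reversible=False` stops at `"hold: human gate"` until someone approves it. Oversight stays proportional to risk, so the gate does not sit in front of every change.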
Running underneath all of this is the thing that makes it a system rather than five disconnected stages: persistent, linked context. Every requirement, design decision, unit, and test is captured as an artifact and traced both ways — so when something breaks in production, you can follow it back to the unit that built it, the design choice behind it, and the requirement that asked for it. That traceability isn’t bureaucracy. It’s what makes the next part possible.
And then the part most lifecycle diagrams leave out:
#6 The improvement feedback loop. Production is the richest source of truth you have about where the system could fall short. Incidents, latency, error patterns, regressions, real usage: all of it is signal. A mature AIDLC captures that signal and routes it back into Discovery, design, and implementation as improvement work — hardening reliability, paying down the tech debt the agents introduce, tightening guardrails where they made mistakes, strengthening tests where regressions slipped through.
The critical part, and the reason I draw it as its own stage, is that this is deliberately not the feature pipeline. New features carry their own pressure, and they’ll always win the prioritization fight. If you don’t protect a dedicated loop whose only job is making the existing system better, the system quietly degrades while the agents keep shipping volume. This loop is what turns raw speed into durable speed: the difference between shipping more, and shipping more that lasts.
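The linked context and the improvement loop described above can be sketched together. This is a minimal Python illustration, assuming each artifact stores a single upstream link. `Artifact`, `trace_back`, `route_incident`, and the artifact IDs are hypothetical, and only the backward direction of the trace is shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Artifact:
    artifact_id: str
    kind: str                     # "requirement", "design", "unit", or "test"
    derived_from: Optional[str]   # upstream artifact id; None for a root requirement


def trace_back(artifacts: dict, start_id: str) -> list:
    """Follow derived_from links from an artifact up to its root requirement."""
    chain = []
    seen = set()
    current = start_id
    while current is not None and current not in seen:  # guard against cycles
        seen.add(current)
        artifact = artifacts[current]
        chain.append(artifact)
        current = artifact.derived_from
    return chain


def route_incident(artifacts: dict, unit_id: str, summary: str, improvement_backlog: list) -> dict:
    """Turn a production incident into improvement work, kept apart from features."""
    chain = trace_back(artifacts, unit_id)
    item = {
        "summary": summary,
        "trace": [a.artifact_id for a in chain],
        "stages_to_revisit": [a.kind for a in chain],
    }
    improvement_backlog.append(item)
    return item


# Hypothetical example data: one requirement, one design decision, one unit.
ARTIFACTS = {
    "REQ-1": Artifact("REQ-1", "requirement", None),
    "DES-1": Artifact("DES-1", "design", "REQ-1"),
    "UNIT-1": Artifact("UNIT-1", "unit", "DES-1"),
}
```

Calling `route_incident(ARTIFACTS, "UNIT-1", "timeout under load", backlog)` attaches the chain `UNIT-1`, `DES-1`, `REQ-1` to the new item. Keeping `improvement_backlog` as a separate list from the feature backlog is the point of the design: improvement work gets its own queue instead of competing with features.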
CLOSING
Let me pull this together.
Forrester’s State of Agentic AI 2026 report found three-quarters of enterprise leaders adopting agentic AI — but only a small minority running it in meaningful production. Their diagnosis isn’t that the technology fell short; it’s that enterprise readiness hasn’t caught up. Which is the same story DORA tells from the inside: speed without the governance to match it doesn’t remove risk. It defers it.
So here’s my take. AIDLC isn’t mainly a construction problem. It’s a coordination and context problem that spans the entire software assembly line, from requirements to operations. That’s what McKinsey’s Level 4 really says: coordinated agents delivering end to end need human judgment at defined points, and that takes a lifecycle designed for it, not retrofitted onto one that wasn’t. And the foundation underneath all of it is Discovery: turning the knowledge your organization actually runs on into context the agents can act on.
So the reflection I’ll leave you with: what level of the McKinsey progression are you actually operating at — and does your lifecycle match it? If you’re running Level 4 agents on a Level 2 workflow, that’s the gap to close first.
SOURCES & CITATIONS
- McKinsey — The AI Revolution in Software Development
- McKinsey — AI for IT Modernization: Faster, Cheaper, Better
- AWS — AI-Driven Development Lifecycle (AI-DLC) Financial Services Blog
- AWS Labs — AI-DLC Workflows
- Forrester — Top 10 Emerging Technologies 2026
- Forrester — The State of Agentic AI, 2026
- Forrester — Top 10 Emerging Technologies 2026: Beyond Chat
- 2025 DORA State of AI-Assisted Software Development Report
- DORA — Balancing AI Tensions