
From Autocomplete to Software Factory

Levels of AI assistance and what each demands


How do we use AI? Are we using it at the basic level, perhaps as a detached research tool (Google/StackOverflow replacement)? Do we use it for editing or explaining selected sections? To analyze or refactor within a single file or across multiple? Do we let the toolset build the plan, or do we strictly micromanage each step?

This chapter uses a ladder metaphor to illustrate the progress from simple, atomic tasks to complex, fully autonomous software factories. Each of these types has different implications for how we structure our teams, processes, and governance, and how much guidance, up-front planning, and supervision is required to get the most out of them.

For me, the usual starting point for a new topic is a Venn diagram, a layer model, or something along those lines. Once an architect, always an architect, right? Anyway, I think there's a clear 'maturity model' or 'ambition level' in how much we're willing and able to let the AI do for us. You could argue that the trend is just another abstraction layered on top of the old, a bit like moving from Assembly to C to Java to Python. Or from proprietary systems to no-code/low-code tools, or to commercial SaaS services and platforms.

Over the course of my career, I've been hands-on in actual development from the beginning to this day, sometimes to the annoyance of many developers, no doubt. Over these years, I've become less and less concerned about code: what language and stack you use is not as interesting as it was just a few years back. What matters is what the code does and to what end, and of course, that whoever maintains it won't come after us somewhere down the line. The fact is that most issues you encounter in typical software projects stopped being technical problems a long time ago; they are organizational and human problems. The same applies to AI-assisted development: making the most of it means you need to get out of the 'code zone', regardless of how much pain that causes, and start thinking more about deliverables, processes, and workflows.

Four Levels of AI Assistance: from autocomplete to autonomous decision-making

1. Autocomplete & Code Suggestions. Developer initiates → AI completes. Low autonomy, high transparency. ✓ Realistic. Specs: minimal, the task is very small. Risk: low or none. Validation: human review.
2. Structured Suggestions & Drafts. AI generates multiple options → developer selects and refines. Moderate autonomy. ✓ Realistic. Specs: structured. Risk: moderate. Validation: human judgment.
3. Autonomous Generation with Governance Gates (Factory 0.1). AI generates a full feature → passes automated tests and gates → human approval on major changes. ✓ Realistic (or -ish). Specs: detailed. Risk: moderate to high. Validation: automated + human.
4. Autonomous Decisions (aka the AGI Dream). AI makes autonomous decisions without human intervention; no governance gates needed. ✗ Still sci-fi. Specs: made up on the fly. Risk: unknown. Validation: impossible.

In the figure above, the starting point is something like autocomplete++. At the other end of the spectrum (rainbow?) we have the fully autonomous AGI software factory: mass production of software where we supposedly just feed in high-level requirements and the system churns out a working product with minimal to no human involvement.

Level 3, the Factory 0.1, is definitely within reach, or already a reality, for a growing part of the developer population. What it takes to get there and beyond is what I'm trying to shed some light on with this book.

Developer perspective: The levels of AI assistance

The ladder model above is probably not going to convince anybody in the frontline trenches and foxholes. I know this because that's where I spend most of my working hours, too, so I figured I'd approach this from a more practical perspective.

Going up these ladders, developers are gonna ask questions like:

  • What do you really need to do to make this work?
  • What is the scope of the task on each ladder?
  • How much context do you need to provide to the AI to get good results (as a function of task size)?
  • What kind of instructions and specifications do you need to give?
  • How much review and iteration (and rework, rewrites, debugging, ...) is typically required?

Figuring out the starting point is of course a good place to start. There aren't many developers out there who are not already at least halfway on board. In a recent social media poll by Petri Mäenpää, CEO of the Finnish company Sysart, some 300 people answered the question of how much they use AI tools in software development. The results were pretty clear: only 11% said they don't use AI tools at all, 36% were at the agentic level, and 24% were not writing any code themselves at all.

I'll walk through these levels below and will revisit them several times later in this book as well.

Atomic: Small context and easy to verify

The most common use of AI tools is rather simple. Ask for a specific thing related to a small context, and get a specific answer. This was the original offering, equivalent to the browser-based chat prompting that people often refer to simply as 'AI'.

  • Goal: Fix a specific issue or make a small change with minimal context.
  • Context size: 1-100ish lines, or a single error message.
  • Interaction style: Developer chooses what to include, and applies the changes manually to the source.

Imagine editing a single line, or figuring out a syntax error or a linter warning. The context and intent are clear: fix something, change something, explain something. For example:

Example
You: "Make this function async and add error handling."
AI: "Here's the modified function with async/await and a try/catch block."
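A minimal sketch of what such an exchange might produce. Everything here (the `getUser` function, the `db` stand-in, the `User` type) is illustrative, not from any specific codebase:

```typescript
type User = { id: string; name: string };

// Stand-in data source; a real app would call a database or an API.
const db = {
  async find(id: string): Promise<User> {
    if (id === "missing") throw new Error("user not found");
    return { id, name: "Ada" };
  },
};

// What the assistant might hand back: the function made async,
// with a try/catch so a failed lookup no longer crashes the caller.
async function getUser(id: string): Promise<User | null> {
  try {
    return await db.find(id);
  } catch (err) {
    console.error(`Failed to fetch user ${id}`, err);
    return null;
  }
}
```

The change is local, easy to read in a diff, and easy to verify by eye, which is exactly why this level works so reliably.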

These kinds of tasks are quite straightforward, and with good project configuration, the results are consistently good. A bit like IntelliSense on steroids. This capability has been remarkably well integrated into most IDEs for some time now, so let's move on to a more ambitious level.

Smart: Single file context, refactoring, and feature additions

When we aim for a bigger scope, often hoping to save some effort, the problem becomes more complex.

  • Goal: Refactor or add functionality to a single file, module, or page.
  • Context size: A few hundred lines, or a single file.
  • Context source: User selects the file or section to edit.
  • Interaction style: Developer chooses what to include, and usually applies the changes one by one.

So when operating on this 'smart' level, you basically delegate tasks that span a single module (one file or a handful of files). The scope widens to perhaps a few hundred lines in total, with a handful of cross-references to worry about.

This is pretty much the "Edit" mode in the early AI code assistant tools, where you could select a block of a file and ask for something to be done with it. The AI didn't have to look for more context (and it didn't), and the new capabilities were mostly about refactoring, adding new features, or other reorganizing, still within a relatively small scope.

So let's consider a practical example.

Example
You: "Add a new function to calculate the average of an array."
AI: "Here's the new function added to your file." (+20 new lines)
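A minimal version of what the assistant might add for that prompt (the function name and the empty-array guard are illustrative; the guard is exactly the kind of edge case worth checking in review):

```typescript
// Compute the arithmetic mean of an array of numbers.
function average(values: number[]): number {
  // Guard the edge case the prompt never mentioned: an empty array.
  if (values.length === 0) {
    throw new RangeError("average() of an empty array is undefined");
  }
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}
```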

This level of AI assistance produces better results than what I can write myself, with much better comments. For the developer, the amount of code to check and verify typically remains manageable, and the risk of introducing new bugs is low (or no greater than doing the same manually). And it's easy to revert if the damned thing doesn't compile after all or breaks a test.

IDE-integrated editing tools have been around forever. For instance, the LSP (Language Server Protocol) powered refactoring tools are still much faster for simple tasks like adding a parameter to a method, extracting a method, or renaming something. And won't eat your premium token budget. No need to use the AI for everything when you know the tricks.

When keeping our expectations at this level, the results are astoundingly good. Yes, you're likely to encounter hallucinations, usually in the form of the AI assuming something that isn't there: a non-existent method in another class being called, or parameters added to a shared API you shouldn't touch. These are usually easy to spot and fix, as they are isolated incidents. You might guess where I'm going with this: what if there were 100 places to fix after a new feature or fix spanning many files or modules? Well... that's what you're going to see when your elevator takes you up to the more advanced levels. Not that the AI couldn't fix the issues by itself -- it often can, especially if the error is something a compiler or linter can catch.

Agentic: Multiple file / system context working semi-autonomously

This is the level the majority of developers are already at, at least in some form. Compared to the previous level, the difference is huge. You merely express intent -- 'Do this' -- and let AI figure out what needs to be done and where, perhaps with some tips for the starting point, such as a stack trace or a test failure.

  • Goal: Implement a feature that requires changes across multiple files or modules, or refactor a subsystem.
  • Context size: Thousands of lines, or an entire codebase.
  • Context source: System discovers relevant files based on the task, or user selects multiple files.
  • Interaction style: Developer provides high-level instructions, and the AI determines which files to modify and how.

When scope expands to multiple files, regardless of whether they are selected by the user or discovered by the system, the problem becomes significantly more complex. Needless to say, the probability of misinterpretation increases substantially. Let's compare this to early 'Agent' modes in AI code assistants, where the tool could access the entire codebase and make changes across files.

This is also the level where 'direct' hallucinations, like made-up functions and naming errors, are most common, and where early adopters often got most frustrated. Trickier problems arise from conflicting changes across several files: the tool might fail to identify the correct sequence of steps, or fail to understand, say, the intricacies of your front-end stack's page lifecycle. Over the past couple of years, however, this kind of task has gotten into much better shape, especially if your 'design patterns' are properly referenced in the context (more about that later). Many subtle bugs arise from this: syntactically correct code that looks very good but does not work as intended.

Let's consider a practical example. This might work right off the bat, as it is probably something repeated in thousands of examples in the modern LLM's training data (might use an outdated API though).

Example
You: "Refactor the authentication module to support Google authentication."
AI: "Here's the refactored module with Google authentication support." (10 changed files, -1000 lines, +2000 new lines)
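To make the shape of such a refactor concrete, here's a heavily simplified sketch (no real OAuth, and every name here is hypothetical): the module is opened up behind a provider interface, so adding Google means adding one implementation rather than rewriting every caller. This is the kind of structural change a good agent run should produce.

```typescript
interface AuthProvider {
  name: string;
  authenticate(credential: string): Promise<boolean>;
}

class PasswordProvider implements AuthProvider {
  name = "password";
  async authenticate(credential: string): Promise<boolean> {
    return credential === "hunter2"; // stand-in for a real hash check
  }
}

class GoogleProvider implements AuthProvider {
  name = "google";
  async authenticate(credential: string): Promise<boolean> {
    // A real implementation would verify an ID token against Google's keys.
    return credential.startsWith("google-token:");
  }
}

class AuthModule {
  private providers = new Map<string, AuthProvider>();

  register(p: AuthProvider): void {
    this.providers.set(p.name, p);
  }

  async login(provider: string, credential: string): Promise<boolean> {
    const p = this.providers.get(provider);
    if (!p) throw new Error(`Unknown provider: ${provider}`);
    return p.authenticate(credential);
  }
}
```

Reviewing a change like this is less about reading every line and more about checking that the interface boundary lands in the right place.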

Anyway, ambitious refactoring across files, adding new features that require changes in multiple places, or even implementing a new module with interactions across the system are well within the capabilities of rather off-the-shelf configured agent-based tools, especially when you keep your ambition level in check and still know pretty well what you are doing.

Even with AI fighting on your side, you need to be able to validate the answers you get. The attitude should be: "I know the correct answer, but I'd happily let you do the heavy lifting to get there."

If you can't, you're no longer practicing engineering.

The key difference between the 'Agentic' and 'Smart' levels is context management and the iterative nature of how things get done. The AI figures out what it needs, and often keeps trying until it hits a barrier and/or your premium token limit. The developer no longer explicitly tells it what to look at, but delegates the 'blast radius estimation' and dependency management to the AI. And as easy as that sounds, working code is not just text you can traverse; it is a dynamic, structured beast hard enough for a human to understand. There's a reason we have debuggers and good IntelliSense features.

Let me give you an example of where things easily go south when operating at this level. Imagine you want to add new properties and behaviors to existing code, say, some controls in your React app. Unfortunately, as is often the case, this kind of technology may seem rather easy, but you need to know the higher-level patterns, possible limitations, and good practices to get it right. Just typing syntactically correct code, which passes the linter and even your unit tests, does not prove ****.

For instance, forgetting about proper cleanup, or using things that the linter does not catch but cause runtime errors, still happens.

When using agents, the amount of change per session easily becomes too much to review carefully. Yes, I admit to pressing that 'Keep' or 'Accept all Changes' button quite a few times, and yes, I have often regretted it. "Commit often" is an essential practice to follow here: be prepared to roll back even long sessions (and why not, it's not your sweat and blood that paid for them).

Next, let's look at what happens when we start using much more independent agents to do the work for us.

Multi-agentic: System/product-level context

  • Goal: High-level feature development, architectural changes, and major refactoring.
  • Context size: Entire codebase, documentation, and relevant external information.
  • Context source: System discovers relevant context: code, documentation, and external resources.
  • Interaction style: Developer provides high-level requirements as documents and prompts.

In a way, this is the Factory 0.1 (or perhaps Factory 0.01) level suggested in the ladder model. Instead of typing, refactoring, and refining code, the developer is reduced to describing their input in documents or a multi-stage interview, rather than prompting something simple (though that can certainly still be done as long as the intent is crystal clear). This intent may be detailed or very simple depending on the task at hand, but based on it, the 'factory' is expected to autonomously determine the necessary context (what to touch), decide what actions to take and what changes to make, and somehow verify the outcome.

Example
You: "Implement a new recommendation engine based on user behavior data."
... a lot of tokens and time are spent ...
AI: "Here's the new recommendation engine implemented across the system." (Changes in data processing, database schema, API endpoints, and frontend components) (+1000 new lines, 20 changed files, +10 md new files with 2000 lines)

A major theme in this book is really about how to make this level of work both technically sustainable and organizationally manageable. And how the developer can remain in control and stay sane.

Even with the sample numbers thrown in above (which are easily on the low side once you get serious), additional governance and checkpoints are needed, no matter how competent your coding model is. At this scale, the AI will sometimes get things horribly wrong, in ways that are hard to detect. Nobody is going to review all of that, nor is anybody able to. That's already the case without AI: if you think somebody will properly review your 1000-line PR, you're fooling yourself.

Then onto what this means from the organizational perspective.

Business scale and scope

The final thing I'll briefly touch on here is the business scale and scope of my ladder model. Using a T-shirt-like challenge sizing model (Scope), we see a pattern (rather obvious, perhaps?): the bigger the challenge or scope, and the more you expect to get from AI, the more you need to up your game in terms of AI adoption and the legwork around it.

I tried, engineer as I am, to collect the above into a Venn-ish diagram to illustrate the relationship between the levels of AI assistance and what kind of things you need to have in place to get there at different scopes.

[Figure: Scaling AI Adoption: Who Needs to Change? Higher ambition requires alignment across all three scopes. Developer scope: tool proficiency, prompting skills, code verification, technical judgment. Team scope: shared configurations, workflow design, knowledge sharing, coordination protocols. Organization scope: governance policies, change management, compliance & licensing. The overlaps cover standards & patterns, review practices, accountability, certification, new roles & oversight, process definition, specifications, validation, and feedback loops. By AI assistance level: Atomic touches the developer only; Smart overlaps developer and team; Agentic spans the developer and team scopes; Multi-agentic spans all three.]
The three scopes of AI adoption and how the levels of AI assistance expand through them.

In case you're not exhausted (or bored) by this onslaught of complex diagrams, check the table below. It tries to summarize the relationship between the scope of the problem you're trying to solve, the level of AI assistance you need to get there, and the organizational changes required. Yes, this is indeed a gross simplification of a very multivariate problem, but put simply: the more ambitious your goals, the more you'll need to up your game in terms of AI adoption and the legwork around it. In short, I'll postulate that no amount of jargon (you know: 'spec-driven development', 'agents/agentic', 'factory'), or buying more licences (and pay-as-you-go tokens) of the next best-thing-since-sliced-bread CLI coding agent tool thrown on a PowerPoint, is gonna get you there. You need to learn to use the tools properly, and you need the right processes, governance, and organizational structure in place.

Business Scope | Single Developer | Team | Company
Small | Atomic | No change | Buy the Copilot. Get approval from the customer.
Feature | Smart | Shared settings and configurations | Training, certificates
Stream | Agentic | Build the workflow around AI assistance | Processes, oversight, new roles
Product | Multi-agentic | Build the coordination layer and tooling | Establish governance, train and nominate experts, manage change

So, we've concluded (my snide remarks about slideware aside) that to reach the multi-agentic 'Software Factory' churning out production-ready code 24x7 with minimal human involvement, a lot needs to change, from the individual developer all the way to how the company operates. Again, there's no product that does this with just a few simple prompts for anything more complex than the simplest apps. In the wild, systems are more complex and restrictive, often subject to regulation, and contain portions with no counterparts on GitHub (and hence not something your models already know). Coding a simple web page for a toy problem, or generating an entire website, does not prove a thing. You could clone a similar one from GitHub in minutes, no LLM needed.

The more you expect and the more ambitious you are, the more you need beyond Copilot licences and 2 hours of training for developers.

  • A single developer working on a feature can use less formal processes.
  • A company-wide product change requires extensive specification, validation, and coordination across the full system and teams.

Why this matters

If you are to get more ambitious, I'd suggest getting the following in order: limit the creativity, validate everything at every step, have clear rules (also beyond the agentic world), and align your team and methods.

  • Better specifications: constrain the search space
  • Stronger validation: catch errors before they become catastrophes
  • Clear governance: ensure accountability
  • Team alignment: make the system predictable

The things I list above certainly aren't new, or even really specific to AI. In the old days, you could muddle your way through even if everything wasn't that systematic or governed.

With traditional development, a muddy specification might slow you down. With AI-assisted development, a muddy specification sends you in the wrong direction entirely.