How to pick the AI agents worth building
Map out the steps your work goes through, from idea to done. Find the one step everything waits on. Aim your next agent at it, not at the part that’s already fast. Pace everything else to its speed.
By now the pitch is in every keynote. Salesforce’s Agentforce is busy turning its customers into “agentic enterprises.” Microsoft ships a program literally named Agent Factory. IBM’s watsonx Orchestrate sells a control plane for running thousands of agents at once. The message is the same wherever you look, and now you’ve got the one-line version in your inbox: “Where’s our agent strategy? I want us building these by the dozen.” So now you’re sketching the list. A drafting agent here, a research agent there, a QA agent, a triage agent for the support queue. Twenty workflows, twenty agents, a real factory of them. It feels productive. It will demo beautifully.
Before you spin up the first one, let me hand you a book. It’s called The Goal, a novel published in 1984 by Eliyahu Goldratt. I first read it in mechanical engineering school. I went back to it a few weeks ago, and what struck me is how completely it still applies, except the factory I kept picturing was one running on AI agents. Yes, the book itself is about a real factory: steel, machines, a loading dock, not a line of code in sight. Trust me anyway, because it is the most useful thing you’ll read about AI agents all month.
The hero is Alex Rogo, a plant manager with three months before headquarters shuts his plant for good. Alex’s one bright spot is a set of new robots that pushed productivity in one department up 36%. Everyone’s proud of them. Then Alex runs into Jonah, an old physics professor of his who now consults for factories, and Jonah asks three simple questions. Did the robots increase your sales? Did they reduce your inventory? Did they lower your operating expense? The answer to all three is no. The robots made one station faster and the business no healthier. The plant is still three months from closing.
Here’s the part that makes me smile. Goldratt was writing about big, clunky factory robots, the most exciting technology of his day. The technology has changed completely since then. The move people make with it hasn’t changed at all. A powerful new tool shows up, everyone rushes to point it at every task at once, and all that activity gets mistaken for real progress. The tool gets an upgrade every few years. The trap ships unchanged.
What actually held up is harder and more useful. The Goal forces one question on Alex over and over: what is a company even for? Not to run machines. Not to look modern. Not to automate for the sake of it. The book’s answer is blunt, and it’s still exactly right: a company exists to make money, and every robot, every process, and now every agent is worthless unless it moves that number. Swap “robot” for “agent,” and the book turns into a warning for anyone building AI agents inside a company: the factory urge to make more and make them faster is exactly the trap, and the only thing that counts is whether each one moves the business.
Agents are genuinely powerful, and building them is my day job. I find the use cases, work with developers to ship them for the company I work for, and build smaller ones for my own work on the side. But here’s the subtler truth the rush skips over: a powerful tool tends to create new problems while it solves old ones. An agent aimed at the wrong step doesn’t just fail to help. It quietly manufactures more work, a backlog of half-finished output that someone downstream now has to deal with. So the whole game is where you aim them, and that turns out to be the part everyone skips.
So here’s the bottom line, up front. A factory of agents is not the goal. The goal is throughput: the actual work that ships and moves the business. Pour agents onto a step that was never your slow point, and all you do is stack work in front of the one that actually is slow, which today is almost always a person reviewing and verifying what the agents produced. That slow step is your constraint. The winning move is the opposite of agents everywhere: find your one constraint, point your next agent at it, and pace everything else to its speed.
Here’s where this article goes, in four stops.
The trap. I’ll show you why building agents by the dozen stalls, and the factory novel that saw it coming.
The bottleneck. You’ll find the one step everything waits on, which is rarely where you’d guess.
The principles. A few ideas from the book for feeding that step instead of burying it under more agents.
The build. What aiming your next agent at the right step actually looks like, hands on keyboard.
By the end, you’ll have a simple way to look at any agent plan and say where the next agent actually belongs, plus the words to explain it to the people around you. It’s the difference between an agent strategy that looks busy and one that moves a number.
Hard hats on.
Why agents everywhere stalls
When the scoreboard is “agents deployed” or “workflows automated,” you optimize for the count, and the count is the easy thing to grow. Six new agents this quarter, twelve workflows touched, a demo that makes the room lean in. It feels like progress. But Goldratt put the whole trap in one line: “activating a resource and utilizing a resource are not synonymous.” In plain English, keeping something busy is not the same as using it well. A factory machine stamping out parts nobody ordered is busy, but useless. An agent churning out drafts faster than anyone can check them is busy in exactly the same way. None of that activity is the business.
And the bill is already coming due. Gartner expects more than 40% of agentic AI projects to be canceled by the end of 2027, citing runaway cost and unclear business value. That is the same verdict Jonah, the book’s mentor, gave those factory robots, only now it lands on AI agents instead of machines.
So keep his three questions on a sticky note. They were written to judge a factory, and they barely change to judge an agent rollout:
Did you ship more of what the business actually needs?
Did the pile of half-finished, waiting-on-a-human work get smaller?
Did your cost to run the place, the model bills and the hours spent steering, go down?
If you can’t say yes to at least one without making another worse, you didn’t buy an agent strategy. You bought robots.
Those questions come from the only three numbers Goldratt says a company should track. He names them “throughput, inventory and operational expense,” and once they click you start seeing them everywhere.
Throughput. The rate the system turns work into money out the door. For your team, it’s the work that actually ships and changes something for a customer, not the work that merely got generated.
Inventory. Everything you’ve paid for that’s sitting around not yet sold. For you, it’s every draft, pull request, and ticket an agent produced that’s still waiting on a human to finish it.
Operating expense. What you spend turning inventory into throughput, the cost of running the whole thing. For you, the model bills, the tools, and the hours your people spend steering and reviewing.
The robots moved none of the three. They just made a station that was never the slow part run faster, so half-finished parts piled up in front of the slow station instead.
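The three questions, and the rule that at least one has to improve without another getting worse, can be sketched as a before-and-after check. This is a minimal illustration, not a metric from the book: the `Snapshot` fields, the function name, and every number below are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """One reading of the three numbers (all field names are hypothetical)."""
    shipped_per_week: float   # throughput: work that actually shipped
    waiting_on_human: int     # inventory: agent output still unreviewed
    weekly_run_cost: float    # operating expense: model bills plus steering hours


def passes_three_questions(before: Snapshot, after: Snapshot) -> bool:
    """True if at least one number improved and none of them got worse."""
    improved = (
        after.shipped_per_week > before.shipped_per_week,
        after.waiting_on_human < before.waiting_on_human,
        after.weekly_run_cost < before.weekly_run_cost,
    )
    worsened = (
        after.shipped_per_week < before.shipped_per_week,
        after.waiting_on_human > before.waiting_on_human,
        after.weekly_run_cost > before.weekly_run_cost,
    )
    return any(improved) and not any(worsened)


# Illustrative numbers only.
before = Snapshot(shipped_per_week=12, waiting_on_human=40, weekly_run_cost=5000.0)
more_generators = Snapshot(shipped_per_week=12, waiting_on_human=95, weekly_run_cost=6500.0)
review_helper = Snapshot(shipped_per_week=15, waiting_on_human=25, weekly_run_cost=5000.0)

print(passes_three_questions(before, more_generators))  # False
print(passes_three_questions(before, review_helper))    # True
```

The first rollout adds drafts and cost while shipping nothing extra, so it fails. The second ships more and shrinks the pile at the same cost, so it passes.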
Here’s the deeper problem underneath, the one it takes Alex most of the book to see.
A factory isn’t a set of independent machines you keep individually busy. It’s a chain of steps that depend on each other, and what the whole chain produces is set by its slowest link, not by how hard the other links work. Alex’s plant was run the other way, every station scored on its own efficiency, every machine pushed to stay busy because an idle machine looked like waste. So the fast stations overproduced, and their output piled up as inventory in front of the slow ones. Every local chart looked healthy while the plant as a whole lost money. The robots didn’t cause that pattern. They poured fuel on it, supercharging a station that was never the constraint.
Sit with that, because your company is a factory too, even if nobody calls it one. Work comes in as a request and leaves as something shipped, and in between it moves down a line: someone drafts, someone reviews, someone approves, someone ships. Agents are the shiny new machines you just bolted onto the front of that line, and they make the drafting station absolutely roar. A line is only ever as fast as the station every job has to squeeze through, no matter how loud the front end gets. Build it the plant’s old way, with each agent scored on how busy it is, and you earn the plant’s old result.
The result is a pile of unreviewed output stacked in front of the slow station. It’s the part most teams never name, and it’s worse than the factory kind. The robot’s parts were all good. This pile isn’t, because some of the output is wrong in confident, plausible ways, so the human downstream is inspecting for quality, not just clearing a backlog. You paid to create all of it, and none of it counts until someone finishes it and ships it. Which means the real question isn’t “how many agents.” It’s “where’s the slow station,” and the book finds the answer on a hiking trail.
Find your real bottleneck
Goldratt teaches the idea away from the factory, and it’s the scene that stuck with me for years. Halfway through the book, Alex, the plant manager, gets roped into leading a weekend hike for his son Dave’s scout troop, and he can’t keep the line together. The fast kids race ahead, a long gap opens in the middle, and the whole troop falls behind no matter how he reshuffles it. Watching that gap stretch and snap shut, he has the realization the rest of the book turns on. The troop is a chain, and a chain doesn’t travel at the speed of its quickest kid. It travels at the speed of its slowest. That kid is Herbie, a heavy boy lugging an overloaded pack at the very back. No amount of hustle from the kids up front buys back a minute, because sooner or later they all have to wait for Herbie.
So Alex does two things that feel backwards. He moves Herbie to the front, where nobody can be stuck behind someone slower. Then he opens Herbie’s pack and parcels the heavy gear out to the fast kids who have room to carry it. Herbie speeds up, and because Herbie now sets the pace, the whole troop speeds up with him. Pushing the fast kids was never the answer. Helping the slow one was.
That’s the whole theory of constraints in one walk. Every system has one step that sets the pace for all the others, and improving any step that isn’t that one is a mirage. Goldratt says it flatly: “An hour lost at a bottleneck is an hour lost for the entire system. An hour saved at a non-bottleneck is a mirage.” Speed up the fast kids and the line just spreads out further. The only move that matters is finding Herbie and taking weight off Herbie.
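The arithmetic behind the hike fits in a few lines. The sketch below treats a workflow as a simple serial line whose output is capped by its slowest step. The step names and daily capacities are invented for illustration, and a real line has variability that this ignores.

```python
def line_throughput(capacity_per_day: dict[str, float]) -> float:
    """A serial line ships no faster than its slowest step."""
    return min(capacity_per_day.values())


# Hypothetical daily capacities for a four-step line (illustrative numbers only).
steps = {"draft": 40, "review": 10, "approve": 25, "ship": 30}

baseline = line_throughput(steps)

# Speed up a non-bottleneck: drafting gets ten times faster.
faster_draft = line_throughput({**steps, "draft": 400})

# Take weight off the bottleneck: review capacity goes from 10 to 15.
lighter_review = line_throughput({**steps, "review": 15})

print(baseline, faster_draft, lighter_review)  # 10 10 15
```

Making drafting ten times faster changes nothing at the end of the line. A modest lift at review moves the whole thing.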
So how do you find your Herbie? The book’s method is refreshingly low-tech: follow the piles. Alex doesn’t spot his slow machines in a report. He finds them by walking the factory floor and seeing the mountains of half-finished parts stacked up in front of them. A bottleneck always looks the same. There is a pile of work waiting in front of it, because the steps before it keep feeding it faster than it can keep up. And there are idle, starved steps downstream of it, because they have run out of work to do while everything is stuck at the bottleneck. So don’t theorize about where your slow step is. Go find where the work stacks up, and where people are stuck waiting on someone else.
Run that test on an AI-powered workflow today and the pile shows up in a predictable place. Generation got cheap and fast. A model writes the first draft, the first version of the code, the first pass of almost anything, and it costs almost nothing. When the cheap step gets ten times cheaper, it stops being the constraint, and the constraint moves to whatever it feeds. What it feeds is judgment. Someone has to read the output, decide whether it’s right, catch the confident mistake, and put their name on it. Review, verification, and the taste to know what’s worth shipping: that’s the heavy pack now.
The receipts back this up. When METR, an AI research lab, ran a randomized trial on experienced developers in 2025, the developers believed AI tools had made them 20% faster, and the measurements showed they’d actually gone 19% slower. Feeling productive and being productive came apart. Google Cloud’s 2025 DORA report found the matching pattern at the team level: AI lifts how much you produce, but it lowers delivery stability, because the acceleration exposes the weak step downstream instead of removing it. The time you save generating gets spent right back auditing, because nobody can review the volume the agents produce.
And this isn’t only an engineering story. Point agents at a marketing team and the one brand reviewer who keeps fifty draft variants on-voice becomes the slow step. Point them at support and the person who owns tone and policy is the gate a hundred suggested replies wait behind. Wherever a human still has to verify the work and own the result, that judgment is the new bottleneck, no matter how cheap the drafting got.
Now watch what the typical agent factory does: it pours more agents into generation, the step that was already fast. Every agent you add upstream of the constraint just hands it a taller pile. That’s not a throughput machine. It’s an inventory machine, and a very impressive one. Engineering teams already hit this exact wall with AI coding tools. Output jumped at first, then flattened out, because the extra code just piled up waiting for a human to review it.
Finding the bottleneck is the easy half. What you do once you’ve found it is where the throughput actually comes from, and the book gave me a handful of principles for that.
Work your real constraint
A quick word on what follows. These are principles from the book, not a rigid checklist to run top to bottom. The order helps, but take the ones that fit your situation and try them.
Map your flow first
The honest starting point isn’t “find the constraint.” You can’t find a thing you can’t see. The first move is to draw your actual flow, the path one real piece of work takes from idea to shipped.
Put one box per step in a row: request, spec, build, review, approve, ship. Then do the part everyone forgets. Between each pair of boxes, write down how long work usually sits there waiting to be picked up. In knowledge work, a task spends most of its life parked in a queue, not being worked on, which is exactly why value-stream mapping puts wait time on the board right next to work time. It’s also what Alex does when he walks his plant floor instead of reading reports: he follows the piles. Do it on a wall, in half an hour, with the people who actually do the work. They already know where it jams; the map just makes it impossible to deny.
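If you’d rather keep the wall map in a file afterward, here is one way it might look. It is a sketch under assumptions: the `Step` fields and the hours are hypothetical, and wait time is recorded as the queue sitting in front of each step. The longest queue is only a first suspect for the slow step, not proof.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    work_hours: float  # hands-on time for one item
    wait_hours: float  # time an item sits queued before this step picks it up


# Hypothetical numbers read off a wall map; replace them with your own.
flow = [
    Step("request", 0.5, 2),
    Step("spec", 3, 8),
    Step("build", 4, 4),
    Step("review", 1, 30),
    Step("approve", 0.25, 6),
    Step("ship", 0.5, 1),
]


def longest_queue(steps: list[Step]) -> Step:
    """Return the step with the most waiting time in front of it."""
    return max(steps, key=lambda s: s.wait_hours)


def flow_efficiency(steps: list[Step]) -> float:
    """Share of total lead time spent being worked on rather than waiting."""
    work = sum(s.work_hours for s in steps)
    wait = sum(s.wait_hours for s in steps)
    return work / (work + wait)


suspect = longest_queue(flow)
print(f"Biggest pile sits in front of: {suspect.name} ({suspect.wait_hours}h waiting)")
print(f"Flow efficiency: {flow_efficiency(flow):.0%}")
```

Flow efficiency is the value-stream mapping habit of comparing work time with total elapsed time. A low figure says the item spends most of its life parked.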
Find your one real constraint
With the map and its wait times in front of you, the biggest pile of waiting work usually points straight at your bottleneck.
Right now, for most teams using AI, that slow step is review. Generation got cheap, so work piles up wherever a human still has to read it, judge it, and sign off. But usual is not the same as always, and reaching for the fashionable answer is its own trap. Look at your own map and check. For one team the slow step is writing the spec, because nothing gets built until the requirements are nailed down. For another it’s pulling the data, because one tricky query sits between every question and a real answer. For another it’s the security sign-off, or the final deploy, or any step that hangs on a rare approval.
Notice that none of those is a person. Each one is a step in the process. The reviewer is not your bottleneck; the review step is. The difference matters, because you fix a step, you don’t fix a human. Find the step where work piles up, name it out loud, and write it down before you spend a dollar.
Get more out of it before you spend
Before you spend a dollar widening the slow step, get more out of the capacity it already has. This is the cheapest win there is, and the book makes it concrete.
In Alex’s plant, the bottleneck machines sat idle during lunch, because the operators all broke at the same time. So they staggered the breaks and the machines ran straight through: same machines, more output, zero dollars spent. They made two more moves like it. They moved quality inspection to before the bottleneck, so it never wasted a minute on a part that was going to be scrapped later. And they stopped feeding it parts for orders that weren’t even due, so every hour went to work that actually mattered.
Run those same three moves on a review step that has become your bottleneck:
Never let it sit idle. Keep a small batch of work queued and ready in front of it, so the review step never stalls waiting for something to land.
Never feed it junk. Put an AI agent in front of review to do a first pass. It waves through the obviously fine and flags the obviously broken, so the only things that reach a human are the genuine judgment calls.
Never feed it work that doesn’t matter. Stop sending output to review for a feature that just got cut or a draft nobody asked for. Every review hour should go to something that’s actually shipping.
None of that costs money or a new hire. You’re not making the slow step bigger yet. You’re making sure not a minute of it is wasted.
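The second and third moves amount to sorting the queue before a human ever opens it. The sketch below assumes a first-pass agent has already labeled each change "fine", "broken", or "unsure". That labeling, the `Change` fields, and the lane names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Change:
    id: str
    still_planned: bool  # False if the feature was cut or nobody asked for it
    first_pass: str      # "fine", "broken", or "unsure": assumed output of a first-pass agent


def triage(queue: list[Change]) -> dict[str, list[Change]]:
    """Sort a review queue so humans only see the genuine judgment calls."""
    lanes: dict[str, list[Change]] = {
        "dropped": [], "waved_through": [], "sent_back": [], "for_human": [],
    }
    for change in queue:
        if not change.still_planned:
            lanes["dropped"].append(change)        # work that no longer matters
        elif change.first_pass == "fine":
            lanes["waved_through"].append(change)  # obviously fine
        elif change.first_pass == "broken":
            lanes["sent_back"].append(change)      # obviously broken, back to the author
        else:
            lanes["for_human"].append(change)      # anything uncertain goes to a person
    return lanes


# Illustrative queue only.
queue = [
    Change("pr-101", True, "fine"),
    Change("pr-102", True, "broken"),
    Change("pr-103", False, "fine"),
    Change("pr-104", True, "unsure"),
]
lanes = triage(queue)
print([c.id for c in lanes["for_human"]])  # ['pr-104']
```

Any label the sorter does not recognize falls through to the human lane, which is the safe default while you are still learning how far to trust the first pass.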
Pace the rest of the work to it
This is the one almost everyone gets wrong, because it feels backwards. Once you know your slow step, you deliberately slow everything else down to match its pace.
The book calls the mechanism the drum, the buffer, and the rope. The slow step is the drum: it sets the beat for the whole line. A small buffer of work sits right in front of it so it never starves. And a “rope” holds back the start of the line, releasing new work only as fast as the slow step can actually take it. In the plant, Alex stops dumping raw material onto the floor just to keep people busy, and the mountains of half-finished inventory melt away.
For an AI workflow it comes down to one rule: don’t let your agents generate faster than your slow step can absorb. In practice you cap how much unreviewed work is allowed to pile up. Say the review step can clear about thirty pull requests a day. You set the limit at roughly a day’s worth in the queue. When the queue hits that cap, generation pauses, and the people who would have started new work go help clear the backlog instead.
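Here is that cap as a small gate, using the thirty-a-day figure from the example above. The class and method names are hypothetical, and a real version would sit in whatever scheduler or queue launches your agents.

```python
class Rope:
    """Release new work only while the unreviewed queue is under its cap."""

    def __init__(self, review_capacity_per_day: int, buffer_days: float = 1.0):
        # The cap is roughly buffer_days' worth of what review can clear.
        self.cap = int(review_capacity_per_day * buffer_days)
        self.unreviewed = 0

    def may_start_new_work(self) -> bool:
        return self.unreviewed < self.cap

    def submitted_for_review(self) -> None:
        self.unreviewed += 1

    def reviewed(self) -> None:
        self.unreviewed = max(0, self.unreviewed - 1)


rope = Rope(review_capacity_per_day=30)

started = 0
for _ in range(50):  # agents try to start fifty pieces of work
    if rope.may_start_new_work():
        rope.submitted_for_review()
        started += 1

print(started)  # 30
```

Generation stalls at thirty and resumes only when a review completes. The buffer size is a judgment call: too small and review can starve, too large and the pile comes back.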
If that sounds like a Kanban board with work-in-progress limits, it should. Both cap the amount of unfinished work, so the line pulls in something new only when there is room for it. The rule of thumb is blunt, and it works: stop starting, and start finishing.
Then widen it, and aim your next agent there
Only now, after you’ve squeezed the free gains out of the slow step, do you actually spend on it. And the spending goes to the slow step, never to the part that’s already fast.
When Alex’s team ran out of no-cost tricks, they got their hands dirty. They hauled two old machines out of storage to take some load off the bottleneck, and they sent the overflow to an outside vendor. Not elegant, but it widened the exact step that was holding the whole plant back.
Your version is the same idea, aimed at your slow step. If review is the bottleneck, the most valuable thing you can build is not another generator. It is something that widens review itself:
A first-pass review agent that handles the easy 80% of changes, so a human only has to look at the hard 20%.
An eval (a quick automated check against examples you trust) that clears the routine cases, so a human only sees the ones it flags.
If your slow step is writing the spec instead, you build the agent that drafts the first version of the spec from the request, so the work starts at “edit this” rather than “write this from a blank page.” The agent worth building is the one that widens your slowest step. Every other agent just makes a bigger pile.
Expect the constraint to move
Here’s the part that catches people off guard. The moment you widen one slow step, the bottleneck doesn’t vanish. It jumps somewhere else. Fix review, and the slow step becomes deployment, or one overloaded approval, or a sign-off that was invisible only because review was hiding it.
In the book, once Alex fixes his machines, the constraint leaves the factory floor entirely. It moves first to the market, and then, to everyone’s surprise, to the company’s own outdated rules, policies that made sense years before and were now quietly throttling everything. Goldratt’s word for that trap is inertia: pouring effort into yesterday’s bottleneck long after it stopped being the problem. Half the time the real constraint isn’t a machine or even a step. It’s a rule nobody has revisited, like “every change needs three approvals.”
So this is never a one-time project. Put a recurring half-hour on the calendar to re-walk your flow map and ask one question: where does the work pile up now? The answer will have moved, and so should your next fix and your next agent.
That is what replaces “twenty workflows, twenty agents.” Not a sprawl you launch all at once, but one constraint found, fed, and widened, then the next one. Fewer agents, pointed at better places, and a number that finally moves.
Build the one that matters
So what does building that agent actually look like? Smaller than you’d think. Say your slow step is review. You don’t need a platform or a six-month program. In Claude Code, the smallest version is a subagent: a short markdown file (just a plain text file) in your project that gives an agent a name, a set of tools that can only read and never change anything, and a system prompt (a written instruction) spelling out exactly what to check on a first pass. It reads every change, flags the handful that need a person, and waves the rest through. Your reviewer stops drowning in trivial diffs and spends their scarce attention on the calls that actually need judgment. The same shape works far from code: a subagent that drafts the first version of a spec, or one that writes the first cut of the gnarly query for your analyst to correct.
The part people skip is what the agent checks against. The teams getting real mileage out of agents right now aren’t the ones with the best model, they’re the ones with the best evals. An eval is just a test with an answer key. For a first-pass review agent, the answer key is a handful of past changes where you already know which ones had real problems. You run the agent over them and see where it disagrees with you. Does it catch the genuinely broken ones and wave the clean ones through? You tighten its instructions until its calls line up with yours, and only then do you trust it on live work. Anthropic’s own write-up on evals is a hype-free place to start. An agent at your constraint with no eval behind it is just a faster way to be confidently wrong.
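A bare-bones version of that loop might look like the sketch below. Everything in it is made up for illustration: the answer key, the function names, and above all `toy_agent`, a keyword matcher standing in for the real call to your review agent.

```python
def run_eval(agent, answer_key):
    """Compare the agent's calls with verdicts you already know.

    agent: callable taking a change description, returning True to flag it for a human.
    answer_key: list of (change, had_real_problem) pairs.
    """
    missed, false_alarms = [], []
    for change, had_real_problem in answer_key:
        flagged = agent(change)
        if had_real_problem and not flagged:
            missed.append(change)        # waved through something broken
        elif flagged and not had_real_problem:
            false_alarms.append(change)  # sent a clean change to a human
    agreement = 1 - (len(missed) + len(false_alarms)) / len(answer_key)
    return agreement, missed, false_alarms


# Hypothetical past changes with known verdicts.
answer_key = [
    ("rename local variable in helper", False),
    ("remove auth check from admin route", True),
    ("update README typo", False),
    ("change retry loop to run forever on error", True),
    ("bump patch version of logging library", False),
]


def toy_agent(change: str) -> bool:
    """Stand-in for a real agent call; flags a change on two keywords."""
    return any(word in change for word in ("auth", "version"))


agreement, missed, false_alarms = run_eval(toy_agent, answer_key)
print(f"agreement: {agreement:.0%}")
print("missed:", missed)
print("false alarms:", false_alarms)
```

The two lists matter more than the single score. A miss means a broken change got waved through, while a false alarm only costs review time, so you will likely tolerate far fewer of the first kind.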
You don’t have to do any of this alone, and you shouldn’t. The people doing the work already know where it jams. Put the flow map on a wall and let your engineers point at the step that always stalls; ask the analyst what they wait on; ask the designer what sits in their queue. The framing that tends to land, whether you’re explaining it to the team or the people who hold the budget, is short: we don’t have an agent shortage, we have a constraint we haven’t named. Once everyone is looking at the same slow step, the argument about how many agents to build mostly answers itself.
One last thing, and it’s the part I find genuinely fun. The specific tools in this article will age fast. A year from now, generation will be cheaper, the agents sharper, and your slow step sitting somewhere new. None of that changes the move. Draw the flow, find the one step everything waits on, feed it, and aim your next build straight at it. That is why a novel about a steel plant still reads like a manual for building AI agents. The machines changed. The mistake didn’t, and neither did the fix.










