Stepwise traps and the limit of continuous improvement

Beginning with the end in mind is received wisdom, probably because it usually works. But what happens when it doesn’t?

Continuous improvement

Let’s say you run some kind of stream-aligned team and you’re not getting the results you want. Maybe your widget defect rate skews high and you start getting angry emails, or maybe consumers mysteriously stop buying your product. Then your boss gives you the performance talk during a 1:1, and you know you’ve got to react.

You quickly rule out monocausality. After all, there’s an entire constellation of factors out there that can affect your outcome, and way fewer of them have effect size zero than you might like. So most of the potential drivers you jot down might be playing some role here. But it would be unscientific to change everything at once and hope for the best, so what do you do? You probably start by finding the single factor with the biggest systemic effect. You do this by developing hypotheses, drawing causal diagrams, running iterative experiments to test your views, and adjusting accordingly. Then you find the next-largest contributor through the same kind of testing, and so on. Eventually, either you’ve made it so far down the list that you’re basically nitpicking (in which case the problem is solved and your widget DPMO plummets to record lows) or enough time has passed for new issues to arise (in which case goto 1).

Over time you get so good at this that you start to do it proactively, in a kind of rhythm choreographed by instinct. The practice starts spreading through your team, then your division. Even peers closer to the periphery of Dunbar’s number begin to notice your success and ask how you manage to generate such outsize returns. You think for a second, then christen it… forward stepwise regression.
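
For the curious, here’s the greedy dynamic in miniature: a minimal sketch of forward stepwise selection on synthetic data. Every name, threshold, and number below is my own invention for illustration, not anyone’s production code:

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (with intercept)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

def forward_stepwise(X, y, tol=0.01):
    """Greedily add whichever feature most improves R^2, then repeat."""
    selected, best = [], 0.0
    while len(selected) < X.shape[1]:
        remaining = [j for j in range(X.shape[1]) if j not in selected]
        scores = {j: r_squared(X[:, selected + [j]], y) for j in remaining}
        winner = max(scores, key=scores.get)
        if scores[winner] - best < tol:  # nitpicking territory: stop
            break
        selected.append(winner)
        best = scores[winner]
    return selected, best

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = 3 * X[:, 2] + 1.5 * X[:, 5] + rng.normal(size=200)  # two real drivers
print(forward_stepwise(X, y))  # finds the biggest contributor first, then the next
```

Note what the loop never does: reconsider an earlier choice, or jump to an entirely different region of the search space. It’s greedy hill-climbing, which is exactly how you end up owning a very well-polished local maximum.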

Two months later, your competitor finds a global maximum that you missed, they launch a new product segment, and they proceed to eat your lunch, crusts and all.

Reversing causal arrows considered harmful

Okay, obviously nobody thinks stepwise regression and continuous improvement are the same. For one thing, the inventor of continuous improvement was literally a statistician. And at any rate, improving “constantly and forever” was only one of Deming’s 14 points. It would be unfair to strip away the entire context of total quality management by reading one sentence in a vacuum.

But I’m starting to see the stepwise regression approach to continuous improvement in more places than it perhaps ought to be. I don’t know if this is a recent development or if I’m only catching it now that I have a name for it. But either way it’s subtle, and it’s subtle because it’s slow.

The stepwise trap essentially reverses the arrows in the causal diagram by swapping the actor (process-creator) and the object (process), and by inverting the legibility of the waste. The following may be a helpful way to frame the reversal:

  1. notice waste
  2. iterate to reduce said waste
  3. design and formalize process to reduce waste by default
  4. follow process
  5. follow process regardless of legibility of waste

How bad is this, really? Can’t local maxima be pretty nice, especially if you’re willing to lower your standards? That may be so, but I think there are a few compelling reasons to double back on stepwiseness.

Interpassive processes obscure problems

In the numbered list above, there’s a clear inflection point somewhere around step 4. I think we reach that point precisely when we allow interpassive thinking to take hold in a system. Basically, process empowers you to reduce waste in a predictable, standardized way. But if you’re not careful, you might also accidentally empower the process — to free you from having to think about the problem at all. In this sense, the process watches for waste so that you don’t have to. And just as you probably never actually caught up on the 2010s-era blog posts you sent to your read-it-later app of choice,[1] you lose line of sight when you play the interpassive set-and-forget game with problem ownership.

Critically, this loss of control is never intentional. Nobody buys a DVR intending to never watch the episodes of 24 they’d like to tape. And nobody institutes a process expecting that it will obscure the problem it was intended to solve. But both happen anyway.

You might see this interpassive dynamic crop up when things go wrong. For instance, when a defective product escapes QA, you’ll sometimes hear the responsible individual pivot to talking about the existence of their review process as a justification, as if it weren’t that very same process that failed to catch the defect in the first place! On some level, the QA process is watching for defects so that no human has to, at least not until they get hauled in front of their boss.

Now, I want to be clear about the scope of my critique for two reasons. First, much like culture, process is descriptively just a “ready-made set of solutions for human problems”,[2] neither good nor bad in isolation. And second, delegating observability to a process is often a force multiplier. As Erlend Hamberg writes, when “your trust in your system is in place, your subconsciousness will stop keeping track of all the things you need to do and stop constantly reminding you. This reduces stress and frees up precious brain time to more productive thinking”. If you believe that, then you should also believe that process works, up to a point. But tiptoe further and you’ll start to lose visibility.[3]

Predictable processes accrete

Alright, we’ve established that things look dodgy beyond the inflection point. But what’s so bad about keeping two hands on the reins? Well, there’s also a reinforcing feedback loop at play here, one hidden within the usual inertial justifications: “we’ve always done it this way”, “perfect is the enemy of good”, “let’s get it out the door and worry about that in the optimization phase”, etc. This is the linguistic register of a systemic force that is incredibly hard to control because it’s so predictable and therefore memetic. While predictability is often a useful tool,[4] it tends to push you headlong toward the inflection point, accreting as it goes. This shouldn’t be too surprising; after all, when a process works the first time, people want to do it faster and more frequently. So the flywheel effect empowers the process to become even more predictable and even harder to escape.

A good example of accretion in action is the pilot project. Such projects have an uncanny ability to end up in production as the default approach endorsed by the org. But the oft-promised “phase 2” of the pilot is correspondingly unlikely to see the light of day, which means the sketchy scope and resourcing assumptions underlying the pilot aren’t adjusted to reflect mission success. In this sense, “pilots” are organizational contronyms; they cease to be “pilots” exactly when they’re given that label.

This example also motivates why the inflection point is earlier than you might have guessed: the reinforcing feedback loop of predictability takes hold the first time a process is seen to work. And since reinforcing feedback loops are superlinear, early inflection points are really dangerous.
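
If you want to feel that superlinearity, here’s a toy compounding model. The 15% per-cycle gain is completely made up; the only point is that reinforcement multiplies rather than adds:

```python
# Toy reinforcing loop: every success makes the process a bit more
# entrenched. The 15% per-cycle gain is invented for illustration.
entrenchment = 1.0
for cycle in range(1, 21):
    entrenchment *= 1.15  # compounding, not additive
    if cycle % 5 == 0:
        print(f"cycle {cycle:2d}: entrenchment = {entrenchment:6.2f}")
```

Twenty cycles in, the process is roughly sixteen times more entrenched than on day one. That’s why waiting to intervene until the trap is obvious means intervening far too late.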

Observability is not a panacea

So when’s the right moment to grab the policy levers and try to balance out the predictability feedback loop? Could observability as a value system enable more timely course-correction and more legible waste? This endgame still isn’t much of a winner in my view, for three reasons.

First, if you drive on black ice and start to lose control of your car, oversteering and slamming the brakes is basically the worst thing you can do. But that’s exactly what observability does when it encounters a reinforcing feedback loop! Metrics are designed to be predictable,[5] and as we saw earlier, predictable things become superlinearly hard to steer. So by the time you try to wrest back control, anything more than a light touch will send the system into a spinout.

Second, even if you commit to quantify everything and build out the instrumentation required to track it, you still need an institutional way to zoom out far enough to find unexpected global maxima. That won’t work if your metrics are as predictable as your processes. One way I’ve seen this happen is when orgs choose a metric for which they control only part of the fraction — say, they own the policy levers to shift the denominator but the numerator is out of their hands. In such cases, if a purely notional, reasonably motivated grey-hat actor foresees downward shifts in the numerator’s behaviour, they may also see… opportunities to game the metric by providing a strictly worse service.[6] But often this perverse global incentive is obscured from the org itself, because the metric — or the process used to design it — is missing a scroll wheel. There’s also the related problem that if lots of KPIs end up incentive-misaligned, they become such imprecise proxies that they may not even improve your R², which probably won’t stop your competitor from eating your lunch either.
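
To make the fraction-gaming concrete, here’s a hypothetical sketch in the spirit of the CAC example from the footnote. The square-root response curve and all the numbers are invented; the point is just that the ratio can improve while the service gets strictly worse:

```python
import math

def customers(ad_spend, demand):
    """Assumed diminishing-returns response: customers ~ demand * sqrt(spend)."""
    return demand * math.sqrt(ad_spend)

def efficiency(ad_spend, demand):
    """The metric: customers per dollar. The demand side is out of our hands."""
    return customers(ad_spend, demand) / ad_spend

# Business as usual, then the grey-hat play during a foreseen demand dip:
print(efficiency(100_000, demand=10))  # baseline: ~0.032 customers per dollar
print(efficiency(10_000, demand=8))    # cut spend into the dip: ~0.080
print(customers(100_000, 10), customers(10_000, 8))  # ~3162 vs ~800 customers
```

The metric more than doubles while the number of customers actually acquired falls by roughly three quarters. Without a counterweight metric, the dashboard scores this as a win.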

And finally, we should think about how the purpose of metrics affects their fitness for avoiding the stepwise trap. Metrics are fundamentally tools to increase system legibility. Now, legibility is great if you like data, but the fact is that some domains naturally resist quantification.[7] So if you impose a brute-force metric on these areas, then you may in fact strip them of their essential character. This is fundamentally bad for culture for Seeing Like a State reasons; you may end up killing the forest for the trees.

Moving laterally to move forward

If it’s so hard to pull the reins at the critical moment before the reinforcing feedback loop dominates, can we reap any asymmetric upside from delegating to a process?

What’s worked well for me so far is injecting as much creativity into early- and mid-stage process as possible. I typically look for a healthy inflow of outsider ideas or counterfactuals that can de-striate the system for long enough to expose other potential maxima. In systems language, lateral thinking turns our reinforcing feedback loop into a second-order system where creativity and counterfactuals are potential damping parameters.
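
To sketch that in toy equations: here’s a bog-standard damped second-order system, with the damping coefficient standing in for the inflow of outsider ideas. The dynamics and parameters are invented for illustration, not a model of any real org:

```python
def settle_time(zeta, omega=1.0, dt=0.01, tol=0.05):
    """Time for x'' + 2*zeta*omega*x' + omega^2*x = 0 to settle near zero,
    starting from a unit disturbance. zeta is our 'creativity' damping."""
    x, v = 1.0, 0.0
    for step in range(200_000):
        a = -2 * zeta * omega * v - omega ** 2 * x  # forward Euler step
        v += a * dt
        x += v * dt
        if abs(x) < tol and abs(v) < tol:
            return step * dt
    return float("inf")

print(settle_time(zeta=0.05))  # barely damped: rings for a long time
print(settle_time(zeta=0.7))   # well damped: settles in a handful of time units
```

Starve the system of damping and every disturbance rings for ages; feed it enough and it returns to steady state quickly. The creative inflow does work the loop can’t do for itself.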

Another concrete technique I like to employ is something I wrote about a few years ago in my second-brain post:

… grab ideas from a variety of disciplines, connect them in interesting ways, turn the nodes of the resulting connected graph into sentence fragments, then rearrange them on a page…

This interdisciplinary approach keeps me busy grabbing books off the shelf to hunt down half-remembered references that I should’ve more diligently logged in Obsidian. But when I do manage to track down a bold or unexpected idea, it often helps interactivize the process in question.

A related technique is applying a lateral thinking tool that can expose the fracture planes in conventional approaches. I personally quite like the Oblique Strategies and Liberating Structures for this purpose. The Oblique Strategies are better for solo work though; I wouldn’t necessarily break them out during a team meeting.[8]

Two words of caution. First, if you use these approaches, you will almost certainly introduce some measure of tension into the system. A little tension is a good thing; as Niels Bohr (probably) said, paradoxes are what give us hope of making progress. But the system will naturally push back against that tension, which can feel both unpleasant and unpleasantly personal. The key is that this resistance propagates through a balancing feedback loop, which we can map with a causal loop diagram to understand the often unintuitive factors underlying the pushback. Then it’s just a matter of finding the right creative lever to nudge the system back toward steady state and minimize the undertow.

Lastly, this mindset may totally change where you end up! So if you must begin with the end in mind, try giving yourself the creative freedom to get there via lateral, non-stepwise paths. The view might be better than what you expected when you set out.

Footnotes

  1. Mine was Instapaper.

  2. Oscar Lewis in The Culture of Poverty.

  3. In GTD-land this rhymes with adding a thousand entries to your inbox every day, which turns your productivity management tool into Tristram Shandy.

  4. Try designing a taxation regime without interpretation bulletins and watch where your tax base migrates!

  5. This is indeed the whole point of frameworks like the SMART criteria.

  6. For example, if this purely notional individual were financially accountable for advertising and had reason to believe that a looming recession might affect consumer behaviour, cheesing CAC would be as easy as cutting ad spend. This is why counterweight metrics exist!

  7. To adapt Seeing Like a State, metrics are “far more static and schematic than the actual social phenomena they presume to typify” (p. 46).

  8. I fired up the Oblique Strategies app while writing this post and was gifted the suggestion to “put in earplugs”. An apposite suggestion, as the cat was unusually loud today.