Good metrics development is hard. There’s never just one metric, since your operating context almost certainly isn’t dominated by one principal component. And your KPIs have to interrelate anyway, partly for standard Goodhart reasons, but more interestingly because your metrics can degrade due to floor or ceiling effects. Imagine you’re a decision-making body that must publish reasons. You decide to track time-to-decision, reasoning that it furthers transparency, but it’s easy to forget that procedural fairness constrains delivery speed. Eventually you can’t move any faster, and slight slowdowns later on might not show up in your metric at all, because you’re sitting on the floor.
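The floor effect is easy to see in a toy simulation. All the numbers below are hypothetical: a fixed procedural minimum (consultation periods, drafting reasons) dominates observed time-to-decision, so the KPI stays flat even while underlying processing capacity quietly degrades.

```python
# Toy sketch (hypothetical numbers): a procedural-fairness floor masks
# degradation in underlying decision speed.

PROCEDURAL_FLOOR = 30  # days of mandatory process: consultation, reasons, notice

# Underlying capacity slips each quarter: intrinsic processing time rises,
# but it stays below the procedural floor the whole time.
intrinsic_days = [12, 16, 21, 27, 29]

# Observed time-to-decision can never drop below the floor.
observed = [max(PROCEDURAL_FLOOR, d) for d in intrinsic_days]

print(observed)  # [30, 30, 30, 30, 30]
# The KPI is perfectly flat while intrinsic processing time more than doubles.
```

The metric only registers the problem once intrinsic time crosses the floor, at which point the degradation is already severe.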
This is of course why counterweight metrics exist. But what happens when your counterweight is gameable too? Imagine your decision-making body is now pitching an advisory service to proactively guide clients along the happy path. You might track the number of outreach events you organize, or the number of client questions you answer, or how quick your turnaround time is. Obviously you can game this by splitting up your team or churning out more, smaller specs, or whatever. But there’s a deeper outcome problem here because your classic counterweights — think client satisfaction — are also easy to game. The deliberate gaming strategy is just running outreach with your friends.
The unintentional version is more insidious though: your clients might not have exit options, so they’ll probably give your service five stars. This is just another shade of median mismatch; the underserved population disappears entirely when we select for the parties who don’t have other options. And on some level, that’s all your clients, because they’re all bound by the very system that’s aiming to measure them.
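A small sketch makes the selection problem concrete. The data here is entirely made up: dissatisfied clients with exit options stop engaging before the survey reaches them, and the captured clients who remain rate five stars regardless of their true experience.

```python
import statistics

# Toy sketch (hypothetical data): each client has a true satisfaction score
# and a flag for whether they have somewhere else to go.
clients = [
    {"true_score": 1, "has_exit": True},   # underserved, walks away
    {"true_score": 2, "has_exit": True},   # underserved, walks away
    {"true_score": 3, "has_exit": False},  # captured
    {"true_score": 4, "has_exit": False},  # captured
    {"true_score": 5, "has_exit": False},  # captured
]

# Only captured clients are still around to be surveyed, and with no exit
# option they report five stars whatever their true experience was.
surveyed = [c for c in clients if not c["has_exit"]]
reported = [5 for _ in surveyed]

print(statistics.mean(c["true_score"] for c in clients))  # 3
print(statistics.mean(reported))                          # 5
```

The satisfaction KPI reads as a perfect score while the true population mean is mediocre, and the worst-served clients never appear in the denominator at all.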
You can’t manufacture accountability from within a captured system: a counterweight for your counterweight is just as captured as everything else, so gaming the counterweight collapses into self-referential measurement. What you need instead is an external anchor point, something that’s not captured by your own system. In the decision-maker example, this probably looks like an audit function, or an independent researcher tracking the frequency or severity of harm, or anything else that could find against you even when your internal metrics are sky-high. It takes a bold leader to suggest a metric that rhymes with “let’s see how often we lose on judicial review”, but it’s a concrete way to avoid counterweight metrics that don’t measure anything real.
On the surface, this is the value proposition for management consultants: if your corporate governance model isn’t working, get an outsider to interview the players and recommend a structural fix. But everyone they interview will be captured too. The systems-level fix is to ask someone with independent accountability to observe the consequences instead. You can imagine an entire network of such self-reinforcing, but not self-referential, bilateral relationships. And while individual nodes can be captured, defunded, or hollowed out from within by bad counterweights, the network topology is much more resilient.