Debugging as Detective Work: Stop Guessing, Start Reasoning — Shivank Bhardwaj

I've watched engineers debug the same way for years: something breaks, they form an immediate intuition about why, they go fix that thing, and when it doesn't work, they form another intuition. Repeat until solved or until someone more patient takes over.

The intuition-driven approach occasionally gets lucky. But it's systematically inefficient for bugs that aren't in the obvious place, and it almost never produces lasting understanding.

Here's the frame that changed how I debug: treat it as a detective problem, not a search problem.

The difference

In a search problem, you look for something and keep looking until you find it — randomly or with some heuristic, but the model is "keep going until you hit the answer."

In a detective problem, you reason from evidence to a conclusion. You gather observations, form hypotheses, and actively try to falsify them with the cheapest experiment that would distinguish them.

Most debugging is treated as search. It should be treated as detection.

The five steps

1. Characterize the symptom precisely.

Before any hypothesis, write down exactly what's happening. Not "it's slow" but "p99 on this endpoint went from 200ms to 800ms, starting Friday evening, only on requests that hit the pricing path." Precision constrains the hypothesis space. "It's slow" admits hundreds of causes; "slow since Friday evening, only on the pricing path" admits maybe five.
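A symptom statement like that usually comes from slicing the latency data rather than eyeballing a dashboard. Here is a minimal sketch of that slicing, assuming you can export requests as (path, period, latency) tuples; the sample data, the "before"/"after" labels, and the function names are hypothetical, and the percentile uses the simple nearest-rank definition.

```python
from collections import defaultdict

def p99(values):
    """Nearest-rank 99th percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    # ceil(0.99 * n) in integer arithmetic, as a 1-based rank
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]

def p99_by_group(requests):
    """Group latencies by (path, period) and return the p99 of each group."""
    groups = defaultdict(list)
    for path, period, latency_ms in requests:
        groups[(path, period)].append(latency_ms)
    return {key: p99(latencies) for key, latencies in groups.items()}

# Hypothetical sample data: (path, period, latency in ms).
requests = (
    [("pricing", "before", 200)] * 100
    + [("pricing", "after", 200)] * 98
    + [("pricing", "after", 800)] * 2
    + [("search", "before", 150)] * 100
    + [("search", "after", 150)] * 100
)

summary = p99_by_group(requests)
for (path, period), value in sorted(summary.items()):
    print(f"{path:8s} {period:7s} p99={value}ms")
```

If only one path moves between the two periods, that goes into the symptom statement; if every path moves, you are looking at a different class of problem.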

2. Form multiple hypotheses — not one.

This is the step most engineers skip. They form one hypothesis (usually whatever they saw last time), test it, and when it fails, form another. That's sequential intuition-following.

Better: before touching anything, write down three to five hypotheses that could explain the symptom as characterized. Make them mutually exclusive where you can, or at least distinguishable by experiment. For a latency regression that might be: an infra change, a new query in the latest deploy, an N+1 from increased data volume, lock contention, or a network change to the database.

3. Find the cheapest falsifying experiment.

For each hypothesis ask: what's the cheapest evidence that would eliminate it? Not confirm, eliminate. Confirmation is seductive; elimination is faster. Check the deploy log (two minutes; no infra change in the window rules out infra). Check the slow-query log for lock waits (five minutes; no lock waits rules out contention). You can often drop from five hypotheses to two inside ten minutes without touching code.
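Steps 2 and 3 can be written down as data before you touch anything. The sketch below is one hypothetical way to do that: each hypothesis carries an estimated cost for its elimination check, and the checks run cheapest-first inside a time budget. The `Hypothesis` class, the cost estimates for the last two entries, and the stubbed check outcomes are assumptions for illustration; in practice each check is something you do by hand or a small script.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Hypothesis:
    name: str
    cost_minutes: int                 # estimated cost of the elimination check
    is_ruled_out: Callable[[], bool]  # True if the evidence eliminates it

def eliminate(hypotheses, budget_minutes):
    """Run elimination checks cheapest-first within a time budget.

    Returns (names still open, minutes spent). Hypotheses whose check
    does not fit in the budget stay open without being tested.
    """
    survivors, spent = [], 0
    for h in sorted(hypotheses, key=lambda h: h.cost_minutes):
        if spent + h.cost_minutes > budget_minutes:
            survivors.append(h.name)  # too expensive for now
            continue
        spent += h.cost_minutes
        if not h.is_ruled_out():
            survivors.append(h.name)
    return survivors, spent

# Stubbed outcomes: pretend the three cheap checks all came back clean.
hypotheses = [
    Hypothesis("infra change", 2, lambda: True),
    Hypothesis("new query in latest deploy", 30, lambda: False),
    Hypothesis("N+1 from increased data volume", 45, lambda: False),
    Hypothesis("lock contention", 5, lambda: True),
    Hypothesis("network change to the database", 3, lambda: True),
]

still_open, minutes = eliminate(hypotheses, budget_minutes=10)
print(still_open, minutes)
```

The point of the budget is that the expensive checks are deferred, not skipped: they are what step 4 is for.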

4. Instrument, don't assume.

Eventually you exhaust cheap eliminations and need new evidence. This is where debugging gets expensive: people add a log line where they expect the bug, rather than where it would generate the most information regardless of hypothesis.

The best instrumentation is at boundaries — between caller and callee, between services, between a function's inputs and outputs. Timing at boundaries tells you where the time goes without requiring you to already know. One good instrumentation pass at the right boundary can eliminate several hypotheses at once.
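Here is a minimal sketch of boundary timing using only the Python standard library. The `boundary` helper and the two stand-in functions are hypothetical; the sleeps simulate a slow database call and fast local work, so the numbers mean nothing beyond the illustration.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

timings = defaultdict(list)  # boundary name -> list of elapsed seconds

@contextmanager
def boundary(name):
    """Record wall-clock time spent inside a named boundary."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name].append(time.perf_counter() - start)

def fetch_prices():
    time.sleep(0.05)   # hypothetical stand-in for a database call

def render_response():
    time.sleep(0.005)  # hypothetical stand-in for local work

def handle_request():
    # Time the whole handler and each call it makes across a boundary.
    with boundary("handler"):
        with boundary("db.fetch_prices"):
            fetch_prices()
        with boundary("render"):
            render_response()

handle_request()
totals = {name: sum(samples) for name, samples in timings.items()}
slowest_inner = max((n for n in totals if n != "handler"), key=totals.get)
for name, seconds in totals.items():
    print(f"{name:16s} {seconds * 1000:7.1f} ms")
```

Nothing in this pass encodes a guess about where the bug is. It measures every boundary the request crosses, and the breakdown says which side of which boundary to look at next.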

5. Root cause, then fix — in that order.

Once you've found the location, you still need the mechanism. "This function is slow" is a location, not a root cause. "This function makes N sequential DB calls because a loop issues one query per item" is a root cause — and it makes both the fix and the prevention obvious. Engineers in a hurry skip from location to fix, which produces patches that treat symptoms and post-mortems that say "we cached the slow query" instead of "we removed an N+1 at the service boundary."
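To make the location-versus-mechanism distinction concrete, here is a small sketch using Python's built-in sqlite3 module with an in-memory database. The `prices` table, the function names, and the query counter are hypothetical. Both functions return the same data; the first has the loop-issues-one-query-per-item mechanism, the second removes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (item_id INTEGER PRIMARY KEY, price REAL)")
conn.executemany(
    "INSERT INTO prices (item_id, price) VALUES (?, ?)",
    [(i, i * 1.5) for i in range(1, 51)],
)

stats = {"queries": 0}

def run(sql, params=()):
    """Execute a query and count it, so the mechanism is visible."""
    stats["queries"] += 1
    return conn.execute(sql, params).fetchall()

def prices_n_plus_one(item_ids):
    # Mechanism: one round trip per item, so cost grows with the list.
    result = {}
    for item_id in item_ids:
        rows = run("SELECT price FROM prices WHERE item_id = ?", (item_id,))
        result[item_id] = rows[0][0]
    return result

def prices_batched(item_ids):
    # Fix aimed at the mechanism: one query for the whole list.
    placeholders = ",".join("?" for _ in item_ids)
    rows = run(
        f"SELECT item_id, price FROM prices WHERE item_id IN ({placeholders})",
        tuple(item_ids),
    )
    return dict(rows)

def count_queries(fn, item_ids):
    """Run fn and return (result, number of queries it issued)."""
    stats["queries"] = 0
    result = fn(item_ids)
    return result, stats["queries"]

ids = list(range(1, 51))
slow_result, slow_queries = count_queries(prices_n_plus_one, ids)
fast_result, fast_queries = count_queries(prices_batched, ids)
print(slow_queries, fast_queries)
```

Caching the slow function would also have made the symptom go away, which is exactly why the mechanism has to be named before the fix is chosen.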

What changes when you use this frame

Debugging gets faster for hard bugs, even though the upfront hypothesis-forming feels slower. The expected cost of a session is roughly the sum, over the tests you might run, of (probability you get as far as that test × cost to run it). Running the cheapest eliminations first keeps that sum small when the hypotheses are about equally likely; when one is much more likely than the rest, weigh its likelihood against its cost and test it earlier. Random intuition-following optimizes neither; systematic falsification does.
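The expected-cost argument can be checked with a few lines of arithmetic. The sketch below uses a deliberately simple model: exactly one hypothesis is true, every test is conclusive, and you stop when you test the true one. The prior probabilities and the two larger cost estimates are made up for illustration. Under those assumptions, cheapest-first beats most-expensive-first by a wide margin, and ordering by probability divided by cost matches the best order found by brute force.

```python
from itertools import permutations

def expected_cost(order):
    """Expected minutes until the true cause is tested, for one test order.

    Each entry is (name, probability, cost_minutes). Assumes exactly one
    hypothesis is true and each test is conclusive.
    """
    total, elapsed = 0.0, 0.0
    for _, prob, cost in order:
        elapsed += cost          # time spent by the end of this test
        total += prob * elapsed  # weighted by the chance this is the cause
    return total

# Hypothetical priors and costs, for illustration only.
hypotheses = [
    ("infra change", 0.05, 2),
    ("network change to the database", 0.05, 3),
    ("lock contention", 0.50, 5),
    ("new query in latest deploy", 0.30, 30),
    ("N+1 from increased data volume", 0.10, 45),
]

cheapest_first = sorted(hypotheses, key=lambda h: h[2])
priciest_first = sorted(hypotheses, key=lambda h: h[2], reverse=True)
ratio_first = sorted(hypotheses, key=lambda h: h[1] / h[2], reverse=True)
best = min(permutations(hypotheses), key=expected_cost)

for label, order in [
    ("priciest first", priciest_first),
    ("cheapest first", cheapest_first),
    ("probability/cost", ratio_first),
    ("brute-force best", best),
]:
    print(f"{label:18s} {expected_cost(order):6.2f} min")
```

The priors are guesses, so none of this needs to be computed in a real session. Cheap, plausible checks go first, and any systematic order beats following whichever intuition arrived first.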

The other change is compounding. When you reason explicitly about what could cause a symptom, you build a model of the system. When you stumble onto the answer by intuition, you mostly just remember the one answer. The model is what makes the next investigation faster.

Where this shows up in my work

Most of my debugging lately is in distributed, event-driven systems, where the failure is rarely where the symptom is — a consumer looks slow because an upstream producer changed its event shape; a query looks flaky because a client's default retry policy changed under it. Those are environments that punish guessing and reward boundary instrumentation, because the cause and the symptom are usually in different services.

That's the compound return on detective-style debugging: the systematic habit, plus the system model it builds, is what turns a multi-day mystery into a bounded investigation.