Debug
Something's broken. We find it. We fix it.
A focused intervention. Usually one to three weeks. We diagnose what's actually wrong, design the smallest fix, ship it, document what we did and why, and leave. We don't set up the whole way of working. That's a separate engagement. Debug is the first-aid kit.
Often a useful gateway. After we ship the fix, you have a clear-eyed read on what a deeper engagement would look like. Plenty of clients move from Debug into Build, Ride shotgun, or Teach once they've seen us work.
Section I · Common diagnoses we make
Four kinds of broken we see most often.
i
AI assistant making things up
An AI assistant in production is wrong some non-trivial fraction of the time. Usually the testing system isn't catching it because it's checking the wrong sample of work. The fix is rarely the assistant itself.
ii
Testing running blind
Tests are "passing" but quality is dropping in the wild. The test checklist is calibrated to last quarter's failure modes and blind to this quarter's. We rebuild the checklist against the failures actually happening now; a sketch of that kind of coverage check closes this section.
iii
Quality control not catching bad work
Things ship that shouldn't, and quality decays in production. The review process reads strict on paper but isn't being applied in practice. Usually a discipline problem, not a checklist problem.
iv
No one's using what you built
The automations exist, the AI assistants work, but no one's using them. Adoption is the missing piece. Surfacing them inside the tools the team already uses fixes most of these.
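To make item ii concrete, here is a minimal sketch of the coverage check we mean, assuming production failures and test cases are both tagged by category. Every name, tag, and threshold is illustrative, not our actual tooling.

```python
from collections import Counter

def coverage_gaps(recent_failures, test_cases, min_ratio=0.5):
    """Flag failure categories the current checklist under-covers.

    recent_failures: category tags from this quarter's production failures.
    test_cases: category tags the current checklist actually exercises.
    min_ratio: flag a category when its share of test coverage falls below
               this fraction of its share of real failures.
    """
    fail_counts = Counter(recent_failures)
    test_counts = Counter(test_cases)
    n_fail, n_test = len(recent_failures), len(test_cases)

    gaps = []
    for category, count in fail_counts.most_common():
        prod_share = count / n_fail                      # share of real failures
        test_share = test_counts.get(category, 0) / n_test  # share of checklist
        if test_share < prod_share * min_ratio:
            gaps.append((category, prod_share, test_share))
    return gaps

# Example: 'billing' dominates current failures but barely appears in the checklist.
failures = ["billing"] * 40 + ["login"] * 10
tests = ["login"] * 45 + ["billing"] * 5
for category, prod, tested in coverage_gaps(failures, tests):
    print(f"{category}: {prod:.0%} of failures, {tested:.0%} of tests")
```

Run against fresh failure logs each quarter, a check like this is what keeps a checklist calibrated to this quarter's failure modes instead of last quarter's.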
Section II · In practice
An AI assistant. Wrong 8% of the time in production.
A two-week engagement. The assistant itself was fine — well-grounded in the right data, sensible logic, sound prompts. The testing system was checking 0.5% of real calls, and the check was skipping the topics where the errors clustered.
2 weeks · 1 fix · 8% → 0.7% error rate · testing rebuilt
We rebuilt the test sampling to weight by topic frequency, raised the sampling rate from 0.5% to 5% across the long tail, and added an alert that fires when errors cluster on a topic. The error rate stabilized at 0.7%. We didn't set up the whole way of working; that's a Build or Teach engagement. We left. Foundry called us back for a Teach intensive a quarter later.
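A minimal sketch of the shape of that fix, assuming calls arrive tagged by topic. The call format, function names, and thresholds are assumptions for illustration, not the code we shipped.

```python
import random
from collections import Counter, defaultdict

BASE_RATE = 0.05        # sample 5% of calls for review, up from 0.5%
ALERT_THRESHOLD = 0.03  # alert when a topic's sampled error rate crosses this

def sample_for_review(calls, rate=BASE_RATE, seed=0):
    """Pick calls for human review, stratified by topic.

    Sampling within each topic, with a floor of one call per topic,
    keeps the long tail in the sample instead of letting high-volume
    topics crowd it out.
    """
    rng = random.Random(seed)
    by_topic = defaultdict(list)
    for call in calls:
        by_topic[call["topic"]].append(call)

    sampled = []
    for group in by_topic.values():
        k = max(1, round(len(group) * rate))  # at least one call per topic
        sampled.extend(rng.sample(group, k))
    return sampled

def clustered_errors(reviewed):
    """Return topics whose sampled error rate exceeds the alert threshold."""
    totals, errors = Counter(), Counter()
    for call in reviewed:
        totals[call["topic"]] += 1
        errors[call["topic"]] += call["is_error"]  # bools count as 0 or 1
    return {topic: errors[topic] / totals[topic]
            for topic in totals
            if errors[topic] / totals[topic] > ALERT_THRESHOLD}
```

The stratification is the part that matters: a flat 5% over all traffic would still let rare topics go unreviewed for weeks, which is exactly the blind spot the old 0.5% check had.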
Section III · The method
Diagnose. Design. Ship. Document. Out.
- Diagnose: 2–4 days. We trace the actual failure mode, not the reported one. The reported problem and the real problem are usually different.
- Design: 1–2 days. The smallest fix that solves the diagnosed problem and survives a year of continued use. Not the most elegant fix; the most boring one.
- Ship: 3–7 days. We write the fix, your team reviews it, we ship it together. Telemetry on the fix from day one.
- Document & out: 1 day. A short writeup of what was wrong, what we changed, and what to watch for. We leave. You have the documentation forever.
Section IV · When to choose Debug
You need a fix. Not a transformation.
- You have a specific, named problem. Not "we should be doing more with AI"; something like "this AI assistant is wrong 8% of the time" or "this automation stopped getting used last quarter."
- You're not yet ready to commit to a multi-month engagement and want to see how we work first. Debug is a fair test.
- You have an immovable deadline (board meeting, customer renewal, quarter-end) and need the fix shipped before a longer engagement could even start.