Building Sidekick: How I Used AI to Fix Incident Response
In March 2026, I stood in front of senior leadership to demo a tool we’d built in our spare time. It analyzed bug tickets, triaged CI failures, and reviewed pull requests. It ran on a CI pipeline every ten minutes, and we’d built the whole thing in about three weeks.
The feedback was blunt: the presentation was scattered, but the tool was solid. We moved fast, broke things, fixed them, shipped. Polish came later.
The Problem
My team handles backend infrastructure for a complex product surface. A customer hits a bug or a CI test starts failing. Someone reads the ticket, finds the relevant code, traces the call chain, checks for prior incidents, and writes a diagnosis. On a good day that takes an hour. On a bad day the ticket sits in a queue because the team is heads-down on sprint work.
Code reviews had a similar bottleneck. PRs sat for days waiting for a reviewer with enough context to give useful feedback. The reviews that did happen stayed surface-level because the reviewer didn’t have time to trace changed code paths back to their callers.
We wanted to stop losing time.
Week One: The Ugly Prototype
The first version was a bash script that called our AI CLI tool (powered by Claude) with a bug ticket pasted into the prompt. No pipeline, no automation. I ran it from my laptop.
The output was bad. The AI hallucinated function names, referenced files that didn’t exist, and mixed up context from previous conversations. That is demoralizing when you’re trying to convince yourself the idea is worth pursuing.
A conversation with my manager shaped the rest of the project: we had a context problem. We were dumping ticket descriptions and stack traces into a single prompt and hoping the AI would sort it out. It couldn’t.
The Context Architecture
We rebuilt the prompt pipeline from scratch.
Give the AI less, not more. We wrote steering files that told the agent what to do and what to ignore. Each file stayed under two thousand tokens. If a piece of context wasn’t needed on every run, it didn’t go in the base prompt.
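A budget like this is easy to enforce with a small check. The sketch below is a minimal illustration, not the check we actually ran: the four-characters-per-token estimate is a rough heuristic, and the `.md` extension and function names are assumptions made for the example.

```python
from pathlib import Path

TOKEN_BUDGET = 2000      # per-file limit described above
CHARS_PER_TOKEN = 4      # rough heuristic (assumption); a real tokenizer will differ


def estimate_tokens(text: str) -> int:
    """Approximate the token count from character length, rounding up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def over_budget(steering_dir: str, budget: int = TOKEN_BUDGET) -> list:
    """Return (filename, estimated_tokens) for each steering file over budget."""
    offenders = []
    # Assumes steering files are Markdown; adjust the glob for other formats.
    for path in sorted(Path(steering_dir).glob("*.md")):
        tokens = estimate_tokens(path.read_text(encoding="utf-8"))
        if tokens > budget:
            offenders.append((path.name, tokens))
    return offenders
```

Run as a pre-merge check, something like this would flag a steering file before it quietly grows past the limit.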
Isolate tasks. Instead of one long conversation, the analysis agent delegates code exploration to a subagent with its own clean context window. The parent gets back a summary. No cross-contamination between ticket analysis and codebase search.
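The isolation idea can be sketched in a few lines. This is a simplified illustration under stated assumptions: `Model` is a hypothetical stand-in for whatever call reaches the AI, and the `Agent` class and prompts are invented for the example rather than taken from Sidekick.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical stand-in for a model call: takes the message history, returns text.
Model = Callable[[List[str]], str]


@dataclass
class Agent:
    model: Model
    context: List[str] = field(default_factory=list)

    def ask(self, message: str) -> str:
        """Append the message, call the model on the full context, record the reply."""
        self.context.append(message)
        reply = self.model(self.context)
        self.context.append(reply)
        return reply


def explore_code(model: Model, question: str, max_chars: int = 500) -> str:
    """Explore in a fresh context and hand back only a short summary."""
    subagent = Agent(model)  # new, empty context window
    subagent.ask(f"Explore the codebase: {question}")
    summary = subagent.ask("Summarize your findings in a few sentences.")
    return summary[:max_chars]  # the subagent's full context is dropped here


def analyze_ticket(model: Model, ticket: str) -> Tuple[str, List[str]]:
    """Parent agent: sees the ticket and the summary, never the search transcript."""
    parent = Agent(model)
    parent.ask(f"Ticket: {ticket}")
    summary = explore_code(model, f"Which code paths relate to: {ticket}")
    diagnosis = parent.ask(f"Code exploration summary: {summary}\nWrite a diagnosis.")
    return diagnosis, parent.context
```

The point of the design is visible in `analyze_ticket`: the parent context grows by one summary message, however long the exploration ran.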
Inject environment automatically. The agent knows which repo to search, which branch to check out, where the steering files live. The engineer doesn’t narrate any of this. Hooks and setup scripts handle it before the agent sees its first token.
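As a rough sketch of what a setup hook might do, the function below turns environment values into a preamble for the agent. The variable names (`SIDEKICK_REPO` and so on) are hypothetical; the real hooks may pass these values differently.

```python
import os

# Hypothetical variable names, chosen for illustration only.
REQUIRED = ("SIDEKICK_REPO", "SIDEKICK_BRANCH", "SIDEKICK_STEERING_DIR")


def build_preamble(env: dict) -> str:
    """Build the context preamble, failing loudly if a value is missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment values: {', '.join(missing)}")
    return (
        f"Repository to search: {env['SIDEKICK_REPO']}\n"
        f"Branch to check out: {env['SIDEKICK_BRANCH']}\n"
        f"Steering files: {env['SIDEKICK_STEERING_DIR']}\n"
    )


if __name__ == "__main__":
    # In CI the values would come from the job environment.
    print(build_preamble(dict(os.environ)))
```

Failing on a missing value matters more than it looks: an agent that starts without knowing its repo tends to guess.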
I wrote about this architecture in a broader piece on AI context management. Sidekick was where we proved it worked.
The Pipeline
A scheduled CI pipeline runs every ten minutes, polling our issue tracker for tickets that someone has labeled for analysis. It triggers a child pipeline that:
- Fetches the full ticket: description, comments, attachments, linked issues
- Searches for similar past incidents
- Clones the relevant repo and checks out the affected branch
- Hands everything to the AI agent with structured instructions
- Extracts the analysis from the agent’s output
- Uploads it back to the ticket and posts to team chat
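The steps above can be sketched as a single orchestration function. This is an outline under assumptions, not the real pipeline, which is CI config and bash: every field on `Steps` is a hypothetical hook standing in for a call to the tracker, git, or the AI CLI.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Steps:
    """Hypothetical hooks; real versions would call the tracker, git, and the CLI."""
    fetch_ticket: Callable[[str], dict]
    find_similar: Callable[[dict], List[str]]
    checkout_repo: Callable[[dict], str]
    run_agent: Callable[[dict, List[str], str], str]
    extract_analysis: Callable[[str], str]
    publish: Callable[[str, str], None]


def run_child_pipeline(ticket_id: str, steps: Steps) -> str:
    """Run the six steps in order and return the extracted analysis."""
    ticket = steps.fetch_ticket(ticket_id)           # description, comments, attachments, links
    similar = steps.find_similar(ticket)             # past incidents that look related
    workdir = steps.checkout_repo(ticket)            # clone and check out the affected branch
    raw = steps.run_agent(ticket, similar, workdir)  # agent receives structured inputs
    analysis = steps.extract_analysis(raw)           # keep only the analysis section
    steps.publish(ticket_id, analysis)               # upload to the ticket, post to chat
    return analysis
```

Keeping each step behind a plain callable also makes the flow testable with stubs, without touching the tracker or the model.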
Code review follows the same pattern. Label a ticket for review, and the pipeline fetches the linked PR, grabs the diff, and produces a structured review with file-level risk triage and severity-ranked findings.
The system runs on CI. No custom infrastructure. No servers to maintain.
What Broke
The pipeline failed in ways I didn’t anticipate.
The TUI problem. The AI CLI has a terminal UI with progress spinners and status bars. In CI, those render into the output stream. Our first successful analysis was 161KB of garbled ANSI escape codes mixed with actual content. The fix: three CLI flags to disable the TUI, skip interactive prompts, and auto-approve tool calls. Three flags that took two days to discover.
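The flags were the real fix. As a defensive complement, and not something the fix above depended on, captured output can also be scrubbed of ANSI control sequences before it is parsed. A minimal sketch, with `strip_ansi` as a name invented for the example:

```python
import re

# Matches CSI sequences: ESC [ parameter bytes, intermediate bytes, one final byte.
# This covers colors and cursor movement, not every possible escape sequence.
ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def strip_ansi(text: str) -> str:
    """Remove CSI escape sequences from captured terminal output."""
    return ANSI_CSI.sub("", text)
```

For example, `strip_ansi("\x1b[32mok\x1b[0m")` returns `"ok"`.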
The protocol crash. The first time the agent tried to query our issue tracker through its protocol server, the CLI crashed. Exit code 1, no useful error. We ripped out the dependency and had the agent use REST API calls. Simpler and more reliable.
The retry loop. The review pipeline couldn’t find a PR link in a ticket, so it removed the “pending” label but left the “needed” label in place. Because the “needed” label is what marks a ticket as new work, the next poll cycle rediscovered the ticket and triggered another run. And another. We found it after the pipeline had posted six duplicate “analysis started” comments to the same ticket.
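One way to close that kind of loop is to make every run, successful or not, leave the ticket in a state the poller will skip. The sketch below illustrates the idea; the label names are hypothetical, and the terminal “failed” label is an assumption of the example rather than a description of our exact fix.

```python
# Hypothetical label names; the real ones are not shown in this post.
NEEDED = "review-needed"
PENDING = "review-pending"
FAILED = "review-failed"


def is_pickup_candidate(labels: set) -> bool:
    """The poller picks up tickets that are requested and not already handled."""
    return NEEDED in labels and PENDING not in labels and FAILED not in labels


def labels_after_run(labels: set, succeeded: bool) -> set:
    """Return the labels a run should leave behind, on success or failure."""
    remaining = labels - {NEEDED, PENDING}  # always clear the request, even on failure
    if not succeeded:
        remaining = remaining | {FAILED}    # terminal marker; a human can re-request
    return remaining
```

The buggy state is easy to reproduce here: a ticket left with only the “needed” label satisfies `is_pickup_candidate` on every poll.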
Autonomous systems don’t call you when something goes wrong. They handle the failure or they don’t.
Going Headless
The original setup required a local runner on a developer’s desk. The AI CLI authenticated through a browser, so someone had to log in manually. This worked for our team but killed adoption for anyone else.
In April 2026, we migrated to headless mode: shared CI runners with API key authentication. No local machine required. The setup script installs the CLI, clones the repos, configures the agent, and runs the analysis. A new team can onboard in fifteen minutes.
We built an automatic fallback for quota exhaustion. If the API key runs dry mid-pipeline, the system detects the failure, triggers a new pipeline targeting a local runner, and retries there. A guard prevents infinite loops: the local pipeline never triggers another fallback.
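The guard is the important part, so here is a minimal sketch of the control flow. The names are hypothetical: `QuotaExhausted` stands in for however the failure is detected, and `trigger_local_pipeline` for whatever call starts the local-runner pipeline.

```python
from typing import Callable, Optional


class QuotaExhausted(Exception):
    """Hypothetical signal that the API key has no quota left."""


def run_with_fallback(
    run_analysis: Callable[[], str],
    trigger_local_pipeline: Callable[[], None],
    is_fallback_run: bool,
) -> Optional[str]:
    """Run the analysis; on quota exhaustion, hand off to a local runner once."""
    try:
        return run_analysis()
    except QuotaExhausted:
        if is_fallback_run:
            raise  # guard: a fallback run never triggers another fallback
        trigger_local_pipeline()
        return None  # the result will come from the local-runner pipeline
```

Passing `is_fallback_run` explicitly, for instance from a pipeline variable, keeps the guard independent of whatever caused the failure.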
What Worked
The tool caught real bugs. It one-shotted an obscure upgrade issue that would have taken hours to trace manually. On a set of four failing tests, it identified different root causes for each, catching a distinction the human investigator missed. On a PR review, it flagged a wildcard injection vulnerability and an ignored filter parameter. The PR author committed fixes for both.
During an urgent incident, an analysis came close to the actual root cause and assigned the owning squad. It gave the on-call engineer a running start.
Within a month of opening it up, multiple teams across the org were running Sidekick on their own ticket queues.
Lessons
The model is a commodity. The value lives in the steering files and the context architecture. Our steering went through a team review with nearly ninety comments. Maintaining it is ongoing work, like documentation but with sharper consequences when it goes stale.
The pipeline has graceful fallbacks on optional steps, marker-based output extraction with a fallback to raw output, stale-label cleanup for stuck tickets, and retry with quota detection. The system runs unattended, so it has to recover on its own.
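Marker-based extraction with a raw fallback is simple enough to show in full. This is a sketch under assumptions: the marker strings are hypothetical, chosen for the example.

```python
# Hypothetical markers; the agent would be instructed to wrap its analysis in them.
START = "<<<ANALYSIS_START>>>"
END = "<<<ANALYSIS_END>>>"


def extract_analysis(raw: str) -> str:
    """Return the text between the markers, or the raw output if they are missing."""
    start = raw.find(START)
    end = raw.find(END, start + len(START)) if start != -1 else -1
    if start == -1 or end == -1:
        return raw.strip()  # fallback: better a noisy analysis than none
    return raw[start + len(START):end].strip()
```

The fallback trades cleanliness for availability: a ticket with a messy analysis attached is still more useful than a ticket with nothing.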
Speed compounds. First working version: one week. Code review pipeline: another week. Multi-group support: another week. Headless migration: another week. None of these were polished. All of them shipped. Real usage data drove decisions instead of hypothetical requirements.
I used the AI CLI to build Sidekick. The steering files, the bash scripts, the pipeline config, the prompt engineering, the debugging, all done in conversation with the same AI the tool runs on. The tool is most useful when you know what you want and need help executing at speed.
Where It’s Going
The backlog has items we haven’t touched: automated test design documents generated from PR diffs, and integration with our internal monitoring tools. Headless mode opens the door to running Sidekick as a service rather than on a collection of developer laptops.
The core insight hasn’t changed since week one. Build the context pipeline right, give the model clear instructions and clean inputs, and it produces work you can use.