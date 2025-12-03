Vibe coding may be hype, but it hints at how site reliability engineers (SREs) could soon work: side by side with AI that embeds intelligence directly into reliability workflows. Among the many facets of SRE, the most immediate frontier for codebots lies in the investigation loop of triage, diagnosis and remediation. Here, the promise is tangible: reducing alert fatigue, shortening MTTR and raising system quality through faster, more precise fixes or long-term optimization.

Working with vendors such as Qodo, Groundcover and Weights & Biases, we tested how AI assistants can chain the steps of this loop together. Our early findings point to significant potential. This post shares those learnings and explores what they signal for the future of trusted AI in SRE investigation.

Why SRE investigation is critical

Investigation sits at the heart of SRE, bridging the gap between noisy alerts and meaningful remediation. It’s where SREs triage incidents, diagnose root causes and propose fixes under pressure, with a direct effect on MTTR and, ultimately, on system reliability.

Unlike resilience engineering or postmortems, investigation is immediate, high-stakes and repetitive, making it an ideal proving ground for AI coding assistants.

What can AI coding assistants do in SRE?

Effective AI coding assistants follow a disciplined pattern:

Awareness → Plan → Generate → Merge.

They first build awareness of complex contexts before writing any code. They then plan a resolution carefully — scoping the problem, handling exceptions and aligning with quality or compliance guidance — rather than rushing into implementation.

Next, they generate code that reflects best practices and balances short- and long-term benefits. Finally, they assess the potential impact and merge changes into the trunk with confidence.
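To make the four-stage pattern concrete, here is a minimal Python sketch of how the stages could be chained so that each one refuses to run until the previous one has produced its output. Everything in it is hypothetical and for illustration only: the `Investigation` structure, the stage function names and the placeholder strings are our assumptions, not the behavior of any particular assistant or vendor tool.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Investigation:
    """Hypothetical state passed from one stage to the next."""
    alert: str
    context: List[str] = field(default_factory=list)
    steps: List[str] = field(default_factory=list)
    patch: str = ""
    merged: bool = False
    log: List[str] = field(default_factory=list)


def build_awareness(inv: Investigation) -> Investigation:
    # Awareness: gather context before any code is written (placeholder data).
    inv.context.append(f"signals related to: {inv.alert}")
    inv.log.append("awareness")
    return inv


def make_plan(inv: Investigation) -> Investigation:
    # Plan: refuse to plan without context, then scope the resolution.
    if not inv.context:
        raise ValueError("cannot plan without context")
    inv.steps = ["scope the problem", "handle exceptions", "check compliance guidance"]
    inv.log.append("plan")
    return inv


def generate_fix(inv: Investigation) -> Investigation:
    # Generate: only produce a change once a plan exists (placeholder patch).
    if not inv.steps:
        raise ValueError("cannot generate without a plan")
    inv.patch = f"# placeholder fix for: {inv.alert}"
    inv.log.append("generate")
    return inv


def merge_change(inv: Investigation) -> Investigation:
    # Merge: a real system would assess impact here; this sketch only
    # checks that a patch was actually generated.
    inv.merged = bool(inv.patch)
    inv.log.append("merge")
    return inv


STAGES: List[Callable[[Investigation], Investigation]] = [
    build_awareness,
    make_plan,
    generate_fix,
    merge_change,
]


def run(alert: str) -> Investigation:
    """Run the stages in order: Awareness, Plan, Generate, Merge."""
    inv = Investigation(alert=alert)
    for stage in STAGES:
        inv = stage(inv)
    return inv


if __name__ == "__main__":
    result = run("high latency on checkout")
    print(result.log, result.merged)
```

The guard clauses are the point of the sketch: planning without awareness, or generating without a plan, raises an error instead of silently producing a fix.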

In the context of SRE investigation, the same pattern applies (see the table below).