AI, Toil, and the SRE Feedback Loops We Can’t Afford to Break
There’s a lot of energy right now around AI in incident management. Automating toil. Improving signal-to-noise. Self-healing systems. Agents that detect deviations, mitigate issues, and even resolve incidents before humans wake up.

And honestly, I’m excited about it. There are real opportunities here to improve detection, triage, operational efficiency, and recovery speed. AI has the potential to meaningfully elevate how we run distributed platforms at scale.

It’s also entirely possible that AI will transform the SDLC so profoundly that many of today’s assumptions will evolve. But in the world we still operate in today, there are a few important principles we need to keep top of mind as we adopt these capabilities.

Feedback Loops Are How Systems and Engineers Learn

If you go back to the DevOps movement, especially The Phoenix Project and the Three Ways, the second Way emphasizes fast, tight feedback loops. Engineers need to see the consequences of the systems they design. ...