It's funny how even minor version updates can cause such a stir in the AI world. That was certainly the case with last week's release of GPT-5.4, which brought significant improvements for enterprise and general knowledge work.
We discussed the implications on last week's episode of This Week in AI — sorry we're a little late this time, but we're sure you'll appreciate the chance to reflect on some of the key stories from the last week or so.
Other things we discussed include:
- Claude Code launching a /loop command for scheduled tasks, which allows users to run prompts automatically on a recurring schedule. An automated code review feature was also introduced, highlighting a growing need for "antagonistic agents" to stress test quality.
- Google rolling out more native Gemini integration across Workspace apps like Docs, Sheets and Slides.
Alongside these launches and releases were some important stories that highlight continuing challenges around security and stability in an AI-assisted context:
- McKinsey's internal AI platform "Lilli" was compromised in two hours by autonomous offensive agents. The attack exploited a development version of the API documentation, which listed unauthenticated endpoints, granting read and write access to the entire production database, including the critical system prompts that define the agents' behavior.
- High blast-radius outages at Amazon and persistent availability issues at other providers, like Anthropic's Claude, suggest that instability is a trade-off for the current speed of innovation in AI engineering (although Amazon denied this past week that AI coding was to blame).
- An analysis — published on X — showed that LLMs often write "plausible code" rather than functionally correct or performant code. This underscores the need for automated performance tests and explicit instructions: relying on implicit knowledge undermines AI effectiveness.
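The point about automated performance tests can be made concrete with a small sketch. The function name `find_duplicates` and the time budget below are illustrative assumptions, not from the analysis itself; the idea is simply that a test should assert both correctness and an explicit performance bound, so a plausible-looking but quadratic rewrite from an LLM would fail loudly:

```python
import time

def find_duplicates(items):
    # Illustrative implementation: a set-based O(n) scan. An LLM's
    # "plausible" version might instead be an O(n^2) nested loop that
    # passes the correctness check but blows the time budget below.
    seen, dupes = set(), set()
    for item in items:
        if item in seen:
            dupes.add(item)
        seen.add(item)
    return dupes

def test_find_duplicates():
    # Functional correctness: plausible code must still produce right answers.
    assert find_duplicates([1, 2, 2, 3, 3, 3]) == {2, 3}
    assert find_duplicates([]) == set()

    # Explicit performance budget: make the implicit expectation explicit.
    data = list(range(200_000)) * 2
    start = time.perf_counter()
    find_duplicates(data)
    assert time.perf_counter() - start < 1.0, "performance regression"

test_find_duplicates()
```

Encoding the budget as an assertion turns an implicit expectation ("this should be fast") into an explicit instruction a CI pipeline can enforce.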
Finally, we also discussed a shift back towards the command line. In a security context, there's an emerging concept of a CLI vault, proposed as a way to route agent access through a secure gateway that prevents agents from directly accessing environment variables and API keys. We also discussed the cognitive load of managing a swarm of agents, and agreed that what really matters is focusing on effectiveness and optimizing workflow bottlenecks — we don't need to just keep agents busy.
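To illustrate the gateway idea behind a CLI vault — hypothetically, since no specific tool or API is named in the discussion — one minimal approach is to launch agent commands with a scrubbed environment, so secrets in the parent shell never reach the agent process. The allowlist and function names here are assumptions for the sketch:

```python
import os
import subprocess

# Illustrative allowlist: only non-secret variables reach the agent.
ALLOWED_VARS = {"PATH", "HOME", "LANG"}

def scrubbed_env():
    """Return a copy of the environment with everything off-allowlist removed."""
    return {k: v for k, v in os.environ.items() if k in ALLOWED_VARS}

def run_agent_command(cmd):
    """Run an agent's shell command without exposing secret variables.

    A real vault would additionally proxy authenticated calls on the
    agent's behalf; this sketch only shows the isolation step.
    """
    return subprocess.run(cmd, env=scrubbed_env(), capture_output=True, text=True)

# Simulate a secret sitting in the shell environment.
os.environ["EXAMPLE_API_KEY"] = "sk-secret"
result = run_agent_command(["env"])
assert "EXAMPLE_API_KEY" not in result.stdout  # the agent never sees the key
```

The gateway, not the agent, would then be the only process holding credentials, which shrinks the blast radius if an agent is compromised or misbehaves.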
Disclaimer: The statements and opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.