Artwork by Dougal McPherson

A lesser-known contemporary of ChatGPT is Meta’s Galactica AI, which was trained on 48 million research papers and intended to support scientific writing. However, the demo was shut down after just two days because it produced misinformation and pseudoscience, all while sounding highly authoritative and convincing.

There are two takeaways from this cautionary tale. First, throwing mountains of data at a model doesn’t mean we will get outcomes aligned with our intent and expectations. Second, creating and productionizing ML models without comprehensive testing comes at a significant cost to humans and society. It also increases the risk of reputational and financial damage to a business. In a more recent story, a Google spokesperson echoed this when they said that Google Bard’s “$100 billion error” underscored “the importance of a rigorous testing process.”

We can and must test AI models. We can do so by using an array of ML testing techniques to uncover sources of error and harm before any models are released. By defining model fitness functions (objective measures of “good enough” or “better than before”), we can test our models and catch issues before they cause problems in production. If we struggle to articulate fitness functions for a model, we will likely discover later that the model is not fit for production. For generative AI applications, defining and running these tests is a non-trivial effort that should be factored into decisions about which opportunities to pursue.
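To make the idea of a model fitness function concrete, here is a minimal sketch in Python. It assumes a simple classification model evaluated on a labelled holdout set. The function names, the 0.80 accuracy threshold and the tiny example dataset are illustrative assumptions, not something prescribed by any particular library or by the teams mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FitnessResult:
    name: str
    passed: bool
    detail: str


def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("labels and predictions must be non-empty and the same length")
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


def good_enough(y_true, y_pred, min_accuracy=0.80):
    """'Good enough': the model clears an absolute bar.

    The 0.80 default is an assumed threshold for illustration only.
    """
    acc = accuracy(y_true, y_pred)
    detail = f"accuracy={acc:.2f}, required>={min_accuracy:.2f}"
    return FitnessResult("good_enough", acc >= min_accuracy, detail)


def better_than_before(y_true, candidate_pred, baseline_pred):
    """'Better than before': the candidate strictly beats the current model."""
    cand = accuracy(y_true, candidate_pred)
    base = accuracy(y_true, baseline_pred)
    detail = f"candidate={cand:.2f}, baseline={base:.2f}"
    return FitnessResult("better_than_before", cand > base, detail)


# Hypothetical holdout labels and predictions, made up for this example.
Y_TRUE = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
BASELINE_PRED = [1, 0, 0, 1, 0, 0, 0, 1, 1, 1]   # 7 of 10 correct
CANDIDATE_PRED = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]  # 9 of 10 correct


def release_gate(y_true, candidate_pred, baseline_pred):
    """Run every fitness function; release only if all of them pass."""
    results = [
        good_enough(y_true, candidate_pred),
        better_than_before(y_true, candidate_pred, baseline_pred),
    ]
    return all(r.passed for r in results), results


if __name__ == "__main__":
    ok, results = release_gate(Y_TRUE, CANDIDATE_PRED, BASELINE_PRED)
    for r in results:
        print(f"{r.name}: {'PASS' if r.passed else 'FAIL'} ({r.detail})")
    print("release" if ok else "do not release")
```

In practice, a check like this would likely run in a CI pipeline against a much larger evaluation set, and accuracy would be swapped for whichever metric reflects the product’s definition of “good enough”.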

We must also test the system from a security perspective and conduct threat modeling exercises to identify potential failure modes (e.g., adversarial attacks and prompt injections). The absence of tests is a recipe for endless toil and production incidents for machine learning practitioners. A comprehensive test strategy is essential if you are to ensure that your investments lead to a high-quality and delightful product.

4. Risk: Governance and ethics need to be a guiding framework, not an afterthought

“AI ethics is not separate from other ethics, siloed off into its own much sexier space. Ethics is ethics, and even AI ethics is ultimately about how we treat others and how we protect human rights, particularly of the most vulnerable.” – Rachel Thomas

The Thoughtworks-sponsored MIT State of Responsible Technology Report observed that responsible technology is not a feel-good platitude, but a tangible organizational characteristic that contributes to better customer acquisition and retention, improved brand perception and prevention of negative consequences and associated brand risk.

Responsible AI is a nascent but growing field, and there are assessment techniques, such as the data ethics canvas and failure modes and effects analysis, that you can employ to assess the ethical risks of your product. It’s always beneficial to “shift ethics left” (moving ethical considerations earlier in the process) by involving the relevant stakeholders (spanning product, engineering, legal, delivery, security, governance, test users, etc.) to identify:

Failure modes and sources of harm of a product

Actors who may compromise or abuse the product, and how

Segments of users that are vulnerable to adverse impacts

Corresponding mitigation strategies for each risk

The risks (where risk = likelihood x impact) we identify must be actively managed along with other delivery risks on an ongoing basis. These assessments are not once-off, check-box exercises; they should form part of an organization’s data governance and ML governance framework, with consideration of how governance can be lightweight and actionable.
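As a rough illustration of risk = likelihood x impact, the sketch below scores and ranks entries in a small risk register. The 1 to 5 scales, the `Risk` class, the `prioritize` helper and the three example entries are all assumptions made up for this example; a real register would come out of the stakeholder exercise described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    description: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (negligible) to 5 (severe)
    mitigation: str

    def __post_init__(self):
        # Reject scores outside the assumed 1-5 scale.
        for field_name in ("likelihood", "impact"):
            value = getattr(self, field_name)
            if not 1 <= value <= 5:
                raise ValueError(f"{field_name} must be between 1 and 5, got {value}")

    @property
    def score(self):
        """risk = likelihood x impact"""
        return self.likelihood * self.impact


def prioritize(risks):
    """Return risks ordered from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


# Hypothetical entries for illustration only.
RISKS = [
    Risk("Model states misinformation in an authoritative tone", 3, 5,
         "Evaluate outputs against a curated fact set before release"),
    Risk("Vulnerable user segment receives lower-quality answers", 2, 5,
         "Measure quality per segment and review with test users"),
    Risk("Prompt injection overrides system instructions", 4, 4,
         "Threat-model inputs and add adversarial test cases"),
]

if __name__ == "__main__":
    for risk in prioritize(RISKS):
        print(f"{risk.score:>2}  {risk.description} -> {risk.mitigation}")
```

Keeping the register as plain data like this is one way to make governance lightweight: it can be reviewed alongside other delivery risks and re-scored as likelihood or impact changes.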

5. Execution: Accelerate experimentation and delivery with Lean product engineering practices

Many organizations and teams start their AI/ML journey with high hopes, but almost inevitably struggle to realize the potential of AI due to unforeseen time sinks and unanticipated detours: the devil is in the data detail. In 2019, it was reported that 87% of data science projects never make it to production. In 2021, even among companies that had successfully deployed ML models in production, 64% took more than a month to deploy a new model, up from 56% in 2020.

These impediments to delivering value frustrate everyone involved: executives, investment sponsors, ML practitioners and product teams, among others. The good news is that it doesn’t have to be this way.

In our experience, Lean delivery practices have consistently helped us to iterate towards building the right thing by:

Focusing on the voice of the customer

Continuously improving our processes to increase the flow of value

Reducing waste when building AI solutions

Continuous delivery for machine learning (CD4ML) can also help us improve business outcomes by accelerating experimentation and improving reliability. Executives and engineering leaders can help steer the organization towards these desirable outcomes by advocating for effective engineering and delivery practices.

Parting thoughts



Amidst the prevailing media zeitgeist of “AI taking over the world”, our experience nudges us towards a more balanced view: humans remain agents of change in our world, and AI is best suited to augment, not replace, humans. But without care, intention and integrity, the systems that some groups create can do more harm than good. As technologists, we have the ability and duty to design responsible, human-centric, AI-enabled systems to improve the outcomes for one another.