We often work with organisations on AI opportunities for which the data doesn’t exist. Given the conventional wisdom that AI is driven by data, can we even pursue these opportunities without data?
Yes, we can, and there are many paths forward.
We have already covered a range of these techniques in previous mini-blogs. Let’s recap those and introduce some others, remembering that not all AI applications are based on supervised learning alone.
If we have no data, we devise a plan to curate a data set in parallel with model and product development. This could mean developing a UI that allows users to label data as part of their (improved) workflow, forming one of many active learning loops. It could also mean selecting an approach such as reinforcement learning, which solves the “cold start” problem of no historical data by learning through experimentation. If it’s too costly to acquire data by either of these means, we can still generate and explore options through simulation, paired with optimisation techniques.
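To make “learning through experimentation” concrete, here is a minimal sketch of an epsilon-greedy bandit, one of the simplest reinforcement learning setups. It starts with no historical data and estimates the value of each option purely from the feedback it collects. The function name, the three conversion rates and the parameter values are illustrative assumptions, and the simulated reward stands in for a real user response.

```python
import random

def epsilon_greedy(true_rates, steps=5000, epsilon=0.1, seed=0):
    """Learn which option performs best, starting from zero observations.

    true_rates is a hypothetical stand-in for the real world: in production
    the reward would come from live user feedback, not from a simulation.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    counts = [0] * n_arms    # how many times each option has been tried
    values = [0.0] * n_arms  # running mean reward observed per option

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random option to keep gathering evidence.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: choose the option with the best estimate so far.
            arm = max(range(n_arms), key=lambda a: values[a])

        # Simulated binary feedback (assumption: e.g. a click or conversion).
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0

        # Incremental update of the running mean for the chosen option.
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]

    return counts, values

counts, values = epsilon_greedy([0.02, 0.05, 0.11])
print("pulls per option:", counts)
print("estimated rates:", [round(v, 3) for v in values])
```

The data set here is a by-product of running the system: every decision adds a labelled observation. The `epsilon` parameter controls how much traffic is spent on exploration rather than on the current best guess.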
If we have unlabelled, unstructured, adjacent or historically incompatible data, we can use techniques like transfer learning and representation learning to bootstrap more performant models and create flexible data products that can be rapidly adapted to novel situations when new data becomes available. If we have only a small data set or noisy labels, we can amplify the training signal with weak labelling based on features inherent in the data, or by providing human annotators with data programming tools to cheaply create many weak hints.
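As a rough illustration of “many weak hints”, the sketch below writes a few labelling functions for a hypothetical complaint-detection task and combines their votes. The task, the function names and the heuristics are all assumptions made for the example. Combining votes by simple majority is also a simplification: data programming tools typically estimate how reliable each labelling function is rather than weighting them equally.

```python
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1  # POSITIVE = "looks like a complaint"

# Each labelling function encodes one cheap, noisy heuristic and may abstain.
def lf_mentions_refund(text):
    return POSITIVE if "refund" in text.lower() else ABSTAIN

def lf_mentions_thanks(text):
    return NEGATIVE if "thank" in text.lower() else ABSTAIN

def lf_many_exclamations(text):
    return POSITIVE if text.count("!") >= 2 else ABSTAIN

LABELLING_FUNCTIONS = [lf_mentions_refund, lf_mentions_thanks, lf_many_exclamations]

def weak_label(text):
    """Combine the hints by majority vote, abstaining on ties or silence."""
    votes = [lf(text) for lf in LABELLING_FUNCTIONS]
    positives = votes.count(POSITIVE)
    negatives = votes.count(NEGATIVE)
    if positives == negatives:
        return ABSTAIN  # no hints fired, or the hints disagree evenly
    return POSITIVE if positives > negatives else NEGATIVE

examples = [
    "I want a refund now!!",
    "Thanks for the quick help",
    "Where is my order?",
    "Thanks, but I still need a refund",
]
for text in examples:
    print(weak_label(text), text)
```

Examples where the functions abstain or cancel each other out stay unlabelled, so the output is a larger but noisier training set, which can then be used to train a model alongside whatever small set of trusted labels exists.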
Beyond recognised issues of bias, the COVID-19 pandemic demonstrates that historical labelled data can become obsolete very quickly. Whatever your approach to AI with data, you also need to think about doing AI without data, and about how to rapidly validate and iterate on solutions in real business processes or customer experiences. If this sounds interesting, please get in touch to learn more.
Disclaimer: The statements and opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.