At Thoughtworks, we're starting to see patterns in how people integrate various mixes of labelled and unlabelled data into systems. We're finding that this needs careful planning and explicit loops to manage the different data flows. Many questions remain, though: at what point do you decide to introduce models around unlabelled data? How do you validate that it's worth the cost? These are interesting questions, and the answers change from system to system! We would love to talk more about how we can help with any challenges you might have in this area.
It's common for modern machine learning systems to have more than a single model. We now expect to see multiple types of models augmenting data flow loops.
Disclaimer: The statements and opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.