About the event:
Today’s organizations face increasing pressure to become more “data-driven” or “AI-driven.” However, incorporating data science and data engineering approaches into the software development process presents a myriad of challenges. ThoughtWorks’ industry-leading Continuous Delivery (CD) principles and practices can be applied to machine learning to address these challenges, uniting data scientists, developers, data engineers, and business stakeholders.
In this two-part series, ThoughtWorks’ seasoned data experts will show you how to maintain productivity, collaborate effectively, and deliver value continuously and seamlessly in practice. These interactive sessions are geared toward tech and business practitioners.
Part One:
What is CD4ML?
Attempts to get machine learning applications into production often fail because proofs of concept are not conducive to the delivery of real production applications at scale.
Machine learning is usually taught through tutorials that use small, clean datasets loaded into data frames and orchestrated with Jupyter notebooks, all within a single in-memory, local environment. While this is a fine format in theory, real industrial projects involve multiple environments and draw their data from databases or other data stores rather than from files. They interact with live production systems and must be coordinated with software delivery teams and product owners. The resulting applications must be production quality: well designed, well tested, and maintainable. Data scientists are left to choose between the environment they are used to and one that is suitable for delivery to production, which leads to an awkward migration from one to the other.
Part Two:
CD4ML In Practice
We apply the CD4ML approach introduced in Part One in a hands-on demonstration environment that runs on your own laptop. We will guide participants through CI/CD (Continuous Integration/Continuous Delivery) practices for machine learning and a new pattern of working that avoids most of the pitfalls of typical proof-of-concept approaches.
We’ll use an open-source environment with common ML tools. Participants will learn how to use new patterns of repeatable model development to collaborate effectively and deliver value continuously in industrial data science projects.
*All sessions will be held in English. Closed captioning will be provided in the recordings after each session.