Disclaimer: The statements and opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.
Agile and Lean techniques seem to be the best way we currently know to create complex software in the face of risk, uncertainty, and changing requirements. Agile hinges on embracing and adapting to change by enabling rapid feedback cycles and evolutionary development. However, bringing agility into big data (and small data) analytics has been a challenge for many very bright and talented data scientists and engineers. In this article we'll explore what makes analytics different from application development, and how to adapt agile principles and practices to the nuances of analytics. We'll also examine how the disciplines of data science and software development complement one another, and how they intersect in an agile project environment.
First, let's look at what differentiates analytics experts from software developers. In 1998, C.F. Jeff Wu used the term “data science” to describe a discipline that encompasses statistical analysis, science, and advanced computing. In recent years, the use of analytics by social media companies such as LinkedIn and Facebook has made the “data scientist” title so popular that Harvard Business Review published an October 2012 article entitled “Data Scientist: The Sexiest Job of the 21st Century.” Simply put, a data scientist has a unique and very deep blend of the skills depicted in Figure 1.
Figure 1: The Disciplines of Data Science, Source: Calvin Andrus, Wikipedia
Data science skills both complement and overlap with software development skills. Data science requires programming, but data scientists are often not trained in modern software engineering practices. Conversely, many developers have skills in data engineering, advanced computing, and statistics, but these are rarely their areas of deep expertise. Data scientists commonly code in multi-paradigm languages like R and Python, which have powerful statistics libraries and an active research community behind them.
Data engineering is the bridge between data science and software development. A data engineer supports the data scientist in data discovery, harvesting, and preparation. Data engineers support developers in operationalizing analytical models for production deployment, which we will discuss shortly. This role requires expertise in data management technologies (“big data”, NoSQL, and SQL), data modeling, data architectures, and data manipulation languages and techniques.
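To make the data preparation part of this role more concrete, here is a minimal Python sketch of the kind of step a data engineer might hand to a data scientist: validating, typing, and de-duplicating raw records before analysis. The schema (`customer_id`, `age`, `monthly_spend`), the function names `parse_record` and `prepare_records`, and the sanity rules are all hypothetical, chosen only for illustration; real preparation logic would depend on the actual data sources.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class CleanRecord:
    """A validated, typed row ready for analysis (hypothetical schema)."""
    customer_id: str
    age: int
    monthly_spend: float


def parse_record(raw: dict) -> Optional[CleanRecord]:
    """Coerce one raw record into a CleanRecord, or return None if it is unusable."""
    try:
        raw_id = raw["customer_id"]
        age = int(raw["age"])
        spend = float(raw["monthly_spend"])
    except (KeyError, TypeError, ValueError):
        return None  # missing field, or a value that cannot be converted
    if raw_id is None or not str(raw_id).strip():
        return None  # no usable identifier
    if not (0 < age < 130) or not math.isfinite(spend) or spend < 0:
        return None  # fails basic (hypothetical) sanity checks
    return CleanRecord(str(raw_id).strip(), age, spend)


def prepare_records(raw_records: Iterable[dict]) -> list[CleanRecord]:
    """Validate, type, and de-duplicate records; the first record seen for each id wins."""
    seen: set[str] = set()
    cleaned: list[CleanRecord] = []
    for raw in raw_records:
        record = parse_record(raw)
        if record is None or record.customer_id in seen:
            continue
        seen.add(record.customer_id)
        cleaned.append(record)
    return cleaned


# Example input mixing good, malformed, duplicate, and incomplete records.
raw = [
    {"customer_id": "c1", "age": "34", "monthly_spend": "120.50"},
    {"customer_id": "c2", "age": "n/a", "monthly_spend": "80"},     # bad age: dropped
    {"customer_id": "c1", "age": "34", "monthly_spend": "120.50"},  # duplicate id: dropped
    {"customer_id": "c3", "age": 52},                               # missing field: dropped
]
print(prepare_records(raw))
# prints [CleanRecord(customer_id='c1', age=34, monthly_spend=120.5)]
```

Keeping a step like this as a small, testable function rather than an ad hoc notebook cell is one way the same code could later be reused when an analytical model is operationalized for production.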
Read the rest of Agility, Big Data and Analytics on InfoQ.