ThoughtWorks
Data Science & Engineering | Bangalore | Technology

Trends in Big Data

Shyam Kurien

Published: Mar 4, 2014

Companies that aspire to achieve competitive advantage by using data as a key asset must build their execution plan around two phases, as described by HBR bloggers Redman & Sweeney:

  1. In the Lab phase, they must find interesting, novel, and useful insights about the real world from data.
  2. Thereafter, the focus moves to the Factory phase, where the challenge is to turn those insights into products and services, in most cases supported by a robust, scalable, and high-performance data analytics platform.

For example, consider an online retail organisation trying to forecast the demand for various items in its inventory, with the objective of maximising sales conversion and minimising inventory carrying costs. Demand forecasting techniques that use historical data (primarily sales) to forecast future sales have been around for some time. However, the more aggressive players explore ways of honing these base models by exploiting the wealth of data they have at their disposal. Maybe the sales of a type of women’s accessory are seen to go up whenever the sales of a specific cut of jeans go up. Or the sales of a controversial book are observed to be affected by the sentiment expressed in tweets about it over the past few days.

During the Lab phase, the data scientists explore and experiment with data from various sources to identify the right signals that impact sales of various items and how they correlate, with the objective of building a model that captures this interdependence. Once this model is codified (usually as a series of equations of some complexity), the next step is to build a robust and scalable application that runs the forecasting model every period: it looks at the data sources, extracts the defined signals, and provides the probable demand for the next period. This phase, where the data engineering team takes over to build the application, is referred to as the Factory phase.
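As a rough illustration of what "codifying" such a model might look like, here is a minimal sketch that fits a straight line between one signal (jeans sales in a week) and the target (accessory sales in the following week), then uses it to forecast the next period. The function names and the numbers are hypothetical, and a real model would typically combine many signals with far more sophisticated techniques.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def forecast_next(signal_history, sales_history, latest_signal):
    """Forecast next-period sales from the latest value of one signal.

    signal_history[i] is the signal seen in period i; sales_history[i] is
    the sales seen in period i + 1, i.e. the value the signal should predict.
    """
    intercept, slope = fit_line(signal_history, sales_history)
    return intercept + slope * latest_signal


# Hypothetical data: weekly jeans sales and the following week's accessory sales.
jeans_sales = [100, 120, 140, 160]
accessory_sales_next_week = [30, 36, 42, 48]

prediction = forecast_next(jeans_sales, accessory_sales_next_week, 180)
print(f"Forecast accessory sales for next week: {prediction:.1f}")
```

In the Lab phase a sketch like this is thrown away and rewritten many times; the Factory phase is about running the surviving model reliably on every period's data.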

The so-called “Big Data technologies” of various strains have brought about a sea change in the approach to analytics, be it descriptive or predictive in nature. However, as with any kind of technology solution, SMEs and startups demand a high degree of business responsiveness from an analytics solution as well. These high expectations can be met only if the tech team can achieve agility in both of the phases described above. In the Lab phase, this implies that the data scientists need nimble and lightweight approaches and tools to explore and experiment with data, allowing them to fail fast at low cost. In the Factory phase, the engineering teams tasked with productionising the insights require platforms, frameworks, and tools that enable them to work iteratively and rapidly.

Approaches to adoption

The ability of Big Data technologies to cheaply handle unstructured or semi-structured data in large volumes is being leveraged by organisations to bring agility to data mining and analysis. This is enabling the new breed of data scientists to experiment and fail fast with sophisticated modelling, machine learning techniques, and analytics, shrinking the cycle time of taking newer models from conceptualisation to production. On the one side, organisations like Amazon and Facebook have used these technologies to build complex applications that generate insights providing real competitive advantage, monetising the data they have collected. On the other side, traditional organisations have also started adopting these technologies, re-examining the legacy approach of building Enterprise Data Warehouse (EDW) solutions. The traditional (waterfall-centric) approach to building EDWs, based on concepts like enterprise data modelling, holistic master data management strategies, and heavyweight enterprise data governance policies, is expensive and non-agile.

Given the open source nature of most Big Data technologies, many organisations, vendors and users alike, have been contributing back to the community. The resulting rapid maturing of the stack is pushing adoption from innovators and early adopters to the mainstream in a very short period. We at ThoughtWorks believe that a number of the advancements in the Big Data space in the last year will enable SMEs and startups to accelerate the adoption of advanced analytics.

Key trends enabling agility in Big Data Analytics

1. Lowering of the entry barrier

Big Data on the Cloud: Capacity planning and operationalising an in-house Big Data environment take considerable effort and become a barrier to entry for SMEs and startups. Several companies and open source projects have emerged to provide these infrastructural capabilities on the cloud, in both public and private flavours. For the Hadoop world, in addition to mature solutions like Amazon’s Elastic MapReduce, newcomers like Rackspace and OpenStack’s Savanna project, and startups like Qubole and Altiscale, are providing entire Hadoop ecosystems on the cloud. Additionally, most of the MPP database vendors, like Vertica and Teradata, have introduced cloud offerings in the recent past. Most notable among the cloud MPP offerings would be Amazon’s Redshift. Value-added services on top of the basic Big Data environment augment the core infrastructure with critical functions like the ability to manage data processing workflows, schedule jobs, or import and export data from other data sources. With such plumbing work out of the way, organisations can quickly put their solutions into production and extract business value economically.

2. Deepening of the capabilities of the ecosystem to support data analysis

SQL-on-Hadoop: A key drawback of MapReduce, the dominant paradigm in the Hadoop world, is that it is very much a batch approach; the lack of interactivity puts a dampener on the agility of the analysis process, as it does not lend itself to the way analysts think. Most of the vendors have been working feverishly towards removing this impedance mismatch, and quite a few of their efforts have seen the light of day in the last few months. Impala from Cloudera, Drill from MapR, Lingual from Cascading, Hadapt, Polybase from Microsoft, and Hawq from Pivotal HD are but a few of them, the latest entrant being Presto from Facebook.
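To make the impedance mismatch concrete, the sketch below computes the same per-item sales total twice: once as hand-written map, shuffle, and reduce steps, and once as a single declarative SQL query. It runs in a single process, with plain Python functions standing in for a MapReduce job and the standard library's in-memory SQLite standing in for a SQL engine; it does not use Hadoop or any of the products named above, and the data is hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sales records: (item, quantity).
sales = [("jeans", 2), ("belt", 1), ("jeans", 3), ("scarf", 4), ("belt", 2)]


def map_phase(records):
    """Emit one (key, value) pair per input record."""
    for item, quantity in records:
        yield item, quantity


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Collapse each key's values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


mapreduce_totals = reduce_phase(shuffle(map_phase(sales)))

# The same question, asked the way an analyst would phrase it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (item TEXT, quantity INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", sales)
sql_totals = dict(
    conn.execute("SELECT item, SUM(quantity) FROM sales GROUP BY item")
)
conn.close()

print(mapreduce_totals == sql_totals)
```

The query is one line and can be tweaked and rerun interactively, whereas each change to the MapReduce version means editing and resubmitting a job.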

Machine Learning on Big Data: The availability of Machine Learning libraries for Big Data is reaching critical mass, enabling even smaller players like SMEs and startups to move into the realm of extracting insights from very large data sets. In addition to Mahout, which has been around for some time, newer offerings like Oryx (Cloudera), Pattern (Cascading), and MLbase (Berkeley AMPLab) provide out-of-the-box implementations of advanced algorithms for clustering, classification, regression, and collaborative filtering. The barrier to entry is being reduced, allowing organisations to focus more on building business functionality; think hyper-personalisation, recommendations, fraud detection, etc.
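For a feel of what such libraries provide out of the box, here is a deliberately tiny, single-machine sketch of one of the simplest forms of collaborative filtering: recommending items that were frequently bought together with what a customer already owns. The function names and baskets are hypothetical, and this is not how Mahout, Oryx, Pattern, or MLbase implement it; those libraries offer distributed, far more refined versions of the idea.

```python
from collections import defaultdict
from itertools import permutations


def cooccurrence(baskets):
    """Count how often each ordered pair of distinct items shares a basket."""
    counts = defaultdict(lambda: defaultdict(int))
    for basket in baskets:
        for a, b in permutations(set(basket), 2):
            counts[a][b] += 1
    return counts


def recommend(counts, owned, top_n=3):
    """Rank items not yet owned by how often they co-occur with owned items."""
    scores = defaultdict(int)
    for item in owned:
        for other, n in counts[item].items():
            if other not in owned:
                scores[other] += n
    # Highest score first; ties broken alphabetically for a stable result.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


# Hypothetical purchase baskets.
baskets = [
    ["jeans", "belt"],
    ["jeans", "belt", "scarf"],
    ["jeans", "scarf"],
    ["book"],
]
counts = cooccurrence(baskets)
print(recommend(counts, {"belt"}))
```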

3. Deepening of the capabilities of the ecosystem to support engineering

Hadoop 2.0 and separation of concerns: In Hadoop 2.0, a new resource management framework called YARN makes it possible to run a variety of workloads alongside traditional MapReduce on the same Hadoop cluster, sharing data through the underlying distributed file system. For example, this can be used to run graph-oriented processing (Giraph) or stream-based processing (Storm) for real-time analytics. This trend is only likely to accelerate in 2014. The ability to run multiple frameworks on the same infrastructure will help users select the right framework for their analytics problem. Such consolidation will potentially enhance the appeal of the Hadoop stack to the mainstream market.

Proliferation of small open source components: There has been a regular stream of smaller open source projects that focus on solving repetitive, niche problems in the analytics space. For example, incremental data processing is a common problem in several Big Data aggregation systems. In October 2013, LinkedIn released an open source system called Hourglass that makes it easier to solve this problem. Usually, such projects are published by the originating companies after being used for a while in production, which lends credibility to the work and gives startups and SMEs the opportunity to “stand on the shoulders of giants” to achieve their aspirations.
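The incremental processing problem can be sketched in a few lines: rather than recomputing an aggregate from all raw data each day, keep one partial result per day and merge only what changed. The class below is a hypothetical, in-memory illustration of that idea; it is not Hourglass's API, which addresses the same problem for Hadoop jobs.

```python
from collections import Counter


class IncrementalCounter:
    """Maintains event totals over a set of days without rescanning raw data."""

    def __init__(self):
        self.daily = {}          # day -> Counter of that day's events
        self.total = Counter()   # running total over all retained days

    def add_day(self, day, events):
        """Process only the new day's events and fold them into the total."""
        if day in self.daily:
            raise ValueError(f"day {day!r} was already processed")
        partial = Counter(events)
        self.daily[day] = partial
        self.total.update(partial)

    def drop_day(self, day):
        """Remove a day that has fallen out of the window."""
        partial = self.daily.pop(day)
        self.total.subtract(partial)
        self.total += Counter()  # discards entries whose count fell to zero


window = IncrementalCounter()
window.add_day("2013-10-01", ["view", "click", "view"])
window.add_day("2013-10-02", ["click"])
window.drop_day("2013-10-01")
print(window.total)
```

Each call touches only one day's worth of data, so the cost of keeping the aggregate up to date no longer grows with the size of the history.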

Traditional data science libraries / applications and Big Data: Until recently, data scientists and analysts had to choose between the power of Hadoop and the wealth of open source libraries and applications, chiefly R and NumPy / SciPy. In the past year the community has built frameworks that enable these sophisticated libraries to be used in conjunction with Hadoop, democratising access to a sophisticated statistical modelling and machine learning environment.

Looking Ahead

Most of these trends are ushering in a high degree of agility in the Lab phase of the model described at the beginning of the article, helping nimble players adopt Big Data and Agile Analytics even with limited resources.

For analytics initiatives to be truly agile, the ecosystem should also be mature enough to provide tools that aid agile software development, not just in project management practices but in engineering practices as well. Only then will agility fully percolate to the Factory phase of the analytics value stream. A case in point is the testability of advanced analytics applications. MRUnit is a good start in that direction. However, the breadth and depth of the testing tools are very limited. The ability to build a comprehensive test suite as a safety net is essential to supporting iterative development cycles. Conventional software application development has matured to a stage where a number of tools support agile engineering practices like Test Driven Development, Continuous Integration, and Refactoring. One should start seeing the emergence of parallel concepts and tools in the Analytics space as adoption continues through the “Slope of Enlightenment.”
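The kind of safety net meant here can be illustrated with a small sketch: if map and reduce logic is written as pure functions, each can be unit tested in isolation, without a cluster. MRUnit provides this for Java MapReduce code; the example below only mirrors the idea using Python's standard `unittest` module, and the mapper, reducer, and test names are hypothetical.

```python
import unittest


def tokenize_mapper(line):
    """Map step: emit (word, 1) for every whitespace-separated word."""
    return [(word.lower(), 1) for word in line.split()]


def sum_reducer(key, values):
    """Reduce step: total the counts collected for one key."""
    return key, sum(values)


class MapperReducerTest(unittest.TestCase):
    def test_mapper_emits_one_pair_per_word(self):
        self.assertEqual(
            tokenize_mapper("Big data big"),
            [("big", 1), ("data", 1), ("big", 1)],
        )

    def test_reducer_sums_values(self):
        self.assertEqual(sum_reducer("big", [1, 1]), ("big", 2))


# Run the suite programmatically so the result can be inspected afterwards.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(MapperReducerTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Tests like these run in milliseconds on a developer machine, which is what makes practices such as Test Driven Development and Continuous Integration practical for analytics code.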


© 2021 ThoughtWorks, Inc.