
BIG Data, Fast Data — Part Two

Tom Glover

Published: Oct 23, 2019

In this, the second of a three-part series, we’ll look at the Internet of Things world from a data perspective in order to appreciate the design challenges we face in a world where everything will be connected. Read Part One here.
 

Processing of Things’ Data

No discussion of the data explosion delivered by IoT would be complete without thinking about cloud computing. Historically, any requirement to process off-device data would necessitate expensive servers with expensive licensing. There were always bottlenecks in terms of costs and time to provision new servers and new software. Those constraints are now evaporating. Cloud providers deliver readily available and affordable services that can be provisioned in an instant to collect and process the deluge of data communicated between things and the digital world.
 

You don’t know what you don’t know

One of the greatest challenges of designing for IoT platforms is predicting future data processing requirements, with all their considerable uncertainty. Whether it’s a startup with ambitious customer (and device) adoption projections or an established manufacturer wishing to transform its existing (and well-understood) machine monitoring solution, it’s incredibly difficult to plan for the long term.
 
The insights gleaned from IoT data can take time to accumulate (seasonality being a key example), while other scenarios may require existing data to be reprocessed after a machine learning algorithm has been tweaked. For example, computer vision technologies are evolving at such a rapid rate that newer algorithms may provide completely new insights, significant improvements in predictive quality over their predecessors, or both. The flexibility to adapt the data processing, or to reprocess existing data, is a powerful capability when working with the potentially large data volumes we are considering.

Compute, network and storage are now practically unconstrained in real terms (fees may apply). More importantly, cloud providers such as Amazon Web Services (AWS) are acknowledging the increasing importance of IoT and providing more and more services tailored to these demands, services that also allow IoT data to be coupled rapidly to their existing and emerging offerings.
 
Even in recent years, we’ve seen rapid technology innovation in how we consume IoT data. Serverless offerings, such as AWS Lambda, now provide completely dynamic compute capability that starts to deliver on the promise of utility computing. This is radically changing how we design architectures to manage this data, and how quickly we can build and adapt such architectures to changing data demands.
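To make the serverless idea concrete, here is a minimal sketch of a function in the `handler(event, context)` shape that AWS Lambda uses for Python. The event fields (`device_id`, `temperature_c`) and the alert threshold are hypothetical, chosen only for illustration; a real deployment would define its own message shape and routing.

```python
# Hypothetical threshold, for illustration only.
ALERT_THRESHOLD_C = 80.0


def handler(event, context):
    """Process one device message delivered as a dict-like event.

    The platform invokes this function once per event and scales the
    number of concurrent invocations with the incoming message rate,
    so no servers are provisioned ahead of time.
    """
    temperature_c = float(event["temperature_c"])
    return {
        "device_id": event["device_id"],
        "temperature_c": temperature_c,
        "alert": temperature_c > ALERT_THRESHOLD_C,
    }
```

Because the function holds no state between invocations, changing the processing logic is a matter of deploying a new version of the function rather than re-provisioning infrastructure.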
 

You’re gonna need a bigger boat

Estimating the data storage needs of an IoT solution can be even more challenging and complex than estimating its data processing requirements. We can make some rudimentary empirical predictions, based on our understanding of the solution. But at the outset we’re unlikely to appreciate the complexity of all the data touchpoints and repositories present in a large solution design. The diagram below illustrates a simplified example of the role data storage plays at various points in the lifecycle of an IoT message arriving from a device.

Figure 1: IoT includes many hidden storage requirements 
 
It can be clearly seen that an incoming IoT message does in fact leave a digital “footprint” multiple times throughout its lifecycle. This fact is commonly overlooked when assessing the commercial storage implications of various IoT cloud platform providers.
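A rough way to account for these multiple footprints is to model each place a message is stored as its own stage. The sketch below is illustrative only: the stage names, size factors and retention periods are assumptions, not measurements from any particular platform.

```python
from dataclasses import dataclass


@dataclass
class StorageStage:
    name: str
    size_factor: float     # stored size relative to the raw message size
    retention_days: float  # how long a message stays in this stage


def steady_state_bytes(stages, raw_message_bytes, messages_per_day):
    """Bytes held per stage once every stage has filled its retention window."""
    return {
        stage.name: raw_message_bytes
        * stage.size_factor
        * messages_per_day
        * stage.retention_days
        for stage in stages
    }


# Hypothetical stages for a single incoming message type.
STAGES = [
    StorageStage("ingest queue", 1.0, 1),
    StorageStage("raw archive", 1.0, 365),
    StorageStage("operational database", 3.0, 30),  # indexes and metadata overhead
    StorageStage("analytics store", 0.5, 365),      # compressed, columnar
]

footprint = steady_state_bytes(STAGES, raw_message_bytes=200, messages_per_day=1_000_000)
total_bytes = sum(footprint.values())

for name, size in footprint.items():
    print(f"{name:22s} {size / 1e9:8.1f} GB")
print(f"{'total':22s} {total_bytes / 1e9:8.1f} GB")
```

Comparing the total against the raw archive alone shows how much of the storage sits in places that a naive "one message, one copy" estimate would miss.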

Speed comes at a price

In addition to the fact that message data may be present in many areas of the platform at one time, there is a powerful multiplier that can also significantly affect the performance design of the platform: frequency, meaning both the rate at which devices send messages and the rate at which those messages are processed. The tradeoffs come down to how quickly you want to receive incoming data (device frequency) and how quickly you want to process it (cloud platform processing). How important realtime (or near-realtime) data is to the business will determine the quantity and type of data processing components that must be enabled in the platform to handle the IoT “firehose” of streaming data.
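The effect of device frequency is easy to see with a little arithmetic. The following sketch assumes a hypothetical fleet of 10,000 devices sending 200-byte messages; both figures are placeholders.

```python
def ingest_rate(devices, interval_seconds, message_bytes):
    """Return (messages per second, bytes per second) arriving at the platform."""
    messages_per_second = devices / interval_seconds
    return messages_per_second, messages_per_second * message_bytes


# Same fleet, three reporting intervals: once a minute, every 10 s, every second.
for interval in (60, 10, 1):
    msgs, byte_rate = ingest_rate(devices=10_000, interval_seconds=interval, message_bytes=200)
    daily_gb = byte_rate * 86_400 / 1e9
    print(f"every {interval:2d}s: {msgs:8.1f} msg/s, {daily_gb:7.2f} GB/day")
```

Moving from one message a minute to one a second multiplies both the ingest rate and the daily volume by 60, before any of the storage copies discussed earlier are counted.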
 

It’s all in the packaging

Things’ data may exist in many forms during its lifecycle from device to cloud. It may start its journey wrapped in a concise but human-unreadable binary encoding format, such as CBOR, and spend its final days hiding in the vastness of a DynamoDB database. As shown in the previous illustration, it can also exist in many other formats at the same time. Each of these data formats has its relative advantages and disadvantages, depending on the particular use case. For example, JSON may be easier for mortals to read and quick to encode and decode, but it can result in vastly larger data storage requirements in the long term. Mixing and matching data formats can be a good way of optimising for speed versus storage, but it can be very difficult to predict the actual resultant costs, owing to the unexpected overhead penalties incurred by various data storage services. You can get a long way with a spreadsheet model for data projections for an IoT design, but I’d highly recommend you validate this with measurements of the actual data usage when your platform of choice goes live.
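The size difference between a text format and a binary one can be measured directly. The sketch below uses only the Python standard library, so it packs the reading with `struct` as a stand-in for a compact binary encoding; it is not CBOR, and the field names and values are made up for the example.

```python
import json
import struct

# Hypothetical sensor reading.
reading = {"device_id": 1042, "timestamp": 1571788800, "temperature_c": 21.5}

# Text encoding: self-describing, readable, but the field names travel with every message.
json_bytes = json.dumps(reading, separators=(",", ":")).encode("utf-8")

# Binary encoding with a fixed layout agreed in advance:
# little-endian unsigned 32-bit id, unsigned 32-bit epoch seconds, 32-bit float.
RECORD = struct.Struct("<IIf")
binary_bytes = RECORD.pack(
    reading["device_id"], reading["timestamp"], reading["temperature_c"]
)

# Decoding needs the same layout; the bytes alone carry no field names.
decoded = RECORD.unpack(binary_bytes)

print(f"JSON:   {len(json_bytes)} bytes")
print(f"binary: {len(binary_bytes)} bytes")
```

The binary record is several times smaller, but that saving is bought with a schema both ends must share, which is one of the tradeoffs to weigh per use case.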
 

Best before

We frequently talk about the temperature of data when describing how frequently data is accessed in a Big Data scenario. Terms such as hot, warm and cold are used to classify different types of data in terms of how they’re stored and accessed. This can help as part of an architecture design to ensure we get the best tradeoff of cost versus performance. These approaches still apply to IoT data but we have an extra level of complexity to consider and that is data freshness. 
 
IoT data is generally considered to have a short lifespan, as far as value is concerned. A change in some variable in the physical world will generate a data event that will be consumed by interested parties. The longer it takes to both receive and process that event, the less relevant that information becomes and thus the less value it provides. Obviously, the stale data still retains some degree of value, but this is mostly when aggregating for trend analysis and predictive scenarios. The design goal is to optimise the platform so that the “freshest” data is stored on “hot” storage mechanisms, while the “stalest” data is retained in the most cost-effective storage medium possible.
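One simple way to express this design goal is a rule that maps the age of an event to a storage tier. The thresholds below are assumptions for illustration; real values would come from how quickly the data loses value for a given business.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical freshness windows.
HOT_WINDOW = timedelta(hours=24)
WARM_WINDOW = timedelta(days=30)


def storage_tier(event_time, now):
    """Classify an event as hot, warm or cold based on its age."""
    age = now - event_time
    if age <= HOT_WINDOW:
        return "hot"
    if age <= WARM_WINDOW:
        return "warm"
    return "cold"


now = datetime(2019, 10, 23, 12, 0, tzinfo=timezone.utc)
for age in (timedelta(minutes=5), timedelta(days=3), timedelta(days=90)):
    print(age, "->", storage_tier(now - age, now))
```

A scheduled job applying a rule like this could move data down the tiers as it ages, keeping the expensive, fast storage for the data that is still fresh.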
 

The Big Chill

We’ve already discussed how to ensure that the freshest data is available on demand as quickly as possible. The question remains as to what to do with all of the accumulating data that has passed its “best before” date. The approaches taken generally mirror what happens in the real world when we have too much “stuff” that we’re reluctant to dispose of for many reasons, mostly sentimental. There’s a perception that we must keep everything just in case a tiny fragment of that data may be needed some day. So, we resort to choosing the most cost-effective long-term storage mechanism available, such as Amazon S3 Glacier. This is a valid design decision, but it’s all too easy to forget about this data, and the cost accumulates significantly over time. The longer we archive this data and the more data we have, the harder it is to make the decision to purge it. What’s often overlooked is the challenge of “defrosting” this deep-frozen data, which has to be placed in a “warm” storage medium before it can be accessed. It’s difficult to predict how often we will need to perform these de-archiving activities, and there is a cost premium associated with them. Remember: this is cloud, everything comes at a price.
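The way archive costs creep up can be sketched with a small model. The price used here is a placeholder and not any provider's actual rate, and retrieval fees are left out entirely.

```python
def cumulative_archive_cost(gb_added_per_month, price_per_gb_month, months):
    """Total storage fees paid after `months` if nothing is ever purged."""
    stored_gb = 0.0
    total_cost = 0.0
    for _ in range(months):
        stored_gb += gb_added_per_month               # the archive only ever grows
        total_cost += stored_gb * price_per_gb_month  # each month bills everything kept so far
    return total_cost


# Placeholder price of 0.01 currency units per GB-month.
for years in (1, 2, 5):
    cost = cumulative_archive_cost(1_000, 0.01, years * 12)
    print(f"{years} year(s): {cost:,.0f}")
```

Because every month's bill covers all the data kept so far, the cumulative spend grows roughly with the square of the retention time: doubling the period more than doubles the total.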
 
The recommendation here is not to default to what appears to be the simplest option. We’d normally recommend that customers accumulate large volumes of data for a limited time period, in order to analyse and understand which parts of the data are actually useful and to evaluate the real storage costs incurred on the platform. Instead of simply archiving all raw data, we have a number of mechanisms at our disposal, ranging from efficient compressed data formats to data sampling techniques.
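As an illustration of the sampling end of that range, the sketch below replaces raw readings with one summary row per time bucket before archiving. It is one possible technique under simple assumptions (numeric readings with epoch-second timestamps), not a prescription; the function name and bucket size are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def downsample(readings, bucket_seconds):
    """Summarise (epoch_seconds, value) readings per fixed-size time bucket.

    Returns a list of (bucket_start, minimum, mean, maximum, count) tuples,
    ordered by bucket start.
    """
    buckets = defaultdict(list)
    for timestamp, value in readings:
        bucket_start = timestamp - timestamp % bucket_seconds
        buckets[bucket_start].append(value)
    return [
        (start, min(values), mean(values), max(values), len(values))
        for start, values in sorted(buckets.items())
    ]


raw = [(0, 1.0), (30, 3.0), (60, 5.0)]
print(downsample(raw, bucket_seconds=60))
```

Keeping the minimum, maximum and count alongside the mean preserves enough shape for most trend analysis while storing one row per bucket instead of every raw reading.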
 
In Part III of this three-part series we will focus on the data integration and application aspects of the Internet of Things.
