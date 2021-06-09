Maintaining consistently high data quality was essential, because the location data Geoscape delivers is relied on by organizations including Australia's emergency services.

In the original workflow, processing the volume of geospatial data sourced by Geoscape required manual intervention. Engineers kicked off processing jobs and waited hours for a result, and the manual steps introduced the risk of human error.

With these factors in mind, developing a solution to process geospatial data without significant manual intervention was key to the success of our partnership.

The solution

With data pipelines at the core of Geoscape’s business, we set out to build a custom solution. In just 10 months, the combined Thoughtworks and PSMA team (now Geoscape) went from discovery and inception to launching a streaming data platform for real-time customer consumption, alongside a suite of Quality Assurance (QA) tools. The team focused equally on building the platform and on growing Geoscape's capability to extend it.

The solution delivered the following outcomes:

Streaming Data Platform

We built the streaming data platform to leverage cloud technologies and minimize operational work.

The platform consists of a series of data pipelines that automatically ingest, validate, sanitize and standardize data as soon as it’s made available by suppliers. From here, internal product teams consume the data at various points in the pipelines and use it to power the products they deliver to their customers and value-added resellers.
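The validate, sanitize and standardize stages described above can be pictured as a chain of small functions that each narrow a raw supplier record toward a consistent shape. This is a minimal Python sketch, not Geoscape's implementation: the `AddressRecord` fields, the validation rules and every function name are hypothetical, chosen only to illustrate the staged pattern.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical standardized record; field names are illustrative only.
@dataclass
class AddressRecord:
    record_id: str
    street: str
    locality: str
    state: str
    postcode: str


VALID_STATES = {"ACT", "NSW", "NT", "QLD", "SA", "TAS", "VIC", "WA"}
REQUIRED_FIELDS = ("record_id", "street", "locality", "state", "postcode")


def validate(raw: dict) -> Optional[dict]:
    """Reject records that are missing any required field (hypothetical rule)."""
    if any(not str(raw.get(key, "")).strip() for key in REQUIRED_FIELDS):
        return None
    return raw


def sanitize(raw: dict) -> dict:
    """Trim whitespace and collapse repeated internal spaces in every value."""
    return {key: " ".join(str(value).split()) for key, value in raw.items()}


def standardize(raw: dict) -> Optional[AddressRecord]:
    """Normalize casing and formats; drop records with an unknown state or bad postcode."""
    state = raw["state"].upper()
    postcode = raw["postcode"].zfill(4)  # e.g. "800" -> "0800"
    if state not in VALID_STATES or not (postcode.isdigit() and len(postcode) == 4):
        return None
    return AddressRecord(
        record_id=raw["record_id"],
        street=raw["street"].title(),
        locality=raw["locality"].upper(),
        state=state,
        postcode=postcode,
    )


def process(raw: dict) -> Optional[AddressRecord]:
    """Run one record through validate -> sanitize -> standardize."""
    validated = validate(raw)
    if validated is None:
        return None
    return standardize(sanitize(validated))
```

Keeping each stage a separate function mirrors the idea that downstream teams can consume data at different points in the pipeline: each intermediate output is a well-defined shape.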

We used a streaming solution based on Amazon Kinesis so that multiple customers could tap into the same time-series data, with AWS Lambda for compute. Amazon Kinesis scales up and down with the workload, while AWS Lambda reduces operational effort and provides substantial horizontal scaling.
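A Lambda function subscribed to a Kinesis stream receives batches of records whose payloads arrive base64-encoded under `record["kinesis"]["data"]`. The sketch below is an illustration of that pattern rather than the handler Geoscape runs: it assumes JSON payloads and a hypothetical `transform` step, and it reports records it cannot process through Lambda's partial batch response (`batchItemFailures`), which only takes effect when `ReportBatchItemFailures` is enabled on the event source mapping.

```python
import base64
import json
from typing import Any, Dict, List, Tuple


def transform(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical per-record step: requires a record_id and tags the stage."""
    return {"record_id": payload["record_id"], "stage": "standardized"}


def process_batch(
    records: List[Dict[str, Any]],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, str]]]:
    """Decode and transform each Kinesis record, collecting failures by sequence number."""
    processed: List[Dict[str, Any]] = []
    failures: List[Dict[str, str]] = []
    for record in records:
        kinesis = record["kinesis"]
        try:
            # Kinesis delivers the payload base64-encoded; we assume it holds JSON.
            payload = json.loads(base64.b64decode(kinesis["data"]))
            processed.append(transform(payload))
        except (ValueError, KeyError, TypeError):
            # binascii.Error and json.JSONDecodeError are both ValueError subclasses.
            failures.append({"itemIdentifier": kinesis["sequenceNumber"]})
    return processed, failures


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Lambda entry point for a Kinesis event source mapping."""
    processed, failures = process_batch(event.get("Records", []))
    # A real function would forward `processed` to the next stream or data store here.
    return {"batchItemFailures": failures}
```

Splitting `process_batch` out of `handler` keeps the decoding logic testable locally without any AWS resources.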

Data QA

Beyond the technical solution, we developed a process for data quality assurance that enables Geoscape to quickly verify an entire dataset on an ongoing basis. We also built a system for turning exploratory analysis (performed in Jupyter notebooks) into ongoing monitoring and alerting, so that data quality stays consistent over time.
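One way to promote an exploratory notebook check into ongoing monitoring is to package the metric as a plain function and pair it with a threshold, so the same code runs in the notebook and in a scheduled job. In this hypothetical sketch, the check names, fields and thresholds are illustrative assumptions, and the alert messages would feed whatever alerting channel the platform actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Record = Dict[str, Optional[str]]


@dataclass
class CheckResult:
    name: str
    value: float
    threshold: float

    @property
    def passed(self) -> bool:
        # A check passes when the metric stays at or below its threshold.
        return self.value <= self.threshold


def missing_rate(records: List[Record], field: str) -> float:
    """Fraction of records where `field` is absent or blank, as one might compute in a notebook."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if not (r.get(field) or "").strip())
    return missing / len(records)


# Hypothetical registry of promoted checks: (name, metric function, maximum allowed value).
CHECKS: List[Tuple[str, Callable[[List[Record]], float], float]] = [
    ("postcode_missing_rate", lambda rs: missing_rate(rs, "postcode"), 0.01),
    ("locality_missing_rate", lambda rs: missing_rate(rs, "locality"), 0.05),
]


def run_checks(records: List[Record]) -> List[CheckResult]:
    """Evaluate every registered check against a batch of records."""
    return [CheckResult(name, metric(records), threshold) for name, metric, threshold in CHECKS]


def alerts(results: Iterable[CheckResult]) -> List[str]:
    """Turn failing checks into alert messages for a (hypothetical) alerting channel."""
    return [f"{r.name}={r.value:.3f} exceeds {r.threshold}" for r in results if not r.passed]
```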

Tooling

We implemented tooling including centralized logging, which lets Geoscape see the state of the whole pipeline at any moment, as well as monitoring and alerting, error handling and repeatable local development environments.
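Centralized logging is easiest to query when every log line is structured. The sketch below assumes each pipeline function writes JSON lines to standard error and that a log service collects them (on AWS Lambda, function output is sent to Amazon CloudWatch Logs); the `stage` and `record_id` fields are hypothetical context attached through the standard library's `extra=` argument.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line so a central log store can index its fields."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Hypothetical pipeline context, supplied via logger.info(..., extra={...}).
            "stage": getattr(record, "stage", None),
            "record_id": getattr(record, "record_id", None),
        }
        if record.exc_info:
            entry["error"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def get_pipeline_logger(name: str) -> logging.Logger:
    """Return a logger that writes JSON lines to stderr, configured only once per name."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()  # defaults to sys.stderr
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
        # Avoid duplicate lines if the root logger also has a handler.
        logger.propagate = False
    return logger
```

For example, `get_pipeline_logger("pipeline").info("record standardized", extra={"stage": "standardize", "record_id": "a1"})` emits one JSON object carrying the stage and record ID, which can then be filtered across the whole pipeline.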