Bayer is one of the world’s largest life sciences companies, specializing in the development, manufacturing, and distribution of products in health care and agriculture.
Pharmaceuticals is a highly regulated industry, and the drug development process involves numerous stages that are vital to ensuring safety and efficacy. For each potential drug, Bayer scientists must conduct extensive tests in controlled lab conditions before potentially advancing any product for testing in humans.
Reviewing this preclinical data, which historically could be spread across multiple systems, together with study reports that could run to as many as 1,500 pages was difficult, time-consuming, and a potential risk to product development timelines.
To address these challenges, Bayer built a modern data platform for its preclinical data using the latest digital technologies from AWS. In line with Bayer’s vision, “Health for all, Hunger for none”, the platform adds new capabilities that increase the speed and accuracy of the drug development pipeline and, in turn, improve patient outcomes.
A modern data platform built on AWS
In collaboration with Thoughtworks, Bayer developed a modern data platform hosted on AWS cloud using native AWS managed services to:
Access preclinical data and results from various internal data sources in a single place
Create and uphold a more structured approach to data management
Enable effortless searches of existing data sets and past studies
Help data interpretation through tailor-made visualizations
Remain compliant with the industry’s strict regulations
The new data platform PRINCE (Preclinical Information Center) acts as a one-stop shop for all preclinical data, uniting previously siloed data sources. “Having a single access point with unified data that can be customized when needed saves a lot of time,” explains Verena Ziegler, Head of Genetic & Computational Toxicology at Bayer.
The primary data product contains the results from thousands of toxicology studies alongside in vitro and in vivo bioassay data and compound information. Users have access to structured and unstructured information, whether they’re accessing the historical database through APIs, dashboards, or the platform’s search interface – enabling researchers to inform their work with what has been done and learned before.
With greater control over, and visibility into, the company’s preclinical data, Bayer teams can make better data-driven decisions throughout the drug development process, ensure cross-functional access, and reduce inefficiencies. “PRINCE represents a major milestone in the digitalizing of our preclinical data domain,” says Jonas Münch, Head of IT for Safety & Pharmacology at Bayer. “We think that it can serve as a blueprint for a future domain-centric decentralized data landscape in R&D.”
Custom-built search functionalities that save time
One of the most valuable components of the data platform is the custom-built search engine. In its initial form, researchers interacted with the search engine using free text. After gathering feedback from regular user sessions, Bayer evolved the search engine to let researchers set specific parameters rather than rely on keywords alone. In its current version, the custom-built search tool greatly speeds up data collection.
Researchers can freely search and filter both structured data, including information about compounds, projects, targets, study metadata, and key results, and unstructured data, such as conclusions and summaries of approved study reports.
Researchers can further accelerate the data analysis process through advanced search capabilities such as the use of synonyms, legacy identifiers, and chemical structures.
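To make the idea of synonym and legacy-identifier search concrete, here is a minimal Python sketch of query expansion. It is an illustration only and does not describe how the PRINCE search engine is implemented; the lookup tables, the identifiers, and the function name `expand_query` are all hypothetical.

```python
# Hypothetical vocabularies; the article does not describe the real ones.
SYNONYMS = {
    "hepatotoxicity": {"liver toxicity", "hepatic toxicity"},
}
LEGACY_IDS = {
    "OLD-0042": "CMP-1001",  # made-up legacy identifier -> made-up current identifier
}


def expand_query(term: str) -> set[str]:
    """Return the search terms to use for a single user-entered term."""
    cleaned = term.strip()

    # A known legacy identifier is replaced by its current identifier.
    legacy_hit = LEGACY_IDS.get(cleaned.upper())
    if legacy_hit is not None:
        return {legacy_hit}

    # Otherwise, add every synonym from any group the term belongs to.
    lowered = cleaned.lower()
    expanded = {lowered}
    for canonical, alternatives in SYNONYMS.items():
        if lowered == canonical or lowered in alternatives:
            expanded.add(canonical)
            expanded |= alternatives
    return expanded
```

In this sketch, a search for "liver toxicity" would also match records indexed under "hepatotoxicity", and a search for an outdated compound identifier would be redirected to the current one.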
“Our data platform will be the breakthrough innovation for me as a scientist,” says Jan Sternberg, Study Director and Lab Head at Bayer. “It enables me to be fast, efficient, and flexible, and helps me get answers for urgent issues and questions regarding preclinical studies.”
In addition, and in line with the ongoing digital transformation at Bayer, data scientists can run SQL-like queries directly on the platform’s backend, enabling complex analyses and machine learning applications.
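As a rough illustration of what such a query might look like, the Python sketch below aggregates study results per compound. It uses the standard-library `sqlite3` module with an in-memory database purely as a stand-in, since the article does not name the query engine behind the platform; the table `study_results`, its columns, and the sample rows are invented for the example.

```python
import sqlite3

# In-memory stand-in for the platform backend (an assumption, not the real engine).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE study_results ("
    "study_id TEXT, compound_id TEXT, study_type TEXT, finding_count INTEGER)"
)
# Made-up sample rows for illustration only.
conn.executemany(
    "INSERT INTO study_results VALUES (?, ?, ?, ?)",
    [
        ("S-001", "CMP-1001", "in vivo", 3),
        ("S-002", "CMP-1001", "in vitro", 1),
        ("S-003", "CMP-2002", "in vivo", 0),
    ],
)


def findings_per_compound(connection: sqlite3.Connection) -> list[tuple]:
    """Count studies and sum findings for each compound."""
    query = """
        SELECT compound_id, COUNT(*) AS studies, SUM(finding_count) AS findings
        FROM study_results
        GROUP BY compound_id
        ORDER BY compound_id
    """
    return connection.execute(query).fetchall()


rows = findings_per_compound(conn)
```

The resulting rows could then be loaded into a data frame or used as features for a model, which is the kind of downstream analysis the paragraph above refers to.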
A platform that’s always evolving
The product team built the platform using infrastructure-as-code based on Terraform, creating reusable infrastructure configuration templates throughout the development process. This helped enable a frictionless handover of each component in the data platform and automate deployment orchestration through continuous delivery pipelines.
Development and deployment are agile. Bayer’s preclinical development teams provide continual input through regular live sessions with users and monthly wave sessions that capture user experience priorities for continuous platform improvements. The product team presents new features that were added based on stakeholder feedback during regular showcases.
Responsible governance of preclinical data, including the assurance that Bayer has control over who accesses which data sets, is guaranteed by using AWS Glue stack components and managed services as part of the data platform. This was achieved using a compliance-as-code approach, automatically demonstrating that new code complies with relevant policies and regulations at every step. One key benefit is the ability to automate audit trails when needed.
Expansion into new domains
Looking ahead, project managers will focus on improving study design selection and optimizing retrospective and predictive data analyses. For example, PRINCE enables the curation of historical data from past studies. That historical data could be used to generate virtual control groups for future preclinical studies, a concept currently under scientific exploration.
“PRINCE fulfills my long-cherished wish for a preclinical safety database,” says Thomas Steger-Hartmann, Head of Investigational Toxicology at Bayer. “It will enhance and accelerate data-driven decisions in R&D, as well as reduce a lot of our knowledge silos.”
In addition, Bayer is exploring how data products can support other areas and whether benefits can be realized by connecting several data products of this kind. As an example, PRINCE will soon be used as the data and analysis backend for one of Bayer’s preclinical Laboratory Information Management Systems (LIMS). Bayer is also exploring how Large Language Models could provide advanced functionality to its scientists.