Pharmaceutical companies' ability to conduct research effectively is fundamental to their success. But research is rarely an efficient process — it takes time, requires exploration and demands exemplary organization.
This is an ongoing challenge for every pharmaceutical company, including one of our clients. To try to solve it, we were asked to explore how we might develop a research assistant that could help people inside the company find the information they need more easily.
Given this was largely an information retrieval challenge, we looked to generative AI as the key technology that would power the research assistant. With the right data in place and an easy-to-use interface, we believed it would be possible to accelerate and improve research for thousands of the client’s employees.
The use cases
We identified two distinct use cases for an AI research assistant, sitting at opposite ends of the research process and involving very different types of data. Despite those differences, both demonstrate the ways AI can be leveraged to help humans better manage complex information at huge scale.
Research summaries and updates for managerial oversight. Governance of research is critical for pharmaceutical companies; it drives planning in everything from supply chain management to marketing and distribution. While there are mechanisms in place for this, it can be challenging for those in senior positions to have full oversight due to the complexity, speed and scale of research happening, even in one small corner of the organization. The AI research assistant could leverage unstructured data in existing reports and documentation to provide summaries to this group at a regular cadence, helping to answer questions around things like progress and milestones.
Structured data exploration. Computational biologists, who play an important role in driving pharmaceutical research, leverage structured data (about, say, chemical or protein structures) to innovate. We believed a research assistant could give these researchers an additional way to explore that data. A chat interface, for instance, would make it possible to ask questions about the data and explore avenues for experimentation or further investigation.
How did you actually do it?
At the center of the project was bringing together all the necessary data. For our first use case, this meant connecting an LLM to the client’s SharePoint repository, which contained information about ongoing research across the organization. For the second, it meant making it possible for the LLM to access the huge amount of chemical and pharmaceutical data held inside the organization.
What technology stack was used?
The project actually began on Azure AI Search but, owing to some business changes, we ended up moving to AWS Bedrock. This was largely because the client had its own LLM marketplace/platform, built on Bedrock, which we could leverage.
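To make that concrete, here’s a minimal sketch of how a question might be sent to a Bedrock-hosted model via boto3’s Converse API. The model ID, region, system prompt and parameters are illustrative assumptions, not the exact configuration used on the project.

```python
# Minimal sketch: calling a Bedrock-hosted model with boto3's Converse API.
# The model ID, region, system prompt and parameters are illustrative
# assumptions, not the project's actual configuration.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")  # placeholder region

def ask_research_assistant(question: str, context: str) -> str:
    """Send a user question plus retrieved context to the model and return its answer."""
    response = bedrock.converse(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # hypothetical model choice
        system=[{"text": "You are a research assistant for pharmaceutical R&D."}],
        messages=[{
            "role": "user",
            "content": [{"text": f"Context:\n{context}\n\nQuestion: {question}"}],
        }],
        inferenceConfig={"maxTokens": 1024, "temperature": 0.2},
    )
    return response["output"]["message"]["content"][0]["text"]
```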
It’s also worth noting that the project took place before MCP (Model Context Protocol) arrived on the scene. MCP has since had a huge impact on how easy it is to connect data sources to large language models, but in its absence we used Databricks to facilitate database communication.
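As a rough illustration, the sketch below shows how an LLM-generated SQL query could be executed against Databricks using the databricks-sql-connector package. The hostname, warehouse path and token are placeholders, not the client’s actual setup.

```python
# Sketch of running a generated SQL query against Databricks using the
# databricks-sql-connector package. Hostname, HTTP path and token are
# placeholders, not the project's actual configuration.
from databricks import sql

def run_generated_query(query: str) -> list:
    """Execute a (validated) SQL query on a Databricks SQL warehouse and return the rows."""
    with sql.connect(
        server_hostname="adb-xxxx.azuredatabricks.net",  # placeholder workspace
        http_path="/sql/1.0/warehouses/xxxx",            # placeholder warehouse
        access_token="<personal-access-token>",          # placeholder credential
    ) as connection:
        with connection.cursor() as cursor:
            cursor.execute(query)
            return cursor.fetchall()
```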
What technical challenges did you encounter?
One of the biggest challenges we faced was the highly specialized domain knowledge involved: a general-purpose LLM simply doesn’t know the terminology used by the client.
To tackle this, we created a kind of manual map of domain- and client-specific terms. This was done with the input of computational biologists and other SMEs: they created a table capturing each term and what it referred to, ensuring the research assistant could cope with highly specific questions and prompts from users.
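To give a sense of how such a map can be used, here’s a simple sketch in which glossary terms found in a user’s question are expanded before the prompt reaches the LLM. The terms and definitions below are invented examples, not the client’s actual vocabulary.

```python
# Illustrative sketch: expanding domain-specific terms before the prompt
# reaches the LLM. The glossary entries are invented examples, not the
# client's actual terminology.
DOMAIN_GLOSSARY = {
    "ADC": "antibody-drug conjugate",
    "PK/PD": "pharmacokinetics/pharmacodynamics",
    "IC50": "half-maximal inhibitory concentration",
}

def augment_prompt(user_question: str) -> str:
    """Append definitions for any glossary terms that appear in the question."""
    definitions = [
        f"- {term}: {meaning}"
        for term, meaning in DOMAIN_GLOSSARY.items()
        if term.lower() in user_question.lower()
    ]
    if not definitions:
        return user_question
    return (
        f"{user_question}\n\n"
        "Use the following domain definitions when answering:\n"
        + "\n".join(definitions)
    )
```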
Throughout the process we also had to pay attention to the quality of the AI’s responses. Early on there were some issues, so we went through a few rounds of refinement. To do this we used few-shot prompting, a technique in which the LLM is given a number of worked examples in the prompt to help it complete its task. There are alternative techniques, such as fine-tuning (in which the model itself is retrained to produce better outputs), but few-shot prompting has some advantages: it’s more efficient than fine-tuning, because it requires less labeled data, and it’s more easily adapted to different tasks.
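To make the idea concrete, here’s a sketch of what a few-shot prompt for SQL generation might look like. The schema, example questions and queries are invented for illustration and aren’t drawn from the client’s data.

```python
# Sketch of a few-shot prompt for SQL generation. The example questions,
# table names and queries are invented for illustration only.
FEW_SHOT_EXAMPLES = [
    (
        "How many compounds were screened last month?",
        "SELECT COUNT(*) FROM compound_screens "
        "WHERE screen_date >= date_trunc('month', current_date - INTERVAL 1 MONTH);",
    ),
    (
        "List assays with an IC50 below 10 nM.",
        "SELECT assay_id, compound_id, ic50_nm FROM assay_results WHERE ic50_nm < 10;",
    ),
]

def build_sql_prompt(question: str) -> str:
    """Assemble a few-shot prompt: worked examples first, then the new question."""
    shots = "\n\n".join(
        f"Question: {q}\nSQL: {sql_query}" for q, sql_query in FEW_SHOT_EXAMPLES
    )
    return (
        "Translate each question into a SQL query for the research database.\n\n"
        f"{shots}\n\nQuestion: {question}\nSQL:"
    )
```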
What was the impact of the work?
The research assistant significantly accelerated how domain experts in pharma R&D interact with complex datasets. Scientists could now retrieve insights across oncology, pathology, and predictive modeling pipelines using natural language, eliminating the need for SQL or programming skills.
This shift not only reduced the time-to-insight from hours to minutes but also empowered cross-functional teams (program managers, oncologists, bioscientists) to self-serve critical data, improving collaboration and decision-making.
Increased data accessibility. The research assistant allowed researchers to query both structured and unstructured data (e.g., Databricks SQL tables and SharePoint documents) inside a single interface.
Reduced bottlenecks. The tool also helped offload repetitive data requests from engineering teams by automating summarization, visualization, and SQL generation.
Scalable agent ecosystem. The modular, multi-agent architecture allowed us to quickly onboard new use cases without refactoring the core logic.
Improved data governance awareness. Through prompt augmentation and intent recognition, researcher queries remained within business guardrails and surfaced relevant domain metadata without any compliance or privacy risks.
What did you learn?
The project wasn’t just a success for the client; it also helped us learn a lot about leveraging AI.
For instance, it became clear just how important prompt engineering is. True, it’s something that’s often said, but actually seeing how small changes in structured domain prompts affect the accuracy and relevance of the SQL the LLM generates helps you develop a stronger understanding of, and even sensitivity to, the way the tool works.
We also found that including schema hints and business rules in the agent flow significantly improved the reliability of responses. And by decoupling intent recognition, SQL generation, and summarization into separate agents, we could rapidly evolve features and debug failures in isolation.
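To show the shape of that decoupling, here’s a heavily simplified sketch of how such a pipeline might be wired together. The agent interfaces, names and routing logic are illustrative assumptions rather than the production architecture.

```python
# Heavily simplified sketch of decoupled agents: intent recognition, SQL
# generation and summarization each sit behind their own interface so they
# can evolve and be debugged independently. Names and wiring are illustrative.
from dataclasses import dataclass
from typing import Protocol

class Agent(Protocol):
    def run(self, payload: dict) -> dict: ...

@dataclass
class ResearchAssistantPipeline:
    intent_agent: Agent    # classifies the question (e.g. needs SQL or just a summary)
    sql_agent: Agent       # turns a question plus schema hints into a SQL query and rows
    summary_agent: Agent   # turns query results (or documents) into a readable answer

    def answer(self, question: str) -> str:
        intent = self.intent_agent.run({"question": question})
        if intent["type"] == "sql":
            result = self.sql_agent.run({
                "question": question,
                "schema_hints": intent.get("schema_hints"),
            })
            summary = self.summary_agent.run({"question": question, "rows": result["rows"]})
        else:
            summary = self.summary_agent.run({"question": question})
        return summary["text"]
```

Because each agent only exchanges small dictionaries with the orchestrator, one component can be swapped or debugged without touching the others.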
A final, particularly crucial bit of learning was that human-in-the-loop feedback matters. Integrating user feedback loops helped fine-tune agent behaviors over time, which was especially necessary for ambiguous prompts or non-standard query patterns.
Disclaimer: The statements and opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.