
Transforming unstructured data into enterprise intelligence with AI


All enterprises sit atop an ocean of unstructured data. Reports, client interviews, support tickets and research notes — all full of insight, none of it easy to surface. These data hold the qualitative understanding that dashboards can’t show: why customers feel what they do, how projects succeed or fail and what drives change beneath the metrics.

 

Sadly, these insights don’t live neatly in rows and columns. They’re a messy narrative scattered across tools, invisible to those systems designed for precision and scale.

 

Structured data tells us what’s happening. Unstructured data, if we can make sense of it, tells us why. The challenge is bridging those worlds — giving language the same analytical depth as numbers.

 

Why unstructured data resists easy answers

 

For enterprises, unstructured data isn’t just a volume problem. It’s a meaning problem. Words are packed with nuance that machines struggle to parse. When AI tools try to interpret large bodies of text without context, the result can be convincing nonsense — hallucinations that sound insightful but aren’t reliable.

 

Before building our own framework, we explored several enterprise tools that promised to solve this challenge. Glean, for example, offered unified search across internal systems. While it integrated easily, we found that its results often looked credible but lacked factual accuracy. In testing, responses to queries were occasionally incomplete or simply fabricated — a common issue when metadata and contextual limits aren’t tightly managed. More importantly, we had no control over how those answers were generated. The retrieval process was opaque, so we had no way of verifying their correctness. We weren’t willing to just take the answers on trust.

 

We also tried NotebookLM, which initially produced high-quality summaries and reasoning over text. However, it couldn’t integrate into our agentic workflow — the environment where autonomous agents manage retrieval and context passing between systems. NotebookLM is built for static, session-based interactions with uploaded documents and doesn’t expose APIs or orchestration hooks that would allow integration into enterprise-grade workflows. 

 

These trials confirmed that to handle enterprise-scale unstructured data, we needed a model we could trust — one capable of contextual understanding, not just search.

 

How chunking and vectorization give words meaning

 

Our next step was to explore how AI could help us interrogate unstructured data more intelligently. The first principle was that machines need structure before they can understand context.

 

That starts with chunking — dividing long documents into smaller sections that preserve meaning without overwhelming the model. Each chunk must hold enough context to make sense on its own but not so much that important details get lost. In our experience, that balance is found through trial and error.
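To make the idea concrete, here is a minimal chunking sketch in Python. The sentence splitting, chunk size and overlap are illustrative assumptions, not the values used in our pipeline.

```python
import re

def chunk_text(text: str, max_words: int = 200, overlap: int = 1) -> list[str]:
    """Split a document into overlapping, sentence-aligned chunks."""
    # Naive sentence split; a production pipeline would use a proper sentence tokenizer.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current: list[str] = []
    carried = 0  # sentences in `current` repeated from the previous chunk

    for sentence in sentences:
        current.append(sentence)
        if sum(len(s.split()) for s in current) >= max_words:
            chunks.append(" ".join(current))
            # Carry the last sentence(s) forward so context bridges chunk boundaries.
            current = current[-overlap:] if overlap else []
            carried = len(current)

    if len(current) > carried:  # flush only if genuinely new material is left
        chunks.append(" ".join(current))
    return chunks
```

Larger `max_words` values preserve more narrative flow per chunk; smaller ones make retrieval more precise but risk splitting an idea across chunks.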

 

Once separated, these chunks are vectorized — turned into long numerical strings known as embeddings. These vectors capture relationships between words in multi-dimensional space. For instance, the vectors for “dog” and “cat” sit close together, while “dog” and “airplane” are far apart. In this way, numbers begin to represent meaning.
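As a rough illustration of embeddings and semantic closeness, the sketch below uses the open-source sentence-transformers library, chosen only because it runs locally; the embedding model our pipeline actually uses, text-embedding-004, is described later.

```python
# pip install sentence-transformers numpy
import numpy as np
from sentence_transformers import SentenceTransformer

# Small open-source model used purely for illustration.
model = SentenceTransformer("all-MiniLM-L6-v2")

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: 1.0 means identical direction, ~0 means unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

dog, cat, airplane = model.encode(["dog", "cat", "airplane"])

print(cosine(dog, cat))       # higher: related concepts sit close together
print(cosine(dog, airplane))  # lower: unrelated concepts sit further apart
```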

 

This process is foundational to any retrieval-augmented generation (RAG) system. It allows models to measure semantic closeness rather than rely on simple keyword matches — a vital step in retrieving the right context for a given question.

 

Balancing context: how AI keeps meaning without the noise

 

The hardest part of retrieval isn’t search — it’s context. Too little, and the model gives shallow answers. Too much, and it loses focus.

 

As we explored how AI might help us navigate our unstructured data, we focused on techniques for maintaining context at scale. One of the most promising was multi-query retrieval. In this approach, a large language model analyses a user’s question and generates alternative versions before searching. A query about ‘business outcomes’ might also look for ‘cost reductions’ or ‘efficiency gains’. Each variant runs in parallel, ensuring that relevant insights aren’t missed just because someone used different phrasing.
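The sketch below shows the shape of multi-query retrieval. The query variants are hard-coded to stand in for the LLM that would generate them, and a tiny in-memory corpus with the same open-source embedding model stands in for the real vector index, so the example is self-contained rather than a picture of our production setup.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # open model, illustration only

# A toy in-memory corpus standing in for indexed VoC chunks.
corpus = [
    {"id": 1, "text": "The new platform cut infrastructure costs by a third."},
    {"id": 2, "text": "Release cycles became noticeably faster after the migration."},
    {"id": 3, "text": "The client praised the onboarding workshops."},
]
corpus_vecs = model.encode([c["text"] for c in corpus])

def vector_search(query: str, top_k: int = 2) -> list[dict]:
    """In-memory stand-in for the real vector search backend."""
    q = model.encode(query)
    scores = corpus_vecs @ q / (np.linalg.norm(corpus_vecs, axis=1) * np.linalg.norm(q))
    return [corpus[i] for i in np.argsort(scores)[::-1][:top_k]]

# In the real pipeline an LLM rewrites the user's question into variants;
# they are hard-coded here so the sketch stands on its own.
question = "What business outcomes did clients report?"
variants = [question,
            "What cost reductions did clients report?",
            "What efficiency gains did clients report?"]

# Fan the variants out in parallel, then merge and de-duplicate by chunk id.
with ThreadPoolExecutor() as pool:
    result_sets = list(pool.map(vector_search, variants))
candidates = list({c["id"]: c for results in result_sets for c in results}.values())
```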

 

The retrieved chunks are then reranked based on their semantic closeness to the original question, producing a response that feels more grounded and complete.
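Continuing that sketch, reranking can be as simple as scoring every merged candidate against the original question and sorting. Production systems often use a stronger embedding model or a dedicated reranker for this step, so treat the snippet purely as the idea.

```python
def rerank(question: str, candidates: list[dict]) -> list[dict]:
    """Order candidate chunks by semantic closeness to the original question."""
    q = model.encode(question)

    def score(chunk: dict) -> float:
        v = model.encode(chunk["text"])
        return float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))

    return sorted(candidates, key=score, reverse=True)

# The top-ranked chunks become the context handed to the LLM.
top_context = rerank(question, candidates)[:3]
```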

 

To test these ideas, we needed a dataset rich in complexity and context — something that would stretch both the technical and human sides of understanding.

 

Inside the VoC proof of concept: testing AI on real feedback data

 

We chose the Voice of the Customer (VoC) program as our testbed. It’s a dense, qualitative dataset made up of client interviews, narrative feedback and observations collected over years, scattered across platforms such as the CRM, the customer feedback tool and the BI tool, as well as raw Word docs and spreadsheets.

 

The VoC project’s purpose was to determine how we could make this type of qualitative data searchable and reliable using AI. To do that, we tested multiple options from the Google Cloud Platform ecosystem — Vertex AI Search, Vector Search, BigQuery Vector Index and RAG Engine — evaluating each for performance, accuracy and scalability.

 

We ran spikes for all four options, with Vertex AI Search quickly emerging as the preferred choice for this proof of concept. It offered the strongest semantic retrieval performance while allowing us to control indexing, metadata tagging and integration with agentic systems. The retrieval pipeline used Google’s Gemini 2.5 Flash as the LLM, while the embedding model for semantic reranking was text-embedding-004. Together, these provided both speed and precision in aligning retrieved chunks with user intent.
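The sketch below shows roughly how those pieces can be wired together with the Vertex AI Python SDK. The project, location and prompt are placeholders, the model identifiers are the ones named above, and the Vertex AI Search retrieval call itself is omitted (the retrieved chunks arrive as an argument), so this is a shape rather than our exact implementation.

```python
# pip install google-cloud-aiplatform
import math
import vertexai
from vertexai.generative_models import GenerativeModel
from vertexai.language_models import TextEmbeddingModel

vertexai.init(project="your-gcp-project", location="us-central1")  # placeholder values

embedder = TextEmbeddingModel.from_pretrained("text-embedding-004")
llm = GenerativeModel("gemini-2.5-flash")  # model names as referenced in this article

def semantic_score(question: str, chunk: str) -> float:
    """Cosine similarity between question and chunk, used for semantic reranking."""
    q, c = (e.values for e in embedder.get_embeddings([question, chunk]))
    dot = sum(a * b for a, b in zip(q, c))
    return dot / (math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in c)))

def answer(question: str, retrieved_chunks: list[str], top_k: int = 5) -> str:
    """Rerank chunks returned by the search layer, then ask Gemini to answer from the best ones."""
    ranked = sorted(retrieved_chunks, key=lambda c: semantic_score(question, c), reverse=True)
    context = "\n\n".join(ranked[:top_k])
    prompt = ("Answer the question using only the VoC excerpts below.\n\n"
              f"Excerpts:\n{context}\n\nQuestion: {question}")
    return llm.generate_content(prompt).text
```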

 

We manually tagged two years of VoC reports with key metadata fields such as account, segment and industry, then indexed and vectorized them using Vertex AI Search. This metadata also powers dynamic filtering, enabling queries to be automatically scoped to specific segments, industries or accounts based on the question’s context. Because VoC interviews are long and conversational, we used larger chunk sizes to preserve narrative flow. Testing confirmed that this produced higher contextual accuracy than smaller, fragmented chunks.
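In schematic form, dynamic filtering looks like the sketch below. In Vertex AI Search the constraints would be expressed as a filter over the indexed metadata fields; here a plain dict and an in-memory list keep the example self-contained, and the metadata values are invented.

```python
# Chunks carry the metadata they were tagged with at indexing time (values invented).
chunks = [
    {"text": "...", "account": "Acme Corp", "segment": "Enterprise", "industry": "Retail"},
    {"text": "...", "account": "Globex", "segment": "Mid-market", "industry": "Banking"},
]

def scope_filter(question_metadata: dict) -> dict:
    """Derive metadata constraints from the question's context.

    In practice the segment, industry or account is detected from the question
    or supplied by the agent; the mapping here is deliberately trivial.
    """
    return {k: v for k, v in question_metadata.items() if v is not None}

def filtered(chunks: list[dict], constraints: dict) -> list[dict]:
    """Keep only chunks whose metadata matches every constraint."""
    return [c for c in chunks if all(c.get(k) == v for k, v in constraints.items())]

# A question about enterprise retail clients is scoped before any vector search runs.
constraints = scope_filter({"segment": "Enterprise", "industry": "Retail", "account": None})
print(filtered(chunks, constraints))
```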

 

Across several rounds of refinement — adjusting chunk size, prompt design and reranking thresholds — the system reached an average context relevance of 94%, measured with the RAGAS evaluation framework. That level of accuracy meant people could query VoC data and trust that the output reflected what clients had actually said.

 

Just as importantly, the VoC proof of concept established a repeatable method for handling unstructured data: standardized metadata, intelligent chunking, controlled vectorization and multi-query retrieval.

 

From feedback to foresight

 

The VoC pilot demonstrated that qualitative feedback could be handled with the same discipline as quantitative data. That success laid the foundation for our next phase — connecting other narrative-based sources such as research insights and competitive intelligence into a single ecosystem.

 

Listening to data differently

 

Unstructured data is where the stories live. Making sense of it demands as much empathy as engineering — chunking and vectorization to give language form, retrieval and reranking to give it precision, and human curiosity to give it purpose.

 

The work that began with VoC shows that when those elements come together, enterprises can do more than store information. They can start to listen — not just to what data says, but to what it means.

 

What’s next

 

We’re tracking emerging managed RAG tools like Google’s new File Search. Our current Vertex AI Search framework remains the foundation, but staying tool-agnostic helps us adapt quickly as hyperscalers evolve.

Disclaimer: The statements and opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.
