Conferences are about more than just transferring and consuming information — they’re opportunities for community and person-to-person engagement. That means when you’re putting one together, paying attention to the experience is crucial. It’s no surprise, then, that conferences have started to integrate AI chat applications into their experiences.
Creating a chatbot is all well and good if you have a big budget and plenty of lead time, but what if you’re trying to pull something together in a matter of weeks? At Thoughtworks, we did just that for our recent XConf event in Madrid. It was tough, but the experience gave us great insight into how to deliver a generative AI project incredibly fast.
The scope of the project
The fundamental problem we wanted to solve with a chatbot was to create personal, meaningful touchpoints between our attendees and key subject matter experts at the event. People can, after all, watch videos, read books or attend webinars anywhere — an in-person event should emphasize personal interaction.
This meant we needed the chatbot to do more than just inform: it needed to engage in a unique way. This is where the idea of an AI-powered treasure hunt — what we called a side quest — came in: it would be a great way to both engage attendees and showcase our AI capabilities.
Alongside a conference guide — which would feature instant answers on the event agenda, speakers and logistics — this formed the core of the product. But we only had three weeks to create it before XConf. That meant the technical requirements needed to be very clearly defined, minimizing complexity and scope creep: stability, dual functionality and GDPR compliance were the fundamental elements that drove the next steps.
Data curation
Data is perhaps the fundamental building block of an AI application — particularly one that’s customer-facing. However, although the core set of data was relatively straightforward — information about the conference and schedule, for instance — there was still significant curation required to create an assistant that met our initial requirements.
One important step was anticipating questions from attendees to create an FAQ feature for the chatbot. This essentially meant building up a dataset of common questions sourced from attendees at other XConf events around the world, then tailoring the questions (and their responses) to make sure they were appropriate and relevant to our Madrid audience.
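To make that concrete, here’s a minimal sketch of what a curated entry could look like. The field names and values are illustrative assumptions, not our actual schema.

```python
# Hypothetical shape of a curated FAQ entry. Field names are illustrative,
# not the schema we actually used.
faq_entries = [
    {
        "question": "Where do I pick up my badge?",
        "answer": "Badge pick-up is at the registration desk in the main lobby.",
        "tags": ["logistics", "registration"],
        "source_event": "XConf (other editions)",  # where the question was first asked
        "localized_for": "Madrid",                 # tailored for this audience
    },
]
```

Each entry then doubles as grounding material for the assistant: the question is what gets matched against an attendee’s query, and the answer is what the model draws on when responding.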
Data minimization
Part of what helped us complete the project so quickly was remaining true to a principle of data minimization — in other words, collecting and processing as little as possible. For instance, we collected no usage analytics at all. While this obviously limited what we could learn about user behavior, it nevertheless meant we could do two things: first, focus on the experience of the application when launched for the event, and second, ensure full compliance with data privacy regulations (in this case GDPR).
This principle also led us to specifically avoid collecting private or personal information. For example, we:
Instructed users explicitly not to use personal information (i.e., to use nicknames instead).
Decided not to store any conversations on our servers — only on the user’s device (see the sketch after this list).
Gave users the option to clear all locally stored data.
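That second decision shaped the backend in a useful way. Here’s a minimal sketch of a stateless chat endpoint, assuming FastAPI purely for illustration (the framework and names here are our assumptions, not necessarily what we shipped): the client sends the full conversation on every turn and keeps the only copy.

```python
# A minimal sketch of a stateless chat endpoint, assuming FastAPI for
# illustration. The client holds the whole conversation and sends it on
# every turn, so nothing is ever persisted server-side.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Message(BaseModel):
    role: str      # "user" or "assistant"
    content: str

class ChatRequest(BaseModel):
    messages: list[Message]  # full history, held only on the user's device

@app.post("/chat")
def chat(request: ChatRequest) -> dict:
    # generate_reply stands in for the model call; the key point is that the
    # request is handled and forgotten -- no database, no session store.
    reply = generate_reply([m.model_dump() for m in request.messages])
    return {"content": reply}

def generate_reply(messages: list[dict]) -> str:
    return "(model reply goes here)"  # placeholder for the actual LLM call
```

Because the server holds no state, clearing all locally stored data really does clear everything; there’s no second copy to worry about.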
Helping the LLM retrieve data effectively
If data was the foundation of the chatbot, the LLM and its ability to retrieve data was crucial in ensuring the interaction layer of the application was effective — and, to return to our goal, actually improved the attendees' experience of the event. A clunky chatbot, after all, would be worse than no chatbot at all.
We used an older model for this project: Claude 3.5 Sonnet. This was admittedly a conservative choice, but it was made because of the time constraint — using something more ‘proven’ meant we could move faster as a team. We also used retrieval-augmented generation (RAG) as we believed this would improve the performance of the application. On reflection, however, this was probably unnecessary given the project didn’t involve a huge amount of data.
In fact, as the project progressed our use of RAG declined in scope. Originally, we thought we needed to use external RAG libraries and set up a dedicated retrieval database — either a custom one or on Amazon RDS for PostgreSQL. However, because the dataset was so small, we generated the embeddings ourselves and saved them to JSON files on S3. Simplifying the solution in this way reduced our infrastructure footprint — great news for both our budget and delivery timeline.
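Here’s a minimal sketch of that simplified setup: embeddings precomputed once, stored as plain JSON on S3 and searched in memory with cosine similarity. The bucket and key names are made up, and embed() is a placeholder for whichever embedding model you call.

```python
# A minimal sketch of the simplified retrieval setup: embeddings are
# precomputed, stored as JSON on S3 and searched in memory.
import json

import boto3
import numpy as np

s3 = boto3.client("s3")
BUCKET, KEY = "xconf-assistant", "embeddings.json"  # illustrative names

def embed(text: str) -> list[float]:
    ...  # plug in an embedding model of your choice here

def build_index(chunks: list[str]) -> None:
    # One-off job: embed every chunk and persist the result as plain JSON.
    index = [{"text": c, "embedding": embed(c)} for c in chunks]
    s3.put_object(Bucket=BUCKET, Key=KEY, Body=json.dumps(index))

def retrieve(query: str, top_k: int = 3) -> list[str]:
    # At query time, load the index and rank chunks by cosine similarity.
    index = json.loads(s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read())
    q = np.array(embed(query))

    def score(item: dict) -> float:
        v = np.array(item["embedding"])
        return float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))

    return [item["text"] for item in sorted(index, key=score, reverse=True)[:top_k]]
```

For a dataset this small, loading the whole index per request (or caching it in memory at startup) is cheap, which is exactly why a dedicated vector database felt like overkill.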
How we turned the project around in a matter of weeks
The main challenge was keeping the project as small and focused as possible.
The architecture, for instance, was as simple as we could make it. For the very first iteration we focused purely on turn-based conversation, which meant we could avoid streaming, tool-reporting and other complicated interactions between frontend and backend. Although the complexity did increase, we nevertheless tried to decouple the backend and frontend as much as possible. As an example, the structure of the side quest was simplified so the backend could handle progress tracking with minimal demands on the frontend.
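As a rough illustration of that split (the step names are invented, not the real quest content), the backend can own all quest state and hand the frontend a ready-to-render summary:

```python
# Sketch of backend-owned side-quest progress. Step names are invented;
# the point is that the frontend only ever renders the summary dict.
from dataclasses import dataclass, field

QUEST_STEPS = ["find_the_keynote_clue", "solve_the_bot_riddle", "visit_the_booth"]

@dataclass
class QuestProgress:
    completed: set[str] = field(default_factory=set)

    def complete(self, step: str) -> dict:
        if step in QUEST_STEPS:
            self.completed.add(step)
        # The frontend never needs to know the rules, only the current state.
        return {
            "done": len(self.completed),
            "total": len(QUEST_STEPS),
            "finished": len(self.completed) == len(QUEST_STEPS),
        }
```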
The evolution of the chatbot also tells a story about our approach: throwaway scaffolding apps, from command-line chatbots to a one-page HTML web application and, finally, a React SPA. The leaderboard for the side quest was even deployed as a completely separate page from the main SPA, which meant every member of the team could work independently.
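The first of those stages can be surprisingly small. Here’s the kind of throwaway command-line loop we mean, sketched with the Anthropic Python SDK (an assumption on our part; substitute whatever client you use to reach the model):

```python
# A throwaway command-line chat scaffold: no web stack, just a loop.
# Assumes the Anthropic Python SDK and an ANTHROPIC_API_KEY in the environment.
import anthropic

client = anthropic.Anthropic()
history = []

while True:
    user_input = input("you> ")
    if user_input.lower() in ("quit", "exit"):
        break
    history.append({"role": "user", "content": user_input})
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=1024,
        messages=history,
    )
    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})
    print("bot>", reply)
```

A scaffold like this lets you test prompts and data long before any frontend exists.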
In another setting we might have experimented more; this project, however, required us to call on our recent experience with LLM projects. This, along with effective communication, both sync and async (we were a highly distributed team), was critical to coordinating the various parts. Indeed, while we tried to keep things very simple, complexity would still rear its head!
What we learned
This project showed us what’s possible if you combine a foundation of robust principles with a commitment to working in an adaptive and iterative manner. There were, however, a few key things we learned and could have improved:
Testing. We missed having dedicated acceptance and user experience (UX) testing. This kind of testing would have gone a long way to making sure the project was scalable and valuable.
Our architecture was still probably bigger than it needed to be. While we were eager to be as lightweight as possible, the setup may still have been somewhat oversized. Going with a serverless option, like AWS Lambda, could have reduced costs further.
Missed tracking costs. We didn’t keep a close eye on usage and load costs, which is essential for finding ways to optimize in the future.
We also had a great discussion on the AWS compute options (Kubernetes, EC2, ECS or Lambda), and I still feel ECS provides the best balance of flexibility and simplicity for this level of app. However, as mentioned above, we were surprised where the costs accrued: not from servers, but from things like load balancing, config changes and domain name management.
There’s a lot we can take forward from this piece of work — we’re proud of what we achieved but excited to take on the next challenge.