Solving GenAI's great challenge: Evaluating your LLM in production

Tech Horizons executive webinar series

Can you trust the accuracy and reliability of your Large Language Model (LLM) outputs? The opaque nature of LLMs is one of the biggest challenges preventing organizations from getting great AI concepts into production. Traditional machine learning evaluation techniques fall short with LLMs, and new LLM evaluation frameworks seem to be popping up every week. What’s the right approach for your RAG and fine-tuning use cases? In this webinar, our AI experts discuss how to evaluate LLM effectiveness and risks.


Attendees will learn:

  • Tips for defining clear objectives for what the LLM should achieve.
  • Which performance metrics to assess, including accuracy, relevance, response time, toxicity and user satisfaction.
  • How to identify and categorize errors through error analysis.
  • Benchmarking considerations.
  • How and when to consider qualitative user feedback.


This session has ended. Register to catch the replay on demand.

Moderators and panelists

Aaron Erickson

Senior Engineering Manager, NVIDIA

Carlos Villela

Senior Software Engineer, NVIDIA

Musa Parmaksiz

Head of AI and Data Center Excellence, UBS Investment Bank

Prasanna Pendse

Director of AI Strategy, Thoughtworks (Moderator)

Shayan Mohanty

Head of AI Research, Thoughtworks

Register to watch on demand

Watch the session recording by completing the form below.
