Technology Radar

Contextual bandits

Published: Apr 13, 2021

NOT ON THE CURRENT EDITION

This blip is not on the current edition of the Radar. If it was on one of the last few editions, it is likely that it is still relevant. If the blip is older, it might no longer be relevant and our assessment might be different today. Unfortunately, we simply don't have the bandwidth to continuously review blips from previous editions of the Radar.

Apr 2021

Trial

Contextual bandits is a type of reinforcement learning that is well suited to problems with an exploration/exploitation trade-off. Named after the slot machines found in casinos ("bandits" or "one-armed bandits"), the algorithm explores different options to learn more about their expected outcomes and balances this by exploiting the options that perform well. We have used this technique successfully in scenarios where we had very little data to train and deploy other machine-learning models. Because context can be added to this explore/exploit trade-off, the technique suits a wide variety of use cases, including A/B testing, recommendations and layout optimizations.
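To make the explore/exploit trade-off concrete, here is a minimal sketch of one simple variant: a tabular, epsilon-greedy contextual bandit. It is an illustration under stated assumptions, not the approach used in the projects mentioned above. The class name, the `mobile`/`desktop` contexts, the `layout_a`/`layout_b` arms and the click-through rates are all hypothetical, and production systems typically use richer models (for example linear or Bayesian ones) in place of a lookup table.

```python
import random
from collections import defaultdict


class EpsilonGreedyContextualBandit:
    """Tabular contextual bandit: one running mean reward per (context, arm)."""

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.arms = list(arms)
        self.epsilon = epsilon            # probability of exploring
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)    # (context, arm) -> number of pulls
        self.values = defaultdict(float)  # (context, arm) -> mean observed reward

    def select_arm(self, context):
        # Explore: with probability epsilon, try a uniformly random arm.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        # Exploit: pick the arm with the best estimate for this context
        # (ties go to the arm listed first).
        return max(self.arms, key=lambda arm: self.values[(context, arm)])

    def update(self, context, arm, reward):
        key = (context, arm)
        self.counts[key] += 1
        # Incremental update of the running mean reward.
        self.values[key] += (reward - self.values[key]) / self.counts[key]


def simulate(rounds=5000, seed=0):
    # Hypothetical click-through rates: the best layout depends on the context.
    true_rates = {
        ("mobile", "layout_a"): 0.10,
        ("mobile", "layout_b"): 0.30,
        ("desktop", "layout_a"): 0.35,
        ("desktop", "layout_b"): 0.05,
    }
    env = random.Random(seed)
    bandit = EpsilonGreedyContextualBandit(
        ["layout_a", "layout_b"], epsilon=0.1, seed=seed + 1
    )
    for _ in range(rounds):
        context = env.choice(["mobile", "desktop"])
        arm = bandit.select_arm(context)
        # Simulated binary reward (1.0 for a click, 0.0 otherwise).
        reward = 1.0 if env.random() < true_rates[(context, arm)] else 0.0
        bandit.update(context, arm, reward)
    return bandit


if __name__ == "__main__":
    learned = simulate()
    for key in sorted(learned.values):
        print(key, round(learned.values[key], 3), learned.counts[key])
```

In this sketch the context is what separates the technique from a plain A/B test: a non-contextual bandit would settle on a single layout for everyone, whereas here the estimates are kept per context, so different layouts can win for different audiences. The `epsilon` parameter controls how much traffic is spent on exploration.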
