Enable javascript in your browser for better experience. Need to know to enable it? Go here.
Published : Apr 13, 2021
This blip is not on the current edition of the Radar. If it was on one of the last few editions, it is likely that it is still relevant. If the blip is older, it might no longer be relevant and our assessment might be different today. Unfortunately, we simply don't have the bandwidth to continuously review blips from previous editions of the Radar. Understand more
Apr 2021
Trial ?

Contextual bandits is a type of reinforcement learning that is well suited for problems with exploration/exploitation trade-offs. Named after "bandits," or slot machines, in casinos, the algorithm explores different options to learn more about expected outcomes and balances it by exploiting the options that perform well. We've successfully used this technique in scenarios where we've had little data to train and deploy other machine-learning models. The fact that we can add context to this explore/exploit trade-off makes it suitable for a wide variety of use cases including A/B testing, recommendations and layout optimizations.

Download the PDF



English | Español | Português | 中文

Sign up for the Technology Radar newsletter


Subscribe now

Visit our archive to read previous volumes