Contextual bandits

Published: Apr 13, 2021

Contextual bandits is a type of reinforcement learning that is well suited to problems with an exploration/exploitation trade-off. Named after the "bandits" (slot machines) found in casinos, the algorithm explores different options to learn more about their expected outcomes, and balances this exploration by exploiting the options that perform well. We've successfully used this technique in scenarios where we had too little data to train and deploy other machine-learning models. Because we can add context to this explore/exploit trade-off, the technique suits a wide variety of use cases, including A/B testing, recommendations and layout optimizations.
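To make the explore/exploit loop concrete, here is a minimal sketch of one simple contextual bandit strategy, epsilon-greedy. It is not necessarily the approach described above, since the text doesn't name a specific algorithm. The sketch assumes discrete contexts (for example a device type) and a small, fixed set of arms (for example page layouts). The class name `EpsilonGreedyContextualBandit`, the layout names and the click-through rates in `simulate` are hypothetical, chosen only for illustration.

```python
import random
from collections import defaultdict


class EpsilonGreedyContextualBandit:
    """Hypothetical tabular epsilon-greedy bandit: one running mean reward per (context, arm)."""

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # counts[context][arm] and means[context][arm] default to zero.
        self.counts = defaultdict(lambda: defaultdict(int))
        self.means = defaultdict(lambda: defaultdict(float))

    def choose(self, context):
        # Explore: with probability epsilon, pick an arm uniformly at random.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        # Exploit: pick the arm with the highest estimated reward for this context.
        # Untried arms have an estimate of 0.0; ties go to the earliest arm in self.arms.
        return max(self.arms, key=lambda arm: self.means[context][arm])

    def update(self, context, arm, reward):
        # Incremental mean update: m <- m + (r - m) / n
        self.counts[context][arm] += 1
        n = self.counts[context][arm]
        self.means[context][arm] += (reward - self.means[context][arm]) / n


def simulate(rounds=10000, seed=42):
    # Hypothetical click-through rates: each context has a different best layout.
    true_ctr = {
        "mobile": {"layout_a": 0.10, "layout_b": 0.30},
        "desktop": {"layout_a": 0.25, "layout_b": 0.05},
    }
    env = random.Random(seed + 1)  # separate RNG for simulated users and clicks
    bandit = EpsilonGreedyContextualBandit(["layout_a", "layout_b"], epsilon=0.1, seed=seed)
    for _ in range(rounds):
        context = env.choice(list(true_ctr))
        arm = bandit.choose(context)
        reward = 1.0 if env.random() < true_ctr[context][arm] else 0.0
        bandit.update(context, arm, reward)
    return bandit


if __name__ == "__main__":
    bandit = simulate()
    for context in ("mobile", "desktop"):
        best = max(bandit.arms, key=lambda arm: bandit.means[context][arm])
        print(context, best, dict(bandit.means[context]))
```

Conditioning the estimates on the context is what separates this from a plain multi-armed bandit: in the simulation, mobile and desktop users should each converge on a different layout. A lookup table per context only works when contexts are few and discrete. With richer feature vectors, algorithms such as LinUCB or Thompson sampling over a reward model are the usual choices.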