Survey of multi-armed bandit algorithms applied to recommendation systems

Elena Gangan    
Milos Kudus    
Eugene Ilyushin    

Abstract

The main goal of this paper is to introduce the reader to multi-armed bandit algorithms of different types and to observe how the industry has leveraged them to advance recommendation systems. We present the current state of the art in recommender systems (RecSys) and then explain what multi-armed bandits can bring to the table. First, we present the formalization of the multi-armed bandit problem and describe the most common bandit algorithms, such as Upper Confidence Bound (UCB), Thompson Sampling, epsilon-greedy, and EXP3. Building on that foundation, we review several important contextual bandits and present their benefits and usage in different applications; in this setting, context means side information about the users or the items of the problem. We also survey various techniques that make bandits better suited to the task of recommendation, namely bandits with multiple plays, multi-objective optimization, and clustering and collaborative filtering approaches. We then assess bandit-backed recommendation systems deployed in industry, at Spotify, Netflix, Yahoo, and others. Along the way we discuss methods of bandit evaluation and present an empirical evaluation of several well-known algorithms: we conduct short experiments on two datasets to show how different policies compare to each other and to illustrate the importance of parameter tuning. In sum, this paper is a survey of multi-armed bandit algorithms and their applications to recommendation systems.
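As a minimal illustration of one of the policies named above, the sketch below implements epsilon-greedy on a simulated Bernoulli bandit in Python. The function name, arm means, and reward simulation are our own illustrative assumptions for this survey context (e.g., arms as items with unknown click-through rates), not code from the paper.

```python
import random

def epsilon_greedy(true_means, n_rounds=10_000, epsilon=0.1, seed=0):
    """Illustrative epsilon-greedy on a Bernoulli bandit.

    true_means: hypothetical per-arm reward probabilities (assumed here).
    With probability epsilon we explore a uniformly random arm;
    otherwise we exploit the arm with the highest empirical mean.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms      # number of pulls per arm
    values = [0.0] * n_arms    # empirical mean reward per arm
    total_reward = 0

    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        # simulated binary feedback, e.g. a click (assumption, not real data)
        reward = 1 if rng.random() < true_means[arm] else 0
        counts[arm] += 1
        # incremental update of the empirical mean for the pulled arm
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward
    return total_reward, values

if __name__ == "__main__":
    # three hypothetical items with unknown click-through rates
    reward, estimates = epsilon_greedy([0.05, 0.10, 0.02])
    print(reward, estimates)
```

The fixed epsilon here is the simplest exploration schedule; the parameter-tuning experiments discussed in the paper concern exactly such choices, since too small an epsilon under-explores and too large an epsilon wastes traffic on bad arms.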