
Detailed record

A Bandit-Based Ensemble Framework for Exploration/Exploitation of Diverse Recommendation Components

An Experimental Study within E-Commerce

Article written by: Brodén, Björn; Paraschakis, Dimitris; Nilsson, Bengt J.; Hammar, Mikael

Abstract: This work presents an extension of the Thompson Sampling bandit policy for orchestrating a collection of base recommendation algorithms for e-commerce. We focus on the problem of item-to-item recommendations, for which multiple behavioral and attribute-based predictors are provided to an ensemble learner. In addition, we detail the construction of a personalized predictor based on k-Nearest Neighbors (kNN), with temporal decay and event weighting capabilities. We show how to adapt Thompson Sampling to realistic situations in which neither action availability nor reward stationarity is guaranteed. Furthermore, we investigate the effect of priming the sampler with pre-set parameters of the reward probability distributions, derived from the product catalog and/or event history when such information is available. We report experimental results based on the analysis of three real-world e-commerce datasets.
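
To make the ensemble idea in the abstract concrete, the sketch below shows a plain Beta-Bernoulli Thompson Sampling selector choosing, per request, which base recommender serves the item-to-item recommendation, with optional priors standing in for the "priming" step. It is only an illustration: the component names, prior values, and simulated click-through rates are assumptions, and the authors' framework additionally handles non-stationary rewards and varying action availability.

    # Minimal Thompson Sampling ensemble sketch (illustrative, not the authors' code).
    import random

    class ThompsonEnsemble:
        def __init__(self, components, priors=None):
            # priors: optional {name: (alpha, beta)} used to "prime" the sampler,
            # e.g. estimated from the product catalog or historical event logs.
            self.components = list(components)
            self.params = {c: (priors or {}).get(c, (1.0, 1.0)) for c in self.components}

        def select(self, available=None):
            # Sample only among components able to serve the current request,
            # since action availability is not guaranteed in practice.
            candidates = [c for c in self.components if available is None or c in available]
            samples = {c: random.betavariate(*self.params[c]) for c in candidates}
            return max(samples, key=samples.get)

        def update(self, component, reward):
            # Bernoulli reward: 1 if the shown recommendation was clicked, else 0.
            a, b = self.params[component]
            self.params[component] = (a + reward, b + (1.0 - reward))

    if __name__ == "__main__":
        ensemble = ThompsonEnsemble(
            ["knn_behavioral", "attribute_similarity", "popularity"],
            priors={"knn_behavioral": (3.0, 7.0)},  # assumed prior click rate around 0.3
        )
        true_ctr = {"knn_behavioral": 0.12, "attribute_similarity": 0.08, "popularity": 0.05}
        for _ in range(5000):
            arm = ensemble.select()
            ensemble.update(arm, 1.0 if random.random() < true_ctr[arm] else 0.0)
        print(ensemble.params)  # posterior counts concentrate on the best component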


Language: English
Theme: Computer science

Keywords:
Reinforcement learning
Session-based recommendations
Thompson sampling
E-commerce recommender systems
Streaming recommendations
Multi-armed bandit ensembles


Contents