Whittle index based Q-learning for restless bandits with average reward
Journal
Automatica
ISSN
0005-1098
Date Issued
2022-05-01
Author(s)
Avrachenkov, Konstantin E.
Borkar, Vivek S.
Abstract
A novel reinforcement learning algorithm is introduced for multi-armed restless bandits with average reward, using the paradigms of Q-learning and the Whittle index. Specifically, we leverage the structure of the Whittle index policy to reduce the search space of Q-learning, resulting in major computational gains. A rigorous convergence analysis is provided, and numerical experiments show excellent empirical performance of the proposed scheme.
Volume
139