Best Arm Identification in Sample-path Correlated Bandits
Published in
2022 National Conference on Communications (NCC 2022)
Date Issued
2022-01-01
Author(s)
Prakash, R. Sri
Karamchandani, Nikhil
Moharir, Sharayu
Abstract
We consider the problem of best arm identification in the fixed-confidence setting for a variant of the multi-armed bandit problem. In our problem, each arm is associated with two attributes: a known deterministic cost and an unknown stochastic reward. In addition, it is known that arms with higher costs have higher rewards on every sample path. The net utility of each arm is defined as the difference between its expected reward and its cost. We consider two information models, namely full-information feedback and sequential bandit feedback. We derive a fundamental lower bound on the sample complexity of any policy and propose policies with provable performance guarantees that exploit the structure of our problem. We supplement our analytical results by comparing the performance of various candidate policies via synthetic and data-driven simulations.
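The following is a minimal illustrative sketch, not the authors' policy or data: it simulates the setting described in the abstract under assumed parameters. The sample-path correlation is modeled by drawing one shared latent variable per round and mapping it monotonically to each arm's reward, so higher-cost arms receive higher rewards on every sample path; the net utility of an arm is its estimated mean reward minus its known cost. The costs and reward scales below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

costs = np.array([0.1, 0.3, 0.6, 1.0])          # known deterministic costs
reward_scales = np.array([0.3, 0.6, 0.9, 1.2])  # hypothetical; increasing with cost

def sample_rewards(n):
    """Draw n correlated reward vectors: a single latent draw per round is
    scaled monotonically, so higher-cost arms get higher rewards on every
    sample path."""
    u = rng.uniform(size=(n, 1))       # shared randomness for each sample path
    return u * reward_scales           # shape (n, num_arms), ordered within each row

# Full-information feedback: each round reveals the rewards of all arms.
samples = sample_rewards(5000)
net_utility = samples.mean(axis=0) - costs   # estimated mean reward minus cost
best_arm = int(np.argmax(net_utility))

print("estimated net utilities:", np.round(net_utility, 3))
print("estimated best arm:", best_arm)
```

Under sequential bandit feedback, only the pulled arm's reward would be observed each round, and a policy would have to allocate pulls adaptively to identify the arm with the highest net utility at the desired confidence level; the sketch above only illustrates the full-information case.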