The 68th Statistical Machine Learning Seminar (Hybrid)
- 【Date & Time】
- Thursday, July 24, 2025, 16:00 - 17:30
Admission Free
- 【Place】
- Seminar Room 5 (3rd floor), The Institute of Statistical Mathematics
Hybrid:
To join via Zoom, please register at the following link to receive the Zoom link.
https://forms.gle/FzMwmCqgUzf8nPB18
【Speaker】
Wanteng Ma (joint work with T. Tony Cai)
(University of Pennsylvania)
【Title】
Nonparametric Contextual Bandits with Single-Indexed Rewards
【Abstract】
This work studies nonparametric contextual bandits with single-index rewards, where the expected reward of each arm is an unknown nonparametric function of a one-dimensional projection of the covariates. We first estimate this projection direction through a general approach, and then apply plug-in nonparametric regression to yield sharp estimators of the single-index reward functions, thereby alleviating the curse of dimensionality. We derive a lower bound that characterizes the fundamental regret limits of single-index bandits and propose a novel algorithm that achieves the minimax-optimal regret rate. Furthermore, we establish a general impossibility result: without additional structure, no policy can adapt to unknown smoothness levels. Nevertheless, under a standard self-similarity condition, we design a policy that remains minimax-optimal while automatically adapting to the unknown smoothness of the reward functions.
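
For readers unfamiliar with single-index models, the two-step idea in the abstract (estimate the projection direction, then run a one-dimensional plug-in regression) can be illustrated with a minimal sketch for a single arm. This is not the speakers' algorithm. It assumes standard Gaussian covariates, so that the direction can be recovered up to scale from the covariance between covariates and rewards (Stein's identity), and it uses a Nadaraya-Watson smoother as the plug-in regressor. The function names, the link function `tanh`, the bandwidth, and all simulation settings are hypothetical choices made for illustration.

```python
import numpy as np

def estimate_direction(X, y):
    """Estimate the index direction as the normalized sample covariance of X and y.

    Assumption (not from the talk): rows of X are standard Gaussian, in which case
    E[y x] is proportional to the true direction by Stein's identity.
    """
    v = X.T @ (y - y.mean()) / len(y)
    return v / np.linalg.norm(v)

def nadaraya_watson(u_train, y_train, u_query, bandwidth):
    """One-dimensional Nadaraya-Watson regression with a Gaussian kernel."""
    diffs = (u_query[:, None] - u_train[None, :]) / bandwidth
    weights = np.exp(-0.5 * diffs**2)
    return (weights @ y_train) / weights.sum(axis=1)

# Simulated rewards for one arm: y = f(x^T beta) + noise (illustrative settings).
rng = np.random.default_rng(0)
n, d = 2000, 10
beta = np.ones(d) / np.sqrt(d)           # true unit-norm direction
X = rng.standard_normal((n, d))          # Gaussian covariates (assumption)
y = np.tanh(X @ beta) + 0.1 * rng.standard_normal(n)

# Step 1: estimate the projection direction.
beta_hat = estimate_direction(X, y)

# Step 2: plug in the estimated index and regress in one dimension.
u_grid = np.linspace(-1.5, 1.5, 7)
f_hat = nadaraya_watson(X @ beta_hat, y, u_grid, bandwidth=0.2)

print("alignment with true direction:", float(beta_hat @ beta))
print("max error on grid:", float(np.max(np.abs(f_hat - np.tanh(u_grid)))))
```

The point of the sketch is that, once the direction is estimated, the regression problem is one-dimensional regardless of the covariate dimension `d`, which is the sense in which the single-index structure alleviates the curse of dimensionality.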