AISM 52, 397-406

## Some optimal strategies for bandit problems with beta prior distributions

### Chien-Tai Lin^{1} and C.J. Shiau^{2}

^{1}Department of Mathematics, Tamkang
University, Tamsui, Taiwan 251, R.O.C.

^{2}Institute of Mathematical Statistics,
National Chung-Cheng University, Chia-Yi, Taiwan 621, R.O.C.

(Received August 3, 1998; revised November 30, 1998)

Abstract.
A bandit problem with infinitely many
Bernoulli arms is considered. The parameters of Bernoulli
arms are independent and identically distributed random
variables from a common distribution with
*beta*(*a*,*b*). We
investigate the *k*-failure strategy, which is a modification
of Robbins's stay-with-a-winner/switch-on-a-loser strategy,
and three other strategies proposed recently by Berry et
al. (1997, *Ann. Statist.*, **25**, 2103-2116). We show that the *k*-failure strategy performs
poorly when *b* is greater than 1, and the best strategy
among the *k*-failure strategies is the 1-failure strategy
when *b* is less than or equal to 1. Utilizing the formulas
derived by Berry et al. (1997), we obtain the
asymptotic expected failure rates of these three strategies
for beta prior distributions. Numerical estimations and
simulations for a variety of beta prior distributions are
presented to illustrate the performances of these strategies.
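As a concrete illustration of the setting described above, the following is a minimal simulation sketch (not the authors' code) of the *k*-failure strategy: arm success probabilities are drawn i.i.d. from a *beta*(*a*,*b*) distribution, the experimenter stays with the current arm after each success, and switches to a fresh arm once the current arm has produced *k* failures. The function name and its signature are illustrative assumptions.

```python
import random

def k_failure_strategy(a, b, k, n, seed=0):
    """Simulate the k-failure strategy on infinitely many Bernoulli arms.

    Arm success probabilities are drawn i.i.d. from beta(a, b).  The
    strategy stays with the current arm after a success and switches to
    a fresh arm once the current arm has produced k failures (illustrative
    sketch; failure counts are per-arm).  Returns the proportion of
    failures observed in n pulls.
    """
    rng = random.Random(seed)
    failures_total = 0
    p = rng.betavariate(a, b)      # success probability of the current arm
    fail_count = 0                 # failures accumulated on the current arm
    for _ in range(n):
        if rng.random() < p:       # success: stay with the winner
            continue
        failures_total += 1
        fail_count += 1
        if fail_count >= k:        # k-th failure: switch to a new arm
            p = rng.betavariate(a, b)
            fail_count = 0
    return failures_total / n
```

For example, `k_failure_strategy(1.0, 1.0, 1, 10000)` simulates the 1-failure (stay-with-a-winner/switch-on-a-loser) strategy under a uniform prior; repeating over several seeds gives a Monte Carlo estimate of the expected failure proportion for comparison across strategies.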

Key words and phrases:
Bandit problems, sequential experimentation, dynamic allocation of Bernoulli
processes, staying-with-a-winner, switching-on-a-loser,
*k*-failure strategy, *m*-run strategy, non-recalling
*m*-run strategy, *N*-learning strategy.
