AISM 52, 287-315

## Model selection for variable length Markov chains and tuning the context algorithm

### Peter Bühlmann

Seminar für Statistik, ETH Zentrum, CH-8092 Zürich,
Switzerland

(Received September 29, 1997; revised January 8, 1999)

Abstract.
We consider the model selection problem in the class
of stationary variable length Markov chains (VLMC) on a finite space.
The processes in this class are still Markovian of high order, but with
memory of variable length. Various aims in selecting a VLMC can be formalized
with different non-equivalent risks, such as final prediction error or
expected Kullback-Leibler information. We consider the asymptotic behavior of
different risk functions and show how they can be generally estimated with
the same resampling strategy. Such estimated risks then yield new model
selection criteria. In particular, we obtain a data-driven tuning of
Rissanen's tree structured context algorithm which is a computationally
feasible procedure for selection and estimation of a VLMC.

Key words and phrases:
Bootstrap, zero-one loss, final prediction error, finite-memory source,
FSMX model, Kullback-Leibler information, *L*_{2} loss, optimal tree
pruning, resampling, tree model.

**Source**
( TeX , DVI )