## Exact distribution of the distances between any occurrences of a set of words

### S. Robin and J.-J. Daudin

INA-PG - INRA, 16 rue Claude Bernard, 75231 Paris, France

(Received November 26, 1999; revised May 22, 2000)

Abstract.
The distribution of the distance between two (or more) successive occurrences of a specific word in a random sequence of letters is known under different models. In this paper, a more general problem is studied: the distribution of the distance between two (or more) successive occurrences of any word of a given set under a Markov model for the sequence. The generating function and a recurrence for obtaining the probabilities are given. These results are applied to study the distribution of the "CHI" motif in the genome sequence of *Haemophilus influenzae*.

Key words and phrases:
Distance between occurrences, genome sequence analysis, semi-Markov process.

