AISM 53, 895-905
© 2001 ISM

Exact distribution of the distances between any occurrences of a set of words

S. Robin and J.-J. Daudin

INA-PG - INRA, 16 rue Claude Bernard, 75231 Paris, France

(Received November 26, 1999; revised May 22, 2000)

Abstract. The distribution of the distance between two (or more) successive occurrences of a specific word in a random sequence of letters is known under different models. In this paper, a more general problem is studied: the distribution of the distance between two (or more) successive occurrences of any word of a given set under a Markov model for the sequence. The generating function and a recurrence for obtaining the probabilities are given. These results are applied to study the distribution of the "CHI" motif in the genome sequence of Haemophilus influenzae.

Key words and phrases: Distance between occurrences, genome sequence analysis, semi-Markov process.
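
To make the quantity studied in the abstract concrete, the following is a minimal Python sketch that extracts the observed distances between successive occurrences of any word of a set in a given sequence. It is only an empirical illustration, not the generating function or the recurrence derived in the paper. The function names `occurrence_positions` and `successive_distances`, the parameter `r`, the toy sequence, and the toy word set are all hypothetical; measuring distances between start positions and counting two words that start at the same position as one occurrence are also assumptions that may differ from the paper's conventions.

```python
def occurrence_positions(sequence, words):
    """Return the sorted start positions at which any word of `words` occurs.

    Overlapping occurrences are kept. Assumption: an occurrence is located
    by its start position, and two words starting at the same position are
    counted once.
    """
    positions = set()
    for word in words:
        start = sequence.find(word)
        while start != -1:
            positions.add(start)
            # Advance by one letter only, so overlapping occurrences are found.
            start = sequence.find(word, start + 1)
    return sorted(positions)


def successive_distances(sequence, words, r=1):
    """Return the distances between each occurrence and the r-th following one.

    r=1 gives distances between consecutive occurrences; r=2 spans two gaps,
    and so on.
    """
    pos = occurrence_positions(sequence, words)
    return [pos[i + r] - pos[i] for i in range(len(pos) - r)]


if __name__ == "__main__":
    toy_sequence = "ACGACGTTACG"   # hypothetical toy sequence
    toy_words = {"ACG", "GTT"}     # hypothetical word set
    print(occurrence_positions(toy_sequence, toy_words))
    print(successive_distances(toy_sequence, toy_words, r=1))
    print(successive_distances(toy_sequence, toy_words, r=2))
```

An empirical histogram of such distances on a real genome is what one would compare against the exact distribution under the Markov model.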
