AISM 53, 895-905
© 2001 ISM
(Received November 26, 1999; revised May 22, 2000)
Abstract. The distribution of the distance between two (or more) successive occurrences of a specific word in a random sequence of letters is known under different models. In this paper, a more general problem is studied: the distribution of the distance between two (or more) successive occurrences of any word of a given set under a Markov model for the sequence. The generating function and a recurrence for obtaining the probabilities are given. These results are applied to study the distribution of the "CHI" motif in the genome sequence of Haemophilus influenzae.
Key words and phrases: Distance between occurrences, genome sequence analysis, semi-Markov process.
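As an informal illustration of the quantity studied in the abstract, the following minimal sketch tabulates the observed distances between successive occurrences of any word of a given set in a fixed sequence. It is not the paper's method: the paper derives the exact distribution under a Markov model through a generating function and a recurrence, whereas this sketch only counts distances in one observed sequence. The function names `occurrence_starts` and `distance_counts` are hypothetical, and measuring distance between start positions (with overlapping occurrences allowed) is an assumption that may differ from the paper's convention.

```python
from collections import Counter


def occurrence_starts(sequence, words):
    """Return the sorted start positions at which any word of `words` occurs.

    Overlapping occurrences are counted (an assumption of this sketch).
    """
    starts = []
    for i in range(len(sequence)):
        # str.startswith(prefix, start) tests for a match beginning at index i.
        if any(sequence.startswith(w, i) for w in words):
            starts.append(i)
    return starts


def distance_counts(sequence, words):
    """Count distances between successive occurrence start positions."""
    starts = occurrence_starts(sequence, words)
    return Counter(b - a for a, b in zip(starts, starts[1:]))


if __name__ == "__main__":
    # Toy input: occurrences of "AB" or "BA" start at positions 0, 1, 2, 4, 5.
    counts = distance_counts("ABABBAB", {"AB", "BA"})
    print(dict(sorted(counts.items())))  # {1: 3, 2: 1}
```

Applied to a long sequence, such empirical counts could be compared with the probabilities obtained from the recurrence described in the abstract.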