Exact distribution of the distances between any occurrences of a set of words

Stéphane ROBIN and Jean-Jacques DAUDIN

*Ann. Inst. Statist. Math.*, **36** 895-905.

**Abstract**

The distribution of the distance between two (or more) successive
occurrences of a specific word in a random sequence of letters is known under
different models. In this paper, a more general problem is studied: the
distribution of the distance between two (or more) successive occurrences of
any word of a given set under a Markov model for the sequence. The
generating function and a recurrence for obtaining the probabilities are
given. These results are applied to study the distribution of the "CHI"
motif in the genome sequence of *Haemophilus influenzae*.
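To make the object of study concrete, here is a minimal Python sketch that computes the exact probabilities P(D = d) for the distance D from an occurrence of any word of a set to the next one. It is not the authors' generating-function or semi-Markov derivation. It swaps in a different, standard technique: Markov-chain embedding with a prefix automaton in the style of Aho-Corasick.

The sketch rests on several assumptions of its own:

- the letters follow a homogeneous, irreducible first-order Markov chain, given as full transition rows;
- the sequence is in its stationary regime;
- occurrences are indexed by their end position, and overlapping occurrences count;
- D is averaged over all occurrences, not conditioned on which word occurred.

The names `build_automaton`, `stationary`, `step` and `distance_distribution` are hypothetical.

```python
from fractions import Fraction


def build_automaton(words, alphabet):
    """Prefix automaton (Aho-Corasick style); states are prefixes of the words.

    After reading a text, the state is the longest suffix of the text that is a
    prefix of some word; an occurrence of a word of the set ends exactly when
    the state has one of the words as a suffix.
    """
    prefixes = {""}
    for w in words:
        prefixes.update(w[:k] for k in range(1, len(w) + 1))
    delta = {}
    for p in prefixes:
        for a in alphabet:
            s = p + a
            while s not in prefixes:  # drop letters on the left until s is a prefix
                s = s[1:]
            delta[(p, a)] = s
    hit = {p: any(p.endswith(w) for w in words) for p in prefixes}
    return delta, hit


def stationary(P, alphabet):
    """Stationary law nu (nu P = nu) of an irreducible letter chain, solved exactly.

    Assumes P[a][b] is given (as a Fraction) for every pair of letters a, b.
    """
    n = len(alphabet)
    # Row j: sum_i nu_i (P[i][j] - [i == j]) = 0; the last row becomes sum_i nu_i = 1.
    A = [[P[alphabet[i]][alphabet[j]] - (1 if i == j else 0) for i in range(n)]
         + [Fraction(0)] for j in range(n)]
    A[-1] = [Fraction(1)] * (n + 1)
    for c in range(n):  # Gauss-Jordan elimination in exact rational arithmetic
        r = next(r for r in range(c, n) if A[r][c] != 0)
        A[c], A[r] = A[r], A[c]
        pivot = A[c][c]
        A[c] = [x / pivot for x in A[c]]
        for r2 in range(n):
            if r2 != c and A[r2][c] != 0:
                f = A[r2][c]
                A[r2] = [x - f * y for x, y in zip(A[r2], A[c])]
    return {alphabet[i]: A[i][n] for i in range(n)}


def step(dist, P, delta):
    """One step of the joint chain (automaton state, last letter)."""
    new = {}
    for (q, a), v in dist.items():
        for b, pab in P[a].items():
            key = (delta[(q, b)], b)
            new[key] = new.get(key, 0) + v * pab
    return new


def distance_distribution(words, P, dmax):
    """P(D = d) for d = 1..dmax, where D is the distance from an occurrence of any
    word of `words` to the next one, in a stationary first-order Markov sequence."""
    alphabet = sorted(P)
    delta, hit = build_automaton(words, alphabet)
    nu = stationary(P, alphabet)
    # Draw one letter from nu, then read L - 1 more letters (L = longest word):
    # the state depends only on the last L letters, so its law is now stationary.
    dist = {}
    for a in alphabet:
        key = (delta[("", a)], a)
        dist[key] = dist.get(key, 0) + nu[a]
    for _ in range(max(len(w) for w in words) - 1):
        dist = step(dist, P, delta)
    # Condition on an occurrence ending at the current position.
    z = sum(v for (q, a), v in dist.items() if hit[q])
    cur = {(q, a): v / z for (q, a), v in dist.items() if hit[q]}
    probs = []
    for _ in range(dmax):
        cur = step(cur, P, delta)
        probs.append(sum(v for (q, a), v in cur.items() if hit[q]))
        # Keep only the paths on which no further occurrence has happened yet.
        cur = {(q, a): v for (q, a), v in cur.items() if not hit[q]}
    return probs


if __name__ == "__main__":
    half = Fraction(1, 2)
    iid = {x: {y: half for y in "ab"} for x in "ab"}  # i.i.d. uniform letters on {a, b}
    print([str(p) for p in distance_distribution(["a"], iid, 3)])  # ['1/2', '1/4', '1/8']
```

In the i.i.d. example, D is geometric, as expected for a single one-letter word. The same call accepts a set of several words and a genuine Markov transition matrix.

Arithmetic uses `Fraction` so that the probabilities are exact rather than floating-point. The cost grows with the number of automaton states times the alphabet size. That is small for short motifs, but this sketch makes no attempt to match the efficiency of the paper's recurrence.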

**Key words and phrases**
distance between occurrences, genome sequence analysis, semi-Markov process.