# An overview on the distribution of word counts in Markov chains

### Sophie SCHBATH

### *J. Comp. Biol.* **7**, 2000.

**Abstract**

This paper gives an overview of the existing results on the statistical
distribution of word counts in a Markovian sequence of letters. We present
results on the number of overlapping occurrences, the number of
renewals, and the number of clumps, for counts of single words as well as
of multiple words.
Most of the results are asymptotic approximations, valid as the
length of the sequence tends to infinity. Gaussian approximations
give way to (compound) Poisson approximations
for rare words.
When DNA or protein sequences are modeled by stationary Markov chains, these
results can be used to study the statistical frequency of
motifs in a given sequence.

**Key words and phrases**
word count distribution, Markovian random sequence,
overlapping occurrences, renewals, clumps.
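
To make the three counts named above concrete, here is a minimal Python sketch. It assumes the usual definitions: an overlapping occurrence is any position where the word starts, a renewal is an occurrence that does not overlap the previous renewal when scanning left to right, and a clump is a maximal run of mutually overlapping occurrences. The function names are illustrative and are not taken from the paper.

```python
def occurrence_starts(seq, word):
    """Start indices of all (possibly overlapping) occurrences of word in seq."""
    h = len(word)
    return [i for i in range(len(seq) - h + 1) if seq[i:i + h] == word]


def count_renewals(starts, h):
    """Count occurrences that do not overlap the previously counted renewal."""
    count = 0
    next_free = 0  # first position not covered by the last renewal
    for s in starts:
        if s >= next_free:
            count += 1
            next_free = s + h
    return count


def count_clumps(starts, h):
    """Count maximal runs of overlapping occurrences (clumps)."""
    count = 0
    prev = None  # start of the previous occurrence, whether counted or not
    for s in starts:
        # A new clump begins when this occurrence does not overlap the previous one.
        if prev is None or s >= prev + h:
            count += 1
        prev = s
    return count


# Toy example: the self-overlapping word "AA" in a short sequence.
seq, word = "AAAAAGAA", "AA"
starts = occurrence_starts(seq, word)
n_overlapping = len(starts)                   # 5 occurrences
n_renewals = count_renewals(starts, len(word))  # 3 renewals
n_clumps = count_clumps(starts, len(word))      # 2 clumps
```

For a word that cannot overlap itself, the three counts coincide; they differ only for self-overlapping words such as `AA` in this toy example.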
