Compound Poisson and Poisson Process Approximations
for Occurrences of Multiple Words
in Markov Chains

Gesine REINERT and Sophie SCHBATH

J. Comp. Biol., vol. 5, 223-254, 1998.


We derive a Poisson process approximation for the occurrences of clumps of multiple words, and a compound Poisson process approximation for the number of occurrences of multiple words, in a sequence of letters generated by a stationary Markov chain. Using the Chen-Stein method, we provide a bound on the error in the approximations. For rare words, these errors tend to zero as the length of the sequence tends to infinity. Modeling a DNA sequence as a stationary Markov chain, we show as an application that the compound Poisson approximation is efficient for the number of occurrences of rare stem-loop motifs.
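To make the distinction between occurrences and clumps concrete, here is a minimal Python sketch. It is an illustration only, not the authors' method: the function names `occurrences` and `clump_sizes` are hypothetical, and a clump is taken here to be a maximal run of overlapping occurrences of words from the family. The total number of occurrences is then the sum of the clump sizes, which is the quantity the compound Poisson approximation targets.

```python
def occurrences(seq, words):
    """Return sorted (start, end) pairs of all occurrences, overlaps included.

    `end` is exclusive, so an occurrence covers seq[start:end].
    """
    hits = []
    for w in words:
        start = seq.find(w)
        while start != -1:
            hits.append((start, start + len(w)))
            # Step by one letter so overlapping occurrences are found too.
            start = seq.find(w, start + 1)
    return sorted(hits)


def clump_sizes(hits):
    """Group sorted occurrences into clumps and return the size of each clump.

    Assumption for this sketch: an occurrence joins the current clump when it
    starts before the clump ends (i.e. it shares at least one letter with it).
    """
    sizes = []
    clump_end = -1
    for start, end in hits:
        if sizes and start < clump_end:
            sizes[-1] += 1
            clump_end = max(clump_end, end)
        else:
            sizes.append(1)
            clump_end = end
    return sizes


seq = "AATATATGGATAC"          # toy sequence, not real data
hits = occurrences(seq, ["ATA"])
sizes = clump_sizes(hits)
print(len(hits), "occurrences in", len(sizes), "clumps of sizes", sizes)
```

In the toy sequence, the word ATA occurs three times, but the first two occurrences overlap and therefore form a single clump of size 2.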

Key words and phrases: Chen-Stein method, stem-loop motifs, compound Poisson approximation, Poisson process approximation, occurrences of multiple words.

