Abstract
We present a compound Poisson model that describes the occurrence process of a set of words in a random sequence of letters. The model takes into account both the frequency of the words and their overlapping structure. It is compared with the Markov chain model in terms of fit and parsimony. Special attention is paid to the detection of regions that are poor or rich in occurrences of the words. Several applications of the model are presented, and a combination of the Markov and compound Poisson models is finally proposed.
Key words and phrases: compound Poisson process, DNA sequences, homogeneity checking, Markov chain, word occurrences.