rmes.poisson.composee

rmes.poisson.composee Compound Poisson Approximation of Long Word Counts

Description

For each given word or word family, the program rmes.poisson.composee computes the quantities count, exp, exp_p, A, stat and rank under the Markov chain model of order <m>, using the compound Poisson approximation recommended for rare words (the quantity A is not produced for families). The statistic stat follows the standard Gaussian distribution N(0,1). A large positive value (resp. large negative value) of this statistic indicates that the word is over-represented (resp. under-represented).
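Since stat is described as following N(0,1), it can be read as a standard normal score. The following Python sketch shows how such a score maps to a tail probability. The function names and the threshold of 3.0 are illustrative assumptions, not part of R'MES, and the code does not reproduce the compound Poisson computation itself.

```python
import math


def upper_tail_p(stat: float) -> float:
    """P(Z >= stat) for Z ~ N(0,1): small when stat is large and positive."""
    return 0.5 * math.erfc(stat / math.sqrt(2.0))


def lower_tail_p(stat: float) -> float:
    """P(Z <= stat) for Z ~ N(0,1): small when stat is large and negative."""
    return 0.5 * math.erfc(-stat / math.sqrt(2.0))


def classify(stat: float, threshold: float = 3.0) -> str:
    """Label a word from its stat value.

    The threshold is an arbitrary illustrative choice (an assumption),
    not a value prescribed by the R'MES documentation.
    """
    if stat >= threshold:
        return "over-represented"
    if stat <= -threshold:
        return "under-represented"
    return "not exceptional"


if __name__ == "__main__":
    for s in (-4.0, 0.0, 4.0):
        print(s, upper_tail_p(s), lower_tail_p(s), classify(s))
```

When many words are analyzed at once, a stricter threshold may be appropriate because of multiple testing; that choice is left to the user.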

WARNING: Because of numerical limitations, rmes.poisson.composee is not recommended for frequent or short words, namely words or families expected more than 100 times. A warning message appears each time a p-value cannot be calculated correctly.

Usage

rmes.poisson.composee -h
or
rmes.poisson.composee -o <prefix> -seq <DNA-sequence-file> [-m <m> | -max] [ [-hmin <hmin>] [-hmax <hmax>] | -fam <family-file> ] [-english]
Note:
This algorithm does not take into account the phase (or reading frame).

Options

-h: prints only the syntax of the command.
-o <prefix>: <prefix> is the character string used to name the output file. A file named <prefix>.0 is created. Any existing file with that name is replaced.
-seq: This option introduces the name of the sequence file.
-m <m> or -max: When -m <m> is set, <m> is the order of the model and must satisfy <m> >= 0.
When -max is set, each word of length h (see -hmin and -hmax) is studied under the model of order h-2, that is, the maximal model.
By default, the order of the model is 1.
-hmin <hmin>: When set, <hmin> corresponds to the minimal length of words to study.
By default, <hmin> is equal to <m> + 2 (or 3 if the option -m is not set).
-hmax <hmax>: When set, <hmax> corresponds to the maximal length of words to study.
By default, <hmax> is equal to the value of <hmin>.
Maximum value: 15.
-fam <family-file>: When this option is set, families of words are analyzed.
<family-file> is the name of the file in which the families are described. The options -hmin and -hmax are then ignored.
When this option is not set, all words of length between <hmin> and <hmax> are analyzed.
-english or -eng: When set, the messages are in English.
By default, they are in French.
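The defaults for -m, -max, -hmin and -hmax described above interact, so a small sketch may help. The Python function below is a hypothetical helper (not part of R'MES) that returns, for each word length studied, the model order that would apply according to the rules stated in this section. Treating -m and -max as mutually exclusive follows the usage line; rejecting a negative order under -max is an assumption.

```python
from typing import List, Optional, Tuple

MAX_WORD_LENGTH = 15  # "Maximum value: 15" stated for -hmax


def resolve_word_lengths(
    m: Optional[int] = None,
    use_max: bool = False,
    hmin: Optional[int] = None,
    hmax: Optional[int] = None,
) -> List[Tuple[int, int]]:
    """Return (word length, model order) pairs implied by the options."""
    if m is not None and use_max:
        raise ValueError("-m and -max cannot be combined")
    if m is not None and m < 0:
        raise ValueError("<m> must satisfy <m> >= 0")

    order = 1 if m is None else m          # default order is 1
    if hmin is None:
        hmin = m + 2 if m is not None else 3  # <m> + 2, or 3 if -m is not set
    if hmax is None:
        hmax = hmin                        # default <hmax> equals <hmin>
    if hmax > MAX_WORD_LENGTH:
        raise ValueError("<hmax> must not exceed 15")
    if hmin > hmax:
        raise ValueError("<hmin> must not exceed <hmax>")

    pairs = []
    for h in range(hmin, hmax + 1):
        h_order = h - 2 if use_max else order  # -max: maximal model, order h-2
        if h_order < 0:
            # Assumption: a negative order is meaningless, so reject it here.
            raise ValueError("word length too short for the maximal model")
        pairs.append((h, h_order))
    return pairs


if __name__ == "__main__":
    print(resolve_word_lengths())                            # no options set
    print(resolve_word_lengths(m=1, hmin=7, hmax=8))         # fixed order
    print(resolve_word_lengths(use_max=True, hmin=7, hmax=8))  # maximal model
```

The sketch does not cover -fam, for which -hmin and -hmax are ignored.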

Limits

A warning message lists the words for which the p-value could not be calculated correctly (too close to 1 or 0). These words are nonetheless exceptional, and their statistic is calculated from the last correct p-value.

Example

Computation of the statistics of all words of length 7 and 8 in the lambda genome under the order-1 Markov model (M1). The output file lambda.pc.7-8_1.0 can then be formatted using rmes.format or displayed with the Splus functions.

rmes.poisson.composee -o lambda.pc.7-8_1 -seq lambda -hmin 7 -hmax 8 -m 1
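For scripted runs, the example invocation above can be assembled programmatically. The Python sketch below builds the argument list and derives the output file name from the <prefix>.0 rule given under -o. The helper names are hypothetical, and the sketch only constructs the command; it does not run the program.

```python
import shlex
from typing import List, Optional

PROGRAM = "rmes.poisson.composee"


def build_command(
    prefix: str,
    seq_file: str,
    hmin: Optional[int] = None,
    hmax: Optional[int] = None,
    m: Optional[int] = None,
    english: bool = False,
) -> List[str]:
    """Assemble the argument list using only options documented above."""
    args = [PROGRAM, "-o", prefix, "-seq", seq_file]
    if hmin is not None:
        args += ["-hmin", str(hmin)]
    if hmax is not None:
        args += ["-hmax", str(hmax)]
    if m is not None:
        args += ["-m", str(m)]
    if english:
        args.append("-english")
    return args


def output_file(prefix: str) -> str:
    """The program writes its results to <prefix>.0."""
    return f"{prefix}.0"


if __name__ == "__main__":
    cmd = build_command("lambda.pc.7-8_1", "lambda", hmin=7, hmax=8, m=1)
    print(shlex.join(cmd))                 # the command line as a single string
    print(output_file("lambda.pc.7-8_1"))  # name of the file to pass to rmes.format
```

Assuming the executable is installed and on the PATH, the list returned by build_command could be passed to subprocess.run(cmd, check=True).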