

rmes.gaussien
Gaussian Approximation of Word Counts





Description

For each given word or family of words, the program rmes.gaussien computes the quantities count, exp, sigma2, stat and rank under the Markov chain model of order <m>, using by default the conditional asymptotic Gaussian approximation. The statistic stat approximately follows the standard Gaussian distribution N(0,1). A large positive (resp. large negative) value of this statistic indicates that the word or family is over-represented (resp. under-represented).
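
The relation between these quantities can be sketched as follows. This assumes, as the column names suggest but this page does not state explicitly, that stat is the observed count standardized by the estimated expected count exp and the estimated variance sigma2:

```latex
\mathrm{stat} = \frac{\mathrm{count} - \mathrm{exp}}{\sqrt{\mathrm{sigma2}}}
\;\sim\; \mathcal{N}(0,1) \quad \text{(approximately)}
```

Under this approximation, for a single word, |stat| > 1.96 corresponds to a two-sided test at the 5% level.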

Usage

rmes.gaussien [-english] -h
or
rmes.gaussien -o <prefix> 
              -seq DNA-sequence-file
              [-phase] 
              [-m <m> | -max] 
              [ [-hmin <hmin>] [-hmax <hmax>] | -fam  family-file ]
              [-mart [-rev] ]
              [-english] 

Options

-h
gives only the syntax of the command.
-o <prefix>
<prefix> is the character string that will be used to name the output files. A file named <prefix>.0 is created when -phase is not set. Otherwise, four files are created, named <prefix>.1, <prefix>.2, <prefix>.3 and <prefix>.4.
Existing files with the same names are overwritten.
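
A minimal sketch of the naming rule above. The function rmes_outputs is hypothetical (it is not part of R'MES); it only prints the file names that rmes.gaussien is documented to create for a given prefix, with .4 holding all phases together as in the phased examples.

```shell
# Hypothetical helper (not part of R'MES): list the output files
# documented for a given -o prefix.
# Usage: rmes_outputs PREFIX [phase]
rmes_outputs() {
  local prefix=$1 k
  if [ "${2:-}" = phase ]; then
    # -phase set: phases 1, 2, 3, then all phases together (.4)
    for k in 1 2 3 4; do
      echo "$prefix.$k"
    done
  else
    # -phase not set: a single file suffixed .0
    echo "$prefix.0"
  fi
}
```

For example, rmes_outputs lambda.gaussien.3_1 prints lambda.gaussien.3_1.0.
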
-seq
This option gives the name of the DNA sequence file to analyze.
-phase
When set, the phase (or reading frame) is taken into account.
The phase of a letter (base) in the sequence corresponds to its position in the sequence modulo 3. A given base can be in phase 1, 2 or 3. The phase of a given word in the sequence is defined to be the phase of its last letter in the sequence.
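
The phase rule can be sketched with two small shell functions. Both names are hypothetical, and the sketch assumes that positions are counted from 1, so that positions 1, 2, 3, 4, ... fall in phases 1, 2, 3, 1, ... (a position divisible by 3 is taken to be in phase 3).

```shell
# Hypothetical helper: phase of the base at 1-based position p
# (assumption: positions 1, 2, 3, 4, ... map to phases 1, 2, 3, 1, ...).
base_phase() {
  local p=$1
  echo $(( (p - 1) % 3 + 1 ))
}

# Hypothetical helper: phase of a word occurrence, defined as the phase
# of its last letter. Arguments: 1-based start position, word length h.
word_phase() {
  local start=$1 h=$2
  base_phase $(( start + h - 1 ))
}
```

For instance, a 4-word starting at position 2 ends at position 5, so word_phase 2 4 prints 2.
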
-m <m> or -max
When -m is set, <m> represents the order of the model and must satisfy <m> >=0.
When -max is set, each word of length h (see -hmin and -hmax) is studied under the model of order h-2, that is, the maximal model.
By default, the order of the model is 1.
-hmin <hmin>
When set, <hmin> corresponds to the minimal length of words to study.
By default, <hmin> is equal to <m> + 2 (or 3 if the option -m is not set).
-hmax <hmax>
When set, <hmax> corresponds to the maximal length of words to study.
By default, <hmax> is equal to <hmin>.
Maximum value: 15.
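
The defaults for -m, -max, -hmin and -hmax can be summarized by a sketch that prints, for each word length h, the order of the model under which it would be studied. The function rmes_plan is hypothetical and only restates the rules given on this page; it does not call rmes.gaussien.

```shell
# Hypothetical helper (not part of R'MES): print "h order" for each word
# length that would be studied.
# Usage: rmes_plan ORDER [HMIN [HMAX]]
#   ORDER is a number (-m), "max" (-max), or "" (neither option set)
rmes_plan() {
  local order=$1 hmin=${2:-} hmax=${3:-} h
  if [ -z "$hmin" ]; then
    # Default hmin: m + 2 when -m is given, otherwise 3.
    case $order in
      ''|max) hmin=3 ;;
      *)      hmin=$(( order + 2 )) ;;
    esac
  fi
  # Default hmax: same as hmin.
  : "${hmax:=$hmin}"
  if [ "$hmax" -gt 15 ]; then
    echo "hmax must not exceed 15" >&2
    return 1
  fi
  h=$hmin
  while [ "$h" -le "$hmax" ]; do
    case $order in
      max) echo "$h $(( h - 2 ))" ;;  # maximal model: order h-2
      '')  echo "$h 1" ;;             # default order is 1
      *)   echo "$h $order" ;;
    esac
    h=$(( h + 1 ))
  done
}
```

For example, rmes_plan max 3 6 lists lengths 3 to 6 under orders 1 to 4, matching the maximal models of Example 4.
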
-fam
When this option is set, families of words are analyzed.
family-file is the name of the file that describes the families. The options -hmin and -hmax are then ignored.
When this option is not set, all the words of length between <hmin> and <hmax> are analyzed.
-mart, -rev
When -mart is set, the algorithm Mart (martingale) is used; when -mart and -rev are both set, the algorithm Mart_r (martingale reverse) is used.
By default, the algorithm Cond_as (conditional asymptotic) is used.
-english or -eng
When set, the messages are in English.
By default, they are in French.


Examples

1- Computation of the statistics of the 3-words in the lambda genome and in the pt7 genome under M1. The output files lambda.gaussien.3_1.0 and pt7.gaussien.3_1.0 can then be formatted using rmes.format or displayed with the Splus functions.

rmes.gaussien -o lambda.gaussien.3_1 -seq lambda \
              -hmin 3 -m 1

rmes.gaussien -o pt7.gaussien.3_1 -seq pt7 \
              -hmin 3 -m 1
 
2- Computation of the statistics of the phased 4-words in a coding DNA sequence of E. coli (coli-codant) under M2_3. Four output files, coli-codant.gaussien.4_2.1, .2, .3 and .4, are then created, corresponding to the words in phase 1, 2 and 3 and in all phases together.

rmes.gaussien -o coli-codant.gaussien.4_2 -seq coli-codant \
              -hmin 4 -m 2 -phase

3- Computation of the statistics of the 4-words in the ecomori sequence under M1 and then under M2. The output files ecomori.gaussien.4_1.0 and ecomori.gaussien.4_2.0 are then created.

rmes.gaussien -o ecomori.gaussien.4_1 -seq ecomori \
              -hmin 4 -m 1
 
rmes.gaussien -o ecomori.gaussien.4_2 -seq ecomori \
              -hmin 4 -m 2

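
When the same sequence is analyzed under several orders, as in Example 3, the commands can be generated in a loop. The function gaussien_orders is hypothetical; it prints (rather than runs) one command per order, following the <sequence>.gaussien.<h>_<m> prefix pattern used in these examples, which is a naming convention and not a requirement of rmes.gaussien.

```shell
# Hypothetical helper: print one rmes.gaussien command per Markov order.
# Usage: gaussien_orders SEQ H ORDER...
gaussien_orders() {
  local seq=$1 h=$2 m
  shift 2
  for m in "$@"; do
    echo "rmes.gaussien -o $seq.gaussien.${h}_$m -seq $seq -hmin $h -m $m"
  done
}
```

For example, gaussien_orders ecomori 4 1 2 prints the two commands of Example 3; piping the output to sh (gaussien_orders ecomori 4 1 2 | sh) runs them.
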
4- Computation of the statistics of the 3-words, 4-words, 5-words and 6-words in the ecomori sequence under maximal models, using the Mart algorithm. The output file ecomori.gaussien.mart.3-6_max.0 is then created.

rmes.gaussien -o ecomori.gaussien.mart.3-6_max -seq ecomori \
              -hmin 3 -hmax 6 -max -mart

5- Computation of the statistics of the phased 4-words, 5-words and 6-words in a coding DNA sequence of E. coli (coli-codant) under M2_3. Four output files, coli-codant.gaussien.4-6_2.1, .2, .3 and .4, are then created, corresponding to the words in phase 1, 2 and 3 and in all phases together.

rmes.gaussien -o coli-codant.gaussien.4-6_2 -seq coli-codant \
              -hmin 4 -hmax 6 -m 2 -phase

6- Computation of the statistics of four families rny, rnr, ynr and yry (contained in the family file myfam) in the lambda genome under the model M1. The file lambda.gaussien.myfam_1.0 is created.

rmes.gaussien -o lambda.gaussien.myfam_1 -seq lambda -fam myfam -m 1

7- Computation of the statistics of families of words on the sequences hf-rep (the replication genome of H. influenzae) and hf-rep-sans-uptake (the replication genome of H. influenzae without the uptake sequence aagtgcggt and its complementary sequence) under the model M1. The family file is fam.xnxxxxxx.

rmes.gaussien -o hf-rep.xnxxxxxx_1 -m 1 -fam fam.xnxxxxxx -seq hf-rep

rmes.gaussien -o hf-rep-sans-uptake.xnxxxxxx_1 -m 1 \
              -fam fam.xnxxxxxx -seq hf-rep-sans-uptake
