

Families of Words



Families of words can be analyzed by using the programs rmes.gaussien or rmes.poisson.composee.


What kind of family?


We consider families that are sets of words of the same length, written on the {a,c,g,t} alphabet. For instance, (aaa,aga,aca,ata) is a family of four 3-letter words that can be named ana, where n stands for any of the four letters. Several families can be analyzed simultaneously; for this, all families must contain the same number of words, and all words must have the same length. The families of interest are described in a family file according to the following specific format.
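To make the link between a family name such as ana and its list of words concrete, here is a minimal Python sketch that expands a pattern into its words. It is only an illustration, not the way R'MES itself builds families: the function name expand_pattern and the CODES table are hypothetical, and the meaning of r (a or g) and y (c or t) is assumed from the usual nucleotide codes, which agrees with the example families shown on this page.

```python
from itertools import product

# Assumed letter codes: n = any letter, r = a or g, y = c or t.
CODES = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ag", "y": "ct", "n": "acgt",
}

def expand_pattern(pattern):
    """Return the list of all words matched by the pattern."""
    choices = [CODES[ch] for ch in pattern.lower()]
    return ["".join(letters) for letters in product(*choices)]

print(expand_pattern("ana"))       # ['aaa', 'aca', 'aga', 'ata']
print(len(expand_pattern("rny")))  # 16
```

The words come out in the order produced by itertools.product, which may differ from the order used in a hand-written family file.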


Format of a family file


The file where families are described should be created with the following format:
  1. a title: a character string followed by the character "#"
  2. n.fams: the number of families
  3. n.words: the number of words in each family.
    (Each family must contain the same number of words)
  4. w.length: the length of the words.
    (All words must have the same length, whatever the family)
  5. For each of the n.fams families:
    1. its name (a character string) to identify the family in the results.
    2. the list of its n.words words. These words are written on the {A,C,G,T,a,c,g,t} alphabet.

Remark: The file where families are described can be generated automatically from a
pattern by using the command rmes.gfam.
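
When a family file is written by hand, it can be useful to check it against the format above before running an analysis. The following Python sketch reads such a file and verifies the announced counts and word lengths. It is not part of R'MES: the function name parse_family_file is hypothetical, and the sketch assumes that the title ends at the first "#" and that all remaining items are separated by whitespace, as in the example on this page.

```python
def parse_family_file(text):
    """Parse family-file text into (title, {family name: list of words})."""
    # Assumption: the title is everything before the first '#'.
    title, _, body = text.partition("#")
    tokens = body.split()
    n_fams, n_words, w_length = (int(t) for t in tokens[:3])
    pos = 3
    families = {}
    for _ in range(n_fams):
        if pos >= len(tokens):
            raise ValueError("fewer families than announced")
        name = tokens[pos]
        words = tokens[pos + 1 : pos + 1 + n_words]
        pos += 1 + n_words
        if len(words) != n_words:
            raise ValueError(f"family {name!r}: expected {n_words} words")
        for w in words:
            # Every word must have length w_length and use only a, c, g, t.
            if len(w) != w_length or set(w.lower()) - set("acgt"):
                raise ValueError(f"family {name!r}: invalid word {w!r}")
        families[name] = words
    return title.strip(), families

example = """Two families of 4 words of length 3 #
2 4 3
ana
  aaa aga aca ata
cnc
  cac cgc ccc ctc
"""
title, families = parse_family_file(example)
print(title)
print(families["cnc"])
```

The small input used here is made up for the illustration; the same function can be applied to the contents of any file that follows the format.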



Example of a family file:

Families rny, rnr, ynr and yny of 16 words of length 3 #
4 16 3
rny
  aac  agc  acc  atc  
  aat  agt  act  att
  gac  ggc  gcc  gtc
  gat  ggt  gct  gtt

rnr
  aaa  aga  aca  ata
  aag  agg  acg  atg
  gaa  gga  gca  gta
  gag  ggg  gcg  gtg

ynr
  caa  cga  cca  cta
  cag  cgg  ccg  ctg
  taa  tga  tca  tta
  tag  tgg  tcg  ttg

yny
  cac  cgc  ccc  ctc
  cat  cgt  cct  ctt
  tac  tgc  tcc  ttc
  tat  tgt  tct  ttt


Warning:
Analyzing families whose pattern starts or ends with n is of no interest, since it is equivalent to studying shorter words. (One must then be careful when choosing the order of the Markov model.) For instance, the family natc reduces to the single word atc, and obviously cannot be analyzed in the M2 model.
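
The reduction described in this warning can be sketched in a few lines of Python. The helper names effective_pattern and max_markov_order are hypothetical, and the bound on the model order (word length minus 2) is an assumption; it is consistent with the natc example, where the reduced word atc has length 3 and the M2 model is excluded.

```python
def effective_pattern(pattern):
    """Strip leading and trailing n's, which impose no constraint."""
    return pattern.lower().strip("n")

def max_markov_order(pattern):
    # Assumption: the order m must satisfy m <= (effective length) - 2.
    return len(effective_pattern(pattern)) - 2

print(effective_pattern("natc"))  # atc
print(max_markov_order("natc"))   # 1
print(effective_pattern("ana"))   # ana
```

An n inside the pattern, as in ana, is kept, since it does constrain the distance between the surrounding letters.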
