
ISLAND:
Program to simulate the progress of an STS mapping project





Function:



This program calculates or estimates the progress of a mapping project that uses the anchoring approach. It reports the (mean) number of anchored islands, the mean length of an anchored island and the (mean) proportion of the genome covered by anchored islands.

The clone and anchor (STS) locations along the genome can be
- either read from previously generated input files,
- or simulated according to a specified model.

Clones that share at least one anchor are assembled into anchored islands.
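The assembly rule can be illustrated with a minimal C++ sketch. It is not taken from the ISLAND sources: the names Clone, Island, assemble_islands, mean_island_length and covered_fraction are hypothetical, and the sketch assumes that a clone contains an anchor when left <= anchor <= right, that clones containing no anchor are ignored, that the length of an island is its rightmost end minus its leftmost end, and that coverage is the union of the island intervals.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

struct Clone  { double left, right; };   // hypothetical type: clone ends
struct Island { double left, right; };   // hypothetical type: island span

// Union-find "find" with path halving.
static int find_root(std::vector<int>& parent, int i) {
    while (parent[i] != i) {
        parent[i] = parent[parent[i]];
        i = parent[i];
    }
    return i;
}

// Groups clones that share an anchor (directly or through a chain of
// shared anchors) and returns the span of each group.
std::vector<Island> assemble_islands(const std::vector<Clone>& clones,
                                     const std::vector<double>& anchors) {
    const int n = static_cast<int>(clones.size());
    std::vector<int> parent(n);
    std::iota(parent.begin(), parent.end(), 0);
    std::vector<bool> anchored(n, false);

    for (double a : anchors) {
        int first = -1;  // first clone found to contain this anchor
        for (int i = 0; i < n; ++i) {
            assert(clones[i].left <= clones[i].right);
            if (clones[i].left <= a && a <= clones[i].right) {
                anchored[i] = true;
                if (first < 0) {
                    first = i;
                } else {
                    const int ri = find_root(parent, i);
                    const int rf = find_root(parent, first);
                    parent[ri] = rf;
                }
            }
        }
    }

    // ASSUMPTION: clones that contain no anchor do not form islands.
    std::vector<Island> span(n);
    std::vector<bool> seen(n, false);
    for (int i = 0; i < n; ++i) {
        if (!anchored[i]) continue;
        const int r = find_root(parent, i);
        if (!seen[r]) {
            seen[r] = true;
            span[r] = {clones[i].left, clones[i].right};
        } else {
            span[r].left = std::min(span[r].left, clones[i].left);
            span[r].right = std::max(span[r].right, clones[i].right);
        }
    }
    std::vector<Island> islands;
    for (int r = 0; r < n; ++r) {
        if (seen[r]) islands.push_back(span[r]);
    }
    return islands;
}

// ASSUMPTION: island length is right end minus left end.
double mean_island_length(const std::vector<Island>& islands) {
    if (islands.empty()) return 0.0;
    double total = 0.0;
    for (const Island& is : islands) total += is.right - is.left;
    return total / islands.size();
}

// ASSUMPTION: coverage is the union of island spans over the genome length.
double covered_fraction(std::vector<Island> islands, double genome_length) {
    assert(genome_length > 0);
    std::sort(islands.begin(), islands.end(),
              [](const Island& a, const Island& b) { return a.left < b.left; });
    double covered = 0.0, cur_l = 0.0, cur_r = 0.0;
    bool open = false;
    for (const Island& is : islands) {
        if (!open) {
            cur_l = is.left; cur_r = is.right; open = true;
        } else if (is.left <= cur_r) {
            cur_r = std::max(cur_r, is.right);
        } else {
            covered += cur_r - cur_l;
            cur_l = is.left; cur_r = is.right;
        }
    }
    if (open) covered += cur_r - cur_l;
    return covered / genome_length;
}
```

Under these assumptions two islands may overlap physically when their clones overlap without sharing an anchor, so the covered fraction is not simply the sum of island lengths divided by the genome length. The quadratic scan over clones and anchors is kept for readability; a sweep over sorted input would be faster.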

This program is written in C++.


References:

Usage:






A. When the clone and anchor locations are read from input files



A.1 Input files

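The user provides two input files: a clone locations file and an anchor locations file. Their layout is illustrated in the example of paragraph A.4.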


A.2 Output

The results are:

  • the number of anchored islands obtained,
  • the average length of an anchored island,
  • the percentage of genome covered by anchored islands.


A.3 Usage


A.3.1 With a Unix command

island -c[lones] filename-for-clones 
       -a[nchors] filename-for-anchors
       [-f[rench]]
The clone locations file and the anchor locations file are read-only inputs; the program does not modify them.

The results are printed on the standard output.

When the -french option is chosen, all messages are in French; by default, they are in English.



A.3.2 From a program

island can also be called from a program or a host system. To do so, compile the C++ source files provided in the package and link the resulting object files with the calling program or system.

There are two possibilities:

  1. call a function that prints the results on the standard output,

  2. call the computation function directly; in that case, the results are returned through arguments.

In both cases, a global variable must be declared and initialized in the calling program:

int french=0;
The value 0 means that messages are in English.
To get French messages, set french to 1.


A.3.2.1 Usage from a program, results printed:

The function to call is declared as follows:

void main_lecture(char* fileclone, char* fileanchor);
The fileclone and fileanchor arguments contain the pathname of the clone locations file and the pathname of the anchor locations file, respectively.


A.3.2.2 Usage from a program, input and output transferred in arguments:

The function to call is declared as follows:


void commun(int SIZE, int M, int N, int inter, double Ginit, 
            double Np, double max, double min,
            ifstream& ifileclone, ifstream& ifileancre,
            ofstream& ofile,
            double& NbMoy, double& LgMoy, double& OceanMoy,
            double& VarNbIle, double& VarLgIle, double& VarOcean);

Remark:
This function is used both when the clone and anchor locations are read from input files and when they are simulated, so the arguments are described for both cases.

Input arguments:

SIZE: Number of iterations when the locations are simulated; 1 otherwise.

M: Mean number of anchors when the locations are simulated; ignored otherwise.

N: Mean number of clones when the locations are simulated; ignored otherwise.

inter: Variability of the clone lengths (in bp) when the locations are simulated; ignored otherwise.

Ginit: Genome length (in bp).

Np: Number of regions along the genome when the locations are simulated; ignored otherwise.

max: Mean length of long clones (in bp) when the locations are simulated; ignored otherwise.

min: Mean length of small clones (in bp) when the locations are simulated; ignored otherwise.

ifileclone: Pointer to a file opened in read mode when the locations are read from input files; NULL otherwise (variable used when reading the clone locations file).

ifileancre: Pointer to a file opened in read mode when the locations are read from input files; NULL otherwise (variable used when reading the anchor locations file).

ofile: Pointer to a file opened in write mode when the locations are simulated and one wants to store the intermediate results for each iteration; NULL otherwise (variable used when writing the detail-file).

Output arguments:

NbMoy: (Mean) number of anchored islands.

LgMoy: Mean length of an anchored island.

OceanMoy: (Mean) proportion of oceans (genome not covered by anchored islands).

VarNbIle, VarLgIle, VarOcean: Empirical variances of the previous quantities when several simulations have been made; 0 otherwise.

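Remark:
The standard deviations reported when island is run from the Unix command or through the function that prints the results are actually equal to sqrt(Variance/SIZE), that is, they are standard errors of the mean rather than standard deviations of a single simulation.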
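The relation in the remark can be written as a small helper, shown here as a sketch. The function names reported_deviation and variance_from_reported are hypothetical and do not belong to the ISLAND package; the only thing taken from the text is the formula sqrt(Variance/SIZE).

```cpp
#include <cassert>
#include <cmath>

// Value printed as "standard deviation" according to the remark:
// sqrt(Variance / SIZE), where SIZE is the number of iterations.
double reported_deviation(double variance, int size) {
    assert(size > 0 && variance >= 0.0);
    return std::sqrt(variance / size);
}

// Inverse relation: empirical variance recovered from a printed value.
double variance_from_reported(double reported, int size) {
    assert(size > 0 && reported >= 0.0);
    return reported * reported * size;
}
```

For instance, if a printed value of 0.7970 comes from 100 iterations, the corresponding empirical variance would be about 63.5, assuming the printed value follows the formula above.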


A.4 Example:

First lines of the clone locations file, named CLONES:
(Comments are not part of the file)

100000          <----- genome length in basepairs
 99805 99920    <----- left-hand end followed by right-hand end locations
 99749 99894    <-----    of clones .... 
 99762 99877    <-----        .... sorted according to decreasing right end

First lines of the anchor locations file, named ANCHORS:
(Comments are not part of the file)

100000          <----- genome length in basepairs
 99927          <----- anchor locations
 99865          <-----    .... 
 99563          <-----        .... sorted by decreasing order
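The two layouts shown above can be read with a few lines of C++. This is a sketch under stated assumptions, not the reader used by ISLAND: the names Clone, read_clones and read_anchors are hypothetical, positions are stored as double, and the checks (ends inside the genome, decreasing order with ties allowed) are one possible reading of the annotations in the example.

```cpp
#include <cassert>
#include <istream>
#include <sstream>   // std::istringstream, handy for trying the readers on in-memory text
#include <vector>

struct Clone { double left, right; };  // hypothetical type

// Reads a clone locations file: genome length first, then one
// "left right" pair per clone, sorted by decreasing right end.
// Returns false on malformed or badly ordered input.
bool read_clones(std::istream& in, double& genome_length,
                 std::vector<Clone>& clones) {
    clones.clear();
    if (!(in >> genome_length) || genome_length <= 0) return false;
    double left;
    while (in >> left) {
        double right;
        if (!(in >> right)) return false;             // dangling left end
        if (left < 0 || left > right || right > genome_length) return false;
        // ASSUMPTION: equal right ends are accepted.
        if (!clones.empty() && right > clones.back().right) return false;
        clones.push_back({left, right});
    }
    return in.eof();  // false if parsing stopped on a non-numeric token
}

// Reads an anchor locations file: genome length first, then one
// anchor position per line, sorted in decreasing order.
bool read_anchors(std::istream& in, double& genome_length,
                  std::vector<double>& anchors) {
    anchors.clear();
    if (!(in >> genome_length) || genome_length <= 0) return false;
    double pos;
    while (in >> pos) {
        if (pos < 0 || pos > genome_length) return false;
        if (!anchors.empty() && pos > anchors.back()) return false;
        anchors.push_back(pos);
    }
    return in.eof();
}
```

Both readers take a std::istream, so they work with an std::ifstream opened on CLONES or ANCHORS as well as with an std::istringstream when experimenting.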

Unix command:

island -clones CLONES -anchors ANCHORS

What appears on the standard output:

This is a program to calculate some properties of the physical
map of a genome of length 100000 constructed by the anchoring approach. 
 
Clone and anchor locations are respectively read in the files
CLONES and ANCHORS.
 
Number of clones taken into account: 2334
Number of anchors taken into account: 514
 
 
 
Here are the results:

One obtains 193 anchored islands,
with an average length of 519.47 bases
and covering 88.07 percent of the genome.



B. When the clone and anchor locations are simulated




B.1 Model
  • Anchor locations and clone locations are simulated independently and homogeneously along the genome (according to Poisson processes).
  • Clone lengths are independent but not necessarily identically distributed: the distribution of a clone's length depends on the location of the clone. For definiteness, the location of a clone is defined as its right-hand end. The genome is assumed to be split into regions in which long and short clones alternate. The length of a long clone is uniformly distributed between L-v and L+v, whereas the length of a short clone is uniformly distributed between l-v and l+v (l, L and v are input parameters).
  • Regions of long clones and regions of short clones have the same fixed length.
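One way to read the model is sketched below in C++. It is not the simulation code of ISLAND: the names ModelParams, simulate_points and simulate_clones are hypothetical. The sketch assumes that a homogeneous Poisson process is drawn as a Poisson-distributed count followed by independent uniform positions, that every region has length G divided by the number of regions, and that the first region (starting at position 0) holds long clones; the text does not say which kind of region comes first.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

struct Clone { double left, right; };  // hypothetical type

struct ModelParams {         // hypothetical grouping of the input parameters
    double genome_length;    // genome length in bp
    double mean_anchors;     // mean number of anchors
    double mean_clones;      // mean number of clones
    int    regions;          // total number of regions
    double long_mean;        // L, mean length of long clones
    double short_mean;       // l, mean length of short clones
    double variability;      // v
};

// Homogeneous Poisson process on [0, genome_length): a Poisson count,
// then i.i.d. uniform positions, returned in decreasing order.
std::vector<double> simulate_points(double mean_count, double genome_length,
                                    std::mt19937& rng) {
    assert(mean_count > 0 && genome_length > 0);
    std::poisson_distribution<long> count(mean_count);
    std::uniform_real_distribution<double> position(0.0, genome_length);
    std::vector<double> pts(static_cast<std::size_t>(count(rng)));
    for (double& p : pts) p = position(rng);
    std::sort(pts.begin(), pts.end(), std::greater<double>());
    return pts;
}

// Clones located by their right-hand end; the length distribution
// depends on the region that contains that end.
std::vector<Clone> simulate_clones(const ModelParams& p, std::mt19937& rng) {
    assert(p.regions > 0 && p.variability >= 0);
    const double region_len = p.genome_length / p.regions;
    std::uniform_real_distribution<double> jitter(-p.variability, p.variability);
    std::vector<Clone> clones;
    for (double right : simulate_points(p.mean_clones, p.genome_length, rng)) {
        const int region =
            std::min(static_cast<int>(right / region_len), p.regions - 1);
        // ASSUMPTION: even-numbered regions (0, 2, ...) hold long clones.
        const double mean = (region % 2 == 0) ? p.long_mean : p.short_mean;
        const double length = mean + jitter(rng);
        // The left end may fall below 0 near the start of the genome; how
        // that edge is treated is not described, so it is left as is.
        clones.push_back({right - length, right});
    }
    return clones;
}
```

Anchors can be drawn with simulate_points(p.mean_anchors, p.genome_length, rng), using the same generator so that one seed reproduces a whole iteration.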

B.2 Input parameters

The program prompts the user for the following input parameters:

  • the number of iterations required to calculate the empirical mean and standard deviation of the quantities of interest,
  • the genome length in basepairs,
  • the mean number of anchors studied,
  • the mean number of clones in the library,
  • the total number of regions of long and short clones,
  • the mean length L of long clones,
  • the mean length l of short clones,
  • the variability v of the clone lengths.


B.3 Output

The results are:
  • the average number of anchored islands and the associated standard deviation,
  • the average length of an anchored island and the associated standard deviation,
  • the average proportion of oceans (genome not covered by anchored islands) and the associated standard deviation.


B.4 Usage


B.4.1 With a Unix command
island [-d[etail] detail-file] [-f[rench]]
The -detail option writes the values of the three quantities of interest obtained at each iteration to the given detail-file.
Warning: if the detail-file already exists, it will be overwritten.

When the -french option is chosen, all messages are in French; by default, they are in English.

Results are printed on the standard output.



B.4.2 From a program

As when the clone and anchor locations are read from input files, island can be called from a program or a host system: see paragraph A.3.2.

  • When the clone and anchor locations are simulated, the function that prints the results on the standard output is main_simul.
  • The function commun, which computes the results, is the same in both cases.


Usage from a program, results printed:

The function to call is declared as follows:

void main_simul(char* fic);
The fic argument contains the pathname of the file in which the three quantities of interest are stored at each iteration. To skip storing these intermediate results, set fic to NULL.

Remark: fic corresponds to the detail-file given with the -detail option of the island command.


B.5 Example

Unix command:

island
What appears on the standard output:

This is a program to calculate some properties of the physical
map of a genome constructed by the anchoring approach 
from simulated data.
 
Type in the data required for these simulations:
 
 
How many simulations do you want to do (>0)? 
100
What is the genome length (in basepairs)?
100000000
What is the mean number of anchors (>0)?
500
What is the mean number of clones (>0)?
2300
How many regions do you want to consider along the genome (>0)?
20
What is the mean length of long clones (in basepairs)?
350000
What is the mean length of small clones (in basepairs)?
150000
How much variability do you allow for the clone lengths (in basepairs)?
100000
 
 
Genome length  (bp): 1e+08
Mean number of anchors: 500
Mean number of clones: 2300
Number of regions: 20
Mean length of long clones (bp): 350000
Mean length of small clones (bp): 150000
Variability of clone lengths (bp): 100000
 
 
 
 
RESULTS:
Mean number of anchored islands:        175.02 (+/- 0.7970) 
Mean length of an anchored island (bp): 540575.02 (+/- 1934.8311) 
Mean proportion of oceans:              0.16 (+/- 0.0018) 
 



Last release: June 26, 1998