Mapping Reads on a Genomic Sequence: an Algorithmic Overview and a Practical Comparative Analysis

Data

2012-12-21: We updated the fastq files of the dataset to correct a mistake in the quality score assigned to each position ('!', quality 0, instead of 'J', quality 40). This has no impact on the published results but could affect your benchmarks.
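To tell whether a local copy of the reads predates this update, one option is to look at which characters occur in the quality lines. The sketch below assumes plain four-line FASTQ records (no wrapped sequence lines) and assumes that the pre-update files carry '!' at every position, as the note above suggests; the function names are illustrative and not part of the dataset.

```python
from collections import Counter
from typing import Iterable


def quality_char_counts(fastq_lines: Iterable[str]) -> Counter:
    """Count the characters found on the quality line of each 4-line FASTQ record."""
    counts: Counter = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 3:  # the 4th line of every record is the quality string
            counts.update(line.rstrip("\r\n"))
    return counts


def looks_like_old_version(fastq_lines: Iterable[str]) -> bool:
    """True when every quality character is '!' (assumed signature of the old files)."""
    counts = quality_char_counts(fastq_lines)
    return bool(counts) and set(counts) == {"!"}


# Tiny in-memory records standing in for real files.
old_record = ["@read1\n", "ACGT\n", "+\n", "!!!!\n"]
new_record = ["@read1\n", "ACGT\n", "+\n", "JJJJ\n"]

print(looks_like_old_version(old_record))
print(looks_like_old_version(new_record))
```

On a real file, the same function can be given an open file handle, since iterating over a text file yields its lines.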

Bacteria:

Reference database

Reads generated without mismatches [fasta/fastq] and with mismatches [fasta/fastq].


Human:

Reference database

Reads generated without mismatches [fasta/fastq] and with mismatches [fasta/fastq].


Software evaluated:

We evaluated the performance of the following 9 mapping tools: BWA, Novoalign [Novocraft (2010)], Bowtie [Langmead et al. (2009)], SOAP2 [Li et al. (2009)], BFAST [Homer et al. (2009)], SSAHA2 [Ning et al. (2001)], GASSST [Rizk and Lavenier (2010)], PerM [Chen et al. (2009)] and MPscan [Rivals et al. (2009)].

Bacteria

Software: BWA
Version: 0.5.8
Command line 0m:
bwa index -p Bactos.noZ -a bwtsw Bactos.noZ.fa
bwa aln -o 0 -n 0 Bactos.noZ Bactos.noZ_read_0m_oct.fa > Bactos.noZ_read_0m_oct.sai
bwa samse -n 2000 Bactos.noZ Bactos.noZ_read_0m_oct.sai Bactos.noZ_read_0m_oct.fa > Bactos.noZ_read_0m_oct.sam
Command line 3m:
bwa index -p Bactos.noZ -a bwtsw Bactos.noZ.fa
bwa aln -o 0 -n 3 -k 3 -N Bactos.noZ Bactos.noZ_read_3m_oct.fa > Bactos.noZ_read_3m_oct.sai
bwa samse -n 2000 Bactos.noZ Bactos.noZ_read_3m_oct.sai Bactos.noZ_read_3m_oct.fa > Bactos.noZ_read_3m_oct.sam

Software: Novoalign
Version: 2.06.09
Command line 0m:
novoindex Bactos.noZ.idx Bactos.noZ.fa
novoalign -f Bactos.noZ_read_0m_oct.fa -d Bactos.noZ.idx -oSAM -r A -g 99 > Bactos.noZ_read_0m.sam.novoalign
Command line 3m:
novoindex Bactos.noZ.idx Bactos.noZ.fa
novoalign -f Bactos.noZ_read_3m_oct.fa -d Bactos.noZ.idx -oSAM -r A -g 99 > Bactos.noZ_read_3m.sam.novoalign

Software: Bowtie
Version: 0.12.7
Command line 0m:
bowtie-build -f Bactos.noZ.fa Bactos.noZ
bowtie --sam --sam-nohead --sam-nosq -v 0 -k 2000 -t -f Bactos.noZ Bactos.noZ_read_0m_oct.fa Bactos.noZ_read_0m.sam
Command line 3m:
bowtie-build -f Bactos.noZ.fa Bactos.noZ
bowtie --sam --sam-nohead --sam-nosq -v 3 -k 2000 -t -f Bactos.noZ Bactos.noZ_read_3m_oct.fa Bactos.noZ_read_3m.sam

Software: SOAP2
Version: 2.20
Command line 0m:
2bwt-builder Bactos.noZ.fa
soap -a Bactos.noZ_read_0m_oct.fa -D Bactos.noZ.fa.index -o Bactos.noZ_read_0m_oct.fa-0v0m.soap -u Bactos_unmappedreads_0m_oct.fa-v0m0.soap -r 2 -v 0 -M 0 -p 1
Command line 3m: N/A

Software: BFAST
Version: 0.6.5a
Command line 0m:
bfast match -f Bactos.noZ.fa -r Bactos.noZ_read_0m_oct.fq > Bactos.noZ_read_0m_oct.bmf
bfast localalign -f Bactos.noZ.fa -m Bactos.noZ_read_0m_oct.bmf -u > Bactos.noZ_read_0m_oct.baf
bfast postprocess -f Bactos.noZ.fa -i Bactos.noZ_read_0m_oct.baf -a 4 > Bactos.noZ_read_0m_oct.sam
Command line 3m:
bfast match -f Bactos.noZ.fa -r Bactos.noZ_read_3m_oct.fq > Bactos.noZ_read_3m_oct.bmf
bfast localalign -f Bactos.noZ.fa -m Bactos.noZ_read_3m_oct.bmf -u > Bactos.noZ_read_3m_oct.baf
bfast postprocess -f Bactos.noZ.fa -i Bactos.noZ_read_3m_oct.baf -a 1 > Bactos.noZ_read_3m_oct.sam

Software: SSAHA2
Version: 2.5.2
Command line 0m:
ssaha2Build -solexa -save Bactos.noZ Bactos.noZ.fa
ssaha2 -solexa -best 1 -output sam -identity 100 -outfile Bactos.noZ_0m.sam -save Bactos.noZ Bactos.noZ_read_0m_oct.fq
Command line 3m:
ssaha2Build -solexa -save Bactos.noZ Bactos.noZ.fa
ssaha2 -solexa -best 1 -output sam -identity 92 -outfile Bactos.noZ_3m_nov.sam -save Bactos.noZ Bactos.noZ_read_3m_oct.fq

Software: GASSST
Version: 1.28
Command line 0m:
Gassst -d Bactos.noZ.fa -i Bactos.noZ_read_0m_oct.fa -p 100 -h 0 -l 0 -s 5 -o Bactos.noZ_read_0m.gassst
gassst_to_sam Bactos.noZ_read_0m.gassst Bactos.noZ_read_0m.gassst.sam
Command line 3m:
Gassst -d Bactos.noZ.fa -i Bactos.noZ_read_3m_oct.fa -p 92.5 -h 0 -l 0 -s 5 -o Bactos.noZ_read_3m.gassst
gassst_to_sam Bactos.noZ_read_3m.gassst Bactos.noZ_read_3m.gassst.sam

Software: PerM
Version: 0.3.9
Command line 0m:
perm Bactos.noZ.fa Bactos.noZ_read_0m_oct.fa -A -v 0 -k 2000 -o Bactos.noZ_0m.sam -u unmappedReads.Bactos.noZ_0m
Command line 3m:
perm Bactos.noZ.fa Bactos.noZ_read_3m_oct.fa -A -v 3 -k 2000 -o Bactos.noZ_3m.sam -u unmappedReads.Bactos.noZ_3m

Software: MPscan
Version: not specified
Command line 0m:
mpscan -p Bactos.noZ_read_0m_oct.fa -t Bactos.noZ.fa -d Bactos.noZ_read_0m_forward.log -r Bactos.noZ_read_0m_forward.mpscan -ro 1
mpscan -rev -ac -p Bactos.noZ_read_0m_oct.fa -t Bactos.noZ.fa -d Bactos.noZ_read_0m_reverse.log -r Bactos.noZ_read_0m_reverse.mpscan -ro 1
Command line 3m: N/A
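
The 3m identity thresholds passed to SSAHA2 (-identity 92) and GASSST (-p 92.5) can be read as a mismatch budget expressed as a percentage of the read length. A minimal sketch of that arithmetic follows. It assumes 40 bp reads: the read length is not stated in this section, and 40 is simply the length for which 3 mismatches give exactly 92.5%. The helper name is hypothetical.

```python
def min_identity_percent(read_length: int, max_mismatches: int) -> float:
    """Lowest percent identity of a full-length, gap-free alignment
    that contains at most `max_mismatches` substitutions."""
    if read_length <= 0 or not 0 <= max_mismatches <= read_length:
        raise ValueError("need read_length > 0 and 0 <= max_mismatches <= read_length")
    return 100 * (read_length - max_mismatches) / read_length


ASSUMED_READ_LENGTH = 40  # assumption, not given in this section

print(min_identity_percent(ASSUMED_READ_LENGTH, 0))  # threshold for the 0m runs
print(min_identity_percent(ASSUMED_READ_LENGTH, 3))  # threshold for the 3m runs
```

Under that assumption the 0m runs correspond to 100% identity and the 3m runs to 92.5%, which SSAHA2 is given as the integer 92.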

Human

Software: BWA
Version: 0.5.8
Command line 0m:
bwa index -p CHR.noZ -a bwtsw CHR.noZ.fa
bwa aln -o 0 -n 0 CHR.noZ CHR.noZ_read_0m_oct.fa > CHR.noZ_read_0m_oct.sai
bwa samse -n 54000 CHR.noZ CHR.noZ_read_0m_oct.sai CHR.noZ_read_0m_oct.fa > CHR.noZ_read_0m_oct.sam
Command line 3m:
bwa index -p CHR.noZ -a bwtsw CHR.noZ.fa
bwa aln -o 0 -n 3 -k 3 -N CHR.noZ CHR.noZ_read_3m_oct.fa > CHR.noZ_read_3m_oct.sai
bwa samse -n 54000 CHR.noZ CHR.noZ_read_3m_oct.sai CHR.noZ_read_3m_oct.fa > CHR.noZ_read_3m_oct.sam

Software: Novoalign
Version: 2.06.09
Command line 0m:
novoindex CHR.noZ.idx CHR.noZ.fa
novoalign -f CHR.noZ_read_0m_oct.fa -d CHR.noZ.idx -oSAM -r A -g 99 > CHR.noZ_read_0m.sam.novoalign
Command line 3m:
novoindex CHR.noZ.idx CHR.noZ.fa
novoalign -f CHR.noZ_read_3m_oct.fa -d CHR.noZ.idx -oSAM -r A -g 99 > CHR.noZ_read_3m.sam.novoalign

Software: Bowtie
Version: 0.12.7
Command line 0m:
bowtie-build -f CHR.noZ.fa CHR.noZ
bowtie --sam --sam-nohead --sam-nosq -v 0 -k 54000 -t -f CHR.noZ CHR.noZ_read_0m_oct.fa CHR.noZ_read_0m.sam
Command line 3m:
bowtie-build -f CHR.noZ.fa CHR.noZ
bowtie --sam --sam-nohead --sam-nosq -v 3 -k 400000 -t -f CHR.noZ CHR.noZ_read_3m_oct.fa CHR.noZ_read_3m.sam

Software: SOAP2
Version: 2.20
Command line 0m:
2bwt-builder CHR.noZ.fa
soap -a CHR.noZ_read_0m_oct.fa -D CHR.noZ.fa.index -o CHR.noZ_read_0m_oct.fa-0v0m.soap -u CHR_unmappedreads_0m_oct.fa-v0m0.soap -r 2 -v 0 -M 0 -p 1
Command line 3m: N/A

Software: BFAST
Version: 0.6.5a
Command line 0m:
bfast match -f CHR.noZ.fa -r CHR.noZ_read_0m_oct.fq > CHR.noZ_read_0m_oct.bmf
bfast localalign -f CHR.noZ.fa -m CHR.noZ_read_0m_oct.bmf -u > CHR.noZ_read_0m_oct.baf
bfast postprocess -f CHR.noZ.fa -i CHR.noZ_read_0m_oct.baf -a 4 > CHR.noZ_read_0m_oct.sam
Command line 3m:
bfast match -f CHR.noZ.fa -r CHR.noZ_read_3m_oct.fq > CHR.noZ_read_3m_oct.bmf
bfast localalign -f CHR.noZ.fa -m CHR.noZ_read_3m_oct.bmf -u > CHR.noZ_read_3m_oct.baf
bfast postprocess -f CHR.noZ.fa -i CHR.noZ_read_3m_oct.baf -a 1 > CHR.noZ_read_3m_oct.sam

Software: SSAHA2
Version: 2.5.2
Command line 0m:
ssaha2Build -solexa -save CHR.noZ CHR.noZ.fa
ssaha2 -solexa -best 1 -output sam -identity 100 -outfile CHR.noZ_0m.sam -save CHR.noZ CHR.noZ_read_0m_oct.fq
Command line 3m:
ssaha2Build -solexa -save CHR.noZ CHR.noZ.fa
ssaha2 -solexa -best 1 -output sam -identity 92 -outfile CHR.noZ_3m_nov.sam -save CHR.noZ CHR.noZ_read_3m_oct.fq

Software: GASSST
Version: 1.28
Command line 0m:
Gassst -d CHR.noZ.fa -i CHR.noZ_read_0m_oct.fa -p 100 -h 0 -l 0 -s 5 -o CHR.noZ_read_0m.gassst
gassst_to_sam CHR.noZ_read_0m.gassst CHR.noZ_read_0m.gassst.sam
Command line 3m:
Gassst -d CHR.noZ.fa -i $DATA/CHR.noZ_read_3m_oct.fa -p 92.5 -h 0 -l 0 -s 5 -o CHR.noZ_read_3m.gassst
gassst_to_sam CHR.noZ_read_3m.gassst CHR.noZ_read_3m.gassst.sam

Software: PerM
Version: 0.3.9
Command line 0m:
perm CHR.noZ.fa CHR.noZ_read_0m_oct.fa -A -v 0 -k 54000 -o CHR.noZ_0m.sam -u unmappedReads.CHR.noZ_0m
Command line 3m:
perm CHR.noZ.fa CHR.noZ_read_3m_oct.fa -A -v 3 -k 54000 -o CHR.noZ_3m.sam -u unmappedReads.CHR.noZ_3m

Software: MPscan
Version: not specified
Command line 0m:
mpscan -p CHR.noZ_read_0m_oct.fa -t CHR.noZ.fa -d CHR.noZ_read_0m_forward.log -r CHR.noZ_read_0m_forward.mpscan -ro 1
mpscan -rev -ac -p CHR.noZ_read_0m_oct.fa -t CHR.noZ.fa -d CHR.noZ_read_0m_reverse.log -r CHR.noZ_read_0m_reverse.mpscan -ro 1
Command line 3m: N/A
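
The Bowtie command lines for the two datasets differ only in the file prefix (Bactos.noZ or CHR.noZ), the mismatch limit passed to -v, and the maximum number of reported hits passed to -k. As an illustration of that pattern, the sketch below regenerates those command lines from the three parameters. The function name is hypothetical; the flags and file-naming scheme are copied from the Bowtie entries listed above, and nothing is executed.

```python
from typing import List


def bowtie_commands(prefix: str, mismatches: int, max_hits: int) -> List[str]:
    """Return the index-building and mapping command lines for one run,
    following the naming scheme used in the Bowtie entries above."""
    reads = f"{prefix}_read_{mismatches}m_oct.fa"
    output = f"{prefix}_read_{mismatches}m.sam"
    return [
        f"bowtie-build -f {prefix}.fa {prefix}",
        "bowtie --sam --sam-nohead --sam-nosq "
        f"-v {mismatches} -k {max_hits} -t -f {prefix} {reads} {output}",
    ]


# Parameter triples taken from the listed runs: (prefix, mismatches, -k value).
runs = [
    ("Bactos.noZ", 0, 2000),
    ("Bactos.noZ", 3, 2000),
    ("CHR.noZ", 0, 54000),
    ("CHR.noZ", 3, 400000),
]

for prefix, mismatches, max_hits in runs:
    for command in bowtie_commands(prefix, mismatches, max_hits):
        print(command)
```

Keeping the parameters in one list makes the asymmetry visible: the human 3m run uses a much larger -k value than the other three runs.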