About FLASH


FLASH (Fast Length Adjustment of SHort reads) is a
very fast and accurate software tool to merge
paired-end reads from next-generation sequencing experiments. FLASH is designed to merge
pairs of reads when the original DNA fragments are shorter than twice the length of reads.
The resulting longer reads can significantly improve genome assemblies. They can also
improve transcriptome assembly when FLASH is used to merge RNA-seq data.
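
The idea of merging an overlapping pair can be illustrated with a minimal sketch. This is not FLASH's actual algorithm (see the publication below for that); it only shows the general technique of reverse-complementing the second read and looking for the overlap with the lowest mismatch ratio. The function names, the `min_overlap` and `max_mismatch_ratio` parameters, and their default values are hypothetical and chosen for illustration only.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def merge_pair(read1: str, read2: str, min_overlap: int = 10,
               max_mismatch_ratio: float = 0.25):
    """Merge a read pair into one longer read, or return None.

    Illustrative only: assumes the fragment is at least as long as
    each read, and ignores base quality values entirely.
    """
    r2 = reverse_complement(read2)  # bring read 2 onto read 1's strand
    best = None  # (mismatch_ratio, overlap_length)
    # Try every candidate overlap, longest first.
    for olen in range(min(len(read1), len(r2)), min_overlap - 1, -1):
        mismatches = sum(a != b for a, b in zip(read1[-olen:], r2[:olen]))
        ratio = mismatches / olen
        if ratio <= max_mismatch_ratio and (best is None or ratio < best[0]):
            best = (ratio, olen)
    if best is None:
        return None  # no acceptable overlap: leave the pair unmerged
    # Keep read 1's bases in the overlap, then append the rest of read 2.
    return read1 + r2[best[1]:]
```

For example, two 20-bp reads taken from opposite ends of a 30-bp fragment overlap by 10 bp, and the sketch reconstructs the full fragment from them.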

 


Accuracy


FLASH merges reads from paired-end sequencing runs with very high accuracy.

FLASH accuracy on one million synthetic pairs of 100-bp reads generated from
fragments whose lengths are normally distributed with a mean of 180 bp and a
standard deviation of 20 bp:

 




                              No error   1% error rate   2% error rate   3% error rate   5% error rate
default parameters              99.73%          99.68%          98.43%          94.76%          77.91%
more aggressive parameters      99.73%          99.68%          99.06%          98.30%          93.65%
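
The simulation setup above implies a typical overlap between the two reads of a pair, which can be worked out with a short calculation. The sketch below assumes a continuous normal model of fragment length and ignores fragments shorter than a single read; the function names are hypothetical. How pairs that do not overlap are counted in the accuracy figures is not specified here.

```python
from statistics import NormalDist

# Simulation parameters stated in the text above.
READ_LEN = 100
FRAGMENTS = NormalDist(mu=180, sigma=20)


def expected_overlap(read_len: int, mean_fragment_len: float) -> float:
    """Mean overlap between the two reads of a pair, in bp."""
    return 2 * read_len - mean_fragment_len


def fraction_overlapping(read_len: int, dist: NormalDist,
                         min_overlap: int = 0) -> float:
    """Share of fragments short enough for the reads to overlap.

    A pair overlaps by at least min_overlap bp when the fragment is
    no longer than 2 * read_len - min_overlap.
    """
    return dist.cdf(2 * read_len - min_overlap)


print(expected_overlap(READ_LEN, FRAGMENTS.mean))   # mean overlap in bp
print(fraction_overlapping(READ_LEN, FRAGMENTS))    # share of fragments under 200 bp
```

Under this model the mean overlap is 20 bp, and about 84% of fragments are shorter than twice the read length.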



Simulated reads used in the experiments are available here:
No error
1% error
2% error
3% error
5% error

FLASH accuracy on real data:
 



Data set                                                     Accuracy
647,052 pairs of 101-bp reads from Staphylococcus aureus       90.77%
18,252,400 pairs of 101-bp reads from human                    91.02%



The reads are available at the GAGE site:
Reads from GAGE

Time requirements


The latest version of FLASH includes a multi-threaded mode.

When run in single-threaded mode:

  • FLASH takes 120 seconds to process one million 100-bp long pairs on a server with 256GB of RAM
    and a six-core 2.4GHz AMD Opteron CPU.
  • FLASH takes 129 seconds to process one million 100-bp long pairs on a desktop with 2GB of RAM
    and a dual-core Intel Xeon 3.00GHz CPU.

  • Time is linearly proportional to the read length and the number of reads.
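
Because run time scales linearly with both read length and number of reads, a rough single-threaded estimate for another data set can be extrapolated from the server timing above. This is a back-of-the-envelope sketch: the function name and its parameters are hypothetical, and actual run times depend on hardware, I/O, and thread count.

```python
def estimate_seconds(num_pairs: int, read_len: int,
                     ref_seconds: float = 120.0,
                     ref_pairs: int = 1_000_000,
                     ref_read_len: int = 100) -> float:
    """Extrapolate run time linearly from one reference measurement.

    Defaults come from the server timing quoted above:
    120 seconds for one million pairs of 100-bp reads.
    """
    return ref_seconds * (num_pairs / ref_pairs) * (read_len / ref_read_len)


# The human data set mentioned earlier: 18,252,400 pairs of 101-bp reads.
seconds = estimate_seconds(18_252_400, 101)
print(f"{seconds / 60:.0f} minutes")
```

Under these assumptions the human data set would take roughly 37 minutes in single-threaded mode.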


Impact of FLASH on genome assemblies


Merging mate pairs with FLASH as a preprocessing step for genome assembly yields
significantly higher N50 values for contigs and scaffolds. It also reduces the number of misassembled contigs.
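
For readers unfamiliar with the metric, N50 is the length of the shortest contig (or scaffold) in the smallest set of longest contigs that together cover at least half of the total assembly length. A minimal sketch of the standard definition follows; the function name is hypothetical, and assemblers or evaluation tools may use variants such as NG50, which is based on genome size.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # total 54; 10 + 9 + 8 = 27 covers half
```

In this example the three longest contigs reach half of the total length, so the N50 is 8.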


Publication


FLASH: Fast length adjustment of short reads to improve
genome assemblies.
T. Magoc and S. Salzberg. Bioinformatics 27:21 (2011), 2957-63.


Obtaining the Software


This software is OSI Certified Open Source Software.


 


The FLASH source code or executable can be downloaded from
SourceForge. Release
packages can also be downloaded directly from here:


Questions/Comments/Requests


Send an e-mail to flash.comment@gmail.com.


Funding


This work has been supported in part by NIH grants R01-LM006845, R01-GM083873,
and R01-HG006677 to S.L. Salzberg.

