FLASH (Fast Length Adjustment of SHort reads) is a
very fast and accurate software tool to merge
paired-end reads from next-generation sequencing experiments. FLASH is designed to merge
pairs of reads when the original DNA fragments are shorter than twice the length of reads.
The resulting longer reads can significantly improve genome assemblies. They can also
improve transcriptome assembly when FLASH is used to merge RNA-seq data.
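The merging idea can be illustrated with a minimal sketch. This is not FLASH's actual implementation: the function names, the `min_overlap` and `max_mismatch_ratio` parameters, and the tie-breaking rule are assumptions made for illustration, and base quality scores are ignored. The sketch slides the end of the first read over the start of the reverse-complemented second read and keeps the overlap with the lowest mismatch ratio.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def merge_pair(read1, read2, min_overlap=10, max_mismatch_ratio=0.25):
    """Merge a read pair into one sequence, or return None if no overlap fits.

    read2 is assumed to come from the opposite strand, so it is
    reverse-complemented before the overlap search.
    """
    r2 = reverse_complement(read2)
    best = None  # (mismatch_ratio, overlap_length)
    for olen in range(min_overlap, min(len(read1), len(r2)) + 1):
        # Compare the last olen bases of read1 with the first olen bases of r2.
        mismatches = sum(a != b for a, b in zip(read1[-olen:], r2[:olen]))
        ratio = mismatches / olen
        if ratio > max_mismatch_ratio:
            continue
        # Prefer the lowest mismatch ratio; break ties with the longer overlap.
        if best is None or ratio < best[0] or (ratio == best[0] and olen > best[1]):
            best = (ratio, olen)
    if best is None:
        return None
    # In this sketch the overlapping bases are taken from read1.
    return read1 + r2[best[1]:]


# A 30-base fragment sequenced with 20-base reads gives a 10-base overlap.
fragment = "ACGGTCATTGCAGATCCTAGGCATTACGAC"
read1 = fragment[:20]
read2 = reverse_complement(fragment[-20:])
merged = merge_pair(read1, read2)
```

With reads of length 20 and a fragment of length 30, the expected overlap is 2 × 20 − 30 = 10 bases, which is why a fragment must be shorter than twice the read length for the pair to be mergeable.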
FLASH merges reads from paired-end sequencing runs with very high accuracy.
FLASH accuracy on one million 100-bp long
synthetic pairs generated from fragments with a mean length of 180bp, normally distributed
with a standard deviation of 20bp:
|Parameters||No error||1% error rate||2% error rate||3% error rate||5% error rate|
|more aggressive parameters||99.73%||99.68%||99.06%||98.30%||93.65%|
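As a rough illustration of how synthetic pairs with these properties could be produced, the sketch below draws a fragment length from a normal distribution (mean 180, standard deviation 20), takes a 100-base read from each end, and optionally introduces substitution errors. It is not the simulator used for the experiments above: the function names, the clamping of the fragment length, and the uniform substitution error model are all assumptions.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def add_errors(read, error_rate, rng):
    """Replace each base with a different base with probability error_rate."""
    out = []
    for base in read:
        if rng.random() < error_rate:
            base = rng.choice([b for b in "ACGT" if b != base])
        out.append(base)
    return "".join(out)


def simulate_pair(genome, rng, read_len=100, mean_frag=180, sd_frag=20,
                  error_rate=0.0):
    """Return (read1, read2, fragment) for one simulated paired-end fragment."""
    frag_len = round(rng.gauss(mean_frag, sd_frag))
    # Assumption of this sketch: clamp so that both reads fit in the fragment.
    frag_len = max(read_len, min(frag_len, len(genome)))
    start = rng.randrange(len(genome) - frag_len + 1)
    fragment = genome[start:start + frag_len]
    read1 = add_errors(fragment[:read_len], error_rate, rng)
    read2 = add_errors(reverse_complement(fragment[-read_len:]), error_rate, rng)
    return read1, read2, fragment


rng = random.Random(0)  # fixed seed so the run is reproducible
genome = "".join(rng.choice("ACGT") for _ in range(5000))  # toy random genome
read1, read2, fragment = simulate_pair(genome, rng)
```

With 100-base reads, a fragment of length L yields an overlap of 200 − L bases, so fragments near the 180-base mean overlap by about 20 bases.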
Simulated reads used in the experiments are available here:
FLASH accuracy on real data:
|647,052 pairs of 101bp long reads from Staphylococcus aureus||90.77%|
|18,252,400 pairs of 101bp long reads from human||91.02%|
The reads are available at the GAGE site:
Reads from GAGE
The latest version of FLASH includes a multi-threaded mode.
When run in single-threaded mode:
- FLASH takes 120 seconds to process one million 100-bp long pairs on a server with 256GB of RAM
and a six-core 2.4GHz AMD Opteron CPU.
- FLASH takes 129 seconds to process one million 100-bp long pairs on a desktop with 2GB of RAM
and a dual-core Intel Xeon 3.00GHz CPU.
Running time scales linearly with both the read length and the number of reads.
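Given that linear relationship, a back-of-the-envelope runtime estimate can be extrapolated from the single-threaded server benchmark above (120 seconds for one million 100-bp pairs). The function name and its parameters are hypothetical, and the result is only a rough guide, since actual times depend on hardware, I/O, and thread count.

```python
def estimate_seconds(num_pairs, read_len, ref_seconds=120.0,
                     ref_pairs=1_000_000, ref_read_len=100):
    """Scale a reference timing linearly in read count and read length."""
    return ref_seconds * (num_pairs / ref_pairs) * (read_len / ref_read_len)


# Example: the 18,252,400 human pairs of 101-bp reads mentioned earlier.
human_estimate = estimate_seconds(18_252_400, 101)
```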
Impact of FLASH on genome assemblies
Merging paired-end reads with FLASH as a pre-processing step for genome assembly yields
significantly higher N50 values for contigs and scaffolds. It also reduces the number of misassembled contigs.
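For readers unfamiliar with the metric, N50 is the length of the shortest contig (or scaffold) in the smallest set of longest contigs that together cover at least half of the total assembly length. A minimal sketch of the standard computation follows; the function name is illustrative and it is not part of FLASH.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


contig_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
contig_n50 = n50(contig_lengths)
```

Here the total length is 54, and the three longest contigs (10, 9, 8) sum to 27, exactly half, so the N50 is 8.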
FLASH: Fast length adjustment of short reads to improve
genome assemblies. T. Magoc and S. Salzberg. Bioinformatics 27:21 (2011), 2957-63.
Obtaining the Software
This software is OSI Certified Open Source Software.
FLASH code or executable can be downloaded from
packages can also be directly downloaded from here:
Send an e-mail to email@example.com.
This work has been supported in part by NIH grants R01-LM006845, R01-GM083873,
and R01-HG006677 to S.L. Salzberg.