Some tips for getting better assembling results:
- Compared with other major assemblers, Platanus was designed to give good results with higher-coverage data. It performs best at a coverage depth of roughly 80× or more. In some procedures, Platanus attempts to assemble each haplotype sequence separately, so it needs about twice the coverage that other assemblers need. This is the main reason why Platanus requires high coverage. You can find more details in the Supplemental Materials, pages 68–74, of our Genome Research publication.
- To get good assembly statistics, mate-pair library sequences are indispensable. We have received many complaints and questions about poor assembly results. However, in almost all of these cases, only paired-end sequences had been used as input. Except when assembling very simple, small genomes, it is practically impossible to get good results without a mate-pair library.
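As a rough check of whether a dataset reaches the coverage depth suggested above, coverage can be estimated as the total number of sequenced bases divided by the genome size. The following is a minimal sketch and not part of Platanus: the function name `estimate_coverage` is hypothetical, the genome size is a value you must supply yourself, and the input is assumed to be uncompressed FASTQ with four lines per record.

```shell
# estimate_coverage GENOME_SIZE_BP FASTQ...
# Hypothetical helper (not part of Platanus).
# Sums the lengths of the sequence lines in uncompressed, 4-line-per-record
# FASTQ files and divides the total by the genome size in bp.
estimate_coverage() {
    genome_size=$1
    shift
    cat "$@" | awk -v g="$genome_size" '
        NR % 4 == 2 { bases += length($0) }   # line 2 of each record is the sequence
        END { printf "%.1f\n", bases / g }
    '
}
```

For example, `estimate_coverage GENOME_SIZE ./DRR02167[34]_[12].fastq` would report the combined depth of the two paired-end libraries, with `GENOME_SIZE` replaced by your own estimate in bp.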
Platanus Tutorial:
In this tutorial, we demonstrate how to assemble the genome sequence using Papilio xuthus raw sequencing data (“A genetic mechanism for female-limited Batesian mimicry in Papilio butterfly.” Nat Genet. (2015) 47 pp 405–409; doi: 10.1038/ng.3241). In this study, the P. xuthus genome was built using Platanus assembler from the following raw sequencing data. (Assembling details are described in the Supplementary Text and Figures.)
Getting started:
- Download Platanus as well as Platanus_trim and Platanus_internal_trim from this homepage and install them.
- We provide 64-bit Linux binaries as well as the Platanus source code and the pre-processing programs. Download either the binary or the source code, whichever suits your environment, and install it. (Do not forget to set the execute permission if you download the binary files.)
- Download and prepare the P. xuthus raw sequencing data from DDBJ, NCBI, or EBI.
- When downloading from DDBJ, you should download the data from the following link:
- After downloading and decompressing these data, you can prepare the following fastq files for assembling:
| Library type | Insert size | Read1 | Read2 |
|---|---|---|---|
| Paired-end | 300 bp | DRR021673_1.fastq | DRR021673_2.fastq |
| Paired-end | 500 bp | DRR021674_1.fastq | DRR021674_2.fastq |
| Mate-pair | 3 kb | DRR021675_1.fastq | DRR021675_2.fastq |
| Mate-pair | 3 kb | DRR021676_1.fastq | DRR021676_2.fastq |
| Mate-pair | 5 kb | DRR021677_1.fastq | DRR021677_2.fastq |
| Mate-pair | 5 kb | DRR021678_1.fastq | DRR021678_2.fastq |
| Mate-pair | 8 kb | DRR021679_1.fastq | DRR021679_2.fastq |

- Adaptor sequences have already been trimmed from these data. If you are working with untrimmed raw data, you can trim adaptor sequences and low-quality regions from paired-end sequences as follows:
Platanus_trim xxx_1.fastq xxx_2.fastq
- You can then obtain the trimmed files named xxx_1.fastq.trimmed and xxx_2.fastq.trimmed.
- Similarly, mate-pair sequences are processed with the Platanus_internal_trim program, which trims internal adaptor sequences, adaptor sequences, and low-quality regions. The trimmed files are named xxx_1.fastq.int_trimmed and xxx_2.fastq.int_trimmed.
Platanus_internal_trim xxx_1.fastq xxx_2.fastq
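With several libraries, it is easy to run the wrong trimming program on a file. The sketch below only prints the trimming command for each library, using the two invocations shown above, so that the list can be reviewed before anything is run. The function name `print_trim_commands` and the `TYPE:PREFIX` argument convention are assumptions made for this illustration and are not part of Platanus.

```shell
# print_trim_commands TYPE:PREFIX...
# Hypothetical helper: prints (does not run) one trimming command per library.
# TYPE is "pe" for a paired-end library or "mp" for a mate-pair library;
# PREFIX is the file name without the _1.fastq / _2.fastq suffix.
print_trim_commands() {
    for lib in "$@"; do
        kind=${lib%%:*}      # text before the first colon
        prefix=${lib#*:}     # text after the first colon
        case $kind in
            pe) prog=Platanus_trim ;;
            mp) prog=Platanus_internal_trim ;;
            *)  echo "unknown library type: $kind" >&2; return 1 ;;
        esac
        echo "$prog ${prefix}_1.fastq ${prefix}_2.fastq"
    done
}
```

Calling `print_trim_commands pe:xxx mp:yyy` prints one command line per library; once the list looks right, it can be piped to `sh` to run the commands.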
- Contig assembling
- Contig assembly is the first step and is performed on the trimmed fastq sequences. An example command would be:
Platanus assemble -o Pxut -f ./DRR021673_1.fastq ./DRR021673_2.fastq ./DRR021674_1.fastq ./DRR021674_2.fastq -t 16 -m 128 2> assemble.log
- Note that wildcards (*, ?, and [ ]) can be used to specify files. For example:
Platanus assemble -o Pxut -f ./DRR02167[34]_[12].fastq -t 16 -m 128 2> assemble.log
- You can also input mate-pair sequences for contig assembly, but in our experience this most often produces many misassemblies, probably because mate-pair sequences contain chimeric (artificially connected) reads. After successful termination of the above example, you get the following contig assembly results: Pxut_contig.fa (assembled contiguous sequences), Pxut_contigBubble.fa (merged and removed bubble sequences), and Pxut_32merFrq.tsv (occurrence distribution of 32-mers).
- The Pxut_32merFrq.tsv file can be opened in spreadsheet software, which helps in gauging the level of heterozygosity.
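The k-mer occurrence distribution can also be inspected on the command line. In a heterozygous sample the distribution typically shows two peaks, one at roughly half the occurrence value of the other. The sketch below lists the local maxima of the distribution. It assumes, without confirmation from the Platanus documentation, that the file is tab-separated with the occurrence value in column 1 and the number of k-mers in column 2; the function name `kmer_peaks` and the default cutoff of 5 (used to skip the low-occurrence region dominated by sequencing errors) are likewise illustrative choices.

```shell
# kmer_peaks FILE [MIN_OCCURRENCE]
# Hypothetical helper: prints "occurrence<TAB>count" for every row whose count
# is strictly higher than both neighbouring rows, ignoring rows whose
# occurrence is below MIN_OCCURRENCE (default 5).
# Assumed input format: column 1 = occurrence, column 2 = number of k-mers.
kmer_peaks() {
    awk -F '\t' -v min="${2:-5}" '
        $1 + 0 >= min {
            # o1/c1 hold the previous row, c2 the count of the row before that
            if (n >= 2 && c1 > c2 && c1 > $2 + 0) print o1 "\t" c1
            c2 = c1; o1 = $1; c1 = $2 + 0; n++
        }
    ' "$1"
}
```

Real distributions are noisy, so `kmer_peaks Pxut_32merFrq.tsv` may report small spurious maxima alongside the main peaks; plotting the file remains the more reliable way to judge it.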
- Scaffolding
- After contig assembly, you can perform scaffolding using paired-end or mate-pair sequences. An example command would be:
Platanus scaffold -o Pxut -c Pxut_contig.fa -b Pxut_contigBubble.fa -IP1 ./DRR021673_1.fastq ./DRR021673_2.fastq -IP2 ./DRR021674_1.fastq ./DRR021674_2.fastq -OP3 ./DRR021675_1.fastq ./DRR021675_2.fastq ./DRR021676_1.fastq ./DRR021676_2.fastq -t 16 2> scaffold.log
- Note that wildcards (*, ?, and [ ]) can also be used here to specify files.
- After scaffolding, you get Pxut_scaffold.fa (assembled sequences that include gaps, represented by runs of 'N') and Pxut_scaffoldBubble.fa (removed bubble sequences).
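Because gaps in the scaffold file are written as 'N', the amount of sequence still to be closed can be measured by counting those characters. This is a minimal sketch, assuming a standard FASTA file; the function name `gap_stats` is hypothetical and not part of Platanus.

```shell
# gap_stats FASTA
# Hypothetical helper: prints "total_bases<TAB>gap_bases", where gap bases
# are the N (or n) characters found in the sequence lines.
gap_stats() {
    awk '
        /^>/ { next }                         # skip header lines
        {
            total += length($0)               # count all bases first
            gaps  += gsub(/[Nn]/, "")         # gsub returns the number of Ns removed
        }
        END { printf "%d\t%d\n", total, gaps }
    ' "$1"
}
```

Running `gap_stats Pxut_scaffold.fa` before gap-closing and on the gap-closed output afterwards gives a simple measure of how many gap bases were filled.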
- Gap-closing
- You can then perform gap-closing against scaffolding results. An example command would be:
Platanus gap_close -o Pxut -c Pxut_scaffold.fa -IP1 ./DRR021673_1.fastq ./DRR021673_2.fastq -IP2 ./DRR021674_1.fastq ./DRR021674_2.fastq -OP3 ./DRR021675_1.fastq ./DRR021675_2.fastq ./DRR021676_1.fastq ./DRR021676_2.fastq -t 16 2> gapclose.log
- Finally, you get Pxut_gapClosed.fa (gap-closed scaffold sequences) as the final assembly result. The N50 of this gap-closed assembly should be about 5.4 Mb (counting sequences ≥500 bp). This is slightly smaller than the value described in the manuscript, probably because the pre-processing steps and the Platanus version differ from those used there. Platanus v1.2.1 was used in the manuscript.
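To compare your own result with the N50 quoted above, the value can be computed directly from the FASTA file. N50 is the length L such that sequences of length L or longer contain at least half of all assembled bases. The following is a minimal sketch, not a Platanus command; the function name `n50` is hypothetical, and the default minimum length of 500 bp mirrors the ≥500 bp threshold mentioned above.

```shell
# n50 FASTA [MIN_LENGTH]
# Hypothetical helper: prints the N50 of the sequences in FASTA, counting only
# sequences of at least MIN_LENGTH bp (default 500).
n50() {
    awk -v min="${2:-500}" '
        # print the length of the sequence just finished, if it is long enough
        function emit() { if (len > 0 && len >= min) print len; len = 0 }
        /^>/ { emit(); next }
        { len += length($0) }                 # sequences may span several lines
        END { emit() }
    ' "$1" | sort -rn | awk '
        { lens[NR] = $1; total += $1 }
        END {
            # walk from the longest sequence down until half the bases are covered
            for (i = 1; i <= NR; i++) {
                run += lens[i]
                if (run * 2 >= total) { print lens[i]; exit }
            }
        }
    '
}
```

For example, `n50 Pxut_gapClosed.fa` reports the N50 in bp over sequences of at least 500 bp, and `n50 Pxut_gapClosed.fa 1` includes every sequence.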