De novo assembly and sequence clustering of metagenomic data enable the construction of multiple draft genomes, including those of uncultured organisms. For that purpose, we developed a novel tool, MetaPlatanus, with the following features:
- contig assembly capable of handling uneven sequence coverage reflecting the abundance of each species
- prevention of inter-species misjoinings during scaffolding
- ability to handle long-insert mate-pairs for scaffolding
- di-codon-based clustering of sequences
- seamless combination of assembly and clustering to improve the individual results of each procedure.
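
To make the di-codon-based clustering feature concrete, the following minimal Python sketch shows one way a di-codon (codon-pair, i.e. hexamer) frequency profile could be computed for a sequence and compared between two sequences. It is an illustration only and is not taken from the MetaPlatanus source: the function names `dicodon_frequencies` and `euclidean_distance`, the use of a single fixed reading frame, and the choice of Euclidean distance are all assumptions made for this example.

```python
from collections import Counter
from itertools import product

# All 4^6 = 4096 possible di-codons (pairs of adjacent codons), in a fixed order.
DICODONS = ["".join(p) for p in product("ACGT", repeat=6)]


def dicodon_frequencies(seq, frame=0):
    """Return a 4096-element list of di-codon frequencies for `seq`.

    Hexamers are sampled every 3 bases starting at `frame`, so each hexamer
    spans two consecutive codons of that frame. Hexamers containing
    characters other than A, C, G, T (for example N) are skipped.
    Assumption: a single fixed frame is used; a real tool would need to
    decide how to treat frames and strands.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(frame, len(seq) - 5, 3):
        hexamer = seq[i:i + 6]
        if set(hexamer) <= set("ACGT"):
            counts[hexamer] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(DICODONS)
    return [counts[d] / total for d in DICODONS]


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length frequency vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5
```

Sequences whose profiles lie close together under such a distance would be candidates for the same cluster; in practice the profile would be computed on contigs or scaffolds long enough to give stable frequencies.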
The benchmark was performed using three synthetic human gut datasets, for which the genomic DNA of 20 known bacteria was mixed at different abundance ratios and sequenced on Illumina sequencers. Compared with other commonly used tools, MetaPlatanus produced few inter-species misjoins, highly contiguous scaffolds (most draft genomes consisted of scaffolds of megabase-order length), and high cluster precision. In addition, the results verified that long-insert mate-pairs were effective for metagenomic assembly. By assembling previously published real cow rumen metagenomic data (Hess M et al. (2011) Metagenomic discovery of biomass-degrading genes and genomes from cow rumen. Science, 331, 463–7), we demonstrated that MetaPlatanus constructed draft genomes that included species not reported in the original paper. We expect that our proposed method will permit the automation of metagenomic assembly.