Metagenomics Methods And Protocols Pdf

Posted on 26.08.2019 by admin




Metagenomics: Methods and Protocols (Methods in Molecular Biology). This second edition explores up-to-date tools in various function-based technologies currently used in metagenomics. The chapters in this book discuss all of the working steps involved in these technologies, such as DNA isolation from soils and marine samples followed by the construction and screening of libraries. The field of metagenomics offers unique perspectives on culturable and non-culturable microorganisms and their biosynthetic products. Metagenomics has been applied to study a range of soil environments, and comparisons with cultivation techniques should include biases in the methods used to extract DNA from soil. Viral Metagenomics: Methods and Protocols is a valuable resource for researchers and specialists who are interested in learning more about this evolving field.


Metagenomics allows the study of microbial communities like those present in this stream receiving acid drainage from surface coal mining.

Metagenomics is the study of genetic material recovered directly from environmental samples. The broad field may also be referred to as environmental genomics, ecogenomics or community genomics.

While traditional microbiology and microbial genome sequencing and genomics rely upon cultivated clonal cultures, early environmental gene sequencing cloned specific genes (often the 16S rRNA gene) to produce a profile of diversity in a natural sample. Such work revealed that the vast majority of microbial biodiversity had been missed by cultivation-based methods.^[1]

Because of its ability to reveal the previously hidden diversity of microscopic life, metagenomics offers a powerful lens for viewing the microbial world that has the potential to revolutionize understanding of the entire living world.^[2] As the price of DNA sequencing continues to fall, metagenomics now allows microbial ecology to be investigated at a much greater scale and detail than before. Recent studies use either 'shotgun' or PCR directed sequencing to get largely unbiased samples of all genes from all the members of the sampled communities.^[3]


Etymology

The term 'metagenomics' was first used by Jo Handelsman, Jon Clardy, Robert M. Goodman, Sean F. Brady, and others, and first appeared in publication in 1998.^[4] The term metagenome referenced the idea that a collection of genes sequenced from the environment could be analyzed in a way analogous to the study of a single genome. In 2005, Kevin Chen and Lior Pachter (researchers at the University of California, Berkeley) defined metagenomics as 'the application of modern genomics technique without the need for isolation and lab cultivation of individual species'.^[5]

History

Conventional sequencing begins with a culture of identical cells as a source of DNA. However, early metagenomic studies revealed that there are probably large groups of microorganisms in many environments that cannot be cultured and thus cannot be sequenced. These early studies focused on 16S ribosomal RNA sequences, which are relatively short, often conserved within a species, and generally different between species. Many 16S rRNA sequences have been found which do not belong to any known cultured species, indicating that there are numerous non-isolated organisms. These surveys of ribosomal RNA (rRNA) genes taken directly from the environment revealed that cultivation-based methods find less than 1% of the bacterial and archaeal species in a sample.^[1] Much of the interest in metagenomics comes from these discoveries, which showed that the vast majority of microorganisms had previously gone unnoticed.

Early molecular work in the field was conducted by Norman R. Pace and colleagues, who used PCR to explore the diversity of ribosomal RNA sequences.^[6] The insights gained from these breakthrough studies led Pace to propose the idea of cloning DNA directly from environmental samples as early as 1985.^[7] This led to the first report of isolating and cloning bulk DNA from an environmental sample, published by Pace and colleagues in 1991^[8] while Pace was in the Department of Biology at Indiana University. Considerable efforts ensured that these were not PCR false positives and supported the existence of a complex community of unexplored species. Although this methodology was limited to exploring highly conserved, non-protein coding genes, it did support early microbial morphology-based observations that diversity was far more complex than was known by culturing methods. Soon after that, Healy reported the metagenomic isolation of functional genes from 'zoolibraries' constructed from a complex culture of environmental organisms grown in the laboratory on dried grasses in 1995.^[9] After leaving the Pace laboratory, Edward DeLong continued in the field and has published work that has largely laid the groundwork for environmental phylogenies based on signature 16S sequences, beginning with his group's construction of libraries from marine samples.^[10]

In 2002, Mya Breitbart, Forest Rohwer, and colleagues used environmental shotgun sequencing (see below) to show that 200 liters of seawater contains over 5000 different viruses.^[11] Subsequent studies showed that there are more than a thousand viral species in human stool and possibly a million different viruses per kilogram of marine sediment, including many bacteriophages. Essentially all of the viruses in these studies were new species. In 2004, Gene Tyson, Jill Banfield, and colleagues at the University of California, Berkeley and the Joint Genome Institute sequenced DNA extracted from an acid mine drainage system.^[12] This effort resulted in the complete, or nearly complete, genomes for a handful of bacteria and archaea that had previously resisted attempts to culture them.^[13]

Flow diagram of a typical metagenome project^[14]

Beginning in 2003, Craig Venter, leader of the privately funded parallel of the Human Genome Project, led the Global Ocean Sampling Expedition (GOS), circumnavigating the globe and collecting metagenomic samples throughout the journey. All of these samples were sequenced using shotgun sequencing, in hopes that new genomes (and therefore new organisms) would be identified. The pilot project, conducted in the Sargasso Sea, found DNA from nearly 2000 different species, including 148 types of bacteria never before seen.^[15] Venter has circumnavigated the globe and thoroughly explored the West Coast of the United States, and completed a two-year expedition to explore the Baltic, Mediterranean and Black Seas. Analysis of the metagenomic data collected during this journey revealed two groups of organisms, one composed of taxa adapted to environmental conditions of 'feast or famine', and a second composed of relatively fewer but more abundantly and widely distributed taxa primarily composed of plankton.^[16]

In 2005 Stephan C. Schuster at Penn State University and colleagues published the first sequences of an environmental sample generated with high-throughput sequencing, in this case massively parallel pyrosequencing developed by 454 Life Sciences.^[17] Another early paper in this area appeared in 2006 by Robert Edwards, Forest Rohwer, and colleagues at San Diego State University.^[18]

Sequencing

Recovery of DNA sequences longer than a few thousand base pairs from environmental samples was very difficult until recent advances in molecular biological techniques allowed the construction of libraries in bacterial artificial chromosomes (BACs), which provided better vectors for molecular cloning.^[19]

Environmental Shotgun Sequencing (ESS). (A) Sampling from habitat; (B) filtering particles, typically by size; (C) Lysis and DNA extraction; (D) cloning and library construction; (E) sequencing the clones; (F) sequence assembly into contigs and scaffolds.

Shotgun metagenomics

Advances in bioinformatics, refinements of DNA amplification, and the proliferation of computational power have greatly aided the analysis of DNA sequences recovered from environmental samples, allowing the adaptation of shotgun sequencing to metagenomic samples (known also as whole metagenome shotgun or WMGS sequencing). The approach, used to sequence many cultured microorganisms and the human genome, randomly shears DNA, sequences many short sequences, and reconstructs them into a consensus sequence. Shotgun sequencing reveals genes present in environmental samples. Historically, clone libraries were used to facilitate this sequencing. However, with advances in high throughput sequencing technologies, the cloning step is no longer necessary and greater yields of sequencing data can be obtained without this labour-intensive bottleneck step. Shotgun metagenomics provides information both about which organisms are present and what metabolic processes are possible in the community.^[20] Because the collection of DNA from an environment is largely uncontrolled, the most abundant organisms in an environmental sample are most highly represented in the resulting sequence data. To achieve the high coverage needed to fully resolve the genomes of under-represented community members, large samples, often prohibitively so, are needed. On the other hand, the random nature of shotgun sequencing ensures that many of these organisms, which would otherwise go unnoticed using traditional culturing techniques, will be represented by at least some small sequence segments.^[12] An emerging approach combines shotgun sequencing and chromosome conformation capture (Hi-C), which measures the proximity of any two DNA sequences within the same cell, to guide microbial genome assembly.^[21]

High-throughput sequencing

The first metagenomic studies conducted using high-throughput sequencing used massively parallel 454 pyrosequencing.^[17] Three other technologies commonly applied to environmental sampling are the Ion Torrent Personal Genome Machine, the Illumina MiSeq or HiSeq and the Applied Biosystems SOLiD system.^[22] These techniques for sequencing DNA generate shorter fragments than Sanger sequencing; the Ion Torrent PGM System and 454 pyrosequencing typically produce ~400 bp reads, Illumina MiSeq produces 400-700 bp reads (depending on whether paired-end options are used), and SOLiD produces 25-75 bp reads.^[23] Historically, these read lengths were significantly shorter than the typical Sanger sequencing read length of ~750 bp, although the Illumina technology is quickly coming close to this benchmark. This limitation is compensated for by the much larger number of sequence reads. In 2009, pyrosequenced metagenomes generated 200–500 megabases, and Illumina platforms generated around 20–50 gigabases, but these outputs have increased by orders of magnitude in recent years.^[24] An additional advantage of high-throughput sequencing is that this technique does not require cloning the DNA before sequencing, removing one of the main biases and bottlenecks in environmental sampling.

Bioinformatics

The data generated by metagenomics experiments are both enormous and inherently noisy, containing fragmented data representing as many as 10,000 species.^[25] The sequencing of the cow rumen metagenome generated 279 gigabases, or 279 billion base pairs of nucleotide sequence data,^[26] while the human gut microbiome gene catalog identified 3.3 million genes assembled from 567.7 gigabases of sequence data.^[27] Collecting, curating, and extracting useful biological information from datasets of this size represent significant computational challenges for researchers.^[20]^[28]^[29]^[30]

Sequence pre-filtering

The first step of metagenomic data analysis requires the execution of certain pre-filtering steps, including the removal of redundant, low-quality sequences and sequences of probable eukaryotic origin (especially in metagenomes of human origin).^[31]^[32] The methods available for the removal of contaminating eukaryotic genomic DNA sequences include Eu-Detect and DeConseq.^[33]^[34]
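As a rough illustration of the redundancy and quality parts of this step, the sketch below drops short reads, reads with a low mean quality score, and exact duplicate sequences. The read representation, the function name `prefilter`, and both thresholds are assumptions made for this example; removal of eukaryotic sequences is not covered and is not how Eu-Detect or DeConseq work.

```python
from typing import Iterable, List, Tuple

# Hypothetical read representation: (read id, sequence, per-base Phred scores).
Read = Tuple[str, str, List[int]]


def prefilter(
    reads: Iterable[Read],
    min_mean_quality: float = 20.0,  # assumed threshold, not from the source
    min_length: int = 50,            # assumed threshold, not from the source
) -> List[Read]:
    """Drop short reads, low-quality reads, and exact duplicate sequences."""
    seen = set()
    kept = []
    for read_id, seq, quals in reads:
        # Discard reads that are too short or carry no quality information.
        if len(seq) < min_length or not quals:
            continue
        # Discard reads whose mean Phred score is below the threshold.
        if sum(quals) / len(quals) < min_mean_quality:
            continue
        # Discard exact duplicates, keeping the first copy encountered.
        key = seq.upper()
        if key in seen:
            continue
        seen.add(key)
        kept.append((read_id, seq, quals))
    return kept
```

Only exact duplicates are collapsed here; near-duplicates and reverse-complement copies would need a more careful comparison.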

Assembly

DNA sequence data from genomic and metagenomic projects are essentially the same, but genomic sequence data offers higher coverage while metagenomic data is usually highly non-redundant.^[29] Furthermore, the increased use of second-generation sequencing technologies with short read lengths means that much of future metagenomic data will be error-prone. Taken in combination, these factors make the assembly of metagenomic sequence reads into genomes difficult and unreliable. Misassemblies are caused by the presence of repetitive DNA sequences that make assembly especially difficult because of the difference in the relative abundance of species present in the sample.^[35] Misassemblies can also involve the combination of sequences from more than one species into chimeric contigs.^[35]

There are several assembly programs, most of which can use information from paired-end tags in order to improve the accuracy of assemblies. Some programs, such as Phrap or Celera Assembler, were designed to be used to assemble single genomes but nevertheless produce good results when assembling metagenomic data sets.^[25] Other programs, such as Velvet assembler, have been optimized for the shorter reads produced by second-generation sequencing through the use of de Bruijn graphs. The use of reference genomes allows researchers to improve the assembly of the most abundant microbial species, but this approach is limited by the small subset of microbial phyla for which sequenced genomes are available.^[35] After an assembly is created, an additional challenge is 'metagenomic deconvolution', or determining which sequences come from which species in the sample.^[36]
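To make the de Bruijn graph idea concrete, here is a minimal sketch that builds such a graph from a set of reads: every k-mer contributes an edge from its (k-1)-base prefix to its (k-1)-base suffix. This is an illustration only, not the Velvet implementation; the function name and the choice to ignore edge multiplicity and reverse complements are simplifying assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def de_bruijn_graph(reads: Iterable[str], k: int) -> Dict[str, List[str]]:
    """Map each (k-1)-mer prefix to the (k-1)-mer suffixes that follow it."""
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of width k over the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Edge: prefix node -> suffix node.
            graph[kmer[:-1]].add(kmer[1:])
    # Sort successors so the output is deterministic.
    return {node: sorted(successors) for node, successors in graph.items()}
```

Overlapping reads share nodes, so walking unbranched paths through the graph yields candidate contigs. Branches appear at repeats or where sequences from different organisms share a k-mer, which is one way chimeric contigs can arise.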

Gene prediction

Metagenomic analysis pipelines use two approaches in the annotation of coding regions in the assembled contigs.^[35] The first approach is to identify genes based upon homology with genes that are already publicly available in sequence databases, usually by BLAST searches. This type of approach is implemented in the program MEGAN4.^[37] The second, ab initio, uses intrinsic features of the sequence to predict coding regions based upon gene training sets from related organisms. This is the approach taken by programs such as GeneMark^[38] and GLIMMER. The main advantage of ab initio prediction is that it enables the detection of coding regions that lack homologs in the sequence databases; however, it is most accurate when there are large regions of contiguous genomic DNA available for comparison.^[25]
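The simplest intrinsic signal a sequence offers is an open reading frame: a start codon followed in frame by a stop codon. The sketch below scans the three forward frames for such stretches. It is a deliberately naive stand-in and not the method used by GeneMark or GLIMMER, which rely on statistical models trained on known genes; the function name and the `min_codons` default are assumptions.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 30) -> List[Tuple[int, int]]:
    """Return (start, end) of forward-strand ORFs; end is exclusive and
    includes the stop codon. min_codons counts codons before the stop."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through the sequence one codon at a time in this frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return orfs
```

Because an ORF needs both its start and its stop, short or fragmented contigs truncate many genes, which is consistent with ab initio prediction working best on long contiguous regions.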

Species diversity

A 2016 representation of the tree of life^[39]

Gene annotations provide the 'what', while measurements of species diversity provide the 'who'.^[40] In order to connect community composition and function in metagenomes, sequences must be binned. Binning is the process of associating a particular sequence with an organism.^[35] In similarity-based binning, methods such as BLAST are used to rapidly search for phylogenetic markers or otherwise similar sequences in existing public databases. This approach is implemented in MEGAN.^[41] Another tool, PhymmBL, uses interpolated Markov models to assign reads.^[25] MetaPhlAn and AMPHORA are methods based on unique clade-specific markers for estimating organismal relative abundances with improved computational performance.^[42] Other tools, like mOTUs^[43]^[44] and MetaPhyler,^[45] use universal marker genes to profile prokaryotic species. With the mOTUs profiler it is possible to profile species without a reference genome, improving the estimation of microbial community diversity.^[44] Recent methods, such as SLIMM, use the read coverage landscape of individual reference genomes to minimize false-positive hits and obtain reliable relative abundances.^[46] In composition-based binning, methods use intrinsic features of the sequence, such as oligonucleotide frequencies or codon usage bias.^[25] Once sequences are binned, it is possible to carry out comparative analysis of diversity and richness.
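As an illustration of the oligonucleotide frequencies used in composition-based binning, the sketch below turns a sequence into a vector of relative k-mer frequencies (tetranucleotides by default). The function name is hypothetical, and a real binner would typically also merge reverse-complement k-mers and then cluster or classify the resulting vectors.

```python
from collections import Counter
from itertools import product
from typing import Dict


def kmer_frequencies(seq: str, k: int = 4) -> Dict[str, float]:
    """Relative frequency of every A/C/G/T k-mer in seq (256 entries for k=4)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Enumerate all possible k-mers so every sequence yields the same keys.
    valid = ["".join(p) for p in product("ACGT", repeat=k)]
    # k-mers containing ambiguous bases (e.g. N) are excluded from the total.
    total = sum(counts[m] for m in valid)
    return {m: (counts[m] / total if total else 0.0) for m in valid}
```

Contigs from the same genome tend to have similar frequency vectors, so the distance between two vectors can serve as a grouping criterion.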

Data integration

The massive amount of exponentially growing sequence data is a daunting challenge that is complicated by the complexity of the metadata associated with metagenomic projects. Metadata includes detailed information about the three-dimensional (including depth, or height) geography and environmental features of the sample, physical data about the sample site, and the methodology of the sampling.^[29] This information is necessary both to ensure replicability and to enable downstream analysis. Because of its importance, metadata and collaborative data review and curation require standardized data formats located in specialized databases, such as the Genomes OnLine Database (GOLD).^[47]

Several tools have been developed to integrate metadata and sequence data, allowing downstream comparative analyses of different datasets using a number of ecological indices. In 2007, Folker Meyer and Robert Edwards and a team at Argonne National Laboratory and the University of Chicago released the Metagenomics Rapid Annotation using Subsystem Technology server (MG-RAST), a community resource for metagenome data set analysis.^[48] As of June 2012 over 14.8 terabases (14.8×10¹² bases) of DNA have been analyzed, with more than 10,000 public data sets freely available for comparison within MG-RAST. Over 8,000 users have now submitted a total of 50,000 metagenomes to MG-RAST. The Integrated Microbial Genomes/Metagenomes (IMG/M) system also provides a collection of tools for functional analysis of microbial communities based on their metagenome sequence, based upon reference isolate genomes included from the Integrated Microbial Genomes (IMG) system and the Genomic Encyclopedia of Bacteria and Archaea (GEBA) project.^[49]

One of the first standalone tools for analysing high-throughput metagenome shotgun data was MEGAN (MEta Genome ANalyzer).^[37]^[41] A first version of the program was used in 2005 to analyse the metagenomic context of DNA sequences obtained from a mammoth bone.^[17] Based on a BLAST comparison against a reference database, this tool performs both taxonomic and functional binning, by placing the reads onto the nodes of the NCBI taxonomy using a simple lowest common ancestor (LCA) algorithm or onto the nodes of the SEED or KEGG classifications, respectively.^[50]
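The lowest common ancestor idea can be sketched as follows: a read that matches several taxa is placed on the deepest taxonomy node shared by all of them. This is an illustration of the general technique, not MEGAN's code; the child-to-parent dictionary and the toy taxonomy in the usage note are simplified assumptions.

```python
from typing import Dict, List, Optional, Sequence


def lineage(taxon: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Path from the root down to taxon, following child -> parent links."""
    path = [taxon]
    while parent.get(path[-1]) is not None:
        path.append(parent[path[-1]])
    return path[::-1]


def lowest_common_ancestor(
    taxa: Sequence[str], parent: Dict[str, Optional[str]]
) -> Optional[str]:
    """Deepest node shared by the lineages of all taxa (None if taxa is empty)."""
    lineages = [lineage(t, parent) for t in taxa]
    lca = None
    # Walk the lineages in parallel from the root until they diverge.
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        lca = ranks[0]
    return lca
```

With a toy taxonomy such as `{"root": None, "Bacteria": "root", "Proteobacteria": "Bacteria", "Escherichia": "Proteobacteria", "Salmonella": "Proteobacteria", "Firmicutes": "Bacteria", "Bacillus": "Firmicutes"}`, a read hitting both Escherichia and Salmonella is placed on Proteobacteria, while a read hitting Escherichia and Bacillus is placed on Bacteria. The more ambiguous the hits, the higher in the tree the read ends up.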

With the advent of fast and inexpensive sequencing instruments, the growth of databases of DNA sequences is now exponential (e.g., the NCBI GenBank database ^[51]). Faster and efficient tools are needed to keep pace with the high-throughput sequencing, because the BLAST-based approaches such as MG-RAST or MEGAN run slowly to annotate large samples (e.g., several hours to process a small/medium size dataset/sample ^[52]). Thus, ultra-fast classifiers have recently emerged, thanks to more affordable powerful servers. These tools can perform the taxonomic annotation at extremely high speed, for example CLARK ^[53] (according to CLARK's authors, it can classify accurately '32 million metagenomic short reads per minute'). At such a speed, a very large dataset/sample of a billion short reads can be processed in about 30 minutes.
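The "about 30 minutes" figure follows directly from the quoted throughput, as this small check shows. The function name is hypothetical, and the calculation assumes a constant classification rate with no loading or output overhead.

```python
def minutes_to_classify(n_reads: int, reads_per_minute: float) -> float:
    """Wall-clock minutes needed at a constant classification rate."""
    return n_reads / reads_per_minute


# One billion reads at 32 million reads per minute.
print(minutes_to_classify(1_000_000_000, 32_000_000))  # 31.25
```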

With the increasing availability of samples containing ancient DNA, and due to the uncertainty associated with the nature of those samples (ancient DNA damage), FALCON,^[54] a fast tool capable of producing conservative similarity estimates, has been made available. According to FALCON's authors, it can use relaxed thresholds and edit distances without affecting memory and speed performance.

Comparative metagenomics

Comparative analyses between metagenomes can provide additional insight into the function of complex microbial communities and their role in host health.^[55] Pairwise or multiple comparisons between metagenomes can be made at the level of sequence composition (comparing GC-content or genome size), taxonomic diversity, or functional complement. Comparisons of population structure and phylogenetic diversity can be made on the basis of 16S and other phylogenetic marker genes, or—in the case of low-diversity communities—by genome reconstruction from the metagenomic dataset.^[56] Functional comparisons between metagenomes may be made by comparing sequences against reference databases such as COG or KEGG, and tabulating the abundance by category and evaluating any differences for statistical significance.^[50] This gene-centric approach emphasizes the functional complement of the community as a whole rather than taxonomic groups, and shows that the functional complements are analogous under similar environmental conditions.^[56] Consequently, metadata on the environmental context of the metagenomic sample is especially important in comparative analyses, as it provides researchers with the ability to study the effect of habitat upon community structure and function.^[25]
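One simple way to evaluate whether a functional category differs in abundance between two metagenomes is a two-proportion z-test on the number of reads assigned to that category. The sketch below is only one possible choice and is not attributed to any of the cited tools; the function name is hypothetical, and a real analysis would also correct for testing many categories at once.

```python
import math
from typing import Tuple


def two_proportion_z_test(
    hits_a: int, total_a: int, hits_b: int, total_b: int
) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for the difference between the fraction
    of reads assigned to one category in metagenome A versus metagenome B."""
    p_a = hits_a / total_a
    p_b = hits_b / total_b
    # Pooled proportion under the null hypothesis of equal abundance.
    pooled = (hits_a + hits_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        # Category absent from (or covering all of) both samples.
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

Calling the function once per COG or KEGG category yields a table of z-scores and p-values that can be ranked to highlight the functions that differ most between the two samples.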

Additionally, several studies have utilized oligonucleotide usage patterns to identify the differences across diverse microbial communities. Examples of such methodologies include the dinucleotide relative abundance approach by Willner et al.^[57] and the HabiSign approach of Ghosh et al.^[58] This latter study also indicated that differences in tetranucleotide usage patterns can be used to identify genes (or metagenomic reads) originating from specific habitats. Additionally, some methods, such as TriageTools^[59] or Compareads,^[60] detect similar reads between two read sets. The similarity measure they apply to reads is based on the number of identical words of length k shared by pairs of reads.
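A shared-word similarity of this kind can be sketched in a few lines. The version below counts distinct words of length k present in both reads; whether TriageTools or Compareads count distinct or repeated words, and what thresholds they apply, is not stated here, so treat those details and the function names as assumptions.

```python
from typing import Set


def kmers(read: str, k: int) -> Set[str]:
    """Set of distinct words of length k in a read."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def shared_kmers(read_a: str, read_b: str, k: int) -> int:
    """Number of distinct k-length words the two reads have in common."""
    return len(kmers(read_a, k) & kmers(read_b, k))
```

Two reads would then be called similar when `shared_kmers` reaches a chosen threshold, which avoids computing a full alignment for every pair.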

A key goal in comparative metagenomics is to identify microbial group(s) which are responsible for conferring specific characteristics to a given environment. However, artifacts introduced by the sequencing technologies need to be accounted for, as is done in metagenomeSeq.^[28] Others have characterized inter-microbial interactions between the resident microbial groups. A GUI-based comparative metagenomic analysis application called Community-Analyzer has been developed by Kuntal et al.^[61] It implements a correlation-based graph layout algorithm that not only facilitates a quick visualization of the differences in the analyzed microbial communities (in terms of their taxonomic composition), but also provides insights into the inherent inter-microbial interactions occurring therein. Notably, this layout algorithm also enables grouping of the metagenomes based on the probable inter-microbial interaction patterns rather than simply comparing abundance values of various taxonomic groups. In addition, the tool implements several interactive GUI-based functionalities that enable users to perform standard comparative analyses across microbiomes.

Data analysis

Community metabolism

In many bacterial communities, natural or engineered (such as bioreactors), there is significant division of labor in metabolism (Syntrophy), during which the waste products of some organisms are metabolites for others.^[62] In one such system, the methanogenic bioreactor, functional stability requires the presence of several syntrophic species (Syntrophobacterales and Synergistia) working together in order to turn raw resources into fully metabolized waste (methane).^[63] Using comparative gene studies and expression experiments with microarrays or proteomics researchers can piece together a metabolic network that goes beyond species boundaries. Such studies require detailed knowledge about which versions of which proteins are coded by which species and even by which strains of which species. Therefore, community genomic information is another fundamental tool (with metabolomics and proteomics) in the quest to determine how metabolites are transferred and transformed by a community.^[64]

Metatranscriptomics

Metagenomics allows researchers to access the functional and metabolic diversity of microbial communities, but it cannot show which of these processes are active.^[56] The extraction and analysis of metagenomic mRNA (the metatranscriptome) provides information on the regulation and expression profiles of complex communities. Because of the technical difficulties (the short half-life of mRNA, for example) in the collection of environmental RNA there have been relatively few in situ metatranscriptomic studies of microbial communities to date.^[56] While originally limited to microarray technology, metatranscriptomics studies have made use of transcriptomics technologies to measure whole-genome expression and quantification of a microbial community,^[56] first employed in analysis of ammonia oxidation in soils.^[65]

Viruses

Metagenomic sequencing is particularly useful in the study of viral communities. As viruses lack a shared universal phylogenetic marker (such as 16S rRNA for bacteria and archaea, and 18S rRNA for eukarya), the only way to access the genetic diversity of the viral community from an environmental sample is through metagenomics. Viral metagenomes (also called viromes) should thus provide more and more information about viral diversity and evolution.^[66]^[67]^[68]^[69]^[70] For example, a metagenomic pipeline called Giant Virus Finder showed the first evidence of the existence of giant viruses in a saline desert^[71] and in Antarctic dry valleys.^[72]

Applications

Metagenomics has the potential to advance knowledge in a wide variety of fields. It can also be applied to solve practical challenges in medicine, engineering, agriculture, sustainability and ecology.^[29]

Agriculture

The soils in which plants grow are inhabited by microbial communities, with one gram of soil containing around 10⁹-10¹⁰ microbial cells which comprise about one gigabase of sequence information.^[73]^[74] The microbial communities which inhabit soils are some of the most complex known to science, and remain poorly understood despite their economic importance.^[75] Microbial consortia perform a wide variety of ecosystem services necessary for plant growth, including fixing atmospheric nitrogen, nutrient cycling, disease suppression, and sequestering iron and other metals.^[76] Functional metagenomics strategies are being used to explore the interactions between plants and microbes through cultivation-independent study of these microbial communities.^[77]^[78] By allowing insights into the role of previously uncultivated or rare community members in nutrient cycling and the promotion of plant growth, metagenomic approaches can contribute to improved disease detection in crops and livestock and the adoption of enhanced farming practices which improve crop health by harnessing the relationship between microbes and plants.^[29]

Biofuel

Bioreactors allow the observation of microbial communities as they convert biomass into cellulosic ethanol.

Biofuels are fuels derived from biomass conversion, as in the conversion of cellulose contained in corn stalks, switchgrass, and other biomass into cellulosic ethanol.^[29] This process is dependent upon microbial consortia (associations) that transform the cellulose into sugars, followed by the fermentation of the sugars into ethanol. Microbes also produce a variety of sources of bioenergy including methane and hydrogen.^[29]

The efficient industrial-scale deconstruction of biomass requires novel enzymes with higher productivity and lower cost.^[26] Metagenomic approaches to the analysis of complex microbial communities allow the targeted screening of enzymes with industrial applications in biofuel production, such as glycoside hydrolases.^[79] Furthermore, knowledge of how these microbial communities function is required to control them, and metagenomics is a key tool in their understanding. Metagenomic approaches allow comparative analyses between convergent microbial systems like biogas fermenters^[80] or insect herbivores such as the fungus garden of the leafcutter ants.^[81]

Biotechnology

Microbial communities produce a vast array of biologically active chemicals that are used in competition and communication.^[76] Many of the drugs in use today were originally uncovered in microbes; recent progress in mining the rich genetic resource of non-culturable microbes has led to the discovery of new genes, enzymes, and natural products.^[56]^[82] The application of metagenomics has allowed the development of commodity and fine chemicals, agrochemicals and pharmaceuticals where the benefit of enzyme-catalyzed chiral synthesis is increasingly recognized.^[83]


Two types of analysis are used in the bioprospecting of metagenomic data: function-driven screening for an expressed trait, and sequence-driven screening for DNA sequences of interest.^[84] Function-driven analysis seeks to identify clones expressing a desired trait or useful activity, followed by biochemical characterization and sequence analysis. This approach is limited by availability of a suitable screen and the requirement that the desired trait be expressed in the host cell. Moreover, the low rate of discovery (less than one per 1,000 clones screened) and its labor-intensive nature further limit this approach.^[85] In contrast, sequence-driven analysis uses conserved DNA sequences to design PCR primers to screen clones for the sequence of interest.^[84] In comparison to cloning-based approaches, using a sequence-only approach further reduces the amount of bench work required. The application of massively parallel sequencing also greatly increases the amount of sequence data generated, which require high-throughput bioinformatic analysis pipelines.^[85] The sequence-driven approach to screening is limited by the breadth and accuracy of gene functions present in public sequence databases. In practice, experiments make use of a combination of both functional and sequence-based approaches based upon the function of interest, the complexity of the sample to be screened, and other factors.^[85]^[86] An example of success using metagenomics as a biotechnology for drug discovery is illustrated with the malacidin antibiotics.^[87]

Ecology

Metagenomics can provide valuable insights into the functional ecology of environmental communities.^[88] Metagenomic analysis of the bacterial consortia found in the defecations of Australian sea lions suggests that nutrient-rich sea lion faeces may be an important nutrient source for coastal ecosystems. This is because the bacteria that are expelled simultaneously with the defecations are adept at breaking down the nutrients in the faeces into a bioavailable form that can be taken up into the food chain.^[89]

DNA sequencing can also be used more broadly to identify species present in a body of water,^[90] debris filtered from the air, or sample of dirt. This can establish the range of invasive species and endangered species, and track seasonal populations.

Environmental remediation

Metagenomics can improve strategies for monitoring the impact of pollutants on ecosystems and for cleaning up contaminated environments. Increased understanding of how microbial communities cope with pollutants improves assessments of the potential of contaminated sites to recover from pollution and increases the chances of bioaugmentation or biostimulation trials to succeed.^[91]

Gut Microbe Characterization

Microbial communities play a key role in preserving human health, but their composition and the mechanism by which they do so remain mysterious.^[92] Metagenomic sequencing is being used to characterize the microbial communities from 15-18 body sites from at least 250 individuals. This is part of the Human Microbiome initiative with primary goals to determine if there is a core human microbiome, to understand the changes in the human microbiome that can be correlated with human health, and to develop new technological and bioinformatics tools to support these goals.^[93]

Another medical study, part of the MetaHIT (Metagenomics of the Human Intestinal Tract) project, examined 124 individuals from Denmark and Spain, including healthy, overweight, and inflammatory bowel disease patients. The study attempted to categorize the depth and phylogenetic diversity of gastrointestinal bacteria. Using Illumina GA sequence data and SOAPdenovo, a de Bruijn graph-based tool designed for assembling short reads, the researchers generated 6.58 million contigs longer than 500 bp, with a total contig length of 10.3 Gb and an N50 length of 2.2 kb.
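For readers unfamiliar with the N50 statistic quoted above, the sketch below shows one common way to compute it: sort contigs from longest to shortest and report the length of the contig at which the running total first reaches half of all assembled bases. The function name is an assumption for illustration, and assemblers may differ in details such as minimum contig length cutoffs.

```python
def n50(contig_lengths):
    """Return the N50: the largest length L such that contigs of length >= L
    together contain at least half of the total assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs")
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
```

The N50 is not the mean: dividing the reported 10.3 Gb by 6.58 million contigs gives a mean contig length of about 1.6 kb, below the reported N50 of 2.2 kb, because long contigs contribute disproportionately to the total.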

The study demonstrated that two bacterial divisions, Bacteroidetes and Firmicutes, constitute over 90% of the known phylogenetic categories that dominate distal gut bacteria. Using the relative gene frequencies found within the gut, the researchers identified 1,244 metagenomic clusters that are critically important for the health of the intestinal tract. There are two types of functions in these range clusters: housekeeping functions and those specific to the intestine. The housekeeping gene clusters are required in all bacteria and are often major players in the main metabolic pathways, including central carbon metabolism and amino acid synthesis. The gut-specific functions include adhesion to host proteins and the harvesting of sugars from globoseries glycolipids. Patients with inflammatory bowel disease were shown to carry 25% fewer genes and lower bacterial diversity than individuals without the condition, indicating that changes in gut microbiome diversity may be associated with it.

While these studies highlight some potentially valuable medical applications, only 31–48.8% of the reads could be aligned to 194 public human gut bacterial genomes and 7.6–21.2% to bacterial genomes available in GenBank, which indicates that far more research is still needed to capture novel bacterial genomes.^[94]

Infectious disease diagnosis

Differentiating between infectious and non-infectious illness, and identifying the underlying etiology of infection, can be quite challenging. For example, more than half of cases of encephalitis remain undiagnosed, despite extensive testing using state-of-the-art clinical laboratory methods. Metagenomic sequencing shows promise as a sensitive and rapid method to diagnose infection by comparing genetic material found in a patient's sample to a database of thousands of bacteria, viruses, and other pathogens.
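The comparison of sample reads against a pathogen database can be pictured with a toy k-mer lookup. This is a deliberately simplified sketch, not the method of any clinical pipeline: the function names, the choice of k, and the `min_fraction` threshold are all assumptions made for illustration, and real tools also handle reverse complements, sequencing errors, and taxonomy.

```python
from collections import Counter


def kmers(seq, k):
    """Yield all overlapping k-mers of a sequence."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(references, k):
    """Map each k-mer to the set of reference names that contain it."""
    index = {}
    for name, seq in references.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(name)
    return index


def classify_read(read, index, k, min_fraction=0.5):
    """Assign a read to the reference sharing the most k-mers with it.

    Returns None when the read is too short, when no reference reaches
    min_fraction of the read's k-mers, or when the best score is tied.
    """
    read_kmers = list(kmers(read, k))
    if not read_kmers:
        return None
    votes = Counter()
    for kmer in read_kmers:
        for name in index.get(kmer, ()):
            votes[name] += 1
    ranked = votes.most_common(2)
    if not ranked or ranked[0][1] / len(read_kmers) < min_fraction:
        return None
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]
```

Reads that return None here correspond to the unassigned fraction of a sample, which in practice can be large when the database lacks a close relative of the organism present.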

References

^ ^a^bHugenholtz, P; Goebel BM; Pace NR (1 September 1998). 'Impact of Culture-Independent Studies on the Emerging Phylogenetic View of Bacterial Diversity'. J. Bacteriol. 180 (18): 4765–74. PMC107498. PMID9733676.
^Marco, D, ed. (2011). Metagenomics: Current Innovations and Future Trends. Caister Academic Press. ISBN978-1-904455-87-5.
^Eisen, JA (2007). 'Environmental Shotgun Sequencing: Its Potential and Challenges for Studying the Hidden World of Microbes'. PLoS Biology. 5 (3): e82. doi:10.1371/journal.pbio.0050082. PMC1821061. PMID17355177.
^Handelsman, J.; Rondon, M. R.; Brady, S. F.; Clardy, J.; Goodman, R. M. (1998). 'Molecular biological access to the chemistry of unknown soil microbes: A new frontier for natural products'. Chemistry & Biology. 5 (10): R245–R249. doi:10.1016/S1074-5521(98)90108-9. PMID9818143.
^Chen, K.; Pachter, L. (2005). 'Bioinformatics for Whole-Genome Shotgun Sequencing of Microbial Communities'. PLoS Computational Biology. 1 (2): 106–12. Bibcode:2005PLSCB..1..24C. doi:10.1371/journal.pcbi.0010024. PMC1185649. PMID16110337.
^Lane, DJ; Pace B; Olsen GJ; Stahl DA; Sogin ML; Pace NR (1985). 'Rapid determination of 16S ribosomal RNA sequences for phylogenetic analyses'. Proceedings of the National Academy of Sciences. 82 (20): 6955–9. Bibcode:1985PNAS..82.6955L. doi:10.1073/pnas.82.20.6955. PMC391288. PMID2413450.
^Pace, NR; DA Stahl; DJ Lane; GJ Olsen (1985). 'Analyzing natural microbial populations by rRNA sequences'. ASM News. 51: 4–12. Archived from the original on 4 April 2012.
^Schmidt, TM; DeLong, EF; Pace, NR (1991). 'Analysis of a marine picoplankton community by 16S rRNA gene cloning and sequencing'. Journal of Bacteriology. 173 (14): 4371–4378. doi:10.1128/jb.173.14.4371-4378.1991. PMC208098. PMID2066334.
^Healy, FG; RM Ray; HC Aldrich; AC Wilkie; LO Ingram; KT Shanmugam (1995). 'Direct isolation of functional genes encoding cellulases from the microbial consortia in a thermophilic, anaerobic digester maintained on lignocellulose'. Appl. Microbiol. Biotechnol. 43 (4): 667–74. doi:10.1007/BF00164771. PMID7546604.
^Stein, JL; TL Marsh; KY Wu; H Shizuya; EF DeLong (1996). 'Characterization of uncultivated prokaryotes: isolation and analysis of a 40-kilobase-pair genome fragment from a planktonic marine archaeon'. Journal of Bacteriology. 178 (3): 591–599. doi:10.1128/jb.178.3.591-599.1996. PMC177699. PMID8550487.
^Breitbart, M; Salamon P; Andresen B; Mahaffy JM; Segall AM; Mead D; Azam F; Rohwer F (2002). 'Genomic analysis of uncultured marine viral communities'. Proceedings of the National Academy of Sciences of the United States of America. 99 (22): 14250–14255. Bibcode:2002PNAS..9914250B. doi:10.1073/pnas.202488399. PMC137870. PMID12384570.
^ ^a^bTyson, GW; Chapman J; Hugenholtz P; Allen EE; Ram RJ; Richardson PM; Solovyev VV; Rubin EM; Rokhsar DS; Banfield JF (2004). 'Insights into community structure and metabolism by reconstruction of microbial genomes from the environment'. Nature. 428 (6978): 37–43. Bibcode:2004Natur.428..37T. doi:10.1038/nature02340. PMID14961025.
^Hugenholtz, P (2002). 'Exploring prokaryotic diversity in the genomic era'. Genome Biology. 3 (2): 1–8. doi:10.1186/gb-2002-3-2-reviews0003. PMC139013. PMID11864374.
^Thomas, T.; Gilbert, J.; Meyer, F. (2012). 'Metagenomics - a guide from sampling to data analysis'. Microbial Informatics and Experimentation. 2 (1): 3. doi:10.1186/2042-5783-2-3. PMC3351745. PMID22587947.
^Venter, JC; Remington K; Heidelberg JF; Halpern AL; Rusch D; Eisen JA; Wu D; Paulsen I; Nelson KE; Nelson W; Fouts DE; Levy S; Knap AH; Lomas MW; Nealson K; White O; Peterson J; Hoffman J; Parsons R; Baden-Tillson H; Pfannkoch C; Rogers Y; Smith HO (2004). 'Environmental Genome Shotgun Sequencing of the Sargasso Sea'. Science. 304 (5667): 66–74. Bibcode:2004Sci..304..66V. CiteSeerX10.1.1.124.1840. doi:10.1126/science.1093857. PMID15001713.
^Yooseph, Shibu; Kenneth H. Nealson; Douglas B. Rusch; John P. McCrow; Christopher L. Dupont; Maria Kim; Justin Johnson; Robert Montgomery; Steve Ferriera; Karen Beeson; Shannon J. Williamson; Andrey Tovchigrechko; Andrew E. Allen; Lisa A. Zeigler; Granger Sutton; Eric Eisenstadt; Yu-Hui Rogers; Robert Friedman; Marvin Frazier; J. Craig Venter (4 November 2010). 'Genomic and functional adaptation in surface ocean planktonic prokaryotes'. Nature. 468 (7320): 60–66. Bibcode:2010Natur.468..60Y. doi:10.1038/nature09530. ISSN0028-0836. PMID21048761.
^ ^a^b^cPoinar, HN; Schwarz, C; Qi, J; Shapiro, B; Macphee, RD; Buigues, B; Tikhonov, A; Huson, D; Tomsho, LP; Auch, A; Rampp, M; Miller, W; Schuster, SC (2006). 'Metagenomics to Paleogenomics: Large-Scale Sequencing of Mammoth DNA'. Science. 311 (5759): 392–394. Bibcode:2006Sci..311.392P. doi:10.1126/science.1123360. PMID16368896.
^Edwards, RA; Rodriguez-Brito B; Wegley L; Haynes M; Breitbart M; Peterson DM; Saar MO; Alexander S; Alexander EC; Rohwer F (2006). 'Using pyrosequencing to shed light on deep mine microbial ecology'. BMC Genomics. 7: 57. doi:10.1186/1471-2164-7-57. PMC1483832. PMID16549033.
^Beja, O.; Suzuki, MT; Koonin, EV; Aravind, L; Hadd, A; Nguyen, LP; Villacorta, R; Amjadi, M; Garrigues, C (2000). 'Construction and analysis of bacterial artificial chromosome libraries from a marine microbial assemblage'. Environmental Microbiology. 2 (5): 516–29. doi:10.1046/j.1462-2920.2000.00133.x. PMID11233160.
^ ^a^bNicola, Segata; Daniela Boernigen; Timothy L Tickle; Xochitl C Morgan; Wendy S Garrett; Curtis Huttenhower (2013). 'Computational meta'omics for microbial community studies'. Molecular Systems Biology. 9 (666): 666. doi:10.1038/msb.2013.22. PMC4039370. PMID23670539.
^Watson, Mick; Roehe, Rainer; Walker, Alan W.; Dewhurst, Richard J.; Snelling, Timothy J.; Ivan Liachko; Langford, Kyle W.; Press, Maximilian O.; Wiser, Andrew H. (28 February 2018). 'Assembly of 913 microbial genomes from metagenomic sequencing of the cow rumen'. Nature Communications. 9 (1): 870. doi:10.1038/s41467-018-03317-6. ISSN2041-1723. PMC5830445. PMID29491419.
^Rodrigue, S. B.; Materna, A. C.; Timberlake, S. C.; Blackburn, M. C.; Malmstrom, R. R.; Alm, E. J.; Chisholm, S. W. (2010). Gilbert, Jack Anthony (ed.). 'Unlocking Short Read Sequencing for Metagenomics'. PLoS ONE. 5 (7): e11840. Bibcode:2010PLoSO..511840R. doi:10.1371/journal.pone.0011840. PMC2911387. PMID20676378.
^Schuster, S. C. (2007). 'Next-generation sequencing transforms today's biology'. Nature Methods. 5 (1): 16–18. doi:10.1038/nmeth1156. PMID18165802.
^'Metagenomics versus Moore's law'. Nature Methods. 6 (9): 623. 2009. doi:10.1038/nmeth0909-623.
^ ^a^b^c^d^e^fWooley, J. C.; Godzik, A.; Friedberg, I. (2010). Bourne, Philip E. (ed.). 'A Primer on Metagenomics'. PLoS Computational Biology. 6 (2): e1000667. Bibcode:2010PLSCB..6E0667W. doi:10.1371/journal.pcbi.1000667. PMC2829047. PMID20195499.
^ ^a^bHess, Matthias; Alexander Sczyrba; Rob Egan; Tae-Wan Kim; Harshal Chokhawala; Gary Schroth; Shujun Luo; Douglas S Clark; Feng Chen; Tao Zhang; Roderick I Mackie; Len A Pennacchio; Susannah G Tringe; Axel Visel; Tanja Woyke; Zhong Wang; Edward M Rubin (28 January 2011). 'Metagenomic discovery of biomass-degrading genes and genomes from cow rumen'. Science. 331 (6016): 463–467. Bibcode:2011Sci..331.463H. doi:10.1126/science.1200387. ISSN1095-9203. PMID21273488.
^Qin, Junjie; Ruiqiang Li; Jeroen Raes; Manimozhiyan Arumugam; Kristoffer Solvsten Burgdorf; Chaysavanh Manichanh; Trine Nielsen; Nicolas Pons; Florence Levenez; Takuji Yamada; Daniel R. Mende; Junhua Li; Junming Xu; Shaochuan Li; Dongfang Li; Jianjun Cao; Bo Wang; Huiqing Liang; Huisong Zheng; Yinlong Xie; Julien Tap; Patricia Lepage; Marcelo Bertalan; Jean-Michel Batto; Torben Hansen; Denis Le Paslier; Allan Linneberg; H. Bjorn Nielsen; Eric Pelletier; Pierre Renault; Thomas Sicheritz-Ponten; Keith Turner; Hongmei Zhu; Chang Yu; Shengting Li; Min Jian; Yan Zhou; Yingrui Li; Xiuqing Zhang; Songgang Li; Nan Qin; Huanming Yang; Jian Wang; Soren Brunak; Joel Dore; Francisco Guarner; Karsten Kristiansen; Oluf Pedersen; Julian Parkhill; Jean Weissenbach; Peer Bork; S. Dusko Ehrlich; Jun Wang (4 March 2010). 'A human gut microbial gene catalogue established by metagenomic sequencing'. Nature. 464 (7285): 59–65. Bibcode:2010Natur.464..59. doi:10.1038/nature08821. ISSN0028-0836. PMC3779803. PMID20203603.
^ ^a^bPaulson, Joseph; O Colin Stine; Hector Corrada Bravo; Mihai Pop (2013). 'Differential abundance analysis for microbial marker-gene surveys'. Nature Methods. 10 (12): 1200–1202. doi:10.1038/nmeth.2658. PMC4010126. PMID24076764.
^ ^a^b^c^d^e^f^gCommittee on Metagenomics: Challenges and Functional Applications, National Research Council (2007). The New Science of Metagenomics: Revealing the Secrets of Our Microbial Planet. Washington, D.C.: The National Academies Press. doi:10.17226/11902. ISBN978-0-309-10676-4. PMID21678629.
^Oulas, A; Pavloudi, C; Polymenakou, P; Pavlopoulos, GA; Papanikolaou, N; Kotoulas, G; Arvanitidis, C; Iliopoulos, I (2015). 'Metagenomics: tools and insights for analyzing next-generation sequencing data derived from biodiversity studies'. Bioinformatics and Biology Insights. 9: 75–88. doi:10.4137/BBI.S12462. PMC4426941. PMID25983555.
^Mende, Daniel R.; Alison S. Waller; Shinichi Sunagawa; Aino I. Järvelin; Michelle M. Chan; Manimozhiyan Arumugam; Jeroen Raes; Peer Bork (23 February 2012). 'Assessment of Metagenomic Assembly Using Simulated Next Generation Sequencing Data'. PLoS ONE. 7 (2): e31386. Bibcode:2012PLoSO..731386M. doi:10.1371/journal.pone.0031386. ISSN1932-6203. PMC3285633. PMID22384016.
^Balzer, S.; Malde, K.; Grohme, M. A.; Jonassen, I. (2013). 'Filtering duplicate reads from 454 pyrosequencing data'. Bioinformatics. 29 (7): 830–836. doi:10.1093/bioinformatics/btt047. PMC3605598. PMID23376350.
^Mohammed, MH; Sudha Chadaram; Dinakar Komanduri; Tarini Shankar Ghosh; Sharmila S Mande (2011). 'Eu-Detect: an algorithm for detecting eukaryotic sequences in metagenomic data sets'. Journal of Biosciences. 36 (4): 709–717. doi:10.1007/s12038-011-9105-2. PMID21857117.
^R, Schmeider; R Edwards (2011). 'Fast identification and removal of sequence contamination from genomic and metagenomic datasets'. PLoS ONE. 6 (3): e17288. Bibcode:2011PLoSO..617288S. doi:10.1371/journal.pone.0017288. PMC3052304. PMID21408061.
^ ^a^b^c^d^eKunin, V.; Copeland, A.; Lapidus, A.; Mavromatis, K.; Hugenholtz, P. (2008). 'A Bioinformatician's Guide to Metagenomics'. Microbiology and Molecular Biology Reviews. 72 (4): 557–578. doi:10.1128/MMBR.00009-08. PMC2593568. PMID19052320.
^Burton, J. N.; Liachko, I.; Dunham, M. J.; Shendure, J. (2014). 'Species-Level Deconvolution of Metagenome Assemblies with Hi-C-Based Contact Probability Maps'. G3: Genes, Genomes, Genetics. 4 (7): 1339–1346. doi:10.1534/g3.114.011825. PMC4455782. PMID24855317.
^ ^a^bHuson, Daniel H; S. Mitra; N. Weber; H. Ruscheweyh; Stephan C. Schuster (June 2011). 'Integrative analysis of environmental sequences using MEGAN4'. Genome Research. 21 (9): 1552–1560. doi:10.1101/gr.120618.111. PMC3166839. PMID21690186.
^Zhu, Wenhan; Lomsadze Alex; Borodovsky Mark (2010). 'Ab initio gene identification in metagenomic sequences'. Nucleic Acids Research. 38 (12): e132. doi:10.1093/nar/gkq275. PMC2896542. PMID20403810.
^Hug, Laura A.; Baker, Brett J.; Anantharaman, Karthik; Brown, Christopher T.; Probst, Alexander J.; Castelle, Cindy J.; Butterfield, Cristina N.; Hernsdorf, Alex W.; Amano, Yuki; Ise, Kotaro; Suzuki, Yohey; Dudek, Natasha; Relman, David A.; Finstad, Kari M.; Amundson, Ronald; Thomas, Brian C.; Banfield, Jillian F. (11 April 2016). 'A new view of the tree of life'. Nature Microbiology. 1 (5): 16048. doi:10.1038/nmicrobiol.2016.48. PMID27572647.
^Konopka, A. (2009). 'What is microbial community ecology?'. The ISME Journal. 3 (11): 1223–1230. doi:10.1038/ismej.2009.88. PMID19657372.
^ ^a^bHuson, Daniel H; A. Auch; Ji Qi; Stephan C Schuster (January 2007). 'MEGAN Analysis of Metagenomic Data'. Genome Research. 17 (3): 377–386. doi:10.1101/gr.5969107. PMC1800929. PMID17255551.
^Nicola, Segata; Levi Waldron; Annalisa Ballarini; Vagheesh Narasimhan; Olivier Jousson; Curtis Huttenhower (2012). 'Metagenomic microbial community profiling using unique clade-specific marker genes'. Nature Methods. 9 (8): 811–814. doi:10.1038/nmeth.2066. PMC3443552. PMID22688413.
^Sunagawa, Shinichi; et al. (2013). 'Metagenomic species profiling using universal phylogenetic marker genes'. Nature Methods. 10 (12): 1196–1199. doi:10.1038/nmeth.2693. PMID24141494.
^ ^a^bMilanese, Alessio; et al. (2019). 'Microbial abundance, activity and population genomic profiling with mOTUs2'. Nature Communications. 10 (1): 1014. doi:10.1038/s41467-019-08844-4. PMC6399450. PMID30833550.
^Liu, Bo; et al. (2011). 'Accurate and fast estimation of taxonomic profiles from metagenomic shotgun sequences'. BMC Genomics. 12: S4. doi:10.1186/1471-2164-12-S2-S4. PMC3194235. PMID21989143.
^Dadi, Temesgen Hailemariam; Renard, Bernhard Y.; Wieler, Lothar H.; Semmler, Torsten; Reinert, Knut (2017). 'SLIMM: species level identification of microorganisms from metagenomes'. PeerJ. 5: e3138. doi:10.7717/peerj.3138. ISSN2167-8359. PMC5372838. PMID28367376.
^Pagani, Ioanna; Konstantinos Liolios; Jakob Jansson; I-Min A Chen; Tatyana Smirnova; Bahador Nosrat; Victor M Markowitz; Nikos C Kyrpides (1 December 2011). 'The Genomes OnLine Database (GOLD) v.4: status of genomic and metagenomic projects and their associated metadata'. Nucleic Acids Research. 40 (1): D571–9. doi:10.1093/nar/gkr1100. ISSN1362-4962. PMC3245063. PMID22135293.
^Meyer, F; Paarmann D; D'Souza M; Olson R; Glass EM; Kubal M; Paczian T; Rodriguez A; Stevens R; Wilke A; Wilkening J; Edwards RA (2008). 'The metagenomics RAST server – a public resource for the automatic phylogenetic and functional analysis of metagenomes'. BMC Bioinformatics. 9: 386. doi:10.1186/1471-2105-9-386. PMC2563014. PMID18803844.
^Markowitz, V. M.; Chen, I. -M. A.; Chu, K.; Szeto, E.; Palaniappan, K.; Grechkin, Y.; Ratner, A.; Jacob, B.; Pati, A.; Huntemann, M.; Liolios, K.; Pagani, I.; Anderson, I.; Mavromatis, K.; Ivanova, N. N.; Kyrpides, N. C. (2011). 'IMG/M: The integrated metagenome data management and comparative analysis system'. Nucleic Acids Research. 40 (Database issue): D123–D129. doi:10.1093/nar/gkr975. PMC3245048. PMID22086953.
^ ^a^bMitra, Suparna; Paul Rupek; Daniel C Richter; Tim Urich; Jack A Gilbert; Folker Meyer; Andreas Wilke; Daniel H Huson (2011). 'Functional analysis of metagenomes and metatranscriptomes using SEED and KEGG'. BMC Bioinformatics. 12 Suppl 1: S21. doi:10.1186/1471-2105-12-S1-S21. ISSN1471-2105. PMC3044276. PMID21342551.
^Benson, Dennis; Mark Cavanaugh; Karen Clark; et al. (2013). 'Genbank'. Nucleic Acids Research. 41 (Database issue): D36–D42. doi:10.1093/nar/gks1195. PMC3531190. PMID23193287.
^Bazinet, Adam; Michael Cummings (2012). 'A comparative evaluation of sequence classification programs'. BMC Bioinformatics. 13: 92. doi:10.1186/1471-2105-13-92. PMC3428669. PMID22574964.
^Ounit, Rachid; Steve Wanamaker; Timothy Close; Stefano Lonardi (2015). 'CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers'. BMC Genomics. 16: 236. doi:10.1186/s12864-015-1419-2. PMC4428112. PMID25879410.
^Pratas D; Pinho AJ; Silva RM; Rodrigues JMOS; Hosseini M; Caetano T; Ferreira PJSG (February 2018). 'FALCON: a method to infer metagenomic composition of ancient DNA'. bioRxiv267179.
^Kurokawa, Ken; Takehiko Itoh; Tomomi Kuwahara; Kenshiro Oshima; Hidehiro Toh; Atsushi Toyoda; Hideto Takami; Hidetoshi Morita; Vineet K. Sharma; Tulika P. Srivastava; Todd D. Taylor; Hideki Noguchi; Hiroshi Mori; Yoshitoshi Ogura; Dusko S. Ehrlich; Kikuji Itoh; Toshihisa Takagi; Yoshiyuki Sakaki; Tetsuya Hayashi; Masahira Hattori (1 January 2007). 'Comparative Metagenomics Revealed Commonly Enriched Gene Sets in Human Gut Microbiomes'. DNA Research. 14 (4): 169–181. doi:10.1093/dnares/dsm018. PMC2533590. PMID17916580. Retrieved 18 December 2011.
^ ^a^b^c^d^e^fSimon, C.; Daniel, R. (2010). 'Metagenomic Analyses: Past and Future Trends'. Applied and Environmental Microbiology. 77 (4): 1153–1161. doi:10.1128/AEM.02345-10. PMC3067235. PMID21169428.
^Willner, D; RV Thurber; F Rohwer (2009). 'Metagenomic signatures of 86 microbial and viral metagenomes'. Environmental Microbiology. 11 (7): 1752–66. doi:10.1111/j.1462-2920.2009.01901.x. PMID19302541.
^Ghosh, Tarini Shankar; Monzoorul Haque Mohammed; Hannah Rajasingh; Sudha Chadaram; Sharmila S Mande (2011). 'HabiSign: a novel approach for comparison of metagenomes and rapid identification of habitat-specific sequences'. BMC Bioinformatics. 12 (Supplement 13): S9. doi:10.1186/1471-2105-12-s13-s9. PMC3278849. PMID22373355.
^Fimereli, D.; Detours, V.; Konopka, T. (13 February 2013). 'TriageTools: tools for partitioning and prioritizing analysis of high-throughput sequencing data'. Nucleic Acids Research. 41 (7): e86. doi:10.1093/nar/gkt094. PMC3627586. PMID23408855.
^Maillet, Nicolas; Lemaitre, Claire; Chikhi, Rayan; Lavenier, Dominique; Peterlongo, Pierre (2012). 'Compareads: comparing huge metagenomic experiments'. BMC Bioinformatics. 13 (Suppl 19): S10. doi:10.1186/1471-2105-13-S19-S10. PMC3526429. PMID23282463.
^Bhusan, Kuntal Kumar; Tarini Shankar Ghosh; Sharmila S Mande (2013). 'Community-analyzer: a platform for visualizing and comparing microbial community structure across microbiomes'. Genomics. 102 (4): 409–418. doi:10.1016/j.ygeno.2013.08.004. PMID23978768.
^Werner, Jeffrey J.; Dan Knights; Marcelo L. Garcia; Nicholas B. Scalfone; Samual Smith; Kevin Yarasheski; Theresa A. Cummings; Allen R. Beers; Rob Knight; Largus T. Angenent (8 March 2011). 'Bacterial community structures are unique and resilient in full-scale bioenergy systems'. Proceedings of the National Academy of Sciences of the United States of America. 108 (10): 4158–4163. Bibcode:2011PNAS.108.4158W. doi:10.1073/pnas.1015676108. ISSN0027-8424. PMC3053989. PMID21368115.
^McInerney, Michael J.; Jessica R. Sieber; Robert P. Gunsalus (December 2009). 'Syntrophy in Anaerobic Global Carbon Cycles'. Current Opinion in Biotechnology. 20 (6): 623–632. doi:10.1016/j.copbio.2009.10.001. ISSN0958-1669. PMC2790021. PMID19897353.
^Klitgord, N.; Segrè, D. (2011). 'Ecosystems biology of microbial metabolism'. Current Opinion in Biotechnology. 22 (4): 541–546. doi:10.1016/j.copbio.2011.04.018. PMID21592777.
^Leininger, S.; Urich, T.; Schloter, M.; Schwark, L.; Qi, J.; Nicol, G. W.; Prosser, J. I.; Schuster, S. C.; Schleper, C. (2006). 'Archaea predominate among ammonia-oxidizing prokaryotes in soils'. Nature. 442 (7104): 806–809. Bibcode:2006Natur.442.806L. doi:10.1038/nature04983. PMID16915287.
^Paez-Espino, D; Eloe-Fadrosh, EA; Pavlopoulos, GA; Thomas, AD; Huntemann, M; Mikhailova, N; Rubin, E; Ivanova, NN; Kyrpides, NC (25 August 2016). 'Uncovering Earth's virome'. Nature. 536 (7617): 425–30. Bibcode:2016Natur.536.425P. doi:10.1038/nature19094. PMID27533034.
^Paez-Espino, D; Chen, IA; Palaniappan, K; Ratner, A; Chu, K; Szeto, E; Pillay, M; Huang, J; Markowitz, VM; Nielsen, T; Huntemann, M; K Reddy, TB; Pavlopoulos, GA; Sullivan, MB; Campbell, BJ; Chen, F; McMahon, K; Hallam, SJ; Denef, V; Cavicchioli, R; Caffrey, SM; Streit, WR; Webster, J; Handley, KM; Salekdeh, GH; Tsesmetzis, N; Setubal, JC; Pope, PB; Liu, WT; Rivers, AR; Ivanova, NN; Kyrpides, NC (4 January 2017). 'IMG/VR: a database of cultured and uncultured DNA Viruses and retroviruses'. Nucleic Acids Research. 45 (D1): D457–D465. doi:10.1093/nar/gkw1030. PMC5210529. PMID27799466.
^Paez-Espino D, Roux S, Chen IA, Palaniappan K, Ratner A, Chu K, et al. (2018). 'IMG/VR v.2.0: an integrated data management and analysis system for cultivated and environmental viral genomes'. Nucleic Acids Res. 47 (D1): D678–D686. doi:10.1093/nar/gky1127. PMC6323928. PMID30407573.
^Paez-Espino, D; Pavlopoulos, GA; Ivanova, NN; Kyrpides, NC (August 2017). 'Nontargeted virus sequence discovery pipeline and virus clustering for metagenomic data'. Nature Protocols. 12 (8): 1673–1682. doi:10.1038/nprot.2017.063. PMID28749930.
^Kristensen, DM; Mushegian AR; Dolja VV; Koonin EV (2009). 'New dimensions of the virus world discovered through metagenomics'. Trends in Microbiology. 18 (1): 11–19. doi:10.1016/j.tim.2009.11.003. PMC3293453. PMID19942437.
^Kerepesi, Csaba; Grolmusz, Vince (2016). 'Giant Viruses of the Kutch Desert'. Archives of Virology. 161 (3): 721–724. arXiv:1410.1278. doi:10.1007/s00705-015-2720-8. PMID26666442.
^Kerepesi, Csaba; Grolmusz, Vince (2017). 'The 'Giant Virus Finder' Discovers an Abundance of Giant Viruses in the Antarctic Dry Valleys'. Archives of Virology. 162 (6): 1671–1676. arXiv:1503.05575. doi:10.1007/s00705-017-3286-4. PMID28247094.
^Jansson, Janet (2011). 'Towards 'Tera-Terra': Terabase Sequencing of Terrestrial Metagenomes'. Microbe. 6 (7). p. 309. Archived from the original on 31 March 2012.
^Vogel, T. M.; Simonet, P.; Jansson, J. K.; Hirsch, P. R.; Tiedje, J. M.; Van Elsas, J. D.; Bailey, M. J.; Nalin, R.; Philippot, L. (2009). 'TerraGenome: A consortium for the sequencing of a soil metagenome'. Nature Reviews Microbiology. 7 (4): 252. doi:10.1038/nrmicro2119.
^'TerraGenome Homepage'. TerraGenome international sequencing consortium. Retrieved 30 December 2011.
^ ^a^bCommittee on Metagenomics: Challenges and Functional Applications, National Research Council (2007). Understanding Our Microbial Planet: The New Science of Metagenomics(PDF). The National Academies Press.
^Charles T (2010). 'The Potential for Investigation of Plant-microbe Interactions Using Metagenomics Methods'. Metagenomics: Theory, Methods and Applications. Caister Academic Press. ISBN978-1-904455-54-7.
^Bringel, Françoise; Couée, Ivan (22 May 2015). 'Pivotal roles of phyllosphere microorganisms at the interface between plant functioning and atmospheric trace gas dynamics'. Frontiers in Microbiology. 6: 486. doi:10.3389/fmicb.2015.00486. PMC4440916. PMID26052316.
^Li, Luen-Luen; Sean R McCorkle; Sebastien Monchy; Safiyh Taghavi; Daniel van der Lelie (18 May 2009). 'Bioprospecting metagenomes: glycosyl hydrolases for converting biomass'. Biotechnology for Biofuels. 2: 10. doi:10.1186/1754-6834-2-10. ISSN1754-6834. PMC2694162. PMID19450243.
^Jaenicke, Sebastian; Christina Ander; Thomas Bekel; Regina Bisdorf; Marcus Dröge; Karl-Heinz Gartemann; Sebastian Jünemann; Olaf Kaiser; Lutz Krause; Felix Tille; Martha Zakrzewski; Alfred Pühler; Andreas Schlüter; Alexander Goesmann (26 January 2011). Aziz, Ramy K (ed.). 'Comparative and Joint Analysis of Two Metagenomic Datasets from a Biogas Fermenter Obtained by 454-Pyrosequencing'. PLoS ONE. 6 (1): e14519. Bibcode:2011PLoSO..614519J. doi:10.1371/journal.pone.0014519. PMC3027613. PMID21297863.
^Suen, Garret; Jarrod J Scott; Frank O Aylward; Sandra M Adams; Susannah G Tringe; Adrián A Pinto-Tomás; Clifton E Foster; Markus Pauly; Paul J Weimer; Kerrie W Barry; Lynne A Goodwin; Pascal Bouffard; Lewyn Li; Jolene Osterberger; Timothy T Harkins; Steven C Slater; Timothy J Donohue; Cameron R Currie (September 2010). Sonnenburg, Justin (ed.). 'An insect herbivore microbiome with high plant biomass-degrading capacity'. PLoS Genetics. 6 (9): e1001129. doi:10.1371/journal.pgen.1001129. ISSN1553-7404. PMC2944797. PMID20885794.
^Simon, C.; Daniel, R. (2009). 'Achievements and new knowledge unraveled by metagenomic approaches'. Applied Microbiology and Biotechnology. 85 (2): 265–276. doi:10.1007/s00253-009-2233-z. PMC2773367. PMID19760178.
^Wong D (2010). 'Applications of Metagenomics for Industrial Bioproducts'. Metagenomics: Theory, Methods and Applications. Caister Academic Press. ISBN978-1-904455-54-7.
^ ^a^bSchloss, Patrick D; Jo Handelsman (June 2003). 'Biotechnological prospects from metagenomics'(PDF). Current Opinion in Biotechnology. 14 (3): 303–310. doi:10.1016/S0958-1669(03)00067-3. ISSN0958-1669. PMID12849784. Retrieved 3 January 2012.
^ ^a^b^cKakirde, Kavita S.; Larissa C. Parsley; Mark R. Liles (1 November 2010). 'Size Does Matter: Application-driven Approaches for Soil Metagenomics'. Soil Biology & Biochemistry. 42 (11): 1911–1923. doi:10.1016/j.soilbio.2010.07.021. ISSN0038-0717. PMC2976544. PMID21076656.
^Parachin, Nádia Skorupa; Marie F Gorwa-Grauslund (2011). 'Isolation of xylose isomerases by sequence- and function-based screening from a soil metagenomic library'. Biotechnology for Biofuels. 4 (1): 9. doi:10.1186/1754-6834-4-9. ISSN1754-6834. PMC3113934. PMID21545702.
^Hover BM, Kim S, Katz M, Charlop-Powers Z, Owen JG, Ternei MA, et al. (12 February 2018). 'Culture-independent discovery of the malacidins as calcium-dependent antibiotics with activity against multidrug-resistant Gram-positive pathogens'. Nature Microbiology. 3 (4): 415–422. doi:10.1038/s41564-018-0110-1. PMC5874163. PMID29434326.
^Raes, J.; Letunic, I.; Yamada, T.; Jensen, L. J.; Bork, P. (2011). 'Toward molecular trait-based ecology through integration of biogeochemical, geographical and metagenomic data'. Molecular Systems Biology. 7: 473. doi:10.1038/msb.2011.6. PMC3094067. PMID21407210.
^Lavery, T. J.; Roudnew, B.; Seymour, J.; Mitchell, J. G.; Jeffries, T. (2012). Steinke, Dirk (ed.). 'High Nutrient Transport and Cycling Potential Revealed in the Microbial Metagenome of Australian Sea Lion (Neophoca cinerea) Faeces'. PLoS ONE. 7 (5): e36478. Bibcode:2012PLoSO..736478L. doi:10.1371/journal.pone.0036478. PMC3350522. PMID22606263.
^'What's Swimming In The River? Just Look For DNA'. NPR.org. 24 July 2013. Retrieved 10 October 2014.
^George I; et al. (2010). 'Application of Metagenomics to Bioremediation'. Metagenomics: Theory, Methods and Applications. Caister Academic Press. ISBN978-1-904455-54-7.
^Zimmer, Carl (13 July 2010). 'How Microbes Defend and Define Us'. New York Times. Retrieved 29 December 2011.
^Nelson KE and White BA (2010). 'Metagenomics and Its Applications to the Study of the Human Microbiome'. Metagenomics: Theory, Methods and Applications. Caister Academic Press. ISBN978-1-904455-54-7.
^Qin, Junjie; Ruiqiang Li; Jeroen Raes; Manimozhiyan Arumugam; Kristoffer Solvesten Burgdorf (March 2010). 'A human gut microbial gene catalogue established by metagenomic sequencing'. Nature. 464 (7285): 59–65. Bibcode:2010Natur.464..59. doi:10.1038/nature08821. PMC3779803. PMID20203603.

External links

Focus on Metagenomics at Nature Reviews Microbiology journal website
The “Critical Assessment of Metagenome Interpretation” (CAMI) initiative to evaluate methods in metagenomics

Retrieved from 'https://en.wikipedia.org/w/index.php?title=Metagenomics&oldid=899005851'

Published online 2012 Feb 9. doi: 10.1186/2042-5783-2-3

PMID: 22587947

Abstract

Metagenomics applies a suite of genomic technologies and bioinformatics tools to directly access the genetic content of entire communities of organisms. The field of metagenomics has been responsible for substantial advances in microbial ecology, evolution, and diversity over the past 5 to 10 years, and many research laboratories are actively engaged in it now. With the growing numbers of activities also comes a plethora of methodological knowledge and expertise that should guide future developments in the field. This review summarizes the current opinions in metagenomics, and provides practical guidance and advice on sample processing, sequencing technology, assembly, binning, annotation, experimental design, statistical analysis, data storage, and data sharing. As more metagenomic datasets are generated, the availability of standardized procedures and shared data storage and analysis becomes increasingly important to ensure that output of individual projects can be assessed and compared.

Keywords: sampling, sequencing, assembly, binning, annotation, data storage, data sharing, DNA extraction, microbial ecology, microbial diversity

Introduction

Arguably, one of the most remarkable events in the field of microbial ecology in the past decade has been the advent and development of metagenomics. Metagenomics is defined as the direct genetic analysis of genomes contained within an environmental sample. The field initially started with the cloning of environmental DNA, followed by functional expression screening, and was then quickly complemented by direct random shotgun sequencing of environmental DNA. These initial projects not only showed proof of principle of the metagenomic approach, but also uncovered an enormous functional gene diversity in the microbial world around us.

Metagenomics provides access to the functional gene composition of microbial communities and thus gives a much broader description than phylogenetic surveys, which are often based only on the diversity of one gene, for instance the 16S rRNA gene. On its own, metagenomics gives genetic information on potentially novel biocatalysts or enzymes, genomic linkages between function and phylogeny for uncultured organisms, and evolutionary profiles of community function and structure. It can also be complemented with metatranscriptomic or metaproteomic approaches to describe expressed activities. Metagenomics is also a powerful tool for generating novel hypotheses of microbial function; the remarkable discoveries of proteorhodopsin-based photoheterotrophy and ammonia-oxidizing Archaea attest to this fact.

The rapid and substantial cost reduction in next-generation sequencing has dramatically accelerated the development of sequence-based metagenomics. In fact, the number of metagenome shotgun sequence datasets has exploded in the past few years. In the future, metagenomics will be used in the same manner as 16S rRNA gene fingerprinting methods to describe microbial community profiles. It will therefore become a standard tool for many laboratories and scientists working in the field of microbial ecology.

This review gives an overview of the field of metagenomics, with particular emphasis on the steps involved in a typical sequence-based metagenome project (Figure 1). We describe and discuss sample processing, sequencing technology, assembly, binning, annotation, experimental design, statistical analysis, and data storage and sharing. Clearly, any kind of metagenomic dataset will benefit from the rich information available from other metagenome projects, and it is hoped that common, yet flexible, standards and interactions among scientists in the field will facilitate this sharing of information. This review article summarizes the current thinking in the field and introduces current practices and key issues that those scientists new to the field need to consider for a successful metagenome project.

Figure 1. Flow diagram of a typical metagenome project. Dashed arrows indicate steps that can be omitted.

Sampling and processing

Sample processing is the first and most crucial step in any metagenomics project. The DNA extracted should be representative of all cells present in the sample and sufficient amounts of high-quality nucleic acids must be obtained for subsequent library production and sequencing. Processing requires specific protocols for each sample type, and various robust methods for DNA extraction are available (e.g. [,]). Initiatives are also under way to explore the microbial biodiversity from tens of thousands of ecosystems using a single DNA extraction technology to ensure comparability [11].

If the target community is associated with a host (e.g. an invertebrate or plant), then either fractionation or selective lysis might be suitable to ensure that minimal host DNA is obtained (e.g. [,]). This is particularly important when the host genome is large and hence might 'overwhelm' the sequences of the microbial community in the subsequent sequencing effort. Physical fractionation is also applicable when only a certain part of the community is the target of analysis, for example, viruses in seawater samples. Here a range of selective filtration or centrifugation steps, or even flow cytometry, can be used to enrich the target fraction [,]. Fractionation steps should be checked to ensure that sufficient enrichment of the target is achieved and that contamination with non-target material is minimal.

Physical separation and isolation of cells from the samples might also be important to maximize DNA yield or avoid coextraction of enzymatic inhibitors (such as humic acids) that might interfere with subsequent processing. This situation is particularly relevant for soil metagenome projects, and substantial work has been done in this field to address the issue ([] and references therein). Direct lysis of cells in the soil matrix versus indirect lysis (i.e. after separation of cells from the soil) has a quantifiable bias in terms of microbial diversity, DNA yield, and resulting sequence fragment length []. The extensive work on soil highlights the need to ensure that extraction procedures are well benchmarked and that multiple methods are compared to ensure representative extraction of DNA.

Certain types of samples (such as biopsies or ground-water) often yield only very small amounts of DNA []. Library production for most sequencing technologies requires high nanogram or microgram amounts of DNA (see below), and hence amplification of starting material might be required. Multiple displacement amplification (MDA) using random hexamers and phage phi29 polymerase is one option employed to increase DNA yields. This method can amplify femtograms of DNA to produce micrograms of product and thus has been widely used in single-cell genomics and to a certain extent in metagenomics [,]. As with any amplification method, there are potential problems associated with reagent contamination, chimera formation and sequence bias in the amplification, and their impact will depend on the amount and type of starting material and the number of amplification rounds required to produce sufficient amounts of nucleic acids. These issues can have a significant impact on subsequent metagenomic community analysis [], and so it will be necessary to consider whether amplification is permissible.

Sequencing technology

Over the past 10 years metagenomic shotgun sequencing has gradually shifted from classical Sanger sequencing technology to next-generation sequencing (NGS). Sanger sequencing, however, is still considered the gold standard for sequencing, because of its low error rate, long read length (> 700 bp) and large insert sizes (e.g. > 30 Kb for fosmids or bacterial artificial chromosomes (BACs)). All of these aspects will improve assembly outcomes for shotgun data, and hence Sanger sequencing might still be applicable if generating close-to-complete genomes in low-diversity environments is the objective []. Drawbacks of Sanger sequencing are the labor-intensive cloning process, with its associated bias against genes toxic for the cloning host [], and the overall cost per gigabase (approx. USD 400,000).

Of the NGS technologies, both the 454/Roche and the Illumina/Solexa systems have now been extensively applied to metagenomic samples. Excellent reviews of these technologies are available [,], but a brief summary is given here with particular attention to metagenomic applications.

The 454/Roche system applies emulsion polymerase chain reaction (ePCR) to clonally amplify random DNA fragments, which are attached to microscopic beads. Beads are deposited into the wells of a picotitre plate and then individually and in parallel pyrosequenced. The pyrosequencing process involves the sequential addition of all four deoxynucleoside triphosphates, which, if complementary to the template strand, are incorporated by a DNA polymerase. This polymerization reaction releases pyrophosphate, which is converted via two enzymatic reactions to produce light. Light production of ~ 1.2 million reactions is detected in parallel via a charge-coupled device (CCD) camera and converted to the actual sequence of the template. Two aspects are important in this process with respect to metagenomic applications. First, the ePCR has been shown to produce artificial replicate sequences, which will impact any estimates of gene abundance. Understanding the amount of replicate sequences is crucial for the data quality of sequencing runs, and replicates can be identified and filtered out with bioinformatics tools [,]. Second, the intensity of light produced when the polymerase runs through a homopolymer is often difficult to correlate to the actual number of nucleotide positions. Typically, this results in insertion or deletion errors in homopolymers and can hence cause reading frameshifts, if protein coding sequences (CDSs) are called on a single read. This type of error can however be incorporated into models of CDS prediction thus resulting in high, albeit not perfect, accuracy []. Despite these disadvantages, the much cheaper cost of ~ USD 20,000 per gigabase pair has made 454/Roche pyrosequencing a popular choice for shotgun-sequencing metagenomics. In addition, the 454/Roche technology produces an average read length between 600-800 bp, which is long enough to cause only minor loss in the number of reads that can be annotated []. 
Sample preparation has also been optimized so that tens of nanograms of DNA are sufficient for sequencing single-end libraries [,], although paired-end sequencing might still require microgram quantities. Moreover, the 454/Roche sequencing platform offers multiplexing, allowing up to 12 samples to be analyzed in a single run of ~500 Mbp.
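The filtering of artificial replicates described above can be illustrated with a minimal sketch. It assumes a deliberately simple definition of a replicate (reads that share the same leading bases); the function name `filter_replicates` and the `prefix_len` threshold are hypothetical, and dedicated tools apply more refined clustering criteria.

```python
def filter_replicates(reads, prefix_len=20):
    """Keep one read per group of reads that share the same leading bases.

    Assumption for illustration: two reads count as artificial replicates
    when their first `prefix_len` bases are identical.
    """
    best = {}
    for read in reads:
        key = read[:prefix_len].upper()
        # Keep the longest member of each group as its representative.
        if key not in best or len(read) > len(best[key]):
            best[key] = read
    return list(best.values())


reads = [
    "ACGTACGTACGTACGTACGTTTGA",
    "ACGTACGTACGTACGTACGTTTGACC",  # same start as the first read, but longer
    "TTGACCGGTAACGTTAGCATGCAT",
]
unique = filter_replicates(reads)
print(len(reads), "reads in,", len(unique), "reads kept")
```

Because replicates inflate apparent gene abundance, a step of this kind would normally run before any abundance estimate is made.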

The Illumina/Solexa technology immobilizes random DNA fragments on a surface and then performs solid-surface PCR amplification, resulting in clusters of identical DNA fragments. These are then sequenced with reversible terminators in a sequencing-by-synthesis process []. The cluster density is enormous, with hundreds of millions of reads per surface channel and 16 channels per run on the HiSeq2000 instrument. Read length is now approaching 150 bp, and clustered fragments can be sequenced from both ends. Continuous sequence information of nearly 300 bp can be obtained from two overlapping 150 bp paired reads from a single insert. Yields of ~60 Gbp can therefore be typically expected in a single channel. While Illumina/Solexa has limited systematic errors, some datasets have shown high error rates at the tail ends of reads []. In general, clipping reads has proven to be a good strategy for eliminating the error in 'bad' datasets; however, sequence quality values should also be used to detect 'bad' sequences. The lower costs of this technology (~ USD 50 per Gbp) and recent success in its application to metagenomics, and even the generation of draft genomes from complex datasets [,], are currently making the Illumina technology an increasingly popular choice. As with 454/Roche sequencing, starting material can be as low as 20 nanograms, but larger amounts (500-1000 ng) are required when mate-pair libraries with longer inserts are made. The limited read length of the Illumina/Solexa technology means that a greater proportion of unassembled reads might be too short for functional annotation than with 454/Roche technology []. While assembly might be advisable in such a case, potential bias, such as the suppression of low-abundance species (which cannot be assembled), should be considered, as should the fact that some current software packages (e.g. MG-RAST) are capable of analyzing unassembled Illumina reads of 75 bp and longer.
Multiplexing of samples is also available for individual sequencing channels, with more than 500 samples multiplexed per lane. Another important factor to consider is run time, with a 2 × 100 bp paired-end sequencing analysis taking approx. 10 days of HiSeq2000 instrument time, in contrast to 1 day for the 454/Roche technology. However, faster run times (albeit at a higher cost of approx. USD 600 per Gbp) can be achieved with the new Illumina MiSeq instrument. This smaller version of Illumina/Solexa technology can also be used to test-run sequencing libraries before deeper sequencing on a HiSeq instrument.
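The read clipping and quality screening mentioned above can be sketched as follows. This is a minimal illustration, not the procedure of any particular pipeline: it assumes FASTQ quality strings in the common Phred+33 encoding, and the function names and the threshold of Q20 are hypothetical choices.

```python
def trim_tail(seq, qual, min_q=20, offset=33):
    """Clip low-quality bases from the 3' end of a read.

    seq  -- base calls
    qual -- FASTQ quality string (assumed Phred scores with the given offset)
    """
    end = len(seq)
    # Walk back from the 3' end until a base meets the quality threshold.
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def mean_quality(qual, offset=33):
    """Mean Phred score of a read, usable for flagging 'bad' reads."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


seq = "ACGTACGTAC"
qual = "IIIIIII##!"  # 'I' = Q40, '#' = Q2, '!' = Q0
trimmed_seq, trimmed_qual = trim_tail(seq, qual)
print(trimmed_seq, mean_quality(qual))
```

Clipping handles errors concentrated at the read tail, while the mean score gives a simple way to discard reads that are poor throughout.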

A few additional sequencing technologies are available that might prove useful for metagenomic applications, now or in the near future. The Applied Biosystems SOLiD sequencer has been extensively used, for example, in genome resequencing []. SOLiD arguably provides the lowest error rate of any current NGS technology; however, it does not achieve reliable read lengths beyond 50 nucleotides. This will limit its applicability for direct gene annotation of unassembled reads or for assembly of large contigs. Nevertheless, for assembly or mapping of metagenomic data against a reference genome, recent work showed encouraging outcomes []. Roche is also marketing a smaller-scale sequencer based on pyrosequencing with about 100 Mbp output and low per-run costs. This system might be useful, because relatively low coverage of metagenomes can establish meaningful gene profiles []. Ion Torrent (and more recently Ion Proton) is another emerging technology and is based on the principle that protons released during DNA polymerization can be used to detect nucleotide incorporation. This system promises read lengths of > 100 bp and throughput on the order of magnitude of the 454/Roche sequencing systems. Pacific Biosciences (PacBio) has released a sequencing technology based on single-molecule, real-time detection in zero-mode waveguide wells. Theoretically, this technology on its RS1 platform should provide much greater read lengths than the other technologies mentioned, which would facilitate annotation and assembly. In addition, a process called strobing will mimic paired-end reads. However, accuracy of single reads with PacBio is currently only at 85%, and random reads are 'dropped,' making the instrument unusable in its current form for metagenomic sequencing []. Complete Genomics is offering a technology based on sequencing DNA nanoballs with combinatorial probe-anchor ligation [].
Its read length of 35 nucleotides is rather limited and so might be its utility for de novo assemblies. While none of the emerging sequencing technologies have been thoroughly applied and tested with metagenomics samples, they offer promising alternatives and even further cost reduction.

Assembly

If the research aims at recovering the genomes of uncultured organisms or obtaining full-length CDSs for subsequent characterization, rather than a functional description of the community, then short read fragments will be assembled to obtain longer genomic contigs. The majority of current assembly programs were designed to assemble single, clonal genomes, and their utility for complex pan-genomic mixtures should be approached with caution and critical evaluation.

Two strategies can be employed for metagenomics samples: reference-based assembly (co-assembly) and de novo assembly.

Reference-based assembly can be done with software packages such as Newbler (Roche), AMOS (http://sourceforge.net/projects/amos/), or MIRA [37]. These software packages include algorithms that are fast and memory-efficient, so assembly can often be performed on laptop-sized machines in a couple of hours. Reference-based assembly works well if the metagenomic dataset contains sequences for which closely related reference genomes are available. However, differences between the true genome of the sample and the reference, such as a large insertion, deletion, or polymorphisms, can mean that the assembly is fragmented or that divergent regions are not covered.

De novo assembly typically requires larger computational resources. Thus, a whole class of assembly tools based on de Bruijn graphs was specifically created to handle very large amounts of data [,]. Machine requirements for the de Bruijn assemblers Velvet [] or SOAP [] are still significantly higher than for reference-based assembly (co-assembly), often requiring hundreds of gigabytes of memory in a single machine, with run times frequently extending to days.
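The core data structure behind this class of assemblers can be sketched in a few lines. The example below is a toy illustration only, not the implementation used by Velvet or SOAP: it ignores reverse complements, sequencing errors and coverage, and the function name `de_bruijn` is hypothetical.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer contributes one edge: its prefix -> its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


reads = ["ACGTC", "CGTCA"]  # two overlapping toy reads
graph = de_bruijn(reads, k=3)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

In this representation identical k-mers from different reads collapse onto the same edge, so the overlap between reads is found without comparing reads pairwise; contigs correspond to unbranched paths through the graph.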

The fact that most (if not all) microbial communities include significant variation on a strain and species level makes the use of assembly algorithms that assume clonal genomes less suitable for metagenomics. The 'clonal' assumptions built into many assemblers might lead to suppression of contig formation for certain heterogeneous taxa at specific parameter settings. Recently, two de Bruijn-type assemblers, MetaVelvet and Meta-IDBA [], have been released that deal explicitly with the non-clonality of natural populations. Both assemblers aim to identify within the entire de Bruijn graph a sub-graph that represents related genomes. Alternatively, the metagenomic sequence mix can be partitioned into 'species bins' via k-mer binning (Titus Brown, personal communications). Those subgraphs or subsets are then resolved to build a consensus sequence of the genomes. For Meta-IDBA an improvement in terms of N50 and maximum contig length has been observed when compared to 'classical' de Bruijn assemblers (e.g. Velvet or SOAP; results from the personal experience of the authors; data not shown here). The development of 'metagenomic assemblers' is, however, still at an early stage, and it is difficult to assess their accuracy for real metagenomic data, as typically no references exist to compare the results to. A true gold standard (i.e. a real dataset for a diverse microbial community with known reference sequences) that assemblers can be evaluated against is thus urgently required.
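Since N50 is used above to compare assemblers, a short sketch of how it is computed may help. It follows the usual definition (the contig length at which the running total of contigs, sorted from longest to shortest, first reaches half of all assembled bases); the function name `n50` and the example lengths are illustrative.

```python
def n50(contig_lengths):
    """Contig length L such that contigs of length >= L contain
    at least half of all assembled bases."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # no contigs


lengths = [100, 200, 300, 400, 500]
print("N50:", n50(lengths), "max contig:", max(lengths))
```

N50 rewards contiguity only; it says nothing about whether contigs are correct, which is why a reference-based gold standard is still needed for evaluating metagenomic assemblers.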

Several factors need to be considered when exploring the reasons for assembling metagenomic data; these can be condensed to two important questions. First, what is the length of the sequencing reads used to generate the metagenomic dataset, and are longer sequences required for annotation? Some approaches, e.g. IMG/M, prefer assembled contigs, whereas other pipelines such as MG-RAST [] require only 75 bp or longer for gene prediction or similarity analysis that provides taxonomic binning and functional classification. On the whole, however, the longer the sequence information, the better the ability to obtain accurate information. One obvious impact is on annotation: the longer the sequence, the more information it provides, making it easier to compare with known genetic data (e.g. via homology searches []). Annotation issues will be discussed in the next section. Binning and classification of DNA fragments for phylogenetic or taxonomic assignment also benefit from long, contiguous sequences, and certain tools (e.g. Phylopythia) work reliably only over a specific cut-off point (e.g. 1 Kb) []. Second, is the dataset assembled to reduce fragments can either be derived from assembled data or from sequenced fosmids and should ideally contain a phylogenetic marker (such as a rRNA gene) that can be used for high-resolution, taxonomic assignment of the binned fragments [].

Short reads may contain similarity to a known gene, and this information can be used to putatively assign the read to a specific taxon. This taxonomic assignment obviously requires the availability of reference data. If the query sequence is only distantly related to known reference genomes, only a taxonomic assignment at a very high level (e.g. phylum) is possible. If the metagenomic dataset, however, contains two or more genomes that would fall into this high taxon assignment, then 'chimeric' bins might be produced. In this case, the two genomes might be separated by additional binning based on compositional features. In general, however, this might again require that the unknown fragments have a certain length.
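As a rough illustration of what binning on compositional features can mean in practice, the sketch below compares fragments by their tetranucleotide frequency profiles. It is a toy example under stated assumptions: the choice of k = 4, the Euclidean distance, the function names and the artificial repeat-based 'genomes' are all illustrative and are not taken from any of the binning tools named in this review.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_profile(seq, k=4):
    """Relative frequency of every possible k-mer (A/C/G/T only) in a sequence."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def distance(p, q):
    """Euclidean distance between two k-mer profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Two artificial 'genomes' with very different base composition.
genome_1 = "ATATTAATTTAAATATTTAATA" * 10
genome_2 = "GCGGCCGCGGGCCGCGCCGGCG" * 10
frag_a, frag_b = genome_1[:100], genome_1[100:200]  # same genome
frag_c = genome_2[:100]                             # different genome

same = distance(kmer_profile(frag_a), kmer_profile(frag_b))
different = distance(kmer_profile(frag_a), kmer_profile(frag_c))
print("fragments from the same genome are closer:", same < different)
```

The dependence on fragment length noted above is visible here: a 100 bp fragment yields fewer than 100 tetranucleotides to fill 256 possible categories, so profiles from short fragments of real genomes are noisy.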

Binning algorithms will obviously benefit in the future from the availability of a greater number and phylogenetic breadth of reference genomes, in particular for similarity-based assignment to low taxonomic levels. After assembly, the binning of contigs can lead to the generation of partial genomes of yet-uncultured or unknown organisms, which in turn can be used to perform similarity-based binning of other metagenomic datasets. Caution should, however, be taken to ensure the validity of any newly created genome bin, as 'contaminating' fragments can rapidly propagate into false assignments in subsequent binning efforts. Prior to assembly with clonal assemblers, binning can be used to reduce the complexity of an assembly effort and might reduce computational requirements.

As major annotation pipelines like IMG/M or MG-RAST also perform taxonomic assignments of reads, one needs to carefully weigh the additional computational demands of the particular binning algorithm chosen against the added value they provide.

Annotation

For the annotation of metagenomes two different initial pathways can be taken. First, if reconstructed genomes are the objective of the study and assembly has produced large contigs, it is preferable to use existing pipelines for genome annotation, such as RAST [] or IMG []. For this approach to be successful, a minimal contig length of 30,000 bp or longer is required. Second, annotation can be performed on the entire community and relies on unassembled reads or short contigs. Here the tools for genome annotation are significantly less useful than those specifically developed for metagenomic analyses. Annotation of metagenomic sequence data has in general two steps. First, features of interest (genes) are identified (feature prediction) and, second, putative gene functions and taxonomic neighbors are assigned (functional annotation).

Feature prediction is the process of labeling sequences as genes or genomic elements. For completed genome sequences a number of algorithms have been developed [,] that identify CDS with more than 95% accuracy and a low false negative ratio. A number of tools were specifically designed to handle metagenomic prediction of CDS, including FragGeneScan [], MetaGeneMark [], MetaGeneAnnotator (MGA)/Metagene [] and Orphelia [,]. All of these tools use internal information (e.g. codon usage) to classify sequence stretches as either coding or non-coding; however, they differ from each other in the quality of the training sets used and their usefulness for short or error-prone sequences. FragGeneScan is currently the only algorithm known to the authors that explicitly models sequencing errors and thus results in gene prediction errors of only 1-2%. True positive rates of FragGeneScan are around 70% (better than most other methods), which means that even this tool still misses a significant subset of genes. These missing genes can potentially be identified by BLAST-based searches; however, the size of current metagenomic datasets often makes this computationally expensive step prohibitive.

There are also a number of tools for the prediction of non-protein coding genes such as tRNAs [,], signal peptides [] or CRISPRs [,]; however, they might require significant computational resources or long contiguous sequences. Clearly, subsequent analysis depends on the initial identification of features, and users of annotation pipelines need to be aware of the specific prediction approaches used. MG-RAST uses a two-step approach for feature identification: FragGeneScan (FGS) and a similarity search for ribosomal RNAs against a non-redundant integration of the SILVA [], Greengenes [] and RDP [] databases. CAMERA's RAMCAPP pipeline [] uses FGS and MGA, while IMG/M employs a combination of tools, including FGS and MGA [,].

Functional annotation represents a major computational challenge for most metagenomic projects and therefore deserves much attention now and over the next years. Current estimates are that only 20 to 50% of metagenomic sequences can be annotated [], leaving the immediate question of the importance and function of the remaining genes. We note that annotation is not done de novo, but via mapping to gene or protein libraries with existing knowledge (i.e., a non-redundant database). Any sequences that cannot be mapped to the known sequence space are referred to as ORFans. These ORFans are responsible for the seemingly never-ending genetic novelty in microbial metagenomics (e.g. []). Three hypotheses exist for the existence of this unknown fraction. First, ORFans might simply reflect erroneous CDS calls caused by imperfect detection algorithms. Second, these ORFans are real genes, but encode unknown biochemical functions. Third, ORFan genes have no sequence homology with known genes, but might have structural homology with known proteins, thus representing known protein families or folds. Future work will likely reveal that the truth lies somewhere between these hypotheses []. For improving the annotation of ORFan genes, we will rely on the challenging and labor-intensive task of protein structure analysis (e.g. via NMR and x-ray crystallography) and on biochemical characterization.

Currently, metagenomic annotation relies on classifying sequences to known functions or taxonomic units based on homology searches against available 'annotated' data. Conceptually, the annotation is relatively simple, and for small datasets (< 10,000 sequences) manual curation can be used to increase the accuracy of any automated annotation. Metagenomic datasets are typically very large, so manual annotation is not possible. Automated annotation therefore has to become more accurate and computationally inexpensive. Currently, running a BLASTX similarity search is computationally expensive, costing as much as ten times the cost of sequencing [78]. Unfortunately, computationally less demanding methods that detect feature composition in genes [] have limited success for short reads. With growing dataset sizes, faster algorithms are urgently needed, and several programs for similarity searches have been developed to resolve this issue [,-].

Many reference databases are available to give functional context to metagenomic datasets, such as KEGG [], eggNOG [], COG/KOG [], PFAM [], and TIGRFAM []. However, since no reference database covers all biological functions, the ability to visualize and merge the interpretations of all database searches within a single framework is important, as implemented in the most recent versions of MG-RAST and IMG/M. It is essential that metagenome analysis platforms be able to share data in ways that map and visualize data in the framework of other platforms. These metagenomic exchange languages should also reduce the burden associated with re-processing large datasets, minimizing the redundancy of searching and enabling the sharing of annotations that can be mapped to different ontologies and nomenclatures, thereby allowing multifaceted interpretations. The Genomic Standards Consortium (GSC) with the M5 project is providing a prototypical standard for exchange of computed metagenome analysis results, one cornerstone of these exchange languages.

Several large-scale databases are available that process and deposit metagenomic datasets. MG-RAST, IMG/M, and CAMERA are three prominent systems [,]. MG-RAST is a data repository, an analysis pipeline and a comparative genomics environment. Its fully automated pipeline provides quality control, feature prediction and functional annotation and has been optimized for achieving a trade-off between accuracy and computational efficiency for short reads using BLAT (Kent, 2002). Results are expressed in the form of abundance profiles for specific taxa or functional annotations. It supports the comparison of NCBI taxonomies derived from 16S rRNA gene or whole genome shotgun data and the comparison of relative abundance for KEGG, eggNOG, COG and SEED subsystems at multiple levels of resolution. Users can also download all data products generated by MG-RAST, share them and publish them within the portal. The MG-RAST web interface allows comparison using a number of statistical techniques and allows for the incorporation of metadata into the statistics. MG-RAST has more than 7000 users, > 38,000 uploaded and analyzed metagenomes (of which 7000 are publicly accessible) and 9 Terabases analyzed as of December 2011. These statistics demonstrate a move by the scientific community to centralize resources and standardize annotation.

IMG/M also provides a standardized pipeline, but with 'higher' sensitivity as it performs, for example, hidden Markov model (HMM) and BLASTX searches at substantial computational cost. In contrast to MG-RAST, comparisons in IMG/M are not performed on an abundance table level, but are based on an all vs. all genes comparison. Therefore IMG/M is the only system that integrates all datasets into a single protein level abstraction. Both IMG/M and MG-RAST provide the ability to use stored computational results for comparison, enabling comparison of novel metagenomes with a rich body of other datasets without requiring the end-user to provide the computational means for reanalysis of all datasets involved in their study. Other systems, such as CAMERA [], offer more flexible annotation schema but require that individual researchers understand the annotation of data and analytical pipelines well enough to be confident in their interpretation. Also, for comparison, all datasets need to be analyzed using the same workflow, which adds to the computational requirements. CAMERA allows the publication of datasets and was the first to support the Genomic Standards Consortium's Minimal Information checklists for metadata in its web interface [].

MEGAN is another tool used for visualizing annotation results derived from BLAST searches in a functional or taxonomic dendrogram []. The use of dendrograms to display metagenomic data provides a collapsible network of interpretation, which makes analysis of particular functional or taxonomic groups visually easy.

Experimental Design and Statistical Analysis

Owing to the high costs, many of the early metagenomic shotgun-sequencing projects were not replicated or were focused on targeted exploration of specific organisms (e.g. uncultured organisms in low-diversity acid mine drainage []). Reduction of sequencing cost (see above) and a much wider appreciation of the utility of metagenomics to address fundamental questions in microbial ecology now require proper experimental designs with appropriate replication and statistical analysis. These design and statistical aspects, while obvious, are often not properly implemented in the field of microbial ecology []. However, many suitable approaches and strategies are readily available from the decades of research in quantitative ecology of higher organisms (e.g. animals, plants). In a simplistic way, the data from multiple metagenomic shotgun-sequencing projects can be reduced to tables, where the columns represent samples, the rows indicate either a taxonomic group or a gene function (or groups thereof), and the fields contain abundance or presence/absence data. This is analogous to species-sample matrices in ecology of higher organisms, and hence many of the statistical tools available to identify correlations and statistically significant patterns are transferable. As metagenomic data, however, often contain many more species or gene functions than the number of samples taken, appropriate corrections for multiple hypothesis testing have to be implemented (e.g. Bonferroni correction for t-test based analyses).
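The table layout and the multiple-testing correction described above can be sketched as follows. This is a minimal illustration with made-up counts and p-values; the function names are hypothetical, and the p-values are assumed to come from a separate per-row test (e.g. a t-test) that is not shown.

```python
def relative_abundance(table):
    """Convert raw counts (rows = functions or taxa, columns = samples)
    into per-sample proportions so that samples of different depth are comparable."""
    n_samples = len(next(iter(table.values())))
    totals = [sum(row[j] for row in table.values()) for j in range(n_samples)]
    return {name: [row[j] / totals[j] for j in range(n_samples)]
            for name, row in table.items()}


def bonferroni(p_values):
    """Bonferroni correction: multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical counts for two gene functions in two samples.
counts = {"function_A": [10, 30], "function_B": [90, 70]}
proportions = relative_abundance(counts)

# Hypothetical raw p-values, one per tested row of a larger table.
raw_p = [0.001, 0.02, 0.5]
adjusted = bonferroni(raw_p)
significant = [p < 0.05 for p in adjusted]
print(proportions)
print(significant)
```

With thousands of gene functions tested at once, the multiplication factor becomes large, which is why adequate replication is needed for any individual difference to remain significant after correction.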

The Primer-E package [89] is a well-established tool, allowing for a range of multivariate statistical analyses, including the generation of multidimensional scaling (MDS) plots, analysis of similarities (ANOSIM), and identification of the species or functions that contribute to the difference between two samples (SIMPER). Recently, multivariate statistics were also incorporated in a web-based tool called Metastats [], which revealed with high confidence discriminatory functions between the replicated metagenome datasets of the gut microbiota of lean and obese mice []. In addition, the ShotgunFunctionalizeR package provides several statistical procedures for assessing functional differences between samples, both for individual genes and for entire pathways, using the popular R statistical package [].

Ideally, and in general, experimental design should be driven by the question asked (rather than by technical or operational restrictions). For example, if a project aims to identify unique taxa or functions in a particular habitat, then suitable reference samples for comparison should be taken and processed in a consistent manner. In addition, variation between sample types can be due to true biological variation (which biologists are usually most interested in) or to technical variation, and this should be carefully considered when planning the experiment. One should also be aware that many microbial systems are highly dynamic, so temporal aspects of sampling can have a substantial impact on data analysis and interpretation. While the number of replicates required is often difficult to predict prior to the final statistical analysis, small-scale experiments are often useful to understand the magnitude of variation inherent in a system. For example, a small number of samples could be selected and sequenced to shallower depth, then analyzed to determine if a larger sampling size or greater sequencing effort is required to obtain statistically meaningful results []. Also, the level at which replication takes place should be chosen so that it does not lead to false interpretation of the data. For example, if one is interested in the level of functional variation of the microbial community in habitat A, then multiple samples from this habitat should be taken and processed completely separately, but in the same manner. Taking just one sample and splitting it up prior to processing will provide information only about technical, but not biological, variation in habitat A. Taking multiple samples and then pooling them will lose all information on variability and hence will be of little use for statistical purposes. Ultimately, good experimental design of metagenomic projects will facilitate integration of datasets into new or existing ecological theories [].

As metagenomics gradually moves through a range of explorative biodiversity surveys, it will also prove itself extremely valuable for manipulative experiments. These will allow for observation of treatment impact on the functional and phylogenetic composition of microbial communities. Initial experiments have already shown promising results []. However, careful experimental planning and interpretation should be paramount in this field.

One of the ultimate aims of metagenomics is to link functional and phylogenetic information to the chemical, physical, and other biological parameters that characterize an environment. While measuring all these parameters can be time-consuming and cost-intensive, it allows retrospective correlation analysis of metagenomic data that was perhaps not part of the initial aim of the project or might be of interest for other research questions. The value of such metadata cannot be overstated; in fact, some databases now request it, either as a mandatory or as an optional component, when metagenomic data are deposited [,].

Sharing and Storage of Data

Data sharing has a long tradition in the field of genome research, but for metagenomic data this will require a whole new level of organization and collaboration to provide metadata and centralized services (e.g., IMG/M, CAMERA and MG-RAST) as well as sharing of both data and computational results. In order to enable sharing of computed results, some aspects of the various analytical pipelines mentioned above will need to be coordinated, a process currently under way under the auspices of the Genomic Standards Consortium (GSC). Once this has been achieved, researchers will be able to download intermediate and processed results from any one of the major repositories for local analysis or comparison.

A suite of standard languages for metadata is currently provided by the Minimum Information about any (x) Sequence checklists (MIxS) []. MIxS is an umbrella term that covers MIGS (the Minimum Information about a Genome Sequence), MIMS (the Minimum Information about a Metagenome Sequence) and MIMARKS (the Minimum Information about a MARKer Sequence) [], and it contains standard formats for recording environmental and experimental data. The latest of these checklists, MIMARKS, builds on the foundation of the MIGS and MIMS checklists by expanding the rich contextual information recorded about each environmental sample.
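As a rough illustration of how a checklist of this kind can be enforced before data are deposited, the sketch below checks a sample record for missing entries. The field names are an illustrative subset loosely modeled on contextual data such checklists ask for; they are assumptions made for this example, not the authoritative MIxS field list, which should be taken from the GSC specification itself.

```python
# Hypothetical subset of required contextual fields (not the official checklist).
REQUIRED_FIELDS = (
    "project_name",
    "collection_date",
    "lat_lon",
    "geo_loc_name",
    "env_biome",
    "seq_meth",
)


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or empty in a sample record."""
    missing = []
    for field in required:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Hypothetical sample record with one required entry left blank.
sample_record = {
    "project_name": "Coastal sponge survey",
    "collection_date": "2011-03-14",
    "lat_lon": "",
    "geo_loc_name": "Australia: Sydney",
    "env_biome": "marine biome",
    "seq_meth": "pyrosequencing",
}

print("missing fields:", missing_fields(sample_record))
```

A check like this only confirms that values are present. Validating their format (for example, dates or coordinates) would need additional rules taken from the checklist definition.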

The question of centralized versus decentralized storage is also one of 'who pays for the storage,' which is a matter with no simple answer. The US National Center for Biotechnology Information (NCBI) is mandated to store all metagenomic data; however, the sheer volume of data being generated means there is an urgent need for appropriate ways of storing vast amounts of sequences. As the cost of sequencing continues to drop while the cost of analysis and storage remains more or less constant, it might become necessary to choose whether data are stored in biological form (i.e., the sample that was sequenced) or in digital form, in centralized or decentralized archives. Ongoing work and successes in compression of (meta-) genomic data [], however, might mean that digital information can still be stored cost-efficiently in the near future.
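One family of compression approaches stores reads relative to a reference sequence rather than in full. The toy sketch below conveys the general idea only and is not any published algorithm: it assumes the alignment position of each read is already known, ignores insertions, deletions and quality scores, and uses hypothetical sequences and function names.

```python
def encode_read(read, reference, position):
    """Store a read as (start, length, differences from the reference)."""
    window = reference[position:position + len(read)]
    if len(window) != len(read):
        raise ValueError("read extends past the end of the reference")
    diffs = [
        (offset, base)
        for offset, (base, ref_base) in enumerate(zip(read, window))
        if base != ref_base
    ]
    return position, len(read), diffs


def decode_read(record, reference):
    """Rebuild the original read from its compact record and the reference."""
    position, length, diffs = record
    bases = list(reference[position:position + length])
    for offset, base in diffs:
        bases[offset] = base
    return "".join(bases)


# Hypothetical reference and a read that differs from it at one base.
reference = "ACGTACGTTAGCCGATAGGC"
read = "ACGTTAGACG"

record = encode_read(read, reference, position=4)
print("stored record:", record)
print("round trip ok:", decode_read(record, reference) == read)
```

The saving comes from the fact that a read matching the reference closely needs only a position, a length and a short list of mismatches. For metagenomes the benefit depends on how much of the community is represented by available reference sequences.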

Conclusion

Metagenomics has benefited in the past few years from many visionary investments in both financial and intellectual terms. To ensure that those investments are utilized in the best possible way, the scientific community should aim to share, compare, and critically evaluate the outcomes of metagenomic studies. As datasets become increasingly complex and comprehensive, novel tools for analysis, storage, and visualization will be required. These will ensure the best use of metagenomics as a tool to address fundamental questions of microbial ecology, evolution and diversity and to derive and test new hypotheses. Metagenomics will be employed as commonly and frequently as any other laboratory method, and 'metagenomizing' a sample might become as colloquial as 'PCRing.' It is therefore also important that metagenomics be taught to students and young scientists in the same way that other techniques and approaches have been in the past.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

All authors contributed to the conception and writing of the review article. All authors have read and approved the final manuscript.

Acknowledgements

This work was supported by the Australian Research Council and the U.S. Dept. of Energy under Contract DE-AC02-06CH11357.

The submitted manuscript has been created by UChicago Argonne, LLC, Operator of Argonne National Laboratory ('Argonne'). Argonne, a U.S. Department of Energy Office of Science laboratory, is operated under Contract No. DE-AC02-06CH11357. The U.S. Government retains for itself, and others acting on its behalf, a paid-up nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public, and perform publicly and display publicly, by or on behalf of the Government.

References

1. Handelsman J, Rondon MR, Brady SF, Clardy J, Goodman RM. Molecular biological access to the chemistry of unknown soil microbes: a new frontier for natural products. Chem Biol. 1998;5(10):R245–249. doi: 10.1016/S1074-5521(98)90108-9.
2. Tyson GW, Chapman J, Hugenholtz P, Allen EE, Ram RJ, Richardson PM, Solovyev VV, Rubin EM, Rokhsar DS, Banfield JF. Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature. 2004;428(6978):37–43. doi: 10.1038/nature02340.
3. Venter JC, Remington K, Heidelberg JF, Halpern AL, Rusch D, Eisen JA, Wu D, Paulsen I, Nelson KE, Nelson W, Fouts DE, Levy S, Knap AH, Lomas MW, Nealson K, White O, Peterson J, Hoffman J, Parsons R, Baden-Tillson H, Pfannkoch C, Rogers YH, Smith HO. Environmental genome shotgun sequencing of the Sargasso Sea. Science. 2004;304(5667):66–74. doi: 10.1126/science.1093857.
4. Simon C, Daniel R. Metagenomic analyses: past and future trends. Appl Environ Microbiol. 2011;77(4):1153–1161. doi: 10.1128/AEM.02345-10.
5. Wilmes P, Bond PL. Metaproteomics: studying functional gene expression in microbial ecosystems. Trends Microbiol. 2006;14(2):92–97. doi: 10.1016/j.tim.2005.12.006.
6. Gilbert JA, Field D, Huang Y, Edwards R, Li W, Gilna P, Joint I. Detection of large numbers of novel sequences in the metatranscriptomes of complex marine microbial communities. PLoS One. 2008;3(8):e3042. doi: 10.1371/journal.pone.0003042.
7. Beja O, Aravind L, Koonin EV, Suzuki MT, Hadd A, Nguyen LP, Jovanovich SB, Gates CM, Feldman RA, Spudich JL, Spudich EN, DeLong EF. Bacterial rhodopsin: evidence for a new type of phototrophy in the sea. Science. 2000;289(5486):1902–1906. doi: 10.1126/science.289.5486.1902.
8. Nicol GW, Schleper C. Ammonia-oxidising Crenarchaeota: important players in the nitrogen cycle? Trends Microbiol. 2006;14(5):207–212. doi: 10.1016/j.tim.2006.03.004.
9. Burke C, Kjelleberg S, Thomas T. Selective extraction of bacterial DNA from the surfaces of macroalgae. Appl Environ Microbiol. 2009;75(1):252–256. doi: 10.1128/AEM.01630-08.
10. Delmont TO, Robe P, Clark I, Simonet P, Vogel TM. Metagenomic comparison of direct and indirect soil DNA extraction approaches. J Microbiol Methods. 2011;86(3):397–400. doi: 10.1016/j.mimet.2011.06.013.
11. Knight R, Desai N, Field D, Fierer N, Fuhrman J, Gordon J, Hu B, Hugenholtz P, Jansson J, Meyer F, Stevens R, Bailey M, Kowalchuk G, Gilbert J. Designing Better Metagenomic Surveys: The role of experimental design and metadata capture in making useful metagenomic datasets for ecology and biotechnology. Nature Biotechnology. in review.
12. Thomas T, Rusch D, DeMaere MZ, Yung PY, Lewis M, Halpern A, Heidelberg KB, Egan S, Steinberg PD, Kjelleberg S. Functional genomic signatures of sponge bacteria reveal unique and shared features of symbiosis. ISME J. 2010;4(12):1557–1567. doi: 10.1038/ismej.2010.74.
13. Palenik B, Ren Q, Tai V, Paulsen IT. Coastal Synechococcus metagenome reveals major roles for horizontal gene transfer and plasmids in population diversity. Environ Microbiol. 2009;11(2):349–359. doi: 10.1111/j.1462-2920.2008.01772.x.
14. Angly FE, Felts B, Breitbart M, Salamon P, Edwards RA, Carlson C, Chan AM, Haynes M, Kelley S, Liu H, Mahaffy JM, Mueller JE, Nulton J, Olson R, Parsons R, Rayhawk S, Suttle CA, Rohwer F. The marine viromes of four oceanic regions. PLoS Biol. 2006;4(11):e368. doi: 10.1371/journal.pbio.0040368.
15. Abbai NS, Govender A, Shaik R, Pillay B. Pyrosequence analysis of unamplified and whole genome amplified DNA from hydrocarbon-contaminated groundwater. Mol Biotechnol. 2011.
16. Lasken RS. Genomic DNA amplification by the multiple displacement amplification (MDA) method. Biochem Soc Trans. 2009;37(Pt 2):450–453.
17. Ishoey T, Woyke T, Stepanauskas R, Novotny M, Lasken RS. Genomic sequencing of single microbial cells from environmental samples. Curr Opin Microbiol. 2008;11(3):198–204. doi: 10.1016/j.mib.2008.05.006.
18. Goltsman DS, Denef VJ, Singer SW, VerBerkmoes NC, Lefsrud M, Mueller RS, Dick GJ, Sun CL, Wheeler KE, Zemla A, Baker BJ, Hauser L, Land M, Shah MB, Thelen MP, Hettich RL, Banfield JF. Community genomic and proteomic analyses of chemoautotrophic iron-oxidizing 'Leptospirillum rubarum' (Group II) and 'Leptospirillum ferrodiazotrophum' (Group III) bacteria in acid mine drainage biofilms. Appl Environ Microbiol. 2009;75(13):4599–4615. doi: 10.1128/AEM.02943-08.
19. Sorek R, Zhu Y, Creevey CJ, Francino MP, Bork P, Rubin EM. Genome-wide experimental determination of barriers to horizontal gene transfer. Science. 2007;318(5855):1449–1452. doi: 10.1126/science.1147112.
20. Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet. 2010;11(1):31–46. doi: 10.1038/nrg2626.
21. Mardis ER. The impact of next-generation sequencing technology on genetics. Trends Genet. 2008;24(3):133–141. doi: 10.1016/j.tig.2007.12.007.
22. Niu B, Fu L, Sun S, Li W. Artificial and natural duplicates in pyrosequencing reads of metagenomic data. BMC Bioinformatics. 2010;11:187. doi: 10.1186/1471-2105-11-187.
23. Teal TK, Schmidt TM. Identifying and removing artificial replicates from 454 pyrosequencing data. Cold Spring Harb Protoc. 2010;2010(4):pdb.prot5409.
24. Rho M, Tang H, Ye Y. FragGeneScan: predicting genes in short and error-prone reads. Nucleic Acids Res. 2010;38(20):e191. doi: 10.1093/nar/gkq747.
25. Wommack KE, Bhavsar J, Ravel J. Metagenomics: read length matters. Appl Environ Microbiol. 2008;74(5):1453–1463. doi: 10.1128/AEM.02181-07.
26. White RA, Blainey PC, Fan HC, Quake SR. Digital PCR provides sensitive and absolute calibration for high throughput sequencing. BMC Genomics. 2009;10:116. doi: 10.1186/1471-2164-10-116.
27. Adey A, Morrison HG, Asan, Xun X, Kitzman JO, Turner EH, Stackhouse B, MacKenzie AP, Caruccio NC, Zhang X, Shendure J. Rapid, low-input, low-bias construction of shotgun fragment libraries by high-density in vitro transposition. Genome Biol. 2010;11(12):R119. doi: 10.1186/gb-2010-11-12-r119.
28. Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, Hall KP, Evers DJ, Barnes CL, Bignell HR, Boutell JM, Bryant J, Carter RJ, Keira Cheetham R, Cox AJ, Ellis DJ, Flatbush MR, Gormley NA, Humphray SJ, Irving LJ, Karbelashvili MS, Kirk SM, Li H, Liu X, Maisinger KS, Murray LJ, Obradovic B, Ost T, Parkinson ML, Pratt MR. et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature. 2008;456(7218):53–59. doi: 10.1038/nature07517.
29. Nakamura K, Oshima T, Morimoto T, Ikeda S, Yoshikawa H, Shiwa Y, Ishikawa S, Linak MC, Hirai A, Takahashi H, Altaf-Ul-Amin M, Ogasawara N, Kanaya S. Sequence-specific error profile of Illumina sequencers. Nucleic Acids Res. 2011;39(13):e90. doi: 10.1093/nar/gkr344.
30. Hess M, Sczyrba A, Egan R, Kim TW, Chokhawala H, Schroth G, Luo S, Clark DS, Chen F, Zhang T, Mackie RI, Pennacchio LA, Tringe SG, Visel A, Woyke T, Wang Z, Rubin EM. Metagenomic discovery of biomass-degrading genes and genomes from cow rumen. Science. 2011;331(6016):463–467. doi: 10.1126/science.1200387.
31. Qin J, Li R, Raes J, Arumugam M, Burgdorf KS, Manichanh C, Nielsen T, Pons N, Levenez F, Yamada T, Mende DR, Li J, Xu J, Li S, Li D, Cao J, Wang B, Liang H, Zheng H, Xie Y, Tap J, Lepage P, Bertalan M, Batto JM, Hansen T, Le Paslier D, Linneberg A, Nielsen HB, Pelletier E, Renault P. et al. A human gut microbial gene catalogue established by metagenomic sequencing. Nature. 2010;464(7285):59–65. doi: 10.1038/nature08821.
32. Gulig PA, de Crecy-Lagard V, Wright AC, Walts B, Telonis-Scott M, McIntyre LM. SOLiD sequencing of four Vibrio vulnificus genomes enables comparative genomic analysis and identification of candidate clade-specific virulence genes. BMC Genomics. 2010;11:512. doi: 10.1186/1471-2164-11-512.
33. Tyler HL, Roesch LF, Gowda S, Dawson WO, Triplett EW. Confirmation of the sequence of 'Candidatus Liberibacter asiaticus' and assessment of microbial diversity in Huanglongbing-infected citrus phloem using a metagenomic approach. Mol Plant Microbe Interact. 2009;22(12):1624–1634. doi: 10.1094/MPMI-22-12-1624.
34. Kunin V, Raes J, Harris JK, Spear JR, Walker JJ, Ivanova N, von Mering C, Bebout BM, Pace NR, Bork P, Hugenholtz P. Millimeter-scale genetic gradients and community-level molecular convergence in a hypersaline microbial mat. Mol Syst Biol. 2008;4:198.
35. Rasko DA, Webster DR, Sahl JW, Bashir A, Boisen N, Scheutz F, Paxinos EE, Sebra R, Chin CS, Iliopoulos D, Klammer A, Peluso P, Lee L, Kislyuk AO, Bullard J, Kasarskis A, Wang S, Eid J, Rank D, Redman JC, Steyert SR, Frimodt-Moller J, Struve C, Petersen AM, Krogfelt KA, Nataro JP, Schadt EE, Waldor MK. Origins of the E. coli strain causing an outbreak of hemolytic-uremic syndrome in Germany. N Engl J Med. 2011;365(8):709–717. doi: 10.1056/NEJMoa1106920.
36. Drmanac R, Sparks AB, Callow MJ, Halpern AL, Burns NL, Kermani BG, Carnevali P, Nazarenko I, Nilsen GB, Yeung G, Dahl F, Fernandez A, Staker B, Pant KP, Baccash J, Borcherding AP, Brownley A, Cedeno R, Chen L, Chernikoff D, Cheung A, Chirita R, Curson B, Ebert JC, Hacker CR, Hartlage R, Hauser B, Huang S, Jiang Y, Karpinchyk V. et al. Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays. Science. 2010;327(5961):78–81. doi: 10.1126/science.1181498.
37. Chevreux B, Wetter T, Suhai S. Genome Sequence Assembly Using Trace Signals and Additional Sequence Information Computer Science and Biology. Proceedings of the German Conference on Bioinformatics. 1999;99:45–56.
38. Miller JR, Koren S, Sutton G. Assembly algorithms for next-generation sequencing data. Genomics. 2010;95(6):315–327. doi: 10.1016/j.ygeno.2010.03.001.
39. Pevzner PA, Tang H, Waterman MS. An Eulerian path approach to DNA fragment assembly. Proc Natl Acad Sci USA. 2001;98(17):9748–9753. doi: 10.1073/pnas.171285098.
40. Zerbino DR, Birney E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2008;18(5):821–829. doi: 10.1101/gr.074492.107.
41. Li R, Li Y, Kristiansen K, Wang J. SOAP: short oligonucleotide alignment program. Bioinformatics. 2008;24(5):713–714. doi: 10.1093/bioinformatics/btn025.
42. Peng Y, Leung HC, Yiu SM, Chin FY. Meta-IDBA: a de Novo assembler for metagenomic data. Bioinformatics. 2011;27(13):i94–101. doi: 10.1093/bioinformatics/btr216.
43. Glass EM, Wilkening J, Wilke A, Antonopoulos D, Meyer F. Using the metagenomics RAST server (MG-RAST) for analyzing shotgun metagenomes. Cold Spring Harb Protoc. 2010;2010(1):pdb.prot5368.
44. McHardy AC, Martin HG, Tsirigos A, Hugenholtz P, Rigoutsos I. Accurate phylogenetic classification of variable-length DNA fragments. Nat Methods. 2007;4(1):63–72. doi: 10.1038/nmeth976.
45. Li W, Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006;22(13):1658–1659. doi: 10.1093/bioinformatics/btl158.
46. Edgar RC. Search and clustering orders of magnitude faster than BLAST. Bioinformatics. 2010;26(19):2460–2461. doi: 10.1093/bioinformatics/btq461.
47. Chan CK, Hsu AL, Halgamuge SK, Tang SL. Binning sequences using very sparse labels within a metagenome. BMC Bioinformatics. 2008;9:215. doi: 10.1186/1471-2105-9-215.
48. Zheng H, Wu H. Short prokaryotic DNA fragment binning using a hierarchical classifier based on linear discriminant analysis and principal component analysis. J Bioinform Comput Biol. 2010;8(6):995–1011. doi: 10.1142/S0219720010005051.
49. Diaz NN, Krause L, Goesmann A, Niehaus K, Nattkemper TW. TACOA: taxonomic classification of environmental genomic fragments using a kernelized nearest neighbor approach. BMC Bioinformatics. 2009;10:56. doi: 10.1186/1471-2105-10-56.
50. Markowitz VM, Ivanova NN, Szeto E, Palaniappan K, Chu K, Dalevi D, Chen IM, Grechkin Y, Dubchak I, Anderson I, Lykidis A, Mavromatis K, Hugenholtz P, Kyrpides NC. IMG/M: a data management and analysis system for metagenomes. Nucleic Acids Res. 2008. pp. D534–538.
51. Huson DH, Auch AF, Qi J, Schuster SC. MEGAN analysis of metagenomic data. Genome Res. 2007;17(3):377–386. doi: 10.1101/gr.5969107.
52. Krause L, Diaz NN, Goesmann A, Kelley S, Nattkemper TW, Rohwer F, Edwards RA, Stoye J. Phylogenetic classification of short environmental DNA fragments. Nucleic Acids Res. 2008;36(7):2230–2239. doi: 10.1093/nar/gkn038.
53. Monzoorul Haque M, Ghosh TS, Komanduri D, Mande SS. SOrt-ITEMS: Sequence orthology based approach for improved taxonomic estimation of metagenomic sequences. Bioinformatics. 2009;25(14):1722–1730. doi: 10.1093/bioinformatics/btp317.
54. Liu B, Gibbons T, Ghodsi M, Treangen T, Pop M. Accurate and fast estimation of taxonomic profiles from metagenomic shotgun sequences. BMC Genomics. 2011;12(Suppl 2):S4. doi: 10.1186/1471-2164-12-S2-S4.
55. Brady A, Salzberg SL. Phymm and PhymmBL: metagenomic phylogenetic classification with interpolated Markov models. Nat Methods. 2009;6(9):673–676. doi: 10.1038/nmeth.1358.
56. Leung HC, Yiu SM, Yang B, Peng Y, Wang Y, Liu Z, Chen J, Qin J, Li R, Chin FY. A robust and accurate binning algorithm for metagenomic sequences with arbitrary species abundance ratio. Bioinformatics. 2011;27(11):1489–1495. doi: 10.1093/bioinformatics/btr186.
57. Yung PY, Burke C, Lewis M, Egan S, Kjelleberg S, Thomas T. Phylogenetic screening of a bacterial, metagenomic library using homing endonuclease restriction and marker insertion. Nucleic Acids Res. 2009;37(21):e144. doi: 10.1093/nar/gkp746.
58. Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, Formsma K, Gerdes S, Glass EM, Kubal M, Meyer F, Olsen GJ, Olson R, Osterman AL, Overbeek RA, McNeil LK, Paarmann D, Paczian T, Parrello B, Pusch GD, Reich C, Stevens R, Vassieva O, Vonstein V, Wilke A, Zagnitko O. The RAST Server: rapid annotations using subsystems technology. BMC Genomics. 2008;9:75. doi: 10.1186/1471-2164-9-75.
59. Markowitz VM, Mavromatis K, Ivanova NN, Chen IM, Chu K, Kyrpides NC. IMG ER: a system for microbial genome annotation expert review and curation. Bioinformatics. 2009;25(17):2271–2278. doi: 10.1093/bioinformatics/btp393.
60. Lukashin AV, Borodovsky M. GeneMark.hmm: new solutions for gene finding. Nucleic Acids Res. 1998;26(4):1107–1115. doi: 10.1093/nar/26.4.1107.
61. Delcher AL, Harmon D, Kasif S, White O, Salzberg SL. Improved microbial gene identification with GLIMMER. Nucleic Acids Res. 1999;27(23):4636–4641. doi: 10.1093/nar/27.23.4636.
62. McHardy AC, Martin HG, Tsirigos A, Hugenholtz P, Rigoutsos I. Accurate phylogenetic classification of variable-length DNA fragments. Nat Methods. 2007;4(1):63–72. doi: 10.1038/nmeth976.
63. Noguchi H, Taniguchi T, Itoh T. MetaGeneAnnotator: detecting species-specific patterns of ribosomal binding site for precise gene prediction in anonymous prokaryotic and phage genomes. DNA Res. 2008;15(6):387–396. doi: 10.1093/dnares/dsn027.
64. Hoff KJ, Lingner T, Meinicke P, Tech M. Orphelia: predicting genes in metagenomic sequencing reads. Nucleic Acids Res. 2009. pp. W101–105.
65. Yok NG, Rosen GL. Combining gene prediction methods to improve metagenomic gene annotation. BMC Bioinformatics. 2011;12:20. doi: 10.1186/1471-2105-12-20.
66. Gardner PP, Daub J, Tate JG, Nawrocki EP, Kolbe DL, Lindgreen S, Wilkinson AC, Finn RD, Griffiths-Jones S, Eddy SR, Bateman A. Rfam: updates to the RNA families database. Nucleic Acids Res. 2009. pp. D136–140.
67. Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997;25(5):955–964. doi: 10.1093/nar/25.5.955.
68. Bendtsen JD, Nielsen H, von Heijne G, Brunak S. Improved prediction of signal peptides: SignalP 3.0. J Molec Biol. 2004;340(4):783–795. doi: 10.1016/j.jmb.2004.05.028.
69. Bland C, Ramsey TL, Sabree F, Lowe M, Brown K, Kyrpides NC, Hugenholtz P. CRISPR recognition tool (CRT): a tool for automatic detection of clustered regularly interspaced palindromic repeats. BMC Bioinformatics. 2007;8:209. doi: 10.1186/1471-2105-8-209.
70. Grissa I, Vergnaud G, Pourcel C. CRISPRFinder: a web tool to identify clustered regularly interspaced short palindromic repeats. Nucleic Acids Res. 2007. pp. W52–57.
71. Pruesse E, Quast C, Knittel K, Fuchs BM, Ludwig W, Peplies J, Glöckner FO. SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res. 2007;35(21):7188–7196. doi: 10.1093/nar/gkm864.
72. DeSantis TZ, Hugenholtz P, Larsen N, Rojas M, Brodie EL, Keller K, Huber T, Dalevi D, Hu P, Andersen GL. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl Environ Microbiol. 2006;72(7):5069–5072. doi: 10.1128/AEM.03006-05.
73. Cole JR, Wang Q, Cardenas E, Fish J, Chai B, Farris RJ, Kulam-Syed-Mohideen AS, McGarrell DM, Marsh T, Garrity GM, Tiedje JM. The Ribosomal Database Project: improved alignments and new tools for rRNA analysis. Nucleic Acids Res. 2009. pp. D141–145.
74. Sun S, Chen J, Li W, Altintas I, Lin A, Peltier S, Stocks K, Allen EE, Ellisman M, Grethe J, Wooley J. Community cyberinfrastructure for Advanced Microbial Ecology Research and Analysis: the CAMERA resource. Nucleic Acids Res. 2011. pp. D546–551.
75. Gilbert JA, Field D, Swift P, Thomas S, Cummings D, Temperton B, Weynberg K, Huse S, Hughes M, Joint I, Somerfield PJ, Muhling M. The taxonomic and functional diversity of microbes at a temperate coastal site: a 'multi-omic' study of seasonal and diel temporal variation. PLoS One. 2010;5(11):e15545. doi: 10.1371/journal.pone.0015545.
76. Yooseph S, Sutton G, Rusch DB, Halpern AL, Williamson SJ, Remington K, Eisen JA, Heidelberg KB, Manning G, Li W, Jaroszewski L, Cieplak P, Miller CS, Li H, Mashiyama ST, Joachimiak MP, van Belle C, Chandonia JM, Soergel DA, Zhai Y, Natarajan K, Lee S, Raphael BJ, Bafna V, Friedman R, Brenner SE, Godzik A, Eisenberg D, Dixon JE, Taylor SS. et al. The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families. PLoS Biol. 2007;5(3):e16. doi: 10.1371/journal.pbio.0050016.
77. Godzik A. Metagenomics and the protein universe. Curr Opin Struct Biol. 2011;21(3):398–403. doi: 10.1016/j.sbi.2011.03.010.
78. Wilkening J, Desai N, Meyer F, Wilke A. Using clouds for metagenomics - case study. IEEE Cluster. 2009.
79. Ye Y, Choi JH, Tang H. RAPSearch: a fast protein similarity search tool for short reads. BMC Bioinformatics. 2011;12:159. doi: 10.1186/1471-2105-12-159.
80. Kent WJ. BLAT-the BLAST-like alignment tool. Genome Res. 2002;12(4):656–664.
81. Wang W, Zhang P, Liu X. Short read DNA fragment anchoring algorithm. BMC Bioinformatics. 2009;10(Suppl 1):S17. doi: 10.1186/1471-2105-10-S1-S17.
82. Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M. The KEGG resource for deciphering the genome. Nucleic Acids Res. 2004. pp. D277–280.
83. Muller J, Szklarczyk D, Julien P, Letunic I, Roth A, Kuhn M, Powell S, von Mering C, Doerks T, Jensen LJ, Bork P. eggNOG v2.0: extending the evolutionary genealogy of genes with enhanced non-supervised orthologous groups, species and functional annotations. Nucleic Acids Res. 2010. pp. D190–195.
84. Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM, Mazumder R, Mekhedov SL, Nikolskaya AN, Rao BS, Smirnov S, Sverdlov AV, Vasudevan S, Wolf YI, Yin JJ, Natale DA. The COG database: an updated version includes eukaryotes. BMC Bioinformatics. 2003;4:41. doi: 10.1186/1471-2105-4-41.
85. Finn RD, Mistry J, Tate J, Coggill P, Heger A, Pollington JE, Gavin OL, Gunasekaran P, Ceric G, Forslund K, Holm L, Sonnhammer EL, Eddy SR, Bateman A. The Pfam protein families database. Nucleic Acids Res. 2010. pp. D211–222.
86. Selengut JD, Haft DH, Davidsen T, Ganapathy A, Gwinn-Giglio M, Nelson WC, Richter AR, White O. TIGRFAMs and Genome Properties: tools for the assignment of molecular function and biological process in prokaryotic genomes. Nucleic Acids Res. 2007. pp. D260–264.
87. Field D, Amaral-Zettler L, Cochrane G, Cole JR, Dawyndt P, Garrity GM, Gilbert J, Glockner FO, Hirschman L, Karsch-Mizrachi I, Klenk HP, Knight R, Kottmann R, Kyrpides N, Meyer F, San Gil I, Sansone SA, Schriml LM, Sterk P, Tatusova T, Ussery DW, White O, Wooley J, Yilmaz P, Gilbert JA, Johnston A, Vaughan R, Hunter C, Park J, Morrison N. et al. The Genomic Standards Consortium: Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications. PLoS Biol. 2011;9(6):e1001088. doi: 10.1371/journal.pbio.1001088.
88. Prosser JI. Replicate or lie. Environ Microbiol. 2010;12(7):1806–1810. doi: 10.1111/j.1462-2920.2010.02201.x.
89. Clarke KR. Non-parametric multivariate analyses of changes in community structure. Australian J Ecology. 1993. pp. 117–143.
90. White JR, Nagarajan N, Pop M. Statistical methods for detecting differentially abundant features in clinical metagenomic samples. PLoS Comput Biol. 2009;5(4):e1000352. doi: 10.1371/journal.pcbi.1000352.
91. Turnbaugh PJ, Hamady M, Yatsunenko T, Cantarel BL, Duncan A, Ley RE, Sogin ML, Jones WJ, Roe BA, Affourtit JP, Egholm M, Henrissat B, Heath AC, Knight R, Gordon JI. A core gut microbiome in obese and lean twins. Nature. 2009;457(7228):480–484. doi: 10.1038/nature07540.
92. Kristiansson E, Hugenholtz P, Dalevi D. ShotgunFunctionalizeR: an R-package for functional comparison of metagenomes. Bioinformatics. 2009;25(20):2737–2738. doi: 10.1093/bioinformatics/btp508.
93. Burke C, Steinberg P, Rusch D, Kjelleberg S, Thomas T. Bacterial community assembly based on functional genes rather than species. Proc Natl Acad Sci USA. 2011;108(34):14288–14293. doi: 10.1073/pnas.1101591108.
94. Mou X, Sun S, Edwards RA, Hodson RE, Moran MA. Bacterial carbon processing by generalist species in the coastal ocean. Nature. 2008;451(7179):708–711. doi: 10.1038/nature06513.
95. Yilmaz P, Kottmann R, Field D, Knight R, Cole JR, Amaral-Zettler L, Gilbert JA, Karsch-Mizrachi I, Johnston A, Cochrane G, Vaughan R, Hunter C, Park J, Morrison N, Rocca-Serra P, Sterk P, Arumugam M, Bailey M, Baumgartner L, Birren BW, Blaser MJ, Bonazzi V, Booth T, Bork P, Bushman FD, Buttigieg PL, Chain PS, Charlson E, Costello EK, Huot-Creasy H. et al. Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications. Nat Biotechnol. 2011;29(5):415–420. doi: 10.1038/nbt.1823.
96. Hsi-Yang Fritz M, Leinonen R, Cochrane G, Birney E. Efficient storage of high throughput DNA sequencing data using reference-based compression. Genome Res. 2011;21(5):734–740. doi: 10.1101/gr.114819.110.

Articles from Microbial Informatics and Experimentation are provided here courtesy of BioMed Central