Guidelines and recommendations for Genome/Transcriptome projects

To facilitate the establishment of project-wide standards and best practices to obtain optimal invertebrate genome sequences, we provide a checklist describing concerns and recommendations common to many genome and transcriptome sequencing projects in approximate chronological order.

Permissions

Prior to acquiring specimens (here defined as individual organisms, colonies, or stable symbiotic associations) or samples (here defined as components, aliquots or extracts of specimens), rigorous effort must be made to determine whether the specimens or samples have been or will be collected in compliance with all applicable laws, rules and regulations established by the appropriate local, regional, and national authorities. Be aware that the process of obtaining permits may be slow, often requiring months to years for completion. In addition, rigorous attempts should be made to comply with applicable international agreements, such as the Nagoya Protocol (NP) of the Convention on Biological Diversity (UNEP 2011). If samples are collected on private land, permission must be obtained from the property owner(s). Be aware that any of the previously mentioned persons, authorities and jurisdictions may be stakeholders with explicit rights to the materials being collected and to all derivatives and progeny of those materials and all intellectual property derived from them.

Collection

Methods for field collection and preservation of specimens and tissues should be compatible with recovery of high quality genomic DNA and RNA (Dawson et al., 1998). In general, the highest quality nucleic acid (NA) is obtained from living or freshly captured and freshly sacrificed animals and tissues. Avoid long storage of living or preserved specimens. Keep non-living materials at the lowest temperature compatible with your collection workflow. Acceptable tissue storage methods include:
  • (DNA or RNA sample) Snap freeze and ship on liquid nitrogen or dry ice.
  • (DNA or RNA sample) grind directly in Trizol reagent or lysis solution containing a strong denaturant such as guanidinium isothiocyanate (5M) and ship on ice packs.
  • (DNA sample) place tissue in OGLFix tissue stabilization reagent or DMSO/EDTA solutions and ship on ice packs or at room temperature (Seutin and White, 1991).
  • (RNA sample) place tissue in RNAlater or an equivalent RNA stabilization product, ship frozen or on ice packs. RNAlater has often been used as a preservative. However, recent results with certain invertebrate taxa have been less than optimal, so caution is suggested, and more than one preservation method should be used for precious and costly samples.
  • (DNA sample) place tissue in 75-95% EtOH, replace the EtOH at least twice (once after 15 min, then again after 1-4 hours), and ship following appropriate precautions for hazardous and flammable materials (see IATA website for restrictions on shipping ethanol).

  Specimen preparation and dissection

  • Take precautions to avoid tissues that may contain:
    1. High concentration of nucleases (hepatopancreas, digestive glands, digestive organs)
    2. Foreign nucleic acids (digestive tract and contents, symbiont bearing structures, parasitic or commensal organisms)
    3. High mucus, lipid, fat, wax or glycogen content (mucus glands, eggs, female gonads, fatty or waxy tissues)
  • Avoid insoluble, chitinous and mineralized tissues when possible
  • When gut cannot be excluded, try brief starvation or purging gut by feeding with DNA free food items prior to DNA extraction
  • If the organism cannot be separated from a commensal or symbiotic population attempt to collect pure samples of the commensal or symbionts so that the sequences can later be deconvoluted
  • Isolate internal tissue, i.e., tissue not exposed to the external environment, whenever possible

  Nucleic Acid (NA) Quality Criteria

Quality may be defined differently for different purposes and sequencing methods. For this reason, GIGA makes recommendations for best practices rather than setting explicit quality standards.
  • Molecular weight: Choose extraction/purification methods that maintain high molecular weight. Although low molecular weight DNA is useful for many commonly used short read sequencing platforms, high molecular weight is preferable for large insert library construction, for single molecule sequencing methods, for genome mapping, and for whole genome amplification. A recommended goal is to maintain a significant fraction of the total genomic DNA in fragments of greater than 20-50 kb. High molecular weight DNA will form a sharp band above a 12-20 kb molecular weight standard after electrophoretic separation in a 0.8% agarose gel and staining with an appropriate nucleic acid stain. The average molecular weight and quantity of high molecular weight DNA may be estimated by visual or photometric comparison to appropriate molecular weight standards. Alternatively, similar measurements may be made using a dedicated NA analysis instrument such as the Agilent Bioanalyzer.
  • NA Purity: Attempt to eliminate contaminants including enzymes that may cut, nick, hydrolyze or bind DNA or RNA and other compounds that may inhibit downstream applications. The UV absorbance of a purified NA sample over a spectrum from 200-320 nm is a good indicator of sample purity. At minimum, absorbance should be measured at 230, 260 and 280 nm. Absorbance ratios of A260/A280 = 1.8-2.2 and A260/A230 = 1.5-2.0 indicate high purity nucleic acid solutions free of protein and non-protein contaminants, respectively. Spectrophotometric analysis may be impractical for very small samples or samples with low concentration, in which case it is more effective to assess quality at a later step, e.g. library construction.
  • Inhibitors: The presence of many types of contaminants may inhibit the activity of enzymes commonly used for downstream applications such as library preparation and NA sequencing. Presence of inhibitors may be assayed by digestion with methylation insensitive frequent cutting restriction enzymes such as Alu I, Hind III, or Hae III. If the average molecular weight of the NA is not reduced (as evidenced by gel electrophoresis or Bioanalyzer) after restriction digestion, the sample may be contaminated with an inhibitory compound and may require further purification.
  • Taxonomic identity of the NA source: Prior to library construction or NA sequencing the taxonomic origin of the DNA sample to be sequenced should be verified by PCR amplification and sequencing of at least one diagnostic gene. Reference sequences and taxonomic group specific PCR primer sequences are available for many taxa in curated collections (e.g. BOLD, RDP, European RNA database) for small- and large-subunit ribosomal RNA genes, mitochondrial cytochrome oxidase 1, and a variety of other nuclear or mitochondrial biomarkers useful for taxonomic identification. Ambiguous or unreadable sequence results may be an indicator of heterogeneity in the NA sample and may indicate contamination by eukaryotic or prokaryotic organisms.
  • Quantity: The quantity of NA required for sequencing varies widely depending on the sequencing platform and library construction methods to be employed. For library construction, larger inserts generally require a greater mass of starting DNA. For current eukaryotic genome sequencing protocols, several libraries of varying insert sizes may be needed, requiring a total of up to 1 mg of purified high molecular weight DNA. A good conservative rule of thumb is that an average biological solid tissue contains 0.5-1.0 mg of extractable high quality DNA per gram of wet weight.
  • Heterozygosity and polymorphism: Many taxa display a considerable degree of genomic variation among individuals. For these taxa, pooling DNA samples may create problems for downstream analysis. Therefore, whenever possible, it is best to avoid combining DNA from multiple specimens. However, for many small taxa it may not be possible to obtain sufficient quantity of DNA from individual specimens. In such cases it may be possible to reduce heterozygosity by inbreeding of cultivable taxa. Alternatively, it may be possible to use whole genome amplification to obtain sufficient quantities of DNA for sequencing from a single individual.
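
As a rough illustration of the purity and quantity guidance above, the following Python sketch computes the two absorbance ratios and checks them against the ranges quoted in the text, then estimates how much wet tissue would be needed for a target DNA mass. The function names and the choice of the lower yield figure (0.5 mg per gram) as the conservative default are assumptions made for this example; they are not part of any GIGA tool, and individual sequencing facilities may apply different thresholds.

```python
def check_purity(a230, a260, a280,
                 r280_range=(1.8, 2.2), r230_range=(1.5, 2.0)):
    """Compute A260/A280 and A260/A230 and compare them with the
    ranges quoted in the guidelines (defaults taken from the text)."""
    r280 = a260 / a280
    r230 = a260 / a230
    return {
        "A260/A280": r280,
        "A260/A230": r230,
        # Within range suggests low protein contamination
        "protein_ok": r280_range[0] <= r280 <= r280_range[1],
        # Within range suggests low non-protein contamination
        "non_protein_ok": r230_range[0] <= r230 <= r230_range[1],
    }


def tissue_needed_g(required_dna_mg, yield_mg_per_g=0.5):
    """Estimate grams of wet tissue needed for a target DNA mass.

    The default yield is the lower end of the 0.5-1.0 mg/g rule of
    thumb, which gives the more cautious (larger) tissue estimate.
    """
    if yield_mg_per_g <= 0:
        raise ValueError("yield must be positive")
    return required_dna_mg / yield_mg_per_g


if __name__ == "__main__":
    # Hypothetical absorbance readings, for illustration only
    print(check_purity(a230=0.5, a260=1.0, a280=0.5))
    # Tissue estimate for the 1 mg upper bound mentioned in the text
    print(tissue_needed_g(1.0), "g wet tissue")
```

With these defaults, the 1 mg upper bound given above corresponds to roughly 2 g of wet tissue; in practice, tissue type and extraction losses may shift this considerably.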

  Documentation of specimens and samples

All GIGA related samples should be accompanied by sufficient descriptive and contextual data to ensure their adequate taxonomic identification (or later reassessment thereof) and to describe the geospatial and temporal origins, sex (when applicable), size, life stage, tissue type, collection date and collection method for the specimens. Contact information should be provided for the collector(s), all persons responsible for taxonomic identification, and appropriate representatives of all responsible institution(s).
As a prerequisite for inclusion as a GIGA sample, voucher specimens should be deposited in a public collection, such as the Smithsonian Institution or the Ocean Genomic Legacy, or otherwise made publicly available. Both morphological and DNA vouchers are required for each sequencing project. Each voucher specimen must have a unique alphanumeric identifier (accession number) assigned by the holding institution, and this identifier must be included in the sample metadata at the time of sample submission.
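
The documentation requirements above could be captured in a simple record with a completeness check, as in the following Python sketch. The class name, field names, and the decision to treat only sex as optional are assumptions chosen to mirror the items listed in this section; they do not represent an official GIGA metadata schema.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class SampleRecord:
    """Hypothetical metadata record mirroring the items listed above."""
    taxon: str = ""
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    collection_date: str = ""
    collection_method: str = ""
    tissue_type: str = ""
    life_stage: str = ""
    size: str = ""
    collector_contact: str = ""
    identifier_contact: str = ""
    institution_contact: str = ""
    voucher_accession: str = ""
    sex: Optional[str] = None  # recorded only when applicable


# Fields that may legitimately be left empty (an assumption of this sketch)
OPTIONAL_FIELDS = {"sex"}


def missing_fields(record: SampleRecord) -> List[str]:
    """Return the names of required fields that are still empty."""
    return [
        f.name
        for f in fields(record)
        if f.name not in OPTIONAL_FIELDS
        and getattr(record, f.name) in ("", None)
    ]


if __name__ == "__main__":
    # Placeholder record with only the taxon filled in
    draft = SampleRecord(taxon="example taxon")
    print("Still missing:", missing_fields(draft))
```

A check of this kind could be run before sample submission so that a missing voucher accession number or contact is caught early.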

  Data standards

In the future, a single database with a web interface will be established to accommodate metadata and browsing for all GIGA specimens and samples.
  • Central data storage: All raw sequence reads generated as part of the GIGA project should be submitted to the NCBI Sequence Read Archive or equivalent public repository.
  • Sequencing Standards: Standards for sequencing are platform- and taxon-specific, and sensitive to the requirements of individual sequencing facilities. For these reasons, best practices and standards will be established for individual applications. In general, coverage with high quality raw sequence data should be sufficient to obtain reliable assemblies (this will also depend on the technology applied for sequencing). Moreover, whenever possible, genome projects should comply with agreed upon community standards as outlined by the Genomic Standards Consortium.
  • Sequence Assembly, Annotation, and Analyses: With current state of the art sequencing methods, computerized assembly from short read fragments remains a necessity (Earl et al. 2011; Salzberg et al. 2012). A minimum set of statistical measures for an assembly should be generated including: (1) N50 length of scaffolds and contigs (see explanation of N50 in Bradnam et al. 2013), (2) percentage of 458 highly conserved eukaryotic genes present in the assembly (Parra et al. 2007), (3) paired read mapping statistics (REAPR - Recognising Errors in Assemblies using Paired Reads), (4) alignment to any available syntenic or physical maps (Lewin et al. 2009) and (5) transcript mapping statistics (baa.pl).
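
For readers unfamiliar with the N50 statistic listed above, the following Python sketch computes it from a list of contig or scaffold lengths. It uses the common definition: the length L such that sequences of length L or longer together contain at least half of the total assembled bases. The function name and the example lengths are illustrative assumptions; established assembly evaluation tools should be preferred for reporting.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop once at least half of all bases are covered
        if running * 2 >= total:
            return length


if __name__ == "__main__":
    # Hypothetical contig lengths in base pairs
    contigs = [2, 2, 2, 3, 3, 4, 8, 8]
    print("N50 =", n50(contigs))
```

Because N50 depends only on the length distribution, it says nothing about correctness, which is why the list above pairs it with gene content and read mapping measures.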

References

  • Dawson, M.N., Raskoff, K.A., Jacobs, D.K. 1998. Field preservation of marine invertebrate tissue for DNA analyses. Mol. Mar. Biol. Biotech. 7:145-152.
  • United Nations Environmental Programme (UNEP) 2011. Nagoya Protocol On Access To Genetic Resources And The Fair And Equitable Sharing Of Benefits Arising From Their Utilization To The Convention On Biological Diversity. Secretariat of the Convention on Biological Diversity, Montreal, Quebec, Canada.
  • Wong PBY, Wiley EO, Johnson WE, Ryder OA, O’Brien SJ, Haussler D, Koepfli K-P, Houck ML, Perelman P, Mastromonaco G, Bentley AC, Venkatesh B, Zhang Y-p, Murphy RW. 2012. Tissue sampling methods and standards for vertebrate genomics. GigaScience. 1:8.

GIGA
Nova Southeastern University
Dania Beach, FL 33004
954-242-3600