NCBI’s Nucleotide database

The Nucleotide database is one of three major NCBI databases containing nucleotide sequence data. It contains sequences from the following sources:

  • The International Nucleotide Sequence Database Collaboration (INSDC): a repository of primary sequence data generated and submitted directly by researchers. INSDC comprises NCBI’s GenBank, the European Nucleotide Archive (ENA) of the European Molecular Biology Laboratory (EMBL), and the DNA Data Bank of Japan (DDBJ);
  • Reference Sequences (RefSeq): an NCBI-curated, non-redundant collection of sequences for major organisms, derived from primary GenBank data and annotated by domain experts;
  • Third Party Annotation (TPA) Sequence Database: a database of submitter-annotated sequences assembled or derived from primary INSDC data;
  • Protein Data Bank (PDB): an archive of 3-D structural data for biological macromolecules from which nucleotide sequence data is extracted.

The Nucleotide database automatically maps keyword queries to NCBI’s taxonomic classification of organisms and breaks the search results down by species. Searches can be limited to a single species with the [Organism] field tag. Filters can also restrict results to bacteria, INSDC/GenBank records, mRNA, or RefSeq records, and limits can be set on date, source database, gene location, molecule type, and sequence type. As with other NCBI resources, My NCBI lets you save Nucleotide searches, create e-mail alerts, and set preferences for displaying and filtering search results.
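Queries that combine keywords, the [Organism] field tag, and filters can also be issued programmatically through NCBI's E-utilities. The sketch below assumes the ESearch endpoint (`esearch.fcgi`) with `db=nucleotide`. The helper names `build_query` and `esearch_url` are hypothetical, and the filter strings `refseq[filter]` and `biomol_mrna[prop]` are assumptions that should be checked against the Nucleotide Advanced search builder. The code only builds the request URL and does not contact NCBI.

```python
from typing import List, Optional
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (returns matching record IDs for a query).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query(keywords: str,
                organism: Optional[str] = None,
                filters: Optional[List[str]] = None) -> str:
    """Hypothetical helper: AND together keywords, an [Organism] tag, and filter terms."""
    parts = [keywords]
    if organism:
        # Quote multi-word species names so the field tag applies to the whole name.
        parts.append(f'"{organism}"[Organism]')
    # Filter syntax such as "refseq[filter]" or "biomol_mrna[prop]" is an
    # assumption here; verify it in the Nucleotide Advanced search builder.
    parts.extend(filters or [])
    return " AND ".join(parts)


def esearch_url(term: str, retmax: int = 20) -> str:
    """Hypothetical helper: build an ESearch URL against the Nucleotide database."""
    params = {"db": "nucleotide", "term": term, "retmax": retmax}
    # urlencode percent-encodes brackets and quotes and turns spaces into '+'.
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    query = build_query("BRCA1", organism="Homo sapiens",
                        filters=["refseq[filter]", "biomol_mrna[prop]"])
    print(query)
    print(esearch_url(query))
```

Fetching the resulting URL (for example with `urllib.request.urlopen`) would query the live service. NCBI asks E-utilities clients to identify themselves and to respect its request-rate limits, so consult the E-utilities documentation before running such requests in bulk.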