Libraries: Molecular biology, biochemistry, and genetic databases from the National Center for Biotechnology Information: Downloads

Downloads

BLAST (Stand-alone)

BLAST executables for local use are provided for Solaris, Linux, Windows, and Mac OS X systems. See the README file in the FTP directory for more information. Pre-formatted databases for BLAST nucleotide, protein, and translated searches are also available for download in the db subdirectory.

FTP: BLAST Databases

Sequence databases for use with the stand-alone BLAST programs. The files in this directory are pre-formatted databases that are ready to use with BLAST.

FTP: CDD

This site provides full data records for CDD, along with individual Position Specific Scoring Matrices (PSSMs), mFASTA sequences and annotation data for each conserved domain. See the README file for full details.

FTP: ClinVar Data

This site provides full data extractions in XML and summary data in VCF format. It contains files with information about standard terms used in ClinVar, MedGen, and GTR.
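As a rough illustration of working with the VCF summary files, the sketch below reads VCF text using only the Python standard library. It relies on the general VCF layout (header lines starting with "#", then tab-delimited records whose first eight columns are CHROM, POS, ID, REF, ALT, QUAL, FILTER and INFO). The sample record, the rs identifier and the INFO keys are hypothetical placeholders, not actual ClinVar content; the function names are also invented for this example.

```python
from typing import Dict, Iterable, Iterator

# The eight fixed columns defined by the VCF format.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_info(field: str) -> Dict[str, object]:
    """Split a semicolon-separated INFO field into a dictionary."""
    info: Dict[str, object] = {}
    if field == ".":
        return info
    for item in field.split(";"):
        key, sep, value = item.partition("=")
        # Keys without "=" are flags; store them as True.
        info[key] = value if sep else True
    return info


def read_vcf(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Yield one dictionary per data line, skipping header lines."""
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        record: Dict[str, object] = dict(zip(VCF_COLUMNS, fields[:8]))
        record["POS"] = int(fields[1])
        record["INFO"] = parse_info(fields[7])
        yield record


# Hypothetical sample data, for illustration only.
sample_lines = [
    "##fileformat=VCFv4.1",
    "#" + "\t".join(VCF_COLUMNS),
    "\t".join(["1", "12345", "rs0000", "A", "G", ".", ".",
               "EXAMPLE_KEY=example_value;EXAMPLE_FLAG"]),
]

records = list(read_vcf(sample_lines))
for rec in records:
    print(rec["CHROM"], rec["POS"], rec["REF"], rec["ALT"], rec["INFO"])
```

For a real download, the same function could be given an open file handle (for example one returned by gzip.open in text mode) in place of the sample list.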

FTP: FASTA BLAST Databases

Sequence databases in FASTA format for use with the stand-alone BLAST programs. These databases must be formatted using formatdb before they can be used with BLAST.
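Before formatting a downloaded FASTA file, it can be useful to check that the file parses and to count its records. The following is a minimal sketch using only the Python standard library; the function name and the sample sequences are hypothetical and are not taken from any NCBI file.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {header: sequence} for FASTA text supplied line by line."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    # Store the final record.
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Hypothetical sample data, for illustration only.
sample = """>seq1 hypothetical example record
ACGTACGT
ACGT
>seq2 another hypothetical record
TTGACA
"""

fasta_records = read_fasta(sample.splitlines())
for name, seq in fasta_records.items():
    print(name, len(seq))
```

This sketch holds every sequence in memory, so for the larger database files a streaming variant that yields one record at a time would likely be a better fit.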

FTP: GenBank

This site contains files for all sequence records in GenBank in the default flat file format. The files are organized by GenBank division, and the full contents are described in the README.genbank file.

FTP: GenPept

The protein sequences corresponding to the translations of coding sequences (CDS) in GenBank are collected for each GenBank release. Please see the README file in the directory for more information.

FTP: Gene

This site contains three directories: DATA, GeneRIF and tools. The DATA directory contains files listing all data linked to GeneIDs along with subdirectories containing ASN.1 data for the Gene records. The GeneRIF (Gene References into Function) directory contains PubMed identifiers for articles describing the function of a single gene or interactions between products of two genes. Sample programs for manipulating gene data are provided in the tools directory. Please see the README file for details.

FTP: Gene Expression Omnibus (GEO) Profiles and Datasets

This site contains GEO data in two formats: SOFT (Simple Omnibus in Text Format) and MINiML (MIAME Notation in Markup Language). Summary text files and supplementary data are also available. Please see the README.TXT file for more information.

FTP: Genome

This site contains genome sequence and mapping data for organisms in Entrez Genome. The data are organized in directories for single species or groups of species. Mapping data are collected in the directory MapView and are organized by species. See the README file in the root directory and the README files in the species subdirectories for detailed information.

FTP: Genome Mapping Data

Contains directories for each genome that include available mapping data for current and previous builds of that genome.

FTP: HomoloGene

This site contains data for each build of HomoloGene, beginning with build 35. Complete data for each build are provided in XML, and a data summary is provided in tab-delimited text format.

FTP: NCBI Field Guide Manual

Downloadable material for NCBI's previously offered Field Guide training course.

FTP: NCBI Structure Course Materials

PowerPoint slides, handouts and exercises for the previously offered NCBI course "Exploring 3D Molecular Structures."

FTP: NCBI Taxonomy

This site contains the full taxonomy database along with files associating nucleotide and protein sequence records with their taxonomy IDs. See the taxdump_readme.txt and gi_taxid.readme files for more information.
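As an illustration of how the taxonomy dump can be used, the sketch below walks from a taxonomy ID up to the root. It assumes the node file layout as generally distributed (fields separated by a tab, a vertical bar and a tab, each line ending with a tab and a vertical bar, with the taxonomy ID, parent ID and rank as the first three fields, and the root node listed as its own parent); the taxdump_readme.txt file is the authority on the actual layout. The IDs in the sample are toy values, not real taxonomy IDs, and the function names are invented for this example.

```python
from typing import Dict, Iterable, List, Tuple


def parse_nodes(lines: Iterable[str]) -> Tuple[Dict[int, int], Dict[int, str]]:
    """Return (parents, ranks) dictionaries keyed by taxonomy ID."""
    parents: Dict[int, int] = {}
    ranks: Dict[int, str] = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        # Drop the assumed line terminator, then split on the field separator.
        if line.endswith("\t|"):
            line = line[:-2]
        fields = line.split("\t|\t")
        tax_id, parent_id, rank = int(fields[0]), int(fields[1]), fields[2]
        parents[tax_id] = parent_id
        ranks[tax_id] = rank
    return parents, ranks


def lineage(tax_id: int, parents: Dict[int, int]) -> List[int]:
    """Follow parent links until a node that is its own parent (the root)."""
    path = [tax_id]
    while parents[path[-1]] != path[-1]:
        path.append(parents[path[-1]])
    return path


# Toy sample in the assumed layout; these are not real taxonomy IDs.
sample_nodes = [
    "1\t|\t1\t|\tno rank\t|",
    "10\t|\t1\t|\tgenus\t|",
    "11\t|\t10\t|\tspecies\t|",
]

parents, ranks = parse_nodes(sample_nodes)
print(lineage(11, parents))
print([ranks[t] for t in lineage(11, parents)])
```

The same dictionaries could be combined with the files that map sequence records to taxonomy IDs, though the layout of those files is not covered by this sketch.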

FTP: Protein Clusters

This site contains data from the Protein Clusters database arranged by release date. See the README files for more information.

FTP: PubChem

This site provides data from the PubChem Substance, Compound and BioAssay databases for download via FTP. Full downloads of the databases are available along with daily, weekly and monthly updates for Substance and Compound. Substance and Compound data are provided in ASN.1, SDF and XML formats. See the README files for more information.

FTP: RefSeq

This site contains all nucleotide and protein sequence records in the Reference Sequence (RefSeq) collection. The "release" directory contains the most current release of the complete collection, while data for selected organisms (such as human, mouse and rat) are available in separate directories. Data are available in FASTA and flat file formats. See the README file for details.

FTP: SKY/M-Fish and CGH Data

This site contains SKY-CGH data in ASN.1, XML and EasySKYCGH formats. See the skycghreadme.txt file for more information.

FTP: SNP

Downloadable data for SNP.

FTP: Sequence Read Archive (SRA) Download Facility

This site contains next-generation sequencing data organized by the submitted sequencing project.

FTP: Site

FTP download site for NCBI databases, tools, and utilities.

FTP: Structure (MMDB)

This site contains ASN.1 data for all records in MMDB along with VAST alignment data and the non-redundant PDB (nr-PDB) data sets. See the README file for more information.

FTP: Trace Archive

This site contains trace chromatogram data organized by species. Data include chromatograms, quality scores, FASTA sequences from automatic base calls, and other ancillary information in tab-delimited text as well as XML formats. See the README file for details.

FTP: UniGene

This site contains individual directories for each organism with data in UniGene. The data for each species include the unique sequence for each UniGene cluster, all sequences in each cluster in FASTA format and library information for the cluster. See the README file for further details.

FTP: UniVec

This site contains the UniVec and UniVec_Core databases in FASTA format. See the README.uv file for details.

FTP: Whole Genome Shotgun Sequences

This site contains whole genome shotgun sequence data organized by the 4-digit project code. Data include GenBank and GenPept flat files, quality scores and summary statistics. See the README.genbank.wgs file for more information.

FTP: dbGAP Open-Access Data

Open-access data generally include summaries of genotype/phenotype association studies, descriptions of the measured variables, and study documents, such as the protocol and questionnaires. Access to individual-level data, including phenotypic data tables and genotypes, requires varying levels of authorization.

FTP: dbMHC Data

This site contains data in separate directories for the various projects and resources within the database of the human major histocompatibility complex (dbMHC).

MEDLINE (Leasing)

NLM leases MEDLINE/PubMed to U.S. individuals or organizations.

NCBI Data Specifications

Specifications for NCBI data in ASN.1 or DTD format are available on the Index of data_specs page. The "NCBI_data_conversion.html" entry links to the conversion tool.

National Library of Medicine (NLM) DTDs

A suite of tag sets for authoring and archiving journal articles as well as transferring journal articles from publishers to archives and between archives. There are four tag sets: Archiving and Interchange Tag Set - Created to enable an archive to capture as many of the structural and semantic components of existing printed and tagged journal material as conveniently as possible; Journal Publishing Tag Set - Optimized for archives that wish to regularize and control their content, not to accept the sequence and arrangement presented to them by any particular publisher; Article Authoring Tag Set - Designed for authoring new journal articles; NCBI Book Tag Set - Written specifically to describe volumes for the NCBI online libraries.

PubChem Download Service

This service allows users to download compound or substance records corresponding to a set of PubChem identifiers, which can be supplied manually or through a text file. Numerous download formats are available, including SDF, XML and SMILES.

PubMed Central (PMC) Open-Access Subset

The PMC Open-Access Subset is a relatively small part of the total collection of articles in PMC. The majority of articles in PMC are subject to traditional copyright restrictions. Articles in the subset are also protected by copyright, but they are made available under a Creative Commons or similar license that generally allows more liberal redistribution and reuse than traditional copyright. Please refer to the license statement in each article for specific terms of use.

RSS Feeds

Subscribe to Web/RSS feeds for updates about NCBI resources.

National Institutes of Health, United States National Library of Medicine, National Center for Biotechnology Information. (2016). All Resources. Retrieved 12/27/2016 from https://www.ncbi.nlm.nih.gov/guide/all/