Genetics : Sequence Databases

Sequence Databases

NCBI: BLAST
BLAST (Basic Local Alignment Search Tool) finds regions of local similarity between your query sequence and the sequences in a database, then ranks the matches by statistical significance. BLAST Tutorial
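
To illustrate what "local alignment" and "ranking matches" mean, here is a minimal sketch in Python. It is not how BLAST itself is implemented: BLAST uses a fast heuristic and reports statistical measures, whereas this sketch uses the slower, exact Smith-Waterman algorithm and ranks by raw score only. The function names (`smith_waterman_score`, `rank_matches`) and the scoring values are assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b.

    Scoring values are illustrative assumptions, not BLAST defaults.
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores are floored at 0 so an alignment can start anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def rank_matches(query, subjects):
    """Rank named subject sequences by raw local alignment score, best first."""
    scored = [(smith_waterman_score(query, seq), name)
              for name, seq in subjects.items()]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    database = {"s1": "ACGT", "s2": "ACGA", "s3": "TTTT"}  # toy data
    for score, name in rank_matches("ACGT", database):
        print(name, score)
```

Because scores are floored at zero, the best-scoring region can begin and end anywhere inside either sequence, which is what distinguishes a local alignment from an end-to-end (global) one.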

NCBI: Nucleotide
Used to search for related DNA and RNA sequence records.

NCBI: Protein
Used to search for related amino acid sequence files. "Collection of sequences from several sources, including translations from annotated coding regions in GenBank, RefSeq and TPA, as well as records from SwissProt, PIR, PRF, and PDB. Protein sequences are the fundamental determinants of biological structure and function."

NCBI: GenBank
The NIH genetic sequence database, an annotated collection of all publicly available DNA sequences.

EMBL-EBI Databases
European Bioinformatics Institute (EBI). EBI "manages databases of biological data including nucleic acid, protein sequences and macromolecular structures." The EMBL database is similar to GenBank and the DDBJ; the three exchange data daily. Searching EMBL is therefore unnecessary if you have already searched GenBank, but EMBL does provide more cross-references to related information (such as motifs and structure).

Pfam
Large collection of protein families and domains. For each protein family "you can look at multiple alignments, view protein domain architectures, examine species distribution, follow links to other databases, and view known protein structures."

PIR - International Protein Sequence Database
Contains protein sequences. The database, organized by homology and taxonomy, also contains information on function, classification of the protein and organism, literature references, and sites of biological interest.

UniProtKB/Swiss-Prot
Protein sequence database. Includes extensive annotations (description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), minimal redundancy, and links to other databases.