Resources

3. Analyzing Your Data

Biomolecular Data
Sequence Homology Searching Online

BLAST searches
NCBI is the primary Web site for BLAST

FASTA searches
The European Bioinformatics Institute (EBI) hosts the primary web site for FASTA searches
Contact: Michael Black, PhD, mblack@virginia.edu

WU-BLAST2 server
Also hosted by EBI
Contact: Michael Black, PhD, mblack@virginia.edu

Finding specific sequences on the web
DNA sequences

Note that GenBank, EMBL, and DDBJ are all synchronized daily

GenBank (NCBI)
GenBank® is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences
Search GenBank using NCBI's integrated search engine ENTREZ
Contact: Michael Black, PhD, mblack@virginia.edu
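
For scripted searches, Entrez can also be queried through NCBI's E-utilities web service. The following is a minimal sketch that only builds an ESearch request URL and does not contact NCBI. The helper name build_esearch_url and the example query term are illustrative assumptions, not part of this page.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch endpoint.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, db="nucleotide", retmax=20):
    """Build an ESearch URL for an Entrez query (hypothetical helper).

    term   -- Entrez query string, e.g. with [Gene Name] or [Organism] field tags
    db     -- Entrez database to search; "nucleotide" covers GenBank records
    retmax -- maximum number of record identifiers to return
    """
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_BASE + "?" + urlencode(params)


# Example query, chosen only for illustration.
url = build_esearch_url("BRCA1[Gene Name] AND Homo sapiens[Organism]")
print(url)
```

The resulting URL can be opened in a browser or fetched from a script. NCBI publishes usage guidelines for E-utilities that should be checked before sending automated requests.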

European Molecular Biology Laboratory's EMBL database
The EMBL Nucleotide Sequence Database (also known as EMBL-Bank) constitutes Europe's primary nucleotide sequence resource. Main sources for DNA and RNA sequences are direct submissions from individual researchers, genome sequencing projects, and patent applications.
Contact: Michael Black, PhD, mblack@virginia.edu

DNA Databank of Japan (DDBJ)
DDBJ is one of the three databanks that together make up the DDBJ/EMBL/GenBank International Nucleotide Sequence Database.
Contact: Michael Black, PhD, mblack@virginia.edu

Protein sequences

Protein sequence data are primarily available via UniProt, the Universal Protein Resource

ExPASy
The ExPASy (Expert Protein Analysis System) proteomics server of the Swiss Institute of Bioinformatics (SIB) is dedicated to the analysis of protein sequences and structures as well as 2-D PAGE
Contact: Michael Black, PhD, mblack@virginia.edu

Protein Information Resource Center (PIR), Georgetown University Medical Center
An integrated public bioinformatics resource to support genomic and proteomic research, and scientific studies
Contact: Michael Black, PhD, mblack@virginia.edu

NCBI's ENTREZ integrated search system
ENTREZ can be used to find predicted structures. A free viewer (Cn3D) is also available for download
Contact: Michael Black, PhD, mblack@virginia.edu

Sequence Analysis

Tools available at the UVA Bioinformatics Core
Access to GCG suite of tools
Command line, UNIX X graphical interface, or Web based via Seqlab
Contact: Michael Black, PhD, mblack@virginia.edu

EMBOSS collection of applications
Open source collection comparable to GCG. On UNIX and Apple OS X machines, EMBOSS can be run in a Java graphical environment using the Jemboss package. Command line EMBOSS is also available on the Bioinformatics Core's server watson.achs.virginia.edu
Contact: Michael Black, PhD, mblack@virginia.edu

Online Resources

Other freely available resources:

San Diego Supercomputer Center
Online sequence analysis workbench server
Contact: Michael Black, PhD, mblack@virginia.edu

European Bioinformatics Institute
Numerous analysis tools
Contact: Michael Black, PhD, mblack@virginia.edu

Institut Pasteur
Numerous analysis tools
Contact: Michael Black, PhD, mblack@virginia.edu

Genomic Analysis

Other freely available resources:

NCBI's genomic resources
Tools and databases for working with genomic data
Contact: Michael Black, PhD, mblack@virginia.edu

Sanger Institute and EBI's Ensembl Genome Suite
Genome suites
Contact: Michael Black, PhD, mblack@virginia.edu

UCSC Genome Browser tools
Genome browser tools
Contact: Michael Black, PhD, mblack@virginia.edu

Sequence Alignment

ClustalW
Widely used multiple sequence alignment program, available at the command line on watson.achs.virginia.edu. Can be freely downloaded from ftp://ftp.ebi.ac.uk/pub/software/clustalw2/
Contact: Michael Black, PhD, mblack@virginia.edu

Structure Prediction

PDB
Known protein structures can be viewed and downloaded
Contact: Michael Black, PhD, mblack@virginia.edu

ExPASy
Tools for predicting protein structure and viewing simulated structures (Swiss-PdbViewer, aka DeepView)
Contact: Michael Black, PhD, mblack@virginia.edu

NCBI's ENTREZ integrated search system
ENTREZ can be used to find predicted structures. A free viewer (Cn3D) is also available for download
Contact: Michael Black, PhD, mblack@virginia.edu

Function Prediction

NCBI's RefSeq database
See "Sequence Homology Searching Online" above
Contact: Michael Black, PhD, mblack@virginia.edu

The Pfam database (Protein Family)
Contact: Michael Black, PhD, mblack@virginia.edu

Transcription factors

TRANSFAC database
TRANSFAC® 7.0 Public 2005 contains data on transcription factors, their experimentally proven binding sites, and regulated genes. Its broad compilation of binding sites allows the derivation of positional weight matrices.
Contact: Michael Black, PhD, mblack@virginia.edu
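
To illustrate how a positional weight matrix can be derived from a compilation of binding sites, here is a minimal sketch. It assumes the sites are already aligned, have equal length, and contain only A, C, G, and T. The function names, the pseudocount, the uniform background frequency, and the example sites are illustrative assumptions and are not taken from TRANSFAC.

```python
import math
from collections import Counter

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Return one dict per position mapping each base to a log2-odds score.

    Assumes pre-aligned, equal-length sites containing only A, C, G, T.
    """
    if not sites:
        raise ValueError("need at least one binding site")
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")

    pwm = []
    total = len(sites) + pseudocount * len(BASES)
    for pos in range(length):
        # Count how often each base occurs at this position.
        counts = Counter(site[pos].upper() for site in sites)
        column = {}
        for base in BASES:
            # The pseudocount avoids log(0) for bases never observed here.
            freq = (counts[base] + pseudocount) / total
            column[base] = math.log2(freq / background)
        pwm.append(column)
    return pwm


def score(pwm, window):
    """Sum the per-position log-odds scores for a window of the same length."""
    return sum(column[base] for column, base in zip(pwm, window.upper()))


# Hypothetical aligned binding sites, invented for illustration only.
example_sites = ["TATAAA", "TATAAT", "TATATA", "TATAAA"]
pwm = build_pwm(example_sites)
print(round(score(pwm, "TATAAA"), 2))
```

Higher scores indicate windows that resemble the compiled sites more closely than the background. Real tools typically use organism-specific background frequencies rather than a uniform 0.25.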

TESS search tools
TESS is a web tool for predicting transcription factor binding sites in DNA sequences. It can identify binding sites using site or consensus strings and positional weight matrices from the TRANSFAC, JASPAR, IMD, and CBIL-GibbsMat databases. You can use TESS to search a few of your own sequences, or to search genome-wide for user-defined CRMs near genes in genomes of interest.
Contact: Michael Black, PhD, mblack@virginia.edu
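
As a rough illustration of searching a sequence with a consensus string, the sketch below translates IUPAC nucleotide codes into a regular expression and reports every match, including overlapping ones. It scans the forward strand only and is not TESS's own algorithm. The function names and example sequence are illustrative assumptions.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regular-expression fragments.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Compile a consensus string into a regex that finds overlapping hits."""
    body = "".join(IUPAC[code] for code in consensus.upper())
    # A lookahead with a capture group lets overlapping sites be reported.
    return re.compile("(?=(" + body + "))")


def find_sites(sequence, consensus):
    """Return (start, matched_text) pairs, using 0-based start positions."""
    pattern = consensus_to_regex(consensus)
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


# Hypothetical sequence and consensus, invented for illustration only.
hits = find_sites("GGTATAAATATATAGG", "TATAWA")
for start, site in hits:
    print(start, site)
```

A fuller search would also scan the reverse complement and could rank hits with a weight matrix instead of an exact consensus match.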

McPromoter gene promoter searching tool
McPromoter is a program aiming at the exact localization of eukaryotic RNA polymerase II transcription start sites.
Contact: Michael Black, PhD, mblack@virginia.edu

Gene Expression Profiling/Microarray

Data generated by the Biomolecular Research Facility (BRF) are sent directly to the Bioinformatics Core.
Investigators access their data via the Affy LIMS system (an account will be created by BRF staff when you first place an order).
Contact: Michael Black, PhD, mblack@virginia.edu


GEOSS home page
Contact: Michael Black, PhD, mblack@virginia.edu

Non-commercial gene expression analysis software

GenePattern (BROAD Institute at MIT)
Comprehensive, modular package for gene expression analysis
Contact: Michael Black, PhD, mblack@virginia.edu

Expression Profiler at the EBI
An online set of R-based tools for microarray analysis
Contact: Michael Black, PhD, mblack@virginia.edu

dChip
One of the oldest and most established open-source packages (requires Windows OS), developed by Drs. Li and Wong
Contact: Michael Black, PhD, mblack@virginia.edu

TM4 suite
Open source set of programs (Java based) developed originally by scientists at TIGR (now the J. Craig Venter Institute) and the Dana-Farber Cancer Institute and Harvard School of Public Health
TM4 is targeted to researchers with 2-color array data (especially the Data Manager tool, MADAM), but it can also be used to analyze Affymetrix data
Contact: Michael Black, PhD, mblack@virginia.edu

R and Bioconductor
R is an open source software environment for statistical computing and graphics, and Bioconductor is a suite of open source R tools specifically for genomic data analysis, including gene expression data.
This approach to expression data analysis is arguably the least user-friendly, but it allows the use of a large selection of analysis tools.
Contact: Michael Black, PhD, mblack@virginia.edu

Pathway analysis

GenMAPP
Free computer application from the Gladstone Institutes to visualize gene expression data on maps of biological pathways
Contact: Michael Black, PhD, mblack@virginia.edu

Gene expression databases

NCBI's Gene Expression Omnibus (GEO)

Other gene expression databases
There are also numerous organism-specific and pathology-based gene expression databases that can be found by simple keyword Web searches
Contact: Michael Black, PhD, mblack@virginia.edu

Example: Mouse embryo gene expression map
Contact: Michael Black, PhD, mblack@virginia.edu

Example: mouse brain database
Contact: Michael Black, PhD, mblack@virginia.edu

Example: E. coli gene expression database at the University of Oklahoma
Contact: Michael Black, PhD, mblack@virginia.edu

A searchable registry of genome projects
Contact: Michael Black, PhD, mblack@virginia.edu

Genome/Model Organisms

Check for information about a specific genome project at the Genome Online Database (GOLD).

Genome Browsers

NCBI's genome Map Viewer

Ensembl

UCSC's genome browser
Provides entry points to genome resources for specific organisms

Model organisms have many additional resources, such as

Yeast (Saccharomyces cerevisiae) database

The zebrafish (Danio rerio)

Mouse resources at the Jackson Laboratory

Sanger Institute model organism genome sites

Microbes

Bacterial genomes at the Sanger Institute

Comprehensive Microbial Resource at the J. Craig Venter Institute (formerly TIGR)

Plants

The cowpea (Vigna unguiculata) genome

The rice (Oryza sativa) genome project

The corn (Zea mays) genome project

Phylogenetics

PHYLIP
Command line suite of tools for analyzing all types of phylogenetic data. The Bioinformatics Core maintains the complete PHYLIP package on our primary molecular biology server watson.achs.virginia.edu
Contact: Michael Black, PhD, mblack@virginia.edu

Institut Pasteur
Online tools for phylogenetic analysis
Contact: Michael Black, PhD, mblack@virginia.edu

Tree of Life Project
Phylogenetic/taxonomic information on organisms
Contact: Michael Black, PhD, mblack@virginia.edu

Clinical Data

UVa Center for Survey Research
CSR is a full-service survey research facility, offering customized project design (from sampling to instrument development), professional interviewing and data collection using the latest survey technologies, and data analysis and report preparation.
Contact: surveys@virginia.edu

Health Services Data

Spatial and Statistical Data and Services (Scholars' Lab, UVA Library)
Specialized software for GIS and statistical analysis
Contact: Kelly Johnston, MS GIS, kgj3t@virginia.edu

Research Computing Lab (UVA Library)
Software and consultation services in a wide variety of technologies and methodologies for high-performance and research computing.
Contact: Andrew Sallans, MLIS, als9q@virginia.edu

Tissue

Tissue Microarrays (Biorepository and Tissue Research Facility)
Tissue microarray technology places up to a thousand discs of tissue on a single glass slide, which can then be assayed by histologic staining, immunohistochemistry and/or in situ hybridization.
Contact: Craig Rumpel, MS, Biorepository Manager, crumpel@virginia.edu