Duke Integrated Genomics



Software Applications

A variety of applications for the analysis of microarray data have been developed within the Duke IGSP projects. These applications result from statistical research directed at developing methodologies for the most effective classification and prediction of biological and clinically relevant phenotypes. They include methods for the analysis of microarray data as well as tools to facilitate the use of microarray datasets. Additional programs have been developed to aid the interpretation of gene profiling results: tools for functional annotation of genes identified in a variety of contexts.

Profiler [download Zip] uses binary probit regression models in which the predictors are singular factors (principal components) of the expression of a selected set of genes; the genes are chosen by their sample correlation with the binary state. The Bayesian statistical analysis shrinks the estimated regression parameters towards zero and produces fitted regression coefficients for the selected genes. These coefficients can be used to rank the genes by their contribution to the discrimination, and as weights in a weighted average that defines an overall metagene profile. The analysis produces fitted classification probabilities and also supports cross-validation prediction, using leave-one-out methods, to properly assess predictive robustness. The program can also predict new samples from a previously trained model.
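The gene selection and metagene steps described above can be illustrated with a minimal sketch. It is not Profiler's code: it covers only correlation-based selection and a weighted-average profile, and as a loudly stated assumption it uses the correlations themselves as weights in place of the fitted Bayesian probit coefficients. All function names, gene names, and values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant gene carries no information about the state
    return sxy / sqrt(sxx * syy)


def correlation_scores(expression, labels):
    """Correlate every gene's expression with the binary state (0/1)."""
    return {gene: pearson(values, labels) for gene, values in expression.items()}


def top_genes(scores, k):
    """Keep the k genes with the largest absolute correlation."""
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)[:k]


def metagene(expression, genes, weights):
    """Weighted average of the selected genes, one value per sample.

    ASSUMPTION: `weights` stands in for the fitted regression coefficients.
    """
    total = sum(abs(weights[g]) for g in genes)
    n_samples = len(expression[genes[0]])
    return [
        sum(weights[g] * expression[g][i] for g in genes) / total
        for i in range(n_samples)
    ]


# Hypothetical toy data: three genes measured on six samples.
expression = {
    "geneA": [1.0, 1.2, 0.9, 3.1, 2.9, 3.3],
    "geneB": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
    "geneC": [5.0, 4.8, 5.2, 1.1, 0.9, 1.0],
}
labels = [0, 0, 0, 1, 1, 1]

scores = correlation_scores(expression, labels)
selected = top_genes(scores, 2)
profile = metagene(expression, selected, scores)
print(selected, profile)
```

In a leave-one-out assessment, the selection and weighting would be repeated with each sample held out in turn, so that the held-out sample never influences which genes are chosen.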

Tree Profiler [download TAR Gzip] uses classification and regression tree methods for binary classification. One approach that has proved useful in a number of studies in cancer and other contexts is to use multiple metagene summaries as predictors of a phenotype. The metagenes are gene expression signatures that represent patterns of co-expression, generated by an initial clustering of the expression data. The classification tree strategy provides a mechanism to draw on many sources of data to predict a phenotype, such as ER status in breast tumors. The advantage of this approach is its ability to use multiple forms of data: multiple metagenes (clusters), other genomic data such as DNA methylation or DNA copy number patterns, protein profiles, or other biological or clinical data. The statistical analysis combines these sources and identifies the data that best classify and predict the samples.
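As a rough illustration of how a tree can choose among heterogeneous predictors, the sketch below averages a gene cluster into a metagene and then searches for the single best split, that is, a depth-one tree. This is a simplification and not Tree Profiler's algorithm: the split is scored by a plain misclassification count, and every name and value is hypothetical.

```python
def cluster_metagene(expression, cluster):
    """Average the genes in one cluster into one metagene value per sample."""
    n_samples = len(expression[cluster[0]])
    return [
        sum(expression[g][i] for g in cluster) / len(cluster)
        for i in range(n_samples)
    ]


def best_split(predictors, labels):
    """Return (predictor name, threshold, errors) for the best single split.

    Each predictor may be a metagene, a methylation score, a clinical
    variable, and so on; the search treats them all alike.
    """
    best = None
    for name, values in predictors.items():
        for threshold in sorted(set(values)):
            # Errors when predicting class 1 for values above the threshold.
            errors_high = sum(
                (v > threshold) != bool(y) for v, y in zip(values, labels)
            )
            # The opposite direction makes exactly the complementary errors.
            errors = min(errors_high, len(labels) - errors_high)
            if best is None or errors < best[2]:
                best = (name, threshold, errors)
    return best


# Hypothetical toy data: two co-expressed genes plus one non-expression predictor.
expression = {
    "gene1": [0.0, 0.2, 0.4, 1.0, 1.0, 1.2],
    "gene2": [0.2, 0.2, 0.2, 0.8, 1.0, 1.0],
}
predictors = {
    "metagene1": cluster_metagene(expression, ["gene1", "gene2"]),
    "methylation": [0.5, 0.4, 0.6, 0.5, 0.4, 0.6],
}
labels = [0, 0, 0, 1, 1, 1]

print(best_split(predictors, labels))
```

A full classification tree would apply the same search recursively to the samples on each side of the chosen split.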

ChipComparer is designed to identify common gene sets on different microarrays. The program first maps each probeset ID on your selected microarray chips (A and B) to the corresponding LocusID using the LocusLink and UniGene databases, then reports the probeset ID pairs (from A and B) that refer to the same gene locus (if the chips are for the same organism) or to orthologs (if they are for different organisms, using NCBI-HomoloGene).
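The matching logic described above can be sketched as follows. This is an illustration only, assuming the probeset-to-locus mappings and the ortholog table have already been loaded into dictionaries; the function name, probeset IDs, and locus IDs are all made up and do not come from LocusLink, UniGene, or HomoloGene.

```python
def match_probesets(chip_a, chip_b, orthologs=None):
    """Pair probesets from two chips that refer to the same gene.

    chip_a, chip_b: dict mapping probeset ID -> locus ID (None if unmapped).
    orthologs: optional dict mapping a locus ID in chip A's organism to the
               orthologous locus ID in chip B's organism. Leave as None when
               both chips are for the same organism.
    """
    # Index chip B by locus so each chip A probeset is matched in one lookup.
    by_locus_b = {}
    for probe_b, locus_b in chip_b.items():
        if locus_b is not None:
            by_locus_b.setdefault(locus_b, []).append(probe_b)

    pairs = []
    for probe_a, locus_a in chip_a.items():
        if locus_a is None:
            continue  # probeset could not be mapped to a locus
        target = orthologs.get(locus_a) if orthologs is not None else locus_a
        for probe_b in by_locus_b.get(target, []):
            pairs.append((probe_a, probe_b))
    return pairs


# Hypothetical mappings for two chips.
chip_a = {"A_001": 100, "A_002": 200, "A_003": None}
chip_b = {"B_101": 100, "B_102": 100, "B_103": 900}

# Same organism: match on identical locus IDs.
print(match_probesets(chip_a, chip_b))

# Different organisms: translate chip A loci through an ortholog table first.
print(match_probesets(chip_a, chip_b, orthologs={200: 900}))
```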

Duke Integrated Genomics (DIG) Annotation System is a web-based data management and information system for retrieving a variety of functional information sources linked to the genes included on most microarrays used within the Duke Microarray Center. The system also provides access to a powerful method for literature searching.

GATHER (Gene Annotation Tool to Help Explain Relationships) is a computational tool that analyzes lists of genes identified in high-throughput experiments. It identifies significant Gene Ontology functions, biological pathways, interacting proteins, microRNA regulation, transcription factor regulation, or other biological systems, to develop deeper insight into the biology underlying a gene signature. It can also infer novel functions, and it successfully predicted 90% of the functions in an evaluation over a broad range of gene groups.
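To give a sense of what "significant" can mean for an annotation, the sketch below scores one annotation with a hypergeometric tail probability. This is a commonly used enrichment test swapped in for illustration; the text above does not say which statistic GATHER uses, so this should not be read as GATHER's method. The function name and the numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, list_size, hits):
    """Probability of seeing at least `hits` annotated genes by chance.

    population: number of genes that could have been selected
    annotated:  number of those genes carrying the annotation
    list_size:  number of genes in the submitted list
    hits:       number of list genes carrying the annotation
    """
    total = comb(population, list_size)
    upper = min(annotated, list_size)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, list_size - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical example: 40 of 10000 genes carry an annotation,
# and 5 of them appear in a list of 50 genes.
print(enrichment_p_value(10000, 40, 50, 5))
```

A small value suggests the annotation is over-represented in the list; when many annotations are tested at once, a multiple-testing correction would normally be applied as well.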

File Merger merges the contents of Source and Target files according to their shared identifiers, or according to the correspondence between identifiers defined in a Bridging file.
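The two merge modes can be sketched as below. This is an illustrative assumption about the behavior, not File Merger's implementation: records are held in dictionaries rather than read from files, rows without a match are dropped, and target fields overwrite source fields of the same name. All identifiers and field names are hypothetical.

```python
def merge_records(source, target, bridge=None):
    """Join source rows to target rows.

    source, target: dict mapping identifier -> dict of fields.
    bridge: optional dict mapping a source identifier to the corresponding
            target identifier. Without it, rows are matched on identical
            identifiers.
    """
    merged = {}
    for source_id, source_row in source.items():
        target_id = bridge.get(source_id) if bridge is not None else source_id
        if target_id in target:
            # ASSUMPTION: on a field-name clash, the target value wins.
            merged[source_id] = {**source_row, **target[target_id]}
    return merged


# Hypothetical records.
source = {"probe1": {"expression": 2.5}, "probe2": {"expression": 0.7}}
target_shared = {"probe1": {"symbol": "GENE_X"}}
target_bridged = {"locus9": {"symbol": "GENE_Y"}}
bridge = {"probe2": "locus9"}

print(merge_records(source, target_shared))
print(merge_records(source, target_bridged, bridge))
```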