MUSC Faculty Journal Articles

Automatic Annotation of Protein Motif Function with Gene Ontology terms

Xinghua Lu, Medical University of South Carolina
Chengxiang Zhai, University of Illinois at Urbana-Champaign
Vanathi Gopalakrishnan, University of Pittsburgh
Bruce G. Buchanan, University of Pittsburgh

Document Type

Article

Embargo Period

9-2-2004

Publication Date

9-2-2004

Abstract

Background: Conserved protein sequence motifs are short stretches of amino acid sequence patterns that potentially encode the function of proteins. Several sequence pattern searching algorithms and programs exist for identifying candidate protein motifs at the whole genome level. However, a much-needed task is to determine the functions of the newly identified protein motifs. The Gene Ontology (GO) project is an endeavor to annotate the function of genes or protein sequences with terms from a dynamic, controlled vocabulary, and these annotations serve well as a knowledge base.

Results: This paper presents methods to mine the GO knowledge base and use the association between the GO terms assigned to a sequence and the motifs matched by the same sequence as evidence for predicting the functions of novel protein motifs automatically. The task of assigning GO terms to protein motifs is viewed as both a binary classification problem and an information retrieval problem, where PROSITE motifs are used as samples for model training and functional prediction. The mutual information of a motif and a GO term association is found to be a very useful feature. We take advantage of the known motifs to train a logistic regression classifier, which allows us to combine mutual information with other frequency-based features and obtain a probability of correct association. The trained logistic regression model has intuitively meaningful and logically plausible parameter values, and performs very well empirically according to our evaluation criteria.

Conclusions: In this research, different methods for automatic annotation of protein motifs have been investigated. Empirical results demonstrate that the methods have great potential for detecting and augmenting information about the functions of newly discovered candidate protein motifs.
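The abstract names mutual information between a motif and a GO term as a key feature but does not give the formula. As a minimal sketch, the standard mutual information of two binary events (a sequence matches the motif; a sequence is annotated with the GO term) can be computed from co-occurrence counts as below. The function name, its parameters, and the use of base-2 logarithms are assumptions made for illustration, and the paper's exact formulation may differ.

```python
import math


def mutual_information(n_total, n_motif, n_term, n_both):
    """Mutual information (in bits) of two binary events, from counts.

    Hypothetical helper for illustration only:
      n_total: number of annotated sequences considered
      n_motif: sequences that match the motif
      n_term:  sequences annotated with the GO term
      n_both:  sequences that match the motif AND carry the GO term
    """
    # Cells of the 2x2 contingency table, keyed by (motif, term) indicators.
    joint = {
        (1, 1): n_both,
        (1, 0): n_motif - n_both,
        (0, 1): n_term - n_both,
        (0, 0): n_total - n_motif - n_term + n_both,
    }
    if n_total <= 0 or any(count < 0 for count in joint.values()):
        raise ValueError("counts are inconsistent")

    # Marginal probabilities of each event being present (1) or absent (0).
    p_motif = {1: n_motif / n_total, 0: 1 - n_motif / n_total}
    p_term = {1: n_term / n_total, 0: 1 - n_term / n_total}

    mi = 0.0
    for (m, t), count in joint.items():
        if count == 0:
            continue  # a zero cell contributes nothing (0 * log 0 := 0)
        p = count / n_total
        mi += p * math.log2(p / (p_motif[m] * p_term[t]))
    return mi
```

When the motif and the term occur independently the value is zero, and it grows as the two co-occur more consistently than chance. In the approach the abstract describes, a score of this kind would be one input feature, alongside frequency-based features, to the logistic regression classifier.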

Journal

BMC Bioinformatics

DOI

https://doi.org/10.1186/1471-2105-5-122

Recommended Citation

Lu, Xinghua; Zhai, Chengxiang; Gopalakrishnan, Vanathi; and Buchanan, Bruce G., "Automatic Annotation of Protein Motif Function with Gene Ontology terms" (2004). MUSC Faculty Journal Articles. 54.
https://medica-musc.researchcommons.org/facarticles/54
