CIS595-Graduate Seminar
Knowledge Discovery and Data Mining
COMMONNESS, COMPLEXITY, FUNCTION AND FLAVORS OF INTRINSIC PROTEIN DISORDER
Professor Zoran Obradovic
Center for Information Science and Technology, Temple University
Oct. 24, 2000
Abstract
Intrinsic protein disorder refers to segments or whole proteins that fail to
fold into a fixed 3D structure on their own. Contrary to the
Sequence → 3D Structure → Function paradigm, there are examples of proteins
with long intrinsically disordered regions that nevertheless carry out
functions. To realize the potential of the Human Genome Project, it is
essential to determine how common intrinsic protein disorder is, what types of
it exist, and which functions such proteins carry out. Toward this objective,
we assembled a database of known disordered protein sequence segments and used
it to develop predictors of protein disorder from primary sequence
information. In addition to designing global classifiers trained on all
disorder data, we used disjoint data subsets to develop specialized disorder
predictors. These partitions were initially defined using domain-specific
knowledge, but we also employed a novel incremental competitive machine
learning algorithm that automatically partitions a set of available disordered
proteins into subsets with similar properties. In the talk, we will describe
the data mining and machine learning procedures used in the study and report
results obtained by analyzing sequences from the Protein Data Bank, the
Swiss-Prot database, and 28 complete genomes. The results provide strong
evidence that: (1) disorder is a very common element of protein structure;
(2) the strength of disorder predictions is correlated with sequence
complexity; (3) eukaryotes may have a higher proportion of intrinsic protein
disorder than eubacteria or archaebacteria; and (4) at least three different
types of protein disorder exist in nature.
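Finding (2) relates the strength of disorder predictions to sequence
complexity. As an illustration only, the sketch below assumes that local
complexity is measured as the Shannon entropy (in bits) of residue composition
within a sliding window. The window length of 12 and the function names
shannon_entropy and window_entropy are hypothetical choices, not necessarily
the complexity measure used in the study.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (bits) of the residue composition of a segment.

    Uses p * log2(1/p) so that a single-residue segment yields +0.0.
    """
    counts = Counter(segment)
    n = len(segment)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def window_entropy(sequence: str, window: int = 12) -> list[float]:
    """Entropy of every full-length sliding window along a protein sequence.

    Returns one value per window start position, or an empty list when the
    sequence is shorter than the window. The default window of 12 residues
    is an illustrative assumption.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    seq = sequence.upper()
    return [shannon_entropy(seq[i:i + window])
            for i in range(len(seq) - window + 1)]


if __name__ == "__main__":
    # Homopolymer window: lowest possible compositional complexity.
    print(window_entropy("QQQQQQQQQQQQ"))
    # Twelve distinct residues: highest possible entropy for a 12-residue window.
    print([round(h, 2) for h in window_entropy("ACDEFGHIKLMN")])
```

A homopolymer window such as QQQQQQQQQQQQ scores 0 bits, while a window of 12
distinct residues scores log2(12), about 3.58 bits, the maximum for that window
length. Per-window scores of this kind could then be compared against
per-residue disorder prediction scores.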
Biography
Zoran Obradovic is the Director of the Center for Information Science and
Technology and a Professor of Computer and Information Sciences at Temple
University. His research focuses on solving challenging problems in
bioinformatics, e-commerce, and computational finance by developing and
integrating data mining and statistical learning technology for efficient
knowledge discovery in large databases. With funding from NSF, NIH, DOE, and
industry, he has contributed over the last decade to about 100 refereed
articles on these and related topics and to several academic and commercial
software systems. He serves on the editorial boards of Multiple Valued Logic,
the Journal of Computational Intelligence in Finance, and the IEEE
Transactions on Education.