Last edited by
Quratt ul ain Siddique
Summary:
This paper introduces a comprehensive method for predicting protein function. The approach differs from earlier work in that it uses a multistage decomposition combining unsupervised and supervised machine learning techniques; the authors refer to it as the Unsupervised-Supervised Tree (UST) algorithm.
The first stage of the UST, which is optional, is the unsupervised stage: it typically applies clustering algorithms such as self-organizing maps (SOMs, a type of neural network) and K-means. The subsequent stages are required and typically involve constructing a Maximum Contrast Tree (MCT), so that protein functional relationships can be mapped onto the relational tree structure.
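To make the optional clustering stage concrete, here is a minimal sketch of plain K-means in Python. It is not the authors' implementation: the feature vectors are made-up 2-D points standing in for protein descriptors, the function name `kmeans` and the farthest-point initialization are assumptions chosen to keep the example deterministic, and the MCT construction is not sketched because the summary does not describe it in enough detail.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; returns (centroids, assignments)."""
    # Assumed initialization: start from the first point, then repeatedly
    # pick the point farthest from all centroids chosen so far.
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(tuple(farthest))

    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            assignments[i] = min(range(k), key=lambda c: math.dist(p, centroids[c]))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignments

# Hypothetical feature vectors; real protein descriptors would have many more dimensions.
features = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centroids, clusters = kmeans(features, k=2)
```

In a UST-style pipeline, the resulting cluster assignments would be handed to the tree-building stage; here they simply separate the two obvious groups of points.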
The MCTs are a family of independent algorithms that can also be used on their own. Testing relies on a newly developed Multiple-Labeled Instance Classifier (MLIC), a supervised k-nearest-neighbor classifier that operates on the tree structure. Performance was compared with the decision tree programs C4.5 and C5 and with support vector machines.
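The idea of a nearest-neighbor classifier for multiple-labeled instances can be illustrated with a short Python sketch. This is an assumption-laden simplification rather than the MLIC itself: it measures plain Euclidean distance between feature vectors instead of working on the tree structure, and the function name `predict_labels`, the `min_share` threshold, and the label names are all hypothetical.

```python
import math
from collections import Counter

def predict_labels(train, query, k=3, min_share=0.5):
    """Return every label carried by at least min_share of the k nearest neighbors."""
    # Each training item is (feature_vector, set_of_labels).
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # A neighbor with several functions votes once for each of them.
    votes = Counter(label for _, labels in neighbors for label in labels)
    return {label for label, count in votes.items()
            if count / len(neighbors) >= min_share}

# Made-up training data: some instances carry more than one functional label.
train = [
    ((0.0, 0.1), {"kinase"}),
    ((0.2, 0.0), {"kinase", "transport"}),
    ((0.1, 0.2), {"kinase", "transport"}),
    ((5.0, 5.1), {"binding"}),
    ((5.2, 4.9), {"binding"}),
]
predicted = predict_labels(train, (0.1, 0.1), k=3)
```

Because the prediction is a set rather than a single class, a query can be assigned several functions at once, which is the property that matters for multifunctional proteins.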
Based on the experiments, UST algorithms appear to perform considerably better than the decision tree algorithms C4.5 and C5 and than support vector machines, and they can provide a viable alternative to supervised or unsupervised methods alone. In addition, the UST and MLIC classifiers can handle protein functional classes that contain only a small number of proteins (rare events), as well as multifunctional proteins. Because the USTs and MLICs can handle such cases, a larger dataset can be used, which may provide deeper insight into protein functional relationships at the genomic level and thus may lead to a better understanding of evolution at the molecular and genomic levels.