Abstract
A protein's properties are influenced by both its amino-acid sequence and its three-dimensional conformation. Ascertaining a protein's sequence is relatively easy using modern techniques, but determining its conformation requires much more expensive and time-consuming techniques. Consequently, it would be useful to identify a method that can accurately predict a protein's secondary-structure conformation using only the protein's sequence data. This problem is not trivial, however, because identical amino-acid subsequences in different contexts sometimes have disparate secondary structures, while highly dissimilar amino-acid subsequences sometimes have identical secondary structures. We propose (1) to develop a set of metrics that facilitates better comparisons between dissimilar subsequences and (2) to design a custom set of inputs for machine-learning models that can harness contextual dependence information between the secondary structures of successive amino acids in order to achieve better secondary-structure prediction accuracy.
Degree
MS
College and Department
Physical and Mathematical Sciences; Computer Science
Rights
http://lib.byu.edu/about/copyright/
BYU ScholarsArchive Citation
Seeley, Matthew Benjamin, "Feature Identification and Reduction for Improved Generalization Accuracy in Secondary-Structure Prediction Using Temporal Context Inputs in Machine-Learning Models" (2015). Theses and Dissertations. 5267.
https://scholarsarchive.byu.edu/etd/5267
Date Submitted
2015-05-01
Document Type
Thesis
Handle
http://hdl.lib.byu.edu/1877/etd8506
Keywords
Bioinformatics, machine learning, secondary-structure prediction, amino-acid properties
Language
English