Abstract
This thesis presents two papers addressing important biochemical prediction challenges. The first paper focuses on accurate protein distance prediction and introduces updates to the ProSPr network. We evaluate its performance in the Critical Assessment of techniques for Protein Structure Prediction (CASP14) competition and investigate how its accuracy depends on sequence length and multiple sequence alignment depth. The ProSPr network, an ensemble of three convolutional neural networks (CNNs), outperforms individual networks. The second paper addresses accurate ligand ranking in virtual screening for drug discovery. We propose MILCDock, a machine learning consensus docking tool that leverages predictions from five traditional molecular docking tools. MILCDock, an ensemble of eight neural networks, outperforms single-network approaches and other consensus docking methods on the DUD-E dataset. However, we find that LIT-PCBA targets remain challenging for all methods tested. Furthermore, we explore the effectiveness of training machine learning tools on the biased DUD-E dataset, emphasizing the importance of mitigating its biases during training. Collectively, this work demonstrates the power of ensembling in deep learning-based biochemical prediction problems, where combining multiple models improves performance. Our findings contribute to the development of robust protein distance prediction tools and more accurate virtual screening methods for drug discovery.
Degree
MS
College and Department
Physical and Mathematical Sciences; Computer Science
Rights
https://lib.byu.edu/about/copyright/
BYU ScholarsArchive Citation
Stern, Jacob A., "More is Better than One: The Effect of Ensembling on Deep Learning Performance in Biochemical Prediction Problems" (2023). Theses and Dissertations. 10123.
https://scholarsarchive.byu.edu/etd/10123
Date Submitted
2023-08-07
Document Type
Thesis
Handle
http://hdl.lib.byu.edu/1877/etd12961
Keywords
deep learning, protein structure prediction, docking, ensembles
Language
English