Abstract

This thesis presents two papers addressing important biochemical prediction challenges. The first paper focuses on accurate protein distance prediction and introduces updates to the ProSPr network. We evaluate its performance in the Critical Assessment of techniques for Protein Structure Prediction (CASP14) competition and investigate how its accuracy depends on sequence length and multiple sequence alignment depth. The ProSPr network, an ensemble of three convolutional neural networks (CNNs), outperforms its individual member networks. The second paper addresses accurate ligand ranking in virtual screening for drug discovery. We propose MILCDock, a machine learning consensus docking tool that leverages predictions from five traditional molecular docking tools. MILCDock, an ensemble of eight neural networks, outperforms single-network approaches and other consensus docking methods on the DUD-E dataset. However, we find that LIT-PCBA targets remain challenging for all methods tested. Furthermore, we explore the effectiveness of training machine learning tools on the biased DUD-E dataset and stress the importance of mitigating its biases during training. Collectively, this work highlights the value of ensembling in deep learning-based biochemical prediction problems, where combining multiple models improves performance. Our findings contribute to the development of robust protein distance prediction tools and more accurate virtual screening methods for drug discovery.

Degree

MS

College and Department

Physical and Mathematical Sciences; Computer Science

Rights

https://lib.byu.edu/about/copyright/

Date Submitted

2023-08-07

Document Type

Thesis

Handle

http://hdl.lib.byu.edu/1877/etd12961

Keywords

deep learning, protein structure prediction, docking, ensembles

Language

English
