Abstract
Gene-expression profiling enables researchers to quantify transcription levels in cells, thus providing insight into functional mechanisms of diseases and other biological processes. However, because of the high dimensionality of these data and the sensitivity of measuring equipment, expression data often contain unwanted confounding effects that can skew analysis. For example, collecting data in multiple runs causes nontrivial differences in the data (known as batch effects); known covariates that are not of interest to the study may have strong effects; and large systemic effects may arise when multiple expression datasets are integrated. Additionally, many of these confounding effects represent higher-order interactions that may not be removable using existing techniques that identify linear patterns. We created Confounded to remove these effects from expression data. Confounded is an adversarial variational autoencoder that removes confounding effects while minimizing the amount of change to the input data. We tested the model on artificially constructed data and commonly used gene-expression datasets and compared it against other common batch-adjustment algorithms. We also applied the model to remove cancer-type-specific signal from a pan-cancer expression dataset. Our software is publicly available at https://github.com/jdayton3/Confounded.
Degree
MS
College and Department
Life Sciences; Biology
Rights
http://lib.byu.edu/about/copyright/
BYU ScholarsArchive Citation
Dayton, Jonathan Bryan, "Adversarial Deep Neural Networks Effectively Remove Nonlinear Batch Effects from Gene-Expression Data" (2019). Theses and Dissertations. 7521.
https://scholarsarchive.byu.edu/etd/7521
Date Submitted
2019-07-01
Document Type
Thesis
Handle
http://hdl.lib.byu.edu/1877/etd12239
Keywords
batch effects, batch correction, gene expression, transcriptomics, deep learning, adversarial neural network, variational autoencoder
Language
English