Gene-expression profiling enables researchers to quantify transcription levels in cells, providing insight into the functional mechanisms of diseases and other biological processes. However, because of the high dimensionality of these data and the sensitivity of measuring equipment, expression data often contain unwanted confounding effects that can skew analysis. For example, collecting data in multiple runs causes nontrivial differences in the data (known as batch effects), known covariates that are not of interest to the study may have strong effects, and large systemic effects may arise when multiple expression datasets are integrated. Additionally, many of these confounding effects represent higher-order interactions that may not be removable using existing techniques that identify linear patterns. We created Confounded to remove these effects from expression data. Confounded is an adversarial variational autoencoder that removes confounding effects while minimizing the amount of change to the input data. We tested the model on artificially constructed data and on commonly used gene-expression datasets, and compared it against other common batch-adjustment algorithms. We also applied the model to remove cancer-type-specific signal from a pan-cancer expression dataset. Our software is publicly available at https://github.com/jdayton3/Confounded.
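The trade-off described in the abstract, removing confounding signal while changing the input as little as possible, can be illustrated with a toy objective. This is only a minimal sketch and not the Confounded implementation: the function names, the `weight` parameter, and the exact form of the loss (reconstruction error minus a weighted discriminator cross-entropy) are assumptions made for illustration.

```python
import math

def mean_squared_error(original, reconstructed):
    """Average squared difference; small values mean the data changed little."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)

def cross_entropy(batch_probs, true_batch):
    """Discriminator loss for one sample: -log of the probability given to the true batch."""
    return -math.log(batch_probs[true_batch])

def autoencoder_objective(original, reconstructed, batch_probs, true_batch, weight=1.0):
    """Hypothetical adversarial objective for the autoencoder (lower is better).

    The autoencoder is rewarded for a small reconstruction error and for a
    large discriminator loss, i.e. for output whose batch cannot be guessed.
    """
    reconstruction = mean_squared_error(original, reconstructed)
    confusion = cross_entropy(batch_probs, true_batch)
    return reconstruction - weight * confusion

# Made-up expression values for one sample drawn from batch 0.
expression = [2.0, 5.0, 1.0]
adjusted = [2.1, 4.8, 1.0]

# A discriminator that still recognizes the batch vs. one reduced to guessing.
leaky = autoencoder_objective(expression, adjusted, [0.9, 0.1], true_batch=0)
confused = autoencoder_objective(expression, adjusted, [0.5, 0.5], true_batch=0)
print(confused < leaky)
```

With the reconstruction held fixed, the objective is lower when the discriminator is reduced to guessing, which is the direction an adversarially trained autoencoder would be pushed under these assumptions.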
College and Department
Life Sciences; Biology
BYU ScholarsArchive Citation
Dayton, Jonathan Bryan, "Adversarial Deep Neural Networks Effectively Remove Nonlinear Batch Effects from Gene-Expression Data" (2019). Theses and Dissertations. 7521.
Keywords
batch effects, batch correction, gene expression, transcriptomics, deep learning, adversarial neural network, variational autoencoder