Abstract

Gene-expression profiling enables researchers to quantify transcription levels in cells, providing insight into the functional mechanisms of diseases and other biological processes. However, because these data are high-dimensional and measuring equipment is sensitive, expression data often contain unwanted confounding effects that can skew analysis. For example, collecting data in multiple runs introduces nontrivial differences in the data (known as batch effects), known covariates that are not of interest to the study may have strong effects, and large systemic effects may appear when multiple expression datasets are integrated. Additionally, many of these confounding effects represent higher-order interactions that existing techniques designed to identify linear patterns may not remove. We created Confounded to remove these effects from expression data. Confounded is an adversarial variational autoencoder that removes confounding effects while minimizing the amount of change to the input data. We tested the model on artificially constructed data and commonly used gene-expression datasets and compared it against other common batch-adjustment algorithms. We also applied the model to remove cancer-type-specific signal from a pan-cancer expression dataset. Our software is publicly available at https://github.com/jdayton3/Confounded.
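
To make the idea of an adversarial variational autoencoder concrete, the following is a minimal NumPy sketch of the kind of objective such a model might minimize. It is not taken from the Confounded source code. The function names, the mean-squared-error reconstruction term, the `kl_weight` and `adv_weight` parameters, and the discriminator that predicts batch membership are all illustrative assumptions, and the thesis may define its loss differently.

```python
import numpy as np


def reconstruction_loss(x, x_hat):
    """Mean squared error: penalizes change between input and adjusted output."""
    return float(np.mean((x - x_hat) ** 2))


def kl_divergence(mu, log_var):
    """KL divergence from N(mu, exp(log_var)) to N(0, I), averaged over samples."""
    per_sample = -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=1)
    return float(np.mean(per_sample))


def batch_cross_entropy(probs, batch_labels):
    """Cross-entropy of a (hypothetical) discriminator's batch predictions.

    probs: array of shape (n_samples, n_batches), rows sum to 1.
    batch_labels: integer array of shape (n_samples,) with the true batch index.
    """
    eps = 1e-12  # avoids log(0)
    picked = probs[np.arange(len(batch_labels)), batch_labels]
    return float(-np.mean(np.log(picked + eps)))


def autoencoder_objective(x, x_hat, mu, log_var, probs, batch_labels,
                          kl_weight=1.0, adv_weight=1.0):
    """Assumed combined objective for the autoencoder.

    The autoencoder is rewarded (negative sign) when the discriminator's
    cross-entropy is high, i.e. when batch membership is hard to predict
    from the adjusted data, and penalized for altering the input.
    """
    return (reconstruction_loss(x, x_hat)
            + kl_weight * kl_divergence(mu, log_var)
            - adv_weight * batch_cross_entropy(probs, batch_labels))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 6))           # 4 samples, 6 genes (toy data)
    x_hat = x + 0.1 * rng.normal(size=x.shape)
    mu = np.zeros((4, 2))                 # toy latent means
    log_var = np.zeros((4, 2))            # toy latent log-variances
    probs = np.full((4, 2), 0.5)          # discriminator cannot tell batches apart
    labels = np.array([0, 0, 1, 1])
    print(autoencoder_objective(x, x_hat, mu, log_var, probs, labels))
```

In this sketch the two pressures described in the abstract appear as separate terms: the reconstruction term discourages changes to the input, while the adversarial term rewards outputs from which batch membership is hard to predict. In a full training loop the discriminator would be updated separately to minimize its own cross-entropy.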

Degree

MS

College and Department

Life Sciences; Biology

Rights

http://lib.byu.edu/about/copyright/

Date Submitted

2019-07-01

Document Type

Thesis

Handle

http://hdl.lib.byu.edu/1877/etd12239

Keywords

batch effects, batch correction, gene expression, transcriptomics, deep learning, adversarial neural network, variational autoencoder

Language

English

Included in

Biology Commons
