Abstract
Among women, breast cancer is the most frequently diagnosed cancer and the leading cause of cancer death, accounting for 23% of total cases and 14% of deaths, with over 2 million new cases and 600,000 deaths in 2020. This represents a substantial public health burden. Clinical subtypes of breast cancer are traditionally defined by a patient's immunohistochemistry profile, which is determined by the combined expression status of three receptors: estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2). It has also been observed that women of African descent more commonly present with breast cancer that is ER, PR, and HER2 negative. These cancers are referred to as triple-negative breast cancers (TNBC), and TNBC is associated with worse outcomes than other subtypes.

In Chapter 1, we review the existing literature to identify relevant datasets, as well as broad themes across the research articles associated with these datasets. In Chapter 2, we develop a pipeline to identify and curate publicly available breast cancer datasets and to annotate their metadata against the National Cancer Institute Thesaurus, making it easier to infer semantic meaning and compare insights across datasets. In Chapter 3, we use a meta-analysis to identify genes that are consistently expressed differently between African Americans and European Americans with breast cancer; additionally, we compare gene expression levels based on ER, PR, HER2, and TNBC status. In Chapter 4, we evaluate how well several machine learning classifiers handle class imbalance in gene expression datasets.

In this dissertation, we identify genes that biologists may focus on to develop better treatments for TNBC. We also provide freely available computational tools that make it easier for other researchers to identify differentially expressed genes across different comparisons.
Degree
PhD
College and Department
Life Sciences; Biology
Rights
https://lib.byu.edu/about/copyright/
BYU ScholarsArchive Citation
Nwosu, Ifeanyichukwu O., "A Comprehensive Review and Meta-Analysis of Breast Cancer Gene Expression Data: Understanding the Molecular Complexity of a Silent Epidemic" (2024). Theses and Dissertations. 11017.
https://scholarsarchive.byu.edu/etd/11017
Date Submitted
2024-08-19
Document Type
Dissertation
Permanent Link
https://apps.lib.byu.edu/arks/ark:/34234/q2476c3e08
Keywords
breast cancer, gene-expression microarrays, gene-expression profiling, race, immunohistochemistry, immunohistochemistry status, disease subtypes, triple negative breast cancer, health disparities, racial disparity, meta-analysis, machine learning, class imbalance, predictive modeling, ensemble methods
Language
English