Keywords

Breast cancer; gene-expression profiling; race; immunohistochemistry status; disease subtypes; triple negative breast cancer; health disparities

Abstract

Scholarly requirements have led to a massive increase of transcriptomic data in the public domain, with millions of samples available for secondary research. We identified gene-expression datasets representing 10,214 breast-cancer patients in public databases. We focused on datasets that included patient metadata on race and/or immunohistochemistry (IHC) profiling of the ER, PR, and HER-2 proteins. This review provides a summary of these datasets and describes findings from 32 research articles associated with the datasets. These studies have helped to elucidate relationships between IHC, race, and/or treatment options, as well as relationships between IHC status and the breast-cancer intrinsic subtypes. We have also identified broad themes across the analysis methodologies used in these studies, including breast cancer subtyping, deriving predictive biomarkers, identifying differentially expressed genes, and optimizing data processing. Finally, we discuss limitations of prior work and recommend future directions for reusing these datasets in secondary analyses.

Original Publication Citation

Nwosu IO‡ and Piccolo SR*. A systematic review of datasets that can help elucidate relationships among gene expression, race, and immunohistochemistry-defined subtypes in breast cancer. Cancer Biology & Therapy, 2021, 22:7-9, pp. 417-429.

Document Type

Peer-Reviewed Article

Publication Date

2021-08-19

Publisher

Taylor and Francis Group

Language

English

College

Life Sciences

Department

Biology

University Standing at Time of Publication

Associate Professor

Included in

Biology Commons
