Faculty Publications

The Ability to Classify Patients Based on Gene-Expression Data Varies by Algorithm and Performance Metric

Stephen Piccolo, Brigham Young University
Avery Mecham, Brigham Young University
Nathan P. Golightly, Brigham Young University
Jérémie L. Johnson, Brigham Young University
Dustin B. Miller, Brigham Young University

Keywords

patient classification, transcriptomic measurements, benchmark comparisons

Abstract

By classifying patients into subgroups, clinicians can provide more effective care than using a uniform approach for all patients. Such subgroups might include patients with a particular disease subtype, patients with a good (or poor) prognosis, or patients most (or least) likely to respond to a particular therapy. Transcriptomic measurements reflect the downstream effects of genomic and epigenomic variations. However, high-throughput technologies generate thousands of measurements per patient, and complex dependencies exist among genes, so it may be infeasible to classify patients using traditional statistical models. Machine-learning classification algorithms can help with this problem. However, hundreds of classification algorithms exist—and most support diverse hyperparameters—so it is difficult for researchers to know which are optimal for gene-expression biomarkers. We performed a benchmark comparison, applying 52 classification algorithms to 50 gene-expression datasets (143 class variables). We evaluated algorithms that represent diverse machine-learning methodologies and have been implemented in general-purpose, open-source, machine-learning libraries. When available, we combined clinical predictors with gene-expression data. Additionally, we evaluated the effects of performing hyperparameter optimization and feature selection using nested cross validation. Kernel- and ensemble-based algorithms consistently outperformed other types of classification algorithms; however, even the top-performing algorithms performed poorly in some cases. Hyperparameter optimization and feature selection typically improved predictive performance, and univariate feature-selection algorithms typically outperformed more sophisticated methods. Together, our findings illustrate that algorithm performance varies considerably when other factors are held constant and thus that algorithm selection is a critical step in biomarker studies.
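The nested cross-validation design mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the nearest-centroid classifier, the univariate score, the synthetic data, and every function name below (`make_data`, `stratified_folds`, `nested_cv`, and so on) are assumptions chosen to keep the example self-contained. The point it shows is that feature selection and the choice of how many features to keep are made using training folds only, so the outer test fold gives an estimate that is not biased by tuning.

```python
import random
import statistics

def make_data(n=60, p=50, seed=1):
    """Synthetic two-class 'expression' data; only the first 3 features carry signal."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(2.0 * label if f < 3 else 0.0, 1.0) for f in range(p)] for label in y]
    return X, y

def stratified_folds(labels, k, rng):
    """Split sample positions into k folds with similar class proportions."""
    folds = [[] for _ in range(k)]
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    offset = 0
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[(j + offset) % k].append(i)
        offset += len(idxs)
    return folds

def univariate_scores(X, y, idx):
    """Score each feature independently: |difference in class means| / overall spread."""
    scores = []
    for f in range(len(X[0])):
        a = [X[i][f] for i in idx if y[i] == 0]
        b = [X[i][f] for i in idx if y[i] == 1]
        spread = statistics.pstdev(a + b) or 1e-12
        scores.append(abs(statistics.fmean(a) - statistics.fmean(b)) / spread)
    return scores

def accuracy(X, y, train, test, n_feats):
    """Select features and fit class centroids on `train` only, then score on `test`."""
    scores = univariate_scores(X, y, train)
    feats = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:n_feats]
    centroids = {}
    for c in (0, 1):
        rows = [X[i] for i in train if y[i] == c]
        centroids[c] = [statistics.fmean(r[f] for r in rows) for f in feats]

    def predict(row):
        def dist(c):
            return sum((row[f] - m) ** 2 for f, m in zip(feats, centroids[c]))
        return min(centroids, key=dist)

    return sum(predict(X[i]) == y[i] for i in test) / len(test)

def nested_cv(X, y, grid=(1, 5, 20), outer_k=5, inner_k=3, seed=0):
    """Outer folds estimate performance; inner folds pick the number of features."""
    rng = random.Random(seed)
    outer = stratified_folds(y, outer_k, rng)
    outer_acc = []
    for o, test in enumerate(outer):
        train = [i for j, fold in enumerate(outer) if j != o for i in fold]
        # Inner folds are built from the outer training samples only.
        inner = stratified_folds([y[i] for i in train], inner_k, rng)
        inner = [[train[pos] for pos in fold] for fold in inner]

        def inner_score(n_feats):
            accs = []
            for v, val in enumerate(inner):
                fit = [i for j, fold in enumerate(inner) if j != v for i in fold]
                accs.append(accuracy(X, y, fit, val, n_feats))
            return statistics.fmean(accs)

        best = max(grid, key=inner_score)  # hyperparameter chosen without the test fold
        outer_acc.append(accuracy(X, y, train, test, best))
    return statistics.fmean(outer_acc)

if __name__ == "__main__":
    X, y = make_data()
    print(f"nested CV accuracy: {nested_cv(X, y):.3f}")
```

Swapping in a different classifier or scoring function only requires changing `accuracy` and `univariate_scores`; the fold structure in `nested_cv` stays the same.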

Original Publication Citation

Piccolo SR*, Mecham A†, Golightly NP†, Johnson JL†, and Miller DB‡. The ability to classify patients based on gene-expression data varies by algorithm and performance metric. PLoS Computational Biology, 18(3): e1009926 (2022)

BYU ScholarsArchive Citation

Piccolo, Stephen; Mecham, Avery; Golightly, Nathan P.; Johnson, Jérémie L.; and Miller, Dustin B., "The Ability to Classify Patients Based on Gene-Expression Data Varies by Algorithm and Performance Metric" (2022). Faculty Publications. 7345.
https://scholarsarchive.byu.edu/facpub/7345

Document Type

Peer-Reviewed Article

Publication Date

2022-03-11

Publisher

PLOS

Language

English

College

Life Sciences

Department

Biology

University Standing at Time of Publication

Associate Professor

Copyright Use Information

https://lib.byu.edu/about/copyright/

Included in

Biology Commons
