Keywords

Machine Learning, Prediction, Ontology

Abstract

US researchers are required to deposit data in public repositories so that other researchers can validate their findings and reuse the data for new studies. More than 100,000 transcriptomic (RNA) datasets are in the public domain. However, these datasets are of little use without metadata (details about individual samples, e.g., sex or race). High-quality metadata provide many benefits, including

  • controlling for confounding factors.
  • improving precision medicine.

Research has suggested that

  • males are studied more than females.
  • people of European descent are studied most.

We need to make representation more equitable so that genetic research will be relevant to all people. The first step is to characterize current subgroup representation in more detail. However, classifying datasets by their metadata is time-intensive and requires manual curation because metadata are described inconsistently across studies.

Document Type

Poster

Publication Date

2024-03-21

Language

English

College

Life Sciences

Department

Biology

University Standing at Time of Publication

Senior

Making Sense of Messy Data: Classifying Metadata Variables in Public Transcriptomic (RNA) Data
