Keywords
Machine Learning, Prediction, Ontology
Abstract
US researchers are required to deposit data in public repositories so that other researchers can validate their findings and reuse the data in new studies. More than 100,000 transcriptomic (RNA) datasets are in the public domain. However, these data are of little use without metadata (details about individual samples, e.g., sex and race). Quality metadata provide many benefits, including:
- controlling for confounding factors.
- improving precision medicine.
Research has suggested that:
- males are studied more than females.
- people of European descent are studied most.
We need to make representation more equitable so that genetic research will be relevant to all people. The first step is to characterize current subgroup representation in more detail. But classifying datasets by their metadata is time-intensive and requires manual curation because metadata are described inconsistently across studies.
BYU ScholarsArchive Citation
Brown, Grace S.; Akinbo, Tolulope; and Piccolo, Stephen, "Making Sense of Messy Data: Classifying Metadata Variables in Public Transcriptomic (RNA) Data" (2024). Library/Life Sciences Undergraduate Poster Competition 2024. 58.
https://scholarsarchive.byu.edu/library_studentposters_2024/58
Document Type
Poster
Publication Date
2024-03-21
Language
English
College
Life Sciences
Department
Biology
Copyright Use Information
https://lib.byu.edu/about/copyright/