Keywords
data cleaning; data wrangling; gene-expression analysis; interactive curation; web application
Abstract
TidyGEO is a web-based tool for downloading, tidying, and reformatting data series from the Gene Expression Omnibus (GEO). As a freely accessible repository with data from over 6 million biological samples across more than 4000 organisms, GEO provides diverse opportunities for secondary research. Although scientists may find assay data relevant to a given research question, most analyses require sample-level annotations. In GEO, such annotations are stored alongside assay data in delimited, text-based files. However, the structure and semantics of the annotations vary widely from one series to another, and many annotations are not useful for analysis purposes. Thus, every GEO series must be tidied before it is analyzed. Manual approaches may be used, but these are error-prone and take time away from other research tasks. Custom computer scripts can be written, but many scientists lack the computational expertise to create such scripts. To address these challenges, we created TidyGEO, which supports essential data-cleaning tasks for sample-level annotations, such as selecting informative columns, renaming columns, splitting or merging columns, standardizing data values, and filtering samples. Additionally, users can integrate annotations with assay data, restructure assay data, and generate code that enables others to reproduce these steps.
Original Publication Citation
Mecham A, Stephenson A, Quinteros BI, Brown GS, Piccolo SR. TidyGEO: Preparing analysis-ready datasets from Gene Expression Omnibus. Journal of Integrative Bioinformatics, 2023, pp. 20230021.
BYU ScholarsArchive Citation
Mecham, Avery; Stephenson, Ashlie; Quinteros, Badi I.; Brown, Grace S.; and Piccolo, Stephen R., "TidyGEO: Preparing Analysis-Ready Datasets from Gene Expression Omnibus" (2023). Faculty Publications. 7343.
https://scholarsarchive.byu.edu/facpub/7343
Document Type
Peer-Reviewed Article
Publication Date
2023-12-05
Publisher
De Gruyter
Language
English
College
Life Sciences
Department
Biology
Copyright Use Information
https://lib.byu.edu/about/copyright/