Keywords
sampling, corpus design, methodological synthesis, methodological reporting practices, representativeness
Abstract
Methodological design is a central issue for researchers in corpus linguistics. To better understand which types of corpora are used in corpus linguistics research articles and how important aspects of corpus design are reported, this study analyzes 709 descriptions of corpora from research published in corpus linguistics journals between 2010 and 2019. Each article was manually coded by two trained coders for aspects of corpus design, such as the population definition, sampling method, and sample size. The study also identifies information that is missing from corpus reporting. Our results show trends in corpus design, such as an increased use of spoken corpora. We also observe some robust sampling methodology and slight improvements in reporting practices over time. Overall, the corpora in the data are highly diverse, for example in their size. However, our results also show widespread underreporting of corpus design choices and features that are generally considered important, such as sampling methods or the number of texts, even in newly constructed corpora. Accordingly, we offer suggestions to authors, reviewers, and editors for improving reporting practices in empirical corpus linguistics studies.
Original Publication Citation
Hashimoto, B., & Nelson, K. (2024). Recent trends in corpus design and reporting: A methodological synthesis. Research in Corpus Linguistics, 12(1), 59–88.
BYU ScholarsArchive Citation
Hashimoto, Brett James and Nelson, Kyra, "Recent Trends in Corpus Design and Reporting: A Methodological Synthesis" (2024). Faculty Publications. 7860.
https://scholarsarchive.byu.edu/facpub/7860
Document Type
Peer-Reviewed Article
Publication Date
2024
Publisher
Research in Corpus Linguistics
Language
English
College
Humanities
Department
Linguistics
Copyright Use Information
https://lib.byu.edu/about/copyright/
Previous Versions
Nov 24 2025 (withdrawn)