Keywords
Late Modern English, register, text classification, historical natural language processing, BERT
Abstract
Registers are situationally defined text varieties, such as letters, essays, or news articles, that are considered to be one of the most important predictors of linguistic variation. Historical language databases, such as Early English Books Online, often lack register information, which could greatly enhance their usability. This article examines register variation in Late Modern English and automatic register identification in historical corpora. We model register variation in the Corpus of Founding Era American English (COFEA) and develop machine-learning methods for automatic register identification in COFEA. We also extract and analyze the grammatical characteristics that the classifier estimates as most significant for the best-predicted registers, and find that letters and journals in the 1700s were characterized by informational density. This approach enables us to learn more about registers in the Founding Era. We show that some registers can be reliably identified in COFEA, with the best overall performance achieved by the deep learning model Bidirectional Encoder Representations from Transformers (BERT), which reached an F1-score of 97 per cent. This suggests that deep learning models could be used in other studies concerned with historical language and its automatic classification.
Original Publication Citation
Repo, L., Hashimoto, B., & Laippala, V. (2023). In search of Founding Era registers: Automatic modelling of registers from the Corpus of Founding Era American English. Digital Scholarship in the Humanities, fqad049.
BYU ScholarsArchive Citation
Repo, Liina; Hashimoto, Brett James; and Laippala, Veronika, "In Search of Founding Era Registers: Automatic Modeling of Registers from the Corpus of Founding Era American English" (2023). Faculty Publications. 7863.
https://scholarsarchive.byu.edu/facpub/7863
Document Type
Peer-Reviewed Article
Publication Date
2023
Publisher
Digital Scholarship in the Humanities
Language
English
College
Humanities
Department
Linguistics
Copyright Use Information
https://lib.byu.edu/about/copyright/