Keywords
machine learning, handwriting recognition, deep learning, AI
Abstract
Digitizing 20th-century French census records offers a valuable resource for understanding demographic shifts during significant historical events such as the Thirty Years Crisis and the Industrial Revolution. These records provide insight into the migration patterns of Jewish populations during WWII, revealing how the war affected Jewish communities and broader population movements. They also help researchers analyze how Paris’s neighborhoods developed and declined over time. To digitize these records with deep learning models, researchers must first build a dataset that allows a model to learn to read handwritten French census entries. Traditionally, this would require manually labeling thousands of images, a costly and time-consuming task. Instead, synthetic data generation is used to create a labeled dataset of French words for training the model, which reduces the need for labor-intensive labeling while still achieving meaningful training outcomes. BYU Pathways students then label data used to fine-tune the model. Initial results are promising: birth year fields reach 67% word accuracy and 87% character accuracy after training with only synthetic data and transfer learning. More complex fields, however, reach only 39% word accuracy and 49% character accuracy after the same training. This approach underscores the potential of combining synthetic-data training with traditional transfer learning and active learning to train high-accuracy models efficiently, enhancing historical research capabilities and creating robust tools for analyzing handwritten records.
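The abstract reports word accuracy and character accuracy but does not define them. As a minimal sketch, assuming word accuracy means the share of fields transcribed exactly and character accuracy means one minus the total edit distance divided by the total number of reference characters, the two metrics could be computed as follows. The function names and the sample birth years are hypothetical and are not taken from the poster.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def word_accuracy(preds: Sequence[str], refs: Sequence[str]) -> float:
    """Fraction of predictions that match the reference exactly (assumed definition)."""
    if len(preds) != len(refs):
        raise ValueError("preds and refs must have the same length")
    if not refs:
        return 0.0
    return sum(p == r for p, r in zip(preds, refs)) / len(refs)


def character_accuracy(preds: Sequence[str], refs: Sequence[str]) -> float:
    """1 - (total edit distance / total reference characters), floored at 0 (assumed definition)."""
    if len(preds) != len(refs):
        raise ValueError("preds and refs must have the same length")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        return 0.0
    errors = sum(edit_distance(p, r) for p, r in zip(preds, refs))
    return max(0.0, 1.0 - errors / total_chars)


if __name__ == "__main__":
    # Hypothetical birth-year transcriptions, for illustration only.
    refs = ["1898", "1902", "1911"]
    preds = ["1898", "1903", "1911"]
    print(f"word accuracy: {word_accuracy(preds, refs):.2f}")
    print(f"character accuracy: {character_accuracy(preds, refs):.2f}")
```

Under these assumed definitions, a single wrong digit makes the whole field count as a word error while costing only one character, which is one way a field can score much higher on character accuracy than on word accuracy, as the birth year results do.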
BYU ScholarsArchive Citation
Leavitt, Jacob, "If you Teach a Bot to Read: Using Machine Learning to Read the Paris Census" (2024). FHSS Mentored Research Conference. 370.
https://scholarsarchive.byu.edu/fhssconference_studentpub/370
Document Type
Poster
Publication Date
2024-12-05
Language
English
College
Family, Home, and Social Sciences
Department
Economics
Copyright Use Information
http://lib.byu.edu/about/copyright/