Keywords

machine learning, handwriting recognition, deep learning, AI

Abstract

Digitizing 20th-century French census records offers a valuable resource for understanding demographic shifts during significant historical events such as the Thirty Years' Crisis and the Industrial Revolution. These records provide insight into the migration patterns of Jewish populations during WWII, revealing how the war affected Jewish communities and broader population movements. They also help researchers analyze how Paris’s neighborhoods developed and declined over time. Digitizing these records with deep learning models requires a dataset that allows machines to read handwritten French census entries. Traditionally, building such a dataset would require manually labeling thousands of images, a costly and time-consuming task. Instead, synthetic data generation is used to create a dataset of French words for training the model. Synthesizing labeled data reduces the need for labor-intensive labeling while still producing meaningful training outcomes. BYU Pathways students then provide labels that are used to fine-tune the model. Initial results show strong performance on birth year fields, which reach 67% word accuracy and 87% character accuracy after training solely on synthetic data and transfer learning. More complex fields, however, reached only 39% word accuracy and 49% character accuracy after training. This approach underscores the potential of adding synthetic-data training to traditional transfer learning and active learning to train high-accuracy models efficiently, enhancing historical research capabilities and creating robust tools for analyzing handwritten records.
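The abstract reports word accuracy and character accuracy without defining them. The Python sketch below assumes the common definitions: character accuracy as 1 minus the character error rate (total Levenshtein edit distance divided by total reference characters), and word accuracy as the share of fields transcribed exactly. The poster may compute these metrics differently, and the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def character_accuracy(predictions: list[str], references: list[str]) -> float:
    """Hypothetical metric: 1 - CER, clamped at 0 because heavy insertions
    can push the raw error rate above 1."""
    errors = sum(levenshtein(p, r) for p, r in zip(predictions, references))
    total = sum(len(r) for r in references)
    return max(0.0, 1.0 - errors / total)


def word_accuracy(predictions: list[str], references: list[str]) -> float:
    """Hypothetical metric: fraction of fields whose predicted
    transcription exactly matches the reference."""
    matches = sum(p == r for p, r in zip(predictions, references))
    return matches / len(references)


preds = ["1887", "1902", "1915"]
refs = ["1887", "1903", "1915"]
print(word_accuracy(preds, refs), character_accuracy(preds, refs))
```

In this example, one wrong digit out of twelve reference characters gives a word accuracy of 2/3 and a character accuracy of 11/12. This illustrates how a field can score far higher on character accuracy than on word accuracy, as the birth year results do.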

Document Type

Poster

Publication Date

2024-12-05

Language

English

College

Family, Home, and Social Sciences

Department

Economics

University Standing at Time of Publication

Senior

If you Teach a Bot to Read: Using Machine Learning to Read the Paris Census

Included in

Economics Commons
