Abstract
We use a Fully Convolutional Neural Network (FCNN) to classify pixels in historical document images, enabling the extraction of high-quality, pixel-precise, and semantically consistent layers of masked content. We also analyze a dataset of hand-labeled historical form images of unprecedented detail and complexity. The semantic categories we consider in this new dataset include handwriting, machine-printed text, dotted and solid lines, and stamps. Segmentation of document images into distinct layers allows handwriting, machine print, and other content to be processed and recognized discriminatively, and therefore more intelligently than might be possible with content-unaware methods. We show that an efficient FCNN with relatively few parameters can accurately segment documents having similar textural content when trained on a single representative pixel-labeled document image, even when layouts differ significantly. In contrast to the overwhelming majority of existing semantic segmentation approaches, we allow multiple labels to be predicted per pixel location, which enables direct prediction and reconstruction of overlapped content. We perform an analysis of prevalent pixel-wise performance measures, and show that several popular measures can be manipulated adversarially, yielding arbitrarily high scores depending on the type of bias used to generate the ground truth. We propose a solution to this gaming problem by comparing absolute performance to an estimated human level of performance. We also present results on a recent international competition requiring the automatic annotation of billions of pixels, in which our method took first place.
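The sketch below illustrates the multi-label idea described above: it is not the thesis's exact architecture, just a minimal PyTorch example (the network size, class list, and thresholds are assumptions for illustration). A 1x1 convolution produces one score map per semantic class, and an independent sigmoid per class, trained with binary cross-entropy, is what allows a single pixel to carry more than one label, e.g., handwriting overlapping a solid line.

```python
# Minimal sketch (not the thesis's exact architecture): a small fully
# convolutional network that emits an independent per-class score at every
# pixel, so overlapping content can receive multiple labels at one location.
import torch
import torch.nn as nn

# Hypothetical class list mirroring the categories named in the abstract.
CLASSES = ["handwriting", "machine_print", "dotted_line", "solid_line", "stamp"]

class TinyFCNN(nn.Module):
    def __init__(self, num_classes=len(CLASSES)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, kernel_size=3, padding=2, dilation=2), nn.ReLU(inplace=True),
        )
        # 1x1 convolution: one logit map per semantic class, same spatial size.
        self.classifier = nn.Conv2d(32, num_classes, kernel_size=1)

    def forward(self, x):
        return self.classifier(self.features(x))  # logits of shape (N, C, H, W)

model = TinyFCNN()
image = torch.rand(1, 1, 256, 256)                                 # grayscale crop
target = torch.randint(0, 2, (1, len(CLASSES), 256, 256)).float()  # multi-hot masks

# Per-class binary cross-entropy (rather than a softmax across classes)
# is what permits multiple labels at the same pixel.
loss = nn.BCEWithLogitsLoss()(model(image), target)
loss.backward()

# At inference, threshold each class map independently to get layer masks.
with torch.no_grad():
    layers = torch.sigmoid(model(image)) > 0.5  # (1, C, H, W) boolean layers
```

Because each class map is thresholded on its own, the predicted layers can be reconstructed and processed separately (e.g., routing the handwriting layer and the machine-print layer to different recognizers), which is the content-aware processing the abstract motivates.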
Degree
MS
College and Department
Physical and Mathematical Sciences; Computer Science
Rights
http://lib.byu.edu/about/copyright/
BYU ScholarsArchive Citation
Stewart, Seth Andrew, "Fully Convolutional Neural Networks for Pixel Classification in Historical Document Images" (2018). Theses and Dissertations. 7064.
https://scholarsarchive.byu.edu/etd/7064
Date Submitted
2018-10-01
Document Type
Thesis
Handle
http://hdl.lib.byu.edu/1877/etd10366
Keywords
Convolutional Neural Networks, Document Image Analysis, Fully Convolutional Neural Networks, Layout Analysis, Page Segmentation, Pixel-Labeling, Region Classification, Semantic Segmentation, Data Augmentation, Historical Document Processing, Optical Character Recognition, Handwriting Recognition
Language
English