Paper forms are a commonly used format for collecting information, including information that ultimately will be added to a digital database. This work focuses on the automatic extraction of information from form images. It examines how well forms can be parsed without any textual information. The resulting model, FUDGE, shows that computer vision alone is reasonably successful at this problem. Drawing on the strengths and weaknesses of FUDGE, this work also introduces a novel model, Dessurt, for end-to-end document understanding. Dessurt performs text recognition implicitly and can output arbitrary text, making it a more flexible document processing model than prior methods. Dessurt can parse the entire contents of a form image directly into a structured format, achieving better performance than FUDGE at this task. Also included is a technique for generating synthetic handwriting, which provides synthetic training data for Dessurt.
College and Department
Physical and Mathematical Sciences; Computer Science
BYU ScholarsArchive Citation
Davis, Brian Lafayette, "A Visual Focus on Form Understanding" (2022). Theses and Dissertations. 9504.
Keywords
document understanding, form understanding, handwriting, template-free, end-to-end