Abstract
Data extraction from engraved text is discussed rarely, and nothing in the open literature discusses data extraction from cemetery headstones. Headstone images present unique challenges such as engraved or embossed characters (causing inner-character shadows), low contrast with the background, and significant noise due to inconsistent stone texture and weathering. Current systems for extracting text from outdoor environments (billboards, signs, etc.) make assumptions (i.e. clean and/or consistently-textured background and text) that fail when applied to the domain of engraved text. Additionally, the ability to extract the data found on headstones is of great historical value. This thesis describes a novel and efficient feature-based text zoning and segmentation method for the extraction of noisy text from a highly textured engraved medium. Additionally, the usefulness of constraining a problem to a specific domain is demonstrated. The transcriptions of images zoned and segmented through the proposed system result in a precision of 55% compared to 1% precision without zoning, a 62% recall compared to 39%, an F-measure of 58% compared to 2%, and an error rate of 77% compared to 8303%.
Degree
MS
College and Department
Physical and Mathematical Sciences; Computer Science
Rights
http://lib.byu.edu/about/copyright/
BYU ScholarsArchive Citation
Christiansen, Cameron Smith, "Data Acquisition from Cemetery Headstones" (2012). Theses and Dissertations. 3383.
https://scholarsarchive.byu.edu/etd/3383
Date Submitted
2012-11-27
Document Type
Thesis
Handle
http://hdl.lib.byu.edu/1877/etd5708
Keywords
Cemetery Headstones, OCR, Graph Cut, Zoning, Text Extraction, Scene Text
Language
English