Abstract

We propose subword spotting, a generalization of word spotting in which the search targets groups of characters within words. We present a method for subword spotting based on state-of-the-art word spotting techniques and evaluate its performance at three granularities (unigrams, bigrams, and trigrams) on two datasets. We demonstrate three applications of subword spotting, though others may exist. The first is helping human transcribers identify unrecognized characters by locating them in other words. The second is searching for suffixes directly in word images (suffix spotting). The third is computer-assisted transcription (semi-automated transcription). We investigate several variations of computer-assisted transcription using subword spotting, but none achieves transcription speeds above those of manual transcription, and we investigate the causes.

Degree

MS

College and Department

Physical and Mathematical Sciences; Computer Science

Rights

http://lib.byu.edu/about/copyright/

Date Submitted

2018-05-01

Document Type

Thesis

Handle

http://hdl.lib.byu.edu/1877/etd10011

Keywords

subword, spotting, CAT, semi-automated, handwriting, n-gram, character

Language

English
