Keywords

Chinese, word segmentation, open-source consistency evaluation

College

Humanities

Abstract

Written Chinese, whether typed or handwritten, does not use spaces to mark where words begin and end. Imagine if English worked the same way and a sign for a job fair read “opportunityisnowhere.” Although the intent is to announce that “opportunity is now here,” the missing spaces also permit the discouraging reading “opportunity is nowhere.” Deciding where the spaces, or word boundaries, belong in Chinese can occasionally be tricky even for native speakers, so the task is considerably harder for computers. Yet machines must identify word boundaries in Chinese text before they can do anything else with it, such as translation or web search. Computational tools that perform this essential preprocessing are called segmenters: they take spaceless Chinese text as input and output their best guess at a spaced version. (See Figure 1.)
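The ambiguity described above can be made concrete with a small sketch. It uses the English job-fair example rather than Chinese, a tiny hypothetical lexicon (`LEXICON`), and a simple greedy forward-maximum-matching strategy; none of these is the segmenter or the evaluation method studied in this paper. The first function lists every dictionary-consistent way to space the string, and the second shows how a naive segmenter commits to a single "best guess."

```python
from functools import lru_cache

# Hypothetical toy lexicon, chosen only for this illustration.
LEXICON = {"opportunity", "is", "now", "here", "nowhere", "no", "where"}
MAX_LEN = max(len(w) for w in LEXICON)


def all_segmentations(text: str) -> list[list[str]]:
    """Return every way to split `text` into words found in LEXICON."""

    @lru_cache(maxsize=None)
    def split_from(start: int) -> tuple[tuple[str, ...], ...]:
        if start == len(text):
            return ((),)  # one way to segment the empty remainder
        results = []
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in LEXICON:
                for rest in split_from(end):
                    results.append((word,) + rest)
        return tuple(results)

    return [list(seg) for seg in split_from(0)]


def forward_max_match(text: str, lexicon: set[str], max_word_len: int) -> list[str]:
    """Greedy segmenter: at each position, take the longest dictionary word."""
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a dictionary hit.
        for j in range(min(len(text), i + max_word_len), i, -1):
            if text[i:j] in lexicon:
                words.append(text[i:j])
                i = j
                break
        else:
            # No dictionary word starts here: emit a single character.
            words.append(text[i])
            i += 1
    return words


if __name__ == "__main__":
    sign = "opportunityisnowhere"
    for seg in all_segmentations(sign):
        print(" ".join(seg))
    print("greedy:", " ".join(forward_max_match(sign, LEXICON, MAX_LEN)))
```

With this lexicon the string has three valid spacings ("opportunity is no where," "opportunity is now here," and "opportunity is nowhere"), and the greedy longest-match strategy happens to pick the discouraging one. That is the kind of choice a segmenter has to get right before any later processing can rely on its output.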

Recommended Citation

Smith, Blake and Reynolds, Robert (2019) "Open-source Consistency Evaluation for Chinese Word Segmentation," Journal of Undergraduate Research: Vol. 2019: Iss. 2019, Article 81.
Available at: https://scholarsarchive.byu.edu/jur/vol2019/iss2019/81

Included in

Arts and Humanities Commons

BYU ScholarsArchive

Journal of Undergraduate Research

Open-source Consistency Evaluation for Chinese Word Segmentation