Abstract

This report introduces a suite of command-line tools that help content developers create rich supplementary material for use with feature films and other video assets in language teaching. The tools are built on an open-source corpus and open-source software (the OPUS OpenSubs corpus and the Moses statistical machine translation system, respectively), but are written in a modular fashion so that other resources can be substituted. The completed tool suite facilitates three main tasks, which together constitute this project. First, several scripts for preparing linguistic data for the system are discussed. Next, a set of scripts is described that combines the strengths of terminology management and statistical machine translation to provide candidate translation entries for terms of interest. Finally, a tool chain and methodology are given for enriching the terminological data store with the output of the machine translation process, so that accuracy and efficiency can improve with each subsequent application.

Degree

MA

College and Department

Humanities; Linguistics and English Language

Rights

http://lib.byu.edu/about/copyright/

Date Submitted

2010-08-04

Document Type

Selected Project

Handle

http://hdl.lib.byu.edu/1877/etd3899

Keywords

electronic film review, language instruction, statistical machine translation, terminology management

Language

English

Included in

Linguistics Commons
