Abstract

Stylometry assumes that the essence of an author's individual style can be captured by quantitative criteria, such as the relative frequencies of noncontextual words (e.g., or, the, and). Several statistical methodologies have been developed for authorship analysis. Jockers et al. (2009) apply Nearest Shrunken Centroid (NSC) classification, a promising classification methodology from DNA microarray analysis, to authorship analysis of the Book of Mormon. Schaalje et al. (2010) develop an extended NSC classification to remedy the problem of a missing author (a true author who is not among the candidates). Dabney (2005) and Koppel et al. (2009) suggest other modifications of NSC. This paper develops a full Bayesian classifier and compares its performance to five versions of the NSC classifier using the Federalist Papers, the Book of Mormon text blocks, and the texts of seven other authors. In these comparisons, the full Bayesian classifier was superior to all other methods.

Degree

MS

College and Department

Physical and Mathematical Sciences; Statistics

Rights

http://lib.byu.edu/about/copyright/

Date Submitted

2010-03-16

Document Type

Selected Project

Handle

http://hdl.lib.byu.edu/1877/etd3487

Keywords

machine learning, discriminant analysis, authorship, attribution, fudge factor, shrinkage

Language

English
