Abstract
Stylometry assumes that the essence of an author's individual style can be captured by a number of quantitative criteria, such as the relative frequencies of noncontextual words (e.g., or, the, and). Several statistical methodologies have been developed for authorship analysis. Jockers et al. (2009) apply Nearest Shrunken Centroid (NSC) classification, a promising classification methodology from DNA microarray analysis, to authorship analysis of the Book of Mormon. Schaalje et al. (2010) develop an extended NSC classification to remedy the problem of a missing author. Dabney (2005) and Koppel et al. (2009) suggest other modifications of NSC. This paper develops a full Bayesian classifier and compares its performance to five versions of the NSC classifier using the Federalist Papers, the Book of Mormon text blocks, and the texts of seven other authors. The full Bayesian classifier was superior to all other methods.
Degree
MS
College and Department
Physical and Mathematical Sciences; Statistics
Rights
http://lib.byu.edu/about/copyright/
BYU ScholarsArchive Citation
Funai, Tomohiko, "Extensions of Nearest Shrunken Centroid Method for Classification" (2010). Theses and Dissertations. 2402.
https://scholarsarchive.byu.edu/etd/2402
Date Submitted
2010-03-16
Document Type
Selected Project
Handle
http://hdl.lib.byu.edu/1877/etd3487
Keywords
machine learning, discriminant analysis, authorship, attribution, fudge factor, shrinkage
Language
English