Stylometry assumes that the essence of an author's individual style can be captured by a number of quantitative criteria, such as the relative frequencies of noncontextual words (e.g., or, the, and). Several statistical methodologies have been developed for authorship analysis. Jockers et al. (2009) apply Nearest Shrunken Centroid (NSC) classification, a promising classification methodology from DNA microarray analysis, to authorship analysis of the Book of Mormon. Schaalje et al. (2010) develop an extended NSC classification to remedy the problem of a missing author. Dabney (2005) and Koppel et al. (2009) suggest other modifications of NSC. This paper develops a full Bayesian classifier and compares its performance to five versions of the NSC classifier using the Federalist Papers, the Book of Mormon text blocks, and the texts of seven other authors. In these comparisons, the full Bayesian classifier was superior to all other methods.
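As a rough illustration of the ideas named in the abstract, the sketch below computes relative frequencies of noncontextual words and shrinks per-author centroids toward the overall centroid by soft-thresholding. It is a simplified stand-in, not the thesis's method: the word list, the function names, and the threshold `delta` are hypothetical, and the sketch omits the per-feature standardization (including the fudge factor) and the class priors that full NSC uses.

```python
from collections import Counter
import math
import re

# Hypothetical word list for illustration; not the list used in the thesis.
FUNCTION_WORDS = ["the", "and", "or", "of", "to"]

def relative_frequencies(text, words=FUNCTION_WORDS):
    """Return, for each listed word, its share of all tokens in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty text
    return [counts[w] / total for w in words]

def soft_threshold(value, delta):
    """Move value toward zero by delta, stopping at zero."""
    return math.copysign(max(abs(value) - delta, 0.0), value)

def shrunken_centroids(samples, labels, delta):
    """Shrink each class centroid toward the overall centroid, per feature.

    Simplification (assumption): deviations are thresholded on their raw
    scale, without dividing by a pooled standard deviation.
    """
    n_features = len(samples[0])
    overall = [sum(row[j] for row in samples) / len(samples)
               for j in range(n_features)]
    centroids = {}
    for label in sorted(set(labels)):
        rows = [row for row, lab in zip(samples, labels) if lab == label]
        mean = [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]
        centroids[label] = [
            overall[j] + soft_threshold(mean[j] - overall[j], delta)
            for j in range(n_features)
        ]
    return centroids

def classify(vector, centroids):
    """Assign the label whose shrunken centroid is nearest (squared Euclidean)."""
    return min(
        centroids,
        key=lambda lab: sum((v - c) ** 2 for v, c in zip(vector, centroids[lab])),
    )

# Toy usage with made-up texts standing in for two authors.
train_texts = ["the cat and the dog and the bird", "to go or to stay or to wait"]
train_labels = ["A", "B"]
train_vectors = [relative_frequencies(t) for t in train_texts]
model = shrunken_centroids(train_vectors, train_labels, delta=0.05)
predicted = classify(relative_frequencies("the fox and the hen"), model)
```

Features whose class deviation falls below `delta` collapse onto the overall centroid and stop contributing to the classification, which is what gives shrinkage its feature-selection effect.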
College and Department
Physical and Mathematical Sciences; Statistics
BYU ScholarsArchive Citation
Funai, Tomohiko, "Extensions of Nearest Shrunken Centroid Method for Classification" (2010). Theses and Dissertations. 2402.
machine learning, discriminant analysis, authorship, attribution, fudge factor, shrinkage