Abstract

The technology boom of the late twentieth century made huge amounts of data available, and new methods are required to turn that data into usable information. Much of this data is categorical, which makes estimation difficult in highly multivariate settings. In this thesis we review multivariate statistical methods, discuss statistical methods of natural language processing (NLP), and discuss a general class of models described by Erosheva (2002), the generalized mixed membership models. We then propose extensions of the information partition function (IPF) derived by Engler (2002), Oliphant (2003), and Tolley (2006) that allow discrete, highly multivariate data to be modeled in linear models. We report results of applying the modified IPF model to the World Health Organization's Survey on Global Aging (SAGE).

Degree

MS

College and Department

Physical and Mathematical Sciences; Statistics

Rights

http://lib.byu.edu/about/copyright/

Date Submitted

2007-12-28

Document Type

Thesis

Handle

http://hdl.lib.byu.edu/1877/etd2263

Keywords

Information Partition Function, interaction effects, multivariate analysis, discrete data, Natural Language Processing

Language

English
