
Journal of Undergraduate Research

Keywords

happiness, cigarette smokers, gender, ordered response model

College

Family, Home, and Social Sciences

Department

Economics

Abstract

Does happiness depend on income? What puts people at risk of becoming “heavy smokers”? Do gender and wage affect job promotion? The answers to these varied questions have one thing in common: they rely on grouped, or categorical, data. Happiness is often reported on scales of 1 to 10 (Winkelmann 2005). Tobacco users and cigarette smokers are asked whether they are “non-users,” “light users,” or “heavy users” (Harris and Zhao 2007). In some professions, such as British nursing, careers are ranked on a scale of one to six (Pudney and Shields 2000). Categorization often cannot be avoided when collecting data, and the categorical nature of such data should be taken into account when seeking causal relationships between the categorical variable of interest, such as happiness, and explanatory variables, such as income and education. In the past, statisticians and academics have used regression techniques called the ordered probit and the ordered logit to estimate relationships between explanatory variables and grouped data. Although these methods are better suited to grouped data than standard ordinary least squares (OLS), both make very restrictive assumptions about the distribution of the data: the ordered probit assumes that the error in the unobserved (latent) variable underlying the categories is normally distributed, and the ordered logit assumes that it follows a logistic distribution. Unfortunately, these theoretical assumptions often do not hold in actual data, leaving an inconsistency between the data and the model. This inconsistency is similar to buying your 11-year-old nephew a size six shoe on the assumption that all 11-year-olds share the same average shoe size. To your chagrin, your nephew has size eight feet, and the shoes suddenly lose much of their expected usefulness. The purpose of this project is to remove this inconsistency from previous regression models, or at least reduce it; I am trying to find the right size shoe for each set of data to make estimation results more reliable.
I do this by incorporating more information about the data distribution when estimating the relationships of interest.
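To make the distributional assumption concrete, here is a minimal sketch of how an ordered response model turns a single index into category probabilities. It assumes the standard textbook latent-variable form (a latent y* = x'β + ε, with category j observed when y* falls between cutpoints κ_{j−1} and κ_j), so that P(y = j | x) = F(κ_j − x'β) − F(κ_{j−1} − x'β). Here F is the normal CDF for the ordered probit and the logistic CDF for the ordered logit. The abstract does not state its exact specification, so this formulation is an assumption. The function names (`normal_cdf`, `logistic_cdf`, `ordered_probs`) and the numeric values in the usage lines are hypothetical and for illustration only, not taken from the study.

```python
import math
from typing import Callable, List, Sequence


def normal_cdf(z: float) -> float:
    """Standard normal CDF: the error distribution assumed by the ordered probit."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logistic_cdf(z: float) -> float:
    """Standard logistic CDF: the error distribution assumed by the ordered logit."""
    return 1.0 / (1.0 + math.exp(-z))


def ordered_probs(xb: float, cutpoints: Sequence[float],
                  cdf: Callable[[float], float] = normal_cdf) -> List[float]:
    """Hypothetical helper: P(y = j | x) for each of len(cutpoints) + 1 categories.

    Assumes a latent y* = xb + e with e distributed according to `cdf`;
    category j is observed when cutpoints[j-1] < y* <= cutpoints[j],
    with implicit outer cutpoints at -inf and +inf.
    """
    if any(a >= b for a, b in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cutpoints must be strictly increasing")
    # Cumulative probabilities F(kappa_j - xb), padded with F(-inf) = 0 and F(+inf) = 1.
    cum = [0.0] + [cdf(k - xb) for k in cutpoints] + [1.0]
    # Each category's probability is the gap between adjacent cumulative values.
    return [upper - lower for lower, upper in zip(cum, cum[1:])]


# Illustrative three-category outcome (e.g. non-user / light user / heavy user);
# xb and the cutpoints are made-up values, not estimates from any data set.
probit = ordered_probs(xb=0.3, cutpoints=[-0.5, 1.0])
logit = ordered_probs(xb=0.3, cutpoints=[-0.5, 1.0], cdf=logistic_cdf)
print([round(p, 3) for p in probit])
print([round(p, 3) for p in logit])
```

With the same index and cutpoints, the two links assign different probabilities to each category. The choice of F is a substantive assumption about the data rather than a neutral default.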

