addictive behaviors, count regression, longitudinal substance use data


Critical research questions in the study of addictive behaviors concern how these behaviors change over time, either as the result of intervention or in naturalistic settings. The combination of count outcomes that are often strongly skewed with many zeroes (e.g., days using, number of total drinks, number of drinking consequences) with repeated assessments (e.g., longitudinal follow-up after intervention or daily diary data) presents challenges for data analysis. The current article provides a tutorial on methods for analyzing longitudinal substance use data, focusing on Poisson, zero-inflated, and hurdle mixed models, which are types of hierarchical or multilevel models. Two example datasets are used throughout, focusing on drinking-related consequences following an intervention and daily drinking over the past 30 days, respectively. Both datasets, as well as R, SAS, Mplus, Stata, and SPSS code showing how to fit the models, are available on a supplemental website.

What is the impact of personalized normative feedback on drinking-related problems in college students over time? How does weekend versus weekday drinking vary by gender and fraternity/sorority status when assessed on a daily basis? Many questions about alcohol and substance abuse focus on change across time, and the methods used to analyze these questions need to account for the longitudinal nature of the data. Generalized linear mixed models (GLMMs; Gelman & Hill, 2007; Hedeker & Gibbons, 2006; also called hierarchical [or multilevel] generalized linear modeling, Raudenbush & Bryk, 2002; Snijders & Bosker, 1999) are increasingly common analytic approaches for longitudinal data, given their flexible handling of unbalanced repeated measures (i.e., individual participants may have unique numbers and timings of assessments) and the widespread availability of software for estimating such models. Moreover, GLMMs are appropriate for continuous as well as discrete outcomes.
However, the distributions of alcohol and substance abuse outcomes have characteristic shapes: They are often positively skewed and bounded by zero. Moreover, there can be a large stack of data points at zero, indicating individuals and/or occasions without drinking, use, or related problems. These distributions reflect that alcohol and substance abuse outcomes are often count data, representing a total number of something, be it drinks, days using, or number of problems. Except in special circumstances (e.g., specially selected samples with high drinking or drug use), statistical models that assume normally distributed residuals will provide poor fit to such data and will lead to incorrect confidence intervals and p-values. Instead, count regression approaches such as Poisson or negative binomial regression or zero-altered count models (e.g., zero-inflated or hurdle models) are much more appropriate for these types of data (Atkins & Gallop, 2007; Coxe, West, & Aiken, 2009; Hilbe, 2011; Neal & Simons, 2007; Simons, Neal, & Gaher, 2006). In the past, addictions researchers have often ignored (or not been aware of) violations of distributional assumptions or have attempted to deal with them in non-optimal ways. Count regression models are beginning to be applied to addictions data (e.g., Gaher & Simons, 2007; Lewis et al., 2010), but accessible resources on how to apply these models to longitudinal data are scarce.

Correspondence concerning this article should be addressed to Dave Atkins, Department of Psychiatry and Behavioral Sciences, 1100 NE 45th St., Suite 300, Seattle, WA, 98105. Author manuscript; available in PMC 2014 March 01. Published in final edited form as: Psychol Addict Behav. 2013 March; 27(1): 166–177. doi:10.1037/a0029508.
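The zero-heavy, skewed count distributions described above can be illustrated with a small simulation. The following is a minimal sketch in Python, not the article's own code (its supplemental code is in R, SAS, Mplus, Stata, and SPSS). The function names, the 40% structural-zero rate, and the mean of 4 drinks among drinking occasions are all hypothetical values chosen for illustration. The sketch draws zero-inflated Poisson counts and compares the observed share of zeros with the share a plain Poisson model with the same mean would imply.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) value using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k = 0
    product = rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate_zip_counts(n, zero_prob, mean_count, seed=0):
    """Simulate n zero-inflated Poisson counts (hypothetical parameters).

    With probability zero_prob an observation is a structural zero
    (e.g., a non-drinking occasion); otherwise it is Poisson(mean_count).
    """
    rng = random.Random(seed)
    return [
        0 if rng.random() < zero_prob else poisson_draw(rng, mean_count)
        for _ in range(n)
    ]


def summarize(counts):
    """Compare observed zeros and variance with plain-Poisson expectations."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return {
        "mean": mean,
        "variance": variance,  # a plain Poisson would have variance == mean
        "observed_zero_share": counts.count(0) / n,
        "poisson_zero_share": math.exp(-mean),  # P(Y = 0) under Poisson(mean)
    }


# Illustrative values only: 40% structural zeros, mean of 4 otherwise.
summary = summarize(simulate_zip_counts(n=5000, zero_prob=0.4, mean_count=4.0))
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Under these assumed settings, the observed share of zeros should far exceed what a single Poisson distribution with the same mean predicts, and the variance should exceed the mean. Both patterns are the kind of misfit that motivates zero-inflated and hurdle models.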
The present article provides a tutorial on analytic methods for count data from longitudinal studies, focusing on extensions to GLMMs for count outcomes. We use two examples from our research to illustrate the need for, and application of, longitudinal count models. Data and computer code to run the analyses in R, Mplus, SAS, Stata, and SPSS are available on a supplementary website ( newweb/statstutorials.html), though not all software can currently run all of the models covered here. The outline of the article is as follows: introduction to example data and research questions, brief overview of count regression models, GLMMs for count regression models, analyses and interpretation of example data, and discussion of software and practical issues in using these methods. In addition, there is a technical appendix containing important, but more advanced, material. We assume that readers have a basic familiarity with linear mixed models (i.e., hierarchical linear or multilevel models assuming normally distributed errors) and count regression models, though both are introduced briefly here and introductory resources are highlighted throughout.

Document Type

Peer-Reviewed Article

Publication Date



Psychol Addict Behav.




Family, Home, and Social Sciences



Included in

Psychology Commons