logistic regression, decision tree, Varbrul


The present paper compares logistic regression (referred to here by its implementation in Varbrul) with another method for analyzing linguistic data: decision trees. Comparison of the two methods demonstrates that decision trees are able to find the same sorts of generalizations as Varbrul. However, decision trees provide coarser-grained output than Varbrul’s more informative factor weights. In addition, decision trees often mistakenly overgeneralize. Nevertheless, decision trees can be used in tandem with Varbrul. Because decision trees automatically calculate interactions, they suggest interaction terms that may be considered in subsequent Varbrul analyses. Decision trees also allow continuous variables, in contrast to Varbrul’s instantiation of logistic regression, which is limited to categorical variables. Therefore, decision tree analysis may help establish cutoff points when continuous data are converted into categories for Varbrul. Data sets containing knockouts and multinomial dependent variables, as well as those containing cells with zeros, are a challenge for Varbrul analysis. These are usually dealt with by recoding and reconfiguring the data. However, in some cases no amount of principled recoding yields a parsimonious Varbrul analysis. Decision trees are therefore suggested as an alternative method of analysis, since they are not adversely affected by these factors. To compare and contrast the two methods, Varbrul and decision tree analyses of a number of linguistic data sets are presented.
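The idea that a decision tree can suggest a cutoff point for a continuous variable may be illustrated with a minimal sketch. It is not taken from the paper: the data are hypothetical toy values, the function names `gini` and `best_cutoff` are invented for illustration, and Gini impurity is assumed as the splitting criterion (one common choice; the paper's software may use another). The sketch finds the single binary split of a continuous predictor that best separates a binary outcome, which is what the first node of a tree does.

```python
def gini(labels):
    """Gini impurity of a list of binary (0/1) outcomes."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def best_cutoff(values, outcomes):
    """Return (cutoff, weighted_impurity) for the best single binary split."""
    pairs = sorted(zip(values, outcomes))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no cutoff can separate identical values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        # Impurity of the two groups, weighted by group size.
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((lo + hi) / 2, score)  # midpoint between neighbors
    return best


# Hypothetical toy data: speaker age and whether a variant was used (1) or not (0).
ages = [18, 22, 25, 31, 38, 44, 52, 60, 67, 73]
used_variant = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

cutoff, impurity = best_cutoff(ages, used_variant)
print(cutoff, round(impurity, 4))
```

On these toy values the suggested cutoff falls between ages 44 and 52, so a researcher might recode age into two categories at that point before running a categorical analysis. Whether such a cutoff is linguistically principled remains a separate judgment.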

Original Publication Citation

2010. “A Comparison of Two Tools for Analyzing Linguistic Data: Logistic Regression and Decision Trees.” Italian Journal of Linguistics 22.265-286.

Document Type

Peer-Reviewed Article

Publication Date

2010

Publisher

Pacini (Pisa)

University Standing at Time of Publication

Associate Professor

Included in

Linguistics Commons