This study used simulation to evaluate four alternative methods of computing confidence intervals for class means in the context of student evaluations of teaching (SET) in a university setting. Because of the skewed and bounded nature of the ratings, the goal was to identify a procedure for constructing confidence intervals that would be asymmetric and not dependent upon normal curve theory. The four methods included (a) a logit transformation, (b) a resampling procedure, (c) a nonparametric, bias-corrected and accelerated (BCa) bootstrap procedure, and (d) a Bayesian bootstrap procedure. The methods were compared against four criteria: (a) coverage probability, (b) coverage error, (c) average interval width, and (d) the lower and upper error probabilities. The results of each method were also compared with a classical procedure for computing the confidence interval based on normal curve theory. In addition, SET ratings from all courses taught during one semester at Brigham Young University were analyzed using multilevel generalizability theory to estimate variance components and to estimate the reliability of the class means as a function of the number of respondents in each class. The results showed that the logit transformation procedure outperformed the alternative methods. The results also showed that the reliability of the class means exceeded .80 for classes averaging 15 respondents or more. The study demonstrates the need to routinely report a margin of error associated with the mean SET rating for each class and recommends that a confidence interval based on the logit transformation procedure be used for this purpose.
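The logit-transformation procedure the abstract recommends can be sketched as follows. This is a minimal illustration, not the study's exact implementation: it assumes ratings on a bounded scale (here 1 to 5), rescales the class mean to (0, 1), builds a symmetric normal-theory interval on the logit scale using a delta-method standard error, and back-transforms the endpoints, which yields an interval that respects the scale bounds and is asymmetric near them.

```python
import math

def logit_ci(ratings, lo=1.0, hi=5.0, z=1.96):
    """Asymmetric confidence interval for a bounded class-mean rating.

    Sketch of a logit-transformation procedure (assumed details, not the
    study's exact method): the mean is rescaled to (0, 1), a symmetric
    interval is formed on the logit scale with a delta-method standard
    error, and the endpoints are back-transformed to the original scale.
    """
    n = len(ratings)
    mean = sum(ratings) / n
    var = sum((x - mean) ** 2 for x in ratings) / (n - 1)
    se = math.sqrt(var / n)

    p = (mean - lo) / (hi - lo)        # rescale mean to (0, 1)
    se_p = se / (hi - lo)              # SE on the rescaled metric
    logit = math.log(p / (1 - p))
    se_logit = se_p / (p * (1 - p))    # delta-method SE on the logit scale

    low_l = logit - z * se_logit
    high_l = logit + z * se_logit
    inv = lambda t: 1.0 / (1.0 + math.exp(-t))   # inverse logit
    return (lo + (hi - lo) * inv(low_l), lo + (hi - lo) * inv(high_l))
```

Because the back-transformation compresses values near the scale bounds, a class mean close to the top of the scale gets a shorter upper arm than lower arm, which is the asymmetry the normal-theory interval cannot provide.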



College and Department

David O. McKay School of Education; Educational Inquiry, Measurement, and Evaluation



Date Submitted


Document Type



Keywords

student evaluations of teaching, confidence interval, reliability of class means, logit transformation, resampling, bias corrected accelerated, Bayesboot