Although replication is considered an indispensable part of the scientific method in software engineering, few replication studies are published each year. This low rate of replication is not surprising, however, given that replication theory in software engineering is immature. Not only are replication taxonomies varied and difficult to reconcile, but opinions on the role of replication contradict one another. In general, we have no clear sense of how to build knowledge via replication, particularly given the practical realities of our research field. Consequently, most replications in software engineering yield little useful information. In particular, the vast majority of external replications (i.e., replications performed by researchers unaffiliated with the original study) not only fail to reproduce the original results, but also defy explanation. The net effect is that, as a research field, we consistently fail to produce usable (i.e., transferable) knowledge, and thus our research results have little if any impact on industry. In this dissertation, we dissect the problem of replication into four primary concerns: 1) rate and explicitness of replication; 2) theoretical foundations of replication; 3) tractability of methods for context analysis; and 4) effectiveness of inter-study communication. We address each of the four concerns via a two-part research strategy involving both a theoretical and a practical component. The theoretical component consists of a grounded theory study in which we integrate and then apply external replication theory to problems of replication in empirical software engineering. The theoretical component makes three key contributions to the literature: first, it clarifies the role of replication with respect to the overall process of science; second, it presents a flexible framework for reconciling disparate replication terminology; and third, it informs a broad range of practical replication concerns.
The practical component involves a series of replication studies, through which we explore a variety of replication concepts and empirical methods, culminating in the development of a tractable method for context analysis (TCA). TCA enables the quantitative evaluation of context variables in greater detail, with greater statistical power, and via considerably smaller datasets than previously possible. As we show via a complex, real-world example, the method enables the empirically and statistically grounded reconciliation and generalization of otherwise contradictory results across dissimilar replications, a problem that has previously remained unsolved in software engineering.



College and Department

Physical and Mathematical Sciences; Computer Science



Keywords

replication, experimentation, generalization, context analysis, multi-site joint replication, post-hoc moderator analysis, Bayesian methods, theory of conceptual frameworks, design patterns, Conway's Law