In an effort to increase both the quality of software applications and the efficiency with which applications can be written, developers often incorporate multiple programming languages into software projects. Although language specialization arguably introduces benefits, the total impact of the resulting language fragmentation (working concurrently in multiple programming languages) on developer performance is unclear. For instance, developers may solve problems more efficiently when they have multiple language paradigms at their disposal. However, the overhead of maintaining efficiency in more than one language may outweigh those benefits. This thesis represents a first step toward understanding the relationship between language fragmentation and programmer productivity. We address that relationship within two different contexts: 1) the individual developer, and 2) the overall project. Using a data-centered approach, we 1) develop metrics for measuring productivity and language fragmentation, 2) select data suitable for calculating the needed metrics, 3) develop and validate statistical models that isolate the correlation between language fragmentation and individual programmer productivity, 4) develop additional methods to mitigate threats to validity within the developer context, and 5) explore limitations that need to be addressed in future work for effective analysis of language fragmentation within the project context using the SourceForge data set. Finally, we demonstrate that within the open source software development community, SourceForge, language fragmentation is negatively correlated with individual programmer productivity.



College and Department

Physical and Mathematical Sciences; Computer Science



Date Submitted


Document Type





software engineering, empirical, software metrics, programming languages, language fragmentation, productivity, problem solving, cognition, language entropy, open source, SourceForge