Abstract

Software development is a process fraught with unpredictability, in part because software is created by people. Human interactions add complexity to development processes, and collaborative development can become a liability if not properly understood and managed. Recent years have seen an increase in the use of data mining techniques on publicly available repository data, with the goal of improving software development processes and, by extension, software quality. In this thesis, we introduce the concept of author entropy as a metric for quantifying interaction and collaboration (both within individual files and across projects), present results from two empirical observational studies of open-source projects, identify and analyze authorship and collaboration patterns within source code, demonstrate techniques for visualizing authorship patterns, and propose avenues for further research.

Degree

MS

College and Department

Physical and Mathematical Sciences; Computer Science

Rights

http://lib.byu.edu/about/copyright/

Date Submitted

2012-03-02

Document Type

Thesis

Handle

http://hdl.lib.byu.edu/1877/etd5008

Keywords

software engineering, open source, data mining, collaboration, authorship patterns, author entropy, SourceForge, Subversion, Eclipse, Git
