Theses and Dissertations

Termediator-II: Identification of Interdisciplinary Term Ambiguity Through Hierarchical Cluster Analysis

Owen G. Riley, Brigham Young University - Provo

Abstract

Technical disciplines are evolving rapidly, leading to changes in their associated vocabularies. This evolving terminology causes confusion in interdisciplinary communication. Two causes of confusion are terms with multiple definitions (overloaded terms) and synonymous terms; the formal names for these two problems are polysemy and synonymy. Termediator-I, a web application built on top of a collection of glossaries, was an attempt to identify confusing cross-disciplinary terms, using definition count as a measure of term confusion. As more glossaries were added to the collection, this measure became ineffective. This thesis provides a measure of term polysemy: the text concepts, or definitions, of each term are semantically clustered, and the number of resulting clusters is counted. Hierarchical clustering uses a measure of proximity between the text concepts. Three such measures are evaluated: cosine similarity, latent semantic indexing, and latent Dirichlet allocation. Two linkage types, which determine cluster proximity during the hierarchical clustering process, are also evaluated: complete linkage and average linkage. An attempt to obtain a viable clustering threshold by public consensus, crowdsourced through a web application, was unsuccessful. An alternate metric of polysemy, convergence value, is identified and tested as a viable clustering threshold. Six lists of terms ranked by cluster count based on convergence values are generated, one for each combination of similarity measure and linkage type. Each combination produces a competitive list, and no combination can be determined to be clearly superior. Semantic clustering successfully identifies polysemous terms, but each combination of similarity measure and linkage type gives slightly different results.
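
The core idea of the abstract, clustering a term's definitions and counting the clusters, can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes plain bag-of-words cosine similarity (the LSI and LDA measures are not shown), a naive agglomerative loop, and a hand-picked threshold of 0.4 in place of the convergence value the thesis derives. The function names, the example definitions of "port", and the threshold are all hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word-count vector for one definition (no stop-word removal)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_similarity(c1, c2, sim, linkage):
    """Proximity of two clusters under the chosen linkage type."""
    pair_sims = [sim[i][j] for i in c1 for j in c2]
    if linkage == "complete":
        return min(pair_sims)  # judged by the least similar pair
    if linkage == "average":
        return sum(pair_sims) / len(pair_sims)  # mean over all pairs
    raise ValueError(f"unknown linkage: {linkage}")


def cluster_definitions(definitions, threshold, linkage="average"):
    """Merge the most similar clusters until none reaches the threshold."""
    vectors = [bag_of_words(d) for d in definitions]
    n = len(vectors)
    sim = [[cosine_similarity(vectors[i], vectors[j]) for j in range(n)]
           for i in range(n)]
    clusters = [[i] for i in range(n)]  # start with one cluster per definition
    while len(clusters) > 1:
        best, best_pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = cluster_similarity(clusters[x], clusters[y], sim, linkage)
                if s > best:
                    best, best_pair = s, (x, y)
        if best < threshold:
            break  # remaining clusters are too dissimilar to merge
        x, y = best_pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Hypothetical definitions of the term "port" from different glossaries.
definitions = [
    "a harbor where ships load and unload cargo",
    "a harbor town where ships dock",
    "a network endpoint number used by a protocol",
    "a numbered network endpoint for a connection",
]
clusters = cluster_definitions(definitions, threshold=0.4, linkage="average")
polysemy_score = len(clusters)  # cluster count as the polysemy measure
print(polysemy_score, clusters)
```

In this toy case the two harbor definitions and the two networking definitions end up in separate clusters, so the term gets a cluster count of 2 even though it has four definitions. That gap between definition count and cluster count is the distinction the abstract draws between Termediator-I and Termediator-II.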

Degree

College and Department

Ira A. Fulton College of Engineering and Technology; Technology

Rights

http://lib.byu.edu/about/copyright/

BYU ScholarsArchive Citation

Riley, Owen G., "Termediator-II: Identification of Interdisciplinary Term Ambiguity Through Hierarchical Cluster Analysis" (2014). Theses and Dissertations. 4030.
https://scholarsarchive.byu.edu/etd/4030

Date Submitted

2014-04-23

Document Type

Thesis

Handle

http://hdl.lib.byu.edu/1877/etd6923

Keywords

cosine similarity, LSI, LDA, text similarity, hierarchical clustering, polysemy

Language

English

Technology Emphasis

Information Technology (IT)

Included in

Construction Engineering and Management Commons
