Abstract
During this "white collar recession," the labor market is flooded with workers. Employers seeking to hire need to identify qualified candidates for each job. The current state of the art is LinkedIn Recruiting or Elasticsearch over resumes, which lacks efficiency, scalability, and an intuitive ranking of candidates. We believe these shortcomings can be addressed with multi-layer categorical clustering via modularity maximization. To test this, we gathered an extensive dataset that is representative of the job market: our data comes from PeopleDataLabs and LinkedIn and is sampled from 153 million individuals. As such, it is one of the most informative datasets for the task of ranking and clustering job titles and skills. Properly grouping individuals will help identify more candidates to fill the many vacant positions. We implement a novel framework for categorical clustering over these attributes to deliver a reliable pool of candidates. We also develop a commonality-based metric for ranking clustering algorithms. This metric favors modularity-based clustering algorithms such as the Louvain algorithm, which lets us use such algorithms to outperform other unsupervised methods for categorical clustering. Our implementation accurately clusters emergency services, health care, and other fields, whereas managerial positions are, interestingly, swamped by soft or uninformative features, resulting in dominant, ambiguous clusters.
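The abstract does not say how individuals are turned into a graph before modularity maximization. The sketch below assumes one plausible construction: link two people with an edge whose weight equals the number of job-title or skill values they share. It then computes Newman's modularity, Q = (1/2m) Σ_ij [A_ij − k_i k_j / 2m] δ(c_i, c_j), which is the quantity the Louvain algorithm greedily increases. The records, attribute labels, and function names are hypothetical. This is not the thesis's implementation or its commonality metric.

```python
from itertools import combinations


def build_shared_attribute_graph(records):
    """Hypothetical construction: edge weight = number of categorical values two records share."""
    weights = {}
    for i, j in combinations(range(len(records)), 2):
        shared = len(records[i] & records[j])
        if shared:
            weights[(i, j)] = shared
    return weights


def modularity(n, weights, labels):
    """Newman modularity Q = (1/2m) * sum_ij [A_ij - k_i*k_j/2m] * delta(c_i, c_j)."""
    degree = [0.0] * n
    for (i, j), w in weights.items():
        degree[i] += w
        degree[j] += w
    two_m = sum(degree)
    if two_m == 0:
        return 0.0
    # Observed term: each undirected within-community edge appears twice in sum_ij A_ij.
    observed = sum(2 * w for (i, j), w in weights.items() if labels[i] == labels[j])
    # Expected term: sum over communities of (total community degree)^2 / 2m.
    community_degree = {}
    for node, c in enumerate(labels):
        community_degree[c] = community_degree.get(c, 0.0) + degree[node]
    expected = sum(d * d for d in community_degree.values()) / two_m
    return (observed - expected) / two_m


# Toy, made-up records: each person is a set of "title:..." and "skill:..." values.
records = [
    {"title:paramedic", "skill:cpr", "skill:triage"},
    {"title:emt", "skill:cpr", "skill:triage"},
    {"title:nurse", "skill:cpr", "skill:patient care"},
    {"title:software engineer", "skill:python", "skill:sql"},
    {"title:data analyst", "skill:python", "skill:sql"},
]
graph = build_shared_attribute_graph(records)

by_field = modularity(len(records), graph, [0, 0, 0, 1, 1])  # health/emergency vs. tech
everyone = modularity(len(records), graph, [0] * len(records))  # one giant cluster
print(round(by_field, 3), everyone)  # 0.444 0.0
```

The grouping that separates health and emergency roles from technical roles scores higher than lumping everyone together, which always scores 0. A modularity-maximizing method such as Louvain searches for partitions like that one automatically. Library implementations exist for real use, for example NetworkX's `louvain_communities`. The all-pairs loop above is quadratic and only suits toy data, not a sample drawn from 153 million individuals.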
Degree
MS
College and Department
Physical and Mathematical Sciences; Mathematics
Rights
https://lib.byu.edu/about/copyright/
BYU ScholarsArchive Citation
Steffen, Matthew James, "Unsupervised Categorical Clustering on Labor Markets" (2023). Theses and Dissertations. 9865.
https://scholarsarchive.byu.edu/etd/9865
Date Submitted
2023-04-10
Document Type
Thesis
Handle
http://hdl.lib.byu.edu/1877/etd12703
Keywords
Categorical Clustering, Modularity Maximization, Unsupervised Clustering, Louvain, Sparse, Categorical Data
Language
English