Abstract
Deep learning models can perform many tasks very capably, provided they are trained correctly. Usually, this requires a large amount of data. Pre-training refers to the process of creating a strong initial model by first training it on a large-scale dataset. Such a model can then be adapted to many different tasks while requiring only a comparatively small amount of task-specific training data. Pre-training is the standard approach in most computer vision scenarios, but it is not without drawbacks. Aside from the cost and effort involved in collecting large pre-training datasets, such data may also contain unwanted biases, violations of privacy, inappropriate content, or copyrighted material used without permission. These issues can raise concerns about the ethical use of models trained on the data. This dissertation explores a different approach to pre-training visual models: using abstract, procedurally generated data. Such data is free from concerns around human bias, privacy, and intellectual property. It also has the potential to scale more easily and to provide precisely controllable sources of supervision that are difficult or impossible to extract from data collected in the wild from sources like the internet. The obvious disadvantage of such data is that it does not model real-world semantics and thus introduces a large domain gap. Surprisingly, however, such pre-training can lead to performance not far below that of models trained in the conventional way. This is demonstrated across different visual recognition tasks, models, and procedural data-generation processes.
Degree
PhD
College and Department
Computational, Mathematical, and Physical Sciences; Computer Science
Rights
https://lib.byu.edu/about/copyright/
BYU ScholarsArchive Citation
Anderson, Connor S., "Procedural Pre-Training for Visual Recognition" (2024). Theses and Dissertations. 10453.
https://scholarsarchive.byu.edu/etd/10453
Date Submitted
2024-06-18
Document Type
Dissertation
Handle
http://hdl.lib.byu.edu/1877/etd13291
Keywords
computer vision, deep learning, pre-training, procedural data, image recognition, correspondence prediction
Language
English