Abstract

Deep learning models can perform many tasks very capably, provided they are trained correctly. Usually, this requires a large amount of data. Pre-training refers to a process of creating a strong initial model by first training it on a large-scale dataset. Such a model can then be adapted to many different tasks, while only requiring a comparatively small amount of task-specific training data. Pre-training is the standard approach in most computer vision scenarios, but it's not without drawbacks. Aside from the cost and effort involved in collecting large pre-training datasets, such data may also contain unwanted biases, violations of privacy, inappropriate content, or copyrighted material used without permission. Such issues can lead to concerns about the ethical use of models trained using the data. This dissertation addresses a different approach to pre-training visual models by using abstract, procedurally generated data. Such data is free from the concerns around human bias, privacy, and intellectual property. It also has the potential to scale more easily, and to provide precisely controllable sources of supervision that are difficult or impossible to extract from data collected in the wild from sources like the internet. The obvious disadvantage of such data is that it doesn't model real-world semantics, and thus introduces a large domain gap. Surprisingly, however, such pre-training can lead to performance not far below that of models trained in the conventional way. This is shown for different visual recognition tasks, models, and procedural data-generation processes.

Degree

PhD

College and Department

Computational, Mathematical, and Physical Sciences; Computer Science

Rights

https://lib.byu.edu/about/copyright/

Date Submitted

2024-06-18

Document Type

Dissertation

Handle

http://hdl.lib.byu.edu/1877/etd13291

Keywords

computer vision, deep learning, pre-training, procedural data, image recognition, correspondence prediction

Language

English
