Abstract

Identity verification is ubiquitous in daily life. By verifying a user's identity, the authorization process grants the privilege to access resources or facilities or to perform certain tasks. The traditional and most prevalent authentication method is the personal identification number (PIN) or password. Because these knowledge-based credentials can be lost or stolen, biometric verification technologies have become popular alternatives in recent years. Many people now unlock their smartphones with a fingerprint or their face instead of a conventional passcode. However, these biometric approaches have weaknesses of their own. For example, fingerprints can be easily fabricated, and a photo or image can spoof a face recognition system. In addition, these existing biometric identity verification methods can succeed even when the user is unaware, asleep, or unconscious. Therefore, an additional level of security is needed. In this dissertation, we demonstrate a novel identity verification approach that makes biometric authentication more secure. Our approach requires only one regular camera to acquire a short video from which face and facial motion representations are computed, taking advantage of advances in computer vision and deep learning. Our new deep neural network model, the facial motion encoder, generates a representation vector for the facial motion in the video. A decision algorithm then compares this vector to the enrolled facial motion vector to determine their similarity for identity verification. We first proved the feasibility of this approach with a keypoint-based method. We then built a curated dataset and proposed a novel representation learning framework for facial motions. The experimental results show that this facial motion verification approach reaches an average precision of 98.8\%, which is more than adequate for everyday use.
We also tested this algorithm on complex facial motions and proposed a new self-supervised pretraining approach to boost the encoder's performance. Finally, we evaluated two other potential upstream tasks that could help improve the efficiency of facial motion encoding. Through these efforts, we have built a solid benchmark for facial motion representation learning, and the techniques presented here can inform other face analysis and video understanding research.
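The abstract describes a decision algorithm that compares an encoded facial motion vector against the enrolled vector by similarity, but does not specify the similarity measure. A minimal sketch of such a decision rule, assuming cosine similarity with an illustrative acceptance threshold (the function name, vectors, and threshold value are hypothetical, not taken from the dissertation):

```python
import numpy as np

def verify(probe_vec, enrolled_vec, threshold=0.8):
    """Accept the probe if the cosine similarity between the probe's
    facial-motion vector and the enrolled vector exceeds the threshold.

    Returns (accepted, similarity). The 0.8 threshold is illustrative;
    in practice it would be tuned on a validation set.
    """
    a = np.asarray(probe_vec, dtype=float)
    b = np.asarray(enrolled_vec, dtype=float)
    sim = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return sim >= threshold, sim

# Toy example: two nearby motion embeddings should be accepted.
accepted, sim = verify([0.20, 0.90, 0.10], [0.25, 0.85, 0.12])
```

In a real system the threshold trades off false accepts against false rejects, which is consistent with the abstract's report of average precision as the evaluation metric.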

Degree

PhD

College and Department

Ira A. Fulton College of Engineering; Electrical and Computer Engineering

Rights

https://lib.byu.edu/about/copyright/

Date Submitted

2023-10-06

Document Type

Dissertation

Handle

http://hdl.lib.byu.edu/1877/etd13419

Keywords

biometrics, contrastive learning, facial motion, identity verification, neural networks, video understanding

Language

English

Included in

Engineering Commons
