Pose estimation and tracking are essential for applications involving human control. In particular, because hands are the primary tools for human activities, hand pose estimation plays a significant role in applications such as hand tracking, gesture recognition, human-computer interaction, and VR/AR. As the field has developed, there has been a trend toward using deep learning to estimate 2D/3D hand poses from color information alone, without depth data. In both depth-based and color-based approaches, the research community has primarily focused on single-hand scenarios in a localized/normalized coordinate system. Because most applications involve both hands, we push the frontier by addressing two-hand pose estimation in the global coordinate system using only color information. Our first chapter introduces the first system capable of estimating global 3D joint locations for both hands from monocular RGB input images alone. To enable training and evaluation of learning-based models, we introduce Ego3DHands, a large-scale synthetic 3D hand pose dataset. Because knowledge learned from synthetic data cannot be directly applied to the real-world domain, a natural two-hand pose dataset is necessary for real-world applications. To this end, we present Ego2Hands, a large-scale RGB-based egocentric hand dataset, over two chapters. In Chapter 2, we address the task of two-hand segmentation/detection using images in the wild. In Chapter 3, we focus on the task of two-hand 2D/3D pose estimation using real-world data. In addition to our research in hand pose estimation, Chapter 4 presents our work on interactive refinement, which generalizes the backpropagating refinement technique to dense prediction models.



College and Department

Physical and Mathematical Sciences; Computer Science



Keywords

computer vision, deep learning, 2D, 3D, two-hand, hand pose estimation, synthetic, real-world, segmentation, detection, interactive, refinement