Abstract

3D scene understanding is an important problem that has experienced great progress in recent years, in large part due to the development of state-of-the-art methods for 3D object detection. However, the performance of 3D object detectors can suffer in scenarios where extreme occlusion of objects is present, or the number of object classes is large. In this paper, we study the problem of inferring 3D counts from densely packed scenes with heterogeneous objects. This problem has applications to important tasks such as inventory management or automatic crop yield estimation. We propose a novel regression-based method, CountNet3D, that uses mature 2D object detectors for fine-grained classification and localization, and a PointNet backbone for geometric embedding. The network processes fused data from images and point clouds for end-to-end learning of counts. We perform experiments on a novel synthetic dataset for inventory management in retail, which we construct and make publicly available to the community. We also evaluate on a proprietary dataset of real-world scenes that we collected. In addition, we run experiments to quantify the uncertainty of the models and evaluate the confidence of our predictions. Our results show that regression-based 3D counting methods systematically outperform detection-based methods, and reveal that directly learning from raw point clouds greatly assists count estimation under extreme occlusion.
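To make the fusion idea concrete, below is a minimal, hypothetical sketch of a late-fusion count regressor: a PointNet-style encoder produces a geometric embedding of the raw point cloud, which is concatenated with per-class 2D detection statistics and passed to an MLP that regresses per-class counts. This is not the CountNet3D architecture described in the thesis; the class names (e.g., FusionCountRegressor), the use of a per-class detection histogram, and all layer sizes are illustrative assumptions.

```python
# Hypothetical sketch (not the thesis implementation): a minimal late-fusion
# count regressor. Inputs per scene are assumed to be (a) raw 3D points and
# (b) a per-class histogram of visible 2D detections.
import torch
import torch.nn as nn


class SimplePointEncoder(nn.Module):
    """PointNet-style encoder: shared per-point MLP followed by max pooling."""

    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (B, N, 3) -> per-point features (B, N, F) -> global max pool (B, F)
        return self.mlp(points).max(dim=1).values


class FusionCountRegressor(nn.Module):
    """Fuses a geometric embedding with 2D detection statistics to regress counts."""

    def __init__(self, num_classes: int, feat_dim: int = 128):
        super().__init__()
        self.point_encoder = SimplePointEncoder(feat_dim)
        self.head = nn.Sequential(
            nn.Linear(feat_dim + num_classes, 128), nn.ReLU(),
            nn.Linear(128, num_classes),  # one predicted count per class
        )

    def forward(self, points: torch.Tensor, det_hist: torch.Tensor) -> torch.Tensor:
        # points: (B, N, 3) raw point cloud; det_hist: (B, C) 2D detections per class
        geom = self.point_encoder(points)
        fused = torch.cat([geom, det_hist], dim=-1)
        return self.head(fused)  # (B, C) counts; could be trained with an MSE loss


if __name__ == "__main__":
    model = FusionCountRegressor(num_classes=10)
    pts = torch.randn(2, 1024, 3)    # two toy scenes, 1024 points each
    hist = torch.rand(2, 10) * 5     # toy detection histograms
    print(model(pts, hist).shape)    # torch.Size([2, 10])
```

A regression head of this kind can predict more objects than are directly visible, which is one way to interpret the abstract's claim that learning from raw point clouds helps under extreme occlusion.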

Degree

MS

College and Department

Physical and Mathematical Sciences; Computer Science

Rights

https://lib.byu.edu/about/copyright/

Date Submitted

2023-08-30

Document Type

Thesis

Handle

http://hdl.lib.byu.edu/1877/etd12985

Keywords

3D Computer Vision, Extreme Occlusion, Deep Learning, Object Counting, Predictive Uncertainty

Language

english
