Video segmentation is an application of computer vision that aims to automate the extraction of an object from a series of video frames. It is a difficult problem, especially at real-time, interactive rates. Although general application to video is hard because of the wide range of image scenarios, user interaction can reduce the problem space and speed up the computation. This thesis presents a fast object-tracking tool that selects an object across a series of frames based on minimal user input.

Our Intelligent Rotoscoping tool aims for greater speed and accuracy than other video segmentation tools while keeping results reproducible. For speed, the tool stays ahead of the user in selecting frames and responding to feedback. For accuracy, it interprets user input so that the user does not have to edit every frame. For reproducibility, it maintains results across multiple iterations.

These goals are realized through the following process. After making a selection in a single frame, the user watches the tool rapidly propagate that selection and nudges it wherever it misses its mark. The user can thus “mold” the selection in certain frames while the tool propagates the fixes to neighboring frames. The tool has a simple interface and requires minimal preprocessing and minimal user input. It accepts any sort of film and exploits the spatio-temporal coherence of the object to be segmented. It allows artistic freedom without demanding intensive sequential processing.

This thesis includes three specific extensions to Intelligent Scissors for application to video:
1. Leapfrogging, a robust method to propagate a user's single-frame selection over multiple frames by snapping each selection to its neighboring frame.
2. Histogram snapping, a method for training each frame's cost map based on previous user selections by measuring proximity to pixels in a training set and snapping to the most similar pixel's cost.
3. A real-time feedback and correction loop that provides an intuitive interface for a user to watch and control the selection propagation; the algorithm uses this input to update the training data.
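The idea behind histogram snapping, as summarized above, can be illustrated with a minimal sketch. This is not the thesis implementation: the feature representation (a tuple of per-pixel values), the squared Euclidean distance, and the names `snap_cost` and `build_cost_map` are all assumptions made for illustration. The sketch only shows the general pattern of assigning each pixel the cost of its most similar training pixel.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: a pixel is described by a feature tuple
# (for example intensity and gradient magnitude), and each training
# sample pairs such a feature with a cost taken from an earlier selection.
Feature = Tuple[float, ...]
Sample = Tuple[Feature, float]


def squared_distance(a: Feature, b: Feature) -> float:
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def snap_cost(feature: Feature, training: Sequence[Sample]) -> float:
    """Return the cost of the training sample nearest to `feature`."""
    if not training:
        raise ValueError("training set must not be empty")
    nearest = min(training, key=lambda s: squared_distance(feature, s[0]))
    return nearest[1]


def build_cost_map(
    frame: Sequence[Sequence[Feature]], training: Sequence[Sample]
) -> List[List[float]]:
    """Build a per-pixel cost map by snapping every pixel to its
    most similar training sample."""
    return [[snap_cost(f, training) for f in row] for row in frame]


if __name__ == "__main__":
    # Two training samples: a low-cost (boundary-like) pixel and a
    # high-cost (non-boundary) pixel. The values are made up.
    training = [((10.0, 200.0), 0.1), ((120.0, 5.0), 0.9)]
    frame = [
        [(12.0, 190.0), (118.0, 9.0)],
        [(100.0, 20.0), (15.0, 180.0)],
    ]
    print(build_cost_map(frame, training))
```

In this sketch, a user correction would simply append new samples to `training` before the next frame's cost map is built. A linear scan over the training set is used for clarity; a real-time tool would likely need a faster lookup structure.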



College and Department

Physical and Mathematical Sciences; Computer Science



Date Submitted


Document Type





Keywords

video segmentation, tracking, boundary, selection, rotoscope, rotoscoping, training