Video segmentation, the task of selecting an object out of a video sequence, is fundamental to video editing and special effects. It remains an unsolved problem, however, because of difficulties such as large or rapid motion, motion blur, lighting and shadow changes, complex textures, and similar colors in the foreground and background. While the human visual system relies on multiple visual cues and a higher-order understanding of the objects involved to perceive a segmentation, current algorithms usually depend on a small amount of information to assist a user in selecting a desired object. As a result, current methods often fail even in common cases, and industry still relies largely on humans to trace the object in each frame, a tedious and expensive process. This dissertation investigates methods of segmenting video by propagating the segmentation from frame to frame, using multiple cues to maximize the information gained from each user interaction. New and existing methods are combined to propagate as much information as possible to each new frame, leveraging cues such as object colors or mixes of colors, color relationships, temporal and spatial coherence, motion, shape, and identifiable points. The cues are weighted and applied locally, according to the reliability of each cue in each region of the image, and this reliability is learned from any corrections the user makes. In this framework, every user action is examined and leveraged to extract as much information as possible toward a correct segmentation. Propagating segmentation information from frame to frame using multiple cues, and learning from user interaction, allows users to extract objects from video more quickly and accurately with less effort.



video segmentation, image segmentation, image matting, color modeling