Course Projects at UMD
Detection of Independently Moving Objects in Video
In this project, an algorithm was implemented in Matlab to detect independently moving objects in a video. The problem consists of detecting moving objects from an observer that is itself moving independently. In general the problem is extremely hard, but it can be simplified by using stereo information. Because the depth calculated from stereo and the depth calculated from optical flow differ considerably for an independently moving object, the two can be compared to perform the segmentation.
First, the camera was calibrated using images of a calibration pattern. The stereo images were then rectified to reduce the correspondence search to one dimension. For each pair of stereo images, the disparity at each pixel was computed and a depth map was calculated using the calibration information. Given the depth from stereo and the optical flow between images, estimating the camera ego-motion is a linear problem that can be solved with robust techniques such as Least Median of Squares. The ego-motion estimates were then used to solve for depth from optical flow. Finally, the depth from stereo and the depth from optical flow were compared, and regions of discrepancy were marked as independently moving objects.
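The last two steps of this pipeline can be illustrated with a minimal Python sketch (the project itself was written in Matlab). It assumes a rectified stereo pair, for which depth follows from disparity as Z = f * B / d, and it assumes that the two depth maps are compared with a simple relative threshold. The function names and the `rel_tol` value are hypothetical and are not taken from the project.

```python
def depth_from_disparity(disparity_px, focal_px, baseline):
    """Depth of a pixel in a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        return None  # no valid match, or a point at infinity
    return focal_px * baseline / disparity_px


def independent_motion_mask(depth_stereo, depth_flow, rel_tol=0.2):
    """Flag pixels whose two depth estimates disagree by more than rel_tol.

    rel_tol is a hypothetical threshold, not a value from the project.
    Pixels with a missing estimate (None) are left unflagged.
    """
    mask = []
    for row_s, row_f in zip(depth_stereo, depth_flow):
        mask_row = []
        for zs, zf in zip(row_s, row_f):
            if zs is None or zf is None:
                mask_row.append(False)
            else:
                # Relative disagreement, measured against the stereo depth.
                mask_row.append(abs(zs - zf) > rel_tol * zs)
        mask.append(mask_row)
    return mask
```

A relative rather than absolute threshold is used here because stereo depth error grows with distance; a real implementation would likely also smooth the mask spatially.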
[Videos: Original Video with Independently Moving Person; Segmented Video]
Video Manipulation: Removing Banners
The aim of this project was to manipulate a video by deleting and inserting objects in it. Given an ice-skating video showing a banner, the goal is to replace the existing banner with some other texture. This involves identifying the banner pixels in each frame (parts of the banner may be occluded by the skaters) and replacing them with the new texture at the un-occluded pixels. Here we use the fact that the banner is planar, so its images in different frames are related by a planar homography.
First, the corners of the banner are located manually in the first frame. Feature points lying inside the banner boundary are then extracted automatically in the first frame using a corner detector. These features are tracked in subsequent frames by block matching, with the feature points transferred using the homography from the previous frame. Four point correspondences are the minimum needed to calculate the homography between frames, but many more are used. Robust Least Median of Squares estimation is used because the skaters can occlude a significant part of the banner.
After the homographies between the first frame and all other frames are calculated, all frames are warped to the first frame and the per-pixel median of the warped images is computed. The assumption is that no pixel is occluded in more than 50% of the frames, so that the median is a good estimate. The median image therefore shows the banner completely un-occluded. Using the median image, the un-occluded banner pixels are identified in each frame and replaced by the new banner.
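The robust estimation step can be sketched in pure Python as follows. This is only an illustration under stated assumptions, not the project's Matlab code: each trial fits a homography to four randomly chosen correspondences (with the last entry fixed to 1), and the candidate with the smallest median squared transfer error is kept. All function names, the trial count, and the seed are hypothetical.

```python
import random


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (None if singular)."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def homography_from_4(src, dst):
    """Homography as a 9-vector (row-major, h[8] = 1) from four correspondences."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        A.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(A, b)
    return None if h is None else h + [1.0]


def apply_homography(h, p):
    """Map point p through h; returns None if the point maps to infinity."""
    x, y = p
    w = h[6] * x + h[7] * y + h[8]
    if abs(w) < 1e-12:
        return None
    return ((h[0] * x + h[1] * y + h[2]) / w, (h[3] * x + h[4] * y + h[5]) / w)


def squared_error(h, p, q):
    """Squared transfer error between h(p) and the observed point q."""
    mapped = apply_homography(h, p)
    if mapped is None:
        return float("inf")
    du, dv = mapped[0] - q[0], mapped[1] - q[1]
    return du * du + dv * dv


def lmeds_homography(src, dst, trials=200, seed=0):
    """Least Median of Squares: keep the candidate with the smallest median error."""
    rng = random.Random(seed)
    best_h, best_median = None, float("inf")
    for _ in range(trials):
        sample = rng.sample(range(len(src)), 4)
        h = homography_from_4([src[i] for i in sample], [dst[i] for i in sample])
        if h is None:
            continue  # degenerate sample, e.g. three collinear points
        residuals = sorted(squared_error(h, p, q) for p, q in zip(src, dst))
        median = residuals[len(residuals) // 2]
        if median < best_median:
            best_h, best_median = h, median
    return best_h, best_median
```

Because the score is a median, the estimate is unaffected as long as fewer than half of the correspondences are corrupted, for example by a skater passing in front of the banner. A fuller implementation would typically refit the homography to all inliers after the best candidate is found.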
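The median step can be illustrated with a small Python sketch. It assumes the frames are grayscale, given as nested lists, and already warped to the first frame; the tolerance `tol` and the function names are hypothetical. For simplicity the replacement is done in first-frame coordinates, whereas a full pipeline would warp the new banner back into each frame with the inverse homography.

```python
from statistics import median


def median_image(aligned_frames):
    """Per-pixel temporal median of frames already warped to a common frame."""
    rows, cols = len(aligned_frames[0]), len(aligned_frames[0][0])
    return [[median(f[r][c] for f in aligned_frames) for c in range(cols)]
            for r in range(rows)]


def unoccluded_mask(frame, background, tol=10):
    """True where the frame agrees with the median image (banner visible)."""
    return [[abs(p - b) <= tol for p, b in zip(frame_row, bg_row)]
            for frame_row, bg_row in zip(frame, background)]


def replace_banner(frame, mask, new_banner):
    """Paste the new banner only at un-occluded pixels; keep occluders as they are."""
    return [[n if m else p for p, m, n in zip(frame_row, mask_row, banner_row)]
            for frame_row, mask_row, banner_row in zip(frame, mask, new_banner)]
```

With three aligned 1x3 frames `[[100, 100, 100]]`, `[[100, 20, 100]]` and `[[100, 100, 30]]`, the median image is `[[100, 100, 100]]`, and the dark pixel in the second frame is treated as an occluder and left untouched.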
[Videos: Original Skating Video with Baileys Banner; Replaced Baileys Banner with Coca-Cola Banner]
Video Manipulation: Removing Persons
The aim of this project was to remove a person from a video captured with a camera rotating around the Y axis. Since the camera rotates without translating, different frames are related by a planar homography. A moving object or person in the video does not adhere to this homography, but any stationary object (background, etc.) does, and on this basis the moving person can be identified and removed.
First, inter-frame homographies are computed from features detected in the frames with a corner detector. Least Median of Squares is used for robust computation of the homographies. All frames are then warped to a single reference frame and the per-pixel median of the warped frames is computed. Again, the assumption is that no pixel belongs to the independently moving person in more than 50% of the frames. The median image is then a mosaic of the scene without the independently moving person. Using the homographies obtained earlier, the median mosaic can simply be warped back to each frame to produce the video without the person.
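For a purely rotating camera with intrinsic matrix K, the homography between two frames has the standard closed form H = K R K^-1, where R is the relative rotation. The Python sketch below builds H for a rotation about the Y axis. It assumes a simple pinhole model with a single focal length and no skew; the sign convention of the rotation and all names are illustrative assumptions.

```python
import math


def matmul3(A, B):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_homography(theta, f, cx, cy):
    """H = K R K^-1 for a camera rotated by theta radians about its Y axis."""
    c, s = math.cos(theta), math.sin(theta)
    K = [[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]]
    K_inv = [[1.0 / f, 0.0, -cx / f], [0.0, 1.0 / f, -cy / f], [0.0, 0.0, 1.0]]
    R = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
    return matmul3(matmul3(K, R), K_inv)


def warp_point(H, x, y):
    """Apply H to pixel (x, y) and dehomogenize."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w
```

Under this convention the principal point (cx, cy) moves horizontally by f * tan(theta) and keeps its row, which matches the intuition of a camera panning about the vertical axis. Pixels on a moving person do not follow this mapping, which is what makes them separable from the background.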
[Videos: Original Video with Person; Person Removed from Video]
Estimating Relative Camera Rotation and Translation from Two Images
The aim here was to solve the relative orientation problem. Given a pair of images from a camera, feature points are detected and matched, and the fundamental matrix is computed. Using the camera calibration parameters, the essential matrix is obtained from the fundamental matrix and then decomposed to give the camera rotation and translation (the latter only up to scale). In general there are four solutions, but the constraint that scene points must have positive depth identifies the correct one: the solution that gives positive depth for the largest number of scene points is chosen.
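For reference, the decomposition can be written out in its standard textbook form. The notation below (K for the calibration matrix, assumed identical for both images; F for the fundamental matrix; u_3 for the last column of U) is an assumption of this sketch and is not taken from the project report.

```latex
\begin{aligned}
E &= K^{\top} F K, \qquad E = U \,\mathrm{diag}(1, 1, 0)\, V^{\top} \quad \text{(up to scale)},\\
W &= \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix},\\
R &\in \{\, U W V^{\top},\; U W^{\top} V^{\top} \,\}, \qquad t \in \{\, +u_3,\; -u_3 \,\}.
\end{aligned}
```

The signs of U and V are chosen so that det(R) = +1. Combining the two rotations with the two translation signs gives the four candidate solutions; triangulating the matched points under each candidate and counting those with positive depth in both cameras selects the physically valid one.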
Structure from Motion (SFM) using Factorization theorem and Extended Kalman Filter (EKF)
From a set of corresponding feature points in N images, a feature matrix can be formed and factorized to give the shape and motion under orthographic projection. This popular method of obtaining shape and motion simultaneously is called the Factorization theorem, and it has been extended to para-perspective projection. The Factorization theorem was implemented in Matlab for both orthographic and para-perspective projection. It is a batch procedure, in that it uses the feature points from all images simultaneously.
In certain situations, it is preferable to use a sequential method for SFM in order to save memory. EKFs have been applied in such cases to refine the shape estimate with each new frame's set of features and to estimate the inter-frame camera motion. The EKF-based method was implemented in Matlab. The EKF was initialized using a batch procedure, which also provided the initial covariance matrices. See the report for details.
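The orthographic case can be summarized in a few lines of math. This is the usual textbook formulation, given as a sketch; the symbols (P for the number of points, Q for the 3x3 metric upgrade, and the hatted rows of the motion matrix) are notation assumed here rather than taken from the report.

```latex
\begin{aligned}
\tilde{W} &= W - \bar{w}\,\mathbf{1}^{\top} \in \mathbb{R}^{2N \times P}
  && \text{(subtract each row's mean from the feature matrix)},\\
\tilde{W} &= M S, \quad M \in \mathbb{R}^{2N \times 3},\; S \in \mathbb{R}^{3 \times P}
  && \Rightarrow \operatorname{rank}(\tilde{W}) \le 3,\\
\tilde{W} &\approx U_3 \Sigma_3 V_3^{\top}, \quad
  \hat{M} = U_3 \Sigma_3^{1/2}, \quad \hat{S} = \Sigma_3^{1/2} V_3^{\top}
  && \text{(rank-3 truncated SVD)},\\
M &= \hat{M} Q, \quad S = Q^{-1} \hat{S}, \quad
  \hat{i}_f^{\top} Q Q^{\top} \hat{i}_f = \hat{j}_f^{\top} Q Q^{\top} \hat{j}_f = 1, \quad
  \hat{i}_f^{\top} Q Q^{\top} \hat{j}_f = 0
  && (f = 1, \dots, N).
\end{aligned}
```

Here the rows of the motion matrix M are the image-plane axes of each camera, the columns of S are the 3D points, and the vectors with hats are the corresponding rows of the affine motion estimate for frame f. The last line states the metric constraints (unit-length, mutually orthogonal camera axes) that fix the otherwise arbitrary invertible matrix Q.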
Supervised and Unsupervised Texture Segmentation using Markov Random Fields
In this project, different techniques for texture segmentation were implemented in Matlab. The underlying texture was modeled as a Markov random field, and both supervised and unsupervised segmentation were implemented and compared. See the report for details.