Graduation Year

2012

Document Type

Dissertation

Degree

Ph.D.

Degree Granting Department

Computer Science and Engineering

Major Professor

Sudeep Sarkar

Keywords

Correspondence, Epipolar Geometry, GPS, Magnetometer, Minimum Spanning Forest

Abstract

We propose algorithms for organizing images in wide-area sparse-view datasets. In such datasets, images that overlap in scene content are related by wide-baseline geometric transformations. The challenge is to identify these relations even when the images overlap only sparsely in content. The images in a dataset are then grouped into sets of related images, with the relations in each set captured as a basal (minimal and foundational) graph structure. Images form the vertices of the graph, and the edges define the geometric relations between the images. We use these basal graphs for geometric walkthroughs and for detecting noisy location (GPS) and orientation (magnetometer) information that may be stored with each image.
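
As a rough illustration of grouping related images into minimal graph structures, the following sketch builds a minimum spanning forest with Kruskal's algorithm and a union-find structure. It is not the dissertation's algorithm: the edge costs, the function names, and the assumption that an edge exists only between image pairs already judged to be related are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical edge format: (cost, image_u, image_v). A lower cost means a
# stronger match. Only pairs already judged to be related have an edge.
Edge = Tuple[float, int, int]


def minimum_spanning_forest(num_images: int, edges: List[Edge]) -> List[Edge]:
    """Kruskal's algorithm: returns one minimum spanning tree per component."""
    parent = list(range(num_images))

    def find(x: int) -> int:
        # Follow parent links to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    forest: List[Edge] = []
    for cost, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two separate trees, so keep it
            parent[root_u] = root_v
            forest.append((cost, u, v))
    return forest


def connected_groups(num_images: int, forest: List[Edge]) -> List[List[int]]:
    """Lists the image ids in each tree of the forest."""
    neighbors = {i: [] for i in range(num_images)}
    for _, u, v in forest:
        neighbors[u].append(v)
        neighbors[v].append(u)
    seen, groups = set(), []
    for start in range(num_images):
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:
            node = stack.pop()
            group.append(node)
            for nb in neighbors[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        groups.append(sorted(group))
    return groups


edges = [(0.2, 0, 1), (0.5, 1, 2), (0.9, 0, 2), (0.3, 3, 4)]
forest = minimum_spanning_forest(5, edges)
groups = connected_groups(5, forest)
```

In this toy input, images 0 to 2 form one group and images 3 and 4 another; the redundant edge between images 0 and 2 is dropped because the cheaper edges already connect them.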

We make five algorithmic contributions. First, we propose BLOGS (Balanced Local and Global Search), an algorithm that uses a novel hybrid Markov Chain Monte Carlo (MCMC) strategy called 'hop-diffusion' to estimate the epipolar geometry between a pair of wide-baseline images. It is 10 times faster and more accurate than the state of the art. Hops are global searches and diffusions are local searches. BLOGS can handle the very wide-baseline views characteristic of wide-area sparse-view datasets. It also produces a geometric match score between an image pair. Second, we propose a photometric match score, the Cumulative Correspondence Score (CCS). The proposed photometric scores are fast approximations of the computationally expensive geometric scores. Third, we use the photometric and geometric scores to find groups of related images and to organize them as basal graph structures, using a novel hybrid algorithm we call the COnnected component DIscovery by Minimally Specifying an Expensive Graph (CODIMSEG). The objective of the algorithm is to minimize the number of geometric estimations while yielding results similar to those of all-pair geometric matching. We compared the performance of the CCS and CODIMSEG algorithms with GIST (means summary of an image) and k-Nearest Neighbor (k-NN) based approaches. We found that CCS and CODIMSEG perform significantly better than GIST and k-NN, respectively, in identifying visually connected images. Our algorithm achieved a true positive rate of more than 95% at a 0% false positive rate. Fourth, we propose a basal tree graph expansion algorithm that makes the basal graphs denser for applications such as geometric walkthroughs using the minimum Hamiltonian path algorithm and detection of noisy position (GPS) and orientation (magnetometer) tags. We propose two versions of geometric walkthroughs, one using a minimum spanning tree based approximation of the minimum Hamiltonian path on the basal tree graphs and the other using the Lin-Kernighan heuristic approximation on the expanded basal graph. Converting a non-linear tree structure to a linear path structure leads to discontinuities in the path. The Lin-Kernighan algorithm on the expanded basal graphs is shown to be the better approach. Fifth, we propose a vision-based geometric voting algorithm to detect noisy GPS and magnetometer tags using the basal graphs. To the best of our knowledge, this problem has not been addressed before.
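
To illustrate why flattening a tree into a path introduces discontinuities, the sketch below orders the vertices of a small tree by a depth-first preorder walk, a standard way to derive a path from a spanning tree, and then lists the consecutive pairs that are not adjacent in the tree. The adjacency-list format and the function names are assumptions made for this example. It is not the dissertation's implementation, and the Lin-Kernighan variant is not shown.

```python
from typing import Dict, List, Tuple

# Hypothetical tree format: image id -> list of neighboring image ids.
Tree = Dict[int, List[int]]


def preorder_walk(tree: Tree, root: int) -> List[int]:
    """Visits every vertex reachable from root in depth-first preorder."""
    order: List[int] = []
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push neighbors in reverse so that smaller ids are visited first.
        for nb in sorted(tree.get(node, []), reverse=True):
            if nb not in seen:
                stack.append(nb)
    return order


def discontinuities(order: List[int], tree: Tree) -> List[Tuple[int, int]]:
    """Consecutive pairs in the walk that share no edge in the tree."""
    return [(a, b) for a, b in zip(order, order[1:]) if b not in tree.get(a, [])]


# A small tree with edges 0-1, 1-2, 1-3 and 0-4.
tree = {0: [1, 4], 1: [0, 2, 3], 2: [1], 3: [1], 4: [0]}
order = preorder_walk(tree, root=0)
jumps = discontinuities(order, tree)
```

Each pair in `jumps` marks a point where the linear walkthrough moves between two images that the tree does not directly relate. A denser graph offers more edges to choose from, which is one intuition for why a path computed on an expanded graph can avoid such jumps.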

We performed our experiments on the Nokia dataset (which has 243 images in the 'Lausanne' dataset and 105 images in the 'Demoset'), the ArtQuad dataset (6514 images), and the Oxford dataset (5063 images). The three datasets are very different. The Nokia dataset is a very wide-baseline sparse-view dataset. The ArtQuad dataset is a wide-baseline dataset with denser views than the Nokia dataset. Both of these datasets have GPS-tagged images, and the Nokia dataset has magnetometer tags as well. The ArtQuad dataset has 348 images with both commercial GPS information and high-precision differential GPS data, which serves as ground truth for our noisy tag detection algorithm. The Oxford dataset is a wide-baseline dataset with plenty of distracters that test the algorithm's ability to group images correctly. The larger datasets test the scalability of our algorithms. Visually inspected feature matches and image matches were used as ground truth in our experiments. All experiments were run on a single PC.
