Graduation Year

2017

Document Type

Dissertation

Degree

Ph.D.

Degree Name

Doctor of Philosophy (Ph.D.)

Degree Granting Department

Computer Science and Engineering

Major Professor

Sudeep Sarkar, Ph.D.

Co-Major Professor

Anuj Srivastava, Ph.D.

Committee Member

Dmitry Goldgof, Ph.D.

Committee Member

Rangachar Kasturi, Ph.D.

Committee Member

Rajiv Dubey, Ph.D.

Committee Member

Andrew Raij, Ph.D.

Keywords

Activity Recognition, Compositional Approach, Graphical Models, Pattern Theory, Video Analysis

Abstract

Description of human activities in videos results not only in detection of actions and objects but also in identification of their active semantic relationships in the scene. Towards this broader goal, we present a combinatorial approach that assumes availability of algorithms for detecting and labeling objects and actions, albeit with some errors. Given these uncertain labels and detected objects, we link them into interpretative structures using domain knowledge encoded with concepts of Grenander’s general pattern theory. Here a semantic video description is built using basic units, termed generators, that represent labels of objects or actions. These generators have multiple out-bonds, each associated with either a type of domain semantics, spatial constraints, temporal constraints or image/video evidence. Generators combine between each other, according to a set of pre-defined combination rules that capture domain semantics, to form larger structures known as configurations, which here will be used to represent video descriptions. Such connected structures of generators are called configurations. This framework offers a powerful representational scheme for its flexibility in spanning a space of interpretative structures (configurations) of varying sizes and structural complexity. We impose a probability distribution on the configuration space, with inferences generated using a Markov Chain Monte Carlo-based simulated annealing algorithm. The primary advantage of the approach is that it handles known computer vision challenges – appearance variability, errors in object label annotation, object clutter, simultaneous events, temporal dependency encoding, etc. – without the need for a exponentially- large (labeled) training data set.

Share

COinS