Graduation Year


Document Type




Degree Granting Department

Computer Science

Major Professor

Sudeep Sarkar


blob detection, normalized cut, connected component in XYT, auto regressive model, normalization, blob grouping


Counting people from video is a challenging problem. Although the task is well defined, the imaging scenario is poorly constrained: the background scene is uncontrolled, lighting is complex and varying, and image resolution, both spatial and temporal, is usually poor, especially in pre-stored surveillance videos. Passive counting of people from video has many practical applications, such as monitoring the number of people sitting in front of a TV set, counting people in an elevator, counting people passing through a security door, and counting people in a mall. This has led to some research in automated people counting. Most of this work addresses counting pedestrians in outdoor settings or moving subjects in indoor settings.

There is little work on counting people who are not moving around, and very little on people counting that can handle harsh variations in illumination. In this thesis, we explore a design that handles these issues at the pixel level using photometry-based normalization and at the feature level by exploiting the spatiotemporal coherence present in the change seen in the video. We have worked with a home dataset and a laboratory dataset. The home dataset has subjects watching television, and the laboratory dataset has subjects working. The people counter is designed for video data that is sparsely sampled in time, with 15 seconds between consecutive frames. The specific computer vision methods used are image intensity normalization, frame-to-frame differencing, motion accumulation using an autoregressive model, and grouping in the spatiotemporal volume.

The experimental results show that the algorithm is less susceptible to lighting changes: given an empty scene with just a lighting change, it usually produces a count of zero. It can count in varying illumination conditions, and it can count people even if they are only partially visible. Counts are generated for any moving object in the scene; the algorithm does not yet try to distinguish between humans and non-humans. Counting errors are concentrated around frames with large motion events, such as a person moving out of the scene.
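
The processing steps named in the abstract (intensity normalization, frame-to-frame differencing, autoregressive motion accumulation, and grouping in the XYT volume) can be illustrated with a minimal sketch. It is not the thesis implementation. The abstract does not specify the details, so several choices here are assumptions made for illustration: normalization divides each frame by its mean intensity, the autoregressive model is first order with a hypothetical weight `alpha`, and grouping uses 6-connected components over a fixed `threshold`. All function names are hypothetical.

```python
from collections import deque


def normalize(frame):
    """Divide by the frame mean so a global gain change cancels (assumed scheme)."""
    total = sum(sum(row) for row in frame)
    mean = total / (len(frame) * len(frame[0])) if total else 1.0
    return [[p / mean for p in row] for row in frame]


def accumulate_motion(frames, alpha=0.5):
    """First-order autoregressive accumulation of normalized frame differences.

    acc_t = alpha * acc_(t-1) + (1 - alpha) * |frame_t - frame_(t-1)|
    Returns one accumulated image per consecutive frame pair.
    """
    norm = [normalize(f) for f in frames]
    h, w = len(frames[0]), len(frames[0][0])
    acc = [[0.0] * w for _ in range(h)]
    volume = []
    for prev, cur in zip(norm, norm[1:]):
        acc = [[alpha * acc[y][x] + (1 - alpha) * abs(cur[y][x] - prev[y][x])
                for x in range(w)] for y in range(h)]
        volume.append(acc)
    return volume


def label_xyt(volume, threshold=0.5):
    """Label 6-connected components of above-threshold voxels in (t, y, x)."""
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    labels = [[[0] * W for _ in range(H)] for _ in range(T)]
    steps = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))
    next_label = 0
    for t in range(T):
        for y in range(H):
            for x in range(W):
                if labels[t][y][x] or volume[t][y][x] <= threshold:
                    continue
                next_label += 1
                labels[t][y][x] = next_label
                queue = deque([(t, y, x)])
                while queue:  # breadth-first flood fill through space and time
                    ct, cy, cx = queue.popleft()
                    for dt, dy, dx in steps:
                        nt, ny, nx = ct + dt, cy + dy, cx + dx
                        if (0 <= nt < T and 0 <= ny < H and 0 <= nx < W
                                and not labels[nt][ny][nx]
                                and volume[nt][ny][nx] > threshold):
                            labels[nt][ny][nx] = next_label
                            queue.append((nt, ny, nx))
    return labels


def count_per_frame(labels):
    """Count the distinct spatiotemporal blobs that intersect each frame."""
    return [len({v for row in frame for v in row if v}) for frame in labels]
```

Calling `count_per_frame(label_xyt(accumulate_motion(frames)))` on a list of grayscale frames (nested lists of intensities) yields one count per consecutive frame pair. Under these assumptions, a uniform brightness change over an empty scene cancels in the normalization step, and the autoregressive term keeps a blob above threshold for a few frames after its motion stops, which is one way to keep counting subjects who are mostly stationary.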