Traffic Pattern Modeling and Prediction in Sensor Networks


Experimental Setup

Experimental setup in a smart building. Five cameras are placed at the intersections of stairways, elevators, and hallways.

Sample Training Videos from five cameras

Sample sequences from cameras 1, 2, 4, and 5

Sample frames from four cameras. The frames are captured at approximately 4 frames per second. The lighting conditions change over time and space.

Pedestrian Detection Results

One advantage of using HOG to aid feature extraction is that we can handle multi-detection cases. During training, there are cases in which more than one subject is detected in a frame, or in which the bounding box coordinates in two consecutive frames are far apart. The figures and videos below show some mixed-detection examples. In figures (a) and (c), the background image is one selected frame of the sequence. The colored rectangles represent the bounding boxes of the HOG detection results.







In (a) and (b), one subject remains on the right side of the view while another subject walks from the left corner to the right. In this sequence, their detected bounding boxes do not overlap, although their detections are interleaved in time. It would clearly be incorrect to assign all the detections in this sequence to one subject just because they are continuous in time. In this case, we can still easily separate the detections into two clusters, because the bounding boxes of each subject are spatially grouped.
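The separation described above can be illustrated with a small sketch. It is not the method used in the paper: it assumes a greedy rule in which each detection joins the group whose most recent bounding box center is nearest, and starts a new group when no center lies within a hypothetical `max_jump` distance (in pixels). The function names and the `(x, y, w, h)` box layout are also assumptions.

```python
from math import hypot

def box_center(box):
    """Center of a bounding box given as (x, y, w, h)."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)

def group_by_position(detections, max_jump=50.0):
    """Group time-interleaved detections by spatial proximity.

    detections: list of (frame_index, (x, y, w, h)), sorted by frame index.
    max_jump:   assumed largest center displacement (pixels) between two
                consecutive detections of the same subject.
    Returns a list of groups, each a list of (frame_index, box).
    """
    groups = []
    for frame, box in detections:
        cx, cy = box_center(box)
        best, best_dist = None, max_jump
        for group in groups:
            px, py = box_center(group[-1][1])  # last box seen in this group
            dist = hypot(cx - px, cy - py)
            if dist <= best_dist:
                best, best_dist = group, dist
        if best is None:
            groups.append([(frame, box)])      # too far from every group
        else:
            best.append((frame, box))
    return groups

# One subject stays on the right; another enters from the left.
detections = [
    (0, (300, 100, 40, 80)), (1, (10, 100, 40, 80)),
    (2, (300, 100, 40, 80)), (3, (40, 100, 40, 80)),
    (4, (300, 100, 40, 80)), (5, (70, 100, 40, 80)),
]
groups = group_by_position(detections)
```

A greedy rule like this only works while the two subjects stay apart; once their boxes overlap, an additional cue is needed.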

In the sequence shown in (c) and (d), one subject first walks upstairs from the first floor to the third floor (the camera is placed on the second floor); then another subject appears from the stairs and walks towards the right of the view. Although the detected bounding boxes overlap and are interleaved in time, we can still make use of the direction cue: if the bounding box in the sequence first moves leftwards and then suddenly appears to the right, we cannot be sure that those frames belong to the same cluster. To verify this, we can perform local feature matching: assume the turning point happens at time t, and compute the distance between the two parts of the detection sequence (before and after t). If the distance is small, the two parts are classified as the same cluster, i.e., as belonging to the same subject.
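The turning-point check above might be sketched as follows. The text does not say which feature or distance is used for the local matching, so this sketch assumes one normalized histogram per frame (flattened to a list that sums to 1) and compares the mean histograms before and after t using one minus the histogram intersection. The thresholds `min_jump` and `max_distance` are hypothetical values chosen for illustration.

```python
def find_turning_point(centers_x, min_jump=40.0):
    """Index t of the first frame where the box center jumps to the right
    by at least min_jump pixels after moving leftwards, or None."""
    for t in range(2, len(centers_x)):
        moving_left = centers_x[t - 1] < centers_x[t - 2]
        jumped_right = centers_x[t] - centers_x[t - 1] >= min_jump
        if moving_left and jumped_right:
            return t
    return None

def mean_histogram(hists):
    """Element-wise mean of equally sized histograms."""
    n = len(hists)
    return [sum(h[i] for h in hists) / n for i in range(len(hists[0]))]

def intersection_distance(h1, h2):
    """1 - histogram intersection; 0 for identical normalized histograms."""
    return 1.0 - sum(min(a, b) for a, b in zip(h1, h2))

def same_subject(centers_x, hists, max_distance=0.3):
    """True if the parts before and after the turning point look alike."""
    t = find_turning_point(centers_x)
    if t is None:
        return True  # no sudden reversal, nothing to split
    before = mean_histogram(hists[:t])
    after = mean_histogram(hists[t:])
    return intersection_distance(before, after) <= max_distance

# Box drifts left, then reappears on the right with a different color profile.
centers_x = [200, 180, 160, 260, 270]
hists = [[0.8, 0.2, 0.0]] * 3 + [[0.1, 0.1, 0.8]] * 2
split_needed = not same_subject(centers_x, hists)
```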

Experimental Results

Human Representation

In the paper, we use a normalized RG histogram with 20 x 20 bins, computed over the upper human torso, to represent a person. After fitting an ellipse inside the bounding box of a detected person to remove background pixels, we compute the normalized RG histogram for the foreground part (within the ellipse). The full normalized RG histogram has 64 x 64 bins (64 bins each for the R and G channels). In the experiments (as shown in the three figures below), we found that the non-zero components fall mainly in the (10:30, 10:30) range. Therefore, to reduce computational cost and to improve matching capability, we use only these 20 x 20 bins to represent a person.
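A minimal sketch of this representation is given below, under several assumptions not stated in the text: normalized chromaticity is taken as r = R/(R+G+B) and g = G/(R+G+B), the ellipse is the one inscribed in the bounding box, the (10:30) range is read as the half-open bin range 10..29, and counts are normalized by the number of foreground pixels. The function names are hypothetical.

```python
def inside_ellipse(x, y, width, height):
    """True if the center of pixel (x, y) lies inside the ellipse
    inscribed in a width x height bounding box."""
    cx, cy = width / 2.0, height / 2.0
    nx = (x + 0.5 - cx) / cx
    ny = (y + 0.5 - cy) / cy
    return nx * nx + ny * ny <= 1.0

def rg_histogram(patch, bins=64, lo=10, hi=30):
    """Cropped normalized RG histogram of the pixels inside the ellipse.

    patch: rows of (R, G, B) tuples, already cropped to the bounding box.
    Returns a (hi - lo) x (hi - lo) list of lists, indexed [r_bin][g_bin].
    """
    height, width = len(patch), len(patch[0])
    size = hi - lo
    hist = [[0.0] * size for _ in range(size)]
    total = 0
    for y, row in enumerate(patch):
        for x, (r, g, b) in enumerate(row):
            if not inside_ellipse(x, y, width, height):
                continue  # treated as background
            s = r + g + b
            if s == 0:
                continue  # black pixel: chromaticity is undefined
            total += 1
            r_bin = min(int(bins * r / s), bins - 1)
            g_bin = min(int(bins * g / s), bins - 1)
            if lo <= r_bin < hi and lo <= g_bin < hi:
                hist[r_bin - lo][g_bin - lo] += 1.0
    if total:
        hist = [[v / total for v in row] for row in hist]
    return hist

# A uniformly gray patch has r = g = 1/3, which falls in bin 21 of 64.
gray_patch = [[(100, 100, 100)] * 10 for _ in range(10)]
gray_hist = rg_histogram(gray_patch)
```

Restricting the ellipse test to the upper rows of the patch would give the upper-torso variant.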

[Figures: RG hist1, RG hist2, RG hist3]

We have also conducted experiments to compare the matching performance of the histogram from the upper torso (the upper part of the ellipse) with that of the histogram from the entire torso (the whole ellipse). Surprisingly, the former proves to be better. The reason may be that the color of the lower torso, i.e., the color of the pants, varies little between subjects compared to the upper part, so including it in the matching process may actually reduce discrimination. In addition, the lower part of the ellipse inevitably includes some background pixels (always a larger portion than in the upper part), which further reduces the identification capability.

Matching Metric

We have run a series of experiments to find an appropriate matching metric. The candidates include histogram intersection, Bhattacharyya distance, chi-square distance, sum of squared differences (SSD), and earth mover's distance (EMD). Histogram intersection proves to be simple yet effective. For example, in a 6-subject matching experiment with the 20 x 20 normalized RG histogram, only histogram intersection and Bhattacharyya distance achieve perfect matching performance. Histogram intersection is also the more attractive of the two for its simplicity if we want to build a real-time online system in the future.
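For reference, the two metrics that performed best can be written in a few lines. This is an illustrative sketch, not the code used in the experiments: histograms are assumed to be flattened lists normalized to sum to 1, the Bhattacharyya distance is taken in the sqrt(1 - BC) form (other definitions, such as -ln BC, exist), and `best_match` with its label-to-histogram `gallery` is a hypothetical helper.

```python
from math import sqrt

def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for normalized histograms; 1 means identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def bhattacharyya_distance(h1, h2):
    """sqrt(1 - BC), where BC is the Bhattacharyya coefficient."""
    bc = sum(sqrt(a * b) for a, b in zip(h1, h2))
    return sqrt(max(0.0, 1.0 - bc))  # clamp guards against rounding error

def best_match(query, gallery):
    """Label of the gallery histogram with the largest intersection."""
    return max(gallery,
               key=lambda label: histogram_intersection(query, gallery[label]))

gallery = {"a": [0.7, 0.2, 0.1], "b": [0.1, 0.2, 0.7]}
query = [0.6, 0.3, 0.1]
matched = best_match(query, gallery)
```

Histogram intersection needs only one comparison and one addition per bin, which is part of its appeal for a real-time system.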

A brief introduction to our work, with some earlier results, can be found here.

Back to Zaihong Shuai's Home Page