
Crowd Watching: Video Analytics Could Flag Crimes Before They Happen

This article was published in Scientific American’s former blog network and reflects the views of the author, not necessarily those of Scientific American


Soon after the investigation into Monday's Boston Marathon bombings began, law enforcement urged the public to e-mail any video, images or other information that might lead them to the guilty party. "No piece of information or detail is too small," states the F.B.I.'s Web site. Picking through all of this footage in search of clues has been no small task for investigators, given the size of the camera-carrying crowd that had assembled to watch the race, not to mention the video surveillance already put in place by the city and local merchants.

Law enforcement officials now say they have found video images of two separate suspects carrying black bags at each explosion site and plan to release the images Thursday so that the public can help identify the men, the Boston Globe reports.

Whereas software for analyzing such video can identify and flag objects, colors and even patterns of behavior after the fact, the hope is that someday soon intelligent video camera setups will be able to detect suspicious activity and issue immediate warnings in time to prevent future tragedies.



A team of New York University researchers is working toward that goal, having developed software they say can measure the "sentiment" of people in a crowd. So far, the technology has primarily been tested as a marketing tool at sporting events (gauging what advertisements capture an audience’s attention, for example), but the researchers are eyeing homeland security applications as well. The U.S. military, which is funding much of the N.Y.U. research, is interested in knowing whether this software could detect when someone is approaching a checkpoint or base with a weapon or explosives concealed under their clothing.

"So far, we can detect if they're eating or using their cell phones or clapping," says N.Y.U. computer science professor Chris Bregler. It's not an exact science, but monitoring crowd behavior helps marketers understand what creates a positive crowd response—whether they are high-fiving action on the field, responding to a call for "the wave" or laughing at an advertisement on the scoreboard. The software is programmed to detect only positive sentiment at this time. Negative sentiments—booing and impolite gestures--are next on the researchers' agenda.

The key to analyzing video in real time is programming the accompanying analytical software to look for certain cues (a rigid object under soft, flowing clothing, for example) and issue immediate alerts. First, the software must be "trained," Bregler says. This is done with the help of Internet services such as Amazon's Mechanical Turk digital labor marketplace, where participants are paid to analyze and tag video footage based on what's on the screen. Bregler and his team load these results into a computer neural network, a cluster of microprocessors that essentially analyzes relationships among data, so that the software can eventually identify this activity on its own.
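
To make the training step concrete, here is a minimal sketch of how tags from several paid workers might be combined into a single label per video clip before being fed to a learning system. The article does not say how the N.Y.U. team reconciles conflicting tags, so the majority-vote rule, the `aggregate_tags` function, the agreement threshold and the clip names below are all illustrative assumptions, not the researchers' actual method.

```python
from collections import Counter

def aggregate_tags(worker_tags, min_agreement=0.6):
    """Majority-vote the tags that several workers gave each clip.

    Hypothetical helper: keeps a clip only if its most common tag was
    chosen by at least `min_agreement` of the workers who tagged it.
    """
    labels = {}
    for clip_id, tags in worker_tags.items():
        if not tags:
            continue  # no one tagged this clip, so there is nothing to learn from
        tag, votes = Counter(tags).most_common(1)[0]
        if votes / len(tags) >= min_agreement:
            labels[clip_id] = tag
    return labels

# Made-up example data: three workers tagged each clip.
worker_tags = {
    "clip_001": ["clapping", "clapping", "eating"],
    "clip_002": ["phone", "eating", "clapping"],
    "clip_003": ["phone", "phone", "phone"],
}

training_labels = aggregate_tags(worker_tags)
print(training_labels)
```

In this sketch, clips on which the workers disagree too much (such as `clip_002`) are simply left out of the training set; a real system might instead send them back for more tagging.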

One challenge for the researchers is developing their analytical software so that it can examine a variety of different types of video footage, whether it's professional-quality camerawork on the nightly news or someone recording an event with a shaky cell phone camera. "The U.S. military wants us to look at, say, Arab Spring footage and large demonstrations for early signs that they will turn violent," Bregler says.

Bregler's earlier research to identify specific movement signatures (see video below) used the same motion-capture technology used for special effects in the Lord of the Rings and Harry Potter movies. Bregler's motion-analysis research attracted the attention of the Pentagon's Defense Advanced Research Projects Agency (DARPA) in 2000 as a possible means of identifying security threats. Following 9/11 his research ramped up thanks to funding from the National Science Foundation and the U.S. Office of Naval Research. Law enforcement and counterterrorism organizations already had facial-recognition technology but were looking for additional ways to better make sense of countless hours of surveillance footage.

Given that people don't normally walk around in tight-fitting motion-capture suits laden with reflective markers, the N.Y.U. team developed their technology to focus more on scanning a camera’s surroundings and identifying spots that are unique, such as the way light reflects off a shirt's button differently than it does off the shirt's fabric. The researchers' goal is for their software to be able to identify a person's emotional state and other attributes based on movement.

Without such advanced video analytics, investigators must essentially reverse-engineer the action depicted in the video they receive, Bregler says. In the case of the Boston Marathon, the researchers have been analyzing video of the explosions and then working backward to see who was in the area prior to the bombing. "Most likely the data needed to figure out what happened exists," he adds. "Investigators just need to find it, which is difficult given the volume of the video coming in."