Engines of Inference: Peter Tu’s Tech Uses Computer Vision to Understand Emotions, Human Behavior

November 06, 2015
If Peter Tu’s technology takes off, an airplane could know when pilots get distracted in the cockpit and offer suggestions to help them make better decisions. Similarly, doctors could receive important tips on communicating and improving body language when presenting patients with a diagnosis.
These and many other advances are in the works thanks to research into computer vision, which harnesses powerful algorithms and the unflinching eye of cameras to analyze the real world. Computer scientists and engineers are pursuing the goal of giving machines human senses so that they can become aware of their surroundings.

There are few places where this far-out work is closer to commercialization than at GE Global Research's Computer Vision Lab, where Tu, the Oxford-trained senior principal scientist of the lab, works. Technologies that can enhance our minds and bodies will also be featured in the second episode of Breakthrough, titled More than Human and directed by Paul Giamatti. The six-part TV documentary series, which focuses on scientific progress and innovation, was developed by GE and National Geographic Channel. It airs on Sunday at 9pm ET on the NatGeo Channel.

“What we're trying to do is to create computer models of people — their digital twin,” says Peter Tu. Image credit: GE Reports

Over the last few years, the technology has progressed from simple tasks — camera systems designed to make sure health care workers wash their hands before and after touching patients or to monitor a patient's face for signs that he is experiencing pain — to significantly more nuanced and complex activities.

Tu explains that the lab’s computer-connected cameras are becoming able to identify a range of physical characteristics such as facial expressions, body language, the direction of a person's gaze and the distance from another person. His team then uses programming to translate those characteristics into a general understanding of how a person or group feels at any particular moment.


Expressions such as a smile or frown are all machine-readable inputs that can be analyzed to reveal the level of trust or hostility between people, a person's confusion while operating a machine or whether a salesperson is developing rapport with a potential customer.
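As a rough illustration of how machine-readable cues might be combined into a single estimate, here is a minimal sketch in Python. It is not the lab's method: the `Cues` fields, the `rapport_score` function, the weights and the distance thresholds are all hypothetical, chosen only to show the idea of turning expression, gaze and proximity measurements into one number.

```python
from dataclasses import dataclass


@dataclass
class Cues:
    """Hypothetical per-moment measurements for a pair of people."""
    smile: float        # 0..1, strength of a detected smile
    frown: float        # 0..1, strength of a detected frown
    mutual_gaze: float  # 0..1, fraction of time the two look at each other
    distance_m: float   # distance between the two people, in meters


def rapport_score(c: Cues) -> float:
    """Combine cues into a 0..1 rapport estimate (illustrative weights only)."""
    # Closeness is 1.0 at 0.5 m or less and falls linearly to 0.0 at 3 m or more.
    closeness = min(1.0, max(0.0, (3.0 - c.distance_m) / 2.5))
    raw = 0.4 * c.smile - 0.3 * c.frown + 0.2 * c.mutual_gaze + 0.1 * closeness
    # Clamp so the result always stays within 0..1.
    return min(1.0, max(0.0, raw))


friendly = Cues(smile=0.9, frown=0.0, mutual_gaze=0.8, distance_m=1.0)
hostile = Cues(smile=0.0, frown=0.9, mutual_gaze=0.1, distance_m=3.0)
print(rapport_score(friendly) > rapport_score(hostile))
```

A real system would presumably learn such a mapping from data rather than rely on hand-picked weights; the fixed weights here only keep the sketch short.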

“What we're doing now is building inference engines that consume interactions and expressions between people to estimate their emotional state and the broader social context,” Tu says. “What we're trying to do is to create computer models of people — their digital twin. We ask, 'What is the internal state of an individual and how do their interactions reveal that state?'”

Tu believes that over time, computer vision systems could be deployed to train doctors and retail employees to interact better with people, as well as in crowd control, public safety, military and industrial applications.

Tu says the first commercial deployment will likely come in a year, when a GE system will be ready to measure heightened anxiety levels in crowds populating public spaces.

He says that over the next two or three years, such systems will become available to operate in what he calls the “man with machine” space — constantly watching a pilot or train engineer's face to detect signs of anger or exhaustion, or monitoring a medical-imaging technician or nuclear plant operator to look for indications of confusion while operating complex systems.

“We're looking at this ability as a way to detect situations before they turn into a catastrophe, like a pilot or machine operator who's multitasking and runs the risk of getting into a bad situation,” Tu says. “Not just watching if someone is falling asleep, but if they are overly taxed. If we can recognize that then we can save a lot of lives and prevent a lot of accidents.”

Computer vision systems could also analyze how people interact with each other. They will be able to draw on the full suite of human expressions, body language, gaze, audio signals and proximity measurements to understand the dynamics of human interaction in groups. This will open complex group dynamics up to acute dissection through data analysis. How do people interact in teams? How can a doctor quickly develop a rapport with a patient? The system will provide real-time, continuous feedback on how a person is doing his or her job and will also help with training.

The current iteration of the system in Tu’s lab uses a couple of desktop computers connected to eight pan-tilt-zoom security-type cameras and three specialized cameras that record color images and depth information at the same time. This is enough data for the computer vision algorithms to classify and analyze specific placement and movement of the human body.

Tu says that his work takes humanity down an interesting technological path that has philosophical dimensions about what intelligence and emotions mean. By accurately analyzing a person's behavior and then using that information to build a digital model that simulates the person's hidden inner state, a future, more capable computer vision system might be able to predict an action the person hasn't performed yet. This ability sits at the very core of what a human does during every interaction with another human.

Still, Tu says his team's work is focused not on creating sci-fi AI robots that can empathize with humans and feel sad, happy or angry. Instead, it's all in the service of making a world that works better. “If we can give empathy to machines so they can read and understand how behavior gives a window into a person's emotions, then they can be more aware of users and possibly give those users a better experience,” he says.