As I write this, I realize you are at a disadvantage. You can’t see my facial expressions or gestures, or hear the inflections of my voice that convey so much meaning. You have only my words.
Because so much communication is non-verbal, it’s easy to understand why interactions between people and computers are so truncated. We’ve progressed from punchcards to keyboards and, thanks to apps such as Siri, to speech interfaces, but machines still strain to understand us by our words alone.
That’s why recent advances in machine emotional intelligence are so awesome. Thanks to improvements in camera technology and computer vision algorithms, computers are poised to take a big leap in their ability to understand us from our facial expressions, the ways our eyes move, our gestures, the way we talk, and even how we cock our heads.
Imagine the possibilities: A virtual psychiatrist could help diagnose depression by analyzing the emotions we display during clinical interviews; it could even quantify changes in mood as the disease progresses or as therapies kick in. Marketers could better gauge how audiences respond to their products and ads, while teachers could assess whether a lesson plan was fully engaging students. Smartphones might alter directions and advice if they perceive that we are upset or confused.
In other words, our passionless devices will come to know us through the emotions we all wear on our sleeves.
Computer vision researchers have been pursuing this goal for decades. What has changed? Camera technology is part of the answer. To understand facial expressions, it is necessary to detect often subtle variations – the tension of a cheek muscle, the cock of an eyebrow, the set of the mouth. Until recently, however, a human face looked like a big blob of pink to most web cameras. But now even ordinary smartphones boast high-quality cameras that can capture many of the facial movements that display emotions and intentions.
Another change has been an increase in the computational power and memory that are routinely available, making it practical to run emotion-sensing algorithms of increasing complexity and sophistication. And the computer vision algorithms themselves are becoming more accurate and more efficient – and capable of working in real time. This is in part because of the availability of large, well-annotated video databases with which to train facial expression algorithms.
Here at Carnegie Mellon University, in our Robotics Institute, Fernando De la Torre has led development of some particularly powerful facial image analysis software, called IntraFace. His team has used machine learning approaches to teach IntraFace how to identify and track facial features in a way that is generalizable to most faces. They then created a personalization algorithm that enables the software to perform expression analysis on individuals. It’s not just accurate, but efficient; the software can even run on a smartphone.
De la Torre and Jeffrey Cohn, a psychologist at the University of Pittsburgh, have already had encouraging results in detecting depression in psychiatric clinical trials. Detecting depression “in the wild” requires the ability to capture subtle facial expressions, which they are doing.
None of this is foolproof, of course. An actor might successfully fake an emotional expression. However, because fake and spontaneous expressions have different timing, algorithms that attend to timing are not so readily fooled. Facial expressions, moreover, are embedded within other non-verbal behavior. Cohn and his colleagues found that subtle differences in vocal timing discriminated between severe and remitted depression.
Another faculty member, Louis-Philippe Morency of our Language Technologies Institute, is using multimodal machine learning to analyze a patient’s non-verbal behaviors and help clinicians better assess depression and anxiety disorders. He envisions this technology not only helping to diagnose disease, but also quantifying emotional responses in a way that helps doctors track a mental disorder, much as blood tests and X-rays help doctors monitor physical disorders.
If machines can understand our emotions, the interactions we have with those machines become much richer. Here at Carnegie Mellon, Justine Cassell studies the educational uses of virtual peers – practically life-size animations of children that can converse with students. She finds that students are more actively engaged and learn more when the virtual peer is able to react appropriately to the student’s emotional state – even razzing them on occasion.
It’s not hard to envision how businesses might use this capability. Advertisers, marketers and film producers could get much more fine-grained information from focus groups and screenings. And given that at some point we’re all stuck on a call with a company’s automated phone system, imagine what it would be like if the system could sense and respond when we are finally losing our patience or about to hang up.
We’ve been working on such capabilities for a long time, and it seems that we are on the brink of some major breakthroughs. I anticipate that 2016 will be a watershed year for machine emotional intelligence and that emotion will become a powerful new channel for interacting with our machines.