December 21, 2012

Edward, Bella, and McGurk: Why Bad Lip-Synching Is So Funny

By Kyle Hill

This article was published in Scientific American’s former blog network and reflects the views of the author, not necessarily those of Scientific American

“You slapped a fiiiish. Why would you do that?”

“I wanted some seafood.”

At nearly 16,000,000 views at the time of this writing, this “bad lip-synching” of Edward and Bella is objectively hilarious. Funny lip-synching videos litter the Internet, putting ridiculous words in the mouths of everyone from Mitt Romney to Bane. A shared love of making fun of people seems to dictate that these videos are funny, and that’s that. But why? The best “bad lip-synchs” take advantage of how our brains process speech.

Not Just What We Hear

Speech recognition is a concatenation of many diverse internal pattern-seeking processes, all looking for minute changes in everything from the tone and volume of a voice to the physical motion of a speaker's mouth. So it's not just what you hear, but also what you see.

Even though speech is primarily auditory, we prioritize different kinds of information depending on the context (both consciously and unconsciously). For example, during a particularly lengthy foreign film, we learn to ignore both the visual (i.e., the mouth movements) and the auditory aspects of speech to focus solely on the subtitles. This isn't speech as we typically recognize it, but anyone who has suffered through a badly dubbed film knows that the harder it is to link the words on screen to the person speaking, the more aware you become of your aching backside. Our brains try to synch even disembodied words to their owners.

Likewise, imagine that you see a friend across the room at a crowded “End of the World” party. They are barely audible above the din of “Play Gangnam Style!” requests, so you focus intensely on their mouth movements. Add the minimal auditory input to the “enhanced” visual input, and you can just make out that they want another beer.

Because weakening either what we see or what we hear during speech weakens the whole, speech perception must draw on more than one sense. It is multimodal.

But the interpretation of speech is not the only case where our brain crosses multiple wires, so to speak. Taste is another multimodal perception. For example, when water from a bubbler with iron piping tastes like iron, you are actually smelling the iron; your brain combines that smell with what your tongue senses into the "irony" taste of the water (the tongue has no "iron" taste receptors).

As another example of how strong this connection can be, think about eating a green French fry or a yellow steak. Even if the food were perfectly normal, I'd bet you would hesitate to bite into it. Or consider the sad case of Crystal Pepsi. In 1992 Pepsi decided to change the color of its soda from brown to clear while keeping the same flavoring and ingredients. Sales of the soda plummeted, and it was pulled from the shelves in 1993.

Just as what you smell and what you perceive on your tongue can together make up what we taste, what we see and what we hear when people are speaking combine to form our perception of what someone is saying.

The McGurk Effect

There is perhaps nothing that makes us question how we actually sense the world more than illusions. Not only do they amaze us, they also offer clues to how the brain processes sensory information. One of the most common optical illusions, the “Necker Cube,” is so mystifying with its shifting depths because our brains hold competing 3D models of what the cube should look like. As our perception arbitrarily flips between them (driven partly by attention to certain details), our pattern-seeking minds reveal their software. There are also speech illusions.

The McGurk Effect is a phenomenon in which the auditory component of one sound is combined with the visual component of a second sound, resulting in the perception of a third sound. (In the classic version, hearing “ba” while watching lips form “ga” is typically perceived as “da.”) To demonstrate the illusion, you need a dubbed video. In the version described here, a speaker mouths the syllables “va/va/va” while the sounds “ba/ba/ba” play over the video. What you see then overrides what you hear, turning the played “ba/ba/ba” into “va/va/va” in your mind, even though the audio never changes. You can watch this BBC video if you want to have your mind sufficiently blown by this illusion. The really amazing part is that if you close your eyes during the illusion, shutting off the visual part of your speech recognition, the illusion immediately dissipates! (The video linked to above does a great job of pointing this out.) The on/off switch to this illusion couldn't make it any clearer: speech perception is much more than what we hear.

This of course brings us back to Twilight.

Making Fun of Sparkling, Pasty Vampires

To successfully mess with our speech perception, the words substituted into the “bad lip-synching” Twilight video need to produce mouth movements that closely mimic those of the original lines in the movie. The humor then emerges from this tiptoeing on a tightrope of plausibility: a lip-synching that is close enough to confuse us yet far enough from perfect is hilarious. It gets funnier as the words line up with the mouth movements more squarely (and a good impression of each character helps, as in the case of this uncanny Bane impression). Combine all of this with the fact that you are watching Bella scold Edward for punching a fish and you get a viral video.

It’s not that you are seeing incorrect speech in these videos; you are in fact seeing different speech. Just as coloring a French fry green can make it taste repulsive, a vampire talking about eating cake with seemingly correct mouth movements is LOL-inducing because we tentatively perceive it as the genuine article.

Watch one of the videos again and notice how you are inevitably drawn to studying the speakers' mouths to see just how close the match is, to examine whether it is “real.” Even when the synching is not perfect, because we are looking to be entertained, we give leeway to the inevitable shoehorning of ridiculous words and phrases into the video; the asinine becomes the authentic.

And when we don’t have a mouth to examine, words on screen can shape what we recognize as speech. Case in point: this video shows how easy (and hilarious) it is to mistake the classical composition “O Fortuna” for a song about men liking cheese. Combine lip-synching with on-screen text and you get an elf who is sick of Barack Obama.

I think it all comes down to believability. Right off the bat we do not believe that Edward asked whether or not mice have “wee-wees.” But if the impression is decent and the mouth movements synch up, we suspend our disbelief and revel in a reality where teen-dream vampires ask such questions. Likewise, many of us know that most music videos are actually lip-synched, but the synching has gotten so good that nobody seems to mind. Bottom line for prospective video makers: take advantage of our multimodal speech perception well enough, and you can make a ventriloquist's dummy out of anyone.

A particularly tight synching garners the immediate “It looks like that is what they are actually saying!” response. In a way it is, and it’s damn funny.

Further Watching: More “bad lip-synching” videos