Will Artificial Intelligence Win the Caption Contest?
Neural networks have mastered the ability to label things in images, and now they’re learning to tell stories from a set of photos.
When social-media users upload photographs and caption them, they don’t just label their contents. They tell a story, which gives the photos context and additional emotional meaning.
A paper published by Microsoft Research describes an image captioning system that mimics humans’ unique style of visual storytelling. Companies like Microsoft, Google, and Facebook have spent years teaching computers to label the contents of images, but this new research takes it a step further by teaching a neural-network-based system to infer a story from several images. Someday it could be used to automatically generate descriptions for sets of images, or to bring humanlike language to other applications for artificial intelligence.
“Rather than giving bland or vanilla descriptions of what’s happening in the images, we put those into a larger narrative context,” says Frank Ferraro, a Johns Hopkins University PhD student who coauthored the paper. “You can start making likely inferences of what might be happening.”
Consider an album of pictures depicting a group of friends celebrating a birthday at a bar. Some of the early pictures show people ordering beer and drinking it, while a later photo shows someone asleep on a couch.
“A captioning system might just say, ‘A person lying on a couch,’” Ferraro says. “But a storytelling system might be able to say, ‘Well, given that I think these people were out partying or out eating and drinking, then this person may be drunk.’”
One example listed in the paper includes a series of five images. They show a family gathered around a table, a plate of shellfish, a dog, and images from the beach. The neural network described them with a story reading, “The family got together for a cookout. They had a lot of delicious food. The dog was happy to be there. They had a great time on the beach. They even had a swim in the water.”
The team, which was led by Microsoft researcher Margaret Mitchell and included Microsoft interns like Ferraro and a researcher from Facebook AI, turned what’s called a sequence-to-sequence recurrent neural network into a storyteller by training it with images sourced from Flickr. They had helpers write captions for individual images and for series of images in specific sequences.
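The paper doesn't ship code alongside the article, but the general shape of a sequence-to-sequence storyteller can be sketched briefly. In the sketch below, which is an illustration rather than the team's actual implementation, the `AlbumStoryteller` class, its layer sizes, and the assumption of pre-extracted image feature vectors (for instance from a convolutional network) are all hypothetical choices made for clarity.

```python
# A minimal sketch of a sequence-to-sequence storyteller.
# Assumes each photo has already been turned into a feature vector;
# names and dimensions are illustrative, not the paper's architecture.
import torch
import torch.nn as nn

class AlbumStoryteller(nn.Module):
    def __init__(self, feat_dim=2048, hidden_dim=512, vocab_size=10000, embed_dim=256):
        super().__init__()
        # Encoder: reads one feature vector per photo, in album order.
        self.encoder = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        # Decoder: generates the story one word at a time.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.decoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, image_feats, story_tokens):
        # image_feats: (batch, num_photos, feat_dim)
        # story_tokens: (batch, story_len) word indices of the story so far
        _, album_state = self.encoder(image_feats)       # summarize the whole album
        word_vecs = self.embed(story_tokens)             # embed the story words
        dec_out, _ = self.decoder(word_vecs, album_state)
        return self.out(dec_out)                         # scores over the vocabulary

# Training would compare the model's next-word predictions against
# human-written stories, e.g. with a cross-entropy loss:
model = AlbumStoryteller()
feats = torch.randn(4, 5, 2048)              # 4 albums of 5 photos each
tokens = torch.randint(0, 10000, (4, 31))    # target stories as word indices
logits = model(feats, tokens[:, :-1])        # predict each next word
loss = nn.functional.cross_entropy(logits.transpose(1, 2), tokens[:, 1:])
```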
An approach similar to the ones used to label the contents of single photos produced stories that were too generic. To counter this, the team developed a way for the network to favor words that were likely to be visually salient, and it also barred the system from repeating words.
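That second constraint boils down to a simple rule applied while the network is generating text. A rough sketch of such a rule is below; the `step_logits_fn` interface, which scores every candidate next word given the story so far, is a hypothetical stand-in for whatever scoring the trained network provides, and the greedy word-by-word selection is a simplification of the decoding the researchers describe.

```python
# A rough sketch of decoding with a "don't repeat words" constraint.
# step_logits_fn is assumed to return a 1-D tensor of scores over the
# vocabulary for the next word, given the words generated so far.
import torch

def decode_no_repeat(step_logits_fn, start_id, end_id, max_len=40):
    """Greedily pick the highest-scoring next word, but never reuse a word
    already emitted in this story (the end-of-story token excepted)."""
    used = set()
    tokens = [start_id]
    for _ in range(max_len):
        logits = step_logits_fn(tokens)       # scores for every candidate word
        for w in used:
            logits[w] = float("-inf")         # forbid words already used
        next_id = int(torch.argmax(logits))
        if next_id == end_id:
            break
        tokens.append(next_id)
        used.add(next_id)
    return tokens[1:]                         # the generated story, minus the start token
```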
Storytelling is an important part of being human, says Stanford Vision Lab director Fei-Fei Li, who did not contribute to the research. Technology that can imitate humans’ techniques for documenting stories needs to be able to cross-reference objects and characters seen in multiple pictures and infer relationships between people, objects, and places.
“The published paper is just the beginning toward this kind of technology,” Li says. “But it is a good step forward to start tackling such an ambitious project. I look forward to more follow-up work from these authors and others.”