Training an artificial intelligence about humans with 600 hours of television
Watching TV isn’t necessarily strenuous, but we should give ourselves a bit of credit for how good we are at it. Thanks to the combination of our visually oriented brains and our propensity for learning social patterns, humans can cruise through episodes of familiar sitcoms without much conscious effort to make sense of what’s going on. As long as a show doesn’t stray too far off the beaten path, we can probably predict the outcomes of many scenarios well before they’ve played out on screen. While computer scientists aren’t terribly worried about artificial intelligence (AI) that can predict cliffhangers or punchlines, they are training algorithms to predict human behavior by having them watch television.
This might seem, for a variety of reasons, like a very bad idea. Few of us would claim that the characters and interactions in most television shows are representative of humanity, and we wouldn’t want even an artificial intelligence to judge real life according to sitcom scripts. We don’t need to worry about that kind of reasoning at this point, because the challenge being taught through television watching is more about parsing visual cues. If you’re reading this, you’ve probably seen people reach forward with one hand enough times to guess whether they’re going to shake hands or pick something off a shelf, but for a computer that’s not so simple a task. Rather than recreating every aspect of a human’s visual neurological functions, scientists are aiming to get computers to grasp some basics about what humans are doing when they move in different ways, without learning every bit of culture and context that we employ without effort.
Spotting high-fives, hugs or handshakes
This particular lesson had an artificial intelligence watch and parse more than 600 hours of television shows, looking for a small set of specific behaviors, like shaking hands, hugging, and giving high-fives. When the computer saw the beginning of an action, it would predict what that action was going to be. With corrections along the way, the computer was able to guess correctly 43 percent of the time, compared with a rate of 71 percent for actual humans. With more data, these scores are expected to improve, as the AI will be better able to employ visual cues that hint at what’s about to happen.
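For the curious, scoring like this boils down to counting how often a guess matches what actually happened next. The short Python sketch below is only an illustration of that idea, not the researchers' code: the function name, the action labels, and the ten sample clips are all made up for the example.

```python
def accuracy(predictions, actual):
    """Return the fraction of clips where the predicted action matched the real one."""
    if len(predictions) != len(actual):
        raise ValueError("need exactly one prediction per clip")
    if not actual:
        return 0.0
    correct = sum(p == a for p, a in zip(predictions, actual))
    return correct / len(actual)


# Hypothetical data: what really happened in ten clips, and what a model guessed
# after seeing only the start of each one. These labels are invented for illustration.
actual = ["hug", "handshake", "high-five", "hug", "handshake",
          "hug", "high-five", "handshake", "hug", "high-five"]
predicted = ["hug", "hug", "high-five", "handshake", "handshake",
             "high-five", "hug", "hug", "hug", "handshake"]

print(f"Correct guesses: {accuracy(predicted, actual):.0%}")
```

With these invented clips, four of the ten guesses are right. The 43 percent and 71 percent figures reported above are the same kind of tally, taken over the study's own test clips.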
This isn’t the first time this kind of study has been done, but it is one of the most successful. One difference is the level of detail the AI was programmed to start with. Rather than analyze every pixel in an image, the algorithm built a model of the major components of every scene and focused on what they were doing. So rather than get caught up on the color of a chair or the length of a character’s hair, the AI would look for the key scaffolding of a scene, identifying what was a face, an arm, or a piece of furniture. This allowed it to focus computational power on these likely points of interaction, rather than on minutiae unlikely to matter to an upcoming high-five.
Why watch TV
This work isn’t being done so that computers can understand and enjoy an episode of The Office, of course. Television was used simply because it offered many variations of human activity in a convenient format. The long-term goal is to have AI that can better understand spontaneous human activity in real time, recognizing when someone is at risk of injury, or just needs help opening a door. Even with socialized upbringings and brains well suited for the task, the average adult will still guess wrong about the intentions of others from time to time, so getting an AI to catch up is no small feat. Good thing there’s so much TV out there to watch.
Source: A Computer Binge-Watched TV And Learned To Predict What Happens Next by Riley Beggin, All Tech Considered