The Alphabet-backed AI firm is using virtual games to help its digital creations move more like humans.
DEEPMIND’S ATTEMPT TO teach an AI to play soccer started with a virtual player writhing around on the floor—so it nailed at least one aspect of the game right from kickoff.
But pinning down the mechanics of the beautiful game—from basics like running and kicking to higher-order concepts like teamwork and tackling—proved a lot more challenging, as new research from the Alphabet-backed AI firm demonstrates. The work—published this week in the journal Science Robotics—might seem frivolous, but learning the fundamentals of soccer could one day help robots to move around our world in more natural, more human ways.
“In order to ‘solve’ soccer, you have to actually solve lots of open problems on the path to artificial general intelligence [AGI],” says Guy Lever, a research scientist at DeepMind. “There’s controlling the full humanoid body, coordination—which is really tough for AGI—and actually mastering both low-level motor control and things like long-term planning.”
An AI has to re-create everything human players do—even the things we don’t have to consciously think about, like precisely how to move each limb and muscle in order to connect with a moving ball—making hundreds of decisions a second. The timing and control required for even the most basic movements can actually be surprisingly tricky to nail down, as anyone who has ever played the browser game QWOP will remember. “We do that without thinking about it, but that’s a really hard problem for AI, and we’re not really sure exactly how humans do that,” Lever says.
DeepMind’s simulated humanoid agents were modeled on real humans, with 56 points of articulation and a constrained range of motion—meaning that they couldn’t, for instance, rotate their knee joint through impossible angles à la Zlatan Ibrahimovic. To start with, the researchers simply gave the agents a goal—run, for example, or kick a ball—and let them try to figure out how to get there through trial and error and reinforcement learning, as was done in the past when researchers taught simulated humanoids to navigate obstacle courses (with comical, quite unnatural results).
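The trial-and-error setup can be boiled down to a toy sketch (our own illustration, not DeepMind's code): a one-parameter Gaussian policy trained with a REINFORCE-style update homes in on a target when the reward is shaped, but with a sparse "goal reached" reward it almost never sees a learning signal at all.

```python
import numpy as np

def reinforce(reward_fn, episodes=3000, lr=0.01, seed=0):
    """Toy REINFORCE loop: nudge the policy mean toward rewarded actions."""
    rng = np.random.default_rng(seed)
    mu = 0.0                        # policy mean; actions sampled ~ N(mu, 1)
    for _ in range(episodes):
        a = rng.normal(mu, 1.0)     # trial: sample an action
        r = reward_fn(a)            # error: observe its reward
        mu += lr * r * (a - mu)     # d/dmu of log N(a; mu, 1) is (a - mu)
    return mu

TARGET = 3.0
shaped = reinforce(lambda a: -(a - TARGET) ** 2)                # dense feedback
sparse = reinforce(lambda a: 1.0 if abs(a - TARGET) < 0.1 else 0.0)  # rare signal
```

With only one parameter to learn, the shaped run converges; the sparse run barely moves, because random exploration almost never lands near the target. Scale that up to a 56-joint body and it is easy to see why a bare goal signal left the agents flailing.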
“This didn’t really work,” says Nicolas Heess, also a research scientist at DeepMind and a coauthor of the paper with Lever. Because of the complexity of the problem, the huge range of options available, and the lack of prior knowledge about the task, the agents had no idea where to start—hence the writhing and twitching.
So instead, Heess, Lever, and colleagues used neural probabilistic motor primitives (NPMP), a teaching method that nudged the AI model towards more human-like movement patterns, in the expectation that this underlying knowledge would help to solve the problem of how to move around the virtual football pitch.
“It basically biases your motor control toward realistic human behavior, realistic human movements,” says Lever. “And that’s learnt from motion capture—in this case, human actors playing football.”
This “reconfigures the action space,” Lever says. The agents’ movements are already constrained by their humanlike bodies and joints that can bend only in certain ways, and being exposed to data from real humans constrains them further, which helps simplify the problem.
“It makes useful things more likely to be discovered by trial and error,” Lever says. NPMP speeds up the learning process, but there is a “subtle balance” to be struck between teaching the AI to do things the way humans do them and giving it enough freedom to discover its own solutions to problems—which may be more efficient than the ones we come up with ourselves.
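The two-level control scheme Lever describes might be sketched like this (the shapes, names, and random "decoder" weights are all placeholders, not DeepMind's implementation): a low-level motor-primitive decoder, trained beforehand on motion capture, turns a small latent command into full-body joint targets, so the high-level policy explores a compact latent space instead of 56 raw joints.

```python
import numpy as np

N_JOINTS, LATENT = 56, 8   # 56 points of articulation, hypothetical latent size

class MotorDecoder:
    """Stands in for the mocap-trained low-level controller (NPMP)."""
    def __init__(self, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed random projection as a placeholder for learned weights.
        self.W = rng.normal(0, 0.1, size=(N_JOINTS, LATENT))

    def __call__(self, z):
        # Decode latent command -> joint targets, squashed into a
        # human-plausible range of motion.
        return np.tanh(self.W @ z)

def high_level_action(obs, rng):
    """Placeholder high-level policy: maps an observation to a latent command."""
    return rng.normal(0, 1, size=LATENT)

rng = np.random.default_rng(1)
decoder = MotorDecoder()
z = high_level_action(obs=None, rng=rng)   # 8 numbers instead of 56
joint_targets = decoder(z)                 # full-body command, bounded by tanh
```

The design point is the reshaped action space: trial and error now happens over 8 latent dimensions whose decodings already look like human movement, rather than over 56 unconstrained joints.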
Basic training was followed by single-player drills: running, dribbling, and kicking the ball, mimicking the way that humans might learn to play a new sport before diving into a full match situation. The reinforcement learning rewards were things like successfully following a target without the ball, or dribbling the ball close to a target. This curriculum of skills was a natural way to build toward increasingly complex tasks, Lever says.
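The drill rewards described above might look something like this simple distance-based shaping (our own illustrative functions, not the paper's exact formulas):

```python
import numpy as np

def follow_reward(agent_pos, target_pos, scale=5.0):
    """Reward for closely following a target, decaying with distance."""
    dist = np.linalg.norm(np.asarray(agent_pos) - np.asarray(target_pos))
    return float(np.exp(-dist / scale))

def dribble_reward(ball_pos, target_pos, scale=5.0):
    """Reward for keeping the ball close to a target position."""
    dist = np.linalg.norm(np.asarray(ball_pos) - np.asarray(target_pos))
    return float(np.exp(-dist / scale))
```

Each drill gives dense, graded feedback—being a little closer always pays a little more—which is what makes these single-player skills learnable before the sparse rewards of a full match.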
The aim was to encourage the agents to reuse skills they might have learned outside of the context of soccer within a soccer environment—to generalize and be flexible at switching between different movement strategies. The agents that had mastered these drills were used as teachers. In the same way that the AI was encouraged to mimic what it had learned from human motion capture, it was also rewarded for not deviating too far from the strategies the teacher agents used in particular scenarios, at least at first. “This is actually a parameter of the algorithm which is optimized during training,” Lever says. “Over time they can in principle reduce their dependence on the teachers.”
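The teacher-regularized objective Lever describes can be sketched for one-dimensional Gaussian policies (an illustrative form, not the paper's: the coefficient `lam` stands in for the parameter that is tuned during training, letting agents lean on, then outgrow, their teachers):

```python
import numpy as np

def gaussian_kl(mu_s, sig_s, mu_t, sig_t):
    """KL( N(mu_s, sig_s^2) || N(mu_t, sig_t^2) ) for 1-D Gaussians."""
    return (np.log(sig_t / sig_s)
            + (sig_s**2 + (mu_s - mu_t)**2) / (2 * sig_t**2) - 0.5)

def regularized_objective(task_reward, mu_s, sig_s, mu_t, sig_t, lam):
    """Task reward minus a penalty for straying from the teacher policy."""
    return task_reward - lam * gaussian_kl(mu_s, sig_s, mu_t, sig_t)

# Early in training lam is high, so matching the teacher dominates;
# later it can shrink, freeing the student to find its own solutions.
early = regularized_objective(1.0, mu_s=2.0, sig_s=1.0, mu_t=0.0, sig_t=1.0, lam=1.0)
late  = regularized_objective(1.0, mu_s=2.0, sig_s=1.0, mu_t=0.0, sig_t=1.0, lam=0.1)
```

The same student behavior scores very differently under the two settings: a large `lam` punishes deviation from the teacher heavily, while a small one lets the task reward dominate.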
With their virtual players trained, it was time for some match action: starting with 2v2 and 3v3 games to maximize the amount of experience the agents accumulated during each round of simulation (and mimicking how young players start off with small-sided games in real life). The highlights have the chaotic energy of a dog chasing a ball in the park: players don’t so much run as stumble forward, perpetually on the verge of tumbling to the ground. When goals are scored, it’s not from intricate passing moves, but hopeful punts upfield and foosball-like rebounds off the back wall.
Although in games the agents were rewarded only for scoring goals, the researchers quickly saw properties like teamwork starting to emerge. “At the very beginning of training all of the agents just run to the ball, and at some point after a few days we’d actually see that the agents would realize that one of its teammates was in control of the ball and would turn around and run up the pitch, anticipating that its teammate would try and score or maybe pass the ball,” says Lever. According to the researchers, it’s the first time such coordination and teamwork has been seen in AI agents of this complexity, acting at this speed. “That’s one of the breakthroughs that’s interesting to me,” Lever says.
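The match-phase incentive really is that stark, and can be written in a few lines (illustrative code, not the paper's implementation): every agent on a team is rewarded only when its team scores, and for nothing else—the coordination described above emerged rather than being rewarded directly.

```python
def team_rewards(scoring_team, teams):
    """Sparse shared reward: 1 to everyone on the scoring team, 0 to the rest."""
    return {agent: (1.0 if team == scoring_team else 0.0)
            for agent, team in teams.items()}

# Hypothetical 2v2 roster for illustration.
teams = {"A1": "blue", "A2": "blue", "B1": "red", "B2": "red"}
rewards = team_rewards("blue", teams)   # both blue agents paid, both red not
```

Because the reward is shared across a team, an agent that runs upfield to receive a pass is paid exactly as much as the one who scores—which is the pressure that makes passing and positioning worth discovering.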
As for the point of all this? It’s not about dominating the Robot World Cup; Heess is working on imbuing some of the lower-level skills the agents have learned into physical robots to make them move in ways that are more “safe and naturalistic” in the real world. That’s not just so they don’t freak out humans who interact with them, but also because the jittery, irregular motions that may be produced by unstructured reinforcement learning could damage robots that weren’t optimized to move in those ways, or just waste energy.
It’s all part of work on “embodied intelligence”—the idea that a general artificial intelligence might be required to move around the world in some sort of physical form, and that the nature of that form might determine the way it behaves. “It’s interesting both in simulated worlds, which increasingly feature physics-based simulation, but also to develop methods for robot learning,” says Heess.
Eventually these slightly slapstick digital players could help both robots and metaverse avatars move in ways that seem more human—even if they’ll still never beat us at soccer. “Football is not really an end goal in itself,” says Lever. “There are just lots of things you need to solve in order to get there.”