Human behavior is organized at many levels, from millisecond muscle twitches to cognitive judgments made in hundreds of milliseconds to longer-term, socially informed, goal-directed sequences that unfold over seconds, minutes, or even years. We have only an incomplete understanding of how low-level motor commands are structured to support cognitive-level judgments, high-level goals, and social coordination.
Although the need to integrate several levels of control into intelligent embodied systems has long been recognized, little is known about the principles that should guide their development. Reinforcement learning (RL) can yield complex solutions from a simple, high-level goal specification, but training embodied systems to exhibit intelligent behavior that spans many scales remains challenging.
DeepMind introduces general techniques for mastering long-term coordinated decision-making and integrated high-dimensional motor control. The team combined multi-agent RL with pretrained behavior representations at different scales to address the difficulties of behavior specification, credit assignment, and exploration across scales. Where possible, the approach reuses learned knowledge through imitation, and the autocurriculum that emerges in populations of learning agents during self-play enables sophisticated and robust solutions that would be difficult to specify through reward design or imitation alone. This paradigm loosely resembles human learning in its gradual acquisition of skills of increasing complexity, its mixture of learning methods ranging from imitation to deliberate practice, and its repurposing of existing skills.
By training teams of humanoids with 56 degrees of freedom to play simulated football with realistic physics, the team demonstrates how motor control and decision-making emerge in groups of autonomous embodied agents. A range of analyses and statistics, including ones drawn from real-world sports analytics, quantify the evolution of complex locomotor behaviors and teamwork. The findings show that the agents learned to play coordinated football: after training, they could bridge the gap between team-directed behavior at a time scale of tens of seconds and low-level motor control at a time scale of milliseconds.
The researchers first developed a motor primitive module from human motion capture clips to encourage naturalistic movement and facilitate exploration. The module produces immediate, human-like movement in response to an abstract motor command and is embedded in the football policies as a low-level controller. By restricting the space of learnable motions to a generic manifold of human-like movements while constraining the generated behavior only minimally, it leaves the agents free to compose movement sequences not present in the original motion capture clips.
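The division of labor described above can be sketched in code. This is a minimal illustration, not the paper's implementation: the class name, shapes, and randomly initialized weights are all hypothetical stand-ins for a decoder that, in the actual system, is distilled from motion capture data.

```python
import numpy as np

class MotorPrimitiveController:
    """Hypothetical sketch of a low-level controller: it maps an abstract
    motor command (a latent intent vector) plus proprioceptive state to
    bounded joint-level actions. Weights here are random placeholders for
    parameters that would be learned from motion capture."""

    def __init__(self, n_joints: int, latent_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w_latent = rng.standard_normal((latent_dim, n_joints)) * 0.1
        self.w_proprio = rng.standard_normal((n_joints, n_joints)) * 0.1

    def act(self, latent_command, proprioception):
        # Decode the high-level command into joint actions, conditioned on
        # the current body state; tanh keeps each action in [-1, 1].
        return np.tanh(latent_command @ self.w_latent
                       + proprioception @ self.w_proprio)

# The football policy outputs only the abstract `latent_command`; the
# controller turns it into joint-level motion on a human-like manifold.
controller = MotorPrimitiveController(n_joints=56, latent_dim=60)
action = controller.act(np.zeros(60), np.zeros(56))
```

The key design point is the interface: the high-level policy never commands joints directly, so exploration happens in the compact latent space rather than in 56 raw joint dimensions.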
Second, they pretrained behavior priors to direct learning and promote exploration in the long-horizon, sparse-reward football task. The resulting abilities were modeled as reusable football skill priors: pluripotent stochastic policies that can generate many instances of the behaviors associated with football drills. During multi-agent training, these skill priors biased the football policies toward these intermediate football activities by regularizing behavior. The regularization took the form of a context-sensitive, adaptive imitation loss that penalizes a football policy's Kullback-Leibler (KL) divergence from the skill priors, with the relative weight of this loss adapted by population-based training (PBT).
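The shape of such a KL-regularized objective can be sketched as follows. This is a simplification under stated assumptions: distributions are treated as small categorical vectors, and the context dependence of the paper's loss is collapsed into a single scalar weight `beta` (the quantity PBT would adapt). Function names are illustrative.

```python
import numpy as np

def kl_categorical(p, q, eps=1e-8):
    """KL(p || q) between two categorical distributions over actions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def regularised_objective(task_return, policy_probs, prior_probs, beta):
    """Task return minus a KL penalty keeping the football policy close
    to the pretrained skill prior; `beta` stands in for the regularization
    weight that population-based training adapts over the course of
    training."""
    return task_return - beta * kl_categorical(policy_probs, prior_probs)

# A policy identical to the prior pays no penalty; a divergent one does.
same = regularised_objective(1.0, [0.5, 0.5], [0.5, 0.5], beta=0.1)
diverged = regularised_objective(1.0, [0.9, 0.1], [0.5, 0.5], beta=0.1)
```

When the sparse football reward provides little signal, the penalty pulls behavior toward the drilled skills; as task reward becomes informative, a smaller `beta` lets the policy depart from the prior.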
They also demonstrate an incremental training procedure: agents are first trained to play two-versus-two (2v2) games, and these players are then transferred to games with more players per team for further training, since training cost grows significantly with team size. Their attention-based perception architecture, which handles a variable number of other players, makes this transfer possible.
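A minimal sketch of why attention-based perception transfers across team sizes, assuming a simple single-head attention pooling over per-player feature vectors (names, shapes, and random weights are illustrative, not the paper's architecture):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend_to_players(query, player_features, w_k, w_v):
    """Attention pooling over a variable number of other players.
    Weights are shared across players and pooling is a weighted sum,
    so the same parameters apply to 2v2 and larger games unchanged."""
    keys = player_features @ w_k          # (n_players, d)
    values = player_features @ w_v        # (n_players, d)
    scores = keys @ query / np.sqrt(len(query))
    weights = softmax(scores)             # one weight per player
    return weights @ values               # fixed-size (d,) summary

rng = np.random.default_rng(0)
d_in, d = 8, 4
w_k = rng.standard_normal((d_in, d))
w_v = rng.standard_normal((d_in, d))
query = rng.standard_normal(d)
# 3 other players (2v2) vs. 5 other players (3v3): same weights, same
# output dimensionality, so downstream layers need no change.
out_small = attend_to_players(query, rng.standard_normal((3, d_in)), w_k, w_v)
out_large = attend_to_players(query, rng.standard_normal((5, d_in)), w_k, w_v)
```

Because the pooled summary has a fixed size regardless of how many players are on the pitch, a policy trained in 2v2 can be dropped into larger games and fine-tuned rather than retrained from scratch.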
This article is a research summary written by Marktechpost Staff based on the research paper 'From motor control to team play in simulated humanoid football'. All credit for this research goes to the researchers on this project.
Tanushree Shenwai is a consulting intern at MarktechPost. She is currently pursuing her B.Tech from the Indian Institute of Technology (IIT), Bhubaneswar. She is a data science enthusiast with a keen interest in the applications of artificial intelligence across fields, and is passionate about exploring new advancements in technology and their real-life applications.