Researchers at Meta have created Cicero, an AI agent that plays Diplomacy at a human level.
That is a significant accomplishment in natural-language processing, and it may help people forget Galactica, a large language model Meta researchers trained on scientific papers that presented falsehoods as facts.
Diplomacy, a board game from the 1950s now published by Hasbro, centres on communication and negotiation among seven European powers at the turn of the twentieth century. Some gamers regard it as an excellent way to lose friends.
About the game
The game is played on a map of Europe, where players compete to capture territories. Rather than taking turns, players write down their moves in advance and carry them out simultaneously. Before committing their orders to paper, players negotiate privately, deliberating over coordinated manoeuvres so their plays are not blocked by an opponent's counter-move, and then honour or break the promises they have made.
Diplomacy's focus on communication, trust, and betrayal makes it a different kind of challenge from games like chess and Go, which are more concerned with rules and resources. Cicero is a chatbot able to bargain with other Diplomacy players and make strategic moves in the game.
As Meta explained in a blog post, Diplomacy has long been considered a near-impossible grand challenge in artificial intelligence, because it requires players to understand others, make complex plans and adjust strategies, use natural language to reach agreements, and more.
Cicero
Cicero uses a 2.7-billion-parameter BART-like language model, pre-trained on internet text and fine-tuned on roughly 40,000 online Diplomacy games. The agent's dialogue output is conditioned on its strategic reasoning module, which generates "intents" indicating possible player moves.
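Conceptually, the planner's "intents" become part of the conditioning context the dialogue model sees alongside the message history. A hypothetical sketch of that interface (the function names, prompt format, and example moves are invented for illustration, not Meta's actual API):

```python
from dataclasses import dataclass

@dataclass
class Intent:
    power: str   # e.g. "FRANCE"
    move: str    # e.g. "A PAR - BUR" in standard Diplomacy order notation

def build_dialogue_context(history, own_intents, partner_intents):
    """Assemble the conditioning text a dialogue model could see:
    recent messages plus the planner's intended moves for both sides."""
    lines = [f"MSG {sender}: {text}" for sender, text in history]
    lines += [f"INTENT {i.power}: {i.move}" for i in own_intents + partner_intents]
    return "\n".join(lines)

ctx = build_dialogue_context(
    history=[("ENGLAND", "Want to work together against Germany?")],
    own_intents=[Intent("FRANCE", "A PAR - BUR")],
    partner_intents=[Intent("ENGLAND", "F NTH - HOL")],
)
print(ctx)
```

Grounding the generated messages in concrete intended moves is what keeps the dialogue consistent with the plans the strategic module actually wants to execute.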
Evaluation
In their Science paper, the Meta researchers write: "Cicero runs a strategic reasoning module. First, it guesses other players' turns based on the board and dialogue. Then it chooses a policy for the current turn that responds optimally to the other players' predicted policies."
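In its simplest form, "responding optimally to the other players' predicted policies" means computing the expected value of each candidate move against a probability distribution over opponent moves and picking the best one. A toy two-player payoff-matrix sketch (the matrix values and the predicted policy are invented for illustration):

```python
import numpy as np

# Hypothetical payoff matrix: rows are our candidate moves,
# columns are the opponent's moves; entries are our payoff.
payoffs = np.array([
    [ 1.0, -1.0,  0.5],
    [ 0.0,  2.0, -0.5],
    [-1.0,  0.5,  1.5],
])

# Predicted opponent policy (in Cicero, inferred from board and dialogue).
opponent_policy = np.array([0.5, 0.3, 0.2])

# Expected value of each of our moves against that prediction.
expected = payoffs @ opponent_policy
best_move = int(expected.argmax())
print(expected, best_move)  # move 1 has the highest expected value here
```

The real game has seven players and a vast joint move space, so Cicero works with sampled candidate moves rather than full payoff matrices, but the predict-then-best-respond structure is the same.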
While AI agents for games like chess can be trained with reinforcement learning through self-play, representing the cooperative play of Diplomacy demanded a different approach. The traditional method, according to Meta, would entail supervised learning, training an agent on labelled data from past Diplomacy games. Supervised learning alone, however, produced a trusting agent that dishonest players could easily deceive.
To refine an initial forecast of other players' policies and planned moves based on the dialogue between the bot and other players, Cicero uses an iterative planning algorithm called piKL. The algorithm improves the projected move sets for each player by considering alternatives that would yield better outcomes, while keeping the policies close to the initial human-like prediction.
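The core idea behind piKL can be illustrated with a small sketch: each policy update weights actions both by their expected value against the predicted play of others and by their likelihood under a human-imitation "anchor" policy, with a parameter lambda trading off the two. This is a simplified single-player illustration with invented numbers, not Meta's implementation:

```python
import numpy as np

def pikl_update(q_values, anchor_policy, lam):
    """One piKL-style update: weight each action by the anchor
    probability times exp(value / lambda), then normalize.
    Large lambda stays close to the human-like anchor policy;
    small lambda approaches a pure best response."""
    logits = np.log(anchor_policy) + q_values / lam
    logits -= logits.max()              # numerical stability
    policy = np.exp(logits)
    return policy / policy.sum()

# Hypothetical example: three candidate moves.
q = np.array([1.0, 0.5, 0.0])          # expected values vs. predicted opponents
anchor = np.array([0.2, 0.5, 0.3])     # human-imitation policy

print(pikl_update(q, anchor, lam=10.0))  # stays near the anchor
print(pikl_update(q, anchor, lam=0.1))   # approaches the best response
```

This regularization toward human play is what lets the agent plan strongly without drifting into moves that human negotiating partners would find erratic or untrustworthy.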
Conclusion
Between August 19 and October 13, 2022, Cicero played 40 "blitz" Diplomacy games on webDiplomacy.net, placing in the top 10 per cent of players who played more than one game. Among the 19 players who took part in at least five games, Cicero placed second. Across all 40 games, Cicero's average score was 25.8 per cent, more than double the 12.4 per cent average of its 82 opponents.
While Cicero still makes errors, the researchers at Meta expect their findings to apply to a range of applications, such as chatbots capable of sustaining lengthy conversations and video-game characters that understand player motivations and communicate more effectively.
Furthermore, Cicero's code has been released under an open-source licence, in the hope that the AI developer community can build on it.