Google DeepMind has introduced SIMA 2, a video-game-playing agent designed to navigate and solve problems in a range of 3D virtual environments. The agent is a step toward more general-purpose AI agents and more capable real-world robots.
SIMA 2 builds on the original SIMA, which stands for “scalable instructable multiworld agent,” by adding Gemini, DeepMind’s large language model. With Gemini, the agent can carry out more complex tasks, work out solutions on its own, and converse with users. It also learns from experience, improving its skills by attempting progressively harder tasks.
According to Joe Marino, a research scientist at Google DeepMind, games have long been central to agent research, since even simple in-game actions can involve complex sequences of steps. The goal is to build agents that can carry out open-ended instructions in complex environments, paving the way for applications in real-world robotics. SIMA 2 was trained by observing human gameplay across eight commercial titles, including Goat Simulator 3, as well as custom-built virtual spaces. From these observations, the agent learned to match keyboard and mouse inputs to the actions they produce.
In experiments, SIMA 2 has shown promise, successfully navigating unfamiliar environments generated by Genie 3, DeepMind’s latest world model. The researchers noted that the agent could follow instructions and learn from its mistakes, using feedback from Gemini to refine its performance. Challenges remain, however: the agent struggles with multi-step tasks, and it retains only short-term memory, a trade-off made to keep it responsive.
Some experts are optimistic that SIMA 2 could contribute to future robotics, while others are more cautious. Matthew Guzdial, an AI researcher, points out that many games share similar controls, which may account for part of SIMA 2’s performance, and he questions how well the agent would adapt to more complex real-world tasks. The team at Google DeepMind plans to keep improving SIMA 2 and envisions a continuous training platform in which the agent evolves through guided trial and error. “We’ve only begun to explore the possibilities,” Marino concluded.
Source: “Google DeepMind is using Gemini to train agents inside Goat Simulator 3,” via MIT Technology Review
