Video: Google’s humanoid robot folds origami, zips bags like humans

DeepMind’s AI models enhance robot perception and interaction, enabling precise movements.

By Jijo Malayil

The models enable a variety of robots to perform a wider range of real-world tasks. (Google DeepMind/YouTube)

Google DeepMind has unveiled two AI models, Gemini Robotics and Gemini Robotics-ER, designed to enhance robot control.

Gemini Robotics is a “vision-language-action” (VLA) model: it interprets visual input, understands language commands, and executes movements. Gemini Robotics-ER, by contrast, emphasizes “embodied reasoning,” offering advanced spatial awareness and allowing roboticists to integrate it with existing robot control systems.

Both models aim to improve how robots of various forms perceive and interact with their surroundings, enabling more precise and delicate movements and potentially advancing humanoid assistants and other robotic applications.

To create truly useful AI-driven robots, DeepMind argues, three qualities are essential: generality, interactivity, and dexterity. Gemini Robotics, according to the company, greatly improves all three.

For generality, it draws on Gemini’s broad world knowledge to adapt to new situations, handling unfamiliar tasks, objects, and environments. In tests, it more than doubles performance on generalization benchmarks compared with earlier vision-language-action models.

Its interactivity comes from Gemini 2.0’s language capabilities, which let it understand natural, conversational commands in multiple languages, recognize changes in its surroundings, and adjust its behavior accordingly. This flexibility improves human-robot cooperation in a variety of contexts.

Dexterity is another breakthrough: according to Google, Gemini Robotics can perform intricate, multi-step tasks that call for fine motor skills, such as folding origami or sealing a Ziploc bag.

Its adaptability also allows it to run on a variety of robotic platforms, from bi-arm ALOHA 2 and Franka-based systems to humanoid robots like Apptronik’s Apollo, broadening its practical uses.

Alongside Gemini Robotics, DeepMind introduced Gemini Robotics-ER, an advanced vision-language model designed for “embodied reasoning.” The model enhances Gemini’s spatial understanding, which is crucial for robotics, and allows roboticists to pair it with their existing low-level controllers. By improving key abilities such as pointing and 3D detection, Google says, Gemini Robotics-ER significantly advances Gemini 2.0’s capabilities.

A major strength of Gemini Robotics-ER is its ability to combine spatial reasoning with code generation, enabling robots to develop new capabilities on the fly. Shown a coffee mug, for instance, the model can determine an appropriate two-finger grip on the handle and plan a safe approach trajectory. This allows robots to interact with objects more naturally and efficiently.

Designed for full-spectrum robot control, Gemini Robotics-ER handles perception, state estimation, spatial reasoning, planning, and code generation. In end-to-end testing, it achieves success rates two to three times higher than Gemini 2.0’s.
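DeepMind has not published the interface behind this loop, but the flow it describes (perceive a scene, reason about it spatially, then drive an existing low-level controller) can be sketched in Python. Everything below, from `query_vlm` to `LowLevelController`, is a hypothetical stand-in for illustration, not a real Gemini Robotics API:

```python
# Illustrative sketch of the perceive -> reason -> act loop described above.
# All interfaces here are hypothetical stand-ins, not a published API.
from dataclasses import dataclass


@dataclass
class GraspPlan:
    x: float  # grasp point in the robot frame, metres
    y: float
    z: float
    approach: tuple[float, float, float]  # unit vector of the approach direction
    width: float  # target two-finger gripper opening, metres


def query_vlm(image: bytes, prompt: str) -> GraspPlan:
    """Stand-in for an embodied-reasoning model call: given a camera frame and
    a natural-language request, return a grasp proposal. Hard-coded here."""
    return GraspPlan(x=0.42, y=-0.10, z=0.06,
                     approach=(0.0, 0.0, -1.0), width=0.03)


class LowLevelController:
    """Stand-in for the existing controller the model is said to plug into."""

    def move_to(self, x: float, y: float, z: float,
                approach: tuple[float, float, float]) -> None:
        print(f"moving gripper to ({x:.2f}, {y:.2f}, {z:.2f}) along {approach}")

    def close_gripper(self, width: float) -> None:
        print(f"closing gripper to {width * 1000:.0f} mm")


if __name__ == "__main__":
    plan = query_vlm(b"<camera frame>",
                     "Grasp the mug by its handle with two fingers.")
    controller = LowLevelController()
    controller.move_to(plan.x, plan.y, plan.z, plan.approach)  # approach first
    controller.close_gripper(plan.width)                       # then pinch
```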
When code generation alone isn’t enough, the model can use in-context learning, refining its approach from a small number of human demonstrations.

By enhancing robotic precision and adaptability, Gemini Robotics-ER expands the potential for AI-powered automation and improves interoperability with a variety of robotic systems, DeepMind claims.

Google is also taking a layered approach to AI and robotics safety, addressing concerns from low-level motor control up to high-level decision-making. Roboticists already rely on safeguards such as collision avoidance and force limits, and Gemini Robotics-ER integrates with these low-level controllers. It also builds on Gemini’s reasoning to assess whether an action is safe in context.

To advance safety research, DeepMind is releasing a dataset for evaluating semantic safety in embodied AI. Inspired by Asimov’s Three Laws, the team has developed a framework for data-driven constitutions: natural-language rules, sketched below, that guide robot behavior and that users can create and modify.

DeepMind says it is collaborating with its Responsibility and Safety Council and external experts to assess societal impacts. According to the firm, partners including Apptronik, Boston Dynamics, and Agility Robotics are testing Gemini Robotics-ER. These efforts aim to develop AI-driven robots that are safer, more adaptable, and better aligned with human values.
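DeepMind has not detailed how these constitutions are enforced at runtime. As a rough illustration only, a natural-language rule set gating a proposed action might look like the sketch below; the rules, the trivial keyword check, and all names are invented for this example, whereas a real system would presumably have a language model judge each rule:

```python
# Minimal sketch of a "data-driven constitution": natural-language rules that
# gate a proposed robot action. Rules and checker are illustrative assumptions,
# not DeepMind's published framework.
CONSTITUTION = [
    "Do not apply force to a person.",
    "Do not move while a human hand is inside the workspace.",
    "Stop immediately when asked to stop.",
]


def is_action_allowed(action_description: str, scene_description: str) -> bool:
    """A real system would have a language model judge each constitution rule
    against the proposed action; a trivial keyword check stands in here."""
    text = (action_description + " " + scene_description).lower()
    if "hand in workspace" in text and "move" in text:
        return False  # violates rule 2
    return True


print(is_action_allowed("move arm to shelf", "human hand in workspace"))  # False
print(is_action_allowed("close gripper on mug", "workspace clear"))       # True
```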
Jijo Malayil is an automotive and business journalist based in India. Armed with a BA in History (Honors) from St. Stephen’s College, Delhi University, and a PG diploma in Journalism from the Indian Institute of Mass Communication, Delhi, he has worked for news agencies, national newspapers, and automotive magazines. In his spare time, he likes to go off-roading, engage in political discourse, travel, and teach languages.

Source: https://interestingengineering.com/innovation/google-deepmind-humanoid-robot-masters-moves