Google's new AI model RT-2 translates vision and language into robotic actions

In a groundbreaking development, Google has introduced Robotics Transformer 2 (RT-2) - a first-of-its-kind vision-language-action (VLA) model that helps robots more easily understand and perform actions in both familiar and new situations.
Trained on both web and robotics data, RT-2 translates that knowledge into generalised instructions for robotic control while retaining its web-scale capabilities. Simply put, the new AI model translates vision and language into robotic actions.
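Concretely, RT-2 represents robot actions as strings of text tokens, so a vision-language model can emit actions the same way it emits words. The Python sketch below is a minimal illustration of that idea rather than Google's implementation: the 8-dimensional action layout (terminate flag, 6-DoF end-effector displacement, gripper), the 256-bin discretisation, the normalized value range, and all helper names are assumptions made for the example.

```python
# Minimal sketch of VLA-style action de-tokenization (illustrative only).
# Assumptions (not from the article): the model emits an 8-number string
# [terminate, dx, dy, dz, droll, dpitch, dyaw, gripper], each value
# discretized into 256 bins over a normalized [-1.0, 1.0] range.

NUM_BINS = 256
LOW, HIGH = -1.0, 1.0

ACTION_DIMS = [
    "terminate", "dx", "dy", "dz", "droll", "dpitch", "dyaw", "gripper",
]


def bin_to_value(bin_index: int) -> float:
    """Map a discrete bin index back to a continuous value."""
    step = (HIGH - LOW) / (NUM_BINS - 1)
    return LOW + bin_index * step


def decode_action(token_string: str) -> dict:
    """Turn a model-emitted token string such as '0 128 91 241 5 101 127 212'
    into a named, continuous action command for the robot controller."""
    bins = [int(tok) for tok in token_string.split()]
    if len(bins) != len(ACTION_DIMS):
        raise ValueError(f"expected {len(ACTION_DIMS)} action tokens, got {len(bins)}")
    action = {name: bin_to_value(b) for name, b in zip(ACTION_DIMS, bins)}
    # Treat the terminate flag as binary: a bin in the upper half means "stop".
    action["terminate"] = float(bins[0] >= NUM_BINS // 2)
    return action


if __name__ == "__main__":
    print(decode_action("0 128 91 241 5 101 127 212"))
```

Because the actions share the model's text vocabulary in this scheme, co-training on web data lets language and visual understanding flow directly into motor commands.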
Notably, RT-2 shows that with a small amount of robot training data, the system is able to transfer concepts embedded in its language and vision training data to direct robot actions - even for tasks it has never been trained to do. For example, previous systems required explicit training to recognise and dispose of trash. In contrast, RT-2, having been trained on a vast corpus of web data, already possesses the concept of what trash is, enabling it to identify and dispose of it without task-specific training.
RT-2's capabilities, and its semantic and visual understanding, were evaluated in over 6,000 robotic trials. On tasks within its training data, known as "seen" tasks, RT-2 performed as well as its predecessor, RT-1. The most striking finding, however, was that it nearly doubled performance on novel, previously unseen scenarios, achieving a 62% success rate compared with RT-1's 32%.
"Not only does RT-2 show how advances in AI are cascading rapidly into robotics, it shows enormous promise for more general-purpose robots. While there is still a tremendous amount of work to be done to enable helpful robots in human-centred environments, RT-2 shows us an exciting future for robotics just within grasp," Google said.
RT-2 marks a significant step toward a general-purpose robot that can operate effectively in real-world scenarios. By combining vision, language, and action comprehension in a single model, RT-2 opens up exciting possibilities for robots to reason, problem-solve, and interpret information, paving the way for their application across a diverse range of tasks and settings.