UC Berkeley’s transformer-based robot control system generalizes to unseen environments


Researchers at the University of California, Berkeley, have developed a flexible control system for humanoid robots that enables them to traverse varied terrains and obstacles. Inspired by the transformer architecture behind large language models (LLMs), the system is built on a straightforward idea: use a short history of recent observations and actions to predict future actions and states.

Trained entirely in a simulated environment, this system has demonstrated its effectiveness in unpredictable real-world situations. By learning from its past interactions, the AI can adapt its behavior to handle new scenarios it never encountered during training.

Humanoid robots, designed to resemble humans, have the potential to become valuable assistants for a range of physical and cognitive tasks. Creating flexible control systems for them is challenging, however. Traditional robotic controllers are typically task-specific and struggle with the unpredictability of real-world conditions, which restricts their usefulness to controlled environments.

To overcome this, there’s increasing interest in learning-based methods for robotic control. These systems can adapt their behavior based on data from simulations or direct interactions with the environment.

The new control system from UC Berkeley is designed to help humanoid robots navigate diverse situations effortlessly. Implemented on Digit, a general-purpose full-sized humanoid robot, it has shown impressive outdoor walking abilities, reliably moving through everyday human environments such as walkways, sidewalks, tracks, and open fields. The system enables the robot to handle different surfaces, including concrete, rubber, and grass, without falling.

The researchers noted that their controller could reliably walk across all tested terrains and felt confident deploying it without a safety gantry. During a week of full-day outdoor testing, the robot did not experience any falls.

Additionally, the robot’s ability to withstand disturbances has been rigorously tested. It can manage unexpected steps, obstacles in its path, and objects thrown at it. It maintains stability even when pushed or pulled.

The unique aspect of this system is how the AI model was trained and deployed. The control model was trained purely in simulation within Isaac Gym, a high-performance GPU-based physics simulation platform, across thousands of domains and billions of scenarios. This extensive simulated experience was then transferred to real-world contexts without further fine-tuning—a process known as sim-to-real transfer. In real-world operation, the system exhibited emergent abilities, managing complex scenarios such as navigating steps, which were not explicitly included in its training.

Central to this system is a “causal transformer,” a deep learning model that processes the history of proprioceptive observations and actions. This transformer excels at identifying relevant information, like gait patterns and contact states, within the robot’s data. Transformers, known for their success in large language models, have an inherent ability to predict subsequent elements in long data sequences. The causal transformer used here learns from sequences of observations and actions, predicting the consequences of these actions with high precision and adjusting behavior dynamically to achieve more favorable future outcomes. This capability allows it to adapt its actions based on the environment, even in previously unseen conditions.

The researchers propose that the history of observations and actions implicitly contains information about the world that a powerful transformer model can use to adapt its behavior dynamically during testing. This idea, called “in-context adaptation,” is similar to how language models use interaction context to learn new tasks on the fly and refine their outputs during inference.

Transformers have proved to be better sequence learners than alternatives such as temporal convolutional networks (TCNs) and long short-term memory networks (LSTMs). The architecture also scales with more data and compute and can be extended with additional input modalities.

In the past year, transformers have become valuable to the robotics community, with several models drawing on their versatility to improve robot capabilities. Their benefits include better encoding and integration of diverse data types and the translation of high-level natural language commands into concrete planning steps for robots.

The researchers believe that transformers could play a crucial role in future advancements for real-world humanoid locomotion, similar to their impact on vision and language fields.
