Using photos or videos, these AI systems can create simulations that train robots to function in physical spaces

Researchers working on large artificial intelligence models like ChatGPT have vast amounts of internet text, photos, and videos to train systems. But roboticists training physical machines face barriers: Robot data is expensive, and without large populations of robots roaming the world, there simply isn’t enough data to make them perform well in dynamic environments like people’s homes.

Some researchers have turned to simulations to train robots. But even that process, which often involves a graphic designer or engineer, is labor-intensive and expensive.

Two new studies from University of Washington researchers introduce AI systems that use video or photos to create simulations for training robots to function in real-world settings. This could significantly lower the cost of training robots to work in complex environments.

In the first study, a user quickly scans a space with a smartphone to capture its geometry. The system, called RialTo, can then create a “digital twin” simulation of the space, where the user can input how different things function (e.g., opening a drawer). A robot can then virtually repeat movements in the simulation with slight variations to learn to perform them effectively. In the second study, the team built a system called URDFormer, which pulls images of real-world environments from the internet and quickly creates physically realistic simulation environments for robots to train in.

The teams presented their studies – the first on July 16 and the second on July 19 – at the Robotics: Science and Systems conference in Delft, Netherlands.

“We’re trying to enable systems that can go from the real world to simulation inexpensively,” said Abhishek Gupta, a UW assistant professor in the Paul G. Allen School of Computer Science & Engineering and co-senior author on both papers. “The systems can then train robots in those simulation scenes, so that the robot can function more effectively in a physical space. That’s good for safety — you can’t have poorly trained robots breaking things and hurting people — and it potentially increases access. If you can have a robot operate in your house by just scanning it with your phone, that democratizes the technology.”

While many robots are currently well suited for structured environments such as assembly lines, teaching them to interact with people and operate in less structured environments remains a challenge.

“In a factory, for example, there’s a lot of repetition,” said lead author of the URDFormer study Zoey Chen, a UW doctoral student in the Allen School. “The tasks may be difficult to perform, but once you program a robot, it can do the task over and over again. Whereas homes are unique and constantly changing. There’s a diversity of objects, tasks, floor plans, and people moving through them. That’s where AI becomes really useful for roboticists.”

The two systems address these challenges in different ways.

RialTo — which Gupta created with a team at the Massachusetts Institute of Technology — has a person walk through an environment and take videos of its geometry and moving parts. For example, in a kitchen, they’d open cabinets, the toaster oven and the refrigerator. The system then uses existing AI models — and a human doing some quick work via a graphical user interface to demonstrate how things move — to create a simulated version of the kitchen shown in the video. A virtual robot trains itself through trial and error in the simulated environment by repeatedly performing tasks like opening that toaster oven — a method called reinforcement learning.
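As a rough illustration of that trial-and-error loop, here is a minimal, self-contained Python sketch. The toy toaster-oven environment, the shaped reward, and the simple hill-climbing search are illustrative stand-ins; RialTo's actual simulations and reinforcement learning algorithms are far more sophisticated and are not shown here.

```python
# Toy sketch of learning-by-repetition in a simulated "digital twin".
# Everything here (environment, reward, search rule) is a simplified
# placeholder, not RialTo's implementation.
import random


class ToasterOvenTwin:
    """Toy digital twin of a toaster-oven door.

    Each episode, the angle the door must reach to count as "open" varies
    slightly, mimicking the small variations introduced during training.
    """

    def __init__(self):
        self.target_angle = random.uniform(0.55, 0.65)

    def step(self, pull_angle: float) -> float:
        """Reward progress toward opening the door, minus a small effort
        penalty for pulling farther than needed."""
        progress = min(pull_angle / self.target_angle, 1.0)
        return progress - 0.1 * pull_angle


def train(episodes: int = 2000) -> float:
    """Trial-and-error search over one policy parameter: how far to pull."""
    best_param, best_score = 0.0, float("-inf")
    for _ in range(episodes):
        # Try a small variation of the current best behavior.
        candidate = min(1.0, max(0.0, best_param + random.gauss(0, 0.05)))
        # Evaluate it across several randomized simulated episodes.
        score = sum(ToasterOvenTwin().step(candidate) for _ in range(10)) / 10
        if score > best_score:
            best_param, best_score = candidate, score
    return best_param


if __name__ == "__main__":
    learned_pull = train()
    print(f"Learned pull angle: about {learned_pull:.2f} rad")
```

Randomizing the target angle on every episode is this sketch's analogue of the "slight variations" described above: the learned behavior has to work across many versions of the scene, not just one exact configuration.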

By going through this process in the simulation, the robot improves at that task, working around disturbances or changes in the environment, such as a mug next to the toaster. The robot can then transfer that knowledge to the physical environment, where it is almost as accurate as a robot trained in a real kitchen.

The other system, URDFormer, is less focused on high accuracy in a single kitchen; instead, it quickly and cheaply generates hundreds of generic kitchen simulations. URDFormer scans images from the internet and matches them to existing models of how, say, kitchen drawers and cabinets are likely to move. It then predicts a simulation from the initial real-world image, letting researchers train robots in a huge range of environments. The tradeoff is that these simulations are significantly less accurate than those RialTo generates.
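URDFormer takes its name from URDF, the Unified Robot Description Format that many simulators use to describe articulated objects. As a hedged sketch of the kind of output such a pipeline ultimately produces, the snippet below assembles a minimal URDF for a cabinet with a sliding drawer and a hinged door. The "predicted" parts are hard-coded placeholders standing in for the network's output, and real scene descriptions would also include geometry, meshes, and physical properties.

```python
# Sketch: turn a (hypothetical) predicted part/joint structure into a minimal
# URDF document that a simulator could load. The prediction step itself is
# replaced by hard-coded placeholder data.
import xml.etree.ElementTree as ET

# Placeholder stand-in for the model's output on one kitchen-cabinet image:
# each part has a name, how it moves relative to its parent, and motion limits.
predicted_parts = [
    {"name": "cabinet_body", "parent": None},
    {"name": "drawer", "parent": "cabinet_body", "joint": "prismatic",
     "axis": "1 0 0", "lower": 0.0, "upper": 0.4},
    {"name": "door", "parent": "cabinet_body", "joint": "revolute",
     "axis": "0 0 1", "lower": 0.0, "upper": 1.57},
]


def build_urdf(parts, name="predicted_cabinet") -> str:
    """Assemble a bare-bones URDF (links + joints only) from predicted parts."""
    robot = ET.Element("robot", attrib={"name": name})
    for part in parts:
        ET.SubElement(robot, "link", attrib={"name": part["name"]})
        if part["parent"] is None:
            continue  # the root link has no joint attaching it to a parent
        joint = ET.SubElement(robot, "joint", attrib={
            "name": f'{part["name"]}_joint', "type": part["joint"]})
        ET.SubElement(joint, "parent", attrib={"link": part["parent"]})
        ET.SubElement(joint, "child", attrib={"link": part["name"]})
        ET.SubElement(joint, "axis", attrib={"xyz": part["axis"]})
        ET.SubElement(joint, "limit", attrib={
            "lower": str(part["lower"]), "upper": str(part["upper"]),
            "effort": "10", "velocity": "1"})
    return ET.tostring(robot, encoding="unicode")


if __name__ == "__main__":
    print(build_urdf(predicted_parts))
```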

“The two approaches can complement each other,” Gupta said. “URDFormer is very useful for pre-training hundreds of scenarios. RialTo is particularly useful if you’ve already pre-trained a robot and now you want to deploy it to someone’s home and have it be maybe 95 percent successful.”

In the future, the RialTo team wants to deploy the system in people’s homes (it’s been largely tested in a lab). Gupta said he wants to integrate small amounts of real-world training data into the systems to improve success rates.

“Hopefully, a small amount of real-world data can sort out the errors,” Gupta said. “But we still have to figure out how best to combine data collected directly in the real world, which is expensive, with data collected in simulations, which is cheap but somewhat flawed.”

The other co-authors on the URDFormer paper are Aaron Walsman, Marius Memmel, and Alex Fang, all UW doctoral students in the Allen School; Karthikeya Vemuri, an undergraduate in the Allen School; Alan Wu, a master's student in the Allen School; and Kaichun Mo, a research scientist at NVIDIA. Dieter Fox, a professor in the Allen School, was a co-senior author.

The other co-authors on the RialTo paper are Marcel Torne, Anthony Simeonov, and Tao Chen of MIT, all doctoral students; Zechu Li, a research assistant; and April Chan, an undergraduate. Pulkit Agrawal, an assistant professor at MIT, was a co-senior author.

The URDFormer research was funded in part by Amazon Science Hub. The RialTo research was funded in part by the Sony Research Award, the U.S. government, and Hyundai Motor Company.