
Stanford's shrimp-frying robot goes viral: built by a Chinese team for 220,000 yuan, it can cook a full banquet and wash dishes

新智元 ·  Jan 5 11:31

Today, everyone's feed was flooded by this cooking robot from a Stanford Chinese team. With just 50 demonstrations, the robot can complete all kinds of complex tasks. Most importantly, it costs only about 220,000 yuan to build, and the entire project is open source.

Today, Mobile ALOHA, a Stanford robot that can cook a whole table of dishes, went viral across the internet.

Shrimp with scrambled eggs, braised chicken with dried scallops, and lettuce in oyster sauce: the dishes come in all colors and flavors and look very appetizing.

Take the shrimp with scrambled eggs, for example. Mobile ALOHA first cracks three eggs, blanches the shrimp in boiling water, pours the beaten egg into a frying pan, adds the shrimp, stirs a few times, and the dish is ready.

Let's also take a look at how it makes the braised chicken with dried scallops.

First, it fries the boneless chicken thigh until golden on both sides, then adds the dried scallops and other seasonings and simmers for 20 minutes.

Finally, it plates the dish and sprinkles on a small pinch of chopped green onion. Perfect.

As for the lettuce in oyster sauce, the robot "chef" is equally skilled.

It can even mince garlic.

After watching, netizens said we are simply living in the future! It won't be long before the job of flipping burgers at fast-food restaurants is completely taken over by robots!

Even the creator of PyTorch praised it: a cool new robotics platform, and it's great to see more work in this direction!

Mobile ALOHA, a new mobile robot developed by a three-person team at Stanford, can perform various complex tasks through imitation learning.

It not only operates autonomously, but also supports whole-body teleoperation.

It's worth mentioning that the robot costs only $32,000 (about 220,000 yuan) to build, and both the hardware and software are fully open source.

Using only 50 demonstrations per task, the researchers got Mobile ALOHA to perform tasks consistently, such as wiping up wine spilled on a table 9 times in a row and riding the elevator 5 times in a row.

It can put a pot away in a cabinet even when disturbed, and it can adjust to chairs that never appeared in the training data.

How can just 50 demonstrations make a robot learn so capably?

The authors explain that the key is to co-train the imitation learning algorithm with static ALOHA data. This yields consistent performance gains, especially on tasks that require precise manipulation.
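The co-training idea can be sketched in a few lines. This is a minimal illustration rather than the team's actual training code: it assumes each training batch mixes samples from the small set of mobile-manipulation demos with the large static ALOHA dataset at an even ratio (the function name, data representation, and 50/50 mix are assumptions for illustration).

```python
import random

def make_cotraining_batch(mobile_demos, static_demos,
                          batch_size=16, mobile_ratio=0.5):
    """Sample a batch mixing mobile and static ALOHA demonstrations.

    mobile_demos: samples from the ~50 mobile-manipulation demos
    static_demos: samples from the existing static ALOHA dataset
    """
    n_mobile = int(batch_size * mobile_ratio)
    batch = [random.choice(mobile_demos) for _ in range(n_mobile)]
    batch += [random.choice(static_demos) for _ in range(batch_size - n_mobile)]
    random.shuffle(batch)  # avoid ordering the two sources within the batch
    return batch
```

The point of the mix is that the abundant static data keeps teaching precise manipulation while the scarce mobile data teaches the new whole-body tasks.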

Let's watch a few more cool Mobile ALOHA demos!

Cleaning a pan:

High-fiving people:

Frying shrimp:

In addition, Mobile ALOHA can also be teleoperated to complete more delicate tasks.

For example, pulling out a tissue and wiping a glass.

As well as sweeping the floor with a broom, etc.

Strong start to the first year of robotics

Before 2024 arrived, many industry leaders predicted that, alongside large models, robotics would be this year's other important field of research.

Yes, 2024 will be the first year of the robot era.

Generally speaking, a very promising approach to developing general-purpose robots is imitation learning from demonstrations provided by humans.

This kind of "behavioral cloning" allows robots to learn a variety of primitive skills, from simple pick-and-place to more delicate manipulation.

However, many real-life tasks require coordinated, whole-body mobility and dexterity, rather than individual arm movements alone.

In this paper, the authors examine the feasibility of extending imitation learning to tasks requiring whole-body control of a bimanual mobile robot.

Currently, two main factors hinder the widespread application of imitation learning to bimanual mobile manipulation.

First, there is a lack of plug-and-play whole-body teleoperation hardware.

Off-the-shelf bimanual mobile manipulators are very expensive: robots such as the PR2 and Tiago cost more than 200,000 US dollars, and enabling teleoperation on these platforms requires additional hardware and calibration.

Second, previous robot-learning studies have not demonstrated high-performance bimanual mobile manipulation on complex tasks.

In this paper, the researchers seek to address the challenges of applying imitation learning to bimanual mobile manipulation.

On the hardware side, the authors introduce Mobile ALOHA, a low-cost whole-body teleoperation system for collecting bimanual mobile manipulation data.

Mounted on a wheeled base, Mobile ALOHA extends the capabilities of the original ALOHA, a low-cost, dexterous bimanual puppeteering teleoperation setup.

The user is then tethered at the waist to the system and backdrives the wheels to move the base.

While the user controls both ALOHA arms with their hands, the base moves along with them. The researchers record base velocity data and arm manipulation data simultaneously, forming a whole-body teleoperation system.
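The whole-body action recorded this way can be pictured as a simple concatenation. The sketch below is an illustrative assumption about the data layout, not the project's exact format: each arm contributes 7 values (6 joint positions plus a gripper value), and the base contributes linear and angular velocity, giving a 16-dimensional action vector.

```python
def whole_body_action(left_arm, right_arm, base_linear_vel, base_angular_vel):
    """Concatenate both arms' joint targets with base velocities.

    left_arm / right_arm: 7 values each (6 joints + gripper) -- an
    assumed layout for illustration.
    Returns a 16-dimensional whole-body action vector.
    """
    assert len(left_arm) == 7 and len(right_arm) == 7
    return list(left_arm) + list(right_arm) + [base_linear_vel, base_angular_vel]
```

Treating base motion as just two more action dimensions is what lets the same imitation learning pipeline drive arms and base together.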

The total cost is only $32,000

It's worth mentioning that the Stanford team spent only $32,000 on the entire Mobile ALOHA build.

In the robot's design, they comprehensively considered four key factors:

- Mobility: the system moves at a speed comparable to human walking, approximately 1.42 m/s.

- Stability: it stays stable when manipulating heavy household objects such as pots and cabinets.

- Whole-body teleoperation: all degrees of freedom can be teleoperated simultaneously, including both arms and the mobile base.

- Untethered: onboard power and computing.

The image below shows Mobile ALOHA's technical specifications.

Mobile ALOHA has 2 wrist cameras and 1 top camera, and is equipped with onboard power and computing.

Also, the teleoperation hardware can be removed: when running autonomously, Mobile ALOHA uses only the 2 ViperX 300 arms. The two arms can reach a minimum/maximum height of 65 cm/200 cm and extend 100 cm from the base.

The researchers chose the AgileX Tracer AGV (Tracer), designed for warehouse logistics, as the mobile base.

It can move at 1.6 m/s, close to average human walking speed, has a maximum payload of 100 kg, and a height of 17 cm.

It's worth mentioning that the Tracer sells for $7,000 in the US, less than one-fifth the price of Clearpath AGVs with comparable speed and payload.

The researchers then designed a whole-body teleoperation system on top of the Tracer base and the ALOHA arms, that is, a system that can control the base and both robotic arms simultaneously.

Tethering the operator's waist to the mobile base is the simplest and most direct design. It allows the wheels to be backdriven, and the wheels have very little friction when motor torque is switched off.

To improve ergonomics and enlarge the workspace, the team mounted all 4 ALOHA arms facing forward, unlike the original ALOHA arms, which face inward.

Furthermore, to keep Mobile ALOHA untethered, the authors equipped the base with a 1.26 kWh battery weighing 14 kg, which also acts as ballast to keep the robot from tipping over.

All computation for data collection and inference was performed on a consumer-grade laptop with an NVIDIA RTX 3070 Ti GPU (8 GB VRAM) and an Intel i7-12800H CPU.

These are the key components of Mobile ALOHA design.

Some development details

Bill of materials and prices

Interested friends can check out their official documentation: https://docs.google.com/document/d/1_3yhWjodSNNYlpxkRCPIlvIAaQ76Nqk2wsqhnEVM6Dc/edit

Co-training to improve imitation learning performance

With the hardware in place, the next step is co-training on the data.

In the paper, the researchers use a co-training pipeline that leverages the existing static ALOHA dataset to improve imitation learning performance on mobile manipulation, especially bimanual tasks.

The static ALOHA data set has a total of 825 demonstration tasks, including sealing a bag, picking up a fork, packing candy, tearing a tissue, opening a plastic cup with a lid, playing table tennis, using a coffee machine, flipping a pencil, securing a velcro cable, installing a battery, and operating a screwdriver.

The researchers then selected 7 tasks for Mobile ALOHA to complete.

Tasks like cleaning up red wine spilled on a table require both mobility and bimanual dexterity from the robot.

Specifically, the robot needs to first navigate to the faucet, pick up the towel, and then navigate back to the table.

Then it lifts the wine glass with one arm and wipes the table and the bottom of the glass with a towel held in the other. This task is impossible for the static ALOHA, and a single-arm mobile robot would take much longer to complete it.

To cook shrimp, the robot needs to fry a raw shrimp on both sides and then serve it in a bowl.

Mobility and dexterity are also necessary for this task: the robot needs to move from the stove to the kitchen counter, flip the shrimp with a spatula in one arm, and tilt the pan with the other.

This task demands more precision than wiping up wine, because flipping a half-cooked shrimp requires accurate manipulation.

Similarly, Mobile ALOHA is also proficient in tasks such as cleaning pans, storing pots, riding elevators, pushing chairs, and high-fives.

The figure below shows the robot's navigation and movement trajectory during the execution of the task.

50 demos, 80%+ success rate

In the experimental evaluation, the researchers mainly wanted to answer two core questions:

(1) Can Mobile ALOHA acquire complex mobile manipulation skills through co-training with only a small amount of mobile manipulation data?

(2) Can Mobile ALOHA work with different types of imitation learning methods, including ACT, diffusion policies, and retrieval-based VINN?

The study found that co-training improves ACT performance: co-training with the static ALOHA dataset consistently improves ACT's success rate across the 7 challenging mobile manipulation tasks.

This is particularly important for subtasks such as pressing the button when riding the elevator and turning on the faucet when cleaning a pan, where accurate manipulation is the bottleneck.

Additionally, Mobile ALOHA is compatible with a range of imitation learning methods.

VINN, diffusion policies, and ACT with action chunking all achieved good performance on Mobile ALOHA and all benefited from co-training with static ALOHA data.
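Since ACT (Action Chunking with Transformers) predicts a chunk of several future actions at every control step, overlapping chunks produce multiple predictions for each timestep, which can be combined with exponentially decaying weights ("temporal ensembling"). The sketch below is a hedged illustration of that combination step, not the project's code; the weighting constant m and the convention that the oldest prediction gets the largest weight are assumptions here.

```python
import math

def temporal_ensemble(predictions, m=0.1):
    """Combine overlapping chunk predictions for one timestep.

    predictions: per-timestep action values, oldest first; each comes
    from a different (overlapping) chunk. Weights decay as exp(-m * i),
    so the oldest prediction (i = 0) is weighted most heavily.
    """
    weights = [math.exp(-m * i) for i in range(len(predictions))]
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, predictions)) / total
```

Averaging over chunks smooths the executed trajectory instead of jumping to each newest prediction.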

Co-training is also robust across different data mixtures. Below is the success rate of ACT on the wine-wiping task under different training mixes.

Comparing co-training with pre-training: co-training achieves a 95% success rate on the wine-wiping task, far better than pre-training's 40%.

Also, when users teleoperate Mobile ALOHA on unseen tasks, they quickly approach expert-level speed.

All in all, with a budget of only $32,000, Mobile ALOHA needs just 20-50 demonstrations to learn various complex tasks through imitation learning co-trained with static ALOHA data.

Stanford's Mobile ALOHA showed everyone the potential of robots across application scenarios, and the team even open-sourced the robot so that anyone can replicate it.

Netizens commented that robotics is a systems discipline requiring both hardware and algorithms, and guessed that in 2024 we'll see more and more robots in the real world.

Author introduction

Zipeng Fu (project co-leader)

Zipeng Fu is a computer science PhD student in Stanford University's AI Lab, advised by Chelsea Finn. He is also a student researcher at Google DeepMind, collaborating with Jie Tan.

Previously, he earned a master's degree in machine learning at Carnegie Mellon University (CMU) and was a student researcher at its Robotics Institute, advised by Deepak Pathak and Jitendra Malik.

He received his bachelor's degree in computer science and applied mathematics from the University of California, Los Angeles (UCLA), and was supervised by Song-Chun Zhu.

His research interests lie at the intersection of robotics, machine learning, and computer vision. He is committed to building robot systems that perform robustly and can be practically deployed in complex, changing open-world environments.

His research is supported by a Stanford Graduate Fellowship, and he is also a Pierre and Christine Lamond Fellowship recipient.

Tony Z. Zhao (project co-leader)

Tony Z. Zhao is a computer science PhD student at Stanford University, advised by Chelsea Finn. He is also a part-time research assistant at Google DeepMind.

Before that, he received his bachelor's degree in Electrical Engineering and Computer Sciences (EECS) from the University of California, Berkeley (UCB) in 2021, advised by Sergey Levine and Dan Klein. He also interned at Tesla Autopilot and Google X Intrinsic.

His goal is to enable robots to complete complex and detailed control tasks.

Chelsea Finn

Chelsea Finn is an assistant professor of computer science and electrical engineering at Stanford University. Her research focuses on the intelligent behavior that robots and other agents can exhibit through learning and interaction.

Her lab, IRIS, studies intelligence through large-scale robot interaction and is affiliated with both SAIL and the ML Group. She is also a researcher on the Google Brain team.

Previously, she earned her doctorate in computer science from the University of California, Berkeley (UCB) and a bachelor's degree in electrical engineering and computer science from the Massachusetts Institute of Technology (MIT).



