
Tesla's big bet after mass layoffs: what is FSD v12 worth?

LatePost · May 8, 15:50

Source: LatePost (晚點LatePost)

Layoffs of more than 10,000 people, a drastic downsizing of the 4680 battery team, the departure of the senior vice president in charge of the powertrain (battery, motor, and electronic controls) along with other executives... the sweeping adjustments Tesla CEO Elon Musk set in motion on April 15 were only a prelude.

Half a month later, Tesla kept cutting key projects: the 4680 battery program saw further layoffs, the North American Supercharger team was disbanded entirely, the 9,000-ton integrated die-casting (giga-casting) project was halted, and a string of related executives left. Next, in June, Tesla will lay off more than 6,000 employees in California and Texas.

Musk's new bet is fully autonomous driving. The Robotaxi project has been given the highest priority: Musk announced it will be unveiled on August 8, and that Tesla will spend $10 billion this year on GPUs and in-house automotive chips to improve its autonomous driving system. He has said repeatedly that as long as the system keeps iterating, driverless operation will be achieved, turning Tesla into a $10 trillion company.

In China, Tesla's second-largest market, Musk also hopes this system can turn the market around. At the end of April he visited China and was received by government leaders; soon afterward he said in an internal letter that Tesla had obtained permission to test some of its assisted-driving features in China.

FSD v12, the autonomous driving system that began rolling out at scale this year, does show unusual potential. Owner feedback converges on one impression: "it drives just like a human." Compared with the previous generation, it rides more comfortably and will overtake on narrow roads.

Tesla FSD v12 gracefully handles complex road conditions. Photo from X user @Rebellionair3

After trying FSD v12 in the US in March this year, Zhou Guang, CEO of autonomous driving company DeepRoute.ai (Yuanrong Qixing), admitted that he had still underestimated it: "Before I went I thought it might be an 80; after the test ride I'd give it a 90."

After trying it, the head of a leading domestic new-energy-vehicle company came away believing Tesla's autonomous driving is headed for a revolutionary breakthrough. Competitors do not dare fall behind. Around the end of April, Xpeng, Huawei, Great Wall, SenseTime's Jueying (SenseAuto) and others announced plans to launch autonomous driving systems similar to FSD v12. In the same period, SoftBank, Nvidia, and Microsoft invested $1.08 billion in Wayve, a British autonomous driving company pursuing the same technical route as Tesla.

Following Tesla's route, a new autonomous driving race is on. This time it is not just about solving technical problems; it is also a race for resources. On the day he arrived in China, Musk set the entry threshold on social media: any company that does not put roughly $10 billion into computing power, he said, cannot compete in this round.

Principle: Cut 300,000 lines of code and let data determine how to drive a car

In the 2000s, DARPA hosted three driverless-vehicle challenges, starting in the desert, which became the origin of modern autonomous driving technology. Google recruited the winners, who settled on a workable approach: split autonomous driving into multiple modules:

Sensors such as lidar and cameras collect data about the vehicle's surroundings and hand it to models trained on manually labeled data, which identify important targets and obstacles (the perception module); high-precision maps then tell the system how the road ahead will change; finally, rules hand-written by engineers in code decide how to drive the car (the prediction and planning modules).

Initially, Tesla also followed the path Google had pioneered. To cut costs and expand coverage quickly, it built a solution that relied on cameras rather than expensive lidars and high-precision maps. Before v12, Tesla's autonomous driving workflow was roughly as follows:

  • The visual perception module runs first: it processes road data captured by cameras and other sensors and identifies what is on the road, how it is distributed, what is moving and what is static, where the lane lines are, and which areas are drivable.

  • Then the prediction, planning, and control modules take the perception output, predict how dynamic targets such as pedestrians and cars will move over the next few seconds, combine models with rules engineers wrote in advance to plan a safe route, and finally actuate the steering wheel, accelerator, or brake to follow it (a sketch of this division of labor follows below).
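To make that division of labor concrete, here is a minimal sketch of such a modular pipeline in Python. Every function and field name is invented for illustration; a real stack is vastly larger, and the planning stage alone is where Tesla's hand-written rules lived.

```python
# Illustrative skeleton of a classic modular self-driving stack.
# All names and numbers are hypothetical; only the hand-off between stages matters.

def perceive(camera_frames):
    """Perception: turn raw pixels into a short list of labeled objects."""
    return [{"kind": "car", "position": (12.0, -1.5), "velocity": (8.0, 0.0)}]

def predict(detections):
    """Prediction: extrapolate each dynamic object a few seconds ahead."""
    horizon = 3.0  # seconds
    return [
        {**d, "future_position": (d["position"][0] + d["velocity"][0] * horizon,
                                  d["position"][1] + d["velocity"][1] * horizon)}
        for d in detections
    ]

def plan(predictions, hd_map):
    """Planning: hand-written rules decide the trajectory (the 300,000 lines lived here)."""
    if any(abs(p["future_position"][1]) < 1.0 for p in predictions):
        return {"speed": 0.0, "steering": 0.0}   # rule: stop if a path conflict is predicted
    return {"speed": 15.0, "steering": 0.0}      # rule: otherwise keep the lane and cruise

def drive_one_step(camera_frames, hd_map):
    # Information flows one way; each stage only sees what the previous stage passed on.
    return plan(predict(perceive(camera_frames)), hd_map)
```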

Tesla unveiled the FSD system architecture at AI Day 2021. Image from Tesla

To cover as many road situations as possible, hundreds of Tesla engineers wrote 300,000 lines of C++ rules, roughly 1.7 times the amount of code in the early Linux operating system.

This is not how people learn to drive. A human does not need to pre-catalog every object that might appear on a road, nor set rules in advance for every complicated scene before getting behind the wheel.

It is hard to guarantee safety with a system built this way. The real world keeps throwing up new situations, and no number of engineers can enumerate them all. Today's commercial robotaxis can only operate in limited areas, and while there is no safety operator in the car, the operator has simply been moved to the cloud to monitor remotely.

As late as 2021, driverless cars from Google's autonomous driving subsidiary Waymo could still stop and refuse to move when they encountered a row of traffic cones on the road. By that point, Google and the rest of the industry had invested hundreds of billions of dollars. In those two years, a string of companies shut down driverless projects that had already cost billions.

"With 20% of the effort you can get 80% of the capability." Liu Langechuan, former head of autonomous driving AI at Xpeng and now on Nvidia's intelligent-vehicle team, said at an academic event last year that traditional autonomous driving solutions are easy to stand up but hard to keep improving.

Tesla's FSD v12 learns to drive more the way a human does. The biggest change is the "end-to-end" architecture: data from cameras and other sensors goes in at one end, and instructions for how to drive the car come out the other.

To train this system, the machine learned how to drive from a large volume of driving videos, paired with records of how human drivers turned the steering wheel and worked the pedals in different environments.
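The simplest mental model of this kind of training is behavior cloning: one network maps camera pixels straight to control commands and is optimized to imitate recorded human inputs. The PyTorch sketch below is a toy under that assumption; the architecture, tensor sizes, and loss are placeholders, not Tesla's.

```python
import torch
import torch.nn as nn

class TinyEndToEndDriver(nn.Module):
    """Toy end-to-end policy: camera pixels in, (steering, throttle, brake) out."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(                # stand-in for a real vision backbone
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 3))

    def forward(self, frames):                       # frames: (batch, 3, H, W)
        return self.head(self.encoder(frames))       # (batch, 3) control command

model = TinyEndToEndDriver()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Behavior cloning: each sample pairs a camera frame with the human driver's
# recorded controls at that instant. Random tensors stand in for real clips.
frames = torch.rand(8, 3, 120, 160)
human_controls = torch.rand(8, 3)

for step in range(100):
    pred = model(frames)
    loss = nn.functional.mse_loss(pred, human_controls)   # imitate the human
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```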

In FSD v12, almost all of the rules written by Tesla engineers were removed: of the 300,000 lines of rule code, only some 2,000 remain, less than 1% of the original.

An end-to-end autonomous driving system learns to drive only somewhat like a human; no system yet truly understands the world the way a person does. A human can drive safely after a few days of practice, whereas FSD has to learn from an enormous amount of video. Musk spoke about how much the data matters at an earnings call last year: "Training with 1 million video cases is barely enough; 2 million is slightly better; 3 million makes you feel wow; at 10 million it becomes incredible."

"A traditional autonomous driving system is like a funnel; information is lost layer by layer." One autonomous driving algorithm engineer explained that in the perception stage of a traditional solution, engineers usually set a "whitelist" so the system focuses on important targets such as pedestrians, vehicles, lane lines, and traffic lights, to save computing power. In the prediction and planning stage, engineers again decide in advance which perception outputs to call on, and information is lost once more. As a result, traditional solutions struggle to use enough information to decide how to drive the way humans do, and instead rely on rules engineers wrote ahead of time.

In an end-to-end solution, everything the cameras and other sensors capture flows through to decision-making. "Information is passed on without loss. The model can draw on more of the perception data when making decisions, improving its ability to handle complex scenarios," the engineer quoted above said. And because the architecture is trained end to end, the model's decisions also shape the perception process itself, letting it pick up signals that people are not even aware of but that are useful for driving.

In many scenarios, Tesla FSD v12 is clearly better. An autonomous driving practitioner (Zhihu user @EatElephant) told us that compared with v11, v12's control of speed and steering felt "very smooth": "even sitting in the back row, I barely felt any jerkiness when it turned at intersections." Traditional autonomous driving solutions, to stay safe, tap the brakes from time to time while driving.

In an article he wrote that, faced with a cyclist ahead on the right, "v11 would be overly cautious and plan an outrageously wide detour. v12 stays calm; its detour is close to what a human driver would choose, and its speed control and decisiveness are also very reasonable."

FSD v12 has also made clear progress in scenarios that are hard to describe with rules. He gave an example: when it came upon an Amazon delivery truck stopped on the roadside with its hazard lights flashing, v12 quickly judged that no cars were coming the other way and went around it immediately. Traditional solutions usually stop, or wait a while before deciding what to do.

After the FSD v12.3 update was released, a group of owners uploaded YouTube videos of their cars calmly handling all kinds of complicated road conditions, such as driving through crowded Fifth Avenue in New York at night for 30 minutes without the driver touching the steering wheel.

Faced with these enthusiastic owners, the US National Highway Traffic Safety Administration (NHTSA) sent Tesla a letter on May 6 asking it to explain in detail how it prevents owners from abusing the driver-assistance system, for example how it reminds drivers to keep their hands on the steering wheel.

Foundation: In the most difficult years, still insisting on pre-installing hardware, developing chips, and collecting data

At the beginning of 2018, when Tesla was mired in a production capacity crisis and faced a test of life and death, Musk sent an email to OpenAI management hoping OpenAI would be merged into Tesla to jointly develop a “fully autonomous driving solution based on large-scale neural network training.”

He believed that AI research and development required enormous capital, and that OpenAI needed a profit model to compete with the giants. Tesla, meanwhile, had already used the Model 3 and its supply chain to build the "first stage" of the rocket; if OpenAI were folded into Tesla, it would accelerate driverless R&D and become the "second stage." Tesla would sell more cars as a result, and OpenAI would have enough revenue to pursue artificial intelligence research.

Musk's proposal was rejected, and he eventually left the OpenAI board. But before that he had already poached Andrej Karpathy from OpenAI to lead autonomous driving R&D at Tesla, heading a team that trained more effective models.

Many autonomous driving practitioners see Karpathy's arrival at Tesla as the starting point of its development of the end-to-end autonomous driving model that became v12.

Born in 1986, Karpathy both lived through the past decade's wave of artificial intelligence and grew up as an AI scientist within it. Starting his PhD at Stanford in 2011, he worked with his advisor Fei-Fei Li on ImageNet, the competition dataset that gave birth to AlexNet, published computer vision papers at major academic conferences, and taught Stanford's first deep learning course. After finishing his PhD, he was among the first people to join OpenAI.

In November 2017, Karpathy published the well-known essay "Software 2.0," arguing that just as software is eating the world, AI-based Software 2.0 is eating software. At the time, computer vision models trained on large amounts of data could already identify objects more accurately than the human eye, and AlphaGo had learned from data how to beat human Go champions.

He argued that with massive amounts of data, artificial intelligence in most valuable verticals "is better than any code you can think of, at least in fields involving images/video and sound/speech."

Before Karpathy arrived, Tesla had already begun building the data infrastructure for autonomous driving.

Training stronger models with large amounts of data was the ideal technical route for Tesla. It has poured resources into autonomous driving to follow it, and Musk has never lacked the nerve to take risks.

Beginning in 2016, every Tesla shipped with the hardware needed to run the Autopilot driver-assistance system; owners paid separately for the software to activate it. Even now, few car brands do this. The more common practice is to split the same car into different trims and sell the versions fitted with autonomous driving hardware only to customers who want them.

With the hardware standard on every car, Tesla enabled "Shadow Mode." Even if a driver has not purchased Autopilot, the system runs in the background, recording driving data and planning routes. Musk said in an interview at the time that its role was to prove the system is more reliable than humans and to provide data to support regulators in approving the technology.

After Karpathy joined, shadow mode became Tesla's core source of model training data: whenever the route the system would have taken clearly deviates from what the driver actually did, a data-return mechanism is triggered. The car automatically records the camera footage, vehicle driving data, and so on, and uploads it to Tesla's servers once it connects to WiFi. By the end of 2018, Tesla had collected 1.6 billion kilometers of driving data this way, more than the vast majority of car companies now developing autonomous driving.
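Based only on this public description, the trigger logic of shadow mode can be imagined as a small comparison loop: plan silently, measure the divergence from the human, and queue a clip for upload when the gap is large. The threshold and data fields below are invented for illustration.

```python
import math

DIVERGENCE_THRESHOLD_M = 1.5   # hypothetical: how far apart the two paths may drift

def trajectory_divergence(planned, driven):
    """Max pointwise distance between the system's plan and the human's actual path."""
    return max(math.dist(p, d) for p, d in zip(planned, driven))

def shadow_mode_step(camera_clip, planned_trajectory, driven_trajectory, upload_queue):
    # The assistance system runs silently even when the driver has not bought it.
    if trajectory_divergence(planned_trajectory, driven_trajectory) > DIVERGENCE_THRESHOLD_M:
        # Interesting disagreement: keep the clip and upload it later over WiFi.
        upload_queue.append({
            "clip": camera_clip,
            "planned": planned_trajectory,
            "driven": driven_trajectory,
        })

# Usage: two nearly identical paths produce no upload; a swerve the planner missed does.
queue = []
planned = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)]
driven  = [(0.0, 0.0), (5.0, 1.0), (10.0, 3.0)]   # the driver pulled left to avoid something
shadow_mode_step("clip_0421.mp4", planned, driven, queue)
print(len(queue))  # -> 1
```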

Tesla's autonomous driving team put most of its energy into data, building a processing system dedicated to analyzing and filtering what was collected. Labeling was done by people at first; later most of it was done by machines, and the labeled data was fed back into the models to keep improving the system. To train models on this much data, Tesla began buying large numbers of GPUs before 2019 to build a computing center known as Dojo, and has kept expanding it; by now it has accumulated computing power equivalent to about 35,000 Nvidia H100s.

In April 2019, Tesla released its HW 3.0 hardware, carrying two first-generation FSD chips with a combined 144 TOPS of computing power, nearly 7 times that of Nvidia's automotive chip Xavier at the time. As before, Tesla installed this hardware in every car regardless of whether the owner had bought the assisted-driving software, and upgraded existing owners who had bought it for free.

"Not only does it allow us to run our current neural networks faster; more importantly, it allows us to deploy larger, more computationally expensive models in vehicles," Karpathy said. HW 3.0 is also the foundation that lets Tesla roll out FSD v12 at scale today.

Tesla built this infrastructure during what was also its most cash-strapped period since it began mass-producing vehicles. From 2017 to early 2019, Tesla was mired in Model 3 production problems.

By March 2019, Tesla had only $2.2 billion in cash, enough to last less than half a year. The Musk biography records him telling his wife at the time, "We have to raise money or it's over."

After mulling it over for a few nights, Musk decided to hold an event for investors, Tesla's "Autonomy Day." He told Wall Street that driverless cars would bring Tesla huge profits, and that within the next year 1 million robotaxis would be on the road, reshaping people's daily lives.

Few believed Tesla's driverless cars would arrive any time soon; in the month or so after the event, Tesla's stock fell 30%. Thanks to the smooth ramp-up of Model 3 production and the rapid completion of the Shanghai factory, Tesla pulled through. And the five years that followed turned out to be the period when Tesla's foundational autonomous driving technology advanced fastest.

Implementation: Start by simulating the human eye and expand step by step to the entire system

Learning to drive by watching video sounds simple, but countless problems had to be solved along the way.

From 2020 to 2022, Tesla released a new version of its "perception" model every year, each one a step closer to simulating the "human eye."

In February 2020, at an academic conference, Karpathy presented Tesla's multi-task model HydraNet, which trains 48 neural networks and can identify more than 1,000 kinds of targets, such as cars, bicycles, lane lines, and school zones.

Using ResNet, the model released by Microsoft Research Asia in 2015, as its backbone, HydraNet extracts shared features from the images captured by the 8 cameras around the car body and hands them to different branches to complete different tasks. This avoids having separate models repeatedly extract features from the same image, saving computing power.
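The shared-backbone, multi-head pattern described here is a standard multi-task design. A toy PyTorch version, nowhere near Tesla's real network in scale or detail, looks like this:

```python
import torch
import torch.nn as nn

class ToyHydraNet(nn.Module):
    """Shared feature extractor feeding several task-specific heads."""
    def __init__(self, num_object_classes=10):
        super().__init__()
        # Stand-in for the ResNet backbone: one set of features, computed once.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Each "hydra head" reuses the same features for a different task.
        self.object_head = nn.Linear(64, num_object_classes)   # what is in view
        self.lane_head = nn.Linear(64, 4)                      # coarse lane-line parameters
        self.drivable_head = nn.Linear(64, 1)                  # is the area drivable

    def forward(self, image):
        features = self.backbone(image)        # extracted once, shared by all heads
        return {
            "objects": self.object_head(features),
            "lanes": self.lane_head(features),
            "drivable": torch.sigmoid(self.drivable_head(features)),
        }

outputs = ToyHydraNet()(torch.rand(1, 3, 128, 192))
print({k: v.shape for k, v in outputs.items()})
```

The point of the design is that the expensive feature extraction runs once per image, while each lightweight head reuses the result.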

This was the approach taken by academia and most companies building large-scale recognition systems at the time; Tesla scaled it up and engineered it. But it has limits. HydraNet can only extract information from each camera's image separately, and any single camera may see only part of a surrounding object. Just as a novice driver struggles to reverse smoothly into a parking space using only the mirrors, a system built this way struggles to reach true driverless operation, and still needs various radars and high-precision maps to help.

Karpathy's team, which did not use lidar, chose a series of algorithms to stitch the images from the 8 cameras pointing in different directions into a 360° bird's-eye view (BEV), then let the model "understand the world" and plan routes on top of it. But for this to work well, the ground has to be as flat as possible and the car's surroundings simple; otherwise the system struggles to work out how the images from different cameras relate to each other.

"When we tried to use it for FSD, we quickly found it didn't work as expected," Karpathy said at Tesla AI Day 2021, where he introduced a new model built on the Transformer architecture that can stitch together targets spanning multiple cameras more accurately and stably.
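One common way to build such a Transformer step, and a plausible reading of what Tesla showed, is cross-attention: a learned query for each cell of the bird's-eye-view grid attends to image features from all eight cameras at once, so the network itself learns how the views fit together. The sketch below uses invented sizes and is not Tesla's implementation.

```python
import torch
import torch.nn as nn

class ToyBEVFusion(nn.Module):
    """Fuse per-camera features into a bird's-eye-view grid with cross-attention."""
    def __init__(self, feat_dim=64, bev_cells=20 * 20):
        super().__init__()
        self.bev_queries = nn.Parameter(torch.randn(bev_cells, feat_dim))  # one query per BEV cell
        self.cross_attn = nn.MultiheadAttention(feat_dim, num_heads=4, batch_first=True)

    def forward(self, camera_features):
        # camera_features: (batch, num_cameras * tokens_per_camera, feat_dim),
        # i.e. image features from all 8 cameras flattened into one token sequence.
        batch = camera_features.shape[0]
        queries = self.bev_queries.unsqueeze(0).expand(batch, -1, -1)
        bev, _ = self.cross_attn(queries, camera_features, camera_features)
        return bev                               # (batch, bev_cells, feat_dim)

fused = ToyBEVFusion()(torch.rand(2, 8 * 48, 64))
print(fused.shape)  # torch.Size([2, 400, 64])
```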

The top three views are images from Tesla's on-board cameras; bottom left is the BEV map stitched by the traditional method, bottom right by the Transformer method

Moreover, the information output by the Transformer-based model can be used directly by the downstream prediction and planning modules, which laid the groundwork for turning FSD v12 into an end-to-end model.

Alongside the new model, Karpathy also showed an architecture called the "Spatial RNN." Trained on video, the model gains a short-term "memory" of how the surrounding scene changes over time, which lets it compensate for blind spots in the cameras' field of view and build a local map in real time.
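Tesla has not published the Spatial RNN's internals, but the idea of a recurrent memory laid out over the BEV grid can be approximated with a convolutional, GRU-style update applied to each cell every frame, as in this toy sketch:

```python
import torch
import torch.nn as nn

class ToySpatialMemory(nn.Module):
    """GRU-style update applied at every cell of a BEV feature grid."""
    def __init__(self, channels=32):
        super().__init__()
        # Gates computed with convolutions so neighbouring cells can share evidence.
        self.update_gate = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.candidate = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, bev_frame, hidden):
        # bev_frame, hidden: (batch, channels, H, W)
        z = torch.sigmoid(self.update_gate(torch.cat([bev_frame, hidden], dim=1)))
        h_new = torch.tanh(self.candidate(torch.cat([bev_frame, hidden], dim=1)))
        return (1 - z) * hidden + z * h_new      # blend old memory with the new frame

memory = ToySpatialMemory()
hidden = torch.zeros(1, 32, 20, 20)
for _ in range(5):                               # feed a short video clip, frame by frame
    hidden = memory(torch.rand(1, 32, 20, 20), hidden)
print(hidden.shape)
```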

This technological iteration allows Tesla's assisted driving system to drive a car well without a high-precision map, once again pushing the upper limit of autonomous driving capabilities and getting closer to the human eye.

By the time Tesla held AI Day 2022, Karpathy had already left the company, but the autonomous driving system kept iterating. His successor, Ashok Elluswamy, introduced the Occupancy Network, which adds a "height" dimension on top of the Transformer architecture: it reconstructs the images from cameras at different angles into a 3D scene, works out how much space an object occupies, and from that infers its shape.

With the Occupancy Network, Tesla's system can recognize obstacles it has never seen before, using only cameras and no lidar, which is widely seen as a victory for the "pure vision" approach.
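The key shift is in what the network outputs: instead of labeled boxes for known classes, it predicts, for every small cube (voxel) of space around the car, the probability that something solid occupies it, so unfamiliar objects still register as obstacles. A toy output head under that assumption, with made-up grid sizes:

```python
import torch
import torch.nn as nn

class ToyOccupancyHead(nn.Module):
    """Map fused camera features to a 3D voxel grid of occupancy probabilities."""
    def __init__(self, feat_dim=64, grid=(16, 100, 100)):   # (height, length, width) voxels
        super().__init__()
        self.grid = grid
        self.proj = nn.Linear(feat_dim, grid[0] * grid[1] * grid[2])

    def forward(self, fused_features):
        # fused_features: (batch, feat_dim) pooled multi-camera features.
        logits = self.proj(fused_features).view(-1, *self.grid)
        return torch.sigmoid(logits)   # P(occupied) per voxel; no class label needed

occupancy = ToyOccupancyHead()(torch.rand(2, 64))
print(occupancy.shape)                 # torch.Size([2, 16, 100, 100])
# Anything with high occupancy in the planned path counts as an obstacle,
# whether or not the training set ever contained that kind of object.
```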

After years of R&D, Tesla had finally met the requirement Musk laid down long ago: humans can perceive and reconstruct a 3D environment with two eyes, so a car's cameras should be able to as well.

Tesla's Occupancy Network identifies obstacles around the vehicle. Image from Tesla's 2022 AI Day

Along the way, Tesla was also gradually letting neural networks decide how to drive the car. At AI Day 2021 it showed a "neural network planner" trained on large amounts of data; at the time it served only as an aid, providing a reference for the final planning and decision module. By v12, the neural network had formally taken over prediction and planning, completing the end-to-end puzzle.

Question: Does autonomous driving now have its own scaling laws?

FSD v12 is still far from true driverless operation. Like ChatGPT, it has moments of brilliance, but mistakes are common. Even after the much-praised v12.3 release, cars made basic errors such as hitting curbs and damaging wheels, the kind of mistake that rarely happened with the previous generation of solutions.

Tesla itself does not yet dare to rely fully on v12. One owner found, from the FSD software package, that v12 only runs on city streets, while v11 is still used on highways.

"The floor of an end-to-end system is actually very low," one autonomous driving engineer said. Highway driving is faster but the rules are simpler, so the traditional stack, refined over many years, may still be safer there than today's end-to-end systems. "Only when the end-to-end approach raises its floor and handles simple scenarios better than the old stack will it really count as an improvement."

End-to-end approaches need more investment to match the results of traditional solutions. Slide shared by Liu Langechuan, former head of autonomous driving AI at Xpeng, at last year's CVPR

"End-to-end models will definitely have 'guardrails' before they go live. It may grow into a PhD one day, but while it grows up it still needs its elementary and middle school teachers, and that takes time." Wu Xinzhou, head of Nvidia's automotive business, believes that before end-to-end models become mainstream they will have to work alongside the old stack to guarantee safety.
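One plausible reading of these "guardrails", consistent with v11 still handling highways, is a supervisory wrapper: the end-to-end model proposes, and simple hard checks or the older rule-based stack can veto or take over. The sketch below is a hypothetical arrangement with invented thresholds, not a description of Tesla's actual hand-off logic.

```python
HIGHWAY_SPEED_MPS = 25.0     # hypothetical cutoff for "highway" scenarios
MAX_LATERAL_ACCEL = 3.0      # hypothetical comfort/safety bound, m/s^2

def legacy_planner(scene):
    """Stand-in for the older, rule-based v11-style stack."""
    return {"speed": scene["speed_limit"], "steering": 0.0}

def end_to_end_planner(scene):
    """Stand-in for the learned end-to-end policy."""
    return scene["model_proposal"]

def guarded_drive(scene):
    # Guardrail 1: fall back to the refined legacy stack in simple high-speed scenes.
    if scene["ego_speed"] > HIGHWAY_SPEED_MPS:
        return legacy_planner(scene)
    proposal = end_to_end_planner(scene)
    # Guardrail 2: veto proposals that violate a hard physical/safety limit
    # (lateral acceleration approximated as curvature * speed^2).
    if abs(proposal["steering"]) * scene["ego_speed"] ** 2 > MAX_LATERAL_ACCEL:
        return legacy_planner(scene)
    return proposal

# Usage: a city scene where the model's proposal passes both checks.
scene = {"ego_speed": 10.0, "speed_limit": 13.0,
         "model_proposal": {"speed": 9.0, "steering": 0.02}}
print(guarded_drive(scene))
```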

Musk wants to move faster. At the first-quarter earnings call in April, he said Tesla can already see the results of the model coming in three or four months, which could be called FSD v13: "It is better than what's in cars now, but there are still some problems to solve."

He believes Tesla has found "scaling laws" that apply to autonomous driving: keep expanding model parameters, pouring in more data and computing power, and refining the architecture, and the results will keep getting better.
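For reference, the scaling laws published for language models by OpenAI and others take a power-law form; Musk's claim is essentially that driving performance follows a similar curve. Tesla has not published such a formula, so the expression below is only the language-model analogy:

```latex
L(N, D) \approx \left(\frac{N_c}{N}\right)^{\alpha_N} + \left(\frac{D_c}{D}\right)^{\alpha_D} + L_{\infty}
```

Here L is the loss or error rate, N the number of model parameters, D the amount of training data, and N_c, D_c, alpha_N, alpha_D, L_infinity are constants fitted empirically; the open question is whether driving data obeys anything this regular.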

For years, scaling laws have been seen as the secret behind OpenAI's push for ever larger, more capable models. But in computer vision, the field autonomous driving sits in, the training data consists of video of the physical world and the models must grasp more physical rules; many researchers worry that training ever larger models with more data and compute will hit a bottleneck, and that capability will stop improving or even decline.

"We can estimate future progress from past trends, and judging by past data the estimates are usually right," Elluswamy said on the earnings call. Every week Tesla trains hundreds of models that generate different driving routes, then tests them against millions of video clips collected from users and testers. The better ones go to dedicated road-test teams and employees, and finally out to more users, with the iteration loop getting faster and faster.

We understand that Tesla's v12 system cannot yet, like GPT-4, handle problems absent from its training data; it still has to learn how to deal with complex scenarios from large amounts of data.

As the model's capability rises, it takes ever more data to improve it further. Musk has said that out of every 10,000 kilometers of driving data, only about 1 kilometer is actually useful for training, and every training run consumes a great deal of computing power.

This is not a problem for Tesla. The millions of Teslas on the road feed it all kinds of data continuously, and the company is also building a more powerful simulation system to generate training data. At last year's CVPR computer vision conference, Elluswamy presented the "World Model" Tesla trains on collected data: given a prompt and past video, it can generate video of the scenes the car would encounter next, such as how lane lines continue and how an intersection changes as seen from cameras at different angles.

But an autonomous driving system built on an end-to-end architecture is a "black box"; even its creators cannot fully explain how it turns a pile of data into a decision. What people can do is curate the data for it, let the algorithm distill its own rules, and apply them to new data. When something goes wrong, the remedy is more data, so it can correct itself.

This is not unique to autonomous driving; every deep-learning application is the same. It's just that nobody minds much when Douyin's algorithm pushes a few boring videos, and people can put up with ChatGPT occasionally talking nonsense; they care a great deal about why a 2-ton car misbehaves on the road.

"It can fail silently, and when a problem surfaces it is often hard to analyze and debug, because the model has grown so large," Karpathy wrote of Software 2.0's flaws in that same essay. It comes down to a choice: "use a method we understand that works 90% of the time, or a model we don't understand that works 99% of the time."

Tesla has made its choice through its actions: it believes a pure-vision, end-to-end neural network trained on billions of kilometers of real-world data is the right way to achieve driverless operation at scale.

Musk's order to the autonomous driving team is to do everything possible to increase the distance FSD v12 can drive without human intervention. The team keeps a gong in the office and strikes it every time a problem is solved. Musk believes that as long as clear data shows autonomous driving to be more reliable than human driving, regulation will not be much of a barrier.

Over the past few months, Tesla has cut FSD prices, given American owners a free trial, and pushed v12 to market aggressively, logging 500 million kilometers of driving in a single quarter.

Since Tesla began developing driver-assistance systems, Musk has been extremely optimistic about driverless cars. In 2016, when Tesla first placed 8 cameras around the vehicle with a 360° view, Musk arranged for the team to carefully prepare a video to promote the imminent arrival of driverless driving.

Since then, every year or two, Musk has updated his timeline for driverless cars, and each one has proved overly optimistic. But each time, the technology has moved one step further.

Editor/Jeffrey


