
Tesla AI Day Full Recap and Analysis: Compute Monster Dojo, and What Makes Pure-Vision FSD Strong

42號車庫 ·  Aug 21, 2021 07:56

If you came across the live link for Tesla's AI Day on social media today and happened to join the stream at around the 2-hour-6-minute mark, then the next few minutes probably surprised you as much as they surprised me.

The surprise comes first from the sci-fi humanoid robot below, the Tesla Bot, the Easter egg of AI Day: a robot that shares its vision cameras and neural-network compute chip with Tesla's cars.

[image]

[image]

But just as my expectations shot up and I dreamed of the robot walking on stage to deliver the line "So it is with considerable pride that I introduce a man who's been like a father to me, Elon Musk", this happened instead:

[image]

Seeing this costumed dance, viewers who, like me, had just had their hopes raised probably had three reactions run through their heads:

"What?"

"That's it?"

"Refund the money!"

[image]

Joking aside, if you watched the AI Day live stream from start to finish, you know this dance was one of the few relaxing moments in the three-hour event.

The information density of the whole event, the breadth of technical fields it covered, and the heavy accents may take me several weeks to unpack in depth. In this article, let's briefly summarize what was presented.

01 The foundations of autonomous driving

The first capability: vision

At the start of the event, Tesla's Director of AI, Andrej Karpathy, took the stage to introduce what Tesla is building: a vision-based neural-network system modeled on the human brain.

First, the system needs a substitute for eyes. On Tesla's current models, that is a set of eight cameras providing a 360° view around the car body with no blind spots.

[image]

Beyond that, the system needs the equivalent of the brain's visual pathway, the retina, optic chiasm, optic tracts, and so on, which is realized mainly in software and algorithms.

[image]

In recognizing visual features, the brain reads information through the retina, while a computer works from the arrangement of pixels. In the feature-extraction layers of this stage, Tesla has divided the work among different regions of the network, so during recognition the system can infer features that are not clearly visible from the surrounding context: the vehicle in the image below, for example, is rendered at essentially mosaic resolution.

[image]

Tesla builds many such recognition tasks for different feature types, traffic lights, lane lines, traffic participants, and so on, all performing multi-task recognition on the same input material. Tesla calls this recognition network "HydraNet".
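The shared-backbone, multi-head structure can be sketched in a few lines. This is a minimal illustration of the idea, not Tesla's architecture; all layer sizes, head names, and weights below are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared backbone: one feature extractor whose output is reused by every head.
W_backbone = rng.standard_normal((64, 16))

# Task-specific heads, one per feature type (names are illustrative).
heads = {
    "traffic_lights": rng.standard_normal((16, 4)),
    "lane_lines": rng.standard_normal((16, 2)),
    "vehicles": rng.standard_normal((16, 8)),
}

def hydra_forward(camera_features):
    """Run the backbone once, then every task head on the same shared features."""
    shared = np.tanh(camera_features @ W_backbone)  # computed a single time
    return {name: shared @ W for name, W in heads.items()}

frame = rng.standard_normal(64)  # stand-in for features from one camera frame
outputs = hydra_forward(frame)   # one forward pass serves all tasks
```

The point of the structure is that the expensive backbone runs once per frame, while each task only pays for its own small head.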

Next came the environment-modeling "occupancy tracker" in the older software, which stitched images across cameras and along the time axis to build a model of the environment around the car body. It had two problems: the modeling workload was huge, and the modeling accuracy was insufficient.

[image]

So Tesla decided to change strategy. The original approach ran predictions on each camera's image first and then stitched and fused the results; the new idea is to stitch the raw material from all eight cameras directly, synthesize a real-time three-dimensional space, and make the various predictions from that.

This sounds simple but is hard to do. After solving many key problems, the final multi-camera vision showed a significant improvement in perception accuracy.

[image]

However, there are problems multi-camera vision alone cannot solve: predicting features while they are occluded, and keeping a continuous memory of road markings already driven past.

[image]

To address these, Tesla added two things to the prediction model: prediction of feature motion over time, and distance-based memory of road markings. With these in place, when the field of view is temporarily blocked, the system can still "infer" the trajectory of an object behind the occlusion from its trajectory before occlusion, and it remembers the various road markings it has driven past.
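The talk doesn't say what model Tesla uses for this inference behind the occlusion; the simplest possible illustration is constant-velocity extrapolation from the last observed state. Everything below is invented detail for the sketch.

```python
def extrapolate_track(last_pos, last_vel, steps):
    """Predict an occluded object's positions, assuming it keeps its last velocity."""
    x, y = last_pos
    vx, vy = last_vel
    return [(x + vx * t, y + vy * t) for t in range(1, steps + 1)]

# A vehicle last seen at (10, 0), moving 2 m/step along x, occluded for 3 steps.
predicted = extrapolate_track((10.0, 0.0), (2.0, 0.0), 3)
```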

[image]

After that, a "Spatial RNN" (spatial recurrent neural network) was added to selectively predict and record certain kinds of features in the environment (several kinds in parallel) within the vehicle's field of view. One example Andrej Karpathy cited: the system does not record the road environment where a parked vehicle blocks the view, and only records it once the blocking car drives away, which I read as "doing less useless work." Drive the same road a few more times, and these recorded environmental features accumulate into a feature map.
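The "record only what is actually visible" behavior can be mimicked with a masked update of a 2D memory grid. A minimal numpy sketch, with grid size and values invented for illustration:

```python
import numpy as np

def update_spatial_memory(memory, observation, visible):
    """Write new observations only into cells the car can actually see;
    occluded cells keep their previous memory untouched."""
    return np.where(visible, observation, memory)

memory = np.zeros((4, 4))               # what the car remembers so far
observation = np.ones((4, 4))           # what the cameras report this frame
visible = np.ones((4, 4), dtype=bool)
visible[1:3, 1:3] = False               # a parked car blocks the middle cells
memory = update_spatial_memory(memory, observation, visible)
```

On later passes down the same road, the previously occluded cells get filled in once they become visible, which is how the recorded features accumulate into a map.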

The combined effect of all these measures is considerable. In depth and velocity detection, for instance, the green line in the chart is millimeter-wave radar data; the yellow line, predicted by single-camera vision, is fairly mediocre; and the blue line, predicted by multi-camera vision, essentially matches the radar. In Andrej's words, multi-camera vision is now able to replace millimeter-wave radar.

[image]

That covers the key content of Tesla's environmental perception. Andrej said at the event that there is still room for improvement, in latency and in the cost of processing data, for example, and the team is still exploring early-fusion perception strategies.

[image]

The second capability: planning and control

The core goal of vehicle planning and control is to achieve the best balance among safety, comfort, and efficiency.

This brings two major challenges. The first is that the optimal planning solution is deeply local: the optimum for area A may not apply to area B, so different areas cannot be treated identically.

The second is that many variables affect the driving strategy in practice, the vehicle has a large number of parameters to control, and it must plan what to do over the next 10-15 seconds, all of which demands a great deal of real-time computation.

[image]

Take the scene in the figure as an example: after the intersection, the vehicle needs to change lanes twice to reach the blue-line lane and complete a left turn, but it faces these considerations:

  1. Two cars are approaching quickly from behind in the left lane.

  2. Before the next intersection, both lane changes must be completed within a short distance.

[image]

The system simulates a variety of strategies and then picks out those that satisfy the requirements above. In actual driving, besides planning its own path, it must also predict the paths of other traffic participants. Among the feasible strategies, the path is optimized on the principle of "the best balance of safety, efficiency, and comfort." Once the plan is made, all that remains is controlling the vehicle to follow it.
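The "best balance" selection step can be expressed as a weighted cost over candidate plans. Only the principle (filter feasible plans, then minimize a safety/comfort/efficiency cost) comes from the talk; the weights, field names, and candidate numbers below are invented for illustration.

```python
def plan_cost(plan, w_safety=10.0, w_comfort=1.0, w_efficiency=1.0):
    """Lower is better: penalize collision risk, harsh jerk, and lost time."""
    return (w_safety * plan["collision_risk"]
            + w_comfort * plan["max_jerk"]
            + w_efficiency * plan["time_to_goal"])

candidates = [
    {"name": "yield_then_merge", "collision_risk": 0.0, "max_jerk": 2.0, "time_to_goal": 14.0},
    {"name": "accelerate_and_cut_in", "collision_risk": 0.4, "max_jerk": 4.0, "time_to_goal": 9.0},
]
best = min(candidates, key=plan_cost)  # the safer, smoother plan wins here
```

A heavy safety weight makes the slower-but-safe plan beat the faster-but-riskier one, which is the trade-off the talk describes.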

[image]

In more open, unstructured road scenes, however, planning complexity is much higher. In the parking-lot scene below, for example, with a Euclidean-distance heuristic as the search logic, the system needs 398,320 attempts before it finds a path into the parking space.

[image]

With some optimization, adding "follow the lane markings of the parking lot" to the search logic, the system finds a path into the space after 22,224 attempts, cutting the first strategy's trial and error by 94.4%.

[image]

Going a step further, with the algorithm changed to Monte Carlo tree search and the logic to a neural-network policy and value function, the system finds the path to the parking space in just 288 attempts, a further 98.7% reduction in trial and error compared with the already-optimized second scheme.
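The effect of adding guidance to a search can be reproduced at toy scale. The sketch below is ordinary A* on an empty grid, not Tesla's planner: the only difference between the two runs is the heuristic, and the number of node expansions drops sharply, mirroring the shrinking attempt counts quoted above.

```python
import heapq

def astar_expansions(grid, start, goal, heuristic):
    """Grid A* (4-connected, unit cost). Returns how many nodes were expanded
    before the goal came off the open list; ties prefer deeper nodes."""
    rows, cols = len(grid), len(grid[0])
    open_heap = [(heuristic(start, goal), 0, start)]
    best_g = {start: 0}
    expanded = 0
    while open_heap:
        f, neg_g, node = heapq.heappop(open_heap)
        g = -neg_g
        if g > best_g.get(node, float("inf")):
            continue  # stale entry
        expanded += 1
        if node == goal:
            return expanded
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    heapq.heappush(
                        open_heap,
                        (ng + heuristic((nr, nc), goal), -ng, (nr, nc)))
    return expanded

grid = [[0] * 20 for _ in range(20)]  # empty 20x20 "parking lot"
start, goal = (0, 0), (19, 19)
blind = astar_expansions(grid, start, goal, lambda a, b: 0)  # no guidance
guided = astar_expansions(grid, start, goal,
                          lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1]))
```

On this lot, the unguided search expands every cell before reaching the goal, while the guided one expands only a thin corridor toward it.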

[image]

This case also shows how strongly the logic and algorithms chosen for the planner in different scenarios affect the final amount of computation: with the right method, you get twice the result for half the effort.

The section closed with a framework diagram of the two key capabilities, perception and planning/control, which this article will not expand on further.

[image]

02 The AI driving school

With the framework in place, the next task is to train the neural networks inside it to a higher level of ability, just as a human, once equipped with eyes for perception and a brain plus hands and feet for control, still needs to accumulate driving experience and learn driving skills. For a machine to learn to drive, it also needs an AI driving school, and Tesla's is by no means a modest one.

Data labeling is a big job.

Data must be labeled before it is handed to the system for learning. Tesla did not outsource the manual labeling: the company has an in-house team of 1,000 data labelers for this work.

[image]

Over time, the volume of data Tesla has labeled keeps growing, and the labeling itself has evolved from annotating 2D images to annotating directly in 4D space with a time axis.

[image]

The real focus of data labeling, however, is automatic labeling: given driving footage, the system can automatically label lanes, road shoulders, road surfaces, sidewalks, and so on.

[image]

On this basis, once enough Tesla vehicles have passed through the same area, the roads in that particular area can be labeled, and the labeled data can be used to reconstruct a model of the road environment.
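The value of many vehicles passing the same road is essentially noise averaging: each trip's automatic labels are individually noisy, but their consensus converges on the true geometry. A toy numpy sketch, with a synthetic lane and an invented noise level:

```python
import numpy as np

rng = np.random.default_rng(42)

# Several trips observe the same lane line, each with its own sensor noise.
true_lane = np.linspace(0.0, 100.0, 50)                  # stand-in lane positions
trips = [true_lane + rng.normal(0.0, 0.5, 50) for _ in range(20)]

# Consensus label: the average across trips beats any single noisy trip.
consensus = np.mean(trips, axis=0)
single_err = float(np.abs(trips[0] - true_lane).mean())
consensus_err = float(np.abs(consensus - true_lane).mean())
```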

[image]

This data is not used for high-definition maps, nor is it kept permanently on the vehicle's system; it is used only for training. To guarantee the quality of the reconstructed road model, some noise still has to be removed and cleaned up manually.

[image]

The labeled features are not limited to common lanes and shoulders; they also extend to fences, barriers, and so on.

Another technique, very useful for the occlusion prediction in the planning algorithm described earlier, is the see-through occlusion label. In the image below, objects actually hidden at the green circles are labeled as if visible, so the system can learn how objects move while occluded and develop corresponding strategies.

[image]

Finally, with these labeling techniques, a highly realistic environment model can be constructed, and specific, targeted algorithm training can be carried out within it.

[image]

Once one scene is handled, similar scenarios can be searched for in the fleet's labeled data, for example, cases where the car ahead is obscured by smoke or other interference: ten thousand matching real-world scenes can be pulled from the fleet in a week. These "similar exam questions" can then be used for rapid generalization training of the neural network.

[image]

Besides the real exam papers, there are also mock papers.

Musk said at Autonomy Day in 2019 that besides collecting real road environments to train its algorithms, Tesla also runs a great deal of simulation testing, and that the simulator Tesla built may be one of the best in the world.

At AI Day, Tesla officially introduced this system, starting with three situations in which the simulator is especially helpful:

  1. Rare scenes, such as an owner running on the highway with their pet.

  2. Scenes that are hard to label, such as a crowd crossing a road with no traffic lights.

  3. Rarely driven places, such as the end of a particular stretch of road.

[image]

Broadly speaking, my understanding is: scenes with abnormal behavior, scenes whose features cannot be labeled, and scenes rarely visited can all be supplemented with a customizable simulator.

And because the simulator is so customizable, it can be used to throw artificial challenges at the sensors, injected noise, exposure changes, reflectivity, heat-haze refraction, motion blur, optical distortion, and so on, to verify the system's resistance to interference.
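Two of those degradations, sensor noise and over-exposure, are easy to sketch as an image-augmentation function. The parameter values here are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(7)

def add_camera_artifacts(image, noise_std=10.0, exposure=1.4):
    """Inject two of the simulated degradations mentioned above: sensor noise
    and over-exposure. Values are clipped back to a valid 8-bit range."""
    noisy = image.astype(np.float64) * exposure + rng.normal(0.0, noise_std, image.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

clean = np.full((4, 4), 128, dtype=np.uint8)  # flat gray test patch
degraded = add_camera_artifacts(clean)        # brighter and noisier, same shape
```

Training against such degraded frames is one way a simulator can probe perception robustness without waiting for bad weather or a dirty lens in the real fleet.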

To simulate more scenes, the simulator contains thousands of models of vehicles, pedestrians, and other props, and more than 2,000 miles of road in total.

From this emerges a scene-reconstruction pipeline: when a real scene is encountered, a first-layer reconstruction is built through automatic labeling, and the scene is then restored in the simulator on top of that reconstruction.

[image]

In this AI driving school, Tesla continuously collects "real exam questions" from the cars on the road. Once that material has been labeled, simulated, and reconstructed, it becomes "mock questions." The system grinds through the mock questions, where failing a test costs nothing, and its ability on the real questions improves accordingly. Following the "exam syllabus" drawn up by the developers, targeted "improvement papers" can also be produced for special scenarios.

Current training equipment

The FSD Computer in Tesla models on the road today, HW 3.0, should be familiar: born in 2019, with dual 72-TOPS SoCs as the vehicle's core compute units, it uses a dedicated neural-network architecture for accelerated computing. This article will not go into further detail.

[image]

For AI validation testing, Tesla has prepared more than 3,000 FSD Computers, dedicated device-management software, and customized test plans, running more than a million algorithm-validation tests per week.

[image]

For neural-network training, Tesla uses three major computing clusters: the auto-labeling cluster has 1,752 GPUs, and the other two training clusters have 4,032 and 5,760 GPUs respectively.

[image]

Objectively speaking, these clusters are already very powerful, but still not enough for Tesla, so Tesla designed dedicated hardware for machine-learning training.

03 Dedicated supercomputer & robot

A super fast training computer

Rumors about Dojo have circulated for some time, but let's start with the goals set at the beginning of its development. The three directions: the strongest AI training performance, enabling larger and more complex neural networks, and high energy efficiency at low cost.

[image]

A crucial point in the D1 chip's detailed design is that it is "dedicated": layout, bandwidth, capacity, and node architecture are all built around optimal neural-network training. The 7nm D1 chip delivers 362 TFLOPS of floating-point compute at BF16/CFP8 and 22.6 TFLOPS at FP32.

[image]

On the compute side, a "training tile" composed of 25 D1 chips reaches 9 PFLOPS of compute, 36 TB/s of I/O bandwidth, and a heat-dissipation capacity of up to 15 kW.

[image]

A supercomputing system composed of 120 such training tiles reaches 1.1 EFLOPS of compute. At the same cost, performance is 4x higher, energy efficiency is 30% better, and the footprint is 80% smaller.
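The quoted figures are arithmetically consistent with one another, which is worth a quick check: 25 chips at 362 TFLOPS (BF16/CFP8) gives about 9 PFLOPS per tile, and 120 such tiles land at roughly 1.1 EFLOPS.

```python
# Sanity-check the quoted Dojo figures (BF16/CFP8 numbers from the keynote).
d1_tflops = 362.0                          # one D1 chip
tile_pflops = 25 * d1_tflops / 1000        # 25 chips per training tile -> ~9.05
exapod_eflops = 120 * tile_pflops / 1000   # 120 tiles -> ~1.09
```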

[image]

I can't quite put this concept into words. I don't fully understand it, but I am deeply shocked.

Tesla Bot

Finally, back to the robot from the opening. The dance was a human in costume; the actual Tesla Bot's specifications are shown in the image.

[image]

What surprised me, and yet makes perfect sense, is that the robot uses Autopilot's cameras for visual perception and the FSD Computer as its compute core.

[image]

This leads to something remarkable: across multi-camera neural networks, neural-network-based planning, automatic labeling, algorithm training, and the rest, Tesla Bot can reuse a great deal of Tesla's existing work off the shelf. Even though not a single unit has been built, it may already hold the strongest scale advantage of any intelligent robot in the world.

In my opinion, this amounts to quietly putting competitors on notice.

04 Written at the end

If you have forgotten the article's title by the time you reach this point, then you have effectively agreed with what the title was trying to convey.

In fact, Tesla's vision-based perception approach can do far more than the general public assumes, and the idea of scale effects has already begun to show in this neural-network-based vision approach.

That said, Tesla did make this boast a long time ago; the depth and velocity detection that vision is supposedly weak at only crossed the inflection point of surpassing millimeter-wave radar a few months ago, and when it will arrive in China is still unknown.

As a consumer in China, I may not experience the convenience of these technologies for a long time. But whether in the execution of its technical route or in the scale of its strategic planning, Tesla remains in the lead, and with Dojo behind it, that gap may widen further.

As for Tesla building both the Dojo supercomputer and the robot: each is another way of reducing the marginal cost of training through economies of scale, and the two sets of scale economies reinforce each other.

Beating different opponents in the same way: that is perhaps what struck me most about AI Day.



