
Whale Dialogue | Why is TuSimple shifting from autonomous driving to AIGC?

lanjinger.com ·  Dec 23, 2024 19:43

Picture from Visual China

Blue Whale News, December 23 (Reporter Wu Jingjing) TuSimple, once a star autonomous driving company, has chosen a new entrepreneurial path after weathering a series of setbacks, including its delisting.

In August 2024, the company first revealed that it would pivot toward AIGC. Four months later, on December 17, it announced the launch of a new brand, CreateAI, and released its first video generation model, "Ruyi".

The Ruyi-Mini-7B version is now officially open-sourced on HuggingFace and available for download. "Ruyi" is reportedly designed to run on consumer graphics cards (such as the RTX 4090).
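For readers who want to try the open-source release, the sketch below shows one minimal way to fetch the model files with the huggingface_hub client. The repository id is an assumption, since the article does not specify it; check CreateAI's official model card for the exact name.

    # Minimal sketch (not from the article): download the open-source Ruyi-Mini-7B
    # weights from Hugging Face for local use. The repo id below is an assumption;
    # verify it against CreateAI's official model card before running.
    from huggingface_hub import snapshot_download

    local_dir = snapshot_download(
        repo_id="IamCreateAI/Ruyi-Mini-7B",   # assumed repository id
        local_dir="./Ruyi-Mini-7B",           # where to place the model files
    )
    print("Model files downloaded to:", local_dir)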

Can a switch from autonomous driving to AIGC work?

Why is the company moving from autonomous driving to the vastly different video model track?

In an exclusive interview with Blue Whale News, TuSimple's head of technology said the move was driven mainly by considerations of corporate transformation and business development. On the one hand, the company had already accumulated experience with algorithms, computing power, and data in the AI field through its autonomous driving work; on the other hand, co-founder Chen Mo has accumulated resources in the gaming industry, offering an opportunity to commercialize the technology quickly.

Chen Mo also mentioned in an earlier media interview that TuSimple is looking for the fastest way to "save" the company with its existing resources. With video model technology continuing to advance and become publicly available, AIGC is currently the best choice in terms of both commercial potential and technical accessibility.

Blue Whale News has learned that TuSimple's video model team is drawn from the company's former autonomous driving team, and some of its technical experience can be reused.

TuSimple's head of technology said that AI video models and autonomous driving both rely on the "troika" of algorithms, computing power, and data to drive technological progress, and both depend on large amounts of data for training and optimization. Video generation technology is in fact very similar to the perception modules in autonomous driving: both are primarily data-driven, have relatively short R&D paths, and rest on a relatively clear technical foundation.

In his view, autonomous driving involves a range of algorithm modules, such as perception, localization, planning, and control, as well as fields such as software systems, hardware design, and vehicle structural design. In contrast, video model technology has a shorter development path and a narrower technical scope, focusing mainly on data processing and model training.

"Data is even more important than algorithms," the TuSimple technology head said. He noted that TuSimple accumulated rich data-labeling experience in autonomous driving, with an in-house labeling team, a labeling platform, and a complete data processing pipeline. "These experiences and tools can be applied directly to the data preparation phase of the video model, saving significant time and cost."

Of course, for much of video model technology and its output quality, TuSimple still has to explore from scratch. The company's video model work currently revolves around five key metrics: generation quality, consistency, controllability, ease of use, and cost.

The TuSimple technology head told Blue Whale News that generation quality is the primary goal, ensuring that generated video content reaches a high standard in imagery, motion, and detail. "The company adopts a spiral R&D strategy, gradually improving the model's controllability, ease of use, and cost efficiency while ensuring generation quality and consistency."

TuSimple chose a third path: don't monetize the model itself, make the content yourself

The video model field continues to see new developments. On December 9, local time, OpenAI officially released Sora Turbo, the latest version of its video generation model, which can generate new video content from text, image, or video input. In the domestic market, both large technology companies such as ByteDance and Kuaishou and startups such as Pika, Aishi Technology, and Shengshu Technology continue to push forward with technology and product iteration.

Did TuSimple enter this highly competitive video model track to grab a share of the pie?

Judging from the company's current business moves and interviews, the answer is no. TuSimple's head of technology told Blue Whale News that, more precisely, TuSimple aims to become a content company rather than a large-model technology company, which is not the direction taken by platforms such as Kuaishou or startups such as Pika.

Currently, video model companies on the market follow only two business models: one, exemplified by Runway and Pika, offers paid video generation tools or services to consumers, letting creators pay to produce their own content; the other serves business customers in film, entertainment, and gaming, helping the industry cut costs and improve efficiency.

The TuSimple technology head told Blue Whale News that, if positioned as a pure video model company, both the consumer (to-C) and business (to-B) routes face obvious challenges:

On the one hand, on the to-C side, the target users of video generation tools are professional creators rather than the general public. The payment model and profit prospects are unclear, video models require substantial computing power, and operating costs are high; in the domestic market, it is difficult to attract users and turn a profit through fees in the short term.

On the other hand, relying solely on empowering the B-side with technology is a huge challenge, because it is hard for a technology company to fully understand the requirements of specific scenarios, and hard to integrate the technology effectively into real production workflows so as to control the quality and style of the content.

Unlike many video model companies focused on the general applicability of their technology, TuSimple chose a different, third path: open-source the model technology outright, don't rely on the model itself to make money, acquire classic IPs, and use large models to produce content directly.

Blue Whale News has learned that the company already has a dedicated animation and game team developing new projects.

"We want to build an AI-driven video content creation company with an end-to-end video content generation chain, ultimately using high-quality content to attract users and realize commercial value," TuSimple's head of technology said. "Technology is just a tool; the ultimate goal is to deliver content to users."

TuSimple has now set up an animation and game division, and its new brand, CreateAI, has been licensed the famous martial arts IP "The Legend of Jin Yong Heroes" to develop a large-scale open-world martial arts RPG. In August 2024, the company also officially announced a partnership with Shanghai Three Body Animation Co., Ltd. to jointly develop the first animated feature film and video game in the "Three-Body" series. The company is also reported to be launching SLG game tools and the game itself in December.

"We now have two top IPs, 'The Legend of Jin Yong Heroes' and 'Three-Body,' and our goal is to reach $1 billion in revenue by 2027," Chen Mo said in a recent interview, describing TuSimple's goals in the AIGC direction.
