SenseTime released “SenseNova 5.5,” a large model with streaming multimodal interaction and the first domestic model benchmarked against GPT-4o. On July 5, SenseTime held its “Love Without Borders · Xiang Xinli” artificial intelligence forum at WAIC 2024 and released “SenseNova 5.5,” the first domestic large model with streaming, natively multimodal interaction capabilities. Its overall performance is 30% higher than that of “SenseNova 5.0,” released two months earlier, and its interaction quality and several core benchmark scores are on par with GPT-4o. The main updates in “SenseNova 5.5” include: (1) An overall performance improvement of the 600-billion-parameter base model. Extensive use of synthetic, high-quality chain-of-thought data strengthens its deductive reasoning, with significant gains in mathematical logic, English, and instruction following. (2) The launch of “SenseNova 5o,” China’s first “what you see is what you get” model, which provides streaming multimodal interaction and introduces a new mode of AI interaction.
(3) A full upgrade of the on-device model with the release of “SenseNova 5.5 Lite.” Compared with the April 5.0 release, model accuracy improved by 10%, inference efficiency by 15%, and first-packet latency fell by 40%. In multimodal capabilities in particular, “SenseNova 5.5” matches or even surpasses GPT-4o on most core test-set metrics.
As AI models evolve, innovative interaction modes will take the lead in defining the industry’s development. By integrating cross-modal information from sound, text, images, and video, “SenseNova 5o” delivers a real-time streaming multimodal AI interaction experience that feels as direct as human communication itself: the model can see what the user sees and understand the user’s needs. This interaction mode adapts well to multitasking; it can handle a variety of tasks naturally within the same model and adaptively adjust its behavior and output to different contexts. From scene understanding and analysis, object descriptions, and summaries of book pages and figures, to rough stick-figure sketches and facial expressions, “SenseNova 5o” can grasp them accurately, interact smoothly, and converse with people in playful language.
We are paying close attention to on-device AI and industry applications, as AI commercialization accelerates. According to SenseTime, for everyone to be able to use large AI models, one must start with the terminal device. The on-device large language model of “SenseNova 5.5,” “SenseChat Lite-5.5,” has been upgraded across all dimensions and is currently the on-device model with the best overall performance; at the same time, it works in concert with the cloud model to guarantee both performance and speed. SenseTime’s “SenseNova” on-device models have already penetrated various industries, establishing commercial engagements with more than 150 customers and covering many IoT device deployments such as smartphones, tablets, all-in-one VR headsets, in-vehicle computers, and smart desk lamps.
Access to SenseTime’s “SenseNova · SenseChat” on-device large model costs as little as RMB 9.9 per device per year. SenseTime’s on-device large model has several advantages: (1) Versatility. It supports various vertical business directions, with optimizations for different domains such as writing and encyclopedic knowledge. (2) Availability. It supports both on-device deployment and cloud-side calls.
(3) Low barrier to entry. The on-device SDK is easy to integrate and supports rapid deployment. At present, SenseTime’s “SenseNova” large model family has demonstrated practical value across a large number of application scenarios and vertical industries. In programming, large models provide features such as intelligent code completion, which can significantly improve programmers’ day-to-day efficiency. In healthcare, from pre-diagnosis triage to health consultation to post-diagnosis follow-up, large models improve the patient experience throughout the care process. In finance, SenseTime has cooperated with banks, insurers, brokerages, and asset-management clients across multiple modalities and scenarios. In the consumer sector, SenseTime has worked with many leading domestic manufacturers to turn their scenarios into large-model-powered services, for example a Copilot that helps users generate forms, analyze data, and write copy to improve personal productivity. Furthermore, to help more enterprise users get on board at a lower threshold, SenseTime recently launched its “Large Model for 0 Yuan” program: all newly registered “SenseNova” users receive a number of free service packages covering activation, migration, and training, along with 50 million free tokens and a dedicated migration consultant who provides a series of migration trainings from OpenAI to “SenseNova.”
Vimi, a large model for controllable character video generation, was released, accelerating the consumer-facing (2C) rollout of AI+video. According to the official WeChat account of Vimi Camera, at WAIC 2024 SenseTime released Vimi, the first large model for controllable character video generation, which received the WAIC exhibition’s highest honor, “Treasure of the Hall,” making it the most innovative exhibit of this conference.
Built on the capabilities of SenseTime’s “SenseNova” large model, Vimi can generate a video of a person performing target movements from just one photo in any style. It not only achieves precise control of facial expressions but also controls the person’s natural body movement within the bust area. It supports multiple driving methods and can be driven by existing character videos, animations, audio, and text. Vimi’s advantage lies in SenseTime’s years of accumulated facial tracking technology and precise control of facial detail, which make characters’ expressions more vivid. Compared with other models on the market, Vimi controls the face and upper body more accurately and produces videos with high consistency and harmonious light and shadow. Vimi is also highly stable: especially in long-video scenarios it keeps the character’s face controllable, generating single-shot character videos of one minute or longer whose image quality does not degrade or distort over time, genuinely meeting the need for stable, long-duration video generation in entertainment and interactive settings. In character video scene generation, Vimi can change the entire environment in response to body control, including generating plausible hair movement. It can even simulate the motion of the input camera: if the input shot gradually zooms in, the output naturally exhibits the same gradual zoom. Naturally flowing hair, costume changes, and background environments can all be rendered by Vimi, making the resulting video more realistic and vivid. It also supports simulating changes in lighting and shadow, giving every scene in the video a cinematic quality.
In addition, the Vimi model can control camera angles and generate plausible hair-movement effects, giving video creators more creative freedom. Vimi Camera is the first consumer (C-end) product in Vimi’s controllable character video model family and targets the entertainment needs of a broad base of female users. Users only need to upload high-definition images of a person from different angles to automatically generate a digital avatar and photos and videos in different styles; a variety of generation styles, such as glamour photography and fantasy, let users feel as if they have traveled through different dimensions and enjoy immersive, cinematic visual effects. For users who love emoji packs, Vimi Camera can generate a variety of fun character memes from a single image, with diverse gameplay and creative freedom. We believe the release of Vimi has pushed the company into a new era in AI+video: Vimi’s capabilities further broaden the boundaries of AI large model applications and lay a solid foundation for the company’s business expansion.
“SenseChat” released a local version in Hong Kong, as AI increasingly targets market segments. In July, SenseTime’s “SenseChat” mobile app and web version were made available to Hong Kong users free of charge. The Hong Kong “SenseChat” is based on the Cantonese version of the “SenseChat” multimodal large model that SenseTime launched in May this year. Drawing on SenseTime’s strong language and multimodal capabilities, as well as its deep understanding of Cantonese, local culture, and trending topics, “SenseChat” is positioned as a “caring little cotton jacket” (an intimate, considerate companion) for Hong Kong users. Users can chat with it in their most familiar Cantonese, typing or speaking directly to ask questions, search for things, generate images, and write copy. From daily life to study to work, “SenseChat” delivers a genuinely local AI experience: it is well versed in the latest local information and social topics, and even uses local buzzwords fluently. Users can download the “SenseChat” iOS app from the App Store and register with a Hong Kong phone number or email to enjoy the smartest, fastest, most authentic AI experience anytime, anywhere, free of charge; an Android version will launch soon. The “SenseChat” app supports text or voice input for a convenient experience. Its main features include: (1) Localized experience. “SenseChat” has an in-depth understanding of Hong Kong’s local culture, customs, and popular social topics.
In the mobile app, users can converse with “SenseChat” naturally and fluently in Cantonese mixed with English. (2) Multimodal Q&A. Users can directly upload a file or image for “SenseChat” to analyze thoroughly, generate a summary, and answer questions about its content. (3) Real-time search. “SenseChat” can integrate information from multiple sources so that users quickly obtain the latest information, including real-time news and weather conditions, and can conduct further searches. (4) Image generation.
With just a simple description, “SenseChat” can quickly generate images in various styles that users can share with friends in real time or upload to their own social platforms, making creation effortless. (5) Copywriting. Whether for advertising copy, business plans, or academic writing, users can get professional copywriting advice from “SenseChat” to inspire their writing. Furthermore, the web version of “SenseChat” has powerful multimodal file-processing capabilities, can understand, reason over, and generate ultra-long texts, and supports uploading up to 50 files. Whether asking for life tips, solving math problems, analyzing images, or writing code, the web version of “SenseChat” handles it with ease. We believe the release of the Hong Kong version of “SenseChat” is an important experiment in the company’s push into market segments; its adaptation to the Cantonese environment also highlights the company’s leading technical strength in large models. The company’s future AI commercialization is worth looking forward to.
Actively cooperating with Huawei, the Ascend platform helps power SenseTime’s AI. During WAIC 2024, the Ascend Artificial Intelligence Industry Summit 2024 was held, focusing on large model inference and best practices from customers and partners, and exploring ways to accelerate large model innovation and application deployment. Yang Fan, co-founder of SenseTime and president of its Large Device (SenseCore) Business Group, was invited to deliver the keynote “Ecological Connectivity Leads the Wave of Innovation in the Large Model Era,” sharing the full-stack, native development practices of SenseTime’s “SenseNova” large model family on the Ascend AI hardware and base software platform. As an important engine for accelerating AI innovation, native development is gradually becoming an industry focus. Gong Ruihao, research director of model research at SenseTime, was invited to attend the “Ascend AI Partner Native Development Results Release,” where SenseCore committed to working with partners to jointly promote technological innovation and integrated industrial development. Notably, at SenseTime’s WAIC 2024 artificial intelligence forum, a signing ceremony for Ascend-native model cooperation was held: SenseTime and Huawei Technologies Co., Ltd. signed a cooperation agreement to push native large model development to a new level. From infrastructure construction to breakthroughs in large models to a flourishing application layer, everything depends on close collaboration across the upstream and downstream ecosystem. Over the past year, SenseTime has worked closely with the Ascend and MindSpore teams to jointly build the next-generation large model base and a new large model training ecosystem.
For example, SenseTime can achieve industry-leading compute utilization on clusters of more than 3,000 accelerator cards, allowing it to serve downstream enterprises with higher-performance, higher-efficiency cluster capabilities. Previously, the SenseCore AI Cloud, the “SenseNova · SenseChat” language model, and SenseTime’s medical model “Big Doctor” all passed mutual compatibility tests with Huawei’s Atlas series servers, enabling safer, more efficient, and more reliable full-stack AI solutions and application experiences for customers. Yang Fan said, “The deep integration of SenseTime’s platform, algorithm, and software capabilities in industry scenarios with Ascend’s hardware and underlying base software will deliver greater value and more diverse solutions as artificial intelligence serves all industries in the future.” Going forward, SenseTime will continue to deepen cooperation with Huawei to build more efficient, lower-cost, lower-threshold AI infrastructure, better serve more industries and scenarios, bring more and better intelligent services to consumers and enterprises, and promote the continued development of China’s AI technology and industry. We believe that through active cooperation with Huawei, the company has secured an important domestic computing-power partner; as the Ascend ecosystem develops, the deployment of SenseTime’s AI should also receive significant support.
Profit forecast and investment advice. We believe the release of the streaming multimodal interaction model “SenseNova 5.5,” the first domestic model benchmarked against GPT-4o, further demonstrates SenseTime’s strong technical capabilities and lays a solid foundation for AI commercialization. The release of Vimi has likewise moved the company’s AI+video business into a new era, and continued iteration of the SenseNova models should drive ongoing development of the company’s related AI applications. New room for growth has opened up, and the company’s future development is worth looking forward to. Weighing these factors, we assign SenseTime Group a 2024 PS multiple of 16-20x, corresponding to a fair value range of HK$2.27-2.84 per share (HK$1 = RMB 0.9315), and an “outperform the market” rating.
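The PS-based range above is internally consistent: holding the implied 2024 revenue per share fixed, the upper bound follows from the lower bound by scaling the multiple from 16x to 20x. A minimal sketch of the arithmetic, using only figures stated in the report (the implied revenue per share is backed out here for illustration, not disclosed in the report):

```python
# Back out the revenue per share implied by the low end of the range,
# then check that the high end follows from the same revenue figure.
ps_low, ps_high = 16.0, 20.0   # 2024 PS multiple range from the report
price_low_hkd = 2.27           # fair value per share (HK$) at 16x PS
hkd_to_rmb = 0.9315            # HK$1 = RMB 0.9315, as stated

# Implied 2024 revenue per share in RMB (an inferred quantity, not disclosed).
rev_per_share_rmb = price_low_hkd * hkd_to_rmb / ps_low   # ~0.132 RMB

# Applying 20x PS to the same revenue figure reproduces the upper bound.
price_high_hkd = ps_high * rev_per_share_rmb / hkd_to_rmb

print(f"{rev_per_share_rmb:.4f}")   # implied 2024 revenue per share (RMB)
print(f"{price_high_hkd:.2f}")      # ~2.84, the report's upper bound
```

In other words, the HK$2.27-2.84 range is simply the 16-20x multiple applied to one implied revenue-per-share figure; the HKD/RMB rate cancels out of the ratio between the two bounds.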
Risk warning. AI commercialization may fall short of expectations; the company’s internationalization may fall short of expectations; etc.