focus on technology stocks

Released on the same day: Google and OpenAI go head to head

Source: Securities Times

In the early morning of March 26, Google officially launched its new generation large language model, Gemini 2.5.

$Alphabet-C (GOOG.US)$ describes Gemini 2.5 as its "most intelligent AI model" to date. The experimental version of Gemini 2.5 Pro outperformed OpenAI o3-mini, Claude 3.7 Sonnet, Grok-3, and DeepSeek-R1 across multiple benchmarks. Koray Kavukcuoglu, Chief Technology Officer of Google DeepMind, said Gemini 2.5 represents the next step toward Google's goal of making AI "smarter and more capable of reasoning."

Notably, only about an hour after Google released Gemini 2.5, OpenAI rushed out GPT-4o image generation, its most advanced image generator to date. According to OpenAI, the feature renders text accurately, follows prompts closely, and draws on 4o's knowledge base and the conversation context, including transforming uploaded images or using them as visual inspiration. During the livestream, OpenAI co-founder and CEO Sam Altman used GPT-4o to turn a selfie into a comic-style image.

Google's new reasoning model excels at coding and reasoning

According to Google, the company has long been exploring how to make AI smarter and more capable of reasoning through techniques such as reinforcement learning and chain-of-thought prompting. In December last year, Google released the Gemini 2.0 Flash Thinking model, a multimodal reasoning model with fast and transparent processing capabilities. On January 22 of this year, Google officially launched an enhanced version of its Gemini 2.0 Flash Thinking reasoning model.

The newly released Gemini 2.5 series is Google's attempt to challenge OpenAI's "o" series of reasoning models. As the series' most advanced model for complex tasks, the experimental version of Gemini 2.5 Pro outperformed OpenAI o3-mini, Claude 3.7 Sonnet, Grok-3, and DeepSeek-R1 across multiple benchmarks, and it ranks first by a significant margin on LMArena (an open-source platform for evaluating large language models). However, Google has not published benchmark comparisons between Gemini 2.5 Pro and models such as OpenAI o1, OpenAI o1-Pro, and OpenAI o3.

In coding, Gemini 2.5 is a significant leap over 2.0. It excels at creating visually compelling web applications and agentic code applications, as well as at code transformation and editing. On SWE-Bench Verified, the industry standard for agentic code evaluation, Gemini 2.5 Pro scored 63.8% with a custom agent setup.

According to a demo video released by Google, Gemini 2.5 Pro can use its reasoning capabilities to create a video game by generating executable code from a single-line prompt. For example, given a specified programming language, it designed a small dinosaur game, complete with pixelated dinosaur graphics and playful game backgrounds.

In reasoning, Gemini 2.5 Pro leads on a range of benchmarks that require advanced reasoning. On "Humanity's Last Exam" (note: a dataset designed by hundreds of subject-matter experts to capture the frontier of human knowledge and reasoning), it scored 18.8%, the highest score among models not using tools and the current state of the art.

In addition, Gemini 2.5 Pro is natively multimodal and has a very long context window. It accepts text, images, audio, video, and code as input, and its context window of 1 million tokens (approximately 750,000 words) is enough to parse the complete text of "The Lord of the Rings" series. The window will be expanded to 2 million tokens in the future.

OpenAI rushes out 4o image generation

Just an hour after Google launched Gemini 2.5, its most powerful reasoning model, late at night, OpenAI quickly followed with a brand-new image generation feature for GPT-4o.

Until now, OpenAI's text-to-image models were mainly the DALL-E series. Unlike DALL-E, the new image generator is built on OpenAI's natively multimodal GPT-4o model. Altman announced during the livestream that image generation is now native to GPT-4o, eliminating the need to call a separate DALL-E text-to-image model.

Drawing on GPT-4o's multimodal capabilities, ChatGPT reportedly follows instructions more accurately during image generation, renders text within images more precisely, and easily creates scenes that blend the real and the imaginary. The feature is now rolling out as the default image generator in ChatGPT to Plus, Pro, Team, and free users, with access for Enterprise and Education users to follow soon.

According to examples published by OpenAI, GPT-4o image generation can produce handwritten text, accurately captures every detail in a prompt, and delivers image clarity that rivals high-definition photographs.

For example, given the prompt "This is a wide-angle image of a glass whiteboard taken with a phone, shot in a room overlooking the Bay Bridge. In the view, a woman can be seen writing, wearing a T-shirt with a large OpenAI logo. The handwriting looks natural but a bit messy, and we can see the photographer's reflection," the generated image reflects details such as the "Bay Bridge," the "T-shirt with a large OpenAI logo," and the "photographer's reflection."

GPT-4o image generation can also serve as a practical productivity tool. For instance, to design a menu image for a restaurant, a user lists the dish names, prices, and key features in the prompt, and GPT-4o generates a menu image that meets the requirements and is suitable for commercial use.

However, OpenAI also acknowledges that the model is not perfect and still has several limitations in areas such as cropping, hallucination, and precise drawing. For example, when a prompt provides little context, the image generator may fabricate information, and in complex cases it struggles to render non-Latin languages and may produce incorrect characters. OpenAI said it will address these issues through model improvements after the initial release.

On one hand, Google has released its most intelligent reasoning model to date, challenging OpenAI's "o" series of reasoning models. On the other hand, OpenAI has launched GPT-4o image generation in response to the pressure from Google's full suite of multimodal capabilities. Behind the two Silicon Valley giants' race to release new AI products is the continuing escalation of global AI competition. As that competition intensifies, vendors are accelerating their R&D, and further advances and breakthroughs are likely in reasoning models, large multimodal models, and AI agents.

Editor: Rocky

The translation is provided by third-party software.

The above content is for informational or educational purposes only and does not constitute any investment advice related to Futu. Although we strive to ensure the truthfulness, accuracy, and originality of all such content, we cannot guarantee it.
