OpenAI Co-Founder Ilya Sutskever Rings Alarm Bells: AI's 'Fossil Fuel' Is Running Out As World Reaches 'Peak Data'

Benzinga · 12/17 16:41

OpenAI co-founder Ilya Sutskever is sounding the alarm on a looming data crisis that could reshape the artificial intelligence industry's future.

What Happened: Speaking at the Conference on Neural Information Processing Systems (NeurIPS) in Vancouver on Friday, Sutskever warned that the critical resource powering AI development is running dry, reported the Observer.

"Data is the fossil fuel of AI," Sutskever said at the conference. "We've achieved peak data and there will be no more."

The warning comes amid growing evidence of data access restrictions. A study by the Data Provenance Initiative found that between 2023 and 2024, website owners blocked AI companies from accessing 25% of high-quality data sources and 5% of all data across major AI datasets.

This scarcity is already forcing industry leaders to adapt. OpenAI CEO Sam Altman has proposed using synthetic data – information generated by AI models themselves – as an alternative solution. The company is also exploring enhanced reasoning capabilities through its new o1 model.

Why It Matters: The data shortage concerns echo recent observations from venture capital firm Andreessen Horowitz. Marc Andreessen noted that AI capabilities have plateaued, with multiple companies hitting similar technological ceilings.

Sutskever, who left OpenAI earlier this year to launch Safe Superintelligence with $1 billion in backing from investors including Andreessen Horowitz and Sequoia Capital, believes AI will evolve beyond its data dependency.

"Future AI systems will understand things from limited data, they will not get confused," he said, though he declined to specify how or when this would occur.

The increasing difficulty in accessing diverse and high-quality datasets for AI training has prompted companies like OpenAI, Meta Platforms Inc (NASDAQ:META), NVIDIA Corp (NASDAQ:NVDA), and Microsoft Corp (NASDAQ:MSFT) to adopt data scraping practices, though not without controversy.

For example, Microsoft's LinkedIn was recently scrutinized for using user data to train its AI models before updating its terms of service.

Similarly, Meta has been using publicly available social media posts from Europe to train its Llama large language models, though privacy concerns have prompted legal challenges.

Nvidia, too, has been scraping videos from YouTube and Netflix, including those from popular tech YouTuber Marques Brownlee, to train its AI systems. While these companies argue their practices comply with copyright laws, the ethical implications of scraping data without explicit consent have raised alarm across the industry.
