IBM Brings the Speed of Light to the Generative AI Era With Optics Breakthrough
YORKTOWN HEIGHTS, N.Y. – Dec. 9, 2024: IBM (NYSE: IBM) has unveiled breakthrough research in optics technology that could dramatically improve how data centers train and run generative AI models. Researchers have pioneered a new process for co-packaged optics (CPO), the next generation of optics technology, to enable speed-of-light connectivity within data centers, using optics to complement existing short-reach electrical wires. By designing and assembling the first publicly announced successful polymer optical waveguide (PWG) to power this technology, IBM researchers have shown how CPO will redefine the way the computing industry transmits high-bandwidth data between chips, circuit boards, and servers.
Today, fiber optic technology carries data at high speeds across long distances, managing nearly all the world's commerce and communications traffic with light instead of electricity. Although data centers use fiber optics for their external communications networks, racks in data centers still predominantly run communications on copper-based electrical wires. These wires connect GPU accelerators that may spend more than half of their time idle, waiting for data from other devices in a large, distributed training process, which can incur significant expense and energy use.
IBM researchers have demonstrated a way to bring optics' speed and capacity inside data centers. In a technical paper, IBM introduces a new CPO prototype module that can enable high-speed optical connectivity. This technology could significantly increase the bandwidth of data center communications, minimizing GPU downtime while drastically accelerating AI processing. This research innovation, as described, would enable:
- Lower costs for scaling generative AI through a more than 5x reduction in energy consumption compared to mid-range electrical interconnects [1], while extending the length of data center interconnect cables from one to hundreds of meters.
- Faster AI model training, enabling developers to train a Large Language Model (LLM) up to five times faster with CPO than with conventional electrical wiring. CPO could reduce the time it takes to train a standard LLM from three months to three weeks, with performance gains increasing as models grow larger and more GPUs are used.[2]
- Dramatically increased energy efficiency for data centers, saving the energy equivalent of 5,000 U.S. homes' annual power consumption per AI model trained.[3]
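The energy figure in footnote [1] (five picojoules per bit down to less than one) can be turned into link power with simple arithmetic: power equals energy per bit multiplied by bit rate. The sketch below is illustrative only. The 1 Tb/s link rate is an assumed value chosen for round numbers, not a figure from IBM's paper, and 1 pJ/bit is used as the upper bound of "less than one".

```python
PJ = 1e-12  # joules per picojoule


def link_power_watts(energy_per_bit_pj: float, bit_rate_bps: float) -> float:
    """Power drawn by a link: energy per bit multiplied by bits per second."""
    return energy_per_bit_pj * PJ * bit_rate_bps


# Assumption for illustration only: a 1 terabit-per-second link.
ASSUMED_BIT_RATE_BPS = 1e12

# Footnote [1]: electrical interconnect at 5 pJ/bit.
electrical_w = link_power_watts(5.0, ASSUMED_BIT_RATE_BPS)
# Footnote [1]: CPO at less than 1 pJ/bit; 1.0 is used as the upper bound.
optical_w = link_power_watts(1.0, ASSUMED_BIT_RATE_BPS)

reduction = electrical_w / optical_w

print(f"electrical: {electrical_w:.1f} W")
print(f"optical (upper bound): {optical_w:.1f} W")
print(f"reduction: at least {reduction:.1f}x")
```

Because the ratio does not depend on the bit rate, any assumed link speed gives the same reduction factor; only the absolute wattage changes.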
"As generative AI demands more energy and processing power, the data center must evolve – and co-packaged optics can make these data centers future-proof," said Dario Gil, SVP and Director of Research at IBM. "With this breakthrough, tomorrow's chips will communicate much like how fiber optics cables carry data in and out of data centers, ushering in a new era of faster, more sustainable communications that can handle the AI workloads of the future."
Up to 80 times the bandwidth of today's chip-to-chip communication
In recent years, advances in chip technology have densely packed transistors onto a chip; IBM's 2 nanometer node chip technology can contain more than 50 billion transistors. CPO technology aims to scale the interconnection density between accelerators by enabling chipmakers to add optical pathways connecting chips on an electronic module beyond the limits of today's electrical pathways. IBM's paper outlines how these new high bandwidth density optical structures, coupled with transmitting multiple wavelengths per optical channel, have the potential to boost bandwidth between chips as much as 80 times compared to electrical connections.
IBM's innovation, as described, would enable chipmakers to add six times as many optical fibers at the edge of a silicon photonics chip, a measure known as "beachfront density," compared to the current state-of-the-art CPO technology. Each fiber, about three times the width of a human hair, could span centimeters to hundreds of meters in length and transmit terabits of data per second. The IBM team assembled a high-density PWG with optical channels at a 50-micrometer pitch, adiabatically coupled to silicon photonics waveguides, using standard assembly packaging processes.
The paper additionally indicates that these CPO modules with PWG at a 50-micrometer pitch are the first to pass all stress tests required for manufacturing. Components are subjected to high-humidity environments and temperatures ranging from -40°C to 125°C, as well as mechanical durability testing to confirm that optical interconnects can bend without breaking or losing data. Moreover, researchers have demonstrated PWG technology down to an 18-micrometer pitch. Stacking four PWGs would allow for up to 128 channels for connectivity at that pitch.
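The pitch and channel figures above lend themselves to a quick back-of-the-envelope check. The following sketch computes the linear channel density at each pitch and the edge length one waveguide layer would occupy. It assumes the 128 channels are split evenly across the four stacked PWGs; the release does not say how they are distributed, and the function names are hypothetical.

```python
def channels_per_mm(pitch_um: float) -> float:
    """Linear channel density along the chip edge for a given pitch."""
    return 1000.0 / pitch_um


def edge_span_um(channels: int, pitch_um: float) -> float:
    """Centre-to-centre span of a row of equally spaced channels."""
    return (channels - 1) * pitch_um


TOTAL_CHANNELS = 128  # from the text
STACKED_LAYERS = 4    # from the text

# Assumption: channels are divided evenly among the stacked waveguides.
channels_per_layer = TOTAL_CHANNELS // STACKED_LAYERS

for pitch_um in (50.0, 18.0):  # the two pitches named in the text
    density = channels_per_mm(pitch_um)
    span = edge_span_um(channels_per_layer, pitch_um)
    print(f"{pitch_um:.0f} um pitch: {density:.1f} channels/mm per layer, "
          f"{channels_per_layer} channels span {span:.0f} um")
```

Under the even-split assumption, each layer carries 32 channels, and moving from a 50-micrometer to an 18-micrometer pitch raises the per-layer density from 20 to roughly 56 channels per millimeter of chip edge.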
IBM's continued leadership in semiconductor R&D
CPO technology opens a new pathway to meet AI's increasing performance demands, with the potential to move off-module communications from electrical to optical. It continues IBM's history of leadership in semiconductor innovation, which also includes the first 2 nm node chip technology, the first implementation of 7 nm and 5 nm process technologies, Nanosheet transistors, vertical transistors (VTFET), single-cell DRAM, and chemically amplified photoresists.
Researchers completed design, modeling, and simulation work for CPO in Albany, New York, which the U.S. Department of Commerce recently selected as the home of America's first National Semiconductor Technology Center (NSTC), the NSTC EUV Accelerator. Researchers assembled prototypes and tested modules at IBM's facility in Bromont, Quebec, one of North America's largest chip assembly and test sites. Part of the Northeast Semiconductor Corridor between the United States and Canada, IBM's Bromont fab has led the world in chip packaging for decades.
About IBM
IBM is a leading provider of global hybrid cloud and AI, and consulting expertise. We help clients in more than 175 countries capitalize on insights from their data, streamline business processes, reduce costs and gain the competitive edge in their industries. More than 4,000 government and corporate entities in critical infrastructure areas such as financial services, telecommunications and healthcare rely on IBM's hybrid cloud platform and Red Hat OpenShift to effect their digital transformations quickly, efficiently and securely. IBM's breakthrough innovations in AI, quantum computing, industry-specific cloud solutions and consulting deliver open and flexible options to our clients. All of this is backed by IBM's long-standing commitment to trust, transparency, responsibility, inclusivity and service. Visit ibm.com for more information.
[1] A reduction from five to less than one picojoule per bit.
[2] Figures based on training a 70 billion parameter LLM using industry-standard GPUs and interconnects.
[3] Figures based on training a large LLM (such as GPT-4) using industry-standard GPUs and interconnects.
Media Contacts:
Bethany Hill McCarthy
IBM Research
bethany@ibm.com
Willa Hahn
IBM Research
willa.hahn@ibm.com