Source: Semiconductor Industry Watch. At yesterday's Computex conference, Dr. Lisa Su unveiled AMD's latest roadmap. Afterwards, the foreign outlet More Than Moore published her post-conference interview, which we have translated and summarized as follows:

Q: How does AI help you personally in your work?

A: AI affects everyone's life. Personally, I am a loyal user of GPT and Copilot, and I am very interested in the AI we use internally at AMD. We often talk about customer AI, but we also prioritize AI for ourselves, because it can make our company better: making better and faster chips, for example. We hope to integrate AI into the development process, as well as into marketing, sales, human resources, and every other field. AI will be ubiquitous.

Q: NVIDIA has explicitly told investors that it plans to shorten its development cycle to one year, and now AMD plans to do the same. How and why are you doing this?

A: This is what we see in the market. AI is our company's top priority, and we are fully mobilizing the development capabilities of the entire company and increasing investment. There will be something new every year, because the market needs updated products and more features, and the product portfolio has to cover a wide range of workloads. Not every customer will use every product, but each year will bring something new, and it will be the most competitive. This takes investment, and making sure hardware and software systems are part of it; we are committed to making AI our biggest strategic opportunity.

Q: In the PC world, the TOPS count of Strix Point (Ryzen AI 300) has increased significantly. TOPS cost money. How do you weigh TOPS against the CPU and GPU?

A: Nothing is free! Especially in designs where power and cost are limited. What we see is that AI will be ubiquitous. Today, Copilot+ PCs and Strix offer more than 50 TOPS, and that starts at the top of the stack, but AI will run through our entire product stack. At the high end we will keep scaling TOPS up, because we believe the more local TOPS, the more capable the AI PC; putting it on the chip increases its value and helps offload part of the computing from the cloud.

Q: Last week you said that AMD will build 3nm chips using GAA. Samsung is the only foundry producing 3nm GAA. Will AMD choose Samsung for this?

A: Refer to last week's keynote at imec. What we said is that AMD will always use the most advanced technology. We will use 3nm. We will use 2nm. We did not name the supplier for 3nm or GAA. Our cooperation with TSMC is currently very strong, and we talked about the 3nm products we are developing now.

Q: On sustainability: AI means more power consumption. As a chip supplier, can you optimize the power consumption of devices that run AI?

A: In everything we do, and especially in AI, energy efficiency is as important as performance. We are working on improving energy efficiency in every future generation of products. We said we would improve energy efficiency 30-fold between 2020 and 2025, and we expect to exceed that goal; our current target is a 100-fold improvement over the next 4-5 years. So yes, we can focus on energy efficiency, and we must, because it will become a limiting factor for future computing.

Q: We had CPUs, then GPUs, and now we have NPUs. First, how do you see the scalability of NPUs? Second, what is the next big chip? A neuromorphic chip?

A: You need the right engine for each workload. CPUs are well suited to traditional workloads, GPUs to gaming and graphics, and NPUs to AI-specific acceleration. As we move forward and research specific new acceleration technologies, we will see some of them evolve, but ultimately it is driven by applications.

Q: You initially broke Intel's status quo by increasing core counts, but the core counts of your recent consumer products have plateaued. Is this enough for consumers and the gaming market, or should we expect core counts to rise again?

A: Our strategy is to continuously improve performance. For games in particular, software developers do not always use all the cores. There is no reason we could not go beyond 16 cores; the key is pacing our development so that software developers can actually make use of those cores.

Q: Regarding desktops, do you think more capable NPU accelerators are needed?

A: We do see NPUs having an impact on desktops, and we have been evaluating which product segments can use this capability. You will see desktop products with NPUs in the future as we expand our portfolio.
Intel has been in a world of pain lately. On one hand, its foundry business cannot take off; on the other, its AI business is falling short of expectations. Its datacenter business is under attack from AMD and Arm, and even its proud consumer business has run into trouble.
After the release of Intel's financial report for the third quarter of 2024, CEO Pat Gelsinger stated that the recently launched Lunar Lake was designed as a niche, one-off product with no direct successor. On the earnings call he explained that the use of external process nodes and the complexity of integrating LPDDR5X memory into the package led to low profit margins, shaping Intel's future product-line decisions.
According to Gelsinger, the upcoming Panther Lake CPU will have over 70% of its silicon manufactured in Intel's own fabs and will be the first client CPU built on Intel's 18A process node. It is planned for the second half of 2025 and will not include on-package memory; neither will Nova Lake, Panther Lake's successor.
This means on-package memory turned out to be a brief meteor streaking across Intel's processor history, sentenced to death shortly after birth. That leaves people not only regretful but also asking: why cut off on-package memory so soon after adopting it?
ARM, Intel's persistent puzzle
As the standard-bearer of x86, Intel has repeatedly fought off rivals built on reduced instruction set architectures such as ARM. The earliest such rivalry traces back to last century's PowerPC, when Intel single-handedly took on the IBM, Apple, and Motorola alliance and ultimately won a resounding victory, succeeding in both the consumer and server markets and putting immense pressure on other manufacturers.
Yet even the formidable x86 empire could not cover every corner of consumer electronics, especially the low-power field typified by embedded systems. Interestingly, ARM seemed to have been born for exactly that purpose.
The earliest ARM processor originated in an Acorn project codenamed Acorn RISC, designed by two talented Cambridge-educated computer scientists, Sophie Wilson and Steve Furber, who set out with extremely limited resources to create Acorn's own 32-bit microprocessor.
The ARM1 design was very simple: modeled in 808 lines of BASIC, it used only 25,000 transistors and did not even have a multiplier. As Sophie Wilson put it in an interview with The Telegraph, "We achieved this by thinking things through very, very carefully in advance." Although it made little impact at the time, it kept a spark of processor design alive in the United Kingdom.
Unlike the 386, the popular processor of the day, the ARM processor's founding design goals were low cost, low power, and high performance, which aligned naturally with the mobile market to come, even though a true mobile market did not yet exist.
A few years later, Apple saw ARM's potential and, together with Acorn and VLSI, invested to spin ARM off as a company. At the time, Apple was looking for a low-power processor for a project codenamed Newton, whose ultimate goal was to create the world's first tablet.
Unfortunately, the Newton project was too far ahead of its time; the tablet it produced was far too weak by today's standards and quickly failed. ARM, however, did not falter at this setback, and instead found a much broader horizon through the failure.
From 1993 to 1995, companies such as Cirrus Logic, Texas Instruments, Nokia, Sharp, Samsung, and NEC successively joined the ARM camp. Through these collaborations ARM invented the 16-bit Thumb instruction set, established a true ARM-based SoC business model, and delivered the most important processor core since the company's founding: ARM7. ARM7's die was one-sixteenth the size of the 80486 and cost only about $50, and the smaller die let ARM7 achieve lower power consumption, well suited to handheld applications.
Why did so many companies, including large electronics firms already producing their own chips, sign with ARM? Part of the reason was cost: ARM licenses were not expensive, and certainly cheaper than spending years and hundreds of engineers designing a new chip from scratch. Another reason was the technical legacy created by Sophie Wilson and Steve Furber: ARM chips were quick to bring up, simple, and low-power.
Furthermore, ARM had another ace up its sleeve: it was more than a vendor of finished chip designs. When ARM collaborated with other companies, it became a partner, helping design custom solutions tailored to their specific needs. Many companies built processors that fit their own requirements, and found commercial success, through collaboration with ARM.
ARM's low power consumption, simplicity, and affordable licensing fees allowed it to secure a foothold in markets beyond the reach of Intel's processors.
However, the real breakout for ARM processors came with four Apple product lines: the iPod, the iPhone, the iPad, and ARM Macs.
Among these, the most noteworthy and interesting is the ARM processor used in the first-generation iPhone.
Steve Jobs once asked Intel CEO Paul Otellini whether Intel was interested in bidding to make the chip for the phone Apple was about to launch. At the time, Intel was a manufacturing giant with strong desktop x86 CPU sales, and it also owned an ARM-based business: XScale, built on the StrongARM technology acquired from Digital Equipment Corporation (DEC) in 1998. Intel could easily have met Apple's needs.
However, Otellini declined the proposal. He calculated that the highest price Apple was willing to pay per chip was below Intel's production cost, and he was uncertain Apple's phone would sell in high volume. He also had reservations about continuing to support the XScale business while Intel was developing Atom, its low-power x86 line. He decided to focus on x86 and sold the XScale division in 2006.
After Intel turned down the opportunity, Apple went to Samsung, which agreed to manufacture a powerful new ARM chip for Apple's upcoming phone. That chip was the S5L8900, an SoC (system on chip) with an ARM11 core underclocked to 412 MHz, 128MB of memory, up to 16GB of storage, and an integrated PowerVR MBX Lite 3D graphics processor. It harked back to the ARM 250 "Archimedes on a chip" of 1991, except this time it powered not a desktop computer but a revolutionary phone.
From that year on, ARM's low power consumption let it swiftly capture the smartphone market. After the iPad launched, ARM dominated tablets as well, while Intel's much-anticipated Atom fell flat, its market share quickly sinking to rock bottom.
Losing the mobile market was painful for Intel, and Apple's assault on Intel and x86 then extended to the PC: in November 2020, Apple officially released the M1 chip, announced MacBooks built on it, and declared that Macs would gradually move off Intel's x86 platform and onto Apple's own ARM-based silicon.
The M1 chip's biggest advantage, the same advantage ARM has carried since the 1980s, is its low power consumption.
Low power consumption does not mean low performance; it means consuming less power for the same performance, or delivering more performance at the same power. In the slides Apple presented in 2020, the M1 CPU's peak power was about 18W, versus roughly 35-40W for contemporary x86 chips, and the claim was that M1 reached higher performance at lower core frequencies: comparing peak to peak, M1 delivered about 40% more performance than the x86 parts while drawing only 40% of the power.
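Taking Apple's slide numbers at face value, the implied performance-per-watt gap is straightforward to work out (a back-of-the-envelope reading of the marketing figures, not an independent measurement). Normalizing the x86 part's performance and power to 1:

$$
\frac{\text{perf/W (M1)}}{\text{perf/W (x86)}} = \frac{1.4 / 0.4}{1 / 1} = 3.5
$$

In other words, Apple's framing amounts to a claim of roughly 3.5 times the performance per watt.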
In its decades of history, Intel had never taken such a heavy blow.
Dispelling the ARM efficiency myth
I believe many people have had this question: Is x86 necessarily more power-hungry than ARM at the same performance level?
The answer is definitely no. There is no such thing as an inherently low-power architecture; ARM's low power consumption today is the result of years of deliberate direction and optimization. x86 has made its own attempts at low power: Atom, mentioned earlier, was the product line Intel used to compete with ARM on exactly that front.
Over the years, through the unremitting efforts of Apple and Qualcomm, the efficiency of the ARM architecture has been built into a myth, to the point where many consumers take it for granted. But Intel decided to break this myth itself.
On the eve of Computex this June, Intel held an Intel Tech Tour in Taipei, detailing its upcoming mobile processor, codenamed Lunar Lake. The new chip aims at a range of goals, from higher efficiency to on-device AI, and Intel specifically said it hoped to 'break the myth that x86 cannot be as efficient as ARM.'
During the event, Intel neither shied away from discussing ARM chips nor tried to ignore the elephant in the room: Qualcomm and Apple are steadily eroding market share that once belonged to Intel and x86. How can Intel recover from its missteps of the past decade?
First, to be clear: x86 is an extremely capable architecture. x86 processors are built on Complex Instruction Set Computing (CISC), which includes more complex instructions that consume more power; some x86 instructions take multiple cycles to execute, which raises power consumption and hurts efficiency.
Because its instructions are more complex, an x86 processor's pipeline also tends to be more complex. For example, x86 uses a variable-length instruction encoding, with instructions ranging from 1 to 15 bytes, while ARM instructions are fixed-length (although Thumb encodings vary between 16 and 32 bits). Because of this complexity, and because x86 instructions are typically translated into simpler RISC-like micro-operations, branch prediction plays an especially important role in x86 processors. These branch predictors are very sophisticated, since the cost of a misprediction and the stall that follows can be much higher than on ARM.
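To make the decode-cost point concrete, here is a minimal sketch in Python. The encodings are invented for illustration (they are not real x86 or ARM opcodes): with a fixed 4-byte format, the start of instruction n is simply 4n and every boundary can be computed independently, whereas in a variable-length stream each boundary depends on having decoded everything before it.

```python
# Toy illustration: finding instruction boundaries in a fixed-length
# stream vs. a variable-length one. The encodings are invented for
# this sketch; they are not real ARM or x86 opcodes.

FIXED_WIDTH = 4  # ARM-style: every instruction is 4 bytes

def fixed_boundaries(stream: bytes) -> list[int]:
    # The boundary of instruction n is simply n * 4, computable
    # independently for every n, so wide decoders can work in parallel.
    return list(range(0, len(stream), FIXED_WIDTH))

# x86-style: the leading byte determines the instruction's total
# length (real x86 involves prefixes, ModRM, SIB, and more; this
# lookup table is a stand-in for that machinery).
LENGTH_BY_LEAD_BYTE = {0x90: 1, 0xB8: 5, 0x0F: 2, 0xE9: 5}

def variable_boundaries(stream: bytes) -> list[int]:
    # Each boundary depends on decoding every instruction before it,
    # forcing a sequential walk through the byte stream.
    boundaries, pos = [], 0
    while pos < len(stream):
        boundaries.append(pos)
        pos += LENGTH_BY_LEAD_BYTE[stream[pos]]
    return boundaries

print(fixed_boundaries(bytes(16)))  # [0, 4, 8, 12]
print(variable_boundaries(bytes([0x90, 0xB8, 0, 0, 0, 0, 0x0F, 0])))
# [0, 1, 6]
```

The data dependence in the second loop is the crux: later instructions cannot even be located, let alone decoded, until earlier lengths are known, which is part of why x86 decoders and the predictors that feed them must be so much more elaborate.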
In addition, ARM needs fewer transistors per instruction, which is one reason for its lower power requirements. These are only some of the ways ARM achieves its efficiency; many subtler differences between the two architectures also work in ARM's favor. But fewer transistors per instruction also means less capability per instruction, and that extra complexity is precisely the strength that lets x86 meet enormous computational demands.
For x86 to be as efficient as ARM, Intel had a lot of work to do. From a power perspective, the x86 instruction set itself is quite 'expensive': fetching, decoding, and executing an instruction takes more cycles on x86 than on ARM. Fusing simple adjacent instructions into a single micro-operation (a compare followed by a conditional branch, for example) also helps, mainly by cutting per-instruction overhead.
By contrast, ARM's RISC design is a huge advantage here: each instruction is designed to be fast and simple to execute. ARM also uses fixed-length instructions, which keeps decoding simple, and the narrower Thumb encodings shrink code size, requiring less memory. Smaller Thumb instructions mean fewer memory fetches during execution, and more instructions fit in the processor's cache.
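As a rough illustration of the code-density argument (idealized numbers, assuming every instruction in a routine has a 16-bit Thumb encoding):

$$
1000 \times 4\,\text{B} = 4000\,\text{B} \ \ \text{(fixed 32-bit)}, \qquad
1000 \times 2\,\text{B} = 2000\,\text{B} \ \ \text{(Thumb)}
$$

The same 1,000-instruction routine occupies half the memory, so a fixed-size instruction cache holds roughly twice as many instructions and fetch traffic is halved. Real code mixes 16- and 32-bit encodings, so the actual savings are smaller, but the direction is the same.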
In addition, an ARM chip is usually part of a larger system-on-chip (SoC) rather than a standalone CPU talking to the motherboard and the rest of the machine. Connecting the CPU directly to the memory controller, the GPU, and the system's other key components also improves efficiency. This is exactly how Apple's unified memory works, and it is one reason for Apple's excellent battery life.
Intel's actual approach borrows from Apple to a considerable extent. Let's take a look at the Lunar Lake architecture.
Like last year's Meteor Lake-based Core Ultra 100 series chips, Lunar Lake packages multiple small dies together using Intel's Foveros technology. In Meteor Lake, Intel used Foveros to combine silicon manufactured by different companies: Intel built the compute tile housing the main CPU cores, while TSMC manufactured the graphics, I/O, and other functional tiles.
In Lunar Lake, Intel still uses Foveros, with a 'base tile' acting as an interposer through which the different dies communicate and connect into a whole. This time, however, the CPU, GPU, and NPU sit together in a single compute tile, while I/O and other functions are handled by the platform controller tile (the descendant of the PCH, the platform controller hub of earlier Intel CPUs). There is also a filler tile whose only job is to make the final package rectangular. Both the compute tile and the platform controller tile are manufactured by TSMC.
Intel still divides its CPU cores into efficiency-focused E-cores and high-performance P-cores, but the total core count has fallen compared with the previous-generation Core Ultra chips and the earlier 12th- and 13th-generation Core chips.
Lunar Lake has four E-cores and four P-cores, a configuration that looks more like Apple's M-series chips than Intel's past designs. By comparison, Meteor Lake's Core Ultra 7 155H has six P-cores and a total of ten E-cores, and the Core i7-1255U has two P-cores and eight E-cores. Intel has also removed Hyper-Threading from the P-cores, freeing silicon area to boost single-core performance.
Intel has also introduced a new GPU architecture for Lunar Lake, codenamed Battlemage, which will also power future desktop Arc discrete graphics cards. According to Intel, the integrated Arc 140V GPU is on average 31% faster in games than the old Meteor Lake Arc GPU and 16% faster than AMD's latest Radeon 890M, though results vary significantly from game to game. The Arc 130V has one fewer Xe core (7 instead of 8) and runs at lower clocks.
The final part of the compute tile is the neural processing unit (NPU), which handles some AI and machine-learning tasks locally. Intel says the Lunar Lake NPU delivers between 40 and 48 TOPS depending on the chip model, meeting or exceeding the 40 TOPS Microsoft requires for Copilot+ PCs, and roughly four times the performance of the Meteor Lake NPU (11.5 TOPS).

Of course, the most significant change in Lunar Lake is memory integrated into the CPU package, a strategy Apple and Qualcomm are also pursuing. Lunar Lake chips come with 16GB or 32GB of memory; among the released models, those ending in 8 (e.g., Core Ultra 7 258V) have 32GB, while those ending in 6 have 16GB. On-package memory not only saves motherboard space but also cuts power, since data travels shorter distances.
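Going by the released models named above, the suffix rule is regular enough to express as a tiny lookup. A hypothetical sketch (the rule is this article's observation of the launched lineup, not an official Intel specification):

```python
# Hypothetical sketch of the Lunar Lake naming rule described above:
# released model numbers ending in 8 ship with 32GB of on-package
# memory, those ending in 6 with 16GB. This is an observation about
# the launched lineup, not an official Intel specification.

def on_package_memory_gb(model: str) -> int:
    digits = "".join(ch for ch in model if ch.isdigit())
    if not digits:
        raise ValueError(f"no model number found in {model!r}")
    last = digits[-1]
    if last == "8":
        return 32
    if last == "6":
        return 16
    raise ValueError(f"unrecognized Lunar Lake suffix in {model!r}")

print(on_package_memory_gb("Core Ultra 7 258V"))  # 32
print(on_package_memory_gb("Core Ultra 7 256V"))  # 16
```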
After this series of drastic reforms, Lunar Lake finally achieved ARM-class power efficiency on x86: in earlier media tests, an ASUS Zenbook with the Core Ultra 7 258V lasted about 16.5 hours in PCMark's modern-office battery test, while a similarly configured 155H Zenbook managed just over 12 hours. That puts it within striking distance of an M3 MacBook's battery life and makes it one of the best x86 notebooks of recent years on that front.
Intel has truly delivered on its promise, using Lunar Lake as a hammer to shatter the power efficiency myth created by Apple and ARM.
Is this a make or break moment?
Unfortunately, after breaking the myth, Intel chose to give up on it outright, declaring that future processors will not use on-package memory like Lunar Lake and will return to traditional processor designs.
Interestingly, six months before Lunar Lake even officially launched, Intel had already decided that subsequent products such as Arrow Lake, Nova Lake, Raptor Lake, Twin Lake, Panther Lake, and Wildcat Lake would not adopt Lunar Lake's packaging approach, meaning Lunar Lake was sentenced to death internally before it ever shipped.
Why, despite Lunar Lake's good reputation, is Intel so unenthusiastic about it?
Analyst Ming-Chi Kuo offered one view: there were two reasons for Lunar Lake's birth. The first was to compete with Apple; after MacBooks moved to Apple's own chips, Intel wanted to prove that x86 could deliver similar efficiency and battery life.
The second was Microsoft's Surface line switching to ARM: Microsoft's new Surface series in 2Q24 adopted Qualcomm processors with 45 TOPS of compute across the board, prompting Intel to launch a competing product.
He mentioned that while Intel claimed Lunar Lake failed because on-package memory diluted gross margins, the real reasons lay elsewhere: PC brands and contract manufacturers disliked it, since the reduced flexibility in parts sourcing hurt their profits and left them with little willingness to buy; Intel's bargaining power with DRAM suppliers is far weaker than Apple's, and it also had to rely on TSMC for fabrication, which worked against cost optimization; and AI PC applications were still immature, so consumers were unwilling to pay a premium for Lunar Lake.
He argued that Lunar Lake's failure shows Intel's challenges go beyond lagging process technology to deeper problems in product planning; AMD's continuously growing server market share is another demonstration. Process technology may be only the surface issue, and organizational mechanisms that produce a series of wrong product decisions may be Intel's core problem.
Ming-Chi Kuo's views may not be entirely correct, but he did put his finger on a core problem with Intel's product line: confusion.
Leaving Intel's server chips aside, its consumer lineup is already chaotic enough. Imagine a consumer who has been disappointed by the battery life of Intel-based laptops for years, is pleasantly surprised to get ARM-MacBook-class battery life from Lunar Lake, and then watches that excellent battery life vanish with the next generation. What is that consumer supposed to think?
On the other hand, Intel's 18A process is about to enter mass production, but even that is mixed news for its own processors. The process currently has no confirmed major external customers; Intel alone uses it, so every process bring-up is like crossing a river by feeling for the stones. TSMC, by contrast, has enough customers to validate and refine each new process, which only highlights Intel's awkward position.
The deeper reason Intel cut Lunar Lake is not just memory's drag on margins. For CEO Pat Gelsinger, the thorniest issue is how to balance the processor business against the foundry business. If the processor groups keep choosing TSMC for manufacturing, the foundry division's situation becomes even harder; but if the processor groups are denied the most advanced process technology, Intel will lose not only the server market but also the consumer market to its competitors.
In the end, it is a question of who bears the pain, and obviously no one wants to suffer in the short term.
But think about it: how much hardship did Apple endure for a chip the size of a fingernail? From Motorola 68K to PowerPC and then to Intel, it changed course three times; even its mobile chips once struggled and depended on others, following Samsung's lead for years before moving to TSMC and finally arriving at the self-developed M1. The struggles could fill a tragic history book.
Compared with its decades of smooth sailing, how much does the hardship Intel must now endure really amount to?
Editor/Rocky