Jensen Huang's GTC Speech (Full Text): The Era of Deductive Reasoning Has Arrived, and Lobsters Are the New Operating Systems

2026/03/17 09:15
33 min read

Source: Wall Street News

On March 16, 2026, NVIDIA GTC 2026 officially opened, and NVIDIA founder and CEO Jensen Huang delivered a keynote speech.

At this conference, considered an "annual pilgrimage for the AI industry," Jensen Huang explained Nvidia's transformation from a "chip company" to an "AI infrastructure and manufacturing company." Addressing the market's most pressing concerns about performance sustainability and growth potential, Huang detailed the underlying business logic driving future growth: "Token Factory Economics."

The earnings guidance is extremely optimistic, projecting at least $1 trillion in demand by 2027.

Over the past two years, global demand for AI computing has exploded exponentially. As large models evolve from "perception" and "generation" to "reasoning" and "action" (task execution), the consumption of computing power has soared. Regarding the order and revenue ceilings that the market is highly concerned about, Jensen Huang gave extremely strong guidance.

Jensen Huang stated the figure frankly in his speech, and his trillion-dollar expectation at one point boosted Nvidia's stock price by more than 4.3%.

Moreover, he suggested the actual figure could be even higher.

Jensen Huang pointed out that Nvidia's systems have now proven themselves to be the world's "lowest-cost infrastructure." Because Nvidia can run AI models across virtually any field, this versatility allows the $1 trillion invested by customers to be fully utilized and maintain a long lifespan.

Currently, 60% of Nvidia's business comes from the top five hyperscale cloud service providers, while the other 40% is widely distributed across various fields such as sovereign cloud, enterprise, industry, robotics, and edge computing.

Token Factory Economics: Performance per Watt Determines the Lifeline of Business

To explain the rationale behind this trillion-dollar demand, Jensen Huang presented a completely new business mindset to CEOs of global companies. He pointed out that future data centers will no longer be warehouses for storing files, but rather "factories" for producing tokens (the basic units generated by AI).

Jensen Huang emphasized that tokens are becoming a priced commodity, and he divided future AI services into business tiers ranging from a free tier up to an ultra-high-speed premium tier.

He pointed out that as models become larger and contexts become longer, AI will become smarter, but the token generation rate will decrease.

Jensen Huang emphasized that NVIDIA's architecture enables customers to achieve extremely high throughput in the free tier, while simultaneously boosting performance by an astonishing 35 times in the highest-value inference tier.

Vera Rubin achieves 350x speedup in two years; Groq fills the gap in ultra-fast reasoning.

Under the constraints of these physical limits, NVIDIA unveiled its most complex AI computing system ever: Vera Rubin.

Jensen Huang pointed out that through extreme end-to-end hardware and software co-design, Vera Rubin achieves a remarkable leap in output within the same 1 GW data center footprint.

To address the bandwidth bottleneck of ultra-high-speed inference (on the order of 1,000 tokens/second), NVIDIA presented its solution, built around its newly acquired company Groq: asymmetric disaggregated inference.

Jensen Huang explained that NVIDIA's Dynamo software system delegates the compute- and memory-intensive "prefill" stage to Vera Rubin and the latency-sensitive "decoding" stage to Groq. He also offered guidance on enterprise computing configurations: workloads dominated by raw throughput should run entirely on Vera Rubin, while workloads rich in high-value, latency-sensitive tokens can mix in roughly 25% Groq.

It has been revealed that the Groq LP30 chip, manufactured by Samsung, is already in mass production and is expected to ship in the third quarter, while the first Vera Rubin rack is already running on the Microsoft Azure cloud.

In addition, on optical interconnect technology, Jensen Huang showcased the world's first mass-produced co-packaged optics (CPO) switch, Spectrum X, settling the market debate over the shift from copper to fiber.

Agents are ending traditional SaaS; "annual salary + token" has become the standard in Silicon Valley.

Beyond hardware, Huang devoted considerable time to the revolution in AI software and the ecosystem, especially the explosion of agents.

He described the open-source project OpenClaw as "the most popular open-source project in human history," saying it surpassed Linux's achievements over the past 30 years in just a few weeks. Huang stated bluntly that OpenClaw is essentially an "operating system" for agent computers.

For ordinary working professionals, he asserted, this transformation is also just around the corner, and he outlined a new form of workplace in which compensation pairs an annual salary with a token budget.

At the end of his speech, Jensen Huang also gave a sneak peek at the next-generation computing architecture, Feynman, which will be the first to achieve horizontal scaling of both copper wires and CPO. Even more intriguing is Nvidia's development of "Vera Rubin Space-1," a data center computer deployed in space, which completely opens up the possibilities for extending AI computing power beyond Earth.

The full text of Jensen Huang's GTC 2026 speech, translated below (with AI tool assistance):

Host: Welcome Jensen Huang, founder and CEO of Nvidia, to the stage.

Jensen Huang, Founder and CEO:

Welcome to GTC. I want to remind everyone that this is a technology conference. I'm very pleased to see so many people lining up to get in so early in the morning, and to see everyone here today.

At GTC, we will focus on three main themes: technology, platforms, and ecosystem. NVIDIA currently has three major platforms: the CUDA-X platform, the system platform, and our newly launched AI Factory platform.

Before we officially begin, I'd like to thank our pre-event hosts—Sarah Guo of Conviction, Alfred Lin of Sequoia Capital (Nvidia's first venture capitalist), and Gavin Baker, Nvidia's first major institutional investor. These three have profound insights into technology and wield significant influence throughout the technology ecosystem. Of course, I also want to thank all the distinguished guests I personally invited to attend today. Thank you to this all-star team.

I would also like to thank all the companies present today. NVIDIA is a platform company; we have technology, platforms, and a rich ecosystem. The companies present today represent almost all players in the $100 trillion industry, and we are deeply grateful to the 450 companies that sponsored this event.

This conference will feature 1,000 technical forums and 2,000 speakers, covering every layer of the "five-layer cake" architecture of artificial intelligence—from infrastructure such as land, electricity, and data centers, to chips, platforms, models, and the various applications that ultimately drive the entire industry forward.

CUDA: Twenty Years of Technological Accumulation

It all started here. This year marks the 20th anniversary of CUDA.

For two decades, we have been dedicated to developing this architecture. CUDA is a revolutionary invention—SIMT (Single Instruction, Multiple Threads) technology lets developers write programs as scalar code and extend them into multithreaded applications, far easier to program than the earlier SIMD approach. We recently added Tiles to help developers more easily program Tensor Cores and the various mathematical structures on which today's artificial intelligence relies. Today, CUDA has thousands of tools, compilers, frameworks, and libraries, hundreds of thousands of public projects in the open-source community, and is deeply integrated into every technology ecosystem.

This chart reveals 100% of NVIDIA's strategic logic, and I've been presenting this slide from the very beginning. The most difficult and core element to achieve is the "installed systems" at the bottom of the chart. Over the past two decades, we have accumulated hundreds of millions of GPUs and computing systems running CUDA globally.

Our GPUs cover all cloud platforms and serve almost all computer manufacturers and industries. CUDA's massive installed base is the fundamental reason why this flywheel continues to accelerate. Installed base attracts developers, developers create new algorithms and achieve breakthroughs, breakthroughs create new markets, new markets form new ecosystems and attract more companies to join, further expanding the installed base—this flywheel is continuously accelerating.

Downloads of NVIDIA libraries are growing at an astonishing rate, on a massive scale and at an ever-increasing pace. This flywheel enables our computing platform to support a massive number of applications and a constant stream of new breakthroughs.

More importantly, it also grants these infrastructures an extremely long lifespan. The reason is obvious: the applications that can run on NVIDIA CUDA are incredibly diverse, covering every stage of the AI lifecycle, various data processing platforms, and a wide range of scientific solvers. Therefore, once an NVIDIA GPU is installed, its practical value is extremely high. This is also why the cloud price of our Ampere architecture GPUs, which we released six years ago, has actually increased.

The root cause of all this lies in our massive installed base, powerful flywheel architecture, and extensive developer ecosystem. When these factors work together, coupled with our continuous software updates, computing costs steadily decrease. Accelerated computing significantly improves application performance, and with our long-term maintenance and iteration of the software, users not only experience initial performance leaps but also enjoy ongoing reductions in computing costs. We are committed to providing long-term support for every GPU globally because they are architecturally compatible.

We're willing to do this because of the sheer scale of installations—each new optimization release benefits millions of users. This dynamic combination allows NVIDIA architectures to continuously expand their reach and accelerate their own growth while simultaneously reducing computing costs, ultimately stimulating new growth. CUDA is at the heart of it all.

From GeForce to CUDA: A 25-Year Evolution

Our journey with CUDA actually began 25 years ago.

GeForce—many of you grew up with GeForce. GeForce is NVIDIA's most successful marketing program. We started cultivating future customers when you couldn't afford our products—your parents became NVIDIA's earliest users, buying our products year after year, until one day you grew into excellent computer scientists and became true customers and developers.

This is the foundation laid by GeForce 25 years ago. Twenty-five years ago, we invented the programmable shader—an obvious yet profound invention that made accelerators programmable. The pixel shader was the world's first programmable accelerator. Five years later, we created CUDA—one of the most important investments we've ever made. With limited resources, we staked the vast majority of our profits on extending CUDA from GeForce to every computer. We were so determined because we believed in its potential. Despite initial hardships, the company held onto this belief for 13 generations, a full two decades, and today CUDA is ubiquitous.

It was the pixel shader that drove the GeForce revolution. And about eight years ago, we introduced RTX—a complete architectural overhaul for the modern era of computer graphics. GeForce brought CUDA to the world, and it was because of this that many scholars, including Alex Krizhevsky, Ilya Sutskever, Geoffrey Hinton, and Andrew Ng, discovered that GPUs could be a powerful tool for accelerating deep learning, thus igniting the AI explosion a decade ago.

Ten years ago, we decided to combine programmable shading with two entirely new concepts: hardware ray tracing, which was technically extremely challenging; and a forward-thinking idea at the time—we foresaw about a decade ago that AI would revolutionize computer graphics. Just as GeForce brought AI to the world, AI will now, in turn, reshape the way computer graphics are implemented.

Today, I want to show you the future. This is our next-generation graphics technology, which we call Neural Rendering—a deep fusion of 3D graphics and artificial intelligence. This is DLSS 5, please take a look.

Neural Rendering: The Fusion of Structured Data and Generative AI

Isn't this breathtaking? Computer graphics has come alive.

What did we do? We combined controllable 3D graphics (the real foundation of the virtual world) with its structured data, and then incorporated generative AI and probabilistic computation. One is completely deterministic, the other probabilistic yet highly realistic—we merged these two concepts, achieving precise control through structured data while generating content in real time. Ultimately, the content is both visually stunning and completely controllable.

The concept of integrating structured information with generative AI will continue to emerge in various industries. Structured data is the cornerstone of trustworthy AI.

Acceleration platform for structured and unstructured data

Now I'm going to show you a technical architecture diagram.

Structured data—familiar platforms like SQL, Spark, Pandas, Velox, as well as important platforms such as Snowflake, Databricks, Amazon EMR, Azure Fabric, and Google BigQuery—all process data frames. These data frames are like giant spreadsheets, carrying all the information of the business world and representing the ground truth of enterprise computing.

In the AI era, we need to enable AI to use structured data and accelerate it to the extreme. In the past, accelerating the processing of structured data meant making enterprises operate more efficiently. In the future, AI will use these data structures at a speed far exceeding that of humans, and AI agents will make extensive use of structured databases.

Regarding unstructured data, vector databases, PDFs, videos, and audio constitute the vast majority of data formats worldwide—approximately 90% of the data generated each year is unstructured. In the past, this data was almost entirely unusable: we read it, stored it in file systems, and that was it. We couldn't query it, and it was difficult to retrieve data because unstructured data lacked simple indexing methods; we had to understand its meaning and context. Now, AI can do this—using multimodal perception and understanding technology, AI can read PDF documents, understand their meaning, and embed them into a larger, queryable structure.
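The mechanics described above—embedding unstructured text into vectors and querying by meaning rather than by index—can be sketched in a few lines. This is a toy model: the `embed` function here produces deterministic pseudo-random vectors rather than calling a real neural embedding model, and the brute-force index stands in for an accelerated vector store such as the cuVS-backed systems mentioned below.

```python
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy embedding: a deterministic pseudo-random unit vector per text.
    A real system would use a neural embedding model here."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

class VectorStore:
    """Minimal in-memory vector index (brute-force cosine similarity)."""
    def __init__(self):
        self.texts, self.vectors = [], []

    def add(self, text: str):
        self.texts.append(text)
        self.vectors.append(embed(text))

    def query(self, text: str, k: int = 1):
        q = embed(text)
        # Cosine similarity reduces to a dot product on unit vectors.
        sims = np.array([v @ q for v in self.vectors])
        top = np.argsort(-sims)[:k]
        return [self.texts[i] for i in top]

store = VectorStore()
for doc in ["quarterly revenue report", "GPU cooling manual", "employee handbook"]:
    store.add(doc)
# Identical text embeds to an identical vector, so it retrieves itself.
print(store.query("GPU cooling manual"))
```

With real embeddings, semantically similar but differently worded queries would also land near the right document; the toy version only demonstrates the indexing and query mechanics.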

Nvidia created two base libraries for this purpose:

  • cuDF: Used for accelerated processing of data frames and structured data.
  • cuVS: Used for vector storage, semantic data, and processing of unstructured AI data.

These two libraries will be among the most important foundational platforms of the future.
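To give a concrete sense of the data-frame workloads cuDF accelerates: cuDF deliberately mirrors the pandas API, so the following pandas sketch (the data and column names are illustrative) is the kind of code that can run on the GPU with little or no change.

```python
import pandas as pd  # cuDF exposes a near-identical API: `import cudf` is often a drop-in swap

# A small "spreadsheet of the business world": illustrative orders per region.
df = pd.DataFrame({
    "region": ["NA", "EU", "NA", "APAC", "EU"],
    "revenue": [120, 80, 200, 150, 70],
})

# Typical structured-data work an AI agent might issue as a query:
summary = (df.groupby("region", as_index=False)["revenue"]
             .sum()
             .sort_values("revenue", ascending=False))
print(summary)
# With cuDF the same code runs on the GPU; the `cudf.pandas` accelerator mode
# can even speed up existing pandas scripts without code changes.
```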

Today, we are announcing partnerships with several companies. IBM—the inventor of the SQL language—will use cuDF to accelerate its WatsonX Data platform. Dell has partnered with us to build the Dell AI Data Platform, integrating cuDF and cuVS, and has achieved significant performance improvements in real-world projects at NTT Data. On the Google Cloud front, we are now accelerating not only Vertex AI but also BigQuery, and have partnered with Snapchat to reduce its computing costs by nearly 80%.

The benefits of accelerated computing are threefold: speed, scale, and cost. This aligns with the logic of Moore's Law—achieving leaps in performance through accelerated computing while continuously optimizing algorithms so that everyone can benefit from ever-decreasing computing costs.

NVIDIA has built an accelerated computing platform that brings together numerous libraries, including RTX, cuDF, and cuVS. These libraries are integrated into global cloud services and OEM systems, reaching users worldwide.

Deep cooperation with cloud service providers


Google Cloud: We accelerate Vertex AI and BigQuery, deeply integrate with JAX/XLA, and excel on PyTorch—NVIDIA is the only accelerator globally to excel on both PyTorch and JAX/XLA. We've brought customers like Base10, CrowdStrike, Puma, and Salesforce into the Google Cloud ecosystem.

AWS: We are accelerating EMR, SageMaker, and Bedrock, which are deeply integrated with AWS. What excites me most this year is that we will be bringing OpenAI to AWS, which will significantly drive AWS cloud computing consumption growth and help OpenAI expand its regional deployments and computing scale.

Microsoft Azure: The NVIDIA 100 PFLOPS supercomputer is the first supercomputer we built and the first deployed on Azure, laying an important foundation for our collaboration with OpenAI. We are accelerating Azure cloud services and AI Foundry, collaborating on Azure region expansion, and working closely together on Bing search. Notably, our Confidential Computing capabilities—ensuring that even the cloud operator cannot view user data and models—make NVIDIA GPUs among the world's first to support confidential computing, enabling the secure deployment of OpenAI and Anthropic models in cloud environments across the globe. For example, with Synopsys, we are accelerating its entire EDA and CAD workflows and deploying them on Microsoft Azure.

Oracle: We were Oracle's first AI customer, and I'm proud to have been the first to explain the concept of AI cloud to Oracle. Since then, they've grown rapidly, and we've brought in many partners for them, including Cohere, Fireworks, and OpenAI.

CoreWeave: The world's first AI-native cloud, designed specifically for GPU hosting and AI cloud services, boasts an excellent customer base and strong growth momentum.

Palantir + Dell: The three parties have jointly created a brand-new AI platform based on Palantir's Ontology Platform and AI Platform, which can deploy AI in any country and in any air-gapped environment in a completely localized manner—encompassing everything from data processing (vectorization or structuring) to the complete accelerated computing stack for AI.

NVIDIA has established this special partnership with global cloud service providers—we bring customers to the cloud, creating a mutually beneficial ecosystem.

Vertical integration, horizontal openness: Nvidia's core strategy

Nvidia is the world’s first vertically integrated and horizontally open company.

The necessity of this model is very simple: accelerated computing is not a chip issue, nor a system issue; its full description should be application acceleration. CPUs can make computers run faster overall, but this approach has reached its limits. In the future, only application- or domain-specific acceleration can continue to deliver performance leaps and cost reductions.

This is precisely why NVIDIA must delve deeply into one library after another, one domain after another, and one vertical industry after another. We are a vertically integrated computing company; there is no other way. We must understand the applications, understand the domains, deeply understand the algorithms, and be able to deploy them in any scenario—data center, cloud, on-premises, edge, and even robotic systems.

At the same time, NVIDIA maintains a horizontally open approach, willing to integrate its technology into any partner's platform, so that the world can enjoy the benefits of accelerated computing.

The attendee structure at this year's GTC perfectly illustrates this point. The financial services industry had the highest proportion of attendees – we hope to see developers, not traders. Our ecosystem spans both upstream and downstream supply chains. Whether a company is 50, 70, or 150 years old, last year was its best year ever. We are at the starting point of something very, very significant.

CUDA-X: An Accelerated Computing Engine for Various Industries

Nvidia has a deep presence in various vertical sectors:

  • Autonomous driving: Wide-ranging and far-reaching impact
  • Financial Services: Quantitative investing is shifting from manual feature engineering to supercomputer-driven deep learning, ushering in its "Transformer moment."
  • Healthcare: It's entering its own "ChatGPT moment," encompassing areas such as AI-assisted drug discovery, AI-powered diagnostics, and healthcare customer service.
  • Industry: The world's largest construction wave is underway, with AI factories, chip factories, and data center factories springing up everywhere.
  • Entertainment and Gaming: The real-time AI platform supports translation, live streaming, game interaction, and intelligent shopping agents.
  • Robotics: With over a decade of experience and a complete suite of three major computer architectures (training computer, simulation computer, and onboard computer), 110 robots were showcased at this exhibition.
  • Telecommunications: An industry worth approximately $2 trillion, base stations will evolve from single communication functions into AI infrastructure platforms. One such platform, Aerial, has deep collaborations with companies such as Nokia and T-Mobile.

At the heart of all these areas lies our CUDA-X library—the very foundation of NVIDIA as an algorithm company. These libraries are the company's most core assets, enabling the computing platform to deliver real-world value across various industries.

One of the most important libraries is cuDNN (CUDA Deep Neural Network Library), which revolutionized artificial intelligence and triggered a major explosion in modern AI.

(Playing CUDA-X demo video)

Everything you just saw was a simulation—including the physics-based solver, the AI-manipulated physics model, and the physical AI robot model. It was all simulation; there was no hand-drawn animation or joint rigging. This is precisely NVIDIA's core capability: unlocking these opportunities through a deep understanding of algorithms and the organic integration of computing platforms.

AI-native enterprises and the new computing era

You just saw industry giants that define today's society, such as Walmart, L'Oréal, JPMorgan Chase, Roche, and Toyota, as well as a large number of companies you've never heard of before—we call them AI-native companies. This list is extremely large, including OpenAI, Anthropic, and many emerging companies serving different vertical sectors.

The industry has experienced phenomenal growth over the past two years. Venture capital inflows into startups have reached a record $150 billion. More importantly, the size of a single investment has jumped for the first time from millions of dollars to hundreds of millions or even billions. There's only one reason: for the first time in history, every company of this kind requires massive amounts of computing resources and a huge number of tokens—tokens they create themselves or build value on top of, sourced from providers like Anthropic and OpenAI.

Just as the PC revolution, the internet revolution, and the mobile cloud revolution each gave rise to a number of epoch-making companies, this generation of computing platform transformation will also give rise to a number of highly influential companies that will become an important force in the future world.

Three historic breakthroughs that drove all of this

What exactly happened in the past two years? Three major events.

First: ChatGPT, ushering in the era of generative AI (end of 2022 to 2023)

It can not only perceive and understand, but also generate unique content. I demonstrated the fusion of generative AI and computer graphics. Generative AI fundamentally changes the way we compute—from retrieval-based to generative—profoundly impacting computer architecture, deployment methods, and the very meaning of computing.

Second: Reasoning AI, represented by o1.

Reasoning ability enables AI to self-reflect, plan, and decompose problems—splitting a problem it cannot solve directly into manageable steps. This grounds generative AI, letting it reason over real-world information. To achieve this, both the number of input context tokens and the number of output tokens used for reasoning increase significantly, driving a substantial increase in computational cost.
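The compute blow-up from longer contexts can be made concrete with the standard back-of-the-envelope model for transformer self-attention, whose cost grows quadratically with sequence length. The model dimensions and layer count below are illustrative assumptions, not the specs of any particular model.

```python
def attention_flops(seq_len: int, d_model: int = 4096, n_layers: int = 32) -> float:
    """Rough FLOP count for self-attention over a sequence.

    Uses the common approximation that attention score/value products cost
    about 4 * seq_len^2 * d_model FLOPs per layer (projections ignored).
    """
    return 4 * seq_len**2 * d_model * n_layers

short = attention_flops(1_000)    # a brief prompt
long = attention_flops(100_000)   # a long reasoning context
print(f"100x longer context -> {long / short:,.0f}x more attention compute")
```

Because the cost is quadratic in sequence length, a 100x longer context multiplies attention compute by 10,000x, which is why reasoning workloads push demand so hard.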

Third: Claude Code, the first intelligent agent model.

It can read files, write code, compile, test, evaluate, and iterate. Claude Code has revolutionized software engineering—100% of NVIDIA engineers use one or more of Claude Code, Codex, and Cursor; no software engineer is without the help of AI.

This is a completely new turning point—you are no longer asking AI "what, where, and how," but rather letting it "create, execute, and build," allowing it to proactively use tools, read files, break down problems, and take action. AI has evolved from perception to generation, to reasoning, and now it can truly get things done.

Over the past two years, the computational demands for inference have increased approximately 10,000 times, while usage has increased approximately 100 times. I've always believed that computational demands have increased a millionfold over the past two years—this is a shared feeling, a feeling shared by OpenAI, and a feeling shared by Anthropic. If more computing power can be acquired, more tokens can be generated, revenue will increase, and AI will become more intelligent. The inflection point for inference has arrived.

The era of trillion-dollar AI infrastructure

This time last year, I was here saying we had high confidence in Blackwell and Rubin's demand and order books, amounting to approximately $500 billion, through 2026. Today, a year after GTC, I stand here to tell you: looking ahead to 2027, I see the figure at least $1 trillion. And I'm convinced the actual computing demand will be far greater than that.

2025: Nvidia's Year of Inference

2025 is NVIDIA's Year of Inference. We aim to ensure excellence at every stage of the AI lifecycle, beyond training and post-training, so that the infrastructure our customers have invested in continues to operate efficiently, with a longer effective lifespan and lower unit cost.

At the same time, Anthropic and Meta officially joined the NVIDIA platform, together representing one-third of the world's AI computing power demand. Open-source models are approaching cutting-edge levels and are ubiquitous.

NVIDIA is currently the only platform in the world capable of running all AI models across all AI fields—language, biology, computer graphics, computer vision, speech, protein and chemistry, robotics, etc.—whether at the edge or in the cloud, and regardless of the language. NVIDIA's architecture is versatile across all these scenarios, making us the lowest-cost and most reliable platform.

Currently, 60% of NVIDIA's business comes from the world's top five hyperscale cloud service providers, with the remaining 40% spread across various fields such as regional cloud, sovereign cloud, enterprise, industrial, robotics, and edge computing. The breadth of AI coverage itself is the source of its resilience—this is undoubtedly a completely new computing platform revolution.

Grace Blackwell and NVLink 72: Bold Architectural Innovation

While the Hopper architecture was still at its peak, we decided to completely re-architect the system, expanding NVLink from 8-way to NVLink 72, and comprehensively decomposing and reconstructing the computing system. Grace Blackwell NVLink 72 was a huge technological gamble, and it wasn't easy for any of our partners. We would like to express our sincere gratitude to everyone involved.

At the same time, we introduced NVFP4—not just a regular FP4, but a completely new type of tensor core and computation unit. We have demonstrated that NVFP4 can achieve inference without loss of precision, while delivering significant performance and energy efficiency improvements, and it is equally suitable for training. In addition, a series of new algorithms such as Dynamo and TensorRT-LLM have emerged, and we have even invested billions of dollars to build a supercomputer specifically for optimizing kernels, called DGX Cloud.

The results demonstrate our remarkable inference performance. Data from Semi Analysis—the most comprehensive AI inference performance benchmark to date—shows NVIDIA far ahead in both tokens per watt and cost per token. Moore's Law might have predicted a 1.5x performance improvement for the H200, but we achieved 35x. Dylan Patel of Semi Analysis even said, "Huang is being conservative; it's actually 50x." He's right.

I would like to quote him here: "Jensen sandbagged" (that is, he was being deliberately conservative).

Nvidia's cost per token is the lowest in the world, currently unmatched by any other company. The reason lies in its Extreme Co-design.

Taking Fireworks as an example, before NVIDIA updated its entire software and algorithms, its average token processing speed was about 700 tokens per second; after the update, it approached 5,000 tokens per second, an improvement of about 7 times. This is the power of extreme collaborative design.

AI Factory: From Data Center to Token Factory

Data centers used to be places to store files; now they are factories that produce tokens. Every cloud service provider and every AI company will use "token factory efficiency" as a core operating metric in the future.

This is my core argument:

  • Vertical axis: Throughput – Number of tokens generated per second at a fixed power level
  • Horizontal axis: Token Speed – the response speed per inference step. The faster the speed, the larger the usable model, the longer the context, and the more intelligent the AI.

Tokens are a new commodity whose pricing will be tiered as the market matures.

  • Free tier (high throughput, low speed)
  • Intermediate tier (~$3 per million tokens)
  • Advanced tier (~$6 per million tokens)
  • High-speed tier (~$45 per million tokens)
  • Ultra-high-speed tier (~$150 per million tokens)

Compared to Hopper, Grace Blackwell delivers 35x higher throughput at the highest-value tier and introduces entirely new tiers. Using a simplified model that allocates 25% of its power to each of the four paid tiers, Grace Blackwell could generate 5x more revenue than Hopper.
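The "token factory" arithmetic behind this kind of claim can be sketched as a toy revenue model. The per-tier prices are the ones quoted in the speech; the sustained throughput figures are invented for illustration, since the actual numbers were not given.

```python
# Tier prices ($ per million tokens) are from the speech; the sustained
# tokens/sec figures are hypothetical, purely to make the arithmetic concrete.
tiers = {
    # name: (price_per_million_tokens, tokens_per_second_at_full_power)
    "intermediate":     (3.0,   60_000),
    "advanced":         (6.0,   30_000),
    "high-speed":       (45.0,   6_000),
    "ultra-high-speed": (150.0,  1_500),
}

def daily_revenue(power_share_per_tier: float = 0.25) -> float:
    """Revenue per day when factory power is split evenly across the four paid tiers."""
    seconds_per_day = 86_400
    total = 0.0
    for price, tok_per_s in tiers.values():
        tokens = tok_per_s * power_share_per_tier * seconds_per_day
        total += tokens / 1e6 * price
    return total

print(f"${daily_revenue():,.0f} per day")
```

The point of the model is the shape, not the numbers: a factory that can serve the high-priced, latency-sensitive tiers at meaningful throughput earns disproportionately more per watt than one confined to the cheap tiers.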

Vera Rubin: Next-Generation AI Computing Systems

(Playing a video introducing the Vera Rubin system)

Vera Rubin is a complete, end-to-end optimized system designed specifically for agentic workloads:

  • Large-scale language model computation core: NVLink 72 GPU cluster, handling prefill and key-value cache.
  • The all-new Vera CPU: Designed for extremely high single-threaded performance, it uses LPDDR5 memory and boasts excellent energy efficiency. It is the world's only data center CPU using LPDDR5, making it suitable for AI agent tools.
  • Storage System: BlueField 4 + CX 9, a brand-new storage platform for the AI era, with 100% global participation from the storage industry.
  • CPO Spectrum X Switch: The world's first co-packaged optical Ethernet switch, now in full-scale mass production.
  • Kyber Rack: A brand-new rack system that supports 144 GPUs forming a single NVLink domain, with front-end computing and back-end NVLink switching, creating a supercomputer.
  • Rubin Ultra: A next-generation supercomputing node with a vertically integrated design, compatible with Kyber racks, and supporting larger-scale NVLink interconnects.

Vera Rubin is now 100% liquid-cooled, reducing installation time from two days to two hours. It uses 45°C hot water cooling, significantly reducing the cooling burden on data centers. I am very excited that Satya Nadella has confirmed that the first Vera Rubin rack is now running on Microsoft Azure.

Groq Integration: The Ultimate Extension of Inference Performance

We acquired the Groq team and obtained a technology license. Groq is a deterministic dataflow processor that uses static compilation and compiler scheduling, has a large amount of SRAM, is optimized for single inference workloads, and features extremely low latency and extremely high token generation speed.

However, Groq's limited memory capacity (500MB on-chip SRAM) makes it difficult to independently handle the parameters and KV cache of large models, thus limiting its large-scale application.

The solution is Dynamo—an inference scheduling software. We use Dynamo to disaggregate the inference pipeline:

  • The prefilling and attention mechanism decoding are performed on Vera Rubin (requiring significant computing power and KV cache storage).
  • **Feed-Forward Network Decoding**, i.e., the token generation part, is completed on Groq (requiring extremely high bandwidth and low latency).

The two are tightly coupled via Ethernet, and a special low-latency mode roughly halves latency. Under the unified scheduling of Dynamo, the "AI factory operating system," overall performance improves 35-fold, unlocking a level of inference performance previously unreachable even by NVLink 72.
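
The disaggregation idea can be made concrete with a small routing sketch: each pipeline stage is sent to the hardware pool best suited to it. The `Stage` enum and pool names below are illustrative, not the Dynamo API.

```python
# Minimal sketch of disaggregated inference: prefill and attention
# decoding stay on the GPU cluster (compute- and KV-cache-heavy),
# while feed-forward decoding goes to the SRAM-rich dataflow processor
# (bandwidth- and latency-bound token generation).
from enum import Enum, auto

class Stage(Enum):
    PREFILL = auto()           # compute-heavy, builds the KV cache
    ATTENTION_DECODE = auto()  # KV-cache bound
    FFN_DECODE = auto()        # bandwidth/latency-bound token generation

PLACEMENT = {
    Stage.PREFILL: "vera_rubin",
    Stage.ATTENTION_DECODE: "vera_rubin",
    Stage.FFN_DECODE: "groq",
}

def place(stage: Stage) -> str:
    """Return the hardware pool a pipeline stage should run on."""
    return PLACEMENT[stage]

print(place(Stage.FFN_DECODE))  # prints "groq"
```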

Groq and Vera Rubin combination recommendations:

  • If the workload is primarily high throughput, use 100% Vera Rubin.
  • If a large portion of the workload involves generating high-value tokens such as code, Groq can be introduced, with a recommended ratio of approximately 25% Groq + 75% Vera Rubin.
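
The two recommendations above can be encoded as a simple rule of thumb. The 25%/75% split comes from the text; the 25% workload threshold used to trigger it is an assumption for illustration.

```python
# Illustrative helper encoding the mix recommendation above: the share
# of high-value token generation (e.g. code) in a workload decides the
# Groq fraction. The trigger threshold of 0.25 is an assumed cutoff.
def recommended_mix(high_value_fraction: float) -> dict:
    """high_value_fraction: share of tokens that are high-value."""
    if not 0.0 <= high_value_fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    if high_value_fraction >= 0.25:
        return {"groq": 0.25, "vera_rubin": 0.75}
    return {"groq": 0.0, "vera_rubin": 1.0}
```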

The Groq LP30 is manufactured by Samsung and is currently in mass production, with shipments expected to begin in Q3. Thank you to Samsung for their full cooperation.

A historic leap in reasoning performance

Quantifying the preceding advances: within two years, the token generation rate of a 1-gigawatt AI factory will rise from 2 million tokens/second to 700 million tokens/second, a 350-fold increase. This is the power of extreme co-design.

Technology Roadmap

  • Blackwell: Currently in production, Oberon standard rack system, copper cabling extended to NVLink 72, optional optical extension to NVLink 576.
  • Vera Rubin (current): Kyber rack, NVLink 144 (copper cable); Oberon rack, NVLink 72 + optical, extended to NVLink 576; Spectrum 6, the world's first CPO switch.
  • Vera Rubin Ultra (coming soon): The next-generation Rubin Ultra GPU, LP35 chip (first time integrating NVFP4), further boosting performance several times over.
  • Feynman (next-generation): A brand-new GPU, the LP40 chip (jointly developed by NVIDIA and the Groq team, integrating NVFP4); a brand-new CPU—Rosa (Rosalyn); BlueField 5; CX 10; and Kyber racks supporting both copper cabling and CPO expansion.

The roadmap is clear: three routes—copper cable expansion, optical expansion (Scale-Up), and optical expansion (Scale-Out)—are being pursued in parallel. We need all our partners to continuously expand production capacity in copper cables, optical fibers, and CPO.

NVIDIA DSX: A Digital Twin Platform for the AI Factory

AI factories are becoming increasingly complex, yet the many technology suppliers whose components make up a factory have never collaborated during the design phase; they meet for the first time in the data center, which is clearly not enough.

To this end, we created Omniverse, and the NVIDIA DSX platform built upon it—a platform for all partners to co-design and operate gigawatt-scale AI factories in a virtual world. DSX provides:

  • Rack-level mechanical, thermal, electrical, and network simulation
  • Grid connection for coordinated energy-saving dispatch
  • Dynamic power and cooling optimization based on data-center Max-Q

Conservative estimates suggest this system could roughly double energy efficiency, a very significant benefit at the scale we are discussing. Omniverse, starting with Digital Earth, will support digital twins at every scale; we are working with global partners to build the largest computer in human history.

Furthermore, Nvidia is venturing into space. Thor chips have received radiation certification and are operating in satellites. We are working with partners to develop Vera Rubin Space-1 for building space data centers. Thermal management is a key challenge in space, where heat dissipation relies solely on radiation, and we are assembling top engineers to tackle this challenge.

OpenClaw: The Operating System for the Era of Intelligent Agents

Peter Steinberger developed a software called OpenClaw. This is the most popular open-source project in human history, surpassing Linux's thirty-year achievements in just a few weeks.

OpenClaw is essentially an agentic system capable of:

  • Managing resources, accessing tools, file systems, and large language models.
  • Execute scheduling and timed tasks
  • Break down the problem step by step and call the sub-agents.
  • Supports input and output in any modality (voice, video, text, email, etc.).
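
The four capabilities listed above boil down to an agent loop: decompose a task and delegate to sub-agents until a leaf agent can act. The sketch below is a hedged illustration of that pattern; the class and method names are invented and are not OpenClaw's actual API.

```python
# Toy agent loop: an agent with sub-agents decomposes its task and
# delegates; an agent without sub-agents handles the task directly.
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    sub_agents: list = field(default_factory=list)

    def decompose(self, task: str) -> list[str]:
        # Placeholder decomposition: one sub-task per sub-agent.
        return [f"{task} / part {i}" for i, _ in enumerate(self.sub_agents, 1)]

    def run(self, task: str) -> list[str]:
        if not self.sub_agents:           # leaf agent: act directly
            return [f"{self.name} handled: {task}"]
        results = []                       # orchestrator: delegate
        for sub_task, agent in zip(self.decompose(task), self.sub_agents):
            results.extend(agent.run(sub_task))
        return results

root = Agent("root", sub_agents=[Agent("coder"), Agent("emailer")])
print(root.run("ship weekly report"))
```

A real system would add the resource access, scheduling, and multimodal I/O from the list above on top of this core loop.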

By any functional definition of an operating system, that is exactly what it is: an operating system for intelligent agent computers. Windows made the personal computer possible; OpenClaw makes the personal intelligent agent possible.

Every company needs to develop its own OpenClaw strategy, just as we all need Linux strategies, HTML strategies, and Kubernetes strategies.

A complete reshaping of enterprise IT

Before OpenClaw, enterprise IT consisted of data and files entering systems, flowing through tools and workflows, and ultimately becoming tools for human use. Software companies created the tools, while system integrators (GSIs) and consulting firms helped enterprises use them.

Enterprise IT after OpenClaw: Every SaaS company will transform into an AaaS (Agentic as a Service) company—not just providing tools, but providing AI agents that specialize in specific domains.

However, there is a key challenge here: intelligent agents within an enterprise can access sensitive data, execute code, and communicate with external entities. This must be strictly controlled in an enterprise environment.

To this end, we partnered with Peter to integrate security into the enterprise version, resulting in:

  • NeMo Claw (Reference Design): An enterprise-grade reference framework based on OpenClaw, integrating NVIDIA's complete suite of intelligent agent AI toolkits.
  • Open Shield (Security Layer): Integrated into OpenClaw, providing a policy engine, network barriers, and privacy routing to ensure enterprise data security.
  • NeMo Cloud: Downloadable and usable, and compatible with the strategy engines of all SaaS companies.

This is a renaissance for enterprise IT, an industry that was originally worth $2 trillion and is about to grow to a multi-trillion dollar scale, shifting from providing tools to providing specialized AI agent services.

I can fully foresee that in the future, every engineer in the company will have an annual token budget. Their annual salary may be hundreds of thousands of dollars, and I will give them an additional token allocation equivalent to half of their salary, amplifying their output tenfold. "How many tokens are included with joining the company" has become a new recruitment topic in Silicon Valley.
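
The token-budget idea above is simple arithmetic. The half-salary ratio comes from the text, and the $6-per-million price matches the advanced tier listed earlier; the $200,000 salary below is an illustrative figure.

```python
# Back-of-the-envelope token budget for an engineer: half of salary,
# converted to tokens at a fixed price per million tokens.
def annual_token_allocation(salary_usd: float,
                            budget_ratio: float = 0.5,
                            price_per_million: float = 6.0) -> float:
    """Tokens per year an engineer could consume under the budget."""
    budget = salary_usd * budget_ratio
    return budget / price_per_million * 1_000_000

# An engineer on $200,000/year with a half-salary token budget at the
# $6/M advanced-tier price:
tokens = annual_token_allocation(200_000)
print(f"{tokens:,.0f} tokens per year")  # prints "16,666,666,667 tokens per year"
```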

In the future, every enterprise will be both a user of tokens (for engineers) and a producer of tokens (to provide services to its customers). The significance of OpenClaw cannot be underestimated; it is as important as HTML and Linux.

NVIDIA Open Model Initiative

For building custom Claws, we offer NVIDIA's cutting-edge, internally developed models:

Modeling domains include Nemotron (large-scale language model), Cosmos (world foundation model), GROOT (general-purpose humanoid robot model), Alpamayo (autonomous driving), BioNeMo (digital biology), Physics-AI (physics).

We are at the forefront of technology in every field and are committed to continuous iteration: Nemotron 3 was followed by Nemotron 4, Cosmos 1 by Cosmos 2, and GROOT will likewise advance to its second generation.

Nemotron 3 ranks among the top three best models globally in OpenClaw, placing it at the forefront of the field. Nemotron 3 Ultra will become the most powerful foundational model ever, supporting countries in building sovereign AI.

Today, we are announcing the formation of the Nemotron Consortium, investing billions of dollars to advance the development of foundational AI models. Consortium members include BlackForest Labs, Cursor, LangChain, Mistral, Perplexity, Reflection, Sarvam (India), and Thinking Machines (Mira Murati's lab). Enterprise software companies are joining in, integrating the NeMo Claw reference design and NVIDIA's AI Agent Toolkit into their own products.

Physics AI and Robotics

Digital intelligent agents act in the digital world—writing code and analyzing data; while physical AI is an embodied intelligent agent, that is, a robot.

A total of 110 robots were showcased at this year's GTC, encompassing almost all robotics R&D companies worldwide. NVIDIA provided three computers (training computer, simulation computer, and onboard computer) and a complete software stack and AI models.

In the realm of autonomous driving, the "ChatGPT moment" for autonomous driving has arrived. Today, we announce four new partners joining the NVIDIA RoboTaxi Ready platform: BYD, Hyundai, Nissan, and Geely, with a combined annual production capacity of 18 million vehicles. This further strengthens the lineup, joining Mercedes-Benz, Toyota, and GM previously. We also announce a significant partnership with Uber to deploy and integrate RoboTaxi Ready vehicles in multiple cities.

In the field of industrial robots, many robot companies such as ABB, Universal Robots, and KUKA have partnered with us to combine physical AI models with simulation systems, promoting the deployment of robots on manufacturing lines worldwide.

In the telecommunications sector, Caterpillar and T-Mobile are also included. In the future, wireless base stations will no longer be just communication nodes, but rather intelligent edge computing platforms like NVIDIA Aerial AI RAN—capable of sensing traffic in real time, adjusting beamforming, and achieving energy savings and efficiency.

Special Feature: Olaf Robot Makes its Debut

(Plays a demonstration video of the Disney Olaf robot)

Jensen Huang: The snowman has arrived! Newton is working perfectly! Omniverse is also working perfectly! Olaf, how are you?

Olaf: I'm so happy to see you.

Jensen Huang: Yes, because I gave you the computer—Jetson!

Olaf: What is that?

Jensen Huang: It's right inside your belly.

Olaf: That's amazing.

Jensen Huang: You learned to walk in Omniverse.

Olaf: I like walking. It's much better than riding a reindeer and looking up at the beautiful sky.

Jensen Huang: This is precisely because of the physics simulation—based on the Newton solver running on NVIDIA Warp, which we developed in collaboration with Disney and DeepMind, allowing you to adapt to the real physical world.

Olaf: That's exactly what I was going to say.

Jensen Huang: That's where your intelligence lies.

Olaf: I am a snowman, not a snowball.

Jensen Huang: Can you imagine? A Disneyland of the future—all these robot characters roaming freely throughout the park. But to be honest, I thought you'd be taller. I've never seen such a short snowman.

Olaf: (no comment)

Jensen Huang: Can you help me finish my speech today?

Olaf: That's awesome!

Summary of Keynote Speech

Jensen Huang: Today, we discussed the following core themes together:

  1. The arrival of the inflection point: Inference has become the core workload of AI, tokens are the new commodity, and inference performance directly determines revenue.
  2. The AI Factory Era: Data centers have evolved from file storage facilities into token production factories. In the future, every company will measure its competitiveness by "AI factory efficiency."
  3. OpenClaw's Intelligent Agent Revolution: OpenClaw has ushered in the era of intelligent agent computing. Enterprise IT is transitioning from the tool era to the intelligent agent era, and every enterprise needs to develop an OpenClaw strategy.
  4. Physical AI and Robotics: Embodied intelligence is being deployed on a large scale, with autonomous driving, industrial robots, and humanoid robots together constituting the next major opportunity for physical AI.

Thank you everyone, have a great time at GTC!
