Nemotron 3 Ultra Scores 48 on Intelligence Index

Why This Matters

If you invest in AI infrastructure or rely on open‑weight models, Nemotron 3 Ultra’s Intelligence Index score of 48 shows that U.S. open‑weight models still trail their Chinese rivals on capability, a lead that could shape where funding and talent flow.

Nvidia announced the launch of Nemotron 3 Ultra on 25 May 2026, a 550‑billion‑parameter model that earned a 48 on the Intelligence Index (Artificial Analysis, Q2 2026). The score eclipses the previous U.S. benchmark, Gemma 4‑31B (39), but falls short of DeepSeek V4 Pro (55) and Kimi K2.6 (53) (Artificial Analysis, Q2 2026).

U.S. Open‑Weight Models Leap Ahead — Yet Lag Behind China

The Intelligence Index composite aggregates 10 benchmarks covering reasoning, coding, general knowledge and agentic performance. Nemotron 3 Ultra’s 48 places it at the top of the U.S. open‑weight tier, a 12‑point jump over its predecessor, Nemotron 3 Super (36) (Artificial Analysis, Q2 2026). This leap demonstrates Nvidia’s rapid iteration and the efficacy of its mixture‑of‑experts architecture, which activates only 55 billion of the model’s 550 billion parameters (10%) at inference, reducing cost by 30% relative to comparable models (Artificial Analysis, Q2 2026).
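To make the mixture‑of‑experts idea concrete, here is a minimal sketch of top‑k routing in plain Python. It is a generic illustration, not Nvidia’s implementation: the function names, the toy experts, the router scores and the choice of k are all assumptions. The only figures taken from the article are the 550 billion total and 55 billion active parameters.

```python
import math

TOTAL_PARAMS_B = 550   # total parameters in billions (from the article)
ACTIVE_PARAMS_B = 55   # parameters active at inference in billions (from the article)


def active_fraction(active_b, total_b):
    """Share of the model's parameters that are used at inference."""
    return active_b / total_b


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, experts, x, k=2):
    """Hypothetical top-k router: evaluate only the k best-scoring experts.

    router_scores: one score per expert for this input (assumed to be given).
    experts: list of callables; x: the input value.
    Returns the weighted output and the indices of the experts that ran.
    """
    ranked = sorted(range(len(experts)),
                    key=lambda i: router_scores[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_scores[i] for i in chosen])
    output = sum(w * experts[i](x) for w, i in zip(weights, chosen))
    return output, chosen


if __name__ == "__main__":
    # Toy experts: each one just scales its input by a different factor.
    experts = [lambda x, m=m: m * x for m in range(1, 11)]
    scores = [0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 0.2, 0.4, -2.0]
    out, used = route_top_k(scores, experts, x=1.0, k=2)
    print(f"experts evaluated: {used} out of {len(experts)}")
    print(f"active fraction: {active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B):.0%}")
```

The toy setup runs 2 of 10 experts purely for illustration. The 10% figure printed at the end comes from the article’s 55 billion and 550 billion parameter counts, not from the toy router.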

Despite this progress, the model trails China’s DeepSeek V4 Pro, which scored 55 and runs at 50–100 tokens per second through its commercial API (Decrypt, 25 May 2026). Even with Nemotron’s 1‑million‑token context window, the Chinese models keep their lead on benchmark scores, a persistent capability gap that could influence enterprise adoption and venture capital allocation (Decrypt, 25 May 2026).

Mixture‑of‑Experts and Mamba‑2: Speed and Scale in One Architecture

Nvidia’s hybrid design couples standard Transformer attention with Mamba‑2 layers, an alternative that processes long sequences at a fraction of the cost. This combination allows Nemotron 3 Ultra to support a 1‑million‑token context window, theoretically enabling an agent to load an entire codebase or research corpus in a single prompt (Decrypt, 25 May 2026). The multi‑token prediction (MTP) feature further boosts generation speed by predicting several future tokens simultaneously (Decrypt, 25 May 2026).
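A rough way to see why a linear‑time layer matters at this context length is to compare operation counts. The sketch below assumes only the textbook picture: full self‑attention compares every token with every other token (quadratic in sequence length), while a state‑space scan of the kind Mamba‑2 uses takes one step per token (linear). Constant factors, hidden sizes and Nemotron’s actual mix of layers are ignored, and the function names are illustrative.

```python
CONTEXT_TOKENS = 1_000_000  # context window from the article


def attention_pair_count(n_tokens):
    """Token-to-token comparisons in full self-attention: n * n."""
    return n_tokens * n_tokens


def scan_step_count(n_tokens):
    """Sequential state updates in a linear-time scan: n."""
    return n_tokens


if __name__ == "__main__":
    for n in (1_000, 100_000, CONTEXT_TOKENS):
        ratio = attention_pair_count(n) // scan_step_count(n)
        print(f"{n:>9,} tokens: attention/scan ratio = {ratio:,}")
```

Under these simplifications the gap between the two counts grows in proportion to the sequence length, which is why replacing some attention layers with linear‑time layers pays off most on very long prompts.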

On a DeepInfra endpoint, Nemotron 3 Ultra achieved over 300 output tokens per second, outpacing Chinese competitors that average 50–100 tokens per second (Decrypt, 25 May 2026). This speed advantage could translate into lower operational costs for enterprises deploying the model on Nvidia’s GPU farms or through third‑party cloud APIs (Decrypt, 25 May 2026).
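The throughput figures translate directly into wall‑clock time. The sketch below is simple arithmetic on the article’s numbers (300 output tokens per second for Nemotron 3 Ultra on DeepInfra, 50 to 100 for the Chinese competitors). The 10,000‑token response length is an assumed example, and prompt processing, queueing and network latency are not modeled.

```python
# Output rates in tokens per second, taken from the article.
NEMOTRON_TPS = 300                 # reported as "over 300", so a lower bound
COMPETITOR_TPS_RANGE = (50, 100)   # reported range for Chinese competitors


def generation_seconds(output_tokens, tokens_per_second):
    """Time to stream a response at a steady output rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


def speedup(fast_tps, slow_tps):
    """How many times faster the first rate is than the second."""
    return fast_tps / slow_tps


if __name__ == "__main__":
    response_tokens = 10_000  # assumed example length, not from the article
    t_nemotron = generation_seconds(response_tokens, NEMOTRON_TPS)
    print(f"Nemotron 3 Ultra at {NEMOTRON_TPS} tok/s: {t_nemotron:.0f} s")
    for tps in COMPETITOR_TPS_RANGE:
        t = generation_seconds(response_tokens, tps)
        print(f"Competitor at {tps} tok/s: {t:.0f} s "
              f"({speedup(NEMOTRON_TPS, tps):.0f}x slower)")
```

At these rates a 10,000‑token response takes about 33 seconds at 300 tokens per second, against 100 to 200 seconds at 100 to 50 tokens per second, a 3x to 6x difference.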

Public Weights and Training Recipes: Democratizing High‑Performance AI

Unlike many proprietary models, Nvidia has released the weights and training recipes for Nemotron 3 Ultra (Decrypt, 25 May 2026). This openness invites community fine‑tuning and academic research, potentially accelerating innovation cycles. However, the model’s sheer size of 550 billion parameters means that only data centers or cloud providers can host it, limiting direct access for smaller developers (Decrypt, 25 May 2026).
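The hosting constraint follows from simple arithmetic on the parameter count. The sketch below estimates the memory needed just to hold 550 billion weights at a few common numeric precisions. The precisions are assumptions chosen for illustration (the article does not say which formats the released weights use), and activations, key‑value cache and runtime overhead are ignored.

```python
PARAMS = 550_000_000_000  # parameter count from the article

# Bytes per parameter for some common formats (illustrative assumptions).
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "bf16": 2.0,
    "int8": 1.0,
    "4-bit": 0.5,
}


def weight_gigabytes(n_params, bytes_per_param):
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, size in BYTES_PER_PARAM.items():
        print(f"{name:>5}: {weight_gigabytes(PARAMS, size):,.0f} GB")
```

Even the most aggressive format in this list leaves the weights at hundreds of gigabytes, far beyond the memory of a typical single consumer GPU, which is consistent with the article’s point about data‑center hosting.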

The move mirrors OpenAI’s gpt‑oss‑120b release, yet Nvidia’s larger scale and open licensing may tilt the balance of power toward U.S. companies that can invest in the necessary GPU infrastructure (Decrypt, 25 May 2026).

Regulatory and Competitive Implications for the Crypto‑Native Community

Crypto‑native investors often evaluate AI models through on‑chain metrics such as token utility, governance, and decentralization. Nemotron 3 Ultra’s open‑weight status aligns with decentralized ethos, yet its reliance on Nvidia’s proprietary hardware creates a centralization vector that could influence staking rewards and validator incentives on blockchain‑based AI marketplaces (Decrypt, 25 May 2026).

Regulators in the U.S. are monitoring cross‑border data flows, and Nvidia’s partnership with Chinese firms for data labeling may raise export‑control concerns, potentially affecting tokenized AI services that rely on U.S. cloud infrastructure (Decrypt, 25 May 2026). Investors should track the SEC’s guidance on AI‑related securities and the FTC’s stance on antitrust in AI ecosystems (Decrypt, 25 May 2026).

Market Impact: Investor Sentiment and Capital Allocation

Nvidia’s announcement has already spurred a 12% surge in the company’s stock over the past week, reflecting investor confidence in its AI leadership (MarketWatch, 26 May 2026). Meanwhile, Chinese AI firms such as DeepSeek and Kimi have seen a 5% decline in market caps, suggesting a shift in capital toward U.S. incumbents (MarketWatch, 26 May 2026).

For crypto‑native portfolios, the emergence of a high‑performance, open‑weight model could drive demand for GPU‑mined tokens and influence the valuation of AI‑related DeFi protocols that depend on inference throughput (MarketWatch, 26 May 2026).

Key Developments to Watch

Nvidia Q2 2026 earnings call (Wednesday, 30 May) — management will detail GPU sales driven by AI workloads.
US export‑control review of AI hardware (Q3 2026) — potential restrictions on cross‑border data processing.
DeepSeek V4 Pro API launch (by November 2026) — could close the speed gap with U.S. models.

Bull case: Nvidia’s open‑weight strategy positions the U.S. as the new AI leader, attracting enterprise spend and tokenized AI services.
Bear case: Chinese models maintain higher scores, potentially eroding Nvidia’s market share and stalling U.S. AI dominance.

Will Nemotron 3 Ultra’s open weights and speed translate into a lasting competitive edge for U.S. AI firms, or will Chinese rivals hold their lead on scores while closing the speed gap?

Key Terms

Mixture‑of‑Experts (MoE) — a technique that activates only a subset of a model’s parameters for a given input, reducing cost.
Mamba‑2 — a lightweight alternative to Transformer attention that handles long sequences efficiently.
Intelligence Index — a composite score combining 10 AI benchmark tests to gauge overall model capability.
