
NVIDIA AI Releases Star Elastic: One Checkpoint that Contains 30B, 23B, and 12B Reasoning Models with Zero-Shot Slicing



Training a family of large language models (LLMs) has always come with a painful multiplier: every model variant in the family—whether 8B, 30B, or 70B—typically requires its own full training run, its own storage, and its own deployment stack. For a dev team running inference at scale, this means multiplying compute costs by the number of model sizes they want to support. NVIDIA researchers are now proposing a different approach called Star Elastic.

Star Elastic is a post-training method that embeds multiple nested submodels—at different parameter budgets—inside a single parent reasoning model, using a single training run. Applied to Nemotron Nano v3 (a hybrid Mamba–Transformer–MoE model with 30B total parameters and 3.6B active parameters), Star Elastic produces 23B (2.8B active) and 12B (2.0B active) nested variants trained with approximately 160B tokens. All three variants live in one checkpoint and can be extracted without any additional fine-tuning.


What Does “Nested” Actually Mean Here?

If you haven’t encountered elastic or nested architectures before, the idea is this: instead of training three separate 30B, 23B, and 12B models, you train one model that contains the smaller ones as subsets of itself. The smaller submodels reuse the most important weights from the parent, identified through a process called importance estimation.

Star Elastic scores each model component (embedding channels, attention heads, Mamba SSM heads, MoE experts, and FFN channels) by how much it contributes to model accuracy. Components are then ranked and sorted, so smaller-budget submodels always use the highest-ranked contiguous subset of components from the larger model. This property is called nested weight-sharing.
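
To make the nested property concrete, here is a minimal sketch (with made-up importance scores, not the paper’s actual estimation procedure) of how ranking components once yields strictly nested submodels:

python
import torch

def rank_components(importance: torch.Tensor) -> torch.Tensor:
    # Sort component indices once, highest importance first.
    return torch.argsort(importance, descending=True)

# Toy example: importance scores for 8 FFN channels (illustrative values).
importance = torch.tensor([0.9, 0.1, 0.7, 0.3, 0.8, 0.2, 0.6, 0.4])
order = rank_components(importance)

# Nested property: the 4-channel submodel's channels are a strict
# subset of the 6-channel submodel's channels.
small = set(order[:4].tolist())
medium = set(order[:6].tolist())
assert small <= medium  # smaller budgets reuse the larger model's top components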

The method supports nesting along multiple axes: the SSM (State Space Model) dimension, embedding channels, attention heads, Mamba heads and head channels, MoE expert count, and FFN intermediate dimension. For MoE layers specifically, Star Elastic uses Router-Weighted Expert Activation Pruning (REAP), which ranks experts by both routing gate values and expert output magnitudes—a more principled signal than naive frequency-based pruning, which ignores how much each expert actually contributes to the layer output.
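
As a rough illustration of the REAP idea (a sketch of the scoring signal as described above, not NVIDIA’s implementation), an expert’s score can be computed as its router gate weight times the magnitude of its output, averaged over tokens:

python
import torch

def reap_scores(gate_probs: torch.Tensor, expert_outputs: torch.Tensor) -> torch.Tensor:
    # gate_probs: [tokens, num_experts] router probabilities
    # expert_outputs: [tokens, num_experts, hidden] per-expert outputs
    # Weight each expert's output norm by how strongly the router uses it,
    # then average over tokens.
    norms = expert_outputs.norm(dim=-1)        # [tokens, num_experts]
    return (gate_probs * norms).mean(dim=0)    # [num_experts]

tokens, num_experts, hidden = 128, 8, 64
gate_probs = torch.softmax(torch.randn(tokens, num_experts), dim=-1)
expert_outputs = torch.randn(tokens, num_experts, hidden)

scores = reap_scores(gate_probs, expert_outputs)
keep = torch.argsort(scores, descending=True)[:4]  # keep the top-4 experts
print("experts kept:", keep.tolist())

Unlike frequency-based pruning, an expert that is routed to rarely but contributes large outputs still scores well here.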

A Learnable Router, Not a Fixed Compression Recipe

A key distinction from prior compression methods like Minitron is that Star Elastic uses an end-to-end trainable router to determine the nested submodel architectures. The router takes a target budget (e.g., “give me a 2.8B active parameter model”) as a one-hot input and outputs differentiable masks that select which components are active at that budget level. These masks are trained jointly with the model through Gumbel-Softmax, which allows gradient flow through discrete architectural decisions.
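
A minimal sketch of the mechanism (the budget slots, component count, and single linear layer here are illustrative assumptions, not the paper’s exact router):

python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BudgetRouter(nn.Module):
    """Maps a one-hot budget ID to a differentiable keep/drop mask."""
    def __init__(self, num_budgets: int, num_components: int):
        super().__init__()
        self.to_logits = nn.Linear(num_budgets, num_components)

    def forward(self, budget_onehot: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
        logits = self.to_logits(budget_onehot)              # [num_components]
        two_class = torch.stack([logits, -logits], dim=-1)  # keep vs. drop
        # hard=True yields discrete 0/1 masks in the forward pass while
        # gradients flow through the soft relaxation (straight-through).
        return F.gumbel_softmax(two_class, tau=tau, hard=True)[..., 0]

router = BudgetRouter(num_budgets=3, num_components=16)
budget = F.one_hot(torch.tensor(1), num_classes=3).float()  # e.g. the "23B" slot
mask = router(budget)  # 0/1 mask over 16 components, trainable end to end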

The loss function combines a knowledge distillation (KD) term, in which the non-elastified parent model acts as the teacher, with a router loss that penalizes deviation from the target resource budget (parameter count, memory, or latency). This means the router learns to make architecture choices that actually improve accuracy under KD, rather than just minimizing a proxy metric.
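
Schematically, the objective looks like the following (a sketch assuming the budget penalty is a simple squared relative deviation; the paper’s exact formulation may differ):

python
import torch
import torch.nn.functional as F

def elastic_loss(student_logits, teacher_logits, mask, params_per_component,
                 target_params, kd_temp=1.0, beta=1.0):
    # KD term: match the frozen, non-elastified parent (the teacher).
    kd = F.kl_div(
        F.log_softmax(student_logits / kd_temp, dim=-1),
        F.softmax(teacher_logits / kd_temp, dim=-1),
        reduction="batchmean",
    )
    # Router term: penalize deviation of the masked parameter count from
    # the requested budget (could equally target memory or latency).
    active = (mask * params_per_component).sum()
    budget_penalty = (active / target_params - 1.0).pow(2)
    return kd + beta * budget_penalty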

Training uses a two-stage curriculum: a short-context phase (sequence length 8,192 tokens) with uniform budget sampling, followed by an extended-context phase (sequence length 49,152 tokens) with non-uniform sampling that prioritizes the full 30B model (p(30B)=0.5, p(23B)=0.3, p(12B)=0.2). The extended-context phase is critical for reasoning performance. The research team’s ablations on Nano v2, explicitly cited as the empirical basis for the same curriculum choice on Nano v3, show gains of up to 19.8% on AIME-2025 for the 6B variant and 4.0 percentage points for the 12B variant from Stage 2 alone, motivating its use here.
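
In pseudocode, the sampling schedule reduces to something like this (budget names and probabilities taken from the numbers above; everything else is illustrative):

python
import random

# Stage 1: short context, uniform budget sampling.
# Stage 2: extended context, sampling skewed toward the full model.
STAGES = [
    {"seq_len": 8_192,  "budgets": ["30B", "23B", "12B"], "probs": [1/3, 1/3, 1/3]},
    {"seq_len": 49_152, "budgets": ["30B", "23B", "12B"], "probs": [0.5, 0.3, 0.2]},
]

def sample_budget(stage: dict) -> str:
    return random.choices(stage["budgets"], weights=stage["probs"], k=1)[0]

for step in range(4):                  # a few illustrative Stage 2 steps
    budget = sample_budget(STAGES[1])
    print(step, budget)                # the forward pass would use this budget's mask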


Elastic Budget Control: Different Models for Different Reasoning Phases

Existing budget control in reasoning models, including Nemotron Nano v3’s own default behavior, works by capping the number of tokens generated during a <think> phase before forcing a final answer. This approach uses the same model throughout. Star Elastic unlocks a different strategy: using different nested submodels for the thinking phase versus the answering phase.

The researchers evaluated four configurations. The optimal one, called ℳS → ℳL (small model for thinking, large model for answering), allocates a cheaper model to generate extended reasoning traces and reserves the full-capacity model for synthesizing the final answer. The 23B → 30B configuration in particular advances the accuracy–latency Pareto frontier, achieving up to 16% higher accuracy and 1.9× lower latency compared to default Nemotron Nano v3 budget control. The intuition: reasoning tokens are high-volume but tolerant of some capacity reduction; the final answer requires higher precision.
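
As a sketch of what ℳS → ℳL decoding could look like in user code (the two-variant loading and hand-off below are illustrative assumptions, not an official API):

python
def reason_then_answer(small_model, large_model, tokenizer, messages,
                       think_budget=8192, answer_budget=1024):
    prompt_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(small_model.device)

    # Phase 1: the cheaper nested variant generates the high-volume
    # <think> reasoning trace.
    trace_ids = small_model.generate(prompt_ids, max_new_tokens=think_budget)

    # Phase 2: the full-capacity parent continues from the trace and
    # writes the final answer, where precision matters most.
    final_ids = large_model.generate(
        trace_ids.to(large_model.device), max_new_tokens=answer_budget
    )
    return tokenizer.decode(final_ids[0], skip_special_tokens=True)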

Quantization Without Breaking the Nested Structure

A naive approach to deploying a quantized elastic model would be to quantize each variant separately after slicing. That breaks the nested weight-sharing property and requires a separate quantization pass per size. Instead, Star Elastic applies Quantization-Aware Distillation (QAD) directly on the elastic checkpoint, preserving the nested mask hierarchy throughout.

For FP8 (E4M3 format), post-training quantization (PTQ) is sufficient, recovering 98.69% of BF16 accuracy on the 30B variant. For NVFP4 (NVIDIA’s 4-bit floating-point format), PTQ alone causes a 4.12% average accuracy drop, so a short nested QAD phase (~5B tokens at 48K context) brings recovery back to 97.79% for the 30B variant. In both cases, zero-shot slicing of the 23B and 12B variants from the single quantized checkpoint is preserved.
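
The core idea of QAD can be sketched as fake quantization with a straight-through estimator (a simplification: real FP8/NVFP4 flows add per-tensor or per-block scaling, omitted here):

python
import torch

def fake_quant_fp8(w: torch.Tensor) -> torch.Tensor:
    # Round-trip the weights through FP8 E4M3 in the forward pass; the
    # straight-through estimator keeps gradients flowing to the BF16
    # master weights during distillation.
    q = w.to(torch.float8_e4m3fn).to(w.dtype)
    return w + (q - w).detach()

Because the nested masks select which weights participate before fake quantization, the same mask hierarchy survives the quantized training pass.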

The memory implications are significant. Storing separate 12B, 23B, and 30B BF16 checkpoints requires 126.1 GB; the single elastic checkpoint requires 58.9 GB. The 30B NVFP4 elastic checkpoint fits in 18.7 GB, enabling the 12B NVFP4 variant to run on an RTX 5080 where every BF16 configuration runs out of memory. On an RTX Pro 6000, the 12B NVFP4 variant reaches 7,426 tokens/s, a 3.4× throughput improvement over the 30B BF16 baseline.
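
A quick back-of-envelope check makes these numbers plausible (BF16 stores 2 bytes per parameter; exact parameter counts and non-weight tensors account for the small difference from the reported figures):

python
separate = (12e9 + 23e9 + 30e9) * 2 / 1e9   # ~130 GB for three BF16 checkpoints
elastic  = 30e9 * 2 / 1e9                   # ~60 GB for one nested checkpoint
print(f"separate: {separate:.0f} GB, elastic: {elastic:.0f} GB")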

Depth vs. Width: Why Star Elastic Compresses Width

One design choice worth calling out explicitly: the research team compared two compression strategies—removing layers entirely (depth compression) versus reducing internal dimensions like hidden size, expert count, and head count (width compression). With a 15% parameter reduction and 25B tokens of knowledge distillation, width compression recovered 98.1% of baseline performance while depth compression recovered only 95.2%, with noticeable degradation on HumanEval and MMLU-Pro. As a result, Star Elastic prioritizes width-based elasticity for its main results, though depth compression (layer skipping) remains available as a mechanism for extreme latency-constrained scenarios.


On the evaluation suite—AIME-2025, GPQA, LiveCodeBench v5, MMLU-Pro, IFBench, and Tau Bench—the Elastic-30B variant matches its parent Nemotron Nano v3 30B on most benchmarks, while the Elastic-23B and Elastic-12B variants remain competitive against independently trained models of similar sizes. The Elastic-23B notably scores 85.63 on AIME-2025 versus Qwen3-30B-A3B’s 80.00, despite having fewer active parameters.

On training cost, the research team reports a 360× token reduction compared to pretraining each variant from scratch, and a 7× reduction over prior state-of-the-art compression methods that require sequential distillation runs per model size. The 12B variant runs at 2.4× the throughput of the 30B parent on an H100 GPU at bfloat16 with the same input/output sequence lengths.

How to Use NVIDIA Star Elastic


Nemotron Nano v3 Elastic — 30B / 23B / 12B in one checkpoint  ·  BF16 / FP8 / NVFP4

Step 1 (Prerequisites): Install Dependencies

Star Elastic models are distributed via Hugging Face and support both
Transformers (for experimentation) and vLLM
(recommended for production inference). Pick the option that fits your use case.

bash
# Option A — vLLM (recommended for production serving)
pip install vllm

# Option B — Transformers (for local experimentation)
pip install transformers torch accelerate

# Optional: log in to Hugging Face if needed
pip install huggingface_hub
huggingface-cli login



Hardware note: The 30B BF16 checkpoint requires ~60 GB VRAM for the full nested family.
Use FP8 (~31 GB) or NVFP4 (~19 GB) for H100/A100 or RTX-class deployment.

Step 2 (Model Loading): Load the Elastic Checkpoint

A single checkpoint contains all three nested variants — 30B (3.6B active),
23B (2.8B active), and 12B (2.0B active). Load once; extract any variant
without retraining. The model requires trust_remote_code=True for the hybrid
Mamba–Transformer–MoE architecture.

python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# The 30B BF16 elastic checkpoint — contains all 3 nested variants
model_id = "nvidia/NVIDIA-Nemotron-Labs-3-Elastic-30B-A3B-BF16"

tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto"     # distributes across available GPUs
)

print(f"Model loaded: {model_id}")



Active vs. total parameters: “30B total / 3.6B active” means the model stores
30B weights but only routes each token through 3.6B parameters per forward pass — this is how
Mixture-of-Experts (MoE) works.
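
The arithmetic behind that distinction can be illustrated with made-up numbers (this is not the actual Nemotron architecture breakdown):

python
# Hypothetical MoE parameter accounting: total counts every expert,
# active counts only the experts routed per token plus shared weights.
num_experts, top_k = 64, 4
expert_params, shared_params = 400e6, 2.0e9   # illustrative values
total  = num_experts * expert_params + shared_params
active = top_k * expert_params + shared_params
print(f"total: {total/1e9:.1f}B, active: {active/1e9:.1f}B")  # ~27.6B / ~3.6B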

Step 3 (Inference): Run Your First Inference

The model uses a <think> token to generate a reasoning chain before
producing its final answer. Control the total token budget via max_new_tokens
— higher values allow longer reasoning traces on hard problems.

python
messages = [
    {
        "role": "user",
        "content": "What is the time complexity of QuickSort, and why?"
    }
]

# Apply chat template and tokenize
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt"
).to(model.device)

# Generate — model produces <think>...</think> then the final answer
outputs = model.generate(
    **inputs,
    max_new_tokens=4096,    # thinking + answer budget
    temperature=0.6,
    top_p=0.95,
    do_sample=True
)

response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True
)
print(response)



Thinking budget tip: For math/coding problems, set max_new_tokens
to 8192–32768. For simpler queries, 2048–4096 is sufficient and reduces latency.
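
Since the raw output interleaves the reasoning trace and the answer, you may want to separate them. Assuming the trace is wrapped in <think>...</think> tags as described above, a simple split works:

python
def split_reasoning(text: str):
    # Split generated text into (thinking, answer); assumes the
    # <think>...</think> convention described in this step.
    if "</think>" in text:
        thinking, answer = text.split("</think>", 1)
        return thinking.replace("<think>", "").strip(), answer.strip()
    return "", text.strip()

thinking, answer = split_reasoning(response)  # `response` from the code above
print(answer)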

Step 4 (Production Serving): Serve with vLLM

For production deployments, use vLLM to serve the model via an
OpenAI-compatible REST API. This enables batched inference, continuous batching,
and higher throughput — the 12B variant achieves 2.4× the throughput
of the 30B parent on an H100 GPU.

bash
# Start the vLLM server (OpenAI-compatible)
vllm serve "nvidia/NVIDIA-Nemotron-Labs-3-Elastic-30B-A3B-BF16"

# --- In a separate terminal ---

# Query the server via curl
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "nvidia/NVIDIA-Nemotron-Labs-3-Elastic-30B-A3B-BF16",
    "messages": [
      {
        "role": "user",
        "content": "Explain gradient descent in 3 steps."
      }
    ],
    "max_tokens": 4096,
    "temperature": 0.6
  }'

# Or run via Docker
docker model run hf.co/nvidia/NVIDIA-Nemotron-Labs-3-Elastic-30B-A3B-BF16



SGLang alternative: SGLang is also supported —
run python3 -m sglang.launch_server --model-path "nvidia/NVIDIA-Nemotron-Labs-3-Elastic-30B-A3B-BF16" --port 30000
for a drop-in alternative to vLLM.

Step 5 (Precision Selection): Choose Your Precision Variant

Three quantized checkpoints are available. All preserve the nested structure
— the 23B and 12B submodels can be extracted zero-shot from whichever precision checkpoint
you load. NVFP4 uses Quantization-Aware Distillation (QAD) to recover accuracy lost from PTQ.

bash
# BF16 — full precision, all nested variants in 58.9 GB
vllm serve "nvidia/NVIDIA-Nemotron-Labs-3-Elastic-30B-A3B-BF16"

# FP8 (E4M3) — ~2x smaller, 30B fits in 31.4 GB
# Post-training quantization, 98.69% accuracy recovery on 30B
vllm serve "nvidia/NVIDIA-Nemotron-Labs-3-Elastic-30B-A3B-FP8"

# NVFP4 — smallest footprint, 30B fits in 18.7 GB
# 12B NVFP4 variant runs on RTX 5080 (BF16 OOMs)
# 12B NVFP4 on RTX Pro 6000: 7,426 tokens/s (3.4x vs 30B BF16)
vllm serve "nvidia/NVIDIA-Nemotron-Labs-3-Elastic-30B-A3B-NVFP4"

Variant    | 30B memory | 23B memory | 12B memory | Best for
BF16 Full  | 58.9 GB    | 44.0 GB    | 23.2 GB    | A100 / H100
FP8 PTQ    | 31.4 GB    | 23.7 GB    | 13.0 GB    | H100 / A100 / RTX 5090
NVFP4 QAD  | 18.7 GB    | 14.1 GB    | 8.0 GB     | RTX 5080 / 5090 / Pro 6000


Key Takeaways

  • Star Elastic trains 30B, 23B, and 12B nested reasoning models from a single 160B-token post-training run, achieving a 360× token reduction over pretraining from scratch.
  • Elastic budget control (23B for thinking, 30B for answering) advances the accuracy–latency Pareto frontier, with up to 16% higher accuracy and 1.9× lower latency.
  • A learnable router with Gumbel-Softmax enables end-to-end trainable architecture selection, eliminating the need for separate compression runs per model size.
  • Nested QAD preserves zero-shot slicing across FP8 and NVFP4 quantized checkpoints, reducing the 30B elastic checkpoint to 18.7 GB in NVFP4.
  • All three precision variants (BF16, FP8, NVFP4) are publicly available on Hugging Face under nvidia/NVIDIA-Nemotron-Labs-3-Elastic-30B-A3B.

Check out the Paper and the Elastic Models on Hugging Face (BF16, FP8, and NVFP4).



‘No longer a market outlier’ – How Bitcoin’s volatility has changed the game 

Bitcoin is no longer behaving like a market outlier



Has the volatility profile of Bitcoin been permanently altered by institutional adoption?

EU approves new sanctions on Israeli settlers over West Bank violence





The European Union on Monday approved new sanctions on Israeli settlers “guilty of supporting the extremist and violent colonisation of the West Bank”, French Foreign Minister Jean-Noël Barrot said on social media. Israel called the sanctions “arbitrary and political” and vowed to stand “for the right of Jews” to settle in the West Bank.

Tom Lee Doubles Down on ‘Crypto Spring’ Theory but Bitmine Slows ETH Accumulation




Bitmine Immersion Technologies has slowed its pace of ETH accumulation, as it remains well within its timeframe to reach the 5% supply target.

Nevertheless, its chairman remains highly bullish on crypto and Ethereum in particular, predicting the end of the bear market and the beginning of crypto spring.

The new press release from the firm shows that its total ETH holdings have risen to 5.21 million tokens from 5.18 million last week. This means that the firm has bought roughly 30,000 coins in the past week, which is a substantial decline from the over 100,000 in the previous few accumulation announcements.

The reason for this, according to chair Tom Lee, is that the previous pace of over 100,000 ETH per week “would have us reach 5% by mid-July.” He is referring to the percentage of the asset’s total supply owned by the company he runs, which now stands at around 4.3%. The company’s actual goal is to hit the coveted 5% in late 2026.

The declining buying efforts don’t mean that Lee and Bitmine are not as bullish on ETH as they were before; just the opposite.

“‘Crypto spring’ has commenced, and we wanted to highlight the importance of owning ETH as a source of diversification, and the likely drivers of this coming ‘crypto bull’ cycle. If ETH closes above $2,100 at the end of May 2026, this would be the third consecutive monthly gain – this has never been seen in a crypto bear market. Thus, a close above $2,100 would validate [that] ‘crypto spring’ has arrived.”

The company has accumulated over a million ETH since the start of 2026. In addition, its portfolio consists of 201 BTC, a $200 million stake in Beast Industries, an $88 million stake in Eightco Holdings, and total cash of $775 million.

It’s still the second-largest corporate holder of any cryptocurrency, trailing only Strategy, which increased its BTC holdings again today.


Li Keqiang Meets US Senators: China Pushes US Cooperation Over Confrontation



The meeting between Chinese Premier Li Keqiang and a delegation of US senators led by Steve Daines in Beijing on Thursday, May 7, 2026, represents a highly significant strategic diplomatic move, especially given the upcoming summit between Chinese President Xi Jinping and his US counterpart Donald Trump. This meeting reflects Beijing’s desire to manage tensions […]


ZachXBT Dives Deeper into $LAB Token Scandal: Offers Bounty Following Market Manipulation Allegations



The cryptocurrency space has come under fire once more as the token $LAB is investigated after a massive 3,700% price spike in a single month.

Rapid appreciation alone is usually enough to attract speculation; in this case, however, the fundamentals paint an entirely different and much more damning picture.


Reports indicate an abnormally centralized token distribution, with approximately 95% of the token supply sitting with the team and insiders. Such a level of concentration immediately raises concerns over possible price manipulation: when supply is this unequally distributed, the potential for coordination rises sharply, especially when combined with sudden price spikes.

The case has escalated to the point that on-chain investigator ZachXBT has put up a $10,000 bounty for legitimate tips that help identify those behind the actions.


The move highlights rising frustration in the crypto community over what many suspect is a planned effort to game the markets.

On-Chain Data Implies Insider-Initiated Liquidity Movements

More granular analysis ties the abnormal wallet behavior directly to the LAB team. Several reports reveal that wallets associated with insiders are among the main drivers of liquidity supply and price dynamics.

One of the most eye-catching transactions came on April 8, when a team-linked wallet tagged 0xe037 deposited 40 million LAB tokens, worth around $13.6 million, to the centralized exchange Bitget. The sheer size and timing of this transfer raised alarm bells straight away.

Digging deeper, only days before the May 1 price surge, more team-affiliated wallets transferred 96 million LAB tokens worth $63 million to Bitget. Large transfers just prior to a major price pump are indicative of market manipulation by pre-positioning.

The patterns closely resemble old-school pump-and-dump schemes, in which insiders build or hoard a massive share of supply, manufacture demand, and then dump their bags onto retail investors at the height of the hype.

Rumors Of Market Manipulation Go Beyond Just One Token

The LAB incident is not an isolated case. According to the report from Specter Analyst, it appears CEXs might be enabling these cycles either accidentally or intentionally by providing liquidity and trading infrastructure without sufficient scrutiny.

For example, in the LAB case, Bitget has been mentioned as a main venue for trading volume. Critics argue that exchanges profit from fatter volumes via fees, and hence have little incentive to rigorously police suspicious behaviour.

These worries reach beyond Bitget, with high-volume exchanges for perpetual futures markets possibly allowing speculative manipulation.

Community Frustration Rises as Transparency Lags

Anger among members of the cryptocurrency community over the stark lack of clarity is rising. Bitget CEO Gracy previously confirmed “an investigation that is still ongoing” into related matters, such as the RAVE case, but weeks later no further comments have been made public.

This silence, in turn, breeds uncertainty among investors and analysts. Trust in centralized exchanges erodes without uniform disclosures or transparent enforcement of those disclosures. The absence of accountability raises an existential question for various stakeholders, namely, who is responsible for ensuring market integrity?

ZachXBT had urged the company heads to address it publicly as well, writing that this kind of transparency was essential and should also be followed up with real action.

The continued lack of immediate updates not only sullies the exchange’s reputation but also increases uncertainty throughout the market.

Coordinated Strategy Behind Aggressive On-Chain Activity

In addition to exchange deposits, on-chain data indicates aggressive accumulation and distribution of LAB tokens. Some wallets have purchased LAB on-chain and moved it onto exchanges like Gate and Bitget.

Of particular interest, these wallets also exhibit activity in other tokens currently trading up over 1,000% following more than 30 days of similar movement. These cross-token movements hint at organized behavior across multiple projects.

Patterns like these point to either organized trading groups, possibly aided by automated tooling, or insiders trying to liquidate as many assets as possible down the line. These actors rotate capital through tokens, pumping one title after another to generate strongly correlated hype cycles while extracting value from retail users.

Because of that repetitiveness, this becomes a systemic problem rather than a series of isolated events.

As the LAB episode illustrates, the consequences can reach far beyond a single token or exchange. Every manipulation case that goes undetected damages the credibility of the entire cryptocurrency sector.

Retail participants, who often enter the market with limited data and modelling tools, are the most susceptible. Sudden price pumps and dumps not only inflict massive losses but also push participants out of the game, reducing confidence in the long run.

This leads to a growing tension between revenue generation and market integrity. Exchanges gain fees from high-volume trading, but if those volumes are more manipulation than commerce, the long-term harm could be extensive.

Community-Led Accountability Shines Through as Bounty Signals Change

When institutions fail to respond in a timely manner, the community steps up. The $10,000 bounty ZachXBT offers is a prime example of an emerging trend in which independent investigators and analysts operate as unofficial watchdogs of the crypto ecosystem.

The bounty pledges a reward for whistleblowers and sources of information, all in the name of tracking down those responsible for running the LAB operation. It is part of a larger trend toward decentralised accountability, where community-driven transparency takes precedence over institutional authority.

However, these attempts also reveal the contradictions of the present order: ideally, exchanges and regulators would take the initiative to detect and prevent manipulation, eliminating the reliance on third parties.

Market Integrity – A Critical Tipping Point

The new LAB scandal comes at an important time in the history of the crypto industry. With increased adoption comes the need for comprehensive safeguards that protect participants and create a level playing field in the market.

Erosion of confidence in the ecosystem is likely to continue until platforms make an effort to investigate suspicious activities, expose coordinated manipulation and provide updates with transparency. In contrast, strong and transparent action could help rebuild trust and establish clear standards of accountability.

For the moment, focus is trained on the ongoing investigation, and whether the industry can rise to meet one of its biggest challenges.

Disclosure: This is not trading or investment advice. Always do your research before buying any cryptocurrency or investing in any services.




DraftKings turns profitable as prediction markets become key growth strategy


DraftKings entered 2026 with the kind of earnings report investors have long been waiting for, signaling that years of aggressive expansion are beginning to translate into sustained profitability.

The Boston-based sportsbook and online gaming operator reported $1.65 billion in first-quarter revenue on Thursday, a 17% increase from the same period last year. Improved sportsbook margins and steady customer engagement helped the company post a $21 million profit, reversing a loss recorded a year earlier.

The results suggest DraftKings is moving beyond the costly customer-acquisition phase that defined the early years of legalized sports betting expansion across the United States. Company executives said the core sportsbook business is now generating enough cash flow to fund newer initiatives without undermining profitability.

“We are off to a fantastic start to the year as our first quarter results exceeded our expectations,” Chief Executive Officer Jason Robins said in the company’s earnings release. “Our core business is strong, and profitability is inflecting. That gives us the firepower to press our advantage in Predictions.”

Prediction markets emerge as a major growth focus

Robins said prediction-based products are becoming an increasingly important part of DraftKings’ long-term strategy. The category allows users to trade on the outcomes of sports, political, and entertainment events and has gained traction across the broader online wagering industry.

According to Robins, DraftKings plans to combine sportsbook technology, exchange systems, and betting products to “establish a leadership position in Sports Predictions before year-end.”

The initiative aligns with the company’s broader ambition to create a “super app” ecosystem that integrates sports betting, online casino gaming, media, and predictive trading products within a single platform. Executives believe a wider digital entertainment ecosystem can increase customer engagement, boost spending, and reduce churn.

Growth shifts from customer acquisition to customer value

Despite strong revenue growth, DraftKings showed signs of relying less on rapid user acquisition and more on generating higher spending from existing customers.

Monthly unique payers declined year over year, largely due to the company’s 2025 exit from the Texas lottery business. Excluding that operation, customer growth remained modestly positive.

At the same time, average revenue per user increased significantly during the quarter as sportsbook margins improved and engagement across online casino and sports betting products remained strong. Sportsbook revenue also outpaced total betting handle growth, indicating the company retained more profit from wagers placed.

Investment spending remains elevated

DraftKings continues to invest aggressively in key strategic areas even as profitability improves. Sales and marketing expenses exceeded $400 million during the quarter, while spending on software development and legislative lobbying also remained elevated.

International expansion remains part of the company’s broader growth plans. DraftKings already operates in Ontario and could eventually benefit from Alberta’s decision to approve 28 operators ahead of launching a regulated online gambling market in Canada’s second-largest province.

Chief Financial Officer Alan Ellingson said the company believes it can continue balancing expansion with stronger earnings performance.

“The business continues to scale efficiently as we grow revenue, expand profitability, and invest in high-return opportunities,” Ellingson said.



Nintendo has apparently blocked a workaround for watching YouTube on the Switch 2





Switch 2 users found a clever trick to watch YouTube through a hidden browser. Nintendo blocked it within hours, even when there is still no official streaming app nearly a year after launch.

Canton’s $6T RWA rails and Lighter’s Hyperliquid multiple




DTCC moves DTC-custodied Treasuries onchain via Canton, while Lighter’s LIT launches trading at a fees multiple in Hyperliquid territory

Just One Lake on Earth Is Over a Mile Deep



Ranked: The World’s Deepest Lakes by Depth


Key Takeaways

  • Only one lake—Baikal—exceeds a mile in depth.
  • A steep 1,460-foot drop separates the second and third deepest lakes.
  • Lake Vostok ranks fourth despite being buried under 13,000 feet of Antarctic ice.

Lake Baikal plunges to 5,387 feet—making it the only lake on Earth more than a mile deep.

While Lake Tanganyika comes close, a sharp drop follows. The third-ranked Caspian Sea is over 1,400 feet shallower, highlighting how rare extreme lake depth really is.

This visualization ranks the world’s deepest lakes by maximum depth in feet and meters, based on data from WorldAtlas.

Lake Baikal and Lake Tanganyika Stand Apart

As the world’s oldest lake at 25–30 million years old, Lake Baikal in Russia also ranks as the deepest lake, reaching a maximum depth of 5,387 feet. That makes it 564 feet deeper than Lake Tanganyika, which comes in second at 4,823 feet.

The striking depth of these two ancient lakes is attributable to their status as rift lakes, which only occur in tectonically active regions.

Rank | Lake                 | Location                                                              | Max depth (ft) | Max depth (m)
1    | Baikal               | 🇷🇺 Russia                                                             | 5,387          | 1,642
2    | Tanganyika           | 🇹🇿 Tanzania, 🇨🇩 DRC, 🇧🇮 Burundi, 🇿🇲 Zambia                             | 4,823          | 1,470
3    | Caspian Sea          | 🇮🇷 Iran, 🇷🇺 Russia, 🇹🇲 Turkmenistan, 🇰🇿 Kazakhstan, 🇦🇿 Azerbaijan      | 3,363          | 1,025
4    | Vostok               | 🇦🇶 Antarctica                                                         | 3,300          | 1,000
5    | O’Higgins-San Martín | 🇨🇱 Chile, 🇦🇷 Argentina                                                | 2,742          | 836
6    | Malawi/Nyasa/Niassa  | 🇲🇿 Mozambique, 🇲🇼 Malawi, 🇹🇿 Tanzania                                  | 2,316          | 706
7    | Issyk Kul            | 🇰🇬 Kyrgyzstan                                                         | 2,192          | 668
8    | Great Slave          | 🇨🇦 Canada                                                             | 2,015          | 614
9    | Crater               | 🇺🇸 United States                                                      | 1,949          | 594
10   | Matano               | 🇮🇩 Indonesia                                                          | 1,936          | 590

Most lakes were formed by glaciers: slow-moving masses of ice carved out depressions in the landscape.

However, rift lakes like Baikal and Tanganyika occur where the planet’s crust has stretched, cracked, or shifted over millions of years to create deep basins that slowly filled with water. Due to their extreme depth, these lakes contain globally significant volumes of fresh surface water. Lake Baikal alone holds roughly 20% of the world’s unfrozen surface freshwater, underscoring how depth translates into global significance.

Other lakes in the ranking stand out because of their unusual characteristics, from Antarctica’s Lake Vostok, hidden around 13,100 feet under ice in total darkness, to Crater Lake in the United States, which sits in a volcanic crater.

A Sharp Drop Off in Depth

The Caspian Sea, whose brackish water is roughly a third as salty as seawater, ranks as the third deepest lake at 3,363 feet. It’s also the world’s largest lake by surface area.

A major gap of 1,460 feet—the largest in the data set—separates the Caspian Sea from second-ranked Lake Tanganyika.

After this, the decline becomes much more gradual, with a similar-sized total gap (1,427 ft) between the third- and 10th-place lakes in the ranking.

Deep Lakes Span Every Corner of the Globe

The top 10 deepest lakes span a wide geographic range. They include lakes in Russia, Central Asia, North America, South America, Africa, Antarctica, and Southeast Asia.

Several of the world’s deepest lakes even cross political borders.

Lake Tanganyika touches Tanzania, the Democratic Republic of the Congo, Burundi, and Zambia, while the Caspian Sea borders five countries. O’Higgins-San Martín Lake is shared by Chile and Argentina, and Lake Malawi/Nyasa/Niassa is linked to Mozambique, Malawi, and Tanzania.
