NVIDIA AI researchers recently released cuda-oxide, an experimental compiler that allows developers to write CUDA SIMT (Single Instruction, Multiple Threads) GPU kernels in standard Rust code. The project compiles Rust directly to PTX (Parallel Thread Execution) — the assembly-like intermediate representation that CUDA uses to target NVIDIA GPUs — without requiring domain-specific languages, foreign function interface bindings, or C/C++ code.
How This Makes a Change
Writing GPU kernels today typically means writing C++ and using the CUDA programming model directly, or relying on Python-level abstractions like Triton that generate CUDA under the hood. The Rust GPU ecosystem has had projects attempting to bridge this gap — Rust-GPU targets SPIR-V for Vulkan/graphics compute, rust-cuda uses a rustc codegen backend targeting NVVM IR, CubeCL uses an embedded DSL with a JIT runtime that cross-compiles to CUDA/ROCm/WGPU, and std::offload uses LLVM’s implicit offload path.
cuda-oxide occupies a specific position in this space. Its stated design center is “bringing CUDA into Rust” — kernel authoring, device intrinsics, the SIMT execution model, and the CUDA programming model expressed natively in safe Rust — closer in spirit to writing a __global__ function in C++ than to writing a generic Rust function that happens to run on a GPU. By contrast, the closest neighbor, rust-cuda, focuses on “bringing Rust to NVIDIA GPUs”: Rust ergonomics like async/.await, parts of the standard library running on-device, and a Rust-first programming model that abstracts over CUDA concepts. The NVlabs team notes it has been coordinating with rust-cuda maintainers and considers the two projects complementary.
The Compilation Pipeline
At the core of cuda-oxide is a custom rustc codegen backend — the layer in the Rust compiler responsible for generating machine code. Instead of emitting native CPU code, the rustc-codegen-cuda crate intercepts the compiler at the CodegenBackend::codegen_crate() entry point and runs a separate pipeline for device code:
Rust Source → rustc frontend → rustc_public (Stable MIR) → dialect-mir → mem2reg → dialect-llvm → LLVM IR (.ll) → PTX (.ptx)
Here are some important elements:
Why rustc_public? The raw internal MIR representation in rustc changes between nightly versions with no stability guarantees. cuda-oxide uses rustc_public — also known as Stable MIR — which is Rust’s official versioned, stable API over the compiler’s internals. This lets the backend read MIR without breaking on every nightly update.
What is Pliron? The middle stages use Pliron, a Rust-native MLIR-like IR framework written entirely in Rust. Choosing Pliron instead of upstream MLIR means the entire compiler builds with cargo — no C++ toolchain, no CMake, no tablegen. cuda-oxide defines three custom Pliron dialects: dialect-mir (modeling Rust MIR semantics — places, projections, rvalues, terminators), dialect-llvm (modeling LLVM IR with textual .ll export), and dialect-nvvm (NVIDIA GPU intrinsics like thread indexing, barriers, and TMA).
What does llc do? After the dialect-llvm printer serializes the IR into a textual .ll file, the external llc binary (the LLVM static compiler with NVPTX backend) compiles it to PTX assembly. This is the one stage outside pure Rust. The resulting .ptx file is written next to the host binary — for example, target/debug/vecadd.ptx — and loaded by the CUDA driver at runtime.
You as a developer can observe each stage with:
cargo oxide pipeline vecaddThis prints the full trace from Rust MIR through each dialect down to PTX output.
Single-Source Compilation and the Host/Device Split
Host and device code live in the same .rs source file. cargo oxide sets -Z codegen-backend=librustc_codegen_cuda.so, which routes code generation through cuda-oxide’s backend. The backend then scans compiled code for monomorphized functions whose names carry the reserved cuda_oxide_kernel_<hash>_<name> prefix — the namespace that the #[kernel] proc macro creates. Functions matching that prefix go through the cuda-oxide pipeline to produce PTX; all other host code is delegated to rustc’s standard LLVM backend. The result of a single cargo oxide build is a host binary plus a .ptx file.
cargo oxide run vecadd
cargo oxide debug vecadd --tui # debug with cuda-gdbDevice code from library dependencies is compiled lazily: the backend reads their Stable MIR from .rlib metadata on demand, only compiling functions a kernel actually calls.
What You Can Write in a Kernel
cuda-oxide supports a meaningful subset of Rust in GPU kernel functions, marked with the #[kernel] attribute macro. This includes:
- Generic functions with monomorphization —
fn scale<T: Copy>(...)is compiled to a concrete PTX kernel per type used at the call site. - Closures with captures — closures passed from the host are scalarized and passed as PTX kernel parameters automatically.
- User-defined structs and enums — standard Rust data structures work inside kernels.
- Pattern matching —
match,if let, and related constructs work in device code. - Full GPU intrinsics — the
cuda-devicecrate provides wrappers for thread indexing, warp operations (shfl_sync,ballot_sync, etc.), shared memory, barriers, TMA (Tensor Memory Accelerator), Thread Block Clusters, and scoped atomics (6 types × 3 scopes × 5 orderings).
One important GPU-specific compiler detail: rustc’s JumpThreading MIR optimization — which duplicates function calls into both branches of an if-statement — is disabled for device code in cuda-oxide. On CPUs this is a safe optimization, but on GPUs it breaks barrier semantics: all threads in a block must converge at the same bar.sync instruction, and duplicating it across branches violates that requirement. Additionally, sync primitives are marked convergent in the emitted LLVM IR so that LLVM’s optimization passes cannot move or duplicate them across control flow.
How to Use NVIDIA Star Elastic
/* ============================================================
cuda-oxide Step-by-Step Guide — WordPress Embed
Scoped to #coxide-guide — safe for WP injection
============================================================ */
#coxide-guide *,
#coxide-guide *::before,
#coxide-guide *::after {
box-sizing: border-box !important;
margin: 0 !important;
padding: 0 !important;
}
/* WPautop suppression */
#coxide-guide hr,
#coxide-guide p:empty,
#coxide-guide del,
#coxide-guide s {
display: none !important;
}
@import url(‘
#coxide-guide {
background: #0d0d0d !important;
border: 1px solid #1e1e1e !important;
border-radius: 12px !important;
font-family: ‘IBM Plex Sans’, sans-serif !important;
color: #e2e2e2 !important;
max-width: 860px !important;
margin: 32px auto !important;
overflow: hidden !important;
box-shadow: 0 0 0 1px #1a1a1a, 0 24px 80px rgba(0,0,0,0.6) !important;
}
/* ── Header ── */
#coxide-guide .cog-header {
background: #111111 !important;
border-bottom: 1px solid #222 !important;
padding: 24px 32px !important;
display: flex !important;
align-items: center !important;
gap: 16px !important;
}
#coxide-guide .cog-badge {
background: #76B900 !important;
color: #000 !important;
font-family: ‘JetBrains Mono’, monospace !important;
font-size: 10px !important;
font-weight: 700 !important;
letter-spacing: 1.5px !important;
text-transform: uppercase !important;
padding: 4px 10px !important;
border-radius: 4px !important;
flex-shrink: 0 !important;
}
#coxide-guide .cog-title {
font-size: 15px !important;
font-weight: 600 !important;
color: #f0f0f0 !important;
letter-spacing: -0.2px !important;
}
#coxide-guide .cog-subtitle {
font-size: 12px !important;
color: #666 !important;
margin-top: 2px !important;
font-family: ‘JetBrains Mono’, monospace !important;
}
/* ── Progress bar ── */
#coxide-guide .cog-progress-bar {
background: #1a1a1a !important;
height: 3px !important;
width: 100% !important;
}
#coxide-guide .cog-progress-fill {
height: 3px !important;
background: #76B900 !important;
transition: width 0.4s ease !important;
}
/* ── Step nav dots ── */
#coxide-guide .cog-dots {
display: flex !important;
align-items: center !important;
gap: 8px !important;
padding: 16px 32px !important;
background: #111 !important;
border-bottom: 1px solid #1e1e1e !important;
flex-wrap: wrap !important;
}
#coxide-guide .cog-dot {
width: 28px !important;
height: 28px !important;
border-radius: 50% !important;
border: 1.5px solid #333 !important;
background: transparent !important;
font-family: ‘JetBrains Mono’, monospace !important;
font-size: 10px !important;
color: #555 !important;
cursor: pointer !important;
display: flex !important;
align-items: center !important;
justify-content: center !important;
transition: all 0.2s !important;
flex-shrink: 0 !important;
}
#coxide-guide .cog-dot:hover {
border-color: #76B900 !important;
color: #76B900 !important;
}
#coxide-guide .cog-dot.active {
background: #76B900 !important;
border-color: #76B900 !important;
color: #000 !important;
font-weight: 700 !important;
}
#coxide-guide .cog-dot.done {
background: #1a2e00 !important;
border-color: #3a5c00 !important;
color: #76B900 !important;
}
#coxide-guide .cog-dot-line {
flex: 1 !important;
height: 1px !important;
background: #222 !important;
min-width: 8px !important;
}
/* ── Slide panel ── */
#coxide-guide .cog-slides {
position: relative !important;
overflow: hidden !important;
min-height: 380px !important;
}
#coxide-guide .cog-slide {
display: none !important;
padding: 32px !important;
animation: cog-fadein 0.3s ease !important;
}
#coxide-guide .cog-slide.active {
display: block !important;
}
@keyframes cog-fadein {
from { opacity: 0; transform: translateY(6px); }
to { opacity: 1; transform: translateY(0); }
}
#coxide-guide .cog-step-label {
font-family: ‘JetBrains Mono’, monospace !important;
font-size: 10px !important;
letter-spacing: 2px !important;
text-transform: uppercase !important;
color: #76B900 !important;
margin-bottom: 8px !important;
display: block !important;
}
#coxide-guide .cog-slide h2 {
font-size: 20px !important;
font-weight: 700 !important;
color: #f5f5f5 !important;
margin-bottom: 16px !important;
line-height: 1.3 !important;
letter-spacing: -0.3px !important;
}
#coxide-guide .cog-slide p {
font-size: 14px !important;
line-height: 1.75 !important;
color: #aaa !important;
margin-bottom: 16px !important;
}
#coxide-guide .cog-slide p strong {
color: #ddd !important;
font-weight: 600 !important;
}
#coxide-guide .cog-slide ul,
#coxide-guide .cog-slide ol {
padding-left: 20px !important;
margin-bottom: 16px !important;
}
#coxide-guide .cog-slide li {
font-size: 14px !important;
color: #aaa !important;
line-height: 1.75 !important;
margin-bottom: 4px !important;
}
#coxide-guide .cog-slide li strong {
color: #ddd !important;
}
/* ── Code blocks ── */
#coxide-guide pre {
background: #0a0a0a !important;
border: 1px solid #222 !important;
border-left: 3px solid #76B900 !important;
border-radius: 6px !important;
padding: 16px 20px !important;
margin: 16px 0 !important;
overflow-x: auto !important;
white-space: pre !important;
}
#coxide-guide code {
font-family: ‘JetBrains Mono’, monospace !important;
font-size: 12.5px !important;
line-height: 1.7 !important;
color: #d4d4d4 !important;
background: transparent !important;
border: none !important;
}
#coxide-guide p code,
#coxide-guide li code {
background: #1a1a1a !important;
border: 1px solid #2a2a2a !important;
border-radius: 3px !important;
padding: 1px 6px !important;
font-size: 12px !important;
color: #76B900 !important;
}
/* syntax colors */
#coxide-guide .kw { color: #569cd6 !important; }
#coxide-guide .fn { color: #dcdcaa !important; }
#coxide-guide .str { color: #ce9178 !important; }
#coxide-guide .cm { color: #6a9955 !important; }
#coxide-guide .num { color: #b5cea8 !important; }
#coxide-guide .at { color: #76B900 !important; }
#coxide-guide .ty { color: #4ec9b0 !important; }
#coxide-guide .mac { color: #c586c0 !important; }
/* ── Info boxes ── */
#coxide-guide .cog-note {
background: #0f1a00 !important;
border: 1px solid #2a4000 !important;
border-left: 3px solid #76B900 !important;
border-radius: 6px !important;
padding: 12px 16px !important;
margin: 16px 0 !important;
font-size: 13px !important;
color: #8ab800 !important;
line-height: 1.6 !important;
}
#coxide-guide .cog-warn {
background: #1a1000 !important;
border: 1px solid #403000 !important;
border-left: 3px solid #e6a817 !important;
border-radius: 6px !important;
padding: 12px 16px !important;
margin: 16px 0 !important;
font-size: 13px !important;
color: #c9932a !important;
line-height: 1.6 !important;
}
#coxide-guide .cog-note strong,
#coxide-guide .cog-warn strong {
display: block !important;
margin-bottom: 4px !important;
font-size: 11px !important;
letter-spacing: 1px !important;
text-transform: uppercase !important;
font-family: ‘JetBrains Mono’, monospace !important;
}
/* ── Requirement pills ── */
#coxide-guide .cog-pills {
display: flex !important;
flex-wrap: wrap !important;
gap: 8px !important;
margin: 16px 0 !important;
}
#coxide-guide .cog-pill {
background: #111 !important;
border: 1px solid #2a2a2a !important;
border-radius: 20px !important;
padding: 6px 14px !important;
font-size: 12px !important;
font-family: ‘JetBrains Mono’, monospace !important;
color: #888 !important;
display: flex !important;
align-items: center !important;
gap: 6px !important;
}
#coxide-guide .cog-pill .dot {
width: 6px !important;
height: 6px !important;
border-radius: 50% !important;
background: #76B900 !important;
flex-shrink: 0 !important;
}
/* ── Navigation footer ── */
#coxide-guide .cog-footer {
background: #111 !important;
border-top: 1px solid #1e1e1e !important;
padding: 16px 32px !important;
display: flex !important;
align-items: center !important;
justify-content: space-between !important;
}
#coxide-guide .cog-counter {
font-family: ‘JetBrains Mono’, monospace !important;
font-size: 12px !important;
color: #444 !important;
}
#coxide-guide .cog-counter span {
color: #76B900 !important;
}
#coxide-guide .cog-nav {
display: flex !important;
gap: 10px !important;
}
#coxide-guide .cog-btn {
background: #1a1a1a !important;
border: 1px solid #2e2e2e !important;
border-radius: 6px !important;
color: #ccc !important;
font-family: ‘IBM Plex Sans’, sans-serif !important;
font-size: 13px !important;
font-weight: 500 !important;
padding: 8px 20px !important;
cursor: pointer !important;
transition: all 0.2s !important;
letter-spacing: 0.2px !important;
}
#coxide-guide .cog-btn:hover {
background: #252525 !important;
border-color: #444 !important;
color: #fff !important;
}
#coxide-guide .cog-btn.primary {
background: #76B900 !important;
border-color: #76B900 !important;
color: #000 !important;
font-weight: 600 !important;
}
#coxide-guide .cog-btn.primary:hover {
background: #8fd400 !important;
border-color: #8fd400 !important;
}
#coxide-guide .cog-btn:disabled {
opacity: 0.3 !important;
cursor: not-allowed !important;
}
/* ── Attribution ── */
#coxide-guide .cog-attribution {
text-align: center !important;
padding: 12px 32px !important;
background: #0d0d0d !important;
border-top: 1px solid #181818 !important;
}
#coxide-guide .cog-attribution em {
font-size: 11px !important;
color: #3a3a3a !important;
font-style: italic !important;
font-family: ‘IBM Plex Sans’, sans-serif !important;
}
/* ── Mobile ── */
@media (max-width: 640px) {
#coxide-guide .cog-header { padding: 18px 20px !important; }
#coxide-guide .cog-dots { padding: 12px 20px !important; gap: 5px !important; }
#coxide-guide .cog-dot { width: 24px !important; height: 24px !important; font-size: 9px !important; }
#coxide-guide .cog-dot-line { min-width: 4px !important; }
#coxide-guide .cog-slide { padding: 20px !important; }
#coxide-guide .cog-slide h2 { font-size: 16px !important; }
#coxide-guide .cog-slide p,
#coxide-guide .cog-slide li { font-size: 13px !important; }
#coxide-guide pre { padding: 12px 14px !important; overflow-x: auto !important; }
#coxide-guide code { font-size: 11.5px !important; }
#coxide-guide .cog-footer { padding: 14px 20px !important; }
#coxide-guide .cog-btn { padding: 8px 14px !important; font-size: 12px !important; }
#coxide-guide .cog-pills { gap: 6px !important; }
#coxide-guide .cog-pill { font-size: 11px !important; padding: 4px 10px !important; }
#coxide-guide .cog-slides { min-height: 420px !important; }
#coxide-guide .cog-attribution { padding: 12px 20px !important; }
}
cuda-oxide — Step-by-Step Guide
(function() {
var current = 1;
var total = 9;
var slides = document.querySelectorAll(‘#coxide-guide .cog-slide’);
var dotsEl = document.getElementById(‘cog-dots’);
var progress = document.getElementById(‘cog-progress’);
var curEl = document.getElementById(‘cog-cur’);
var totEl = document.getElementById(‘cog-tot’);
var prevBtn = document.getElementById(‘cog-prev’);
var nextBtn = document.getElementById(‘cog-next’);
totEl.textContent = total;
// Build dots
for (var i = 1; i 1) {
var line = document.createElement(‘div’);
line.className=”cog-dot-line”;
dotsEl.appendChild(line);
}
var d = document.createElement(‘button’);
d.className=”cog-dot” + (i === 1 ? ‘ active’ : ”);
d.textContent = i;
d.setAttribute(‘data-idx’, i);
d.onclick = (function(idx) {
return function() { goTo(idx); };
})(i);
dotsEl.appendChild(d);
}
function goTo(n) {
if (n total) return;
// hide old slide
slides[current – 1].classList.remove(‘active’);
current = n;
slides[current – 1].classList.add(‘active’);
// update dots
var dots = dotsEl.querySelectorAll(‘.cog-dot’);
dots.forEach(function(d) {
var idx = parseInt(d.getAttribute(‘data-idx’));
d.classList.remove(‘active’, ‘done’);
if (idx === current) d.classList.add(‘active’);
else if (idx < current) d.classList.add('done');
});
// progress bar
progress.style.width = Math.round((current / total) * 100) + '%';
// counter
curEl.textContent = current;
// buttons
prevBtn.disabled = (current === 1);
nextBtn.disabled = (current === total);
if (current === total) {
nextBtn.textContent = 'Done ✓';
} else {
nextBtn.innerHTML = 'Next →';
}
}
window.cogNav = function(dir) { goTo(current + dir); };
// Init state
goTo(1);
})();
Key Takeaways
- cuda-oxide is a custom
rustccodegen backend from NVlabs that compiles#[kernel]-annotated Rust functions to PTX through a Rust →rustc_publicStable MIR → Pliron IR → LLVM IR → PTX pipeline, all buildable withcargo. - Host and device code coexist in a single
.rsfile, compiled with onecargo oxide buildcommand; the output is a host binary plus a.ptxfile placed next to it. - The safety model has three documented tiers: Tier 1 (race-free by construction via
DisjointSlice<T>+ThreadIndex), Tier 2 (scopedunsafefor shared memory, warp intrinsics, atomics), and Tier 3 (raw hardware intrinsics for TMA, WGMMA, tcgen05).index_2d(stride)is documented as currently unsound in the 0.x release. - The
gemm_solexample hits 868 TFLOPS on the B200 (58% of cuBLAS SoL) using a multi-phase GEMM pipeline with CLC andcta_group::2.
Check out the GitHub Repo. Also, feel free to follow us on Twitter and don’t forget to join our 150k+ ML SubReddit and Subscribe to our Newsletter. Wait! are you on telegram? now you can join us on telegram as well.
Need to partner with us for promoting your GitHub Repo OR Hugging Face Page OR Product Release OR Webinar etc.? Connect with us
The post NVIDIA AI Just Released cuda-oxide: An Experimental Rust-to-CUDA Compiler Backend that Compiles SIMT GPU Kernels Directly to PTX appeared first on MarkTechPost.





