Built with Rust

Deep learning with memory safety

RUMUS is a native-Rust deep learning framework that satisfies the borrow checker while delivering PyTorch-like ergonomics. Zero-cost abstractions, GPU acceleration, and compile-time safety guarantees.

$ cargo add rumus

Why RUMUS?

Most deep learning frameworks achieve safety through runtime reference counting. RUMUS enforces memory safety as a first-class language constraint — checked at compile time, not runtime.

Memory Safe by Design

Rust's borrow checker enforces memory safety at compile time. No runtime reference counting, no data races, no use-after-free — guaranteed by the type system.

Zero-Cost Abstractions

View operations like reshape and transpose are metadata-only — zero memory allocation. Inference mode completely bypasses the autograd tape with no overhead.
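To sketch how metadata-only views work (this Layout struct is illustrative, not RUMUS's actual type), a transpose just swaps shape and stride entries and never touches the underlying buffer:

```rust
// Illustrative strided layout: views share storage, so transpose is
// pure metadata and allocates nothing.
#[derive(Debug, Clone)]
struct Layout {
    shape: Vec<usize>,
    strides: Vec<usize>,
    offset: usize,
}

impl Layout {
    // Row-major (C-contiguous) strides for a fresh tensor.
    fn contiguous(shape: &[usize]) -> Self {
        let mut strides = vec![1; shape.len()];
        for i in (0..shape.len().saturating_sub(1)).rev() {
            strides[i] = strides[i + 1] * shape[i + 1];
        }
        Layout { shape: shape.to_vec(), strides, offset: 0 }
    }

    // Transpose two dimensions: swap the shape and stride entries only.
    fn transpose(&self, a: usize, b: usize) -> Self {
        let mut out = self.clone();
        out.shape.swap(a, b);
        out.strides.swap(a, b);
        out
    }
}
```

Because only the metadata changes, a view is O(1) regardless of tensor size.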

GPU Acceleration

WGPU compute backend with 40+ shader entry points. Per-resource fences eliminate global pipeline stalls. Buffer pooling with power-of-2 bucketing recycles GPU memory.
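Power-of-2 bucketing can be sketched like this (the pool and buffer types here are plain-CPU stand-ins, not the real WGPU-backed pool): each request rounds up to the next power of two, so a freed buffer is reusable for any later request that lands in the same bucket.

```rust
use std::collections::HashMap;

// Stand-in buffer pool: maps bucket size -> recycled buffers.
struct BufferPool {
    free: HashMap<usize, Vec<Vec<u8>>>,
}

impl BufferPool {
    fn new() -> Self {
        BufferPool { free: HashMap::new() }
    }

    // Round a request up to its power-of-2 bucket.
    fn bucket(bytes: usize) -> usize {
        bytes.max(1).next_power_of_two()
    }

    // Reuse a recycled buffer from the bucket, or allocate a new one.
    fn acquire(&mut self, bytes: usize) -> Vec<u8> {
        let b = Self::bucket(bytes);
        self.free
            .get_mut(&b)
            .and_then(|v| v.pop())
            .unwrap_or_else(|| vec![0u8; b])
    }

    // Return a buffer to its bucket for reuse.
    fn release(&mut self, buf: Vec<u8>) {
        self.free.entry(buf.len()).or_default().push(buf);
    }
}
```

The trade-off is bounded internal fragmentation (at most 2x) in exchange for far fewer distinct bucket sizes, which keeps reuse rates high.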

PyTorch-Like Ergonomics

Familiar eager execution model. Define-by-run autograd. Module system with #[derive(Module)] proc macro. If you know PyTorch, you already know RUMUS.

Concrete Autograd

BackwardOp is a concrete enum with 16 variants — not opaque closures. Every backward operation is inspectable, Send + Sync safe, and deterministic.
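A toy version of the idea, with invented variant names rather than the real sixteen: because each backward op is plain data instead of a boxed closure, the tape can be printed, audited, and shared across threads.

```rust
// Backward ops as a concrete enum: no captured environments, just data.
#[derive(Debug, Clone, Copy)]
enum BackwardOp {
    Add { lhs: usize, rhs: usize },
    Mul { lhs: usize, rhs: usize },
    Relu { input: usize },
}

// Every variant is inspectable: the tape can be dumped for debugging.
fn describe(op: &BackwardOp) -> String {
    match op {
        BackwardOp::Add { lhs, rhs } => format!("dAdd -> {lhs}, {rhs}"),
        BackwardOp::Mul { lhs, rhs } => format!("dMul -> {lhs}, {rhs}"),
        BackwardOp::Relu { input } => format!("dRelu -> {input}"),
    }
}

// Compile-time witness that the enum is Send + Sync.
fn assert_send_sync<T: Send + Sync>() {}
```

Closures capturing arbitrary state would make neither the inspectability nor the Send + Sync guarantee automatic; plain data gives both for free.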

Safe Serialization

Safetensors format with zero unsafe code via bytemuck. Save and load model state dicts with dot-path naming, just like PyTorch's state_dict().

Three orthogonal layers

The Tensor is not a junk drawer. RUMUS strictly partitions internal state into three independent layers, each with a single responsibility. This makes the framework auditable, predictable, and easy to extend.

Storage Layer

Raw memory management with CPU/GPU unified addressing, version tracking, and per-resource fences

Layout Layer

Shape, strides, and view semantics — reshape and transpose are zero-allocation metadata operations

Autograd Layer

Gradient tracking with append-only Wengert tape, Kahn's algorithm backward pass, and concrete BackwardOp enum

// Tensor anatomy
Tensor
├─ StorageHandle: Cpu(Vec<f32>) | Gpu(wgpu::Buffer) | Both {...}
├─ Layout: shape, strides, offset (views share storage)
└─ AutogradState: None | Tracked(grad_id, creator_op, is_leaf)
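The Kahn's-algorithm backward pass mentioned above can be sketched as follows (node IDs and the edge representation are illustrative, not the tape's real encoding): a node is processed only once all of its consumers have contributed their gradient, so each gradient is finalized exactly once.

```rust
use std::collections::VecDeque;

/// `inputs_of[i]` lists the input node IDs of node `i`
/// (edges point from an op's output back to its inputs).
fn backward_order(num_nodes: usize, inputs_of: &[Vec<usize>], root: usize) -> Vec<usize> {
    // pending[i] = number of consumers whose gradients still feed node i.
    let mut pending = vec![0usize; num_nodes];
    for ins in inputs_of {
        for &i in ins {
            pending[i] += 1;
        }
    }

    // Kahn's algorithm: start from the loss, release an input only
    // when its last consumer has been processed.
    let mut order = Vec::new();
    let mut queue = VecDeque::from([root]);
    while let Some(n) = queue.pop_front() {
        order.push(n);
        for &i in &inputs_of[n] {
            pending[i] -= 1;
            if pending[i] == 0 {
                queue.push_back(i);
            }
        }
    }
    order
}
```

Compared to recursive backtracking, the queue-based pass is iterative (no stack overflow on deep graphs) and deterministic, matching the append-only tape.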

Familiar and expressive

If you know PyTorch, you already know RUMUS. Define models with structs, derive the Module trait, and train with eager execution.

examples/train.rs

```rust
use rumus::nn::{self, Linear, Module};
use rumus::optim::Adam;
use rumus::autograd;
use rumus::Tensor;

#[derive(Module)]
struct Net {
    fc1: Linear,
    fc2: Linear,
}

impl Net {
    fn new() -> Self {
        Self {
            fc1: Linear::new(784, 128),
            fc2: Linear::new(128, 10),
        }
    }

    fn forward(&self, x: &Tensor) -> Tensor {
        let h = nn::relu(&self.fc1.forward(x));
        self.fc2.forward(&h)
    }
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let model = Net::new();
    let mut opt = Adam::new(model.parameters(), 0.001);

    for epoch in 0..100 {
        // `inputs` and `targets` are the current training batch,
        // assumed loaded elsewhere.
        let pred = model.forward(&inputs);
        let loss = nn::cross_entropy_loss(&pred, &targets);

        let mut grads = autograd::backward(&loss)?;
        opt.step(&mut grads)?;

        println!("Epoch {epoch}: loss = {:.4}", loss.item());
    }

    nn::save_safetensors(&model.state_dict(""), "model.safetensors")?;
    Ok(())
}
```

Complete training ecosystem

Everything you need to define, train, and deploy deep learning models — from autograd to GPU-fused optimizers.

Layers & Modules

Linear, Conv2d, MaxPool2d, Flatten, Dropout — with automatic parameter collection via #[derive(Module)].

Linear · Conv2d · MaxPool2d · Flatten · Dropout · ReLU

Optimizers

SGD with momentum, Adam, and AdamW with decoupled weight decay. All GPU-fused for zero host-device round-trips during training.

SGD · Adam · AdamW
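One AdamW step with decoupled weight decay, written out for a single scalar weight (hyperparameter names follow the usual conventions, not RUMUS's API): the decay term is applied to the weight directly, outside the Adam moment estimates.

```rust
// Scalar AdamW update. `m` and `v` are the first/second moment
// estimates; `t` is the 1-based step count used for bias correction.
fn adamw_step(
    w: &mut f64, g: f64,
    m: &mut f64, v: &mut f64, t: u32,
    lr: f64, beta1: f64, beta2: f64, eps: f64, weight_decay: f64,
) {
    *m = beta1 * *m + (1.0 - beta1) * g;
    *v = beta2 * *v + (1.0 - beta2) * g * g;
    let m_hat = *m / (1.0 - beta1.powi(t as i32));
    let v_hat = *v / (1.0 - beta2.powi(t as i32));
    // Decoupled decay: shrink the weight directly, rather than
    // folding the decay into the gradient as classic L2 would.
    *w -= lr * (m_hat / (v_hat.sqrt() + eps) + weight_decay * *w);
}
```

In the GPU-fused versions, this whole update runs as one kernel per parameter tensor, which is what avoids host-device round-trips during training.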

Loss Functions

MSE and Cross-Entropy with Log-Sum-Exp numerical stability. Gradients are pre-computed in the forward pass for efficiency.

MSE Loss · Cross-Entropy Loss
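The Log-Sum-Exp trick behind the stable cross-entropy fits in a few lines (a standalone sketch, not the RUMUS kernel): subtracting the max keeps every exp() argument at or below zero, so nothing overflows even for logits in the thousands.

```rust
// Numerically stable log(sum(exp(x_i))): factor out the max.
fn log_sum_exp(xs: &[f64]) -> f64 {
    let m = xs.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    m + xs.iter().map(|x| (x - m).exp()).sum::<f64>().ln()
}

// Cross-entropy for one sample from raw logits:
// -log softmax(logits)[target] = LSE(logits) - logits[target].
fn cross_entropy(logits: &[f64], target: usize) -> f64 {
    log_sum_exp(logits) - logits[target]
}
```

A naive `exp(1000.0)` is infinite in f64; with the max factored out the same input evaluates exactly.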

Training Loop

Trainer struct with closure-based train_step(), automatic epoch loss tracking, and BufferPool memory recycling on Drop.

Trainer<O> · train_step() · epoch_avg_loss()
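A minimal sketch of the closure-based shape (this Trainer is a mock that borrows the method names; the real Trainer<O> also owns the optimizer and BufferPool): the trainer handles loss bookkeeping while the closure carries the forward/backward logic.

```rust
// Mock trainer: records per-step losses, leaves the actual
// forward/backward/step work to the caller's closure.
struct Trainer {
    losses: Vec<f64>,
}

impl Trainer {
    fn new() -> Self {
        Trainer { losses: Vec::new() }
    }

    // The closure runs one training step and returns its loss.
    fn train_step<F: FnMut() -> f64>(&mut self, mut step: F) -> f64 {
        let loss = step();
        self.losses.push(loss);
        loss
    }

    fn epoch_avg_loss(&self) -> f64 {
        self.losses.iter().sum::<f64>() / self.losses.len().max(1) as f64
    }
}
```

Keeping the step logic in a closure means the trainer never needs to know the model type, only that each step yields a scalar loss.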

Ready to get started?

Add RUMUS to your Rust project and start building memory-safe deep learning models today.