Documentation

Everything you need to build, train, and deploy deep learning models with RUMUS, from first principles to production-ready GPU-accelerated pipelines, all with the safety and performance guarantees of Rust.

Getting Started

Install RUMUS, build your first neural network, and train it in minutes. A hands-on introduction to the framework.

Tensors

Learn how RUMUS tensors work: creation, indexing, N-dimensional broadcasting, FP16 mixed precision with DType-aware storage, INT8 block quantization, the memory-mapped .rrec data format, and an efficient memory layout backed by Rust's ownership model.

Autograd

Understand automatic differentiation in RUMUS. Build computation graphs, call backward(), and inspect gradients with zero-cost abstractions.

Neural Networks

Compose models from 12 layer types, including TransformerBlock, MultiheadAttention, Linear, Conv2d, LayerNorm, BatchNorm, Embedding, and more. Also covers FlashAttention, DataParallel, FSDP, custom ops, the rumus-serve inference server, rumus-graph for graph neural networks, rumus-vision for direct-convolution CNN ops and INT4 quantized inference, and rumus-distributed for 3D parallelism. Use the #[derive(Module)] proc macro for ergonomic model definitions, with ONNX export for deployment.

Optimizers

Train your models with the SGD, Adam, and AdamW optimizers. Configure learning rates, momentum, weight decay, LR schedulers, and gradient clipping, and feed training with the multithreaded DataLoader and RecordWriter/RecordDataset for high-throughput data.

GPU Acceleration

Accelerate training and inference with 24+ WGSL shader modules via WebGPU. Includes JIT kernel fusion, FlashAttention, multi-GPU support, a custom ops API, a WGSL preprocessor for precision-agnostic shaders, Q8 mixed-precision matmul kernels, and INT4 fused dequant-matmul kernels. Move tensors to the GPU with a single call; GPU kernels cover fused layer norm, batched matmul, softmax, and more.