Scalable Sequence Modeling: Mamba vs. Transformers

Published:

Technologies: Python, PyTorch, CUDA

View on GitHub

Description

  • Architecture Implementation: Implemented and benchmarked Mamba (Selective SSM) and Transformer architectures from scratch to investigate trade-offs in computational complexity (O(L) for the SSM scan vs. O(L^2) for self-attention) and memory scaling (see the first sketch below).
  • Performance Analysis: Conducted a rigorous performance comparison, showing that Mamba achieved slightly better modeling quality (test loss 1.80 vs. 1.86 for the Transformer) while supporting constant-time per-token inference through its fixed-size recurrent state (second sketch below).
  • Optimization: Ran extensive hyperparameter sweeps over the SSM state dimension and attention head count to optimize next-token prediction, in line with broader work on resource-efficient ML systems (third sketch below).
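
To make the O(L) vs. O(L^2) trade-off in the first bullet concrete, here is a minimal PyTorch sketch of a selective state-space layer with an explicit loop over time. It is illustrative only: the real Mamba kernel fuses this scan in CUDA, and the names `SelectiveSSM`, `d_state`, `to_delta`, `to_B`, and `to_C` are assumptions, not taken from the project's code.

```python
# Simplified selective state-space recurrence (illustrative, not the fused CUDA
# scan from the Mamba paper). Per step: h_t = A_t * h_{t-1} + B_t * x_t,
# y_t = C_t . h_t, with A_t, B_t, C_t derived from the input (the "selective" part).
import torch
import torch.nn as nn


class SelectiveSSM(nn.Module):
    def __init__(self, d_model: int, d_state: int = 16):
        super().__init__()
        self.d_model, self.d_state = d_model, d_state
        # Input-dependent projections for the step size and the B/C matrices.
        self.to_delta = nn.Linear(d_model, d_model)
        self.to_B = nn.Linear(d_model, d_state)
        self.to_C = nn.Linear(d_model, d_state)
        # Log of a fixed diagonal A (one value per channel/state pair), kept negative below.
        self.log_A = nn.Parameter(torch.log(torch.rand(d_model, d_state) + 1e-3))

    def forward(self, x):                          # x: (batch, L, d_model)
        B_, L, D = x.shape
        delta = torch.nn.functional.softplus(self.to_delta(x))   # (B, L, D)
        Bmat = self.to_B(x)                                      # (B, L, N)
        Cmat = self.to_C(x)                                      # (B, L, N)
        A = -torch.exp(self.log_A)                               # (D, N), negative real
        h = x.new_zeros(B_, D, self.d_state)                     # recurrent state
        ys = []
        for t in range(L):                         # O(L) sequential scan
            dA = torch.exp(delta[:, t].unsqueeze(-1) * A)             # (B, D, N) discretized A
            dB = delta[:, t].unsqueeze(-1) * Bmat[:, t].unsqueeze(1)  # (B, D, N) discretized B
            h = dA * h + dB * x[:, t].unsqueeze(-1)                   # state update
            ys.append((h * Cmat[:, t].unsqueeze(1)).sum(-1))          # (B, D) readout
        return torch.stack(ys, dim=1)              # (B, L, D)
```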
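The constant-time inference claim in the second bullet follows from the fixed-size recurrent state: at decode time a Mamba-style block only updates a (d_model x d_state) state per layer, rather than rereading a KV cache that grows with the generated length. A hedged sketch of one decoding step, assuming the `SelectiveSSM` sketch above:

```python
# Single decoding step with a fixed-size state: the cost is independent of how
# many tokens have already been generated (unlike attention over a growing KV cache).
@torch.no_grad()
def ssm_step(ssm: SelectiveSSM, x_t, h):
    # x_t: (batch, d_model) features of the current token; h: (batch, d_model, d_state)
    delta = torch.nn.functional.softplus(ssm.to_delta(x_t))     # (B, D)
    Bvec = ssm.to_B(x_t)                                        # (B, N)
    Cvec = ssm.to_C(x_t)                                        # (B, N)
    A = -torch.exp(ssm.log_A)                                   # (D, N)
    dA = torch.exp(delta.unsqueeze(-1) * A)                     # (B, D, N)
    dB = delta.unsqueeze(-1) * Bvec.unsqueeze(1)                # (B, D, N)
    h = dA * h + dB * x_t.unsqueeze(-1)                         # O(1) in sequence length
    y = (h * Cvec.unsqueeze(1)).sum(-1)                         # (B, D)
    return y, h
```

Each step touches only that fixed-size state, so per-token cost and memory stay flat as generation proceeds, whereas a Transformer's per-token attention cost grows with the number of tokens already generated.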
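The sweep in the third bullet can be expressed as a simple grid over the two hyperparameters. The sketch below is a placeholder, not the sweep actually run in the project: `train_and_eval` is a hypothetical stub standing in for the real training loop, and the value ranges are assumptions.

```python
# Illustrative grid sweep over state dimension (Mamba) and head count (Transformer).
import itertools
import random

def train_and_eval(arch: str, **hparams) -> float:
    # Placeholder: would train `arch` with `hparams` and return its test loss;
    # here it returns a dummy value so the sweep runs end to end.
    return random.uniform(1.7, 2.0)

grid = list(itertools.product(["mamba"], [8, 16, 32, 64])) \
     + list(itertools.product(["transformer"], [4, 8, 16]))

results = {}
for arch, value in grid:
    key = "d_state" if arch == "mamba" else "n_heads"
    results[(arch, value)] = train_and_eval(arch, **{key: value})

best = min(results, key=results.get)
print(f"best config: {best} -> test loss {results[best]:.2f}")
```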