Scalable Sequence Modeling: Mamba vs. Transformers
Published:
Technologies: Python, PyTorch, CUDA
Description
- Architecture Implementation: Implemented and benchmarked Mamba (selective state-space model) and Transformer architectures from scratch to investigate trade-offs in computational complexity (O(L) for the SSM scan vs. O(L^2) for self-attention) and memory scaling; a minimal sketch of the selective scan follows this list.
- Performance Analysis: Benchmarked both models on next-token prediction, with Mamba reaching a lower test loss (1.80 vs. 1.86 for the Transformer) while supporting constant per-token inference cost through its fixed-size recurrent state.
- Optimization: Ran hyperparameter sweeps over state dimension (Mamba) and head count (Transformer) to improve next-token prediction quality, in line with work on resource-efficient ML systems; a sketch of the sweep loop appears after the scan example below.
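
To make the O(L) claim in the first bullet concrete, here is a minimal sketch of a selective-SSM recurrence in PyTorch. It is a simplified, diagonal-state illustration rather than the fused CUDA scan used by the actual Mamba implementation, and the names and sizes (d_model, d_state, the projection layout) are assumptions made for the example.

```python
# Minimal selective-SSM sketch (assumption: diagonal A, simplified projections).
import torch
import torch.nn as nn


class SelectiveSSM(nn.Module):
    """Diagonal selective state-space layer: one O(L) sequential scan over the input."""

    def __init__(self, d_model: int, d_state: int = 16):
        super().__init__()
        # Log-parameterized negative-real diagonal A for a stable recurrence.
        self.A_log = nn.Parameter(
            torch.log(torch.arange(1, d_state + 1).float()).repeat(d_model, 1)
        )
        # Input-dependent ("selective") projections for the step size and B, C.
        self.proj = nn.Linear(d_model, 2 * d_state + 1)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, L, d_model)
        batch, L, d_model = x.shape
        A = -torch.exp(self.A_log)                            # (d_model, d_state)
        d_state = A.shape[1]
        dt, B, C = self.proj(x).split([1, d_state, d_state], dim=-1)
        dt = torch.nn.functional.softplus(dt)                 # (batch, L, 1)
        h = x.new_zeros(batch, d_model, d_state)              # fixed-size recurrent state
        ys = []
        for t in range(L):                                    # single pass: O(L) time
            dA = torch.exp(dt[:, t, :, None] * A)             # discretized transition
            dBx = dt[:, t, :, None] * B[:, t, None, :] * x[:, t, :, None]
            h = dA * h + dBx                                  # state update
            ys.append((h * C[:, t, None, :]).sum(-1))         # readout: (batch, d_model)
        return self.out(torch.stack(ys, dim=1))


# Example usage on random data (shapes are illustrative):
layer = SelectiveSSM(d_model=64)
y = layer(torch.randn(2, 128, 64))                            # -> (2, 128, 64)
```

The explicit Python loop is only there to make the single linear pass and the fixed-size state visible; a practical implementation replaces it with a parallel or fused scan kernel.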

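As a companion to the Optimization bullet, the following is a hedged sketch of a grid-sweep loop. The search spaces and the train_and_eval callable are placeholders introduced for illustration, not the actual settings or training code used in the project.

```python
# Hypothetical sweep driver: grids and train_and_eval are placeholders.
import itertools

MAMBA_GRID = {"d_state": [8, 16, 32, 64]}        # state dimension for the SSM variant
TRANSFORMER_GRID = {"n_heads": [2, 4, 8]}        # head count for the Transformer variant


def grid_search(grid, train_and_eval):
    """Return the config with the lowest held-out loss reported by train_and_eval(**config)."""
    keys = list(grid)
    best_cfg, best_loss = None, float("inf")
    for values in itertools.product(*(grid[k] for k in keys)):
        cfg = dict(zip(keys, values))
        loss = train_and_eval(**cfg)             # assumed to train a model and return val loss
        if loss < best_loss:
            best_cfg, best_loss = cfg, loss
    return best_cfg, best_loss
```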