Scalable Sequence Modeling: Mamba vs. Transformers

Published:

Technologies: Python, PyTorch, CUDA

View on GitHub

Description

  • Architecture Implementation: Implemented and benchmarked Mamba (Selective SSM) and Transformer architectures from scratch to investigate trade-offs in computational complexity (O(L) for the SSM scan vs. O(L^2) for self-attention) and memory scaling (see the first sketch below).
  • Performance Analysis: Conducted a rigorous performance comparison, showing that Mamba achieved slightly better modeling quality (test loss 1.80 vs. 1.86 for the Transformer) while supporting constant-time per-token inference through its fixed-size recurrent state (second sketch below).
  • Optimization: Ran extensive hyperparameter sweeps over the SSM state dimension and attention head count to optimize next-token prediction, in line with broader work on resource-efficient ML systems (third sketch below).
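
To make the O(L) vs. O(L^2) trade-off in the first bullet concrete, here is a minimal PyTorch sketch of a selective state-space layer with an explicit loop over time. It is illustrative only: the real Mamba kernel fuses this scan in CUDA, and the names `SelectiveSSM`, `d_state`, `to_delta`, `to_B`, and `to_C` are assumptions, not taken from the project's code.

```python
# Simplified selective state-space recurrence (illustrative, not the fused CUDA
# scan from the Mamba paper). Per step: h_t = A_t * h_{t-1} + B_t * x_t,
# y_t = C_t . h_t, with A_t, B_t, C_t derived from the input (the "selective" part).
import torch
import torch.nn as nn


class SelectiveSSM(nn.Module):
    def __init__(self, d_model: int, d_state: int = 16):
        super().__init__()
        self.d_model, self.d_state = d_model, d_state
        # Input-dependent projections for the step size and the B/C matrices.
        self.to_delta = nn.Linear(d_model, d_model)
        self.to_B = nn.Linear(d_model, d_state)
        self.to_C = nn.Linear(d_model, d_state)
        # Log of a fixed diagonal A (one value per channel/state pair), kept negative below.
        self.log_A = nn.Parameter(torch.log(torch.rand(d_model, d_state) + 1e-3))

    def forward(self, x):                          # x: (batch, L, d_model)
        B_, L, D = x.shape
        delta = torch.nn.functional.softplus(self.to_delta(x))   # (B, L, D)
        Bmat = self.to_B(x)                                      # (B, L, N)
        Cmat = self.to_C(x)                                      # (B, L, N)
        A = -torch.exp(self.log_A)                               # (D, N), negative real
        h = x.new_zeros(B_, D, self.d_state)                     # recurrent state
        ys = []
        for t in range(L):                         # O(L) sequential scan
            dA = torch.exp(delta[:, t].unsqueeze(-1) * A)             # (B, D, N) discretized A
            dB = delta[:, t].unsqueeze(-1) * Bmat[:, t].unsqueeze(1)  # (B, D, N) discretized B
            h = dA * h + dB * x[:, t].unsqueeze(-1)                   # state update
            ys.append((h * Cmat[:, t].unsqueeze(1)).sum(-1))          # (B, D) readout
        return torch.stack(ys, dim=1)              # (B, L, D)
```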
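The constant-time inference claim in the second bullet follows from the fixed-size recurrent state: at decode time a Mamba-style block only updates a (d_model x d_state) state per layer, rather than rereading a KV cache that grows with the generated length. A hedged sketch of one decoding step, assuming the `SelectiveSSM` sketch above:

```python
# Single decoding step with a fixed-size state: the cost is independent of how
# many tokens have already been generated (unlike attention over a growing KV cache).
@torch.no_grad()
def ssm_step(ssm: SelectiveSSM, x_t, h):
    # x_t: (batch, d_model) features of the current token; h: (batch, d_model, d_state)
    delta = torch.nn.functional.softplus(ssm.to_delta(x_t))     # (B, D)
    Bvec = ssm.to_B(x_t)                                        # (B, N)
    Cvec = ssm.to_C(x_t)                                        # (B, N)
    A = -torch.exp(ssm.log_A)                                   # (D, N)
    dA = torch.exp(delta.unsqueeze(-1) * A)                     # (B, D, N)
    dB = delta.unsqueeze(-1) * Bvec.unsqueeze(1)                # (B, D, N)
    h = dA * h + dB * x_t.unsqueeze(-1)                         # O(1) in sequence length
    y = (h * Cvec.unsqueeze(1)).sum(-1)                         # (B, D)
    return y, h
```

Each step touches only that fixed-size state, so per-token cost and memory stay flat as generation proceeds, whereas a Transformer's per-token attention cost grows with the number of tokens already generated.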
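The sweep in the third bullet can be expressed as a simple grid over the two hyperparameters. The sketch below is a placeholder, not the sweep actually run in the project: `train_and_eval` is a hypothetical stub standing in for the real training loop, and the value ranges are assumptions.

```python
# Illustrative grid sweep over state dimension (Mamba) and head count (Transformer).
import itertools
import random

def train_and_eval(arch: str, **hparams) -> float:
    # Placeholder: would train `arch` with `hparams` and return its test loss;
    # here it returns a dummy value so the sweep runs end to end.
    return random.uniform(1.7, 2.0)

grid = list(itertools.product(["mamba"], [8, 16, 32, 64])) \
     + list(itertools.product(["transformer"], [4, 8, 16]))

results = {}
for arch, value in grid:
    key = "d_state" if arch == "mamba" else "n_heads"
    results[(arch, value)] = train_and_eval(arch, **{key: value})

best = min(results, key=results.get)
print(f"best config: {best} -> test loss {results[best]:.2f}")
```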