Mayank Mishra

Non-member of a non-technical staff at a non-technical, non-frontier lab trying to do technical work.

mayank [underscore] mishra [at] berkeley [dot] edu

I am a PhD student at UC Berkeley exploring pretraining and model architectures. Before joining Berkeley, I led pretraining and model architecture research at the MIT-IBM Watson AI Lab for IBM's Granite models.

I received my undergraduate education (B.Tech) in Electrical Engineering from the Indian Institute of Technology Delhi.

news

Mar 19, 2026 Published M$^2$RNN: Non-Linear RNNs with Matrix-Valued States for Scalable Language Modeling
Aug 27, 2025 Started PhD at UC Berkeley!

latest posts

selected publications

  1. M$^2$RNN: Non-Linear RNNs with Matrix-Valued States for Scalable Language Modeling
    Mayank Mishra, Shawn Tan, Ion Stoica, and 2 more authors
    2026
  2. SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations
    Wentao Guo, Mayank Mishra, Xinle Cheng, and 2 more authors
    arXiv preprint arXiv:2512.14080, 2025
  3. Ladder-Residual: Parallelism-Aware Architecture for Accelerating Large Model Inference with Communication Overlapping
    Muru Zhang*, Mayank Mishra*, Zhongzhu Zhou, and 7 more authors
    arXiv preprint arXiv:2501.06589, 2025
  4. Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
    William Brandon*, Mayank Mishra*, Aniruddha Nrusimha, and 2 more authors
    Advances in Neural Information Processing Systems, 2024
  5. Granite Code Models: A Family of Open Foundation Models for Code Intelligence
    Mayank Mishra*, Matt Stallone*, Gaoyuan Zhang*, and 8 more authors
    arXiv preprint arXiv:2405.04324, 2024