Mayank Mishra

Non-member of a non-technical staff at a non-technical, non-frontier lab trying to do technical work.

mayank [underscore] mishra [at] berkeley [dot] edu

I am a PhD student at UC Berkeley exploring pretraining and model architectures. Prior to joining Berkeley, I led pretraining and model architecture research at the MIT-IBM Watson AI Lab for IBM’s Granite models.

I received my undergraduate education (B.Tech) in Electrical Engineering from the Indian Institute of Technology Delhi.

news

Aug 27, 2025 Started PhD at UC Berkeley!

latest posts

selected publications

  1. SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations
    Wentao Guo, Mayank Mishra, Xinle Cheng, and 2 more authors
    arXiv preprint arXiv:2512.14080, 2025
  2. Ladder-Residual: Parallelism-Aware Architecture for Accelerating Large Model Inference with Communication Overlapping
    Muru Zhang*, Mayank Mishra*, Zhongzhu Zhou, and 7 more authors
    arXiv preprint arXiv:2501.06589, 2025
  3. Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
    William Brandon*, Mayank Mishra*, Aniruddha Nrusimha, and 2 more authors
    Advances in Neural Information Processing Systems, 2024
  4. Granite Code Models: A Family of Open Foundation Models for Code Intelligence
    Mayank Mishra*, Matt Stallone*, Gaoyuan Zhang*, and 8 more authors
    arXiv preprint arXiv:2405.04324, 2024