Mayank Mishra
Non-member of a non-technical staff at a non-technical, non-frontier lab trying to do technical work.
mayank [underscore] mishra [at] berkeley [dot] edu
I am a PhD student at UC Berkeley exploring pretraining and model architectures. Prior to joining Berkeley, I led pretraining and model architecture research at MIT-IBM Watson Lab for IBM’s Granite Models.
I received my undergraduate education (B.Tech) in Electrical Engineering from the Indian Institute of Technology Delhi.
news
| Aug 27, 2025 | Started PhD at UC Berkeley! |
|---|---|
latest posts
selected publications
- SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations. arXiv preprint arXiv:2512.14080, 2025
- Ladder-Residual: Parallelism-Aware Architecture for Accelerating Large Model Inference with Communication Overlapping. arXiv preprint arXiv:2501.06589, 2025
- Reducing Transformer Key-Value Cache Size with Cross-Layer Attention. Advances in Neural Information Processing Systems, 2024
- Granite Code Models: A Family of Open Foundation Models for Code Intelligence. arXiv preprint arXiv:2405.04324, 2024