Publications | Mayank Mishra

2026

M^2RNN: Non-Linear RNNs with Matrix-Valued States for Scalable Language Modeling

Mayank Mishra, Shawn Tan, Ion Stoica, and 2 more authors

2026

2025

SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations

Wentao Guo, Mayank Mishra, Xinle Cheng, and 2 more authors

arXiv preprint arXiv:2512.14080, 2025
FlashFormer: Whole-Model Kernels for Efficient Low-Batch Inference

Aniruddha Nrusimha, William Brandon, Mayank Mishra, and 4 more authors

arXiv preprint arXiv:2505.22758, 2025
PaTH Attention: Position Encoding via Accumulating Householder Transformations

Songlin Yang, Yikang Shen, Kaiyue Wen, and 5 more authors

arXiv preprint arXiv:2505.16381, 2025
Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence

Granite Vision Team, Leonid Karlinsky, Assaf Arbelle, and 8 more authors

arXiv preprint arXiv:2502.09927, 2025
Ladder-Residual: Parallelism-Aware Architecture for Accelerating Large Model Inference with Communication Overlapping

Muru Zhang^*, Mayank Mishra^*, Zhongzhu Zhou, and 7 more authors

arXiv preprint arXiv:2501.06589, 2025

2024

Selective Self-Rehearsal: A Fine-Tuning Approach to Improve Generalization in Large Language Models

Sonam Gupta, Yatin Nandwani, Asaf Yehudai, and 4 more authors

arXiv preprint arXiv:2409.04787, 2024
Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler

Yikang Shen, Matthew Stallone, Mayank Mishra, and 6 more authors

arXiv preprint arXiv:2408.13359, 2024
Scaling Granite Code Models to 128K Context

Matt Stallone, Vaibhav Saxena, Leonid Karlinsky, and 8 more authors

arXiv preprint arXiv:2407.13739, 2024
Enhancing Training Efficiency Using Packing with Flash Attention

Achintya Kundu, Rhui Dih Lee, Laura Wynter, and 2 more authors

arXiv preprint arXiv:2407.09105, 2024
The infrastructure powering IBM’s Gen AI model development

Talia Gershon, Seetharami Seelam, Brian Belgodere, and 8 more authors

arXiv preprint arXiv:2407.05467, 2024
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention

William Brandon^*, Mayank Mishra^*, Aniruddha Nrusimha, and 2 more authors

Advances in Neural Information Processing Systems, 2024
Granite Code Models: A Family of Open Foundation Models for Code Intelligence

Mayank Mishra^*, Matt Stallone^*, Gaoyuan Zhang^*, and 8 more authors

arXiv preprint arXiv:2405.04324, 2024
Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models

Bowen Pan, Yikang Shen, Haokun Liu, and 5 more authors

2024
Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization

Aniruddha Nrusimha, Mayank Mishra, Naigang Wang, and 3 more authors

arXiv preprint arXiv:2404.03605, 2024
StarCoder 2 and The Stack v2: The Next Generation

Anton Lozhkov, Raymond Li, Loubna Ben Allal, and 8 more authors

arXiv preprint arXiv:2402.19173, 2024
BRAIn: Bayesian Reward-conditioned Amortized Inference for natural language generation from feedback

Gaurav Pandey, Yatin Nandwani, Tahira Naseem, and 6 more authors

2024
Aurora-M: Open Source Continual Pre-training for Multilingual Language and Code

Taishi Nakamura^*, Mayank Mishra^*, Simone Tedeschi^*, and 42 more authors

2024
Granite 3.0 Language Models

IBM Granite Team

Oct 2024

2023

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

BigScience Workshop, Teven Le Scao, and 391 more authors

2023
Prompting with Pseudo-Code Instructions

Mayank Mishra^*, Prince Kumar^*, Riyaz Bhat, and 3 more authors

2023
StarCoder: may the source be with you!

Raymond Li, Loubna Ben Allal, Yangtian Zi, and 64 more authors

2023
SantaCoder: don’t reach for the stars!

Loubna Ben Allal, Raymond Li, Denis Kocetkov, and 38 more authors

2023
Joint Reasoning on Hybrid-knowledge sources for Task-Oriented Dialog

Mayank Mishra, Danish Contractor, and Dinesh Raghu

2023

2022

Variational Learning for Unsupervised Knowledge Grounded Dialogs

Mayank Mishra, Dhiraj Madan, Gaurav Pandey, and 1 more author

In IJCAI-22, Jul 2022

2019

Adversarial Approximate Inference for Speech to Electroglottograph Conversion

Prathosh A. P.^*, Varun Srivastava^*, and Mayank Mishra^*

IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019

Variational Inference with Latent Space Quantization for Adversarial Resilience

Vinay Kyatham^*, Mayank Mishra^*, Tarun Kumar Yadav, and 2 more authors

2019