Publications

Publications by category in reverse chronological order. Generated by jekyll-scholar.

2025

  1. SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations
    Wentao Guo, Mayank Mishra, Xinle Cheng, and 2 more authors
    arXiv preprint arXiv:2512.14080, 2025
  2. FlashFormer: Whole-Model Kernels for Efficient Low-Batch Inference
    Aniruddha Nrusimha, William Brandon, Mayank Mishra, and 4 more authors
    arXiv preprint arXiv:2505.22758, 2025
  3. PaTH Attention: Position Encoding via Accumulating Householder Transformations
    Songlin Yang, Yikang Shen, Kaiyue Wen, and 5 more authors
    arXiv preprint arXiv:2505.16381, 2025
  4. Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence
    Granite Vision Team, Leonid Karlinsky, Assaf Arbelle, and 8 more authors
    arXiv preprint arXiv:2502.09927, 2025
  5. Ladder-Residual: Parallelism-Aware Architecture for Accelerating Large Model Inference with Communication Overlapping
    Muru Zhang*, Mayank Mishra*, Zhongzhu Zhou, and 7 more authors
    arXiv preprint arXiv:2501.06589, 2025

2024

  1. Selective Self-Rehearsal: A Fine-Tuning Approach to Improve Generalization in Large Language Models
    Sonam Gupta, Yatin Nandwani, Asaf Yehudai, and 4 more authors
    arXiv preprint arXiv:2409.04787, 2024
  2. Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler
    Yikang Shen, Matthew Stallone, Mayank Mishra, and 6 more authors
    arXiv preprint arXiv:2408.13359, 2024
  3. Scaling Granite Code Models to 128K Context
    Matt Stallone, Vaibhav Saxena, Leonid Karlinsky, and 8 more authors
    arXiv preprint arXiv:2407.13739, 2024
  4. Enhancing Training Efficiency Using Packing with Flash Attention
    Achintya Kundu, Rhui Dih Lee, Laura Wynter, and 2 more authors
    arXiv preprint arXiv:2407.09105, 2024
  5. The infrastructure powering IBM’s Gen AI model development
    Talia Gershon, Seetharami Seelam, Brian Belgodere, and 8 more authors
    arXiv preprint arXiv:2407.05467, 2024
  6. Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
    William Brandon*, Mayank Mishra*, Aniruddha Nrusimha, and 2 more authors
    Advances in Neural Information Processing Systems, 2024
  7. Granite Code Models: A Family of Open Foundation Models for Code Intelligence
    Mayank Mishra*, Matt Stallone*, Gaoyuan Zhang*, and 8 more authors
    arXiv preprint arXiv:2405.04324, 2024
  8. Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models
    Bowen Pan, Yikang Shen, Haokun Liu, and 5 more authors
    2024
  9. Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization
    Aniruddha Nrusimha, Mayank Mishra, Naigang Wang, and 3 more authors
    arXiv preprint arXiv:2404.03605, 2024
  10. StarCoder 2 and The Stack v2: The Next Generation
    Anton Lozhkov, Raymond Li, Loubna Ben Allal, and 8 more authors
    arXiv preprint arXiv:2402.19173, 2024
  11. BRAIn: Bayesian Reward-conditioned Amortized Inference for natural language generation from feedback
    Gaurav Pandey, Yatin Nandwani, Tahira Naseem, and 6 more authors
    2024
  12. Aurora-M: Open Source Continual Pre-training for Multilingual Language and Code
    Taishi Nakamura*, Mayank Mishra*, Simone Tedeschi*, and 42 more authors
    2024
  13. Granite 3.0 Language Models
    IBM Granite Team
    Oct 2024

2023

  1. BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
    BigScience Workshop, Teven Le Scao, and 391 more authors
    2023
  2. Prompting with Pseudo-Code Instructions
    Mayank Mishra*, Prince Kumar*, Riyaz Bhat, and 3 more authors
    2023
  3. StarCoder: may the source be with you!
    Raymond Li, Loubna Ben Allal, Yangtian Zi, and 64 more authors
    2023
  4. SantaCoder: don’t reach for the stars!
    Loubna Ben Allal, Raymond Li, Denis Kocetkov, and 38 more authors
    2023
  5. Joint Reasoning on Hybrid-knowledge sources for Task-Oriented Dialog
    Mayank Mishra, Danish Contractor, and Dinesh Raghu
    2023

2022

  1. Variational Learning for Unsupervised Knowledge Grounded Dialogs
    Mayank Mishra, Dhiraj Madan, Gaurav Pandey, and 1 more author
    In IJCAI-22, Jul 2022

2019

  1. Adversarial Approximate Inference for Speech to Electroglottograph Conversion
    Prathosh A. P.*, Varun Srivastava*, and Mayank Mishra*
    IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019
  2. Variational Inference with Latent Space Quantization for Adversarial Resilience
    Vinay Kyatham*, Mayank Mishra*, Tarun Kumar Yadav, and 2 more authors
    2019