Publications
Publications by category in reverse chronological order. Generated by jekyll-scholar.
2025
- SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations. arXiv preprint arXiv:2512.14080, 2025
- FlashFormer: Whole-Model Kernels for Efficient Low-Batch Inference. arXiv preprint arXiv:2505.22758, 2025
- PaTH Attention: Position Encoding via Accumulating Householder Transformations. arXiv preprint arXiv:2505.16381, 2025
- Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence. arXiv preprint arXiv:2502.09927, 2025
- Ladder-Residual: Parallelism-Aware Architecture for Accelerating Large Model Inference with Communication Overlapping. arXiv preprint arXiv:2501.06589, 2025
2024
- Selective Self-Rehearsal: A Fine-Tuning Approach to Improve Generalization in Large Language Models. arXiv preprint arXiv:2409.04787, 2024
- Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler. arXiv preprint arXiv:2408.13359, 2024
- Scaling Granite Code Models to 128K Context. arXiv preprint arXiv:2407.13739, 2024
- Enhancing Training Efficiency Using Packing with Flash Attention. arXiv preprint arXiv:2407.09105, 2024
- The infrastructure powering IBM’s Gen AI model development. arXiv preprint arXiv:2407.05467, 2024
- Reducing Transformer Key-Value Cache Size with Cross-Layer Attention. Advances in Neural Information Processing Systems, 2024
- Granite Code Models: A Family of Open Foundation Models for Code Intelligence. arXiv preprint arXiv:2405.04324, 2024
- Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models. 2024
- Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization. arXiv preprint arXiv:2404.03605, 2024
- StarCoder 2 and The Stack v2: The Next Generation. arXiv preprint arXiv:2402.19173, 2024
- BRAIn: Bayesian Reward-conditioned Amortized Inference for natural language generation from feedback. 2024
- Aurora-M: Open Source Continual Pre-training for Multilingual Language and Code. 2024
- Granite 3.0 Language Models. Oct 2024
2023
- BLOOM: A 176B-Parameter Open-Access Multilingual Language Model. 2023
- Prompting with Pseudo-Code Instructions. 2023
- StarCoder: may the source be with you! 2023
- SantaCoder: don’t reach for the stars! 2023
- Joint Reasoning on Hybrid-knowledge sources for Task-Oriented Dialog. 2023
2022
2019
- Adversarial Approximate Inference for Speech to Electroglottograph Conversion. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019
- Variational Inference with Latent Space Quantization for Adversarial Resilience. 2019