About Me

Hi, I'm Mayank Mishra.

I am currently working as a Research Engineer at IBM Research India. I am a B.Tech graduate in Electrical Engineering from the Indian Institute of Technology Delhi.
My research interests include Quantum Computing, Deep Learning, Reinforcement Learning, Natural Language Processing, Classical AI and Information Theory.
I also love listening to music and gaming. I regularly stream on Twitch and sometimes on YouTube. Feel free to stop by and say hi.

Work experience

BigCode [Oct 2022 - Present]

  • Working on the training and inference team, reducing training time and inference latency for large language models
  • Working on implementing Multi-Query Attention and Flash Attention in Megatron-LM for distributed training (a minimal multi-query attention sketch follows below)
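
For illustration, here is a minimal PyTorch sketch of multi-query attention, in which all query heads share a single key/value head so the KV cache shrinks by a factor of the head count. It is a toy under assumed shapes and names, not the Megatron-LM implementation.

    # Toy multi-query attention: all query heads share one key/value head.
    # Illustrative sketch only, not the Megatron-LM implementation.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MultiQueryAttention(nn.Module):
        def __init__(self, hidden_size: int, num_heads: int):
            super().__init__()
            assert hidden_size % num_heads == 0
            self.num_heads = num_heads
            self.head_dim = hidden_size // num_heads
            self.q_proj = nn.Linear(hidden_size, hidden_size)
            self.kv_proj = nn.Linear(hidden_size, 2 * self.head_dim)  # one shared K/V head
            self.out_proj = nn.Linear(hidden_size, hidden_size)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            batch, seq_len, _ = x.shape
            q = self.q_proj(x).view(batch, seq_len, self.num_heads, self.head_dim).transpose(1, 2)
            k, v = self.kv_proj(x).split(self.head_dim, dim=-1)     # (batch, seq, head_dim)
            k, v = k.unsqueeze(1), v.unsqueeze(1)                   # broadcast over query heads
            scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5
            causal = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device), 1)
            scores = scores.masked_fill(causal, float("-inf"))
            out = F.softmax(scores, dim=-1) @ v                     # (batch, heads, seq, head_dim)
            return self.out_proj(out.transpose(1, 2).reshape(batch, seq_len, -1))

    # usage
    attn = MultiQueryAttention(hidden_size=512, num_heads=8)
    print(attn(torch.randn(2, 16, 512)).shape)   # torch.Size([2, 16, 512])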

IBM Research [Aug 2020 - Present]

SAMSUNG Research Institute [Aug 2020 - Present]

  • Implemented a new user authentication system that uses smartphone sensors for real-time authentication
  • Used LSTM-based Variational Autoencoders to project the obtained time series onto a lower-dimensional manifold (a minimal encoder is sketched below)
  • Used few-shot learning to reason over user profiles with minimal data points in an online learning environment
  • Created and deployed an Android application on several devices and optimized its battery consumption
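
A minimal sketch of an LSTM-based VAE encoder that compresses a window of sensor readings into a low-dimensional latent code follows; layer sizes and input channels are assumptions for illustration, not the deployed model.

    # Toy LSTM-VAE encoder: compress a sensor time series into a low-dimensional latent code.
    # Layer sizes and the 6-channel input are assumptions, not the deployed model.
    import torch
    import torch.nn as nn

    class LSTMVAEEncoder(nn.Module):
        def __init__(self, n_features: int, hidden_size: int = 64, latent_dim: int = 8):
            super().__init__()
            self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
            self.to_mu = nn.Linear(hidden_size, latent_dim)
            self.to_logvar = nn.Linear(hidden_size, latent_dim)

        def forward(self, x: torch.Tensor):
            # x: (batch, time_steps, n_features), e.g. accelerometer + gyroscope channels
            _, (h_n, _) = self.lstm(x)
            h = h_n[-1]                                              # final hidden state
            mu, logvar = self.to_mu(h), self.to_logvar(h)
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
            return z, mu, logvar

    # usage: project a batch of 2-second windows (100 samples, 6 sensor channels)
    encoder = LSTMVAEEncoder(n_features=6)
    z, mu, logvar = encoder(torch.randn(32, 100, 6))
    print(z.shape)   # torch.Size([32, 8])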

Publications

Variational Learning for Unsupervised Knowledge Grounded Dialogs [Published in IJCAI]

  • Proposed a model to generate responses for dialogs grounded on information present in external knowledge sources
  • Retrieved the relevant textual documents from a large indexed collection of documents in an unsupervised fashion
  • Used a variational framework to take advantage of the posterior distribution to retrieve better documents during training
  • Also showed the efficacy of the proposed model on other tasks such as question answering and classification
  • Published in International Joint Conferences on Artificial Intelligence (IJCAI 2022)

Adversarial Approximate Inference for Speech to Electroglottograph Conversion [Published in IEEE TASLP]

  • Optimized the Speech to Laryngograph encoder using adversarial training for the network using informative priors
  • Created a cosine-based loss function to enforce amplitude invariance between the ground truth and the network output (sketched below)
  • Used a variational inference approach for learning optimal representations for speech signal to infer the EGG signal
  • Demonstrated the advantages of using informative priors over Gaussian priors in the variational autoencoder setting
  • Utilized continuous wavelet transforms using Ricker wavelets for robust peak picking
  • Published in IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP) (Volume 27, Issue 12)

arXiv preprint
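
As an illustration of the cosine-based, amplitude-invariant loss mentioned above, a minimal sketch is given below; the paper's exact formulation may differ.

    # Toy cosine-similarity loss: penalizes shape mismatch between prediction and target
    # while staying invariant to overall amplitude (rescaling either signal leaves it unchanged).
    # The formulation used in the paper may differ.
    import torch
    import torch.nn.functional as F

    def cosine_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # pred, target: (batch, signal_length)
        return 1.0 - F.cosine_similarity(pred, target, dim=-1).mean()

    t = torch.linspace(0, 6.28, 200)
    pred = 3.0 * torch.sin(t).unsqueeze(0)       # same shape, 3x the amplitude
    target = torch.sin(t).unsqueeze(0)
    print(cosine_loss(pred, target))             # ~0 despite the amplitude mismatch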

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model [Submitted to JMLR]

  • Contributed to the training codebase of BLOOM, a 176-Billion parameter multilingual large language model
  • Implemented state-of-the-art ALiBi positional encodings to attend over long sequences unseen during training (sketched below)
  • Also implemented a novel checkpoint reshaping strategy to change the distributed configuration of a large model
  • Worked in open collaboration with researchers from all over the world and gained experience in large-scale training
  • Submitted to Journal of Machine Learning Research (JMLR)

arXiv preprint
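
For reference, a minimal sketch of ALiBi-style attention biases (a per-head linear penalty proportional to the query-key distance, which is what allows extrapolation to longer sequences) is below; it assumes a power-of-two head count and is not the Megatron-DeepSpeed implementation.

    # Toy ALiBi bias: each head adds a linear distance penalty to its attention scores,
    # which lets the model attend over sequences longer than those seen in training.
    # Assumes a power-of-two head count; not the Megatron-DeepSpeed implementation.
    import torch

    def alibi_slopes(num_heads: int) -> torch.Tensor:
        # geometric slopes 2^(-8/n), 2^(-16/n), ..., 2^(-8) as in the ALiBi paper
        return torch.tensor([2.0 ** (-8.0 * (i + 1) / num_heads) for i in range(num_heads)])

    def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
        pos = torch.arange(seq_len)
        distance = (pos[None, :] - pos[:, None]).clamp(max=0)     # 0 for future keys, negative for past
        return alibi_slopes(num_heads)[:, None, None] * distance  # (heads, seq, seq), added to scores

    print(alibi_bias(num_heads=8, seq_len=4)[0])   # head 0: growing penalty for distant past keys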

Joint Reasoning on Hybrid-knowledge sources for Task-Oriented Dialog [Submitted to EACL]

  • Worked on generating agent responses to conversations requiring reasoning over both structured databases and unstructured text documents
  • Created a modified version of the MultiWOZ dataset and showed that existing methods failed on the created dataset
  • Proposed a baseline model trained with Prompt+LM tuning to retrieve the relevant information (from both structured and unstructured sources) and generate the response
  • Submitted to European Chapter for the Association for Computational Linguistics (EACL 2023)

arXiv preprint

Variational Inference with Latent Space Quantization for Adversarial Resilience [Submitted to AAAI]

  • Implemented a defense mechanism capitalizing on the expressive power of regularized latent space generative models
  • Trained Variational Autoencoders with a K-Lipschitz encoder to ensure closeness of similar images in the latent space
  • Proposed a mechanism for defending neural networks against adversarial examples using latent space quantization (sketched below)
  • Demonstrated the efficacy of the proposed mechanism against multiple attack types (black and white box) and methods
  • Submitted to Association for the Advancement of Artificial Intelligence (AAAI)

arXiv preprint
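
A minimal sketch of the latent space quantization idea (encode the input, snap the latent code to its nearest codebook entry, and classify from the quantized code) is below; the encoder, codebook size and classifier are assumptions, not the paper's exact pipeline.

    # Toy latent-space quantization: snap a latent code to its nearest codebook entry so that
    # small adversarial perturbations, which barely move the code, map to the same entry.
    # Codebook size and latent width are assumptions, not the paper's exact pipeline.
    import torch

    def quantize(z: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
        # z: (batch, latent_dim), codebook: (num_codes, latent_dim)
        nearest = torch.cdist(z, codebook).argmin(dim=-1)
        return codebook[nearest]

    codebook = torch.randn(256, 16)
    z_clean = torch.randn(4, 16)                     # stands in for encoder(image)
    z_adv = z_clean + 0.01 * torch.randn(4, 16)      # stands in for encoder(perturbed image)
    print(torch.equal(quantize(z_clean, codebook), quantize(z_adv, codebook)))   # usually True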

Patents

EdgeEGG - A system and method for a hand-held, electrode-free electroglottograph using neural networks on programmable controllers

  • Proposed a safe contact-free ElectroGlottoGraph which provides an accurate estimate of EGG signal
  • Proposed a cost-effective and efficient mechanism with integrated speech sensors to allow edge computation of EGG
  • Designed a resource-efficient hardware device, optimizing both energy consumption and prediction latency by performing computations on very low-power micro-controllers
  • Indian Patent application, No. 201911036593

Projects

BLOOM-176B Inference Open-Source [Jul 2022 - Present]

  • Created an easy-to-use framework for deploying BLOOM-176B via a REST API or CLI for inference purposes
  • Experimented with approaches like HuggingFace Accelerate, DeepSpeed-Inference and DeepSpeed ZeRO for inference
  • Benchmarked throughput and latency for the 176-billion-parameter model on a single node with 8 A100 80GB GPUs (a minimal timing harness is sketched below)
  • Experimented with quantization approaches (LLM.int8() and ZeroQuant) to reduce the memory footprint of the model
  • Contributed source code to Megatron-DeepSpeed for benchmarking and serving BLOOM-176B with ease, and added support for fp16, bf16 and quantized BLOOM models using both the LLM.int8() and ZeroQuant quantization approaches
  • Official code maintainer for huggingface/transformers-bloom-inference repository
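
A minimal, model-agnostic timing harness of the kind used for the numbers above might look like the sketch below; generate_fn, the batch size and the token count are placeholders, not the code in huggingface/transformers-bloom-inference.

    # Minimal latency/throughput harness: time a generation callable and report average
    # per-request latency and generated tokens per second.
    # `generate_fn`, batch size and token counts are placeholders, not the repository's code.
    import time

    def benchmark(generate_fn, batch_size: int, new_tokens: int, runs: int = 5):
        latencies = []
        for _ in range(runs):
            start = time.perf_counter()
            generate_fn()                             # e.g. model.generate(...) on a fixed batch
            latencies.append(time.perf_counter() - start)
        latency = sum(latencies) / len(latencies)
        throughput = batch_size * new_tokens / latency
        return latency, throughput

    # usage with a dummy workload standing in for the model call
    latency, throughput = benchmark(lambda: time.sleep(0.1), batch_size=4, new_tokens=100)
    print(f"{latency:.3f} s per batch, {throughput:.0f} tokens/s")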

BLOOM-176B prompt tuning [Jul 2022 - Present]

  • Used prompt tuning (sketched below) to improve the performance of BLOOM-176B on a variety of tasks and datasets
  • Used DeepSpeed ZeRO stage 3 for prompt tuning in a distributed environment with 2 nodes with 8 GPUs each
  • Achieved state-of-the-art performance by training only 100k (6 × 10⁻⁵%) parameters of the BLOOM-176B model
  • Also created a REST API interface for serving prompt-tuned large language models like BLOOM
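
A minimal sketch of the soft prompt behind prompt tuning (a small matrix of trainable virtual-token embeddings prepended to the frozen model's input embeddings) is below. The 7-token prompt is an assumption chosen so that, with BLOOM's 14336-dimensional hidden state, roughly 100k parameters are trained, i.e. about 6 × 10⁻⁵% of 176B.

    # Toy prompt tuning: the base model stays frozen; only a small matrix of virtual-token
    # embeddings, prepended to the input embeddings, is trained.
    # The 7-token prompt length is an assumption for illustration.
    import torch
    import torch.nn as nn

    class SoftPrompt(nn.Module):
        def __init__(self, num_virtual_tokens: int = 7, hidden_size: int = 14336):
            super().__init__()
            self.prompt = nn.Parameter(torch.randn(num_virtual_tokens, hidden_size) * 0.02)

        def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
            # input_embeds: (batch, seq_len, hidden_size) from the frozen embedding layer
            batch = input_embeds.size(0)
            return torch.cat([self.prompt.expand(batch, -1, -1), input_embeds], dim=1)

    soft_prompt = SoftPrompt()
    trainable = sum(p.numel() for p in soft_prompt.parameters())
    print(trainable, f"{trainable / 176e9 * 100:.1e} % of 176B")   # 100352, ~5.7e-05 %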

BLOOM-176B large-scale serving [Jul 2022 - Present]

  • Created a framework for serving large-scale models, built on the BLOOM-176B open-source contribution, for over 200 researchers
  • Provided a variety of options (just like GPT-3), including a UI and API calls for generated outputs, embeddings, etc.
  • Also implemented a novel continuous batching scheme to increase server throughput and serve multiple users concurrently (sketched below)
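
A highly simplified sketch of the continuous batching idea is below: after every decode step, finished requests leave the running batch and queued requests take their slots, instead of the whole batch waiting for its longest request. The scheduling and generation calls are placeholders, not the server's implementation.

    # Highly simplified continuous batching loop; step_fn stands in for one decode step
    # over the running batch. The real server's scheduling and generation code differ.
    from collections import deque

    def serve(request_queue: deque, step_fn, max_batch_size: int = 8):
        running = []
        while request_queue or running:
            while request_queue and len(running) < max_batch_size:   # top up the batch
                running.append(request_queue.popleft())
            step_fn(running)                                          # one token for every request
            still_running = []
            for request in running:
                if request["generated"] >= request["max_new_tokens"]:
                    print(f"request {request['id']} done")            # slot freed for the queue
                else:
                    still_running.append(request)
            running = still_running

    def fake_step(batch):
        for request in batch:
            request["generated"] += 1

    queue = deque({"id": i, "generated": 0, "max_new_tokens": 3 + i} for i in range(10))
    serve(queue, fake_step, max_batch_size=4)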

Distributed Pretraining for GPT2 [Jul 2022 - Present]

  • Pretrained a GPT2-like decoder model on 16 A100 80GB GPUs on a large text corpus using Megatron-DeepSpeed
  • Experimented with different parallel configurations like Tensor Parallel, Pipeline Parallel & Fully Sharded Data Parallel
  • Optimized the model (3.55 B parameters) for high training throughput with the different parallel configurations
  • Found the best parallel configuration for taking advantage of both inter-node and intra-node GPU interconnects (candidate layouts are enumerated in the sketch below)
  • Compared 32-bit and 16-bit precision for training speed, and experimented with fp16 and bf16 for training stability
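
The candidate 3D-parallel layouts for a fixed GPU count can be enumerated mechanically, as in the sketch below for the 16-GPU setup above; which layout is actually fastest has to be measured, since it depends on the model size and the interconnects.

    # Enumerate candidate (tensor, pipeline, data) parallel degrees whose product is the GPU count.
    # Which layout is fastest still has to be benchmarked on the actual cluster.
    def parallel_layouts(num_gpus: int):
        layouts = []
        for tp in range(1, num_gpus + 1):
            if num_gpus % tp:
                continue
            for pp in range(1, num_gpus // tp + 1):
                if (num_gpus // tp) % pp:
                    continue
                layouts.append((tp, pp, num_gpus // (tp * pp)))
        return layouts

    for tp, pp, dp in parallel_layouts(16):
        print(f"tensor={tp:2d}  pipeline={pp:2d}  data={dp:2d}")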

COVID-ASSIST [July 2021 - September 2021]

  • Worked on authoring a Watson Assistant skill to help out fellow IBMers in India during the COVID pandemic
  • Authored the skill to allow users to request medicines, emergency supplies, vaccination, information about COVID, doctor's appointments, etc.
  • The work involved external collaborations with organizations such as the Indian Council of Medical Research (ICMR), the Department of Health Research, the Ministry of Health and Family Welfare, and the Government of India

Watson Assistant Dialog Runtime [February 2018 - August 2019]

  • Worked on improving the customer experience: fixing customer issues and PII leaks, providing new features for easier authoring of skills, and catching unexpected exceptions that sometimes left the Assistant stuck in unforeseen states
  • Also worked on upgrading dependencies that serve as the backbone of the dialog runtime, such as Google's Gson and the Spring Expression Language project, to reduce vulnerabilities
  • Removed the cloned repositories of these open-source projects and modified the dialog runtime codebase to source their JARs from the Maven repository instead of building the entire repositories from scratch
  • Doing so reduced the dialog runtime's build times from 10-11 minutes to 4-5 minutes and shrank the codebase by roughly 30,000 lines, making the code easier for new team members to understand and future upgrades much simpler

DSTC9 Track 1 Challenge [February 2018 - August 2019]

  • Participated in the DSTC9 challenge organized by Amazon Alexa
  • Created models for generating responses to task-oriented dialogs where the required knowledge lies in external documents
  • Worked on retrieving the relevant knowledge in both supervised and unsupervised settings

Real-time Visual Respiration Rate Estimation with Dynamic Scene Adaptation [April 2019 - May 2019]

  • Used Computer Vision techniques to estimate the respiration rate from video footage of an individual (the core signal-processing step is sketched below)
  • Used the proposed algorithm to correctly identify patients suffering from pneumonia (fast breathing)
  • Implemented and optimized the algorithm to run on Raspberry Pi for detection in real-time in hospitals
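
The core signal-processing step can be sketched as below: average the pixel intensity of a chest region per frame and read off the dominant frequency in the breathing band. The region tracking, scene adaptation and Raspberry Pi optimizations of the actual project are not shown.

    # Toy respiration-rate estimate from a stack of chest-region crops: average intensity per
    # frame, then pick the dominant frequency in a plausible breathing band (0.1-1.5 Hz).
    # Region tracking and scene adaptation from the actual project are not shown.
    import numpy as np

    def respiration_rate(chest_roi_frames: np.ndarray, fps: float) -> float:
        # chest_roi_frames: (num_frames, height, width) grayscale crops of the chest region
        signal = chest_roi_frames.mean(axis=(1, 2))
        signal = signal - signal.mean()
        freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
        spectrum = np.abs(np.fft.rfft(signal))
        band = (freqs >= 0.1) & (freqs <= 1.5)
        return 60.0 * freqs[band][spectrum[band].argmax()]   # breaths per minute

    # synthetic check: 0.3 Hz breathing (18 breaths/min) recorded at 30 fps for 20 s
    t = np.arange(0, 20, 1 / 30)
    frames = 100 + 5 * np.sin(2 * np.pi * 0.3 * t)[:, None, None] * np.ones((1, 32, 32))
    print(respiration_rate(frames, fps=30))   # ~18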

Resource and profit optimization in the electricity market [April 2019 - May 2019]

  • Developed new models for evaluating flexible resources in two-settlement electricity markets (day-ahead and real-time)
  • Worked on achieving equilibrium in two-settlement electricity markets using the Alternating Direction Method of Multipliers (a generic ADMM sketch follows below)
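
ADMM itself is a general splitting method; the toy sketch below applies it to a small lasso problem purely to show the x-update, z-update (soft-thresholding) and dual-update structure, and is unrelated to the actual two-settlement market formulation.

    # Generic ADMM illustration on a toy lasso problem:
    #   minimize 0.5 * ||A x - b||^2 + lam * ||z||_1   subject to  x = z
    # Unrelated to the two-settlement market model; it only shows the ADMM update structure.
    import numpy as np

    def admm_lasso(A, b, lam=0.1, rho=1.0, iters=200):
        n = A.shape[1]
        x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
        AtA, Atb = A.T @ A, A.T @ b
        for _ in range(iters):
            x = np.linalg.solve(AtA + rho * np.eye(n), Atb + rho * (z - u))     # x-update
            z = np.sign(x + u) * np.maximum(np.abs(x + u) - lam / rho, 0.0)     # soft threshold
            u = u + x - z                                                       # dual update
        return z

    rng = np.random.default_rng(0)
    A = rng.normal(size=(50, 10))
    x_true = np.zeros(10)
    x_true[:3] = [1.0, -2.0, 0.5]
    b = A @ x_true + 0.01 * rng.normal(size=50)
    print(np.round(admm_lasso(A, b), 2))   # approximately recovers the sparse x_true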

Bias Correction in Deep Neural Networks [October 2018 - November 2018]

  • Worked on reducing dataset bias in neural networks for better generalization without training on multiple datasets
  • Trained an Auxiliary Classifier GAN (ACGAN) to generate images conditioned on the class label from the MNIST dataset
  • Used the original MNIST images and the conditionally generated images from the ACGAN to train a CNN classifier
  • Tested this classifier on a hand-written digits dataset collected in a classroom and achieved state-of-the-art performance
  • Released the 'Nearly MNIST' dataset for future research

Lecture Summarization using Deep Learning [February 2019 - March 2019]

  • Trained Convolutional LSTMs for summarizing video lectures of various online courses
  • Used Computer Vision techniques to find edge maps, optical flows and difference of consecutive frames of the videos
  • Used the engineered features for increased accuracy over conventional recurrent networks trained using raw frames
  • Implemented a WPF application in C# to summarize the video lectures and generate lecture notes in PDF format

Touch-Point Prediction using Deep Learning [May 2018 - December 2018]

  • Worked on improving touch-screen latency for the SAMSUNG Flip device without explicitly changing the hardware
  • Trained and benchmarked Fully Connected Networks, RNNs and LSTMs and analyzed their runtime performance (a toy predictor is sketched below)
  • Implemented these algorithms on the device, yielding a low error rate with no significant impact on performance
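
A toy version of such a sequence model is sketched below: an LSTM reads the last few (x, y) touch samples and regresses the next point so the UI can render slightly ahead of the sensed input. The architecture and features are assumptions, not the on-device model.

    # Toy touch-point predictor: an LSTM over recent (x, y) samples regresses the next point.
    # Architecture and input features are assumptions, not the on-device model.
    import torch
    import torch.nn as nn

    class TouchPredictor(nn.Module):
        def __init__(self, hidden_size: int = 32):
            super().__init__()
            self.lstm = nn.LSTM(input_size=2, hidden_size=hidden_size, batch_first=True)
            self.head = nn.Linear(hidden_size, 2)

        def forward(self, touches: torch.Tensor) -> torch.Tensor:
            # touches: (batch, history_length, 2) recent normalized (x, y) coordinates
            out, _ = self.lstm(touches)
            return self.head(out[:, -1])      # predicted next (x, y)

    model = TouchPredictor()
    print(model(torch.rand(1, 8, 2)).shape)   # torch.Size([1, 2])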

Braille Tutoring Application [January 2018 - May 2018]

  • Implemented tutorials and games using Python for comprehensive learning of Braille by visually challenged students
  • Created a Linux based secondary software for the tutor to add customized exercises or games in the application
  • Deployed the application on a Beaglebone-based device running a Refreshable Braille Display
  • Provided tactile output and sound using an external Arduino based device connected to the Refreshable Braille Display
  • Tested the application with visually challenged students at the National Association for the Blind

Identifying the Diabetic Neuropathic Patients using Machine Learning [November 2017 - December 2017]

  • Trained bi-directional LSTMs for the identification of Diabetic Neuropathic patients using foot pressure data
  • Implemented a WPF application in C# to record data using an Arduino-based pressure mat