Maulik Madhavi

View My LinkedIn Profile

View My GitHub Profile

About Maulik Madhavi

Maulik Madhavi has been associated with the speech signal processing field since 2010. He received a Ph.D. degree in Information and Communication Systems from the Dhirubhai Ambani Institute of Information and Communication Technology (DA‑IICT) in Gandhinagar, India, in 2017. He received an M.Tech. degree in ICT with a specialization in Communication Systems from DA‑IICT in Gandhinagar, India.

He was part of the consortium project “Development of Prosodically Guided Phonetic Engine for Searching Speech Databases in Indian Languages”, sponsored by the Department of Electronics and Information Technology (DeitY), India, from April 2012 to June 2014 (two years and three months). During his master’s and doctoral studies at DA‑IICT, he served as a teaching assistant/tutor for eight different courses from August 2009 to April 2012 and again from July 2014 to May 2017.

He was a research fellow from December 2017 to April 2021 at the National University of Singapore (NUS). He also mentored seven NUS graduate students on their final‑year projects (FYPs) and one master’s student. Please refer to this link for relevant materials. He was involved in several research projects on spoken‑dialogue systems for autonomous vehicles and on speech recognition for healthcare.

Since April 2021, he has worked as a video analytics researcher at NCS Pte. Ltd., where he develops several algorithms related to video analytics. He is also actively researching generative‑AI projects aimed at improving vision systems. In addition, he is involved in the in‑house no‑code platform (kaicc), where he has contributed to algorithms and model development for vision-model training and inference.

He received an IAPR (International Association for Pattern Recognition) travel scholarship for presenting a joint paper at the 2012 International Conference on Biometrics (ICB 2012) in Delhi, India.

Research Interests

Research Keywords

Speech signal processing, speech information retrieval, dialogue understanding, technology for spoken language processing, feature indexing for large-scale search, visual perception via vision-language models, object detection, image recognition.

Tools

Contact

E-mail: maulikmadhavi[AT]gmail[DOT]com

Research Projects

Autonomous Bus Chatbot

Brief Info:

Skills involved:

Wake-up Word in Android UI

Skills involved:

Publications

Journal Publications

  1. M. C. Madhavi and H. A. Patil, “Vocal Tract Length Normalization using a Gaussian mixture model framework for query-by-example spoken term detection,” in Computer Speech & Language, Elsevier, vol. 58, pp. 175-202, November 2019.

  2. M. C. Madhavi, and H. A. Patil, “Design of Mixture of GMMs for Query-by-Example Spoken Term Detection,” in Computer Speech & Language, Elsevier, vol. 52, pp. 41-55, November 2018.

  3. H. A. Patil, and M. C. Madhavi, “Combining Evidence from Magnitude and Phase Information using VTEO for Person Recognition using Humming,” in the special issue on Recent Advances in Speaker and Language Recognition and Characterization, Computer Speech & Language, Elsevier, vol. 52, pp. 225-256, November 2018.

  4. M. C. Madhavi, and H. A. Patil, “Partial Matching and Search Space Reduction for QbE-STD,” in Computer Speech & Language, Elsevier, vol. 45, pp. 58-82, September 2017.

  5. H. A. Patil, M. C. Madhavi, K. K. Parhi, “Static and dynamic information derived from source and system features for person recognition from humming,” Int. J. Speech Technology, vol. 15, no. 3, pp. 393-406, 2012.

Book Chapters

  1. M. C. Madhavi, and H. A. Patil, ‘‘Spoken Keyword Retrieval using Source and System Features,’’ Int. Conf. on Pattern Recognition and Machine Intelligence (PReMI), Kolkata, India, Dec. 05 - 08, 2017.

  2. M. C. Madhavi, S. Sharma, H. A. Patil, ‘‘VTLN Using Different Warping Functions for Template Matching,’’ Machine Intelligence and Big Data in Industry, Springer International Publishing, D. Ryżko, P. Gawrysiak, M. Kryszkiewicz, H. Rybiński, (Eds.), pp. 111-121, 2016.

  3. M. C. Madhavi, S. Sharma, and H. A. Patil, ‘‘Vocal tract length normalization features for audio search,’’ in Int. Conf. Text, Speech, and Dialogue, TSD, P. Král, V. Matoušek (Eds.), Pilsen, Czech Republic, pp. 387-395, 2015.

  4. Y. Gaur, M. C. Madhavi, and H. A. Patil, ‘‘Speaker recognition using sparse representation via superimposed features,’’ in P. Maji et al. (Eds.), Lecture Notes in Computer Science (LNCS), vol. 8251, pp. 140-147, Springer-Verlag, Berlin Heidelberg, Germany, 2013.

  5. H. A. Patil, M. C. Madhavi, R. Jain, and A. Jain, ‘‘Combining evidence from temporal and spectral features for person recognition from humming,’’ in Malay K. Kundu et al. (Eds.), PerMIn, Lecture Notes in Computer Science (LNCS), vol. 7143, pp. 321-328, Springer-Verlag, 2012.

International Conferences

  1. B. Sharma, M. Madhavi, X. Zhou, H. Li, “Exploring teacher-student learning approach for multi-lingual speech-to-intent classification,’’ in IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Dec. 2021.

  2. R. Das, M. Madhavi, H. Li, “Diagnosis of COVID-19 using Auditory Acoustic Cues,’’ in Proc. Interspeech, Brno, Czech Republic, Aug-Sep. 2021, pp. 921-925.

  3. Y. Jiang, B. Sharma, M. Madhavi, H. Li, “Knowledge Distillation from BERT Transformer to Speech Transformer for Intent Classification,’’ in Proc. Interspeech, Brno, Czech Republic, Aug-Sep. 2021, pp. 4713-4717.

  4. X. Qian, M. Madhavi, Z. Pan, J. Wang, H. Li, “Multi-target DoA estimation with an audio-visual fusion mechanism,” in 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, Canada, June 2021, pp. 4280-4284.

  5. B. Sharma, M. Madhavi, H. Li, “Leveraging Acoustic and Linguistic Embeddings from Pretrained Speech and Language Models for Intent Classification,” in 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, Canada, June 2021, pp. 7498-7502.

  6. Y. Ong, M. Madhavi, and K. Chan, “OPENNLU: Open-Source Web-Interface NLU Toolkit for Development of Conversational Agent”, in Proc. Asia-Pacific Signal and Information Processing Association (APSIPA) Annual Summit and Conference (ASC) 2020, Auckland, New Zealand, December 2020, pp. 381-385.

  7. N. Shah, Sreeraj R, M. Madhavi, N. Shah, and H. Patil, “Query-by-Example Spoken Term Detection using Generative Adversarial Network”, in Proc. Asia-Pacific Signal and Information Processing Association (APSIPA) Annual Summit and Conference (ASC) 2020, Auckland, New Zealand, December 2020, pp. 644-648.

  8. W. Lin, M. Madhavi, R. Das and H. Li, “Transformer-based Arabic Dialect Identification,” in Proc. International Conference on Asian Language Processing (IALP), Kuala Lumpur, Malaysia, December 2020, pp. 192-196.

  9. T. Liu, R. Das, M. Madhavi, S. Shen and H. Li, “Speaker-Utterance Dual Attention for Speaker and Utterance Verification,” in Interspeech, Shanghai, China, October 2020, pp. 4293-4297.

  10. R. Sheelvant, B. Sharma, M. Madhavi, R. Das, S.R.M. Prasanna and H. Li, “RSL2019: A Realistic Speech Localization Corpus,” in Proc. International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (COCOSDA), Cebu City, Philippines, October 2019.

  11. T. Liu, M. Madhavi, R. Das and H. Li, “A Unified Framework for Speaker and Utterance Verification,” in INTERSPEECH 2019, Graz, Austria, September 2019, pp. 4320-4324.

  12. M. Madhavi, T. Zhan, H. Li and M. Yuan, “First Leap Towards Development of Dialogue System for Autonomous Bus,” in Proc. International Workshop on Spoken Dialogue Systems Technology (IWSDS), Sicily, Italy, April 2019, pp. 1-6.

  13. R. Das, M. Madhavi, and H. Li, “Compensating Utterance Information in Fixed Phrase Speaker Verification,” in Asia Pacific Signal and Information Processing Association (APSIPA), 12-15 Nov. 2018, Honolulu, Hawaii, USA.

  14. P. Tapkir, M. Kamble, H. Patil, and M. Madhavi, “Replay Spoof Detection using Power Function Based Features,” in Asia Pacific Signal and Information Processing Association (APSIPA), 12-15 Nov. 2018, Honolulu, Hawaii, USA.

  15. N. Shah, M. Madhavi, and H. Patil, “Unsupervised Vocal Tract Length Warped Posterior Features for Non-Parallel Voice Conversion,” in INTERSPEECH 2018, Hyderabad, India, 2-6 September 2018, pp. 1968-1972.

  16. M. C. Madhavi, and H. A. Patil, ‘‘VTLN-Warped Gaussian for QbE-STD,’’ in 25th European Signal Process. Conf., EUSIPCO, Kos island, Greece, Aug. 28-Sep. 2, 2017, pp. 563-567.

  17. M. C. Madhavi, and H. A. Patil, ‘‘Two Stage Zero-resource Approaches for QbE-STD,’’ in Ninth International Conference on Advances in Pattern Recognition (ICAPR-2017), Kolkata, India, December 28-30, 2017.

  18. M. C. Madhavi, and H. A. Patil, ‘‘Combining evidences from detection sources for query-by-example spoken term detection,’’ in Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), Kuala Lumpur, Malaysia, December 12-15, 2017, pp. 563-568.

  19. A. Rajpal, T. B. Patel, H. B. Sailor, M. C. Madhavi, H. A. Patil, and H. Fujisaki, ‘‘Native Language Identification Using Spectral and Source-Based Features,’’ in Proc. 17th Annual Conf. of Int. Speech Communication Association (ISCA), INTERSPEECH, San Francisco, USA, 8-12 Sept. 2016, pp. 2383-2387.

  20. M. C. Madhavi, and H. A. Patil, ‘‘Modification in Sequential Dynamic Time Warping for Fast Computation of Query-by-Example Spoken Term Detection Task,’’ in Int. Conf. on Signal Processing and Communications (SPCOM), IISc Bangalore, India, June 12-15, 2016, pp. 1-6.

  21. M. C. Madhavi, H. A. Patil, and B. B. Vachhani, ‘‘Spectral transition measure for detection of obstruents,” in 23rd European Signal Process. Conf., EUSIPCO, Nice, France, Aug 31 - Sept. 4, 2015, pp. 330-334.

  22. H. B. Sailor, M. C. Madhavi, and H. A. Patil, ‘‘Significance of phase-based features for person recognition using humming,’’ in 2nd Int. Conf. on Perception and Machine Intelligence (PerMin), C-DAC, Kolkata, Feb. 26-27, 2015, pp. 99-103.

  23. B. Vachhani, K. D. Malde, M. C. Madhavi and H. A. Patil, ‘‘A spectral transition measure based mel-cepstral features for obstruent detection,’’ in Int. Conf. on Asian Lang. Process. (IALP ‘14), Kuching, Malaysia, 2014, pp. 50-53.

  24. S. Sharma, M. C. Madhavi and H. A. Patil, ‘‘Vocal Tract Length Normalization for Vowel Recognition in Low Resource Languages,’’ in Int. Conf. on Asian Lang. Process. (IALP ‘14), Kuching, Malaysia, 2014, pp. 54-57.

  25. M. C. Madhavi, S. Sharma, and H. A. Patil, ‘‘Development of language resources for speech application in Gujarati and Marathi,’’ in Int. Conf. on Asian Lang. Process. (IALP), Kuching, Malaysia, 2014, pp. 115-118.

  26. A. Undhad, H. Patil, and M. C. Madhavi, ‘‘Exploiting speech source information for vowel landmark detection for low resource language,’’ in the 9th Int. Symposium on Chinese Spoken Language Processing, ISCSLP’14, Singapore, Sep. 12-14, 2014, pp. 546-550.

  27. N. Shah, H. Patil, M. Madhavi, H. Sailor and T. Patel, ‘‘Deterministic Annealing EM Algorithm for Developing TTS System in Gujarati,’’ in the 9th Int. Symposium on Chinese Spoken Language Processing, ISCSLP’14, Singapore, Sep. 12-14, 2014, pp. 526-530.

  28. M. C. Madhavi, and H. A. Patil, ‘‘Exploiting Variable length Teager Energy Operator in features for person recognition from humming,’’ in the 9th Int. Symposium on Chinese Spoken Language Processing, ISCSLP’14, Singapore, Sep. 12-14, 2014, pp. 624-628.

  29. S. Sharma, M. C. Madhavi and H. A. Patil, ‘‘Development of Vocal Tract Length Normalized Phonetic Engine for Gujarati and Marathi Languages,’’ in The 17th Oriental COCOSDA’14, Phuket, Thailand, Sept. 10-12, 2014.

  30. K. D. Malde, B. B. Vachhani, M. C. Madhavi, N. H. Chhayani, and H. A. Patil, ‘‘Development of speech corpora in Gujarati and Marathi for phonetic transcription,’’ in Int. Conf. Oriental COCOSDA held jointly with 2013 Conf. on Asian Spoken Lang. Research and Evaluation (O-COCOSDA/CASLRE), Gurgaon, India, 2013, pp. 1-6.

  31. H. A. Patil, M. C. Madhavi, K. D. Malde, and B. B. Vachhani, ‘‘Phonetic Transcription of Fricatives and Plosives for Gujarati and Marathi Languages,’’ in Int. Conf. on Asian Lang. Process. (IALP), Hanoi, Vietnam, November 13-15, 2012, pp. 177-180.

  32. H. A. Patil, M. C. Madhavi, and N. H. Chhayani, “Person Recognition Using Humming and Speech,” in Int. Conf. on Asian Lang. Process. (IALP), Hanoi, Vietnam, November 13-15, 2012, pp. 149-152.

  33. H. A. Patil, and M. C. Madhavi, ‘‘Significance of magnitude and phase information via VTEO for humming based biometrics,’’ in Proc. Int. Conf. on Biometrics (ICB), New Delhi, India, 2012, pp. 372-377.

  34. H. A. Patil, M. C. Madhavi, and K. K. Parhi, ‘‘Combining Evidence from Spectral and Source-Like Features for Person Recognition from Humming,’’ in Proc. 12th Annual Conf. of Int. Speech Communication Association (ISCA), INTERSPEECH, Florence, Italy, August 27-31, 2011, pp. 369-372.


Page template forked from evanca
