Maulik Madhavi has worked in speech signal processing since 2010. He received a Ph.D. degree in Information and Communication Systems from the Dhirubhai Ambani Institute of Information and Communication Technology (DA‑IICT), Gandhinagar, India, in 2017. He also holds an M.Tech. degree in ICT, with a specialization in Communication Systems, from DA‑IICT.
From April 2012 to June 2014, he worked on the consortium project “Development of a Prosodically Guided Phonetic Engine for Searching Speech Databases in Indian Languages”, sponsored by the Department of Electronics and Information Technology (DeIT). During his master’s and doctoral studies at DA‑IICT, he served as a teaching assistant and tutor for eight courses, from August 2009 to April 2012 and from July 2014 to May 2017.
He was a research fellow at the National University of Singapore (NUS) from December 2017 to April 2021, where he mentored eight graduate and master’s students on their final‑year projects. His research focused on spoken‑dialogue systems for autonomous vehicles and speech recognition for healthcare applications.
Since April 2021, he has worked as a video analytics researcher at NCS Pte. Ltd., focusing on generative AI projects for vision systems. He contributes to the in‑house no‑code platform KaICC, developing algorithms and models for vision-model training and inference.
He received an IAPR (International Association for Pattern Recognition) travel scholarship for presenting at the 2012 International Conference on Biometrics (ICB 2012) in Delhi, India.
✉️ E‑mail: maulikmadhavi[AT]gmail[DOT]com
M. C. Madhavi and H. A. Patil, ‘‘Vocal Tract Length Normalization using a Gaussian mixture model framework for query-by-example spoken term detection,’’ in Computer Speech & Language, Elsevier, vol. 58, pp. 175-202, November 2019.
M. C. Madhavi, and H. A. Patil, ‘‘Design of Mixture of GMMs for Query-by-Example Spoken Term Detection,’’ in Computer Speech & Language, Elsevier, vol. 52, pp. 41-55, November 2018.
H. A. Patil, and M. C. Madhavi, ‘‘Combining Evidences from Magnitude and Phase Information using VTEO for Person Recognition using Humming,’’ in special issue on Recent Advances in Speaker and Language Recognition and Characterization, Computer Speech & Language, Elsevier, vol. 52, pp. 225-256, November 2018.
M. C. Madhavi, and H. A. Patil, ‘‘Partial Matching and Search Space Reduction for QbE-STD,’’ in Computer Speech & Language, Elsevier, vol. 45, pp. 58-82, September 2017.
H. A. Patil, M. C. Madhavi, and K. K. Parhi, ‘‘Static and dynamic information derived from source and system features for person recognition from humming,’’ Int. J. Speech Technology, vol. 15, no. 3, pp. 393-406, 2012.
M. C. Madhavi, and H. A. Patil, ‘‘Spoken Keyword Retrieval using Source and System Features,’’ Int. Conf. on Pattern Recognition and Machine Intelligence (PReMI), Kolkata, India, Dec. 05 - 08, 2017.
M. C. Madhavi, S. Sharma, and H. A. Patil, ‘‘VTLN Using Different Warping Functions for Template Matching,’’ in Machine Intelligence and Big Data in Industry, Springer International Publishing, D. Ryżko, P. Gawrysiak, M. Kryszkiewicz, H. Rybiński (Eds.), pp. 111-121, 2016.
M. C. Madhavi, S. Sharma, and H. A. Patil, ‘‘Vocal tract length normalization features for audio search,’’ in Int. Conf. Text, Speech, and Dialogue, TSD, P. Král, V. Matoušek (Eds.), Pilsen, Czech Republic, pp. 387-395, 2015.
Y. Gaur, M. C. Madhavi, and H. A. Patil, ‘‘Speaker recognition using sparse representation via superimposed features,’’ in P. Maji et al. (Eds.), Lecture Notes in Computer Science (LNCS), vol. 8251, pp. 140-147, Springer-Verlag, Berlin Heidelberg, Germany, 2013.
H. A. Patil, M. C. Madhavi, R. Jain, and A. Jain, ‘‘Combining evidences from temporal and spectral features for person recognition from humming,’’ in Malay K. Kundu et al. (Eds.), PerMIn, Lecture Notes in Computer Science (LNCS), vol. 7143, pp. 321-328, Springer-Verlag, 2012.
B. Sharma, M. Madhavi, X. Zhou, H. Li, ‘‘Exploring teacher-student learning approach for multi-lingual speech-to-intent classification,’’ in IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Dec. 2021.
R. Das, M. Madhavi, H. Li, ‘‘Diagnosis of COVID-19 using Auditory Acoustic Cues,’’ in Proc. Interspeech, Brno, Czech Republic, Aug-Sep. 2021, pp. 921-925.
Y. Jiang, B. Sharma, M. Madhavi, H. Li, ‘‘Knowledge Distillation from BERT Transformer to Speech Transformer for Intent Classification,’’ in Proc. Interspeech, Brno, Czech Republic, Aug-Sep. 2021, pp. 4713-4717.
X. Qian, M. Madhavi, Z. Pan, J. Wang, H. Li, ‘‘Multi-target DoA estimation with an audio-visual fusion mechanism,’’ in 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, Canada, June 2021, pp. 4280-4284.
B. Sharma, M. Madhavi, H. Li, ‘‘Leveraging Acoustic and Linguistic Embeddings from Pretrained Speech and Language Models for Intent Classification,’’ in 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, Canada, June 2021, pp. 7498-7502.
Y. Ong, M. Madhavi, and K. Chan, ‘‘OPENNLU: Open-Source Web-Interface NLU Toolkit for Development of Conversational Agent’’, in Proc. Asia-Pacific Signal and Information Processing Association (APSIPA) Annual Summit and Conference (ASC) 2020, Auckland, New Zealand, December 2020, pp. 381-385.
N. Shah, Sreeraj R, M. Madhavi, N. Shah, and H. Patil, ‘‘Query-by-Example Spoken Term Detection using Generative Adversarial Network’’, in Proc. Asia-Pacific Signal and Information Processing Association (APSIPA) Annual Summit and Conference (ASC) 2020, Auckland, New Zealand, December 2020, pp. 644-648.
W. Lin, M. Madhavi, R. Das and H. Li, ‘‘Transformer-based Arabic Dialect Identification,’’ in Proc. International Conference on Asian Language Processing (IALP), Kuala Lumpur, Malaysia, December 2020, pp. 192-196.
T. Liu, R. Das, M. Madhavi, S. Shen and H. Li, ‘‘Speaker-Utterance Dual Attention for Speaker and Utterance Verification,’’ in Interspeech, Shanghai, China, October 2020, pp. 4293-4297.
R. Sheelvant, B. Sharma, M. Madhavi, R. Das, S.R.M. Prasanna, and H. Li, ‘‘RSL2019: A Realistic Speech Localization Corpus,’’ in Proc. International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (COCOSDA), Cebu City, Philippines, October 2019.
T. Liu, M. Madhavi, R. Das, and H. Li, ‘‘A Unified Framework for Speaker and Utterance Verification,’’ in INTERSPEECH 2019, Graz, Austria, September 2019, pp. 4320-4324.
M. Madhavi, T. Zhan, H. Li and M. Yuan, ‘‘First Leap Towards Development of Dialogue System for Autonomous Bus’’, in Proc. International Workshop on Spoken Dialogue Systems Technology (IWSDS), Sicily, Italy, April 2019, pp. 1-6.
R. Das, M. Madhavi, and H. Li, ‘‘Compensating Utterance Information in Fixed Phrase Speaker Verification,’’ in Asia Pacific Signal and Information Processing Association (APSIPA), 12-15 Nov. 2018, Honolulu, Hawaii, USA.
P. Tapkir, M. Kamble, H. Patil, and M. Madhavi, ‘‘Replay Spoof Detection using Power Function Based Features,’’ in Asia Pacific Signal and Information Processing Association (APSIPA), 12-15 Nov. 2018, Honolulu, Hawaii, USA.
N. Shah, M. Madhavi, and H. Patil, ‘‘Unsupervised Vocal Tract Length Warped Posterior Features for Non-Parallel Voice Conversion,’’ in INTERSPEECH 2018, Hyderabad, India, 2-6 September 2018, pp. 1968-1972.
M. C. Madhavi, and H. A. Patil, ‘‘VTLN-Warped Gaussian for QbE-STD,’’ in 25th European Signal Process. Conf., EUSIPCO, Kos island, Greece, Aug. 28-Sep. 2, 2017, pp. 563-567.
M. C. Madhavi, and H. A. Patil, ‘‘Two Stage Zero-resource Approaches for QbE-STD,’’ in Ninth International Conference on Advances in Pattern Recognition (ICAPR-2017), Kolkata, India, December 28-30, 2017.
M. C. Madhavi, and H. A. Patil, ‘‘Combining evidences from detection sources for query-by-example spoken term detection,’’ in Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), Kuala Lumpur, Malaysia, December 12-15, 2017, pp. 563-568.
A. Rajpal, T. B. Patel, H. B. Sailor, M. C. Madhavi, H. A. Patil, and H. Fujisaki, ‘‘Native Language Identification Using Spectral and Source-Based Features,’’ in 17th Proc. Annual Conf. of Int. Speech Communication Association (ISCA), INTERSPEECH, San Francisco, USA, 8-12 Sept. 2016, pp. 2383-2387.
M. C. Madhavi, and H. A. Patil, ‘‘Modification in Sequential Dynamic Time Warping for Fast Computation of Query-by-Example Spoken Term Detection Task,’’ in Int. Conf. on Signal Processing and Communications (SPCOM), IISc Bangalore, India, June 12-15, 2016, pp. 1-6.
M. C. Madhavi, H. A. Patil, and B. B. Vachhani, ‘‘Spectral transition measure for detection of obstruents,’’ in 23rd European Signal Process. Conf., EUSIPCO, Nice, France, Aug 31 - Sept. 4, 2015, pp. 330-334.
H. B. Sailor, M. C. Madhavi, and H. A. Patil, ‘‘Significance of phase-based features for person recognition using humming,’’ in 2nd Int. Conf. on Perception and Machine Intelligence (PerMin), C-DAC, Kolkata, Feb. 26-27, 2015, pp. 99-103.
B. Vachhani, K. D. Malde, M. C. Madhavi and H. A. Patil, ‘‘A spectral transition measure based MELCEPSTRAL features for obstruent detection,’’ in Int. Conf. on Asian Lang. Process. (IALP ’14), Kuching, Malaysia, 2014, pp. 50-53.
S. Sharma, M. C. Madhavi and H. A. Patil, ‘‘Vocal Tract Length Normalization for Vowel Recognition in Low Resource Languages,’’ in Int. Conf. on Asian Lang. Process. (IALP ’14), Kuching, Malaysia, 2014, pp. 54-57.
M. C. Madhavi, S. Sharma, and H. A. Patil, ‘‘Development of language resources for speech application in Gujarati and Marathi,’’ in Int. Conf. on Asian Lang. Process., (IALP), Kuching, Malaysia, 2014, pp. 115-118.
A. Undhad, H. Patil, and M. C. Madhavi, ‘‘Exploiting speech source information for vowel landmark detection for low resource language,’’ in the 9th Int. Symposium on Chinese Spoken Language Processing, ISCSLP’14, Singapore, Sep. 12-14, 2014, pp. 546-550.
N. Shah, H. Patil, M. Madhavi, H. Sailor and T. Patel, ‘‘Deterministic Annealing EM Algorithm for Developing TTS System in Gujarati,’’ in the 9th Int. Symposium on Chinese Spoken Language Processing, ISCSLP’14, Singapore, Sep. 12-14, 2014, pp. 526-530.
M. C. Madhavi, and H. A. Patil, ‘‘Exploiting Variable length Teager Energy Operator in features for person recognition from humming,’’ in the 9th Int. Symposium on Chinese Spoken Language Processing, ISCSLP’14, Singapore, Sep. 12-14, 2014, pp. 624-628.
S. Sharma, M. C. Madhavi and H. A. Patil, ‘‘Development of Vocal Tract Length Normalized Phonetic Engine for Gujarati and Marathi Languages,’’ in The 17th Oriental COCOSDA’14, Phuket, Thailand, Sept. 10-12, 2014.
K. D. Malde, B. B. Vachhani, M. C. Madhavi, N. H. Chhayani, and H. A. Patil, ‘‘Development of speech corpora in Gujarati and Marathi for phonetic transcription,’’ in Int. Conf. Oriental COCOSDA held jointly with 2013 Conf. on Asian Spoken Lang. Research and Evaluation (O-COCOSDA/CASLRE), Gurgaon, India, 2013, pp. 1-6.
H. A. Patil, M. C. Madhavi, K. D. Malde, and B. B. Vachhani, ‘‘Phonetic Transcription of Fricatives and Plosives for Gujarati and Marathi Languages,’’ in Int. Conf. on Asian Lang. Process. (IALP), Hanoi, Vietnam, November 13-15, 2012, pp. 177-180.
H. A. Patil, M. C. Madhavi, and N. H. Chhayani, ‘‘Person Recognition Using Humming, Singing and Speech,’’ in Int. Conf. on Asian Lang. Process. (IALP), Hanoi, Vietnam, November 13-15, 2012, pp. 149-152.
H. A. Patil, and M. C. Madhavi, ‘‘Significance of magnitude and phase information via VTEO for humming based biometrics,’’ in Proc. Int. Conf. on Biometrics (ICB), New Delhi, India, 2012, pp. 372-377.
H. A. Patil, M. C. Madhavi, and K. K. Parhi, ‘‘Combining Evidence from Spectral and Source-Like Features for Person Recognition from Humming,’’ in 12th Proc. Annual Conf. of Int. Speech Communication Association (ISCA), INTERSPEECH, Florence, Italy, August 27-31, 2011, pp. 369-372.