Maulik Madhavi

View My LinkedIn Profile

View My GitHub Profile

About Maulik Madhavi

Maulik Madhavi has been associated with the speech signal processing field since 2010. He received a Ph.D. degree in Information and Communication Systems from the Dhirubhai Ambani Institute of Information and Communication Technology (DA‑IICT) in Gandhinagar, India, in 2017. He received an M.Tech. degree in ICT with a specialization in Communication Systems from DA‑IICT in Gandhinagar, India.

He was part of the consortium project “Development of Prosodically Guided Phonetic Engine for Searching Speech Databases in Indian Languages”, sponsored by the Department of Electronics and Information Technology (DeitY), India, from April 2012 to June 2014 (two years and three months). During his master’s and doctoral studies at DA‑IICT, he served as a teaching assistant/tutor for eight different courses from August 2009 to April 2012 and again from July 2014 to May 2017.

He was a research fellow from December 2017 to April 2021 at the National University of Singapore (NUS). He also mentored seven NUS graduate students on their final‑year projects (FYPs) and one master’s student. Please refer to this link for relevant materials. He was involved in several research projects on spoken‑dialogue systems for autonomous vehicles and on speech recognition for healthcare.

Since April 2021, he has worked as a video analytics researcher at NCS Pte. Ltd., where he develops several algorithms related to video analytics. He is also actively researching generative‑AI projects aimed at improving vision systems. In addition, he is involved in the in‑house no‑code platform (kaicc), where he has contributed to algorithms and model development for vision-model training and inference.

He received an IAPR (International Association for Pattern Recognition) travel scholarship for presenting a joint paper at the 2012 International Conference on Biometrics (ICB 2012) in Delhi, India.

Research Interests

Research Keywords

Speech signal processing, speech information retrieval, dialogue understanding, technology for spoken language processing, feature indexing for large-scale search, visual perception via vision-language models, object detection, image recognition.

Tools

Contact

E-mail: maulikmadhavi[AT]gmail[DOT]com

Research Projects

Autonomous Bus Chatbot

Brief Info:

Skills involved:

Wake-up Word in Android UI

Skills involved:

Publications

Journal Publications

  1. M. C. Madhavi and H. A. Patil, “Vocal Tract Length Normalization using a Gaussian mixture model framework for query-by-example spoken term detection,” in Computer Speech & Language, Elsevier, vol. 58, pp. 175-202, November 2019.

  2. M. C. Madhavi, and H. A. Patil, “Design of Mixture of GMMs for Query-by-Example Spoken Term Detection,” in Computer Speech & Language, Elsevier, vol. 52, pp. 41-55, November 2018.

  3. H. A. Patil, and M. C. Madhavi, “Combining Evidence from Magnitude and Phase Information using VTEO for Person Recognition using Humming,” in the special issue on Recent Advances in Speaker and Language Recognition and Characterization, Computer Speech & Language, Elsevier, vol. 52, pp. 225-256, November 2018.

  4. M. C. Madhavi, and H. A. Patil, “Partial Matching and Search Space Reduction for QbE-STD,” in Computer Speech & Language, Elsevier, vol. 45, pp. 58-82, September 2017.

  5. H. A. Patil, M. C. Madhavi, K. K. Parhi, “Static and dynamic information derived from source and system features for person recognition from humming,” Int. J. Speech Technology, vol. 15, no. 3, pp. 393-406, 2012.

Book Chapters

  1. M. C. Madhavi, and H. A. Patil, ‘‘Spoken Keyword Retrieval using Source and System Features,’’ Int. Conf. on Pattern Recognition and Machine Intelligence (PReMI), Kolkata, India, Dec. 05 - 08, 2017.

  2. M. C. Madhavi, S. Sharma, H. A. Patil, ‘‘VTLN Using Different Warping Functions for Template Matching,’’ Machine Intelligence and Big Data in Industry, Springer International Publishing, D. Ryżko, P. Gawrysiak, M. Kryszkiewicz, H. Rybiński, (Eds.), pp. 111-121, 2016.

  3. M. C. Madhavi, S. Sharma, and H. A. Patil, ‘‘Vocal tract length normalization features for audio search,’’ in Int. Conf. Text, Speech, and Dialogue, TSD, P. Král, V. Matoušek (Eds.), Pilsen, Czech Republic, pp. 387-395, 2015.

  4. Y. Gaur, M. C. Madhavi, and H. A. Patil, ‘‘Speaker recognition using sparse representation via superimposed features,’’ in P. Maji et al. (Eds.), Lecture Notes in Computer Science (LNCS), vol. 8251, pp. 140-147, Springer-Verlag, Berlin Heidelberg, Germany, 2013.

  5. H. A. Patil, M. C. Madhavi, R. Jain, and A. Jain, ‘‘Combining evidence from temporal and spectral features for person recognition from humming,’’ in Malay K. Kundu et al. (Eds.), PerMIn, Lecture Notes in Computer Science (LNCS), vol. 7143, pp. 321-328, Springer-Verlag, 2012.

International Conferences

  1. B. Sharma, M. Madhavi, X. Zhou, H. Li, “Exploring teacher-student learning approach for multi-lingual speech-to-intent classification,’’ in IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Dec. 2021.

  2. R. Das, M. Madhavi, H. Li, “Diagnosis of COVID-19 using Auditory Acoustic Cues,’’ in Proc. Interspeech, Brno, Czech Republic, Aug-Sep. 2021, pp. 921-925.

  3. Y. Jiang, B. Sharma, M. Madhavi, H. Li, “Knowledge Distillation from BERT Transformer to Speech Transformer for Intent Classification,’’ in Proc. Interspeech, Brno, Czech Republic, Aug-Sep. 2021, pp. 4713-4717.

  4. X. Qian, M. Madhavi, Z. Pan, J. Wang, H. Li, “Multi-target DoA estimation with an audio-visual fusion mechanism,” in 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, Canada, June 2021, pp. 4280-4284.

  5. B. Sharma, M. Madhavi, H. Li, “Leveraging Acoustic and Linguistic Embeddings from Pretrained Speech and Language Models for Intent Classification,” in 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, Canada, June 2021, pp. 7498-7502.

  6. Y. Ong, M. Madhavi, and K. Chan, “OPENNLU: Open-Source Web-Interface NLU Toolkit for Development of Conversational Agent”, in Proc. Asia-Pacific Signal and Information Processing Association (APSIPA) Annual Summit and Conference (ASC) 2020, Auckland, New Zealand, December 2020, pp. 381-385.

  7. N. Shah, Sreeraj R, M. Madhavi, N. Shah, and H. Patil, “Query-by-Example Spoken Term Detection using Generative Adversarial Network”, in Proc. Asia-Pacific Signal and Information Processing Association (APSIPA) Annual Summit and Conference (ASC) 2020, Auckland, New Zealand, December 2020, pp. 644-648.

  8. W. Lin, M. Madhavi, R. Das and H. Li, “Transformer-based Arabic Dialect Identification,” in Proc. International Conference on Asian Language Processing (IALP), Kuala Lumpur, Malaysia, December 2020, pp. 192-196.

  9. T. Liu, R. Das, M. Madhavi, S. Shen and H. Li, “Speaker-Utterance Dual Attention for Speaker and Utterance Verification,” in Interspeech, Shanghai, China, October 2020, pp. 4293-4297.

  10. R. Sheelvant, B. Sharma, M. Madhavi, R. Das, S.R.M. Prasanna and H. Li, “RSL2019: A Realistic Speech Localization Corpus,” in Proc. International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (COCOSDA), Cebu City, Philippines, October 2019.

  11. T. Liu, M. Madhavi, R. Das and H. Li, “A Unified Framework for Speaker and Utterance Verification,” in INTERSPEECH 2019, Graz, Austria, September 2019, pp. 4320-4324.

  12. M. Madhavi, T. Zhan, H. Li and M. Yuan, “First Leap Towards Development of Dialogue System for Autonomous Bus,” in Proc. International Workshop on Spoken Dialogue Systems Technology (IWSDS), Sicily, Italy, April 2019, pp. 1-6.

  13. R. Das, M. Madhavi, and H. Li, “Compensating Utterance Information in Fixed Phrase Speaker Verification,” in Asia Pacific Signal and Information Processing Association (APSIPA), 12-15 Nov. 2018, Honolulu, Hawaii, USA.

  14. P. Tapkir, M. Kamble, H. Patil, and M. Madhavi, “Replay Spoof Detection using Power Function Based Features,” in Asia Pacific Signal and Information Processing Association (APSIPA), 12-15 Nov. 2018, Honolulu, Hawaii, USA.

  15. N. Shah, M. Madhavi, and H. Patil, “Unsupervised Vocal Tract Length Warped Posterior Features for Non-Parallel Voice Conversion,” in INTERSPEECH 2018, Hyderabad, India, 2-6 September 2018, pp. 1968-1972.

  16. M. C. Madhavi, and H. A. Patil, ‘‘VTLN-Warped Gaussian for QbE-STD,’’ in 25th European Signal Process. Conf., EUSIPCO, Kos island, Greece, Aug. 28-Sep. 2, 2017, pp. 563-567.

  17. M. C. Madhavi, and H. A. Patil, ‘‘Two Stage Zero-resource Approaches for QbE-STD,’’ in Ninth International Conference on Advances in Pattern Recognition (ICAPR-2017), Kolkata, India, December 28-30, 2017.

  18. M. C. Madhavi, and H. A. Patil, ‘‘Combining evidences from detection sources for query-by-example spoken term detection,’’ in Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), Kuala Lumpur, Malaysia, December 12-15, 2017, pp. 563-568.

  19. A. Rajpal, T. B. Patel, H. B. Sailor, M. C. Madhavi, H. A. Patil, and H. Fujisaki, ‘‘Native Language Identification Using Spectral and Source-Based Features,’’ in Proc. 17th Annual Conf. of Int. Speech Communication Association (ISCA), INTERSPEECH, San Francisco, USA, 8-12 Sept. 2016, pp. 2383-2387.

  20. M. C. Madhavi, and H. A. Patil, ‘‘Modification in Sequential Dynamic Time Warping for Fast Computation of Query-by-Example Spoken Term Detection Task,’’ in Int. Conf. on Signal Processing and Communications (SPCOM), IISc Bangalore, India, June 12-15, 2016, pp. 1-6.

  21. M. C. Madhavi, H. A. Patil, and B. B. Vachhani, ‘‘Spectral transition measure for detection of obstruents,” in 23rd European Signal Process. Conf., EUSIPCO, Nice, France, Aug 31 - Sept. 4, 2015, pp. 330-334.

  22. H. B. Sailor, M. C. Madhavi, and H. A. Patil, ‘‘Significance of phase-based features for person recognition using humming,’’ in 2nd Int. Conf. on Perception and Machine Intelligence (PerMin), C-DAC, Kolkata, Feb. 26-27, 2015, pp. 99-103.

  23. B. Vachhani, K. D. Malde, M. C. Madhavi and H. A. Patil, ‘‘A spectral transition measure based mel-cepstral features for obstruent detection,’’ in Int. Conf. on Asian Lang. Process. (IALP ‘14), Kuching, Malaysia, 2014, pp. 50-53.

  24. S. Sharma, M. C. Madhavi and H. A. Patil, ‘‘Vocal Tract Length Normalization for Vowel Recognition in Low Resource Languages,’’ in Int. Conf. on Asian Lang. Process. (IALP ‘14), Kuching, Malaysia, 2014, pp. 54-57.

  25. M. C. Madhavi, S. Sharma, and H. A. Patil, ‘‘Development of language resources for speech application in Gujarati and Marathi,’’ in Int. Conf. on Asian Lang. Process. (IALP), Kuching, Malaysia, 2014, pp. 115-118.

  26. A. Undhad, H. Patil, and M. C. Madhavi, ‘‘Exploiting speech source information for vowel landmark detection for low resource language,’’ in the 9th Int. Symposium on Chinese Spoken Language Processing, ISCSLP’14, Singapore, Sep. 12-14, 2014, pp. 546-550.

  27. N. Shah, H. Patil, M. Madhavi, H. Sailor and T. Patel, ‘‘Deterministic Annealing EM Algorithm for Developing TTS System in Gujarati,’’ in the 9th Int. Symposium on Chinese Spoken Language Processing, ISCSLP’14, Singapore, Sep. 12-14, 2014, pp. 526-530.

  28. M. C. Madhavi, and H. A. Patil, ‘‘Exploiting Variable length Teager Energy Operator in features for person recognition from humming,’’ in the 9th Int. Symposium on Chinese Spoken Language Processing, ISCSLP’14, Singapore, Sep. 12-14, 2014, pp. 624-628.

  29. S. Sharma, M. C. Madhavi and H. A. Patil, ‘‘Development of Vocal Tract Length Normalized Phonetic Engine for Gujarati and Marathi Languages,’’ in The 17th Oriental COCOSDA’14, Phuket, Thailand, Sept. 10-12, 2014.

  30. K. D. Malde, B. B. Vachhani, M. C. Madhavi, N. H. Chhayani, and H. A. Patil, ‘‘Development of speech corpora in Gujarati and Marathi for phonetic transcription,’’ in Int. Conf. Oriental COCOSDA held jointly with 2013 Conf. on Asian Spoken Lang. Research and Evaluation (O-COCOSDA/CASLRE), Gurgaon, India, 2013, pp. 1-6.

  31. H. A. Patil, M. C. Madhavi, K. D. Malde, and B. B. Vachhani, ‘‘Phonetic Transcription of Fricatives and Plosives for Gujarati and Marathi Languages,’’ in Int. Conf. on Asian Lang. Process. (IALP), Hanoi, Vietnam, November 13-15, 2012, pp. 177-180.

  32. H. A. Patil, M. C. Madhavi, and N. H. Chhayani, “Person Recognition Using Humming and Speech,” in Int. Conf. on Asian Lang. Process. (IALP), Hanoi, Vietnam, November 13-15, 2012, pp. 149-152.

  33. H. A. Patil, and M. C. Madhavi, ‘‘Significance of magnitude and phase information via VTEO for humming based biometrics,’’ in Proc. Int. Conf. on Biometrics (ICB), New Delhi, India, 2012, pp. 372-377.

  34. H. A. Patil, M. C. Madhavi, and K. K. Parhi, ‘‘Combining Evidence from Spectral and Source-Like Features for Person Recognition from Humming,’’ in Proc. 12th Annual Conf. of Int. Speech Communication Association (ISCA), INTERSPEECH, Florence, Italy, August 27-31, 2011, pp. 369-372.


Page template forked from evanca
