We have recently seen a couple of breakthroughs in speech recognition: Microsoft's systems have reached human parity in transcribing conversational speech, and Baidu's system is reported to be three times faster than typing for composing text messages, while being just as accurate. Does this mean the problem is solved? How did Deep Learning manage to revolutionize the field? What does it take to replicate this success in new languages and domains? In this talk, I will present the problem of Automatic Speech Recognition (ASR) and discuss the various Machine Learning solutions that have been proposed over the years. We will look at research results on specific datasets and trace the improvement of speech recognition systems on them. I will also present some state-of-the-art Deep Learning models being used for ASR and discuss some open problems in the field.
I am currently a postdoctoral researcher at Microsoft Research India, working on speech and language technologies for multilingual communities. My current focus is on building automatic speech recognizers for code-switched, or mixed-language, speech. Previously, I completed my Master's and PhD in the School of Computer Science at Carnegie Mellon University, where my dissertation, advised by Prof. Alan Black, was on speech synthesis for low-resource languages. My research interests span speech and language technologies, machine learning, and intelligent agents.

Written on March 25th, 2017 by