Coding is Love

Speech recognition and synthesis with simple JavaScript


Speech is how human beings most commonly interact with each other, and even with their pets. With computers, however, we mostly click, type, drag and drop. People have tried for years to talk to computers in various ways, and they have undoubtedly succeeded. Today, speech recognition is possible from within just a browser.

Speech recognition is also called speech-to-text, and speech synthesis is also called text-to-speech. These are the terms we will use in this post. Both are simple to implement in just a few lines of code, and the recognition is surprisingly accurate given that it runs in a browser.

To make the concepts clear, we will build a simple demo that recognizes the user’s speech and repeats it back using synthesized speech. Let’s get started.

Text-to-speech

Converting text to speech is the easier of the two. There is a built-in API and we just need to call it. Let’s see how it works step-by-step with code.
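Here is a minimal sketch of what a speak() helper could look like, assuming the browser exposes the Web Speech API through window.speechSynthesis and SpeechSynthesisUtterance. The optional synth parameter and the rate and pitch values are assumptions made for this sketch, not necessarily what the original demo uses.

```javascript
// tts.js: a minimal text-to-speech helper built on the Web Speech API.
// The `synth` parameter is an assumption added for this sketch so the helper
// can be exercised outside a browser; in a page it falls back to
// window.speechSynthesis.
function speak(text, synth) {
    synth = synth || (typeof window !== 'undefined' ? window.speechSynthesis : undefined);
    if (!synth || typeof SpeechSynthesisUtterance === 'undefined') {
        console.warn('Speech synthesis is not available in this environment.');
        return false;
    }

    // An utterance wraps the text together with optional voice settings.
    var utterance = new SpeechSynthesisUtterance(text);
    utterance.rate = 1;   // speaking speed; 1 is the default
    utterance.pitch = 1;  // voice pitch; 1 is the default

    // Queue the utterance; the browser speaks it asynchronously.
    synth.speak(utterance);
    return true;
}
```

In a page, calling speak('Hello there') should be enough to hear the text read aloud, provided the browser supports speech synthesis.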

So, this is how you generate speech on the web. Now, let’s look at the more fun part of the post.

Text-to-speech browser support

Speech-to-text

This is slightly trickier because, just like in human beings, listening is always harder than speaking.
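The listening side can be sketched roughly as follows, assuming the browser exposes the Web Speech API's SpeechRecognition constructor, possibly under the webkit prefix. The createRecognition helper, its scope parameter and the 'en-US' language are hypothetical choices for this sketch.

```javascript
// stt.js: a minimal speech-to-text setup built on the Web Speech API.
// `createRecognition` and its `scope` parameter are names made up for this
// sketch; `scope` falls back to the browser's window object.
function createRecognition(scope) {
    scope = scope || (typeof window !== 'undefined' ? window : {});
    // Some browsers only expose the constructor with a webkit prefix.
    var SpeechRecognition = scope.SpeechRecognition || scope.webkitSpeechRecognition;
    if (!SpeechRecognition) {
        console.warn('Speech recognition is not available in this environment.');
        return null;
    }

    var recognition = new SpeechRecognition();
    recognition.lang = 'en-US';          // language to listen for (assumed)
    recognition.interimResults = false;  // only report final results

    recognition.onresult = function (event) {
        // resultIndex is the index of the first result that changed in this event.
        var current = event.resultIndex;
        var transcript = event.results[current][0].transcript;
        console.log('Heard: ' + transcript);
    };
    recognition.onerror = function (event) {
        console.error('Recognition error: ' + event.error);
    };
    return recognition;
}

var recognition = createRecognition();
if (recognition) {
    recognition.start(); // the browser asks for microphone permission here
}
```

With the default settings the recognizer stops after one final result, so start() would need to be called again to listen for another phrase.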

Speech-to-text browser support
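Since support differs between browsers, it can help to check at runtime before relying on either API. The following is a small sketch of such a check; detectSpeechSupport and its scope parameter are hypothetical names, and the property names tested are the ones the Web Speech API uses.

```javascript
// Reports which halves of the Web Speech API the current environment exposes.
// `detectSpeechSupport` and `scope` are hypothetical names for this sketch;
// `scope` falls back to the browser's window object.
function detectSpeechSupport(scope) {
    scope = scope || (typeof window !== 'undefined' ? window : {});
    return {
        textToSpeech: 'speechSynthesis' in scope && 'SpeechSynthesisUtterance' in scope,
        speechToText: 'SpeechRecognition' in scope || 'webkitSpeechRecognition' in scope
    };
}

var support = detectSpeechSupport();
if (!support.textToSpeech) {
    console.log('Text-to-speech is not supported here.');
}
if (!support.speechToText) {
    console.log('Speech-to-text is not supported here.');
}
```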

And finally the Demo

Quickly set up a project with three files in it: index.html, tts.js and stt.js. Put the text-to-speech code in tts.js and the speech-to-text code in stt.js. In index.html, just include the two scripts.

<script src="tts.js"></script>
<script src="stt.js"></script>

Now, we need to call the speak() function we wrote from inside the onresult handler of the recognition object.

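recognition.onresult = function(event) {
    // Index of the first result that changed in this event
    var current = event.resultIndex;
    // Text of the top alternative for that result
    var transcript = event.results[current][0].transcript;
    // Repeat what was heard using text-to-speech
    speak(transcript);
};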

Now open index.html in a browser and try it out. You will need to serve it from a local HTTP server, which is not a big deal. This is needed because browsers generally only allow microphone access from a secure context, meaning HTTPS or localhost. When you open the page, it should ask for microphone permission the first time. Accept it and say something. It should repeat it right back.
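Any static file server will do for this step. As one option, assuming Node.js is installed, a tiny server could be sketched as below. The file name server.js, the port 8080 and the helper names resolveFile and createStaticServer are all assumptions for this sketch, and it is meant for local testing only.

```javascript
// server.js: a tiny static file server for local testing only.
// The file name, port and helper names are assumptions for this sketch.
var http = require('http');
var fs = require('fs');
var path = require('path');

var CONTENT_TYPES = {
    '.html': 'text/html; charset=utf-8',
    '.js': 'text/javascript; charset=utf-8'
};

// Maps a request URL to a file inside rootDir, or null if the URL is
// malformed or would point outside rootDir.
function resolveFile(rootDir, requestUrl) {
    var pathname;
    try {
        pathname = decodeURIComponent(requestUrl.split('?')[0]);
    } catch (err) {
        return null;
    }
    if (pathname === '/') {
        pathname = '/index.html';
    }
    var filePath = path.join(rootDir, pathname);
    var relative = path.relative(rootDir, filePath);
    if (relative === '..' || relative.startsWith('..' + path.sep) || path.isAbsolute(relative)) {
        return null;
    }
    return filePath;
}

// Builds an HTTP server that serves files from rootDir; it does not start
// listening until .listen() is called on the returned server.
function createStaticServer(rootDir) {
    return http.createServer(function (req, res) {
        var filePath = resolveFile(rootDir, req.url);
        if (!filePath) {
            res.writeHead(403);
            res.end('Forbidden');
            return;
        }
        fs.readFile(filePath, function (err, data) {
            if (err) {
                res.writeHead(404);
                res.end('Not found');
                return;
            }
            var type = CONTENT_TYPES[path.extname(filePath)] || 'application/octet-stream';
            res.writeHead(200, { 'Content-Type': type });
            res.end(data);
        });
    });
}

// To start serving the project folder, add this line and run `node server.js`:
// createStaticServer(__dirname).listen(8080);
```

With the last line enabled and the file placed next to index.html, the demo should be reachable at http://localhost:8080.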

The full code for this can be found here
