The internet is where we all look for information, and the most common way to consume it, whether as articles, blogs, or news, is to read. That is fine if you are in the comfort of your office or home with enough time on your hands, but being able to talk to the website you are browsing and listen to its responses is another thing entirely. Modern web apps can talk. They can listen to you as well.
The Web Speech API has been around for a while, and most modern browsers now support it. In fact, Chrome started supporting the Web Speech API as early as version 33, and Edge and Firefox support it now as well.
Learn more about compatibility here: https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API#Browser_compatibility
The Web Speech API has two parts: SpeechSynthesis (Text-to-Speech), and SpeechRecognition (Asynchronous Speech Recognition). We are going to have a look at what we can achieve with the SpeechSynthesis part.
Consider the possibilities.
- Instead of reading a long article, you can have that read out to you while you are driving.
- With real-time translation technologies maturing, this can help reach markets that were previously inaccessible due to language barriers.
- Furthermore, this opens up the web for people who struggle with physical disabilities or illiteracy. Everyone can listen to and understand a spoken language, right?
The basics involve the speechSynthesis object, which can be accessed in any supported browser via window.speechSynthesis. This object has a number of methods and properties, and the one we are most interested in is the speak() method. The speak() method takes a SpeechSynthesisUtterance object. Let’s have a look at the most basic code to simply say “Hello World!”
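A minimal sketch of that code (the function wrapper is my own addition, since some browsers only allow speech after a user gesture such as a click):

```javascript
// Create an utterance from the text and hand it to the speech engine.
function sayHello() {
  const utterance = new SpeechSynthesisUtterance('Hello World!');
  window.speechSynthesis.speak(utterance);
  return utterance; // returning it lets callers attach event handlers
}

// e.g. wire it to a button: button.addEventListener('click', sayHello);
```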
You create the SpeechSynthesisUtterance object with the text you want the API to speak, then pass it to the speak() method. That’s it. This is the minimum code you need.
Instead of passing in static text, you can also read the contents of an element on the page, looked up by its id or class, and it will still do the job.
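For example, something along these lines (the element id used here is a placeholder of my own, not from any particular page):

```javascript
// Speak whatever text is currently inside a page element.
// The id passed in (e.g. 'article-text') is purely illustrative.
function readElement(id) {
  const element = document.getElementById(id);
  const utterance = new SpeechSynthesisUtterance(element.textContent);
  window.speechSynthesis.speak(utterance);
  return utterance;
}

// usage: readElement('article-text');
```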
A working demo of the above is available here on CodePen:
Simple and easy! That said, there is quite a bit of configuration you can do through the properties, events, and methods of the utterance object we created above. Let’s have a look at what we can configure:
You can define the text to be spoken using the text property. In the example above we set the text by passing it in during object creation, but setting the text property afterwards is another way to do so.
Pitch specifies the pitch at which the text is spoken. Generally, 1 is the default value, with a minimum of 0 and a maximum of 2; however, this can vary across platforms.
Rate specifies the speed at which the text is spoken. Again, 1 is the default value, and it can range from 0.1 to 10, with 10 being the fastest.
As the name implies, volume sets the speech volume and takes a value between 0 and 1. The default value is 1, which is the loudest on every platform.
The lang property specifies the language of the speech as a BCP 47 language tag, for example “en-US” or “en-IN”. The tag specifies the locale along with the language, and the speech is uttered accordingly.
The voice property specifies the voice to be used to speak the text. It should be set to one of the SpeechSynthesisVoice objects returned by SpeechSynthesis.getVoices(). If it is not set by the time the utterance is spoken, the most suitable default voice for the utterance's lang setting will be used.
Look at the code below with all the properties defined.
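Here is a sketch along those lines (the specific values and the helper name are illustrative choices of mine):

```javascript
// Build a fully configured utterance. All values below are examples.
function buildUtterance(text, voices) {
  const utterance = new SpeechSynthesisUtterance();
  utterance.text = text;      // what to say
  utterance.pitch = 1;        // 0 to 2, default 1
  utterance.rate = 1;         // 0.1 to 10, default 1
  utterance.volume = 1;       // 0 to 1, default 1
  utterance.lang = 'en-US';   // BCP 47 language tag
  // Pick a voice matching the language, if one is available.
  utterance.voice = voices.find(v => v.lang === utterance.lang) || null;
  return utterance;
}

// usage:
// const voices = window.speechSynthesis.getVoices();
// window.speechSynthesis.speak(buildUtterance('Hello World!', voices));
```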
The getVoices() method returns an array of voices which may vary from platform to platform. Here is a list of voices I got in my browser.
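A quick way to inspect the voices on your own platform (a small sketch; the listVoices helper is my own):

```javascript
// Return "Name (lang)" strings for every available voice.
function listVoices(synth) {
  return synth.getVoices().map(v => `${v.name} (${v.lang})`);
}

// usage: listVoices(window.speechSynthesis).forEach(v => console.log(v));
```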
Let’s create a more sophisticated working demo which lets us see the Web Speech API’s Speech Synthesis capabilities.
We start by creating a simple form with a select element, which we will populate with all the voices available on the platform, and a textarea for the text to be read. A button at the bottom will be responsible for initiating the reading.
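The markup could look something like this (the element ids are placeholder assumptions of mine, not the ids from the original demo):

```html
<!-- ids here (voice-select, text-input, read-button) are illustrative -->
<form>
  <select id="voice-select"></select>
  <textarea id="text-input">Type something to be read aloud</textarea>
  <button id="read-button" type="button">Read</button>
</form>
```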
Next, we need to set some settings on the SpeechSynthesisUtterance, including rate, pitch, volume, and lang, since these will not be changing. We also need to hook into the voiceschanged event to get the list of voices, because voices are loaded asynchronously after the page loads, so you may get an empty array otherwise.
Once we get the voices using the getVoices() method, we add those to the select element on the page using a for loop.
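Sketched in code, it might look like this (the select element's id is again a placeholder assumption):

```javascript
// Fill a <select> with one option per available voice.
// Runs again whenever the async voice list changes.
function populateVoices(synth, select) {
  select.innerHTML = '';
  const voices = synth.getVoices();
  for (let i = 0; i < voices.length; i++) {
    const option = document.createElement('option');
    option.value = String(i); // index into the voices array
    option.textContent = `${voices[i].name} (${voices[i].lang})`;
    select.appendChild(option);
  }
  return voices;
}

// usage:
// const select = document.getElementById('voice-select');
// window.speechSynthesis.onvoiceschanged = () =>
//   populateVoices(window.speechSynthesis, select);
```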
Finally, we create the read() function, which gets the selected voice and the text to be read, and calls the speak() method.
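A sketch of such a read() function (element ids and the fixed settings are illustrative assumptions):

```javascript
// Speak the textarea's contents with the voice chosen in the select.
// 'voice-select' and 'text-input' are placeholder ids.
function read(synth) {
  const voices = synth.getVoices();
  const select = document.getElementById('voice-select');
  const text = document.getElementById('text-input').value;
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.voice = voices[Number(select.value)];
  utterance.rate = 1;
  utterance.pitch = 1;
  utterance.volume = 1;
  synth.speak(utterance);
  return utterance;
}

// usage:
// document.getElementById('read-button')
//   .addEventListener('click', () => read(window.speechSynthesis));
```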
That’s it. Here is the final working CodePen if you want to fork it and play with it:
Along with the speak() method, the API also offers pause(), resume(), and cancel() methods to give you more control over your implementation.
I also tried to increase or decrease the rate of speech in real time, but it did not work. It seems that once the speech starts playing, you can change any property on the utterance, but the change won’t affect the speech in progress at all. The changes take effect the next time the speak() function is called.
Similarly, changing the voice while the speech is in progress does not work either. There are clever tricks you can use if you want to change the speed in real time, but those are not must-have features. I am happy with basic reading of articles at a consistent speed.
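One such trick, sketched below as my own assumption rather than a tested recipe: track how far the speech has progressed using the utterance's boundary event, then cancel the current utterance and re-speak the remaining text at the new rate.

```javascript
// Sketch: change the rate mid-speech by cancelling and re-speaking
// the remainder. Progress tracking via boundary events is approximate.
function makeReader(synth, fullText) {
  let position = 0; // index into fullText of the last boundary heard

  function speakFrom(start, rate) {
    const utterance = new SpeechSynthesisUtterance(fullText.slice(start));
    utterance.rate = rate;
    // charIndex is relative to this utterance's (sliced) text.
    utterance.onboundary = (event) => { position = start + event.charIndex; };
    synth.speak(utterance);
    return utterance;
  }

  return {
    start: (rate) => speakFrom(0, rate),
    changeRate: (rate) => { synth.cancel(); return speakFrom(position, rate); },
  };
}
```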