The new Speech Synthesis Web API

Samarth Agarwal
21st Dec, 2017

The internet is where we all look for information, and the most common way to consume it, in the form of articles, blogs, news and so on, is to read it. That is fine when you are in the comfort of your office or home and have enough time, but being able to talk to the website you are browsing and listen to its responses is another thing entirely. Modern web apps can talk. They can listen to you as well.

The Web Speech API has been around for a while, and most modern browsers now support it. In fact, Chrome started supporting the Web Speech API as early as version 33, and Edge and Firefox now support it too.

Learn more about compatibility here: https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API#Browser_compatibility

The Web Speech API has two parts: SpeechSynthesis (Text-to-Speech), and SpeechRecognition (Asynchronous Speech Recognition). We are going to have a look at what we can achieve with the SpeechSynthesis part.

Speech Synthesis can make your applications and websites talk, even though they are running in a browser. This is a standard JavaScript Web API (although still marked as experimental), so the code works out of the box without any external libraries or frameworks. No dependencies whatsoever. All you need is a web browser to launch your website.
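Because the API is still marked as experimental, it can help to check for it before relying on it. The following is a minimal sketch; supportsSpeechSynthesis is an illustrative helper name and not part of the browser API.

```javascript
// Returns true only when both the speechSynthesis object and the
// SpeechSynthesisUtterance constructor exist on the given scope.
// supportsSpeechSynthesis is a hypothetical helper, not a browser API.
function supportsSpeechSynthesis(scope = globalThis) {
  return 'speechSynthesis' in scope && 'SpeechSynthesisUtterance' in scope;
}

console.log(
  supportsSpeechSynthesis()
    ? 'Speech synthesis is available.'
    : 'Speech synthesis is not available here.'
);
```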

Consider the possibilities.

  • Instead of reading a long article, you can have it read out to you while you are driving.
  • With real-time translation technologies maturing, this can open up markets that were previously unreachable because of language barriers.
  • Furthermore, this opens up the web to those who struggle with physical disabilities or illiteracy. Everyone can listen to and understand a language, right?

Enough talking. Let’s see this in action with some bare minimum code. The code is plain JavaScript, so you can even paste it into the browser’s console window, which opens when you press F12 in Chrome.

The basics revolve around the speechSynthesis object, which can be accessed in any supported browser using window.speechSynthesis. This object has a number of methods and properties, of which the one we are most interested in is the speak() method. The speak() method takes in a SpeechSynthesisUtterance object. Let’s have a look at the most basic code to simply say “Hello World!”
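A minimal sketch of that “Hello World!” could look like the following. The sayHello wrapper and its parameters are illustrative assumptions that keep the example runnable outside a browser; in a browser console, the two lines inside the function are all that is needed.

```javascript
// sayHello is a hypothetical wrapper; synth and Utterance default to the
// browser globals speechSynthesis and SpeechSynthesisUtterance.
function sayHello(
  synth = globalThis.speechSynthesis,
  Utterance = globalThis.SpeechSynthesisUtterance
) {
  const utterance = new Utterance('Hello World!'); // the text to speak
  synth.speak(utterance);                          // queue it for speaking
  return utterance;
}

// Only speak when the API is actually present (for example, in a browser).
if (typeof window !== 'undefined' && 'speechSynthesis' in window) {
  sayHello();
}
```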

The SpeechSynthesisUtterance object takes in the text that you want the API to speak, and you then pass that object to the speak() function. That’s it. This is the minimum code that you need.

Instead of passing in static text, you can also pass in the text content of an element on the page, looked up by its id or class, and it will do the job just as well.
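Since the SpeechSynthesisUtterance constructor expects a string, reading from the page means looking up the element first and passing its text. Here is a hedged sketch; speakElement and the selector used below are illustrative assumptions, not names from the original demo.

```javascript
// speakElement is a hypothetical helper: it finds an element by CSS selector
// (for example '#article' or '.intro') and speaks its trimmed text content.
function speakElement(
  selector,
  doc = globalThis.document,
  synth = globalThis.speechSynthesis,
  Utterance = globalThis.SpeechSynthesisUtterance
) {
  const element = doc.querySelector(selector);
  if (!element) return null; // nothing to read
  const utterance = new Utterance(element.textContent.trim());
  synth.speak(utterance);
  return utterance;
}
```

In a browser, calling speakElement('#article') would read the element whose id is article, assuming such an element exists on the page.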

A working demo of the above is available on CodePen: https://codepen.io/samarthagarwal/pen/xPWNGK/

Simple and easy! There is, however, quite a bit of configuration you can do using the properties, events, and methods of the utterance object we created above. Let’s have a look at what we can configure:

text

You can define the text that you want to be spoken using the text property. In the above example we set the text by passing it to the constructor when creating the object, but setting the text property is another way to do so.

pitch

Pitch specifies the pitch at which the text is spoken. Generally, 1 is the default value of pitch, with a minimum value of 0 and a maximum of 2. However, this can change for different platforms.

rate

Rate specifies the speed at which the text is spoken. Again, 1 is the default value of rate, while it can range between 0.1 and 10, 10 being the highest.

volume

As the name implies, volume takes a value between 0 and 1 and sets the speech volume. The default value is 1, which is the highest on every platform.

lang

The lang property specifies the language of the speech as a BCP 47 language tag, for example “en-US” or “en-IN”. The tag carries the locale along with the language, and the speech is uttered accordingly.

voice

The voice property specifies the voice to be used to speak the text. This should be set to one of the SpeechSynthesisVoice objects returned by SpeechSynthesis.getVoices(). If not set by the time the utterance is spoken, the voice used will be the most suitable default voice available for the utterance's lang setting.

Look at the code below with all the properties defined.
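A sketch with every property set might look like this. The buildUtterance helper and the specific values are assumptions chosen for illustration; the property names themselves (text, pitch, rate, volume, lang, voice) are the ones described above.

```javascript
// buildUtterance is a hypothetical helper. voices is expected to be the array
// returned by speechSynthesis.getVoices().
function buildUtterance(text, voices, Utterance = globalThis.SpeechSynthesisUtterance) {
  const utterance = new Utterance();
  utterance.text = text;     // what to say
  utterance.pitch = 1;       // 0 to 2, default 1
  utterance.rate = 1;        // 0.1 to 10, default 1
  utterance.volume = 1;      // 0 to 1, default 1
  utterance.lang = 'en-US';  // BCP 47 language tag
  // Pick the first voice whose language matches; otherwise leave voice unset
  // so the browser falls back to its default for the lang.
  const match = voices.find((voice) => voice.lang === 'en-US');
  if (match) utterance.voice = match;
  return utterance;
}
```

In a browser this could be used as speechSynthesis.speak(buildUtterance('Hello World!', speechSynthesis.getVoices())).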

The getVoices() method returns an array of SpeechSynthesisVoice objects. The voices available vary from platform to platform and from browser to browser.
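To see which voices your own browser offers, you can print them in the console. This is a small sketch; describeVoices is an illustrative helper, while name, lang and default are properties of SpeechSynthesisVoice.

```javascript
// describeVoices is a hypothetical helper that turns voice objects into
// readable lines such as "Name (lang)" with a marker for the default voice.
function describeVoices(voices) {
  return voices.map(
    (voice) => `${voice.name} (${voice.lang})${voice.default ? ' [default]' : ''}`
  );
}

// In a browser, print whatever is currently loaded. The list may be empty
// if the voices have not finished loading yet.
if (typeof window !== 'undefined' && 'speechSynthesis' in window) {
  console.log(describeVoices(window.speechSynthesis.getVoices()).join('\n'));
}
```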

Let’s create a more sophisticated working demo which lets us see the Web Speech API’s Speech Synthesis capabilities.

We start by creating a simple form with a select element, which we will populate with all the voices available on the platform, and a textarea for the text to be read. A button at the bottom starts the reading.
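The form itself is ordinary HTML. To keep all examples in JavaScript, the sketch below holds the markup in a string and adds it to the page; the element ids (reader, voices, text, read) and the mountForm helper are assumptions made for illustration.

```javascript
// Markup for the demo form: a voice picker, a text box, and a button.
// All ids here are hypothetical names chosen for this sketch.
const formMarkup = `
<form id="reader">
  <select id="voices"></select>
  <textarea id="text" rows="6"></textarea>
  <button type="button" id="read">Read</button>
</form>`;

// Appends the form to the end of the document body.
function mountForm(doc = globalThis.document) {
  doc.body.insertAdjacentHTML('beforeend', formMarkup);
}

if (typeof document !== 'undefined') {
  mountForm();
}
```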

Next, we set the properties of the SpeechSynthesisUtterance that will not change: rate, pitch, volume and lang. We also need to handle the voiceschanged event (through the onvoiceschanged handler) to get the list of voices, because voices are loaded asynchronously with the page and getVoices() may otherwise return an empty array.

Once we get the voices using the getVoices() method, we add those to the select element on the page using a for loop.
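These two steps can be sketched as follows. The fixed values, the helper names (applyFixedSettings, populateVoices) and the voices element id are assumptions; onvoiceschanged and getVoices() are the API members mentioned above.

```javascript
// Settings that stay the same for every utterance (values are assumptions).
const fixedSettings = { rate: 1, pitch: 1, volume: 1, lang: 'en-US' };

// Copies the fixed settings onto an utterance and returns it.
function applyFixedSettings(utterance) {
  return Object.assign(utterance, fixedSettings);
}

// Fills a <select> with one <option> per voice; the option value is the
// voice's index in the array so it can be looked up again later.
function populateVoices(select, voices, doc = globalThis.document) {
  select.textContent = ''; // clear any options from an earlier call
  for (let i = 0; i < voices.length; i++) {
    const option = doc.createElement('option');
    option.value = String(i);
    option.textContent = `${voices[i].name} (${voices[i].lang})`;
    select.appendChild(option);
  }
}

if (typeof window !== 'undefined' && 'speechSynthesis' in window) {
  const refresh = () => {
    const select = document.getElementById('voices'); // hypothetical id
    if (select) populateVoices(select, window.speechSynthesis.getVoices());
  };
  refresh();                                        // voices may already be loaded
  window.speechSynthesis.onvoiceschanged = refresh; // otherwise wait for them
}
```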

Finally, create the read() function, which gets the selected voice and the text to be read and then calls the speak() function.
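A possible shape for read() is sketched below. It assumes the select holds voice indexes as option values and that the form elements use the ids voices, text and read; those details are assumptions for this sketch, not taken from the original demo.

```javascript
// Reads the textarea's text aloud with the voice picked in the select.
// select.value is expected to be an index into the voices array.
function read(
  select,
  textarea,
  voices,
  synth = globalThis.speechSynthesis,
  Utterance = globalThis.SpeechSynthesisUtterance
) {
  const utterance = new Utterance(textarea.value);
  const voice = voices[Number(select.value)];
  // If the index does not match a voice, leave voice unset so the browser
  // uses its default.
  if (voice) utterance.voice = voice;
  synth.speak(utterance);
  return utterance;
}

// Wire the button up when running in a page that has the form (ids assumed).
if (typeof document !== 'undefined' && typeof speechSynthesis !== 'undefined') {
  const button = document.getElementById('read');
  if (button) {
    button.addEventListener('click', () =>
      read(
        document.getElementById('voices'),
        document.getElementById('text'),
        speechSynthesis.getVoices()
      )
    );
  }
}
```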

That’s it. Here is the final working CodePen if you want to fork it and play with it: https://codepen.io/samarthagarwal/pen/MOGYEr/

Along with the speak() method, the API also offers cancel(), pause() and resume() methods that give you more control over your implementation.
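As a rough sketch of how the control methods fit together, the helpers below toggle between pause() and resume() and stop speech with cancel(). The helper names are illustrative; speaking and paused are read-only properties of the speechSynthesis object.

```javascript
// Pauses speech if it is playing, resumes it if it is paused.
// paused is checked first because speaking stays true while paused.
function togglePause(synth = globalThis.speechSynthesis) {
  if (synth.paused) {
    synth.resume();
    return 'resumed';
  }
  if (synth.speaking) {
    synth.pause();
    return 'paused';
  }
  return 'idle'; // nothing is being spoken
}

// Stops the current speech and clears anything still queued.
function stopSpeaking(synth = globalThis.speechSynthesis) {
  synth.cancel();
}
```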

I also tried to increase or decrease the rate of speech in real time, but it did not work. It looks like once the speech starts playing, you can change any property on the utterance, but the change won’t affect the current speech at all. The changes take effect the next time the speak() function is called.

Similarly, changing the voice while the speech is in progress does not work either. There are neat tricks you can use if you want to change the speed in real time, but those are not must-have features. I am happy with basic reading of articles at a consistent speed.
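One possible workaround, offered here as an assumption rather than the approach used in the demo, is to remember how far the speech has progressed, cancel it, and speak the rest of the text at the new rate. The sketch relies on the boundary event's charIndex, which not every voice reports, so treat it as a starting point only. createRateController is a hypothetical name.

```javascript
// Speaks text and allows the rate to be changed mid-way by restarting
// from the last reported word boundary.
function createRateController(
  text,
  synth = globalThis.speechSynthesis,
  Utterance = globalThis.SpeechSynthesisUtterance
) {
  let offset = 0;  // index in text where the current utterance starts
  let reached = 0; // last charIndex reported within the current utterance

  function speakFrom(start, rate) {
    offset = start;
    reached = 0;
    const utterance = new Utterance(text.slice(start));
    utterance.rate = rate;
    utterance.onboundary = (event) => { reached = event.charIndex; };
    synth.speak(utterance);
    return utterance;
  }

  return {
    start: (rate = 1) => speakFrom(0, rate),
    // Cancel the current speech and continue from the last boundary.
    setRate: (rate) => {
      synth.cancel();
      return speakFrom(offset + reached, rate);
    },
  };
}
```

If the voice never fires boundary events, setRate() restarts the current utterance from its beginning, which is a limitation of this sketch.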

Happy reading!

Samarth Agarwal

Blog Author

Samarth Agarwal is an App/Web Developer, Entrepreneur, and Instructor. He has been teaching front-end application and mobile development for more than three years. He has been developing apps for more than five years and working with a variety of front end and back end technologies including Ionic, NodeJS, AngularJS, Angular, ReactJS etc. He is an instructor for Ionic and Angular on Udemy, LinkedIn Learning, and Internshala Trainings.
