Assistants like Siri, Google Now, or Cortana have been part of our daily lives for some time now. These tools not only recognize our voice but can also read text aloud. Thanks to HTML5, JavaScript offers a native API for speech synthesis (text-to-speech, or TTS) that is simple to use and needs no external libraries.
The native speech synthesizer in JavaScript allows any text to be played in the browser by configuring the language, pitch, and rate. In my case, I was surprised by how quickly a functional example can be implemented with just a few lines of code.
HTML5 provides an API that lets developers work with both speech recognition and speech synthesis in a simple way, and it is used much like the other existing JavaScript APIs.
How to get started with SpeechSynthesis in JavaScript
Create your first SpeechSynthesisUtterance object
The SpeechSynthesisUtterance class captures text and converts it into audio. Creating an object is as simple as:
var speechMessage = new SpeechSynthesisUtterance('Hi!');
window.speechSynthesis.speak(speechMessage);
This small example works immediately in modern browsers and is ideal for testing different voices and configurations.
Play text with the browser: basic example
To play dynamic text, all you have to do is pass a variable to the constructor:
var texto = "Welcome to my app";
var speechMessage = new SpeechSynthesisUtterance(texto);
window.speechSynthesis.speak(speechMessage);
This opens the door to interactive applications that can read personalized messages to the user.
Essential properties of SpeechSynthesisUtterance
- text: the text the browser will speak
It is the most important property. Here you define what the synthesizer should say.
- lang: change the voice language
Allows you to specify the language, for example:
speechMessage.lang = 'es-ES'; // Spanish from Spain
In my own tests, setting the language to match the text improves the naturalness of the pronunciation.
- pitch and rate: pitch and playback rate
speechMessage.pitch = 1.2; // slightly higher pitch
speechMessage.rate = 0.9; // slightly slower rate
This allows you to personalize the voice according to the application's context, making it more pleasant for the user.
- volume: control the voice volume
speechMessage.volume = 0.8; // value between 0 and 1
Ideal for balancing the audio with other sounds on the page.
Events and control of the speech synthesis flow
onstart and onend: detect start and end
speechMessage.onstart = function() {
console.log('Playback started...');
};
speechMessage.onend = function() {
console.log('Playback finished.');
};
These events allow you to synchronize animations or effects with the voice, something I personally implemented in an interactive tutorial project.
Other useful events for advanced projects
There are events like onerror, onpause, and onresume that allow complete control of the voice playback flow.
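As a minimal sketch of how these handlers could be wired onto a single utterance: the helper name attachSpeechLogging and its log callback are hypothetical, introduced here only for illustration, while the handler properties (onstart, onend, onerror, onpause, onresume) are the ones named above.

```javascript
// Hypothetical helper: attaches one logging handler per lifecycle event.
// `utterance` is expected to be a SpeechSynthesisUtterance;
// `log` is any function that accepts a string (console.log by default).
function attachSpeechLogging(utterance, log = console.log) {
  utterance.onstart = () => log('start');
  utterance.onend = () => log('end');
  utterance.onpause = () => log('pause');
  utterance.onresume = () => log('resume');
  // Error events carry an `error` code string, for example 'canceled'.
  utterance.onerror = (event) => log('error: ' + event.error);
  return utterance;
}

// Browser-only usage, guarded so the snippet does nothing elsewhere.
if (typeof window !== 'undefined' && 'speechSynthesis' in window) {
  const message = attachSpeechLogging(new SpeechSynthesisUtterance('Hi!'));
  window.speechSynthesis.speak(message);
}
```

Passing the logger in as a parameter keeps the helper easy to swap for something that drives animations or UI state instead of the console.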
We already have half the task done
In a previous post, we talked a little about the Web Speech API and its voice recognition side (the SpeechRecognition interface in JavaScript), which gives our applications the ability to recognize speech in the configured language through the microphone of a PC or mobile device.
What remains is to explain the opposite direction: given a text, play it as audio in the configured language. This is the job of the speech synthesis API in JavaScript, which is also part of the Web Speech API, and its SpeechSynthesisUtterance class is what holds the text to be converted into audio.
As indicated at the beginning, making the browser "talk" to us with the speech synthesis API is really simple; the minimum necessary code would be something like this:
var speechSynthesisUtterance = new SpeechSynthesisUtterance('Hola');
window.speechSynthesis.speak(speechSynthesisUtterance);
As we can see, we first create an instance of the SpeechSynthesisUtterance class, passing as a parameter the text that will be "spoken" by the browser, and then hand this object to the speechSynthesis interface, which is the one that finally makes the browser "speak."
SpeechSynthesisUtterance class properties
The SpeechSynthesisUtterance class exposes a series of properties and event handlers that establish how the browser will "speak". One of them is text, which sets or gets the text the browser will read aloud using the speech synthesis API.
Like most APIs, the class offers further properties for configuring more than the text; among the most important are lang, volume, voice, pitch, and rate.
SpeechSynthesisUtterance.lang
This is also one of the most important properties; it sets or gets the language of the text to be spoken by the speech synthesis API.
SpeechSynthesisUtterance.pitch
Allows you to set/get the voice pitch; the value is a real number (float) between zero (for the lowest) and two (for the highest).
SpeechSynthesisUtterance.rate
Allows you to set/get the rate at which the browser will "speak"; the value is a real number (float) between zero point one (for the slowest) and ten (for the fastest).
SpeechSynthesisUtterance.text
This is the most important property of all, and it is the one that allows you to set or get the text with which we want our browser to speak to us.
SpeechSynthesisUtterance.volume
Sets or gets the voice volume, represented by a real (float) value between zero (for the lowest) and one (for the highest).
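Since each of these properties has a valid range, it can help to sanitize values before assigning them, especially when they come from user input. The following is a minimal sketch under that assumption: clampSpeechOptions and SPEECH_RANGES are hypothetical names, and the fallback of 1 for missing values is an assumption chosen to mirror the usual default of these properties.

```javascript
// Ranges as described above: pitch 0 to 2, rate 0.1 to 10, volume 0 to 1.
// The fallback of 1 is an assumption used when a value is missing or invalid.
const SPEECH_RANGES = {
  pitch: { min: 0, max: 2, fallback: 1 },
  rate: { min: 0.1, max: 10, fallback: 1 },
  volume: { min: 0, max: 1, fallback: 1 },
};

// Hypothetical helper: returns a new object with every value forced into range.
// Accepts numbers or numeric strings (for example values read from form inputs).
function clampSpeechOptions(options = {}) {
  const result = {};
  for (const [name, range] of Object.entries(SPEECH_RANGES)) {
    const value = Number.parseFloat(options[name]);
    result[name] = Number.isFinite(value)
      ? Math.min(range.max, Math.max(range.min, value))
      : range.fallback;
  }
  return result;
}

// Browser-only usage, guarded so the snippet does nothing elsewhere.
if (typeof window !== 'undefined' && 'speechSynthesis' in window) {
  const message = new SpeechSynthesisUtterance('Welcome to my app');
  Object.assign(message, clampSpeechOptions({ pitch: 1.2, rate: 0.9, volume: 0.8 }));
  window.speechSynthesis.speak(message);
}
```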
Event handlers
All the speech synthesis API events are listed in the official documentation; among those that can be considered the most important or most used we have:
speechMessage.onstart = function(e) {
console.log('Speaking...');
};
speechMessage.onend = function(e) {
console.log('Finished.');
};
Browser support
To verify browser support, simply use the following code:
if ('speechSynthesis' in window) {
// SpeechSynthesisUtterance is supported
} else {
console.log('The API is not supported in this browser');
}
Tips for improving the user experience
- Test different voices available in the browser.
- Adjust volume and rate according to the content type.
- Avoid overly long texts without pauses.
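For the last tip, one possible approach is to split a long text into sentences and queue each one as its own utterance, since speechSynthesis.speak() adds utterances to a queue and plays them in order. This is only a sketch: splitIntoSentences and speakInChunks are hypothetical helpers, and the splitting rule (a period, exclamation mark, or question mark followed by whitespace) is a simplifying assumption that will not suit every language or text.

```javascript
// Hypothetical helper: breaks text after '.', '!' or '?' when followed by whitespace.
// A decimal such as "1.5" is left intact because no whitespace follows the period.
function splitIntoSentences(text) {
  return text
    .replace(/([.!?])\s+/g, '$1\n')
    .split('\n')
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
}

// Hypothetical helper: queues one utterance per sentence.
// `synth` is expected to be window.speechSynthesis and `Utterance` the
// SpeechSynthesisUtterance constructor; both are parameters so they can be replaced.
function speakInChunks(text, synth, Utterance) {
  const sentences = splitIntoSentences(text);
  for (const sentence of sentences) {
    synth.speak(new Utterance(sentence));
  }
  return sentences.length;
}

// Browser-only usage, guarded so the snippet does nothing elsewhere.
if (typeof window !== 'undefined' && 'speechSynthesis' in window) {
  speakInChunks(
    'Welcome to my app. This text is read one sentence at a time.',
    window.speechSynthesis,
    SpeechSynthesisUtterance
  );
}
```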
Speech synthesis example
Dynamic text-to-speech from variables
function talk(texto) {
var speechMessage = new SpeechSynthesisUtterance(texto);
window.speechSynthesis.speak(speechMessage);
}
talk("Hi!");
Voice and rate customization for interactive applications
In my experience, allowing the user to select the pitch and rate increases accessibility and improves interaction in educational or tutorial projects.
A simple example serves to better understand each of the properties and event handlers of the speech synthesis API seen above.
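One way such an example might look is sketched below. It extends the talk function from earlier with settings and the two event handlers seen above; the settings object, its default values, and the env parameter (which stands in for the browser's global object so the function can be tried outside a browser) are assumptions made for illustration.

```javascript
// Sketch of a combined example: feature check, properties, and event handlers.
// `settings` and `env` are hypothetical parameters; `env` defaults to the global
// object, which in a browser exposes speechSynthesis and SpeechSynthesisUtterance.
function talk(text, settings = {}, env = globalThis) {
  // Return null when the speech synthesis API is not available.
  if (!env.speechSynthesis || !env.SpeechSynthesisUtterance) {
    return null;
  }

  const speechMessage = new env.SpeechSynthesisUtterance(text);
  speechMessage.lang = settings.lang ?? 'en-US'; // assumed default language
  speechMessage.pitch = settings.pitch ?? 1;     // 0 to 2
  speechMessage.rate = settings.rate ?? 1;       // 0.1 to 10
  speechMessage.volume = settings.volume ?? 1;   // 0 to 1

  speechMessage.onstart = function () {
    console.log('Speaking...');
  };
  speechMessage.onend = function () {
    console.log('Finished.');
  };

  env.speechSynthesis.speak(speechMessage);
  return speechMessage;
}

// In a supporting browser this speaks the text; elsewhere it returns null.
talk('Welcome to my app', { lang: 'en-US', pitch: 1.2, rate: 0.9, volume: 0.8 });
```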
Frequently asked questions about the speech synthesizer in JavaScript
- Which browsers support the speech synthesis API in JavaScript?
- Chrome, Firefox, Edge, and Safari support the speech synthesis part of the Web Speech API, although the available voices vary by browser and operating system.
- How can I change the default language or voice?
- Use the lang and voice properties of SpeechSynthesisUtterance.
- Can I control the speed, volume, and pitch of the voice?
- Yes, with rate, volume, and pitch.
- Is it possible to play dynamic text from a variable in JavaScript?
- Yes, simply assign the text to SpeechSynthesisUtterance.text and use speechSynthesis.speak().
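To illustrate the answer about changing the voice, here is a minimal sketch that looks for an installed voice matching a language tag. pickVoice is a hypothetical helper; the real API calls it relies on are speechSynthesis.getVoices() and the voiceschanged event. Which voices exist depends on the browser and operating system, so the result may well be null.

```javascript
// Normalizes tags such as "es_ES" or "ES-es" to "es-es" for comparison.
function normalizeLang(tag) {
  return String(tag).toLowerCase().replace('_', '-');
}

// Hypothetical helper: prefers an exact language match, then any voice with the
// same base language (for example es-MX when es-ES is requested), else null.
function pickVoice(voices, lang) {
  const wanted = normalizeLang(lang);
  const base = wanted.split('-')[0];
  return (
    voices.find((voice) => normalizeLang(voice.lang) === wanted) ||
    voices.find((voice) => normalizeLang(voice.lang).split('-')[0] === base) ||
    null
  );
}

// Browser-only usage, guarded so the snippet does nothing elsewhere.
if (typeof window !== 'undefined' && 'speechSynthesis' in window) {
  const synth = window.speechSynthesis;
  const speakSpanish = () => {
    const message = new SpeechSynthesisUtterance('Hola');
    message.lang = 'es-ES';
    const voice = pickVoice(synth.getVoices(), 'es-ES');
    if (voice) message.voice = voice; // otherwise the browser chooses a voice
    synth.speak(message);
  };
  // Some browsers fill the voice list asynchronously, so wait for it if empty.
  if (synth.getVoices().length > 0) {
    speakSpanish();
  } else {
    synth.addEventListener('voiceschanged', speakSpanish, { once: true });
  }
}
```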