The Speech Recognition API in JavaScript: speechRecognition()

- Andrés Cruz

Example Download

The speech recognition API in JavaScript is available to HTML5 developers in the browsers that implement it, and it invites us to "talk" with the browser: with it we can execute voice commands from JavaScript or convert our voice to text. As you can see, the ways in which we can use this API are practically endless; our imagination is the limit, and we can even create or use libraries built on top of it.

With this HTML5 API we can execute voice commands adapted to our own needs, since we are the ones who implement the logic.

JavaScript has all kinds of APIs that ease the development of web applications of any type; some of the many available have already been covered on DesarrolloLibre.

"The JavaScript API that invites us to talk to applications"

In this entry we will take the first steps with the Speech Recognition API in JavaScript (exposed in the code below as webkitSpeechRecognition), which gives our applications the ability to recognize the voice captured through the microphone of the PC or mobile device, according to the configured language.

Getting started with the Speech Recognition API in JavaScript

The skeleton of the code that we will use is the following and we will analyze it in the following section:

if (!('webkitSpeechRecognition' in window)) {
  alert("¡API no soportada!");
} else {
  var recognition = new webkitSpeechRecognition();
  recognition.continuous = true;
  recognition.interimResults = true;
  recognition.lang = "es-VE";

  recognition.onstart = function() {}
  recognition.onresult = function(event) {}
  recognition.onerror = function(event) {}
  recognition.onend = function() {}
}

Analyzing the previous code...

As with all "new" APIs, it is necessary to verify that the Speech Recognition API is available in the browser by checking whether the webkitSpeechRecognition object exists. If it does not exist, a message is simply displayed saying that the API is not supported:

if (!('webkitSpeechRecognition' in window)) {
  alert("¡API no soportada!");
}else{
...
}
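As a complement to the check above, the following sketch also looks for the unprefixed SpeechRecognition name defined by the Web Speech API specification before falling back to the webkit-prefixed one. The helper name getRecognitionConstructor and its globalObject parameter are hypothetical and only serve this illustration; in a page we would pass window.

```javascript
// Hypothetical helper: returns the speech recognition constructor exposed by
// the given global object, or null when the API is not available.
function getRecognitionConstructor(globalObject) {
  return globalObject.SpeechRecognition ||
         globalObject.webkitSpeechRecognition ||
         null;
}

// Usage in a page (assumption: the code runs in a browser):
// var Recognition = getRecognitionConstructor(window);
// if (Recognition === null) {
//   alert("¡API no soportada!");
// } else {
//   var recognition = new Recognition();
// }
```

Receiving the global object as a parameter keeps the helper easy to try outside the browser with a plain object.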

If the Speech Recognition API is available in the browser, the code enclosed by the else is executed. After the recognition object is created, the following line of code:

  recognition.continuous = true;

Sets the continuous attribute (false by default), which defines what happens when the user pauses. With false, recognition comes to an end as soon as the user stops speaking (the onend event is triggered); with true, as in this case, the browser keeps listening and returning results until recognition is stopped, for example by calling stop().

This other line of code:

recognition.interimResults = true;

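Specifies whether interim results are returned: with true we also receive provisional results that may still change while the user keeps speaking; with false we only receive final results, which will not change.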

The following line of code:

recognition.lang = "es-VE";

Sets the lang attribute, which specifies the language to be recognized; in other words, the language the user will speak.

JavaScript Speech Recognition API Main Events

Once the previous attributes have been established, now it is the turn of the events that will actually allow us to control and obtain the user's words in plain text:

  recognition.onstart = function() {}
  recognition.onresult = function(event) {}
  recognition.onerror = function(event) {}
  recognition.onend = function() {}

The SpeechRecognition API onstart event

This event is executed when, after the start() function is called, the browser begins to "listen"; in other words, it marks the moment at which the application starts listening. To trigger the onstart event we must do the following:

var recognition = new webkitSpeechRecognition();
...
recognition.start();

The SpeechRecognition API onerror event

This event is executed only when an error has occurred, for example if the microphone was not found or permission to use it was denied. Here we can report the error to the user so that they can solve it.
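A minimal sketch of how the error could be reported, assuming a hypothetical helper named describeRecognitionError. The error codes are the ones listed in the Web Speech API specification and arrive in event.error; the messages are only examples.

```javascript
// Hypothetical helper: maps the error code found in event.error to a message
// for the user. The messages are examples, not part of the API.
function describeRecognitionError(code) {
  switch (code) {
    case "no-speech":
      return "No speech was detected. Please try again.";
    case "audio-capture":
      return "No microphone was found.";
    case "not-allowed":
    case "service-not-allowed":
      return "Permission to use the microphone was denied.";
    case "network":
      return "A network error interrupted the recognition.";
    default:
      return "Recognition error: " + code;
  }
}

// Usage in a page (assumption: a recognition object already exists):
// recognition.onerror = function (event) {
//   alert(describeRecognitionError(event.error));
// };
```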

The SpeechRecognition API onend event

This event is executed when speech recognition has come to an end, for example because the user has finished speaking. It is a good candidate for making some visual changes in our application, so that the user knows that they are no longer being heard and that what they said is going to be processed.

The SpeechRecognition API onresult event

Finally, this event returns the obtained result; in other words, the spoken words converted into plain text. As you can see, this is the "strong" event of this API, and it is where we can process what our user said in text format. We don't have to do anything with the audio: the API does everything for us and gives us the text equivalent of what the user expressed through voice.

This is the interesting part of the API; here we finally get what the user said, in the following structure:

{
  ..
  results: {
    0: {
      0: {
        confidence: 0.6...,
        transcript: "Hola"
      },
      isFinal:true,
      length:1
    },
    length:1
  },
  ..
}

To obtain a finished phrase spoken by the user, where i is an index into event.results, we can do the following:

if (event.results[i].isFinal) {
    var text = event.results[i][0].transcript;
}
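To make the difference between final and interim results concrete, here is a minimal sketch. The helper name collectTranscripts is hypothetical; it only assumes an event with the resultIndex and results fields shown in the structure above.

```javascript
// Hypothetical helper: walks the results reported by an onresult event and
// separates the text that is already final from the text that may still change.
function collectTranscripts(event) {
  var finalText = "";
  var interimText = "";
  for (var i = event.resultIndex; i < event.results.length; i++) {
    var first = event.results[i][0]; // first alternative of this result
    if (event.results[i].isFinal) {
      finalText += first.transcript;
    } else {
      interimText += first.transcript;
    }
  }
  return { finalText: finalText, interimText: interimText };
}

// Usage in a page (assumption: a recognition object already exists):
// recognition.onresult = function (event) {
//   var texts = collectTranscripts(event);
//   document.getElementById("texto").value += texts.finalText;
// };
```

Interim text is only produced when interimResults is true; it can be shown as a preview and discarded once the final version arrives.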

Complete JavaScript Speech Recognition API Example: Speech to Text

Having already explained the "strong" part of the code or, in other words, the basics of the API, it is possible to create a small program that allows you to use Voice Recognition in a web application and display the result in a text field:

	var recognition;
	var recognizing = false;

	if (!('webkitSpeechRecognition' in window)) {
		alert("¡API no soportada!");
	} else {
		recognition = new webkitSpeechRecognition();
		recognition.lang = "es-VE";
		recognition.continuous = true;
		recognition.interimResults = true;

		recognition.onstart = function() {
			recognizing = true;
			console.log("empezando a escuchar");
		};

		recognition.onresult = function(event) {
			// Append only the final results to the text field
			for (var i = event.resultIndex; i < event.results.length; i++) {
				if (event.results[i].isFinal) {
					document.getElementById("texto").value += event.results[i][0].transcript;
				}
			}
		};

		recognition.onerror = function(event) {
			// event.error holds the error code
			console.log("error: " + event.error);
		};

		recognition.onend = function() {
			recognizing = false;
			document.getElementById("procesar").innerHTML = "Escuchar";
			console.log("terminó de escuchar, llegó a su fin");
		};
	}

	// Starts or stops listening and updates the label of the "procesar" element
	function procesar() {
		if (!recognition) {
			return; // the API is not supported
		}
		if (!recognizing) {
			recognition.start();
			recognizing = true;
			document.getElementById("procesar").innerHTML = "Detener";
		} else {
			recognition.stop();
			recognizing = false;
			document.getElementById("procesar").innerHTML = "Escuchar";
		}
	}

Some considerations about the code above:

  • We use the recognizing variable to easily know whether the browser is "listening" or not.
  • Currently we cannot use this API on sites that are not served over HTTPS, with the exception of localhost.

With this simple example we see how easy it is to convert speech to text using the native HTML5 API for speech recognition.

Problems with Google Chrome in voice recognition

In the latest versions of Google Chrome, Google has made security changes that make it impossible to use devices such as microphones and cameras if the website does not have an HTTPS certificate: the request to access the device's camera or microphone will simply not work. This causes many problems if we do not have an HTTPS certificate on our website, or on the website where we want to use the script that accesses the webcam or microphone.
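The restriction described above can be anticipated in code before starting recognition. This is a minimal sketch, assuming a hypothetical helper named canUseMicrophone that receives an object shaped like window.location; treating 127.0.0.1 the same as localhost is an assumption added for this illustration.

```javascript
// Hypothetical helper: tells whether a page at the given location could be
// allowed to request the microphone (HTTPS, or a local development address).
function canUseMicrophone(location) {
  var isLocal = location.hostname === "localhost" ||
                location.hostname === "127.0.0.1"; // assumption: loopback counts as local
  return location.protocol === "https:" || isLocal;
}

// Usage in a page (assumption: the code runs in a browser):
// if (!canUseMicrophone(window.location)) {
//   alert("This page needs HTTPS to use the microphone.");
// }
```

Browsers also expose window.isSecureContext, which could replace this manual check where it is available.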

Managing permissions in Google Chrome

Google Chrome has a permission system that controls which devices each website can access. To review it, we open the script developed in this entry (or any other page that tries to access our computer's microphone) and click on the small icon that appears at the right end of the address bar:

(Image: microphone permission request)

The problem with recent versions of Google Chrome is that, regardless of the option we select, the browser never grants microphone access to a website that does not have an HTTPS certificate, unless it is localhost.

From that same dialog we click on "Manage microphone settings":

(Image: microphone permission settings window)

Here we can see the websites that are allowed to access our microphone. As an important point, Google Chrome allows access to the microphone when we are working from localhost, so if we copy the script provided in this entry to our localhost, it will work correctly once Google Chrome requests permission and we grant it. Another important point is that we will NOT see in this section any website that lacks an HTTPS certificate, no matter which option we chose in the previous window.

Extra: voice commands with a JavaScript library

In this extra section we are going to see how to process voice commands with the annyang library in JavaScript; remember that you need a web server with HTTPS, or to work from localhost. The operation is simple: we only have to indicate the voice commands:

var commands = {
	// annyang will capture anything after a splat (*) and pass it to the function.
	// e.g. saying "Show me Batman and Robin" is the same as calling showFlickr('Batman and Robin');
	'show me *tag': showFlickr,

	// A named variable is a one word variable, that can fit anywhere in your command.
	// e.g. saying "calculate October stats" will call calculateStats('October');
	'calculate :month stats': calculateStats,

	// By defining a part of the following command as optional, annyang will respond to both:
	// "say hello to my little friend" as well as "say hello friend"
	'say hello (to my little) friend': greeting
};
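To get an intuition for how the three kinds of patterns above (splat, named variable and optional part) can be matched against a spoken phrase, here is an illustrative approximation. It is not annyang's actual code: the helpers patternToRegExp and matchCommand are hypothetical, and the sketch does not escape other regular expression characters that a pattern might contain.

```javascript
// Illustrative approximation only; this is NOT annyang's implementation.
// Hypothetical helper: turns a command pattern into a regular expression.
function patternToRegExp(pattern) {
  var source = pattern.replace(
    /\s*\(([^)]*)\)\s*|:(\w+)|\*(\w+)/g,
    function (match, optional, named, splat) {
      if (optional !== undefined) {
        return "\\s*(?:" + optional + ")?\\s*"; // "(to my little)" may be omitted
      }
      if (named !== undefined) {
        return "(\\S+)"; // ":month" captures exactly one word
      }
      return "(.*)"; // "*tag" captures any text, including several words
    }
  );
  return new RegExp("^" + source + "$", "i");
}

// Hypothetical helper: returns the captured arguments as an array, or null
// when the phrase does not match the pattern.
function matchCommand(pattern, phrase) {
  var result = patternToRegExp(pattern).exec(phrase.trim());
  return result === null ? null : result.slice(1);
}

// Usage:
// matchCommand("show me *tag", "Show me Batman and Robin");
// matchCommand("calculate :month stats", "calculate October stats");
```

The captured values are what a library of this kind would pass as arguments to the function associated with each command.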

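We also define the functions that we are going to use, one for each command. Because they are declared with var, in the final script they must appear before the commands object; otherwise the object would store undefined instead of the functions: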

var showFlickr = function (tag) {
	$('#frace').text("Tag: " + tag);
	console.log("Tag: " + tag)
}

var calculateStats = function (month) {
	$('#frace').text("Data de " + month);
	console.log("Data de " + month)
}

var greeting = function () {
	$('#frace').text("Hola Mundo");
	console.log("Hola Mundo")
}

Finally, we add the commands, optionally set the language to use, and call start() to start everything:

// Add our commands to annyang.
annyang.addCommands(commands);

// Set the language
//annyang.setLanguage("es-MX");

// Start listening.
annyang.start();

If you do not want to use an external library and prefer to rely only on the native API provided by HTML5, you can also do it, and in several ways. For simple commands you can compare the whole text returned by the API; if you are interested in catching keywords, you can use JavaScript's indexOf function; and if there are several commands, you can use a switch or a chain of if statements. All of this logic belongs in the onresult function:

recognition.onresult = function(event) {
	for (var i = event.resultIndex; i < event.results.length; i++) {
		if (event.results[i].isFinal) {
			document.getElementById("texto").value += event.results[i][0].transcript;
		}
	}
};
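The ideas just described (whole-phrase comparison, keyword search with indexOf, and a switch for several commands) could be sketched as follows. The helper names detectCommand and runCommand, the Spanish command words and the returned messages are all hypothetical; Spanish words are used only because the recognizer in this entry is configured for "es-VE".

```javascript
// Hypothetical helper: decides which command a transcript refers to.
function detectCommand(transcript) {
  var text = transcript.toLowerCase().trim();
  // Simple command: compare the whole phrase.
  if (text === "hola") {
    return "greeting";
  }
  // Keyword commands: look for a word anywhere in the phrase.
  if (text.indexOf("buscar") !== -1) {
    return "search";
  }
  if (text.indexOf("borrar") !== -1) {
    return "clear";
  }
  return null;
}

// Hypothetical helper: picks the response for a transcript with a switch.
function runCommand(transcript) {
  switch (detectCommand(transcript)) {
    case "greeting":
      return "Hola Mundo";
    case "search":
      return "Searching...";
    case "clear":
      return "Text cleared";
    default:
      return "Command not recognized";
  }
}

// Usage inside onresult (assumption: a recognition object already exists):
// if (event.results[i].isFinal) {
//   console.log(runCommand(event.results[i][0].transcript));
// }
```

Normalizing the transcript with toLowerCase() and trim() keeps the comparisons tolerant to capitalization and to stray spaces around the text.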
