Exploring different text-to-speech APIs in the cloud

TTS: what is it and what is it for?

You have surely used an assistant on your cell phone. On Android, for example, when you say "OK Google", the phone answers your spoken requests with a synthetic voice. That synthetic voice is TTS, short for "Text to Speech": a technique that converts written text into a voice that sounds similar, or very similar, to a human one. It is mainly used to interact with a computer without a screen: the programmer passes text strings to the computer, which reads them aloud in a human-like voice. Today TTS is found in smartphone assistants, e-book readers, smart telephone exchanges, etc.

Why use a cloud provider?

Cloud providers expose TTS through very simple APIs, and they usually offer a small free quota that is enough to experiment and develop. Once in production, you benefit from the latest TTS algorithms. On Google, for example, you have access to artificial intelligence algorithms that produce a quality very close to a human voice, and they are constantly improving.

Cloud providers with TTS: advantages and disadvantages

Let's evaluate the Google Cloud Text-to-Speech API:

Google Cloud

We open the demo page at https://cloud.google.com/text-to-speech/ I recommend leaving the language set to English for the test, because at the time of writing the WaveNet algorithm is only available for this language. Then, under voice type, we choose WaveNet and press the "Speak it" button. We hear a voice of good quality, so good that you can hardly tell it is a machine, especially compared to the basic voice type, which sounds slightly more robotic.
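The same comparison can be reproduced outside the demo page through the REST API. What follows is only a minimal sketch: it builds the JSON body for the `text:synthesize` endpoint and decodes the base64 `audioContent` field of the reply, without sending anything over the network. The voice name `en-US-Wavenet-D` and the helper names `build_request_body` and `extract_audio` are assumptions chosen for illustration; authentication (API key or OAuth token) is left out.

```python
import base64
import json

# REST endpoint of Cloud Text-to-Speech (v1); credentials are not handled here.
SYNTHESIZE_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_request_body(text, voice_name="en-US-Wavenet-D", language_code="en-US"):
    """Return the JSON body for a text:synthesize call.

    voice_name is an assumed example; swap in a Standard voice
    (for instance "en-US-Standard-B") to compare the two voice types.
    """
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code, "name": voice_name},
        "audioConfig": {"audioEncoding": "MP3"},
    }


def extract_audio(response_json):
    """Decode the base64 'audioContent' field of the response into raw bytes."""
    return base64.b64decode(response_json["audioContent"])


if __name__ == "__main__":
    # Print the body that would be POSTed to SYNTHESIZE_URL.
    print(json.dumps(build_request_body("Hello from the cloud"), indent=2))
```

The bytes returned by `extract_audio` can be written directly to an `.mp3` file once a real response is available.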

Let's evaluate the Azure TTS API:

Azure


We can access the demo page here: https://azure.microsoft.com/es-es/services/cognitive-services/text-to-speech/ It has two sections: the first offers the neural voices and the second the standard voices. The neural voices are created by machine learning algorithms, and their quality is better than that of the standard ones. The quality of the most advanced algorithm is similar to Google's.
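In the Azure REST API the choice between a neural and a standard voice comes down to the voice name placed in the SSML document that is sent to the service. The sketch below only prepares that request (endpoint, headers and SSML body) and does not call the service. The voice name `en-US-JennyNeural`, the `User-Agent` value and the helper names are assumptions for illustration; the region and subscription key come from your own Azure resource.

```python
from xml.sax.saxutils import escape


def endpoint(region):
    """Regional text-to-speech endpoint, e.g. region='westeurope'."""
    return f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"


def build_headers(subscription_key):
    """HTTP headers for a synthesis request authenticated with a resource key."""
    return {
        "Ocp-Apim-Subscription-Key": subscription_key,
        "Content-Type": "application/ssml+xml",
        "X-Microsoft-OutputFormat": "audio-16khz-128kbitrate-mono-mp3",
        "User-Agent": "tts-demo",  # arbitrary client name, assumed here
    }


def build_ssml(text, voice_name="en-US-JennyNeural", lang="en-US"):
    """Wrap the text in a minimal SSML document that selects one voice.

    The text is XML-escaped so characters such as '&' do not break the markup.
    """
    return (
        f"<speak version='1.0' xml:lang='{lang}'>"
        f"<voice name='{voice_name}'>{escape(text)}</voice>"
        "</speak>"
    )


if __name__ == "__main__":
    print(endpoint("westeurope"))
    print(build_ssml("Hello from Azure"))
```

POSTing the SSML string to the endpoint with these headers should return the audio bytes in the requested format.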

Let's evaluate the IBM Watson TTS API:

IBM Watson

Now, IBM exceeds all expectations: the TTS quality is impressive, and you can hardly tell that a machine generated the audio. The demo page is here: https://text-to-speech-demo.ng.bluemix.net/ The voices generated by machine learning algorithms are the ones labeled V3 (enhanced DNN). Try them and you will notice the difference from the voices of Azure and Google Cloud. In addition, the voices generated by artificial intelligence algorithms include Spanish options, which are not available with Azure and Google Cloud.
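To try a V3 voice from code rather than from the demo page, a request to the `/v1/synthesize` method can be prepared as in the sketch below. It only builds the `urllib` request object and does not send it. The service URL and API key are placeholders that come from your own IBM Cloud instance, and the voice `en-US_AllisonV3Voice` and the function name `build_synthesize_request` are assumptions for illustration.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_synthesize_request(service_url, api_key, text, voice="en-US_AllisonV3Voice"):
    """Prepare (but do not send) a POST request to the /v1/synthesize method.

    service_url and api_key are the credentials of your own service instance.
    """
    query = urllib.parse.urlencode({"voice": voice})
    # HTTP Basic auth with the literal user name "apikey".
    credentials = base64.b64encode(f"apikey:{api_key}".encode()).decode()
    return urllib.request.Request(
        url=f"{service_url}/v1/synthesize?{query}",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "audio/wav",  # ask for WAV audio in the response
            "Authorization": f"Basic {credentials}",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_synthesize_request("https://example.invalid/instance", "MY_KEY", "Hello")
    print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` and writing the response body to a `.wav` file would complete the round trip.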
