Updated 11/05/2016
Buffered, SD-cached high-quality speech and sound effects for your IOT project. Requirements: Raspberry Pi or similar, Node-Red and a little of your time.
As you may know, I’ve done a couple of blog items on speech on the Raspberry Pi recently. I was quite happy with the Google Translate API until they stopped making it available for free, so I was forced to go off in search of alternatives and settled on local synthesizers like eSpeak. The problem is that they generally sound AWFUL. eSpeak sounds like someone’s dog being strangled.
And so it was that one of our readers (thank you for that) contacted me and suggested I take a look at Ivona. https://www.ivona.com/us/
This is new: I’ve just gutted the code from the original blog to produce a much better version, better thought out, with caching for off-line use, unifying speech and sound effects into an auto-created library. If an external comms failure cuts you off from the outside world, Ivona is going to fail, and this happened here at Bedrock. SO the idea hit us to CACHE all MP3 files (assuming you’re not going to do unique messages every time). That way the SECOND time you ask for a message you will already have a file with that name. The total file space for dozens or even hundreds of these MP3 files is a tiny fraction of a typical SD card’s capacity and not even worth taking into consideration in most projects.
I originally did a video to accompany this blog - https://www.youtube.com/watch?v=qoxPVa48qRw
If you use the link I’ve provided above (to Ivona) and select a suitable voice on their web page then enter some text in the box, you’ll find it does a pretty good job. Ivona is free to developers – just grab a free account and get an API key and secret key. I spent the entire evening playing with the code and when I looked at the percentage of my “free use” – nothing.
Take a tip: copy that API key information as soon as you get your account (you won’t be able to get the same key later), and pass it via NOTEPAD (paste, then copy again) to get rid of any hidden characters.
So now you have an API key and secret key to be used in the Node-Red node Ivona (node-red-contrib-ivona) https://www.npmjs.com/package/node-red-contrib-ivona
Your API key for Node-Red is the ACCESS code and the PASSWORD is the secret key !!!!! I used my email address for the username – this is all a bit non-intuitive so beware.
Over to Node-Red. So what this node does is take in your TEXT, send it off to Ivona which returns an MP3 file with your chosen speech. You should also have MPG123 installed on your end computer (I’m using a Raspberry Pi2 for all of this). http://www.mpg123.de/
In the simplest case you would send off your text, get the MP3 file, send that to mpg123 for playback. But then you are stuck with a file… and what if you send 2 in quick succession – they will overlap each other as Node-Red runs asynchronously.
Here’s the solution and it’s a lot better than I had in the past. You can fire off several speech requests, including requests for other .mp3 files. For special effects I have a bunch of MP3 files already stored, such as “alert” and “hailing frequencies open”.
![Ivona speech Ivona speech]()
In the example above (that red block is NOT the Ivona node – it is a subflow I wrote – more in a minute)… let me show you those two INJECTS on the left..
![Ivona speech Ivona speech]()
![Ivona speech Ivona speech]()
The first has “alert” in the topic and some text “Red 1 logged in” – the second simply text.
I can click one, wait for it to speak and then click the second, or I can choose not to wait, clicking wildly, and they will still play in order. So if you specify speech for the TOPIC AND THE PAYLOAD, both will simply go into the queue in order.
How do I do that? Let’s look inside the red Node-Red SUBFLOW…
![flow flow]()
The yellow’ish blocks are user functions and the red block is a Node-Red EXEC node. One purple item is a simple 1 second delay for good measure; the other purple item is the node-red-contrib-ivona node.
I take in text… if the topic is not empty (for example “alert”), I put the topic on the queue BEFORE the main text. Other than that there is NO difference between the two.
If there is no text, just a blank message coming in, I check the queue and if not empty, try to use the items on the queue (first in, first out) one at a time.
The INJECT function is needed to start the ball rolling for the first item in the queue. Once I find text in the queue, it is sent to the Ivona node IF such a named file does not exist, and then on to the mpg123 player. Either way a BUSY flag is set so that those one-second ticks can’t pull another item off the queue until I’m done.
When done – I send empty messages back into the input to trigger off any further items in the queue.
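The “only ask Ivona if the file isn’t already there” step can be sketched in plain Node.js like this. It is a rough illustration rather than the code in the flow: `ensureCached` and the fake synthesiser are made-up names, and I’m assuming a simple `fs.existsSync` check is good enough.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper: decide whether a phrase still needs synthesising.
// `fetchSpeech` stands in for whatever creates the MP3 (Ivona in this flow).
function ensureCached(file, fetchSpeech) {
    if (fs.existsSync(file)) return "cached";  // second time around: no network needed
    fetchSpeech(file);                         // first time: create the file
    return "fetched";
}

// Usage with a fake synthesiser that just writes a placeholder file
// into a throwaway temp directory (not /home/pi/recordings).
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "speech-"));
const file = path.join(dir, "red_1_logged_in.mp3");
const fake = function (f) { fs.writeFileSync(f, "not real audio"); };

const first = ensureCached(file, fake);
const second = ensureCached(file, fake);
console.log(first, second);
```

The second call never touches the synthesiser, which is the whole point when the outside world is unreachable.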
Here is the main function:
// The queue of pending phrases lives in this node's context; the busy flag is global.
var frompush=0;
if (typeof context.arr == "undefined" || !(context.arr instanceof Array)) context.arr = [];
if (typeof context.global.speech_busy == "undefined") context.global.speech_busy=0;
// A blank message is a tick: if nothing is playing, take the oldest queued item.
if ((msg.payload==="")&&(context.global.speech_busy===0))
    if (context.arr.length)
    {
        frompush=1;
        msg.payload=context.arr.shift();
    }
if (msg.payload!=="")
{
    // just push but not recursively
    if (frompush===0)
    {
        // skip a missing or empty topic so undefined never lands on the queue
        if (msg.topic) context.arr.push(msg.topic);
        context.arr.push(msg.payload);
        return;
    }
    context.global.speech_busy=1;
    // build the cache file name: spaces to underscores, no full stops or commas, lower case
    msg.fname=msg.payload.replace(/ /g,'_');
    msg.fname=msg.fname.replace(/\./g,'');
    msg.fname=msg.fname.replace(/\,/g,'');
    msg.fname=msg.fname.toLowerCase();
    msg.fname="/home/pi/recordings/"+msg.fname+".mp3";
    return msg;
}
Note the busy flag and the use of PUSH for the queue.
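If you want to see the queue and busy flag working without Node-Red, here is a minimal stand-alone model in plain JavaScript. The names `makeSpeechQueue`, `enqueue`, `tick` and `done` are invented for this sketch, and the `play` callback is only a stand-in for the Ivona and mpg123 steps.

```javascript
// Hypothetical stand-alone model of the queue, not the Node-Red function itself.
function makeSpeechQueue(play) {
    const queue = [];   // pending phrases, first in, first out
    let busy = false;   // true while something is playing

    return {
        // Called for every incoming message that carries text.
        enqueue(topic, payload) {
            if (topic) queue.push(topic);  // e.g. "alert" goes in first
            queue.push(payload);
        },
        // Called by the one-second tick (a blank message in the flow).
        tick() {
            if (busy || queue.length === 0) return;
            busy = true;
            play(queue.shift());
        },
        // Called when playback has finished (the "reset flag" step).
        done() {
            busy = false;
        }
    };
}

// Usage: record what gets "played" instead of calling mpg123.
const played = [];
const q = makeSpeechQueue(function (text) { played.push(text); });
q.enqueue("alert", "Red 1 logged in");
q.enqueue("", "Hello there");
q.tick();            // plays "alert"
q.tick();            // ignored, still busy
q.done(); q.tick();  // plays "Red 1 logged in"
q.done(); q.tick();  // plays "Hello there"
console.log(played);
```

However wildly the requests arrive, nothing comes off the queue until the previous item has reported that it is done.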
The “copy file to payload” is trivial – Ivona returns the filename in msg.file which is not where I want it.
msg.payload=msg.file;
return msg;
The reset flag function simply clears the busy flag and returns a blank message.
context.global.speech_busy=0;
msg.payload=""; return msg;
The trigger is the Node-Red DELAY function simply set to delay for one second and then pass the message on.
![Ivona speech Ivona speech]()
The MPG123 EXEC node calls mpg123 and passes the file name as parameter. The DELETE node simply deletes the file that Ivona creates… Here’s the Ivona setup. Put your credentials in the top box.
![moustache moustache]()
Note the triple moustache {{{}}}. Ivona examples use a double, but a double moustache escapes characters such as slashes, and we don’t want that because we have a file path in there.
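To show why the double moustache hurts a file path, here is a rough imitation of the HTML escaping that mustache.js applies to double-brace tags. This assumes the Ivona node renders its template mustache-style; `escapeLikeDoubleMoustache` is a made-up function for illustration and not the library’s own code.

```javascript
// Rough imitation of mustache.js HTML escaping for {{double}} tags.
// Illustration only: the function name and the reduced character map are assumptions.
function escapeLikeDoubleMoustache(text) {
    const map = {
        "&": "&amp;", "<": "&lt;", ">": "&gt;",
        '"': "&quot;", "'": "&#39;", "/": "&#x2F;"
    };
    return text.replace(/[&<>"'\/]/g, function (ch) { return map[ch]; });
}

const fname = "/home/pi/recordings/alert.mp3";
console.log(escapeLikeDoubleMoustache(fname)); // roughly what a double moustache would give
console.log(fname);                             // what a triple moustache passes through
```

The escaped version is no longer a usable path, which is why the triple form is needed here.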
And that is about it – works a treat and produces high quality buffered speech – for free – for your IOT endeavours.
Pick your own file directory (note that I used /home/pi/recordings but that isn’t in any way special). For any words or phrases where you want SOUNDS instead of voice, simply replace them with sound files of the same name (note that spaces are replaced in file names by underscores). So “alert 2” as a file name would be “alert_2.mp3”.
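If you’re not sure what file name a given phrase ends up with, this little helper applies the same replacements as the main function. `phraseToFile` is a hypothetical name for this sketch, and the directory is passed in so you can use your own.

```javascript
// Hypothetical helper mirroring the replace() calls in the main function.
function phraseToFile(phrase, dir) {
    const name = phrase
        .replace(/ /g, "_")    // spaces become underscores
        .replace(/[.,]/g, "")  // full stops and commas are dropped
        .toLowerCase();
    return dir + "/" + name + ".mp3";
}

console.log(phraseToFile("alert 2", "/home/pi/recordings"));
// → /home/pi/recordings/alert_2.mp3
console.log(phraseToFile("Red 1 logged in.", "/home/pi/recordings"));
// → /home/pi/recordings/red_1_logged_in.mp3
```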