Sometimes converting speech, audio, or an MP3 file to text is a little tricky, but in this article we'll learn
an easy way to do it.
We will convert multiple audio files at a time.
We'll use Node.js and IBM's Bluemix (IBM Cloud Platform).
Let's start converting speech to text in Node.js.
Follow the steps:
Step 1: Register on Bluemix (IBM Cloud Platform)
We need to register on Bluemix.
Step 2: Log in to Bluemix
After registering, log in.
Step 3: Create a service for speech-to-text and get the username and password for it.
Link: Create service for speech to text.
After creating the service, we can start the coding part.
Start coding:
Now it is time to write the code.
Step 4: Create a speech.js file and require speech-to-text and fs, as in the code below:
var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var fs = require('fs');
In the above code, we import the service from "watson-developer-cloud/speech-to-text/v1" and also import the "fs" module.
Step 5: Create a SpeechToText object, as in the code below:
var speech_to_text = new SpeechToTextV1({
  // Placeholder credentials; use the values from your own service.
  username: '1234567-8765-4267-9e76-fgff34f',
  password: 'ABCdefghiJK'
});
In the above code, we configure the service using a predefined username and password.
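Hard-coding the username and password is fine for a quick test, but it is easy to leak them by accident. A minimal sketch of reading them from environment variables instead is shown here; the function name loadCredentials and the variable names SPEECH_TO_TEXT_USERNAME and SPEECH_TO_TEXT_PASSWORD are assumptions made for illustration, not something the service requires.

```javascript
// Hypothetical helper: reads the credentials from an environment-like object.
// The variable names are an assumption; choose whatever fits your setup.
function loadCredentials(env) {
  var username = env.SPEECH_TO_TEXT_USERNAME;
  var password = env.SPEECH_TO_TEXT_PASSWORD;
  if (!username || !password) {
    throw new Error(
      'Set SPEECH_TO_TEXT_USERNAME and SPEECH_TO_TEXT_PASSWORD first'
    );
  }
  return { username: username, password: password };
}

// Possible usage in speech.js (commented out because it needs the
// watson-developer-cloud package and real credentials):
// var speech_to_text = new SpeechToTextV1(loadCredentials(process.env));
```

Passing the environment in as an argument, rather than reading process.env inside the function, keeps the helper easy to try out with made-up values.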
Step 6: Create an array and insert the paths of the audio files. You can have many audio files in this array:
var files = ['./music/hello.flac', './music/somebody2010.flac'];
Here we have two audio files, "./music/hello.flac" and "./music/somebody2010.flac", and we have created an array from them. We'll use this "files" array to convert the audio to text.
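Both files here are FLAC, which is why Step 7 can use a fixed content_type of 'audio/flac'. If the array mixes formats, the content type could be derived from each file's extension. The sketch below is one way to do that; the helper name contentTypeFor and the extension map are assumptions for illustration, so check the service documentation for the formats it actually accepts.

```javascript
var path = require('path');

// Extension-to-content-type map. This list is an assumption for the
// example; extend or trim it to match what the service supports.
var CONTENT_TYPES = {
  '.flac': 'audio/flac',
  '.mp3': 'audio/mp3',
  '.wav': 'audio/wav',
  '.ogg': 'audio/ogg'
};

// Hypothetical helper: returns the content type for an audio file path,
// or throws if the extension is not in the map above.
function contentTypeFor(file) {
  var ext = path.extname(file).toLowerCase();
  var type = CONTENT_TYPES[ext];
  if (!type) {
    throw new Error('Unsupported audio extension: ' + ext);
  }
  return type;
}
```

With this in place, the params for a file could use content_type: contentTypeFor(file) instead of a fixed string.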
Step 7: Create params for every audio file. We do this in a for loop and call speech_to_text.recognize() to convert each file, then take the response and log it to the console.
for (var i = 0; i < files.length; i++) {
  // Build the request parameters for this audio file.
  var params = {
    audio: fs.createReadStream(files[i]),
    content_type: 'audio/flac',
    timestamps: true,
    word_alternatives_threshold: 0.9,
    keywords: ['colorado', 'tornado', 'tornadoes'],
    keywords_threshold: 0.5
  };
  // Send the audio to the service and print the result (or the error).
  speech_to_text.recognize(params, function (error, transcript) {
    if (error)
      console.log('Error:', error);
    else
      console.log(JSON.stringify(transcript, null, 2));
  });
}
In the above code, we iterate over the files and prepare the "params" for each one. These params are then passed to the "speech_to_text" service.
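Because recognize() is asynchronous, a plain for loop starts all the requests at once, and the results may be printed in a different order than the files array. If order matters, the files can be processed one at a time. The following is a rough sketch of that idea; transcribeAll and recognizeOne are hypothetical names, and recognizeOne(file, callback) is assumed to build the params for one file and call speech_to_text.recognize(params, callback).

```javascript
// Hypothetical helper: runs recognizeOne for each file in order, starting
// the next file only after the previous callback has fired.
// done(error, results) is called once, at the end or on the first error.
function transcribeAll(files, recognizeOne, done) {
  var results = [];

  function next(index) {
    if (index >= files.length) {
      return done(null, results);
    }
    recognizeOne(files[index], function (error, transcript) {
      if (error) {
        // Stop at the first failure and hand back what we have so far.
        return done(error, results);
      }
      results.push({ file: files[index], transcript: transcript });
      next(index + 1);
    });
  }

  next(0);
}

// Possible usage in speech.js (commented out because it needs the service):
// transcribeAll(files, function (file, callback) {
//   speech_to_text.recognize({
//     audio: fs.createReadStream(file),
//     content_type: 'audio/flac'
//   }, callback);
// }, function (error, results) {
//   if (error) console.log('Error:', error);
//   else console.log(JSON.stringify(results, null, 2));
// });
```

Each entry in results keeps the file path next to its transcript, so it stays clear which output belongs to which audio file.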
Congratulations! You did it.
The complete code is here:
Step 0: Check the environment setup. (Node.js should be installed, along with the watson-developer-cloud package from npm. Fix any errors first.)
Step 1: Navigate to this directory in the terminal/command prompt.
Step 2: Run the command: node FILE_NAME.js
Step 3: See the message in the console log.
Step 4: If Step 3 ran successfully, you now have the text for the audio. Otherwise, go back to Step 0.
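The message printed in Step 3 is the full JSON response, which contains more than the plain text. A small sketch of pulling out just the text follows. It assumes the response has a results array whose entries each carry an alternatives array with a transcript string; that shape is an assumption here, so compare it with the JSON your own run prints. The function name transcriptText is made up for this example.

```javascript
// Hypothetical helper: joins the first alternative of every result into
// one string. The response shape (results[].alternatives[].transcript)
// is assumed; verify it against the JSON printed by your own run.
function transcriptText(response) {
  var results = (response && response.results) || [];
  return results
    .map(function (result) {
      var best = result.alternatives && result.alternatives[0];
      if (best && typeof best.transcript === 'string') {
        return best.transcript.trim();
      }
      return '';
    })
    .filter(function (text) {
      return text.length > 0;
    })
    .join(' ');
}

// Possible usage inside the recognize() callback:
// console.log(transcriptText(transcript));
```

Taking only the first alternative is a simplification; if you request several alternatives, you may want to compare them before choosing one.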
