Recreate Siri using a Knowledge Engine API

Voice recognition and voice control applications are increasing in popularity. Programs like Apple’s Siri, Microsoft’s Cortana, and “OK Google” are changing the way people interact with their phones, games consoles, and multimedia devices.

A common feature among these applications is the ability for users to ask a general information question and have the answer spoken back to them. For example, ask Siri “What's the capital of China?” and she'll answer “Beijing”.

To accomplish this, there are a few underlying technologies at work:

  • Voice recording and accurate speech to text translation to transcribe the words of the user's question.

  • Processing the question through a 'knowledge engine' to get a direct answer.

  • Text to speech translation to read out the answer for the user to hear.

Readers of Arduino Meets Linux should recognize that each of these individual processes can be recreated with the Arduino Yún:

  • In Project 7 we build a circuit that can record speech using a microphone and audio interface, then translate it to text using AT&T's Speech API.

  • In Project 6 we learn how to use Arduino's Temboo library. Temboo can give us access to the WolframAlpha knowledge engine to answer questions.

  • In Project 7 we install eSpeak on our Yún, which converts text to speech and plays it through speakers connected to the audio interface.

In the demo below we'll put these pieces together to create our own recreation of 'Siri', allowing us to ask it questions and have it speak the answers.

Modifying Project 7

This demo will be a modification of Arduino Meets Linux's Project 7.

Project 7 allows users to control the Yún's digital pins using voice commands: saying "Turn on nine." or "Turn off ten." sets the output state of an individual pin.

For the Siri recreation we'll assume that you already have Project 7 completed and working. With this as a starting point we can use the same circuit and simply make changes to its Python script and Arduino sketch.

Updating the Python Script

When the circuit's pushbutton is pressed, the Arduino sketch running on the Atmega32u4 launches a Python script, yuri.py, on the Atheros AR9331.

yuri.py records the user's voice through the microphone and saves it to a .wav file. The .wav file is then sent to AT&T's Speech API, which returns a JSON response containing the translated text as a string.

The string is split into individual words, which are passed to the script's processCommand() function, where a command is formatted and sent back to the Arduino sketch that's waiting for it.
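Project 7's sketch already contains this launch step, so there is nothing we need to change here. Purely as an illustration of the mechanism, a minimal sketch of starting a script on the AR9331 from the Atmega32u4 side might look like the following (the Process instance and function names are hypothetical, not the book's code):

#include <Bridge.h>
#include <Process.h>

// Illustration only: Project 7's sketch has its own launch code.
// Assumes Bridge.begin() has already been called in setup().
Process yuriScript;

void launchYuri() {
  yuriScript.begin("python");                       // run the script with the Python interpreter
  yuriScript.addParameter("/mnt/sda1/P7/yuri.py");  // the script's location on the SD card
  yuriScript.runAsynchronously();                   // don't block the sketch while it records
}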

For our recreation, users will be asking questions instead of speaking commands. We still want the user's voice to be recorded and translated to text in the same way, but we don't need to process it into a formatted command.

Instead, the script needs to be updated so that the full translated text string is returned to the Arduino sketch. To do this:

  1. Log in to the OpenWrt-Yun command line.

  2. At the command prompt, type the following command and then press Enter:

    nano /mnt/sda1/P7/yuri.py

  3. Remove the following line near the bottom of the script:

    processCommand(vData["Recognition"]["NBest"][0]["Hypothesis"].split(' '))

    and replace it with:

    client.mailbox(vData["Recognition"]["NBest"][0]["ResultText"])

  4. Press CTRL + X

  5. Press Y and then press Enter to save the file and exit nano.

vData["Recognition"]["NBest"][0]["ResultText"] is a string containing the full transcription of the audio, adjusted for capitalization.

The string is passed from the script running on the Atheros AR9331 to the sketch running on the Atmega32u4 using the mailbox() method of the BridgeClient class.

Our sketch will now have access to the full text translation of the question the user asks. Since the processCommand() function is no longer used in the script, it has been removed from the source code listed below.

WolframAlpha and Temboo

In order to get an answer to our questions, we need to feed them into a knowledge engine, or 'answer engine'.

These engines use algorithms and big data to compute single, direct answers to basic queries. This differentiates them from standard search engines, which simply return lists of webpages.

The WolframAlpha application programming interface (API) is a knowledge engine service that supports natural language queries. You use it by sending questions to the API over the internet and receiving the answers in the response. It is free to use for non-commercial experiments that make fewer than 2,000 API calls per month.

Temboo is a web service that you can connect to using your Arduino Yún. The Temboo library is included with the Arduino IDE, and it contains code that handles the most complicated aspects of working with web services and APIs.

With Temboo, we can access the WolframAlpha API from our sketch using simple classes and methods.

If you are not already a Temboo user, you need to create an account on the Temboo website (http://www.temboo.com).

After logging in, create a new application for this project:

  1. Click ACCOUNT, and then click Applications.

  2. Click New Application.

  3. In the APPLICATION box, type YunWolframAlpha.

  4. Write down the Key, or copy and paste it somewhere for later. You need to send this value from your sketch to use Temboo.

To use the WolframAlpha API from Temboo, you must also create a developer account on the WolframAlpha website.

After logging in, create a new application for this project:

  1. Click My APPS, then click Get an AppID.

  2. In the Application name box, type YunWolframAlpha.

  3. In the Description box, type a short description of this project.

  4. Click Get AppID.

  5. Write down the APPID, or copy and paste it somewhere for later. You need to send this value from your sketch to use WolframAlpha.

Each web service that you can use through Temboo is called a “choreo”. You can find the details of these choreos (and example code that uses them) on the Temboo website.

WolframAlpha has two choreos available:

  • GetSearchResult: Allows your application to submit a query to Wolfram|Alpha and return only the plain text from the first result pod.

  • Query: Allows your application to submit free-form queries similar to the queries one might enter at the Wolfram|Alpha website.

The Siri recreation will use the GetSearchResult choreo, which lets us submit our question string and get the WolframAlpha answer back as another string.

To find the WolframAlpha GetSearchResult choreo on the Temboo website:

  1. Log in to http://www.temboo.com

  2. Click LIBRARY.

  3. Under CHOREOS, click the arrow next to WolframAlpha.

  4. Click GetSearchResult.

Updating the Arduino Sketch

In the loop() function of ArduinoMeetsLinux-P7.ino, the sketch repeatedly checks for messages sent from the Python script using the messageAvailable() method of the Mailbox class.

With the changes we made to the yuri.py script, these incoming messages will now contain the translated question string. We need to update how these messages are processed.

Add the following empty function to the sketch:

void askWolframAlpha(String question) {
}

Remove the following code from the loop() function:

if (Mailbox.messageAvailable()) {
  String message;
  Mailbox.readMessage(message);
  if (message.length() >= 2) {
    int pin = atoi(&message[1]);
    pinMode(pin, OUTPUT);
    if (message[0] == 'n') {
      digitalWrite(pin, HIGH);
    }
    else if (message[0] == 'f') {
      digitalWrite(pin, LOW);
    }
  }
}

And replace it with this:

if (Mailbox.messageAvailable()) {
  String message;
  Mailbox.readMessage(message);
  askWolframAlpha(message);
}

When our question string is received from the Python script through the Mailbox, it will now be passed as an argument to the askWolframAlpha() function. This is where we'll run it through WolframAlpha and speak out the answer.

The Arduino IDE comes with a library – Temboo.h – that you can use to work with Temboo from your Arduino sketches.

Add the following #include directive to the sketch:

#include <Temboo.h>

When you connect to Temboo, you need to send your Temboo and WolframAlpha account information. Underneath the #include directives in your sketch, add the following statements:

#define TEMBOO_ACCOUNT       F("Your Temboo username")
#define TEMBOO_APP_KEY_NAME  F("YunWolframAlpha")
#define TEMBOO_APP_KEY       F("Your registered application key")
#define WOLFRAM_ALPHA_ID     F("Your WolframAlpha application ID")

Replace Your Temboo username and Your registered application key with the values that you receive from the Temboo website.

Replace Your WolframAlpha application ID with the AppID value that you received from the WolframAlpha website.

To use a choreo from an Arduino sketch, you need to create an instance of the TembooChoreo class.

Add the following code to the askWolframAlpha() function in your sketch:

TembooChoreo WolframChoreo;
WolframChoreo.begin();
WolframChoreo.setAccountName(TEMBOO_ACCOUNT);
WolframChoreo.setAppKeyName(TEMBOO_APP_KEY_NAME);
WolframChoreo.setAppKey(TEMBOO_APP_KEY);

To use the WolframAlpha GetSearchResult service, you need to tell the choreo which service to run and then provide your input parameters. Add the following code directly underneath the preceding lines:

WolframChoreo.setChoreo(F("/Library/WolframAlpha/GetSearchResult/"));
WolframChoreo.addInput(F("AppID"), WOLFRAM_ALPHA_ID);
WolframChoreo.addInput(F("Input"), question);

The two required inputs for this choreo are AppID, your WolframAlpha application ID, and Input, the text string of your query.

To run the search, add the following line (the run() call blocks until the choreo has finished):

WolframChoreo.run();

When the Temboo request finishes, you can read the result just as you would when reading from files or network connections. However, a call to the WolframAlpha GetSearchResult choreo returns several lines of information. For example:

Result
Beijing, China
HTTP_CODE
200

The answer to the input question is held on the line below Result. To capture only this line, the following code skips the first eight characters (the six characters of "Result" plus the two characters of the line break) and then extracts all the characters in the next line.

Add the following code underneath WolframChoreo.run();:

String response;
int skip = 0;
// Read and discard the first eight characters ("Result" plus the line break)
while (WolframChoreo.available()) {
  WolframChoreo.read();
  skip++;
  if (skip == 8) {
    // Copy the next line (the answer) into response, stopping at the line break
    while (WolframChoreo.available()) {
      uint8_t c = WolframChoreo.read();
      if ((c == '\n') || (c == '\r')) {
        break;
      }
      else {
        response.concat((char)c);
      }
    }
    break;
  }
}
WolframChoreo.close();

The answer to the input question is now contained in the String variable response, which we can give back to the user.
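As an aside, because the choreo object is read like a stream (it already provides available() and read()), the same line can be pulled out with the Stream helper methods instead of counting characters. The following is only an alternative sketch, assuming the response always begins with a Result line; if you use it, it replaces the block above entirely:

String response;
WolframChoreo.readStringUntil('\n');             // discard the "Result" header line
response = WolframChoreo.readStringUntil('\n');  // read the answer line itself
response.trim();                                 // strip any trailing carriage return
WolframChoreo.close();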

To print the answer to the Serial Monitor, add the following line to the setup() function:

Serial.begin(9600);

Then add the following lines to the end of the askWolframAlpha() function:

Serial.println("Qusetion: " + question);
Serial.println("Response: " + response);
Serial.println(" ");

To speak out the answer through the circuit's speakers, add the following lines to the end of the askWolframAlpha() function:

p.runShellCommand("espeak \" Thanks for the question. The answer I came up with is... " +  response +  "\"");
p.close();

Here we are using eSpeak for the text to speech conversion. For more information, see Text to Speech with the Arduino Yún.
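Note that these two lines assume a Process instance named p is already available in the sketch. If yours does not declare one, a minimal, self-contained version of this step might look like the following sketch (the helper name speakAnswer() is hypothetical):

#include <Process.h>

// Hedged sketch: speak a response string through eSpeak on the AR9331.
// Assumes eSpeak is installed and speakers are connected to the audio interface, as in Project 7.
void speakAnswer(const String &response) {
  Process p;  // local Process instance used to run the shell command
  // runShellCommand() blocks until espeak has finished speaking
  p.runShellCommand("espeak \" Thanks for the question. The answer I came up with is... " + response + "\"");
  p.close();
}

You could then call speakAnswer(response); at the end of askWolframAlpha() in place of the two lines above.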

The source code for this sketch is included below, with any unused functions now removed.

Conclusion

After updating Project 7's Python script and Arduino sketch, you should have a fully operational 'Siri' recreation. Simply press and hold the pushbutton on the circuit, and when you're prompted by the opening tone, ask a question.

There will be a slight delay between when you ask a question and when the response is read out. This is due to the processing time required for both the speech to text translation and the WolframAlpha search.

Source Code

yuri.py

ArduinoMeetsLinux-P7.ino

Start Building Today

Start Reading Now

Your guide to Yún development. Arduino Meets Linux includes:

  • Over 320 pages of content
  • OpenWrt-Yun Linux tutorials
  • Python and shell scripting
  • 7 in-depth Arduino projects

Take your Arduino projects to the next level. This book shows you how.