Talking to machines

Article of 26 January 2018 


‘Siri, what will the weather be like today?’ ‘OK, Google, turn the music down!’ ‘Alexa, order the blue trainers for me again!’ Thanks to voice assistants, technology that obeys spoken commands has long since become routine. For voice control to work smoothly, complex software processes have to run in the background. Interpreting spoken commands correctly requires plenty of preliminary work, high computing power and, last but not least, artificial intelligence.

A simple request, a short sentence – the human brain interprets what is meant very easily, makes the connection and initiates an appropriate reaction. This is much more complicated for a machine. If you want to control technical devices with speech, many individual steps are required.

Detecting and interpreting speech

‘Give me a pen!’ is a very simple command, but one that makes the computer work hard in the background. First, the spoken sentence is turned into text. The speech recognition software must use the frequency patterns of the audio signal to identify which words the sentence contains, and it has to overcome many challenges in doing so: unclear pronunciation, similar-sounding words with different meanings, and different intonations or dialects. By comparing the input with extensive databases, in which countless examples of words and their frequency patterns are stored, the software works out which words were spoken.
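The comparison against stored examples can be pictured with a small Python sketch. It is only an illustration under strong assumptions: the four-number ‘frequency patterns’, the tiny reference database and the function names are all made up here, and real speech recognisers use far richer acoustic features and statistical models. The sketch simply picks, for each incoming pattern, the stored word whose example is most similar.

```python
import math

# Hypothetical reference database: each word maps to example "frequency patterns".
# The numbers are invented for illustration and are not real acoustic features.
REFERENCE_PATTERNS = {
    "give": [[0.9, 0.1, 0.3, 0.0], [0.8, 0.2, 0.4, 0.1]],
    "me": [[0.1, 0.9, 0.2, 0.1]],
    "a": [[0.5, 0.5, 0.1, 0.1]],
    "pen": [[0.2, 0.1, 0.9, 0.5], [0.3, 0.0, 0.8, 0.6]],
    "pan": [[0.2, 0.3, 0.9, 0.2]],  # similar-sounding word, different meaning
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equally long vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recognise_word(pattern):
    """Return the stored word whose example is most similar to the pattern."""
    best_word, best_score = None, -1.0
    for word, examples in REFERENCE_PATTERNS.items():
        for example in examples:
            score = cosine_similarity(pattern, example)
            if score > best_score:
                best_word, best_score = word, score
    return best_word, best_score


def recognise_sentence(patterns):
    """Turn a sequence of per-word patterns into text."""
    return " ".join(recognise_word(p)[0] for p in patterns)


# Slightly "noisy" input patterns, as if spoken with a different intonation.
spoken = [
    [0.85, 0.15, 0.35, 0.05],
    [0.10, 0.80, 0.20, 0.10],
    [0.50, 0.45, 0.10, 0.15],
    [0.25, 0.05, 0.85, 0.55],
]
print(recognise_sentence(spoken))
```

The more examples per word the database holds, the better such a comparison copes with dialects and unclear pronunciation, which is one reason the article speaks of extensive databases.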

The next step is to work out the meaning of the sentence. To do this, the software sends the text to a language interface, which checks it for certain keywords. Beforehand, the programmer must determine all the necessary terms and commands (called intents) as well as their synonyms, and define which action each of them stands for. For example, ‘give’ is identified as the request to transport an object to a particular place, whilst the word ‘me’ is understood to be a person who is the target of the action.
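A keyword check of this kind might look roughly like the following Python sketch. The intent names, synonym lists and the `interpret` function are hypothetical and chosen only for illustration; they do not describe any particular language interface.

```python
# Hypothetical tables that a programmer would define in advance.
INTENTS = {
    "transport_object": {"give", "hand", "bring", "pass"},  # synonyms of 'give'
    "release_object": {"drop", "release"},
}
RECIPIENTS = {"me": "speaker", "him": "third_person", "her": "third_person"}
OBJECTS = {"pen", "cup", "screwdriver"}


def interpret(text):
    """Scan the text for known keywords and return the first match of each kind."""
    words = [w.strip("!?.,").lower() for w in text.split()]
    result = {"intent": None, "recipient": None, "object": None}
    for word in words:
        for intent, keywords in INTENTS.items():
            if word in keywords and result["intent"] is None:
                result["intent"] = intent
        if word in RECIPIENTS and result["recipient"] is None:
            result["recipient"] = RECIPIENTS[word]
        if word in OBJECTS and result["object"] is None:
            result["object"] = word
    return result


print(interpret("Give me a pen!"))
```

Because the synonyms map to the same intent, ‘Hand me the pen’ would be interpreted in the same way as ‘Give me a pen!’. Words that appear in none of the tables are simply ignored in this sketch.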

Artificial intelligence finds the optimal solution

Once the interface has identified the meaning of the sentence, it supplies a context object, a piece of software code that the appliance’s control system can work with. To give the machine a clear instruction, further software now comes into play: the ‘artificial intelligence’. It evaluates the content of the context object and at the same time obtains information from various sensors about the position of the appliance and its surroundings. The software contains modules for different solution methods, which are assigned to certain actions. The program uses all this information to construct a command, for example how and where a gripper arm should move, and sends it to the device control. The sensor technology thus detects where the pen is on the desk, and the software works out what path the machine must take to pick it up and hand it to a person. With each action, the software gradually learns which solution method works best and applies this knowledge to the next action.
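How a context object, sensor data and a choice between solution methods could come together is sketched below in Python. Everything in it is an assumption made for illustration: the dictionary layout of the context object, the method names, the sensor readings and the simple ‘best average score so far’ rule. The article does not say which learning technique is actually used, so this rule is only a stand-in.

```python
from dataclasses import dataclass, field

# Hypothetical solution methods assigned to each action.
SOLUTION_METHODS = {
    "transport_object": ["straight_line", "arc_over_obstacles"],
}


@dataclass
class MethodSelector:
    """Remembers how well each method worked and prefers the best one."""

    # action -> method name -> list of observed scores (higher is better)
    history: dict = field(default_factory=dict)

    def record(self, action, method, score):
        self.history.setdefault(action, {}).setdefault(method, []).append(score)

    def best_method(self, action, candidates):
        scores = self.history.get(action, {})

        def average(method):
            observed = scores.get(method)
            # Untried methods count as 0.0; ties go to the first candidate.
            return sum(observed) / len(observed) if observed else 0.0

        return max(candidates, key=average)


def build_command(context, sensors, selector):
    """Combine the context object and sensor data into one device command."""
    action = context["intent"]
    method = selector.best_method(action, SOLUTION_METHODS[action])
    return {
        "method": method,
        "pick_at": sensors["objects"][context["object"]],
        "deliver_to": sensors["people"][context["recipient"]],
    }


# Assumed inputs: a context object and made-up sensor readings (metres).
context = {"intent": "transport_object", "recipient": "speaker", "object": "pen"}
sensors = {
    "objects": {"pen": (0.30, 0.10)},
    "people": {"speaker": (0.80, 0.50)},
}

selector = MethodSelector()
first = build_command(context, sensors, selector)

# Feedback from earlier attempts changes the choice for the next action.
selector.record("transport_object", "straight_line", 0.4)
selector.record("transport_object", "arc_over_obstacles", 0.9)
second = build_command(context, sensors, selector)
print(first["method"], "->", second["method"])
```

A rule that always picks the best-scoring method never tries the alternatives again; a real system would probably also explore other methods from time to time, which is left out here for brevity.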

All these complex procedures must take place in fractions of a second, because the person expects a prompt and, above all, correct reaction from the machine. Whilst voice recognition works relatively well after 30 years of application, the voice control of machines still needs plenty of research and development work before we can converse with a machine as naturally as with our neighbour.

You can see how Festo is using voice control technology in a new concept from the Bionic Learning Network at the Hannover Messe 2018 – don’t miss it!