NAO Humanoid Robot Audio

NAO uses four microphones to track sounds, and its voice recognition and text-to-speech capabilities allow it to communicate in 8 languages.

Sound Source Localization

One of the main purposes of humanoid robots is to interact with people. Sound localization allows a robot to identify the direction of sounds. To produce robust and useful outputs while meeting CPU and memory requirements, NAO sound source localization is based on an approach known as “Time Difference of Arrival.”

When a nearby source emits a sound, each of NAO’s four microphones receives the sound wave at slightly different times.

For example, if someone talks to NAO on its left side, the corresponding sound wave first hits the left microphone, then the front and rear microphones a fraction of a millisecond later, and finally the right microphone.

These differences, known as interaural time differences (ITDs), can then be mathematically processed to determine the current location of the emitting source.
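One common way to measure such a time difference between two microphone signals is cross-correlation: slide one signal against the other and keep the shift at which they match best. The sketch below illustrates that idea in plain Python. It is not NAO's actual implementation, and the function name `estimate_delay` and its parameters are hypothetical.

```python
def estimate_delay(reference, delayed, sample_rate, max_lag):
    """Return the delay of `delayed` relative to `reference`, in seconds.

    Tries every integer lag in [-max_lag, max_lag] samples and keeps the one
    with the largest cross-correlation score. A positive result means the
    sound reached the `delayed` microphone later than the `reference` one.
    """
    n = min(len(reference), len(delayed))
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += reference[i] * delayed[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate
```

The resolution of this estimate is one sample period, so a higher sampling rate gives finer time differences. Production systems typically refine the peak further, which this sketch does not attempt.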

By solving the equations that relate these time differences to the source direction every time it hears a sound, NAO can determine the direction of the emitting source (azimuth and elevation angles) from the ITDs between the four microphones.
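As an illustration of how arrival times can be turned into a direction, the sketch below assumes a distant source (so the wavefront is flat) and solves the resulting three linear equations for a unit vector pointing at the source. The microphone coordinates, the axis convention (x forward, y left, z up), and all function names are assumptions made for this example; they are not NAO's real head geometry or algorithm.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature

# Hypothetical microphone positions in metres (x forward, y left, z up).
MIC_POSITIONS = [
    (0.05, 0.00, 0.03),   # front
    (-0.05, 0.00, 0.03),  # rear
    (0.00, 0.06, 0.00),   # left
    (0.00, -0.06, 0.00),  # right
]


def _det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def simulate_arrival_times(azimuth, elevation, mics=MIC_POSITIONS):
    """Far-field arrival times (relative to the origin) for a given direction."""
    u = (math.cos(elevation) * math.cos(azimuth),
         math.cos(elevation) * math.sin(azimuth),
         math.sin(elevation))
    return [-(p[0] * u[0] + p[1] * u[1] + p[2] * u[2]) / SPEED_OF_SOUND
            for p in mics]


def direction_from_arrival_times(times, mics=MIC_POSITIONS):
    """Return (azimuth, elevation) in radians from four arrival times."""
    p0, t0 = mics[0], times[0]
    # Far-field model: t_i - t_0 = -((p_i - p_0) . u) / c, u points to the source.
    rows = [[p[k] - p0[k] for k in range(3)] for p in mics[1:]]
    rhs = [-SPEED_OF_SOUND * (t - t0) for t in times[1:]]
    det = _det3(rows)
    if abs(det) < 1e-12:
        raise ValueError("microphones are coplanar; direction is ambiguous")
    # Cramer's rule: replace column k with the right-hand side.
    u = []
    for k in range(3):
        replaced = [row[:k] + [rhs[i]] + row[k + 1:]
                    for i, row in enumerate(rows)]
        u.append(_det3(replaced) / det)
    norm = math.sqrt(sum(c * c for c in u))
    if norm == 0.0:
        raise ValueError("identical arrival times carry no direction")
    u = [c / norm for c in u]
    azimuth = math.atan2(u[1], u[0])
    elevation = math.asin(max(-1.0, min(1.0, u[2])))
    return azimuth, elevation
```

With this convention, a source directly to the left has an azimuth of 90 degrees and reaches the left microphone first and the right microphone last. Real measurements are noisy, so a practical system would also weigh how reliable each time difference is.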

This feature is available as a NAOqi module called ALAudioSourceLocalization; it provides a C++ and Python API that allows precise interactions with a Python script or NAOqi module.

Two Choregraphe boxes that allow easy use of the feature inside a behavior are also available.

Possible applications include:

  • Human Detection, Tracking, and Recognition
  • Noisy Object Detection, Tracking, and Recognition
  • Speech Recognition in a specific direction
  • Speaker Recognition in a specific direction
  • Remote Monitoring/Security applications
  • Entertainment applications

Audio Signal Processing

In robotics, embedded processors have limited computational power, making it useful to perform some calculations remotely on a desktop computer or server.

This is especially true for audio signal processing; for example, speech recognition often takes place more efficiently, faster, and more accurately on a remote processor. Most modern smartphones process voice recognition remotely.

Users may want to use their own signal processing algorithms directly in the robot.

The NAOqi framework uses Simple Object Access Protocol (SOAP) to send and receive audio signals over the Web.

Sound is produced and recorded in NAO using the Advanced Linux Sound Architecture (ALSA) library.

The ALAudioDevice module manages audio inputs and outputs.

Using NAO’s audio capabilities, a wide range of experiments and research can take place in the fields of communications and human-robot interaction.

For example, users can employ NAO as a communication device, talking to it and listening to it as if it were a human being.

Signal processing is another natural application. Through the audio module, you can get the raw audio data from the microphones in real time and process it with your own code.
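As a sketch of what processing raw microphone data can look like, the example below splits a raw byte buffer into one list of samples per microphone and computes a simple loudness measure for each. It assumes the buffer holds interleaved, signed 16-bit little-endian samples from four channels; check that assumption against the audio module's documentation for your robot. The function names are hypothetical and the code does not call the NAOqi API itself.

```python
import math
import struct

BYTES_PER_SAMPLE = 2  # assumption: signed 16-bit samples


def split_channels(raw_bytes, num_channels=4):
    """De-interleave a raw buffer into one list of samples per channel."""
    frame_size = BYTES_PER_SAMPLE * num_channels
    num_frames = len(raw_bytes) // frame_size  # ignore any trailing partial frame
    count = num_frames * num_channels
    samples = struct.unpack("<%dh" % count, raw_bytes[:num_frames * frame_size])
    return [list(samples[ch::num_channels]) for ch in range(num_channels)]


def rms(channel):
    """Root-mean-square level of one channel, scaled to the range 0.0 to 1.0."""
    if not channel:
        return 0.0
    return math.sqrt(sum((s / 32768.0) ** 2 for s in channel) / len(channel))
```

In a real application, a buffer like this would be handed to your code each time the robot delivers a new block of audio, and the per-channel lists could then feed any custom algorithm, such as a filter or a level meter.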
