# Pyroomacoustics

## Summary

Pyroomacoustics is a software package aimed at the rapid development and testing of audio array processing algorithms. The content of the package can be divided into three main components: an intuitive Python object-oriented interface to quickly construct different simulation scenarios involving multiple sound sources and microphones in 2D and 3D rooms; a fast C implementation of the image source model for general polyhedral rooms to efficiently generate room impulse responses and simulate the propagation between sources and receivers; and finally, reference implementations of popular algorithms for beamforming, direction finding, and adaptive filtering. Together, they form a package with the potential to speed up the time to market of new algorithms by significantly reducing the implementation overhead in the performance evaluation step.

### Room Acoustics Simulation

Consider the following scenario.

Suppose, for example, you wanted to produce a radio crime drama, and it so happens that, according to the scriptwriter, the story line absolutely must culminate in a satanic mass that quickly degenerates into a violent shootout, all taking place right around the altar of the highly reverberant acoustic environment of Oxford’s Christ Church cathedral. To ensure that it sounds authentic, you asked the Dean of Christ Church for permission to record the final scene inside the cathedral, but somehow he fails to be convinced of the artistic merit of your production, and declines to give you permission. But recorded in a conventional studio, the scene sounds flat. So what do you do?

—Schnupp, Nelken, and King, Auditory Neuroscience, 2010

Faced with this difficult situation, pyroomacoustics can save the day by simulating the environment of the Christ Church cathedral!

At the core of the package is a room impulse response (RIR) generator based on the image source model that can handle

• Convex and non-convex rooms
• 2D/3D rooms

Both a pure Python implementation and a C accelerator are included for maximum speed and compatibility.

The philosophy of the package is to abstract all necessary elements of an experiment using object-oriented programming concepts. Each element is represented by a class, and an experiment can be designed by combining these elements just as one would in a real experiment.

Let’s imagine we want to simulate a delay-and-sum beamformer that uses a linear array with four microphones in a shoebox-shaped room that contains only one sound source. First, we create a room object, to which we add a microphone array object and a sound source object. The room object then has methods to compute the RIR between source and receiver. The beamformer object extends the microphone array class and has different methods to compute the weights, for example delay-and-sum weights. See the example below to get an idea of what the code looks like.

The Room class also allows one to process sound samples emitted by sources, effectively simulating the propagation of sound between sources and microphones. At the input of the microphones composing the beamformer, an STFT (short-time Fourier transform) engine makes it possible to quickly process the signals through the beamformer and evaluate the output.

### Reference Implementations

In addition to its core image source model simulation, pyroomacoustics also contains a number of reference implementations of popular audio processing algorithms for

• beamforming
• direction of arrival (DOA) finding

We use an object-oriented approach to abstract away the details of the specific algorithms, making them easy to compare. Each algorithm can be tuned through optional parameters, and we have tried to pre-set their values so that a run with the defaults will generally produce reasonable results.

## Quick Install

Install the package with pip: `pip install pyroomacoustics`

The requirements are:

# 【鼎革‧革鼎】: Raspbian Stretch, Part Six (K.2-sd.III)

Once there was "Thue", of which it was said:

Thue represents one of the simplest possible ways to construe constraint-based programming. It is to the constraint-based paradigm what languages like "OISC" (one-instruction-set computer) are to the imperative paradigm; in other words, it's a tar pit.

Indeed, a concept vast enough to reach the heavens!! And yet "SRS" is counted an "esoteric language"??

Truly: a single ☆★ spark can start a prairie fire. Can one afford not to think it through carefully and hold it firmly in mind?

── excerpted from M♪o's Study Notebook, Vol. 丑 (Control): 【黑水北】當思恒念

$\overline{GH} = D$
$\overline{BG} = X$
$\overline{AB} = H$
$\angle AHB = \alpha$
$\angle AGB = \beta$

$\tan(\alpha) = \frac{H}{D + X}$
$\tan(\beta) = \frac{H}{X}$

$H = D \cdot \tan(\alpha) \cdot \frac{1}{1 - \frac{\tan(\alpha)}{\tan(\beta)}}$

─── excerpted from "Off by a Hair, Astray by a Thousand Miles!! (Part One)"

─── from "Lattice-Point Image Arithmetic (Projective Geometry) [I]"
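The derivation above can be checked numerically: from $\tan(\alpha) = H/(D+X)$ and $\tan(\beta) = H/X$, eliminating $X$ recovers the boxed formula for $H$. The helper name and the numbers below are made up for illustration:

```python
import math

def height_from_two_angles(D, alpha, beta):
    # H = D * tan(alpha) / (1 - tan(alpha)/tan(beta)), obtained by
    # eliminating X from tan(alpha) = H/(D+X) and tan(beta) = H/X
    ta, tb = math.tan(alpha), math.tan(beta)
    return D * ta / (1 - ta / tb)

# synthetic scene: height H = 10, near distance X = 5, baseline D = 7
alpha = math.atan2(10.0, 7.0 + 5.0)  # elevation angle from the far point
beta = math.atan2(10.0, 5.0)         # elevation angle from the near point
print(round(height_from_two_angles(7.0, alpha, beta), 6))  # → 10.0
```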

# Direction of arrival

In the signal processing literature, direction of arrival (DOA) denotes the direction from which a propagating wave arrives at a point where a set of sensors is located. This set of sensors forms what is called a sensor array. Closely associated is the technique of beamforming, which estimates the signal arriving from a given direction. Problems addressed in the literature include:

• Finding the direction, relative to the array, in which a sound source is located
• Humans locate the directions of the sound sources around them using processes similar to those employed by the algorithms in the literature
• Radio telescopes use these techniques to look at a particular location in the sky
• Beamforming has recently also been used in RF applications such as wireless communication; compared with spatial diversity techniques, beamforming is preferred in terms of complexity, but it generally achieves much lower data rates. In multiple-access channels (CDMA, FDMA, TDMA), beamforming is necessary and sufficient
• Various techniques exist for estimating the direction of arrival, such as angle of arrival (AoA), time difference of arrival (TDOA), and frequency difference of arrival (FDOA)
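As a small illustration of the TDOA approach from the list above, the sketch below cross-correlates two synthetic microphone signals to recover their relative delay and converts it to an angle under a far-field assumption; the sample rate, microphone spacing, and delay are made-up values:

```python
import numpy as np

fs = 16000        # sample rate (Hz)
c = 343.0         # speed of sound (m/s)
d = 0.2           # microphone spacing (m)
true_delay = 4    # samples by which mic 2 lags mic 1

rng = np.random.default_rng(0)
sig = rng.standard_normal(1024)
mic1 = sig
mic2 = np.roll(sig, true_delay)   # delayed copy (circular, for simplicity)

# full cross-correlation; the index of the peak gives the lag in samples
corr = np.correlate(mic2, mic1, mode="full")
lag = int(np.argmax(corr)) - (len(mic1) - 1)

tau = lag / fs                                              # delay in seconds
theta = np.degrees(np.arcsin(np.clip(tau * c / d, -1, 1)))  # angle of arrival
print(lag)  # → 4
```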

Implementation of an Acoustic Sensor Array on a Mobile Robotic Device for Estimating Location of a Stationary Target

Tripp McGehee

Supervisor: Professor Arye Nehorai

Department of Electrical and Systems Engineering
Washington University in St. Louis
Spring 2007

### Abstract

We implemented an acoustic sensor array on a mobile robotic device for estimating the location of a stationary target. Our goal was to build a robot that could adaptively locate and move towards a stationary sound source. We mounted an array of four omnidirectional microphones, with their respective sound cards, on a Lego Mindstorms robot. The measurements were transmitted to a computer through a USB port and processed using LabVIEW. As a first approach, we estimated the time differences of arrival of the sound wave reaching each of the microphones in order to estimate the direction of the acoustic source relative to the robot. The robot was then sent a command via USB to rotate towards the estimated direction of the sound. We addressed two major technical issues while implementing this project: sensor calibration and simultaneous sampling using four independent sound cards. The results of our first experiment in finding the direction of the acoustic source are encouraging. However, more precise sampling control is required to successfully implement more sophisticated algorithms, such as maximum likelihood estimation.

※ Note

# numpy.correlate

numpy.correlate(a, v, mode='valid')
Cross-correlation of two 1-dimensional sequences.

This function computes the correlation as generally defined in signal processing texts:

$c_{av}[k] = \sum_n a[n+k] \cdot \overline{v[n]}$

with the a and v sequences being zero-padded where necessary and $\overline{v}$ denoting the complex conjugate.

Parameters:

• a, v : array_like. Input sequences.
• mode : {'valid', 'same', 'full'}, optional. Refer to the convolve docstring. Note that the default is 'valid', unlike convolve, which uses 'full'.
• old_behavior : bool. old_behavior was removed in NumPy 1.10. If you need the old behavior, use multiarray.correlate.

Returns:

• out : ndarray. Discrete cross-correlation of a and v.

See also:

• convolve : discrete, linear convolution of two one-dimensional sequences.
• multiarray.correlate : old, no-conjugate version of correlate.

Notes

The definition of correlation above is not unique and sometimes correlation may be defined differently. Another common definition is:

$c'_{av}[k] = \sum_n a[n] \cdot \overline{v[n+k]}$

which is related to $c_{av}[k]$ by $c'_{av}[k] = c_{av}[-k]$.
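The relation between the two definitions is easy to verify numerically for real-valued sequences (the example arrays are arbitrary):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
v = np.array([0.0, 1.0, 0.5])

# for real sequences, c'_{av}[k] = sum_n a[n] v[n+k] equals correlate(v, a),
# so the relation c'_{av}[k] = c_{av}[-k] becomes a simple array reversal
c_av = np.correlate(a, v, mode="full")
c_prime = np.correlate(v, a, mode="full")
print(np.allclose(c_prime, c_av[::-1]))  # → True
```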

# Autocorrelation

The autocorrelation of a signal describes the similarity of a signal against a time-shifted version of itself. For a signal $x$, the autocorrelation $r$ is:

$r(k) = \sum_n x(n) \, x(n+k)$

In this equation, $k$ is often called the lag parameter. $r(k)$ is maximized at $k = 0$ and is symmetric about $k = 0$.

The autocorrelation is useful for finding repeated patterns in a signal. For example, at short lags, the autocorrelation can tell us something about the signal’s fundamental frequency. For longer lags, the autocorrelation may tell us something about the tempo of a musical signal.
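Both properties can be seen on a toy signal: the correlation of a signal with itself peaks at zero lag, is symmetric, and shows a secondary peak at the signal's fundamental period (the period of 8 samples here is an arbitrary choice):

```python
import numpy as np

x = np.sin(2 * np.pi * np.arange(64) / 8.0)  # period of 8 samples
r = np.correlate(x, x, mode="full")          # lags -63 .. +63
zero = len(x) - 1                            # index of lag 0

assert np.argmax(r) == zero                  # maximized at zero lag
assert np.allclose(r, r[::-1])               # symmetric about zero lag

# the first secondary peak sits at the fundamental period
lags = np.arange(1, 32)
print(lags[np.argmax(r[zero + 1:zero + 32])])  # → 8
```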

# 【鼎革‧革鼎】: Raspbian Stretch, Part Six (K.2-sd.II)

Ah!! How could the matter of "reading books" be summed up in a single word? To "read the wrong book" means the book is at fault; to "read the book wrongly" means the reading is at fault. What is one to do? Perhaps a jingle can answer the question:

─── What should be, must be; what cannot be, is no good ─── Dream of the Red Chamber

─── from "Reading the Wrong □ Book ⊙ Wrongly Reading the □ Book"

ParameterError: Input buffer must be contiguous.

# librosa.display.waveplot

librosa.display.waveplot(y, sr=22050, max_points=50000.0, x_axis='time', offset=0.0, max_sr=1000, **kwargs)
Plot the amplitude envelope of a waveform.

If y is monophonic, a filled curve is drawn between [-abs(y), abs(y)].

If y is stereo, the curve is drawn between [-abs(y[1]), abs(y[0])], so that the left and right channels are drawn above and below the axis, respectively.

Long signals (duration >= max_points) are down-sampled to at most max_sr before plotting.

Parameters:

• y : np.ndarray [shape=(n,) or (2, n)]. Audio time series (mono or stereo).

This section covers advanced use cases for input and output which go beyond the I/O functionality currently provided by librosa.

librosa uses audioread for reading audio. While we chose this library for its flexibility and its support of compressed formats like MP3, some specific formats might not be supported. In particular, WAV subformats such as 24-bit PCM or 32-bit float might cause problems depending on your installed audioread codecs. libsndfile covers many of these formats, and there is a neat wrapper for libsndfile called PySoundFile which makes it easy to use the library from Python.

Note

See the installation instructions for PySoundFile.

Reading audio files using PySoundFile is similar to the method in librosa. One important difference is that the read data has shape (nb_samples, nb_channels), compared to (nb_channels, nb_samples) in librosa.core.load. Also, the signal is not resampled to 22050 Hz by default, so it would need to be transposed and resampled for further processing in librosa. The following example is equivalent to librosa.load(librosa.util.example_audio_file()):

# 【鼎革‧革鼎】: Raspbian Stretch, Part Six (K.2-sd.I)

Old Du championed the "twisted-meter" poem, but who can say what may and may not be twisted?

‧ the advanced

‧ the never-studied

‧ that which cannot be got through to them

─── excerpted from M♪o's Study Notebook, Vol. 子 (Switches): 【䷝】 State Encoding

─── from "Neural Networks 【學而堯曰】 Ten"

# Uberi/speech_recognition

Speech recognition module for Python, supporting several engines and APIs, online and offline. https://pypi.python.org/pypi/SpeechRe…

# SpeechRecognition

Library for performing speech recognition, with support for several engines and APIs, online and offline.

Speech recognition engine/API support:

Quickstart: pip install SpeechRecognition. See the “Installing” section for more details.

To quickly try it out, run python -m speech_recognition after installing.

## Library Reference

The library reference documents every publicly accessible object in the library. This document is also included under reference/library-reference.rst.

See Notes on using PocketSphinx for information about installing languages, compiling PocketSphinx, and building language packs from online resources. This document is also included under reference/pocketsphinx.rst.

……

## Requirements

To use all of the functionality of the library, you should have:

• Python 2.6, 2.7, or 3.3+ (required)
• PyAudio 0.2.11+ (required only if you need to use microphone input, Microphone)
• PocketSphinx (required only if you need to use the Sphinx recognizer, recognizer_instance.recognize_sphinx)
• Google API Client Library for Python (required only if you need to use the Google Cloud Speech API, recognizer_instance.recognize_google_cloud)
• FLAC encoder (required only if the system is not x86-based Windows/Linux/OS X)

The following requirements are optional, but can improve or extend functionality in some situations:

• On Python 2, and only on Python 2, some functions (like recognizer_instance.recognize_bing) will run slower if you do not have Monotonic for Python 2 installed.
• If using CMU Sphinx, you may want to install additional language packs to support languages like International French or Mandarin Chinese.

The following sections go over the details of each requirement.

───

# 【鼎革‧革鼎】: Raspbian Stretch, Part Six (K.1)

The Huainanzi ("Patterns of Heaven") says: at 巳, what has been born is already settled. Perhaps this preserves the ancient sense of 巳: the newborn child.

... and so poses this "Great Question"??

□: Is the forbidden fruit an apple?
○: No idea!
□: Then is the apple the forbidden fruit?
○: Affliction is itself enlightenment!!

……

☿ The Fengsu Tong records: Du Xuan went to a banquet on the summer solstice and saw what seemed to be a snake in his wine cup, yet dared not refuse to drink. Afterwards he suffered severe pains in chest and belly that no treatment could cure. Later, on learning that a red bow hanging on the wall had been reflected in his cup, its shadow resembling a snake, he recovered at once.

Mend the roof before it rains! How is that the same as the groundless suspicion of "the bow in the cup mistaken for a snake"? In the quiet of night, think it over,

─── from "【䷁】黃裳元吉 (a yellow lower garment: supreme good fortune)"

# Play and Record Sound with Python

This Python module provides bindings for the PortAudio library and a few convenience functions to play and record NumPy arrays containing audio signals.

Documentation:

Source code repository and issue tracker:
https://github.com/spatialaudio/python-sounddevice/

License: MIT; see the file LICENSE for details.