
AudioIO

Platform independent interfacing of numpy arrays of floats with audio files and devices for scientific data analysis.

Features

Audio data are always numpy arrays of floats with values ranging between -1 and 1 independent of how the data are stored in an audio file.
load_audio() function for loading data of a whole audio file at once.
Blockwise, random-access loading of large or sequential audio files (class AudioLoader based on class BufferedArray).
Read arbitrary metadata() as nested dictionaries of key-value pairs. Supported RIFF chunks are INFO lists, BEXT, iXML, and GUANO.
Read markers(), i.e. cue points with spans, labels, and descriptions.
write_audio() function for writing data, metadata, and markers to an audio file.
Platform independent, synchronous (blocking) and asynchronous (non-blocking) playback of numpy arrays via play() with automatic resampling to match supported sampling rates.
Detailed and platform specific installation instructions (pip, conda, Debian and RPM based Linux packages, Homebrew for macOS) for all supported audio packages (see audiomodules).

The AudioIO modules try to use whatever audio packages are installed on your system to achieve their tasks. AudioIO, however, adds its own code for handling metadata and marker lists.

Installation

AudioIO is available on PyPI. Simply run:

pip install audioio

Then you can use already installed audio packages for reading and writing audio files and for playing audio data. However, audio file formats supported by the python standard library are limited to basic wave files and playback capabilities are poor. If you need support for additional audio file formats or proper sound output, you need to install additional packages.

See installation for further instructions and recommendations on additional audio packages.

Usage


import audioio as aio

Loading audio data

Load an audio file into a numpy array using load_audio():

data, samplingrate = aio.load_audio('audio/file.wav')

The loaded data are always numpy arrays of floats ranging between -1 and 1. The arrays are always 2-D with time on the first axis and channels on the second axis, even for single channel data.
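To make this convention concrete, the following sketch converts 16-bit integer samples into floats and forces a 2-D shape. It is not AudioIO's implementation: the helper name pcm16_to_float and the scaling by 32768 are assumptions made for illustration only.

```python
import numpy as np

def pcm16_to_float(samples):
    """Hypothetical helper: scale int16 samples to floats in [-1, 1).

    Assumes scaling by 2**15; AudioIO and its backends may scale differently.
    """
    data = np.asarray(samples, dtype=np.int16).astype(np.float64) / 32768.0
    if data.ndim == 1:
        # single channel: add a channel axis so the shape is (frames, 1)
        data = data.reshape(-1, 1)
    return data

mono = pcm16_to_float([0, 16384, -32768, 32767])
print(mono.shape)    # first axis is time, second axis is channel
print(mono[:, 0])    # the first channel as a 1-D array
```

Because the channel axis is always present, code like data[:,0] works the same way for mono and multi-channel recordings.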

Plot the first channel:

import numpy as np
import matplotlib.pyplot as plt

time = np.arange(len(data))/samplingrate
plt.plot(time, data[:,0])
plt.show()

Get a nested dictionary with key-value pairs of the file's metadata and print it using metadata() and print_metadata():

md = aio.metadata('audio/file.wav')
aio.print_metadata(md)

See the audiometadata module for functions to read, write, and change metadata of various types.

Get and print marker positions, spans, labels and texts using markers() and print_markers():

locs, labels = aio.markers('audio/file.wav')
aio.print_markers(locs, labels)

You can also randomly access chunks of data of an audio file, without loading the entire file into memory, by means of the AudioLoader class. This is really handy for analysing very long sound recordings:

# open audio file with a buffer holding 60 seconds of data:
with aio.AudioLoader('audio/file.wav', 60.0) as data:
     block = 1000
     rate = data.samplerate
     for i in range(len(data)//block):
         x = data[i*block:(i+1)*block]
         # ... do something with x and rate

Instead of a single audio file it can also handle recordings that are split over many files. Just pass all these files as a list to the AudioLoader class.
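Note that a loop over len(data)//block full blocks skips any frames left over at the end of the recording. The sketch below shows one way to include that final, shorter block. It uses a plain numpy array with made-up data as a stand-in for the loader, so that it runs without an audio file; whether you need the tail depends on your analysis.

```python
import numpy as np

# stand-in for an AudioLoader: 2500 frames, 2 channels (made-up data)
data = np.zeros((2500, 2))
block = 1000

sizes = []
for start in range(0, len(data), block):
    # slicing a numpy array past its end is safe: the last block is shorter
    x = data[start:start + block]
    sizes.append(len(x))

print(sizes)
```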

Even simpler, iterate in blocks over the file with overlap using the blocks() generator:

from scipy.signal import spectrogram
nfft = 2048
with aio.AudioLoader('some/audio.wav') as data:
    for x in data.blocks(100*nfft, nfft//2):
        # x is 2-D (time, channel): compute the spectrogram of the first channel
        f, t, Sxx = spectrogram(x[:, 0], nperseg=nfft, noverlap=nfft//2)
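To see what iterating in overlapping blocks means, here is a minimal generator over a plain numpy array. It is a sketch, not the code behind blocks(): the name overlapping_blocks is hypothetical, and it assumes that the two arguments are the block size and the number of overlapping frames, both counted in frames.

```python
import numpy as np

def overlapping_blocks(data, block_size, noverlap):
    """Yield successive blocks of block_size frames that share noverlap frames.

    Hypothetical illustration; the last block may be shorter than block_size.
    """
    if not 0 <= noverlap < block_size:
        raise ValueError('noverlap must be in [0, block_size)')
    step = block_size - noverlap
    for start in range(0, len(data) - noverlap, step):
        yield data[start:start + block_size]

frames = np.arange(10).reshape(-1, 1)   # 10 frames, 1 channel
blocks = [b[:, 0].tolist() for b in overlapping_blocks(frames, 4, 2)]
print(blocks)
```

Each block starts block_size - noverlap frames after the previous one, so consecutive blocks repeat the last noverlap frames. This keeps windowed analyses such as spectrograms continuous across block borders.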

Metadata and markers can be accessed by the metadata() and markers() member functions of the AudioLoader object:

with aio.AudioLoader('audio/file.wav', 60.0) as data:
     md = data.metadata()
     locs, labels = data.markers()

See API documentation of the audioloader, audiometadata, and audiomarkers modules for details.

Writing audio data

Write a 1-D or 2-D numpy array into an audio file (data values between -1 and 1) using the write_audio() function:

aio.write_audio('audio/file.wav', data, samplingrate)

Again, in 2-D arrays the first axis (rows) is time and the second axis the channel (columns).

Metadata in form of a nested dictionary with key-value pairs, marker positions and spans (locs) as well as associated labels and texts (labels) can also be passed on to the write_audio() function:

aio.write_audio('audio/file.wav', data, samplingrate, md, locs, labels)
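As a rough illustration of these arguments, the sketch below builds a small metadata dictionary and marker arrays with numpy. The section name INFO comes from the supported chunks listed under Features. The keys, the values, and the two-column layout of locs (position, span) and labels (label, text) are assumptions for this example, so check the audiometadata and audiomarkers documentation for the exact conventions.

```python
import numpy as np

# nested dictionary: top-level keys are sections holding key-value pairs
md = {
    'INFO': {                       # example section; the keys are made up
        'Artist': 'unknown',
        'Comment': 'test recording',
    },
}

# assumed layout: one row per marker, columns are position and span in frames
locs = np.array([[1000, 0],         # a point marker without a span
                 [5000, 2000]])     # a marker spanning 2000 frames

# assumed layout: one row per marker, columns are label and descriptive text
labels = np.array([['start', 'recording begins'],
                   ['call', 'first vocalization']])

print(len(locs), 'markers with', len(labels), 'label rows')
```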

See API documentation of the audiowriter module for details.

Converting audio files

AudioIO provides a command line script for converting, downsampling, renaming and merging audio files:

> audioconverter -e float -o test.wav test.mp3

If possible, audioconverter tries to keep metadata and marker lists.

See documentation of the audioconverter module for details.

Display metadata and markers

AudioIO provides a command line script that prints metadata and markers of audio files to the console:

> audiometadata test.wav

See documentation of the audiometadata module for details.

Fixing time stamps

AudioIO provides a command line script for fixing time stamps in the metadata and file names of audio files. This is useful in case the real-time clock of a recorder failed.

Let's assume you have a continuous recording spread over the following four files, each covering 3 minutes of the recording:

logger-20190101T000015.wav
logger-20190101T000315.wav
logger-20190101T000615.wav
logger-20190101T000915.wav

However, the recording was actually started at 2025-06-09T10:42:17. Obviously, the real-time clock failed, since all times in the file name and the time stamps in the metadata start in the year 2019.

To fix this, run

> fixtimestamps -s '2025-06-09T10:42:17' logger-2019*.wav

Then the files are renamed:

logger-20190101T000015.wav -> logger-20250609T104217.wav
logger-20190101T000315.wav -> logger-20250609T104517.wav
logger-20190101T000615.wav -> logger-20250609T104817.wav
logger-20190101T000915.wav -> logger-20250609T105117.wav

and the time stamps in the metadata are set accordingly.
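The arithmetic behind this renaming can be sketched with the standard library. This is only an illustration, not how fixtimestamps works internally: the helper shifted_names is hypothetical, and it assumes that the time offsets between files are taken from the old file names (the script itself may derive them differently, for example from the file durations).

```python
from datetime import datetime

def shifted_names(names, new_start, prefix='logger-', fmt='%Y%m%dT%H%M%S'):
    """Hypothetical helper: shift time stamps in file names, keeping offsets."""
    times = [datetime.strptime(n[len(prefix):-len('.wav')], fmt) for n in names]
    delta = new_start - times[0]   # shift that moves the first file to new_start
    return [prefix + (t + delta).strftime(fmt) + '.wav' for t in times]

old = ['logger-20190101T000015.wav',
       'logger-20190101T000315.wav',
       'logger-20190101T000615.wav',
       'logger-20190101T000915.wav']
new = shifted_names(old, datetime(2025, 6, 9, 10, 42, 17))
for o, n in zip(old, new):
    print(o, '->', n)
```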

See documentation of the fixtimestamps module for details.

Playing sounds

Fade in and out (fade()) and play (play()) a 1-D or 2-D numpy array as a sound (first axis is time and second axis the channel):

aio.fade(data, samplingrate, 0.2)
aio.play(data, samplingrate)
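Fading avoids clicks at the start and end of playback. The following sketch shows the idea with a linear ramp applied in place. It is not AudioIO's fade(): the helper linear_fade is hypothetical, and both the linear ramp shape and the in-place modification are assumptions.

```python
import numpy as np

def linear_fade(data, rate, fadetime):
    """Hypothetical helper: fade a 1-D or 2-D float array in and out in place.

    Assumes a linear ramp of fadetime seconds; AudioIO's fade() may use
    a different ramp shape.
    """
    n = min(int(fadetime * rate), len(data) // 2)
    if n < 1:
        return
    ramp = np.linspace(0.0, 1.0, n)
    if data.ndim > 1:
        ramp = ramp[:, np.newaxis]   # column vector, broadcasts over channels
    data[:n] *= ramp                 # fade in
    data[-n:] *= ramp[::-1]          # fade out

rate = 1000.0
sound = np.ones((1000, 2))           # one second of constant signal, 2 channels
linear_fade(sound, rate, 0.2)
print(sound[0, 0], sound[500, 0], sound[-1, 0])
```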

Just beep():

aio.beep()

Beep for half a second at 440 Hz:

aio.beep(0.5, 440.0)
aio.beep(0.5, 'a4')

Musical notes are translated into frequency with the note2freq() function.
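The translation from a note name to a frequency can be sketched as follows. This is not the code of note2freq(): the helper note_to_freq is hypothetical and assumes equal temperament, scientific pitch notation with a4 = 440 Hz, and '#' and 'b' as accidentals.

```python
def note_to_freq(note, a4=440.0):
    """Hypothetical helper: frequency of a note name like 'a4', 'c#5', or 'eb3'.

    Assumes equal temperament with a4 as the reference pitch.
    """
    semitones = {'c': 0, 'd': 2, 'e': 4, 'f': 5, 'g': 7, 'a': 9, 'b': 11}
    name = note.strip().lower()
    index = semitones[name[0]]
    rest = name[1:]
    if rest.startswith('#'):         # sharp: one semitone up
        index += 1
        rest = rest[1:]
    elif rest.startswith('b'):       # flat: one semitone down
        index -= 1
        rest = rest[1:]
    octave = int(rest)
    n = (octave - 4) * 12 + index - 9   # distance from a4 in semitones
    return a4 * 2.0 ** (n / 12)

for note in ['a4', 'a5', 'c4']:
    print(note, round(note_to_freq(note), 2), 'Hz')
```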

See API documentation of the playaudio module for details.

Managing audio modules

Simply run in your terminal

> audiomodules

and you get something like

Status of audio packages on this machine:
-----------------------------------------

wave              is  installed (F)
ewave             not installed (F)
scipy.io.wavfile  is  installed (F)
soundfile         is  installed (F)
wavefile          not installed (F)
audioread         is  installed (F)
pydub             is  installed (F)
pyaudio           not installed (D)
sounddevice       NOT installed (D)
simpleaudio       not installed (D)
soundcard         not installed (D)
ossaudiodev       is  installed (D)
winsound          not installed (D)

F: file I/O, D: audio device

For better performance you should install the following modules:

sounddevice:
------------
The sounddevice package is a wrapper of the portaudio library (http://www.portaudio.com). 
For documentation see https://python-sounddevice.readthedocs.io

First, install the following packages:

sudo apt install libportaudio2 portaudio19-dev python3-cffi

Install the sounddevice module with pip:

sudo pip install sounddevice

Use this to see which audio modules you have already installed on your system, which ones are recommended to install, and how to install them.
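If you want a similar check inside your own script, the standard library can tell you whether a module can be found. The sketch below is independent of AudioIO: the helper module_available is hypothetical, and the module names are taken from the status listing above.

```python
import importlib.util

def module_available(name):
    """Return True if the module can be located on this system."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # for dotted names a missing parent package raises instead of
        # returning None
        return False

modules = ['wave', 'soundfile', 'audioread', 'sounddevice', 'scipy.io.wavfile']
status = {name: module_available(name) for name in modules}
for name, ok in status.items():
    print(f'{name:<18}{"is " if ok else "not"} installed')
```

Note that for dotted names such as scipy.io.wavfile, find_spec() imports the parent packages while searching.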

See API documentation of the audiomodules module for details.

Used by

thunderlab: Load and preprocess time series data.
thunderfish: Algorithms and programs for analysing electric field recordings of weakly electric fish.
audian: Python-based GUI for viewing and analyzing recordings of animal vocalizations.

Alternatives

All the audio modules AudioIO is using.

Reading and writing audio files:

wave: simple wave file interface of the python standard library.
ewave: extended wave files.
scipy.io.wavfile: simple scipy wave file interface.
SoundFile: support of many open source audio file formats via libsndfile.
wavefile: support of many open source audio file formats via libsndfile.
audioread: mpeg file support.
Pydub: mpeg support for reading and writing, playback via simpleaudio or pyaudio.
scikits.audiolab: seems to be no longer active.

Metadata:

GUANO: Grand Unified Acoustic Notation Ontology, an extensible, open format for embedding metadata within bat acoustic recordings.

Playing sounds:

sounddevice: wrapper for portaudio.
PyAudio: wrapper for portaudio.
simpleaudio: uses ALSA on Linux, runs well on Windows.
SoundCard: playback via CFFI and the native audio libraries of Linux, Windows and macOS.
ossaudiodev: playback via the outdated OSS interface of the python standard library.
winsound: native Windows audio playback of the python standard library, asynchronous playback only with wave files.

Not yet supported by audioio:

mutagen: handles audio metadata of many audio file formats.
playsound: pure Python, cross platform, single function module with no dependencies for playing sounds. Plays sounds from files only.
PreferredSoundPlayer: Platform independent playing of sound files.
AudioPlayer: cross platform Python 3 package for playing sounds (mp3, wav, ...).

Scientific audio software:

diapason: musical notes like playaudio.note2freq.
librosa: audio and music processing in python.
TimeView: GUI application to view and analyze time series signal data.
scikit-maad: quantitative analysis of environmental audio recordings.
Soundscapy: analysing and visualising soundscape assessments.
BatDetect2: detecting and classifying bat echolocation calls in high frequency audio recordings.
Batogram: viewing bat call spectrograms with GUANO metadata, including the ability to click to open the location in Google Maps.