Module thunderlab.dataloader

Load time-series data from files.

data, rate, unit, amax = load_data('data/file.wav')

The function load_data() loads the whole time series from the file as a numpy array of floats. The first dimension is frames, the second is channels. In contrast to the audioio.load_audio() function, the values of the data array are not restricted to the range between -1 and 1. They can assume any value within the range -amax to +amax, expressed in the returned unit.

data = DataLoader('data/file.wav', 60.0)

or

with DataLoader('data/file.wav', 60.0) as data:

Create a DataLoader object that loads chunks of 60 seconds of data on demand. data can be used like a read-only numpy array of floats.

Supported file formats

Metadata

Many file formats can store metadata that further describe the stored time-series data. They are handled as a nested dictionary of key-value pairs. Load them with the metadata() function:

metadata = metadata('data/file.mat')
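As an illustration of the nested-dictionary layout, the following sketch builds a small metadata dictionary by hand and flattens it into dotted keys. The section names, keys, values, and the helper flatten_metadata() are made up for this example and are not part of thunderlab; what metadata() actually returns depends on the file.

```python
def flatten_metadata(md, sep='.', prefix=''):
    """Flatten a nested metadata dictionary into a single-level dict.

    Hypothetical helper for illustration, not part of thunderlab.
    """
    flat = {}
    for key, value in md.items():
        full_key = f'{prefix}{sep}{key}' if prefix else key
        if isinstance(value, dict):
            # a dictionary value is a section: descend into it
            flat.update(flatten_metadata(value, sep, full_key))
        else:
            # any other value belongs to the key itself
            flat[full_key] = value
    return flat


# made-up metadata with two sections, one of them nested:
md = {'Recording': {'Experimenter': 'John Doe', 'Temperature': '25.3C'},
      'Amplifier': {'Gain': 100, 'Filter': {'Highpass': '10Hz'}}}
flat = flatten_metadata(md)
for key, value in flat.items():
    print(f'{key}: {value}')
```

Keys whose values are dictionaries act as section names, so the dotted key of a value spells out the path of sections leading to it.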

Markers

Some file formats can also store markers that mark specific positions in the time-series data. Load marker positions and spans (in the 2-D array locs) and label and text strings (in the 2-D array labels) with the markers() function:

locs, labels = markers('data.wav')
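The row and column layout of locs and labels can be sketched with plain nested lists standing in for the 2-D arrays. All marker values below are invented, the helper marker_times() is hypothetical, and the sketch assumes that positions and spans are counted in frames.

```python
# Invented marker data, using nested lists as stand-ins for the 2-D arrays.
# Each row describes one marker.
locs = [[1000, 0],      # position, span (a span of 0 marks a single point)
        [5000, 2500],
        [12000, 0]]
labels = [['start', 'recording started'],   # label, text
          ['chirp', 'burst of chirps'],
          ['stop', '']]

rate = 20000.0  # assumed sampling rate in Hertz


def marker_times(locs, labels, rate):
    """Convert marker rows to (label, start time, duration) in seconds.

    Assumes positions and spans are given in frames.
    """
    rows = []
    for (pos, span), (label, text) in zip(locs, labels):
        rows.append((label, pos/rate, span/rate))
    return rows


for label, t0, dur in marker_times(locs, labels, rate):
    print(f'{label:6s} at {t0:.3f}s, duration {dur:.3f}s')
```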

Additional, format-specific functions

Global variables

var data_loader_funcs

List of implemented load functions.

Each element of the list is a tuple with the data format's name, its check and its load function.
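A list of (name, check function, load function) tuples lends itself to a simple dispatch loop. The sketch below shows the pattern with stand-in functions; the names check_raw_sketch(), load_any(), and so on are hypothetical, the loaders return placeholder strings instead of data, and the extension-based checks are an assumption about how such checks might work.

```python
import os


def check_raw_sketch(filepath):
    """Hypothetical check: accept files by their extension."""
    ext = os.path.splitext(filepath)[1].lower()
    return ext in ('.raw', '.scandat')


def load_raw_sketch(filepath):
    """Hypothetical loader returning a placeholder instead of real data."""
    return f'raw data from {filepath}'


def check_container_sketch(filepath):
    """Hypothetical check for container files by extension."""
    ext = os.path.splitext(filepath)[1].lower()
    return ext in ('.pkl', '.npz', '.mat')


def load_container_sketch(filepath):
    """Hypothetical loader returning a placeholder instead of real data."""
    return f'container data from {filepath}'


# Same layout as described above: (format name, check function, load function).
loader_funcs = [
    ('container', check_container_sketch, load_container_sketch),
    ('raw', check_raw_sketch, load_raw_sketch),
]


def load_any(filepath):
    """Try each format in turn and use the first one whose check passes."""
    for name, check_func, load_func in loader_funcs:
        if check_func(filepath):
            return name, load_func(filepath)
    raise ValueError(f'unsupported file format: {filepath}')


print(load_any('data/file.mat'))
```

Keeping the check and load functions side by side means that supporting another format only requires appending one more tuple to the list.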

Functions

def relacs_samplerate_unit(filepath, channel=0)

Retrieve sampling rate and unit from a relacs stimuli.dat file.

Parameters

filepath : str
Path to a relacs data directory, or a file in a relacs data directory.
channel : int
Channel (trace) number, if filepath does not specify a trace-*.raw file.

Returns

samplerate : float
Sampling rate in Hertz
unit : str
Unit of the trace, can be empty if not found

Raises

IOError/FileNotFoundError:
If the stimuli.dat file does not exist.

ValueError

The stimuli.dat file does not contain a sampling rate.

def relacs_header(filepath,
store_empty=False,
first_only=False,
lower_keys=False,
flat=False,
add_sections=False)

Read key-value pairs from a relacs *.dat file header.

Parameters

filepath : str
A relacs *.dat file, can be also a zipped .gz file.
store_empty : bool
If False do not add meta data with empty values.
first_only : bool
If True only store the first element of a list.
lower_keys : bool
Make all keys lower case.
flat : bool
Do not make a nested dictionary. Use this option also to read in very old relacs metadata with ragged left alignment.
add_sections : bool
If True, prepend keys with sections names separated by '.' to make them unique.

Returns

data : dict
Nested dictionary with key-value pairs of the file header.

Raises

IOError/FileNotFoundError:
If filepath cannot be opened.
def check_relacs(file_path)

Check for valid relacs file.

Parameters

file_path : str
Path to a relacs data directory, or a file in a relacs data directory.

Returns

is_relacs : bool
True if file_path is a valid relacs directory or is a file therein.

def relacs_trace_files(file_path)

Expand file path for relacs data to appropriate trace*.raw file names.

Parameters

file_path : str
Path to a relacs data directory, or a file in a relacs data directory.

Returns

trace_file_paths : list of str
List of relacs trace*.raw files.
def load_relacs(file_path, amax=1.0)

Load traces that have been recorded with relacs (https://github.com/relacs/relacs).

Parameters

file_path : str
Path to a relacs data directory, or a file in a relacs data directory.
amax : float
The amplitude range of the data.

Returns

data : 2-D array
All data traces as a 2-D numpy array, even for single channel data. First dimension is time, second is channel.
rate : float
Sampling rate of the data in Hz
unit : str
Unit of the data
amax : float
Maximum amplitude of data range.

Raises

ValueError

  • Invalid name for relacs trace-*.raw file.
  • Sampling rates of traces differ.
  • Units of traces differ.
def metadata_relacs(file_path,
store_empty=False,
first_only=False,
lower_keys=False,
flat=False,
add_sections=False)

Read meta-data of a relacs data set.

Parameters

file_path : str
A relacs data directory or a file therein.
store_empty : bool
If False do not add meta data with empty values.
first_only : bool
If True only store the first element of a list.
lower_keys : bool
Make all keys lower case.
flat : bool
Do not make a nested dictionary. Use this option also to read in very old relacs metadata with ragged left alignment.
add_sections : bool
If True, prepend keys with sections names separated by '.' to make them unique.

Returns

data : nested dict
Nested dictionary with key-value pairs of the meta data.
def fishgrid_spacings(metadata, unit='m')

Spacing between grid electrodes.

Parameters

metadata : dict
Fishgrid metadata obtained from metadata_fishgrid().
unit : str
Unit in which to return the spacings.

Returns

grid_dist : list of tuple of float
For each grid the distances between rows and columns in unit.
def fishgrid_grids(metadata)

Retrieve grid sizes from a fishgrid.cfg file.

Parameters

metadata : dict
Fishgrid metadata obtained from metadata_fishgrid().

Returns

grids : list of tuple of int
For each grid the number of rows and columns.
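The (rows, columns) tuples translate directly into channel counts: DataLoader.open_fishgrid() multiplies rows by columns to get the number of channels of each grid and stacks the grids along the channel axis. A minimal sketch of that bookkeeping, with invented grid sizes and a hypothetical helper grid_channel_layout():

```python
# Invented grid sizes: (rows, columns) for each grid.
grids = [(8, 8), (4, 2)]


def grid_channel_layout(grids):
    """Return (first channel, number of channels) for each grid."""
    layout = []
    offs = 0
    for rows, cols in grids:
        n = rows*cols          # one channel per electrode
        layout.append((offs, n))
        offs += n
    return layout


layout = grid_channel_layout(grids)
total_channels = sum(n for _, n in layout)
print(layout, total_channels)
```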
def check_fishgrid(file_path)

Check for valid fishgrid file (https://github.com/bendalab/fishgrid).

Parameters

file_path : str
Path to a fishgrid data directory or a file in a fishgrid data directory.

Returns

is_fishgrid : bool
True if file_path is a valid fishgrid data directory or a file therein.
def fishgrid_trace_files(file_path)

Expand file paths for fishgrid data to appropriate traces*.raw file names.

Parameters

file_path : str
Path to a fishgrid data directory, or a file therein.

Returns

trace_file_paths : list of str
List of fishgrid traces*.raw files.
def load_fishgrid(file_path)

Load traces that have been recorded with fishgrid (https://github.com/bendalab/fishgrid).

Parameters

file_path : str
Path to a fishgrid data directory, or a file therein.

Returns

data : 2-D array
All data traces as a 2-D numpy array, even for single channel data. First dimension is time, second is channel.
rate : float
Sampling rate of the data in Hz.
unit : str
Unit of the data.
amax : float
Maximum amplitude of data range.

Raises

FileNotFoundError

Invalid or non-existent fishgrid files.

def metadata_fishgrid(file_path)

Read meta-data of a fishgrid data set.

Parameters

file_path : str
A fishgrid data directory or a file therein.

Returns

data : nested dict
Nested dictionary with key-value pairs of the meta data.
def markers_fishgrid(file_path)

Read markers of a fishgrid data set.

Parameters

file_path : str
A fishgrid data directory or a file therein.

Returns

locs : 2-D array of ints
Marker positions (first column) and spans (second column) for each marker (rows).
labels : 2-D array of string objects
Labels (first column) and texts (second column) for each marker (rows).
def check_container(filepath)

Check if file is a generic container file.

Supported file formats are:

  • python pickle files (.pkl)
  • numpy files (.npz)
  • matlab files (.mat)

Parameters

filepath : str
Path of the file to check.

Returns

is_container : bool
True, if filepath is a supported container format.
def extract_container_data(data_dict,
datakey=None,
samplekey=['rate', 'Fs', 'fs'],
timekey=['time'],
amplkey=['amax'],
unitkey='unit',
amax=1.0,
unit='a.u.')

Extract data from dictionary loaded from a container file.

Parameters

data_dict : dict
Dictionary of the data items contained in the container.
datakey : None, str, or list of str
Name of the variable holding the data. If None, take the variable that is a 2-D array and has the largest number of elements.
samplekey : str or list of str
Name of the variable holding the sampling rate.
timekey : str or list of str
Name of the variable holding sampling times. If no sampling rate is available, the sampling rate is retrieved from the sampling times.
amplkey : str or list of str
Name of the variable holding the amplitude range of the data.
unitkey : str
Name of the variable holding the unit of the data.
amax : None or float
If specified and no amplitude range has been found in data_dict, then this is the amplitude range of the data.
unit : None or str
If specified and no unit has been found in data_dict, then return this as the unit of the data.

Returns

data : 2-D array of floats
All data traces as a 2-D numpy array, even for single channel data. First dimension is time, second is channel.
rate : float
Sampling rate of the data in Hz.
unit : str
Unit of the data.
amax : float
Maximum amplitude of data range in unit.

Raises

ValueError

Invalid key requested.
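The lookup by candidate keys and the fallback from sampling times to a sampling rate, as described for samplekey and timekey, can be sketched as follows. The helpers first_key() and rate_from_container() are hypothetical, and estimating the rate from the mean sampling interval is an assumption; this section does not state how thunderlab derives the rate from the time variable.

```python
def first_key(data_dict, keys):
    """Return the first candidate key that exists in data_dict, else None."""
    if isinstance(keys, str):
        keys = [keys]
    for key in keys:
        if key in data_dict:
            return key
    return None


def rate_from_container(data_dict, samplekey=('rate', 'Fs', 'fs'),
                        timekey=('time',)):
    """Sampling rate in Hertz from a rate variable or from sampling times."""
    key = first_key(data_dict, samplekey)
    if key is not None:
        return float(data_dict[key])
    key = first_key(data_dict, timekey)
    if key is None:
        raise ValueError('no sampling rate or sampling times found')
    time = data_dict[key]
    # mean sampling interval over the whole time vector (assumed method):
    dt = (time[-1] - time[0])/(len(time) - 1)
    return 1.0/dt


print(rate_from_container({'Fs': 44100}))
print(rate_from_container({'time': [0.0, 0.25, 0.5, 0.75, 1.0]}))
```

Accepting either a single string or a list of candidate names lets one call handle containers written with different naming conventions.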

def load_container(file_path,
datakey=None,
samplekey=['rate', 'Fs', 'fs'],
timekey=['time'],
amplkey=['amax'],
unitkey='unit',
amax=1.0,
unit='a.u.')

Load data from a generic container file.

Supported file formats are:

  • python pickle files (.pkl)
  • numpy files (.npz)
  • matlab files (.mat)

Parameters

file_path : str
Path of the file to load.
datakey : None, str, or list of str
Name of the variable holding the data. If None, take the variable that is a 2-D array and has the largest number of elements.
samplekey : str or list of str
Name of the variable holding the sampling rate.
timekey : str or list of str
Name of the variable holding sampling times. If no sampling rate is available, the sampling rate is retrieved from the sampling times.
amplkey : str or list of str
Name of the variable holding the amplitude range of the data.
unitkey : str
Name of the variable holding the unit of the data. If unitkey is not a valid key, then return unitkey as the unit.
amax : None or float
If specified and no amplitude range has been found in the data container, then this is the amplitude range of the data.
unit : None or str
If specified and no unit has been found in the data container, then return this as the unit of the data.

Returns

data : 2-D array of floats
All data traces as a 2-D numpy array, even for single channel data. First dimension is time, second is channel.
rate : float
Sampling rate of the data in Hz.
unit : str
Unit of the data.
amax : float
Maximum amplitude of data range.

Raises

ValueError

Invalid key requested.

def extract_container_metadata(data_dict, metadatakey=['metadata', 'info'])

Extract metadata from dictionary loaded from a container file.

Parameters

data_dict : dict
Dictionary of the data items contained in the container.
metadatakey : str or list of str
Name of the variable holding the metadata.

Returns

metadata : nested dict
Nested dictionary with key-value pairs of the meta data.
def metadata_container(file_path, metadatakey=['metadata', 'info'])

Read meta-data of a container file.

Parameters

file_path : str
A container file.
metadatakey : str or list of str
Name of the variable holding the metadata.

Returns

metadata : nested dict
Nested dictionary with key-value pairs of the meta data.
def extract_container_markers(data_dict,
poskey=['positions'],
spanskey=['spans'],
labelskey=['labels'],
descrkey=['descriptions'])

Extract markers from dictionary loaded from a container file.

Parameters

data_dict : dict
Dictionary of the data items contained in the container.
poskey : str or list of str
Name of the variable holding positions of markers.
spanskey : str or list of str
Name of the variable holding spans of markers.
labelskey : str or list of str
Name of the variable holding labels of markers.
descrkey : str or list of str
Name of the variable holding descriptions of markers.

Returns

locs : 2-D array of ints
Marker positions (first column) and spans (second column) for each marker (rows).
labels : 2-D array of string objects
Labels (first column) and texts (second column) for each marker (rows).
def markers_container(file_path,
poskey=['positions'],
spanskey=['spans'],
labelskey=['labels'],
descrkey=['descriptions'])

Read markers of a container file.

Parameters

file_path : str
A container file.
poskey : str or list of str
Name of the variable holding positions of markers.
spanskey : str or list of str
Name of the variable holding spans of markers.
labelskey : str or list of str
Name of the variable holding labels of markers.
descrkey : str or list of str
Name of the variable holding descriptions of markers.

Returns

locs : 2-D array of ints
Marker positions (first column) and spans (second column) for each marker (rows).
labels : 2-D array of string objects
Labels (first column) and texts (second column) for each marker (rows).
def check_raw(filepath)

Check if file is a raw file.

The following extensions are interpreted as raw files:

  • raw files (*.raw)
  • LabView scandata (*.scandat)

Parameters

filepath : str
Path of the file to check.

Returns

is_raw : bool
True, if filepath is a raw format.
def load_raw(file_path, rate=44000, channels=1, dtype=numpy.float32, amax=1.0, unit='a.u.')

Load data from a raw file.

Raw files contain only the data and no metadata at all, not even the sampling rate or the number of channels. Supported file formats are:

  • raw files (*.raw)
  • LabView scandata (*.scandat)

Parameters

file_path : str
Path of the file to load.
rate : float
Sampling rate of the data in Hertz.
channels : int
Number of channels multiplexed in the data.
dtype : str or numpy.dtype
The data type stored in the file.
amax : float
The amplitude range of the data.
unit : str
The unit of the data.

Returns

data : 2-D array of floats
All data traces as a 2-D numpy array, even for single channel data. First dimension is time, second is channel.
rate : float
Sampling rate of the data in Hz.
unit : str
Unit of the data.
amax : float
Maximum amplitude of data range.
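Because a raw file is just a stream of numbers, the reader has to know the data type and the number of multiplexed channels to recover frames. The following standard-library sketch writes a tiny two-channel file and reads it back. It assumes 32-bit floats in native byte order and samples interleaved frame by frame; read_raw_sketch() is a hypothetical helper and not the implementation of load_raw().

```python
import array
import os
import tempfile


def read_raw_sketch(file_path, channels=1):
    """Read float32 samples and split the interleaved stream into frames."""
    values = array.array('f')
    with open(file_path, 'rb') as f:
        values.frombytes(f.read())
    n_frames = len(values)//channels
    # frame k holds the samples k*channels ... (k + 1)*channels - 1
    return [list(values[k*channels:(k + 1)*channels])
            for k in range(n_frames)]


# write three frames of two channels to a temporary raw file:
samples = array.array('f', [0.0, 10.0, 0.5, 10.5, 1.0, 11.0])
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'demo.raw')
    with open(path, 'wb') as f:
        samples.tofile(f)
    frames = read_raw_sketch(path, channels=2)
print(frames)
```

With numpy, the same de-interleaving is usually written as a reshape of the flat sample array to (-1, channels).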
def load_audioio(file_path, verbose=0, gainkey=['AIMaxVolt', 'gain'], sep='.', amax=1.0, unit='a.u.')

Load data from an audio file.

See the load_audio() function of the audioio package for more information.

Parameters

file_path : str
Path of the file to load.
verbose : int
If > 0 show detailed error/warning messages.
gainkey : str or list of str
Key in the file's metadata that holds some gain information. If found, the data will be multiplied with the gain, and if available, the corresponding unit is returned. See the audioio.get_gain() function for details.
sep : str
String that separates section names in gainkey.
amax : float
If specified and no gain has been found in the metadata, then use this as the amplitude range.
unit : str
If specified and no gain has been found in the metadata, then return this as the unit of the data.

Returns

data : 2-D array of floats
All data traces as a 2-D numpy array, even for single channel data. First dimension is time, second is channel.
rate : float
Sampling rate of the data in Hz.
unit : str
Unit of the data if found in the metadata (see gainkey), otherwise unit.
amax : float
Maximum amplitude of data range.
def load_data(file_path, verbose=0, **kwargs)

Load time-series data from a file.

Parameters

file_path : str
Path and name of the file to load.
verbose : int
If > 0 show detailed error/warning messages.
**kwargs : dict
Further keyword arguments that are passed on to the format specific loading functions. For example:

  • amax: the amplitude range of the data.
  • unit: the unit of the data.

Returns

data : 2-D array
All data traces as a 2-D numpy array, even for single channel data. First dimension is time, second is channel.
rate : float
Sampling rate of the data in Hz.
unit : str
Unit of the data.
amax : float
Maximum amplitude of data range.

Raises

ValueError

file_path is an empty string.

def metadata(file_path, **kwargs)

Read meta-data from a data file.

Parameters

file_path : str
The full path and name of the file to load. For some file formats several files can be provided in a list.
**kwargs : dict
Further keyword arguments that are passed on to the format specific loading functions.

Returns

meta_data : nested dict
Meta data contained in the file. Keys of the nested dictionaries are always strings. If the corresponding values are dictionaries, then the key is the section name of the metadata contained in the dictionary. All other types of values are values for the respective key. In particular they are strings, or list of strings. But other simple types like ints or floats are also allowed.

Raises

ValueError

file_path is an empty string.

def markers(file_path)

Read markers of a data file.

Parameters

file_path : str or file handle
The data file.

Returns

locs : 2-D array of ints
Marker positions (first column) and spans (second column) for each marker (rows).
labels : 2-D array of string objects
Labels (first column) and texts (second column) for each marker (rows).

Raises

ValueError

file_path is an empty string.

def demo(file_path, plot=False)
def main(*cargs)

Call demo with command line arguments.

Parameters

cargs : list of str
Command line arguments as provided by sys.argv[1:]

Classes

class DataLoader (file_path=None, buffersize=10.0, backsize=0.0, verbose=0, **meta_kwargs)

Buffered reading of time-series data for random access of the data in the file.

This allows for reading very large data files that do not fit into memory. A DataLoader instance can be used like a huge read-only numpy array, i.e.

data = DataLoader('path/to/data/file.dat')
x = data[10000:20000,0]

The first index specifies the frame, the second one the channel.

DataLoader first determines the format of the data file and then opens the file (first line). It then reads data from the file as necessary for the requested data (second line).

Supported file formats are

  • audio files via audioio package
  • python pickle files
  • numpy .npz files
  • matlab .mat files
  • relacs trace*.raw files (www.relacs.net)
  • fishgrid traces-*.raw files

Reading sequentially through the file is always possible. If previous data are requested, then the file is read from the beginning. This might slow down access to previous data considerably. Use the backsize argument to the open functions to make sure some data are loaded before the requested frame. Then a subsequent access to the data within backsize seconds before that frame can still be handled without the need to reread the file from the beginning.
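The effect of buffersize and backsize can be pictured as a window of frames placed around the requested start frame. The sketch below converts both sizes from seconds to frames in the same way as the open functions do (int(buffersize*rate) and int(backsize*rate)). Where exactly the window is placed is an assumption made for illustration; the actual buffer management is inherited from the parent class and not shown in this section. The helper buffer_window() and all numbers are hypothetical.

```python
def buffer_window(start, rate, frames, buffersize=60.0, backsize=10.0):
    """Frames [first, last) an assumed buffer would hold for a request at start."""
    bufferframes = int(buffersize*rate)   # total buffer length in frames
    backframes = int(backsize*rate)       # part kept before the requested frame
    first = max(0, start - backframes)
    last = min(frames, first + bufferframes)
    return first, last


rate = 1000.0          # invented sampling rate in Hertz
frames = 600000        # invented file length: 10 minutes
first, last = buffer_window(120000, rate, frames)
print(first, last)
```

Under this assumption, a later request for frames between first and the original start frame is still served from memory, which is the situation backsize is meant to cover.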

Usage:

import thunderlab.dataloader as dl
with dl.DataLoader(file_path, 60.0, 10.0) as data:
    # do something with the content of the file:
    x = data[0:10000,0]
    y = data[10000:20000,0]
    z = x + y

Normal open and close:

data = dl.DataLoader(file_path, 60.0)
x = data[:,:]  # read the whole file
data.close()

that is the same as:

data = dl.DataLoader()
data.open(file_path, 60.0)

Parameters

file_path : str
Name of the file.
buffersize : float
Size of internal buffer in seconds.
backsize : float
Part of the buffer to be loaded before the requested start index in seconds.
verbose : int
If larger than zero show detailed error/warning messages.
meta_kwargs : dict
Keyword arguments that are passed on to the _load_metadata() function.

Attributes

rate : float
The sampling rate of the data in Hertz.
channels : int
The number of channels that are read in.
frames : int
The number of frames in the file.
format : str or None
Format of the audio file.
encoding : str or None
Encoding/subtype of the audio file.
shape : tuple
Number of frames and channels of the data.
ndim : int
Number of dimensions: always 2 (frames and channels).
unit : str
Unit of the data.
ampl_min : float
Minimum amplitude the file format supports.
ampl_max : float
Maximum amplitude the file format supports.

Methods

  • len(): the number of frames
  • open(): open a data file.
  • open_*(): open a data file of a specific format.
  • close(): close the file.
  • metadata(): metadata of the file.
  • markers(): markers of the file.
  • set_unwrap(): Set parameters for unwrapping clipped data.

Constructor for initializing 2D arrays (times x channels).

class DataLoader(AudioLoader):
    """Buffered reading of time-series data for random access of the data in the file.
    
    This allows for reading very large data files that do not fit into
    memory.  A `DataLoader` instance can be used like a huge
    read-only numpy array, i.e.
    ```
    data = DataLoader('path/to/data/file.dat')
    x = data[10000:20000,0]
    ```
    The first index specifies the frame, the second one the channel.

    `DataLoader` first determines the format of the data file and then
    opens the file (first line). It then reads data from the file as
    necessary for the requested data (second line).

    Supported file formats are

    - audio files via `audioio` package
    - python pickle files
    - numpy .npz files
    - matlab .mat files
    - relacs trace*.raw files (www.relacs.net)
    - fishgrid traces-*.raw files

    Reading sequentially through the file is always possible. If
    previous data are requested, then the file is read from the
    beginning. This might slow down access to previous data
    considerably. Use the `backsize` argument to the open functions to
    make sure some data are loaded before the requested frame. Then a
    subsequent access to the data within `backsize` seconds before that
    frame can still be handled without the need to reread the file
    from the beginning.

    Usage:
    ------
    ```
    import thunderlab.dataloader as dl
    with dl.DataLoader(file_path, 60.0, 10.0) as data:
        # do something with the content of the file:
        x = data[0:10000,0]
        y = data[10000:20000,0]
        z = x + y
    ```
    
    Normal open and close:
    ```
    data = dl.DataLoader(file_path, 60.0)
    x = data[:,:]  # read the whole file
    data.close()
    ```    
    that is the same as:
    ```
    data = dl.DataLoader()
    data.open(file_path, 60.0)
    ```
    
    Parameters
    ----------
    file_path: str
        Name of the file.
    buffersize: float
        Size of internal buffer in seconds.
    backsize: float
        Part of the buffer to be loaded before the requested start index in seconds.
    verbose: int
        If larger than zero show detailed error/warning messages.
    meta_kwargs: dict
        Keyword arguments that are passed on to the _load_metadata() function.

    Attributes
    ----------
    rate: float
        The sampling rate of the data in Hertz.
    channels: int
        The number of channels that are read in.
    frames: int
        The number of frames in the file.
    format: str or None
        Format of the audio file.
    encoding: str or None
        Encoding/subtype of the audio file.
    shape: tuple
        Number of frames and channels of the data.
    ndim: int
        Number of dimensions: always 2 (frames and channels).
    unit: str
        Unit of the data.
    ampl_min: float
        Minimum amplitude the file format supports.
    ampl_max: float
        Maximum amplitude the file format supports.

    Methods
    -------

    - `len()`: the number of frames
    - `open()`: open a data file.
    - `open_*()`: open a data file of a specific format.
    - `close()`: close the file.
    - `metadata()`: metadata of the file.
    - `markers()`: markers of the file.
    - `set_unwrap()`: Set parameters for unwrapping clipped data.

    """

    def __init__(self, file_path=None, buffersize=10.0, backsize=0.0,
                 verbose=0, **meta_kwargs):
        super().__init__(None, buffersize, backsize,
                         verbose, **meta_kwargs)
        if file_path is not None:
            self.open(file_path, buffersize, backsize, verbose, **meta_kwargs)

    def __getitem__(self, key):
        return super(DataLoader, self).__getitem__(key)
 
    def __next__(self):
        return super(DataLoader, self).__next__()

    
    # relacs interface:        
    def open_relacs(self, file_path, buffersize=10.0, backsize=0.0,
                    verbose=0, amax=1.0):
        """Open relacs data files (www.relacs.net) for reading.

        Parameters
        ----------
        file_path: str
            Path to a relacs data directory or a file therein.
        buffersize: float
            Size of internal buffer in seconds.
        backsize: float
            Part of the buffer to be loaded before the requested start index in seconds.
        verbose: int
            If > 0 show detailed error/warning messages.
        amax: float
            The amplitude range of the data.

        Raises
        ------
        ValueError: .gz files not supported.
        """
        self.verbose = verbose
        
        if self.sf is not None:
            self._close_relacs()

        trace_file_paths = relacs_trace_files(file_path)

        # open trace files:
        self.sf = []
        self.frames = None
        self.rate = None
        self.unit = ''
        self.filepath = None
        if len(trace_file_paths) > 0:
            self.filepath = os.path.dirname(trace_file_paths[0])
        for path in sorted(trace_file_paths):
            if path[-3:] == '.gz':
                raise ValueError('.gz files not supported')
            sf = open(path, 'rb')
            self.sf.append(sf)
            if verbose > 0:
                print(f'open_relacs(file_path) with file_path={path}')
            # file size:
            sf.seek(0, os.SEEK_END)
            frames = sf.tell()//4
            if self.frames is None:
                self.frames = frames
            elif self.frames != frames:
                diff = self.frames - frames
                if diff > 1 or diff < -2:
                    raise ValueError('number of frames of traces differ')
                elif diff >= 0:
                    self.frames = frames
            sf.seek(0)
            # retrieve sampling rate and unit:
            rate, us = relacs_samplerate_unit(path)
            if self.rate is None:
                self.rate = rate
            elif rate != self.rate:
                raise ValueError('sampling rates of traces differ')
            if len(self.unit) == 0:
                self.unit = us
            elif us != self.unit:
                raise ValueError('unit of traces differ')
        self.channels = len(self.sf)
        self.shape = (self.frames, self.channels)
        self.size = self.frames * self.channels
        self.ndim = len(self.shape)
        self.format = 'RELACS'
        self.encoding = 'FLOAT'
        self.bufferframes = int(buffersize*self.rate)
        self.backframes = int(backsize*self.rate)
        self.init_buffer()
        self.offset = 0
        self.close = self._close_relacs
        self.load_audio_buffer = self._load_buffer_relacs
        self.ampl_min = -amax
        self.ampl_max = +amax
        self._load_metadata = self._metadata_relacs
        # TODO: load markers:
        self._locs = np.zeros((0, 2), dtype=int)
        self._labels = np.zeros((0, 2), dtype=object)
        self._load_markers = None
        return self

    def _close_relacs(self):
        """Close the relacs data files.
        """
        if self.sf is not None:
            for file in self.sf:
                file.close()
            self.sf = None

    def _load_buffer_relacs(self, r_offset, r_size, buffer):
        """Load new data from relacs data file.

        Parameters
        ----------
        r_offset: int
           First frame to be read from file.
        r_size: int
           Number of frames to be read from file.
        buffer: ndarray
           Buffer where to store the loaded data.
        """
        for i, file in enumerate(self.sf):
            file.seek(r_offset*4)
            data = file.read(r_size*4)
            buffer[:, i] = np.frombuffer(data, dtype=np.float32)
        

    def _metadata_relacs(self, store_empty=False, first_only=False):
        """ Load meta-data of a relacs data set.
        """
        info_path = os.path.join(self.filepath, 'info.dat')
        if not os.path.exists(info_path):
            return {}
        return relacs_header(info_path, store_empty, first_only)

    
    # fishgrid interface:        
    def open_fishgrid(self, file_path, buffersize=10.0, backsize=0.0,
                      verbose=0):
        """Open fishgrid data files (https://github.com/bendalab/fishgrid) for reading.

        Parameters
        ----------
        file_path: str
            Path to a fishgrid data directory, or a file therein.
        buffersize: float
            Size of internal buffer in seconds.
        backsize: float
            Part of the buffer to be loaded before the requested start index in seconds.
        verbose: int
            If > 0 show detailed error/warning messages.
        """
        self.verbose = verbose
        
        if self.sf is not None:
            self._close_fishgrid()

        trace_file_paths = fishgrid_trace_files(file_path)
        self.filepath = None
        if len(trace_file_paths) > 0:
            self.filepath = os.path.dirname(trace_file_paths[0])
        self._load_metadata = metadata_fishgrid
        self._load_markers = markers_fishgrid

        # open grid files:
        grids = fishgrid_grids(self.metadata())
        grid_sizes = [r*c for r,c in grids]
        self.channels = 0
        for g, path in enumerate(trace_file_paths):
            self.channels += grid_sizes[g]
        self.sf = []
        self.grid_channels = []
        self.grid_offs = []
        offs = 0
        self.frames = None
        self.rate = get_number(self.metadata(), 'Hz', 'AISampleRate')
        v, self.unit = get_number_unit(self.metadata(), 'AIMaxVolt')
        if v is not None:
            self.ampl_min = -v
            self.ampl_max = +v
            
        for g, path in enumerate(trace_file_paths):
            sf = open(path, 'rb')
            self.sf.append(sf)
            if verbose > 0:
                print(f'open_fishgrid(file_path) with file_path={path}')
            # grid channels:
            self.grid_channels.append(grid_sizes[g])
            self.grid_offs.append(offs)
            offs += grid_sizes[g]
            # file size:
            sf.seek(0, os.SEEK_END)
            frames = sf.tell()//4//grid_sizes[g]
            if self.frames is None:
                self.frames = frames
            elif self.frames != frames:
                diff = self.frames - frames
                if diff > 1 or diff < -2:
                    raise ValueError('number of frames of traces differ')
                elif diff >= 0:
                    self.frames = frames
            sf.seek(0)
        self.shape = (self.frames, self.channels)
        self.size = self.frames * self.channels
        self.ndim = len(self.shape)
        self.format = 'FISHGRID'
        self.encoding = 'FLOAT'
        self.bufferframes = int(buffersize*self.rate)
        self.backframes = int(backsize*self.rate)
        self.init_buffer()
        self.offset = 0
        self.close = self._close_fishgrid
        self.load_audio_buffer = self._load_buffer_fishgrid
        return self

    def _close_fishgrid(self):
        """Close the fishgrid data files.
        """
        if self.sf is not None:
            for file in self.sf:
                file.close()
            self.sf = None

    def _load_buffer_fishgrid(self, r_offset, r_size, buffer):
        """Load new data from fishgrid data files.

        Parameters
        ----------
        r_offset: int
           First frame to be read from file.
        r_size: int
           Number of frames to be read from file.
        buffer: ndarray
           Buffer where to store the loaded data.
        """
        for file, gchannels, goffset in zip(self.sf, self.grid_channels, self.grid_offs):
            file.seek(r_offset*4*gchannels)
            data = file.read(r_size*4*gchannels)
            buffer[:, goffset:goffset+gchannels] = np.frombuffer(data, dtype=np.float32).reshape((-1, gchannels))


    # container interface:
    def open_container(self, file_path, buffersize=10.0,
                       backsize=0.0, verbose=0, datakey=None,
                       samplekey=['rate', 'Fs', 'fs'],
                       timekey=['time'], amplkey=['amax'], unitkey='unit',
                       metadatakey=['metadata', 'info'],
                       poskey=['positions'],
                       spanskey=['spans'], labelskey=['labels'],
                       descrkey=['descriptions'],
                       amax=1.0, unit='a.u.'):
        """Open generic container file.

        Supported file formats are:

        - python pickle files (.pkl)
        - numpy files (.npz)
        - matlab files (.mat)

        Parameters
        ----------
        file_path: str
            Path to a container file.
        buffersize: float
            Size of internal buffer in seconds.
        backsize: float
            Part of the buffer to be loaded before the requested start index in seconds.
        verbose: int
            If > 0 show detailed error/warning messages.
        datakey: None, str, or list of str
            Name of the variable holding the data.  If `None` take the
            variable that is a 2D array and has the largest number of
            elements.
        samplekey: str or list of str
            Name of the variable holding the sampling rate.
        timekey: str or list of str
            Name of the variable holding sampling times.
            If no sampling rate is available, the sampling rate is retrieved
            from the sampling times.
        amplkey: str or list of str
            Name of the variable holding the amplitude range of the data.
        unitkey: str
            Name of the variable holding the unit of the data.
        metadatakey: str or list of str
            Name of the variable holding the metadata.
        poskey: str or list of str
            Name of the variable holding positions of markers.
        spanskey: str or list of str
            Name of the variable holding spans of markers.
        labelskey: str or list of str
            Name of the variable holding labels of markers.
        descrkey: str or list of str
            Name of the variable holding descriptions of markers.
        amax: None or float
            If specified and no amplitude range has been found in the data
            container, then this is the amplitude range of the data.
        unit: None or str
            If specified and no unit has been found in the data container,
            then return this as the unit of the data.

        Raises
        ------
        ValueError:
            Invalid key requested.
        """
        self.verbose = verbose
        data_dict = {}
        ext = os.path.splitext(file_path)[1]
        if ext == '.pkl':
            import pickle
            with open(file_path, 'rb') as f:
                data_dict = pickle.load(f)
            self.format = 'PKL'
        elif ext == '.npz':
            data_dict = np.load(file_path)
            self.format = 'NPZ'
        elif ext == '.mat':
            from scipy.io import loadmat
            data_dict = loadmat(file_path, squeeze_me=True)
            self.format = 'MAT'
        self.buffer, self.rate, self.unit, amax = \
            extract_container_data(data_dict, datakey, samplekey,
                                   timekey, amplkey, unitkey, amax, unit)
        self.filepath = file_path
        self.channels = self.buffer.shape[1]
        self.frames = self.buffer.shape[0]
        self.shape = self.buffer.shape
        self.ndim = self.buffer.ndim
        self.size = self.buffer.size
        self.encoding = self.numpy_encodings[self.buffer.dtype]
        self.ampl_min = -amax
        self.ampl_max = +amax
        self.offset = 0
        self.buffer_changed = np.zeros(self.channels, dtype=bool)
        self.bufferframes = self.frames
        self.backsize = 0
        self.close = self._close_container
        self.load_audio_buffer = self._load_buffer_container
        self._metadata = extract_container_metadata(data_dict, metadatakey)
        self._load_metadata = None
        self._locs, self._labels = extract_container_markers(data_dict,
                                                             poskey,
                                                             spanskey,
                                                             labelskey,
                                                             descrkey)
        self._load_markers = None

    def _close_container(self):
        """Close container. """
        pass

    def _load_buffer_container(self, r_offset, r_size, buffer):
        """Load new data from container."""
        buffer[:, :] = self.buffer[r_offset:r_offset + r_size, :]


    # raw data interface:
    def open_raw(self, file_path, buffersize=10.0, backsize=0.0,
                 verbose=0, rate=44000, channels=1, dtype=np.float32,
                 amax=1.0, unit='a.u.'):
        """Load data from a raw file.

        Raw files just contain the data and absolutely no metadata, not
        even the sampling rate, number of channels, etc.
        Supported file formats are:

        - raw files (*.raw)
        - LabView scandata (*.scandat)

        Parameters
        ----------
        file_path: str
            Path of the file to load.
        buffersize: float
            Size of internal buffer in seconds.
        backsize: float
            Part of the buffer to be loaded before the requested start index in seconds.
        verbose: int
            If > 0 show detailed error/warning messages.
        rate: float
            Sampling rate of the data in Hertz.
        channels: int
            Number of channels multiplexed in the data.
        dtype: str or numpy.dtype
            The data type stored in the file.
        amax: float
            The amplitude range of the data.
        unit: str
            The unit of the data.
        """
        self.verbose = verbose
        self.filepath = file_path
        self.sf = open(file_path, 'rb')
        if verbose > 0:
            print(f'open_raw(file_path) with file_path={file_path}')
        self.dtype = np.dtype(dtype)
        self.rate = float(rate)
        # file size:
        self.sf.seek(0, os.SEEK_END)
        # a frame holds one sample for each of the multiplexed channels:
        self.frames = self.sf.tell()//self.dtype.itemsize//int(channels)
        self.sf.seek(0)
        self.channels = int(channels)
        self.shape = (self.frames, self.channels)
        self.ndim = len(self.shape)
        self.size = self.frames*self.channels
        self.format = 'RAW'
        self.encoding = self.numpy_encodings.get(self.dtype, 'UNKNOWN')
        self.unit = unit
        self.ampl_max = float(amax)
        self.ampl_min = -self.ampl_max
        self.offset = 0
        self.bufferframes = int(buffersize*self.rate)
        self.backframes = int(backsize*self.rate)
        self.init_buffer()
        self.close = self._close_raw
        self.load_audio_buffer = self._load_buffer_raw
        self._metadata = None
        self._load_metadata = None
        self._locs = None
        self._labels = None
        self._load_markers = None

    def _close_raw(self):
        """Close raw file. """
        self.sf.close()
        self.sf = None

    def _load_buffer_raw(self, r_offset, r_size, buffer):
        """Load new data from raw file."""
        # a frame holds one sample for each of the multiplexed channels:
        framesize = self.dtype.itemsize*self.channels
        self.sf.seek(r_offset*framesize)
        raw_data = self.sf.read(r_size*framesize)
        raw_data = np.frombuffer(raw_data, dtype=self.dtype)
        raw_data = raw_data.reshape(-1, self.channels)
        # recode:
        if self.dtype == np.dtype('int16'):
            data = raw_data.astype('float32')
            data *= self.ampl_max/2**15
        elif self.dtype == np.dtype('int32'):
            data = raw_data.astype(float)
            data *= self.ampl_max/2**31
        elif self.dtype == np.dtype('int64'):
            data = raw_data.astype(float)
            data *= self.ampl_max/2**63
        else:
            data = raw_data
        buffer[:, :] = data

    
    # audioio interface:        
    def open_audioio(self, file_path, buffersize=10.0, backsize=0.0,
                     verbose=0, gainkey=default_gain_keys, sep='.',
                     amax=None, unit='a.u.'):
        """Open an audio file.

        See the [audioio](https://github.com/bendalab/audioio) package
        for details.

        Parameters
        ----------
        file_path: str
            Path to an audio file.
        buffersize: float
            Size of internal buffer in seconds.
        backsize: float
            Part of the buffer to be loaded before the requested start index
            in seconds.
        verbose: int
            If > 0 show detailed error/warning messages.
        gainkey: str or list of str
            Key in the file's metadata that holds some gain information.
            If found, the data will be multiplied with the gain,
            and if available, the corresponding unit is returned.
            See the [audioio.get_gain()](https://bendalab.github.io/audioio/api/audiometadata.html#audioio.audiometadata.get_gain) function for details.
        sep: str
            String that separates section names in `gainkey`.
        amax: None or float
            If specified and no gain has been found in the metadata,
            then use this as the amplitude range.
        unit: None or str
            If specified and no gain has been found in the metadata,
            then this is the unit of the data.

        """
        self.verbose = verbose
        super(DataLoader, self).open(file_path, buffersize, backsize, verbose)
        md = self.metadata()
        fac, unit = get_gain(md, gainkey, sep, amax, unit)
        if fac is None:
            self.gain_fac = 1.0 
        else:
            self.gain_fac = fac
            self._load_buffer_audio_org = self.load_audio_buffer
            self.load_audio_buffer = self._load_buffer_audioio
        self.ampl_min *= self.gain_fac
        self.ampl_max *= self.gain_fac
        self.unit = unit
        return self
    
    def _load_buffer_audioio(self, r_offset, r_size, buffer):
        """Load and scale new data from an audio file.

        Parameters
        ----------
        r_offset: int
           First frame to be read from file.
        r_size: int
           Number of frames to be read from file.
        buffer: ndarray
           Buffer where to store the loaded data.
        """
        self._load_buffer_audio_org(r_offset, r_size, buffer)
        buffer *= self.gain_fac

        
    def open(self, file_path, buffersize=10.0, backsize=0.0,
             verbose=0, **kwargs):
        """Open file with time-series data for reading.

        Parameters
        ----------
        file_path: str or list of str
            Path to a data file or directory.
        buffersize: float
            Size of internal buffer in seconds.
        backsize: float
            Part of the buffer to be loaded before the requested start index
            in seconds.
        verbose: int
            If > 0 show detailed error/warning messages.
        **kwargs: dict
            Further keyword arguments that are passed on to the 
            format specific opening functions.
            For example:

            - `amax`: the amplitude range of the data.
            - `unit`: the unit of the data.

        Raises
        ------
        ValueError:
            `file_path` is an empty string.
        """
        # list of implemented open functions:
        data_open_funcs = (
            ('relacs', check_relacs, self.open_relacs, 1),
            ('fishgrid', check_fishgrid, self.open_fishgrid, 1),
            ('container', check_container, self.open_container, 1),
            ('raw', check_raw, self.open_raw, 1),
            ('audioio', None, self.open_audioio, 0),
            )
        if len(file_path) == 0:
            raise ValueError('input argument file_path is empty string.')
        # open data:
        for name, check_file, open_file, v in data_open_funcs:
            if check_file is None or check_file(file_path):
                open_file(file_path, buffersize, backsize, verbose, **kwargs)
                if v*verbose > 1:
                    if self.format is not None:
                        print(f'  format       : {self.format}')
                    if self.encoding is not None:
                        print(f'  encoding     : {self.encoding}')
                    print(f'  sampling rate: {self.rate} Hz')
                    print(f'  channels     : {self.channels}')
                    print(f'  frames       : {self.frames}')
                    print(f'  range        : {self.ampl_max:g}{self.unit}')
                break
        return self

Ancestors

  • audioio.audioloader.AudioLoader
  • audioio.bufferedarray.BufferedArray

Methods

def open_relacs(self, file_path, buffersize=10.0, backsize=0.0, verbose=0, amax=1.0)

Open relacs data files (www.relacs.net) for reading.

Parameters

file_path : str
Path to a relacs data directory or a file therein.
buffersize : float
Size of internal buffer in seconds.
backsize : float
Part of the buffer to be loaded before the requested start index in seconds.
verbose : int
If > 0 show detailed error/warning messages.
amax : float
The amplitude range of the data.

Raises

ValueError: .gz files not supported.

def open_fishgrid(self, file_path, buffersize=10.0, backsize=0.0, verbose=0)

Open fishgrid data files (https://github.com/bendalab/fishgrid) for reading.

Parameters

file_path : str
Path to a fishgrid data directory, or a file therein.
buffersize : float
Size of internal buffer in seconds.
backsize : float
Part of the buffer to be loaded before the requested start index in seconds.
verbose : int
If > 0 show detailed error/warning messages.
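
The source listing above counts frames from the file size, assuming 4-byte floats interleaved over the channels of each grid, and copies each grid into its own block of columns of a shared buffer. A minimal sketch of that layout, using in-memory files instead of real fishgrid traces; `count_frames()` and `read_frames()` are hypothetical helpers, not part of thunderlab:

```python
import io
import numpy as np

def count_frames(f, channels):
    """Number of frames in a file of interleaved float32 samples."""
    f.seek(0, io.SEEK_END)
    frames = f.tell() // 4 // channels   # 4 bytes per float32 sample
    f.seek(0)
    return frames

def read_frames(files, grid_channels, offset, size):
    """Read `size` frames starting at frame `offset` from all grid files."""
    buffer = np.zeros((size, sum(grid_channels)), dtype=np.float32)
    col = 0
    for f, nch in zip(files, grid_channels):
        f.seek(offset * 4 * nch)
        raw = f.read(size * 4 * nch)
        # each grid fills its own block of columns:
        buffer[:, col:col + nch] = np.frombuffer(raw, dtype=np.float32).reshape(-1, nch)
        col += nch
    return buffer

# two made-up grids with 2 and 3 channels, 5 frames each:
grid_channels = [2, 3]
grid1 = np.arange(10, dtype=np.float32).reshape(5, 2)
grid2 = 100 + np.arange(15, dtype=np.float32).reshape(5, 3)
files = [io.BytesIO(grid1.tobytes()), io.BytesIO(grid2.tobytes())]

frames = [count_frames(f, n) for f, n in zip(files, grid_channels)]
chunk = read_frames(files, grid_channels, offset=1, size=2)
print(frames, chunk.shape)
```

Here `chunk` holds frames 1 and 2 of both grids side by side, with the columns of the first grid followed by those of the second.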

def open_container(self, file_path, buffersize=10.0, backsize=0.0, verbose=0, datakey=None, samplekey=['rate', 'Fs', 'fs'], timekey=['time'], amplkey=['amax'], unitkey='unit', metadatakey=['metadata', 'info'], poskey=['positions'], spanskey=['spans'], labelskey=['labels'], descrkey=['descriptions'], amax=1.0, unit='a.u.')

Open generic container file.

Supported file formats are:

  • python pickle files (.pkl)
  • numpy files (.npz)
  • matlab files (.mat)

Parameters

file_path : str
Path to a container file.
buffersize : float
Size of internal buffer in seconds.
backsize : float
Part of the buffer to be loaded before the requested start index in seconds.
verbose : int
If > 0 show detailed error/warning messages.
datakey : None, str, or list of str
Name of the variable holding the data. If None take the variable that is a 2D array and has the largest number of elements.
samplekey : str or list of str
Name of the variable holding the sampling rate.
timekey : str or list of str
Name of the variable holding sampling times. If no sampling rate is available, the sampling rate is retrieved from the sampling times.
amplkey : str or list of str
Name of the variable holding the amplitude range of the data.
unitkey : str
Name of the variable holding the unit of the data.
metadatakey : str or list of str
Name of the variable holding the metadata.
poskey : str or list of str
Name of the variable holding positions of markers.
spanskey : str or list of str
Name of the variable holding spans of markers.
labelskey : str or list of str
Name of the variable holding labels of markers.
descrkey : str or list of str
Name of the variable holding descriptions of markers.
amax : None or float
If specified and no amplitude range has been found in the data container, then this is the amplitude range of the data.
unit : None or str
If specified and no unit has been found in the data container, then return this as the unit of the data.

Raises

ValueError: Invalid key requested.
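
The key lookup described above can be illustrated without thunderlab. The following sketch writes a small .npz container to memory, picks the largest 2D array as the data (the `datakey=None` rule), and falls back to the sampling times when no rate variable is stored. `pick_data_key()` and `find_rate()` are hypothetical helpers, and estimating the rate from the mean time step is an assumption, not necessarily what `extract_container_data()` does:

```python
import io
import numpy as np

def pick_data_key(container):
    """Return the key of the 2D array with the largest number of elements."""
    best_key, best_size = None, 0
    for key in container:
        value = np.asarray(container[key])
        if value.ndim == 2 and value.size > best_size:
            best_key, best_size = key, value.size
    return best_key

def find_rate(container, samplekeys=('rate', 'Fs', 'fs'), timekeys=('time',)):
    """Look up the sampling rate, or estimate it from the sampling times."""
    for key in samplekeys:
        if key in container:
            return float(container[key])
    for key in timekeys:
        if key in container:
            time = np.asarray(container[key])
            # assumption: evenly sampled, rate is the inverse mean time step
            return 1.0 / float(np.mean(np.diff(time)))
    raise ValueError('no sampling rate found')

# made-up two-channel recording without an explicit rate variable:
rate = 1000.0
time = np.arange(200) / rate
data = np.column_stack((np.sin(2*np.pi*10*time), np.cos(2*np.pi*10*time)))
f = io.BytesIO()
np.savez(f, data=data, time=time, unit='mV')
f.seek(0)
container = np.load(f)

key = pick_data_key(container)
found_rate = find_rate(container)
print(key, container[key].shape, round(found_rate), str(container['unit']))
```

Storing the data with frames along the first and channels along the second dimension matches the shape that `open_container()` reads from the array.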

def open_raw(self, file_path, buffersize=10.0, backsize=0.0, verbose=0, rate=44000, channels=1, dtype=numpy.float32, amax=1.0, unit='a.u.')

Load data from a raw file.

Raw files just contain the data and absolutely no metadata, not even the sampling rate, number of channels, etc. Supported file formats are:

  • raw files (*.raw)
  • LabView scandata (*.scandat)

Parameters

file_path : str
Path of the file to load.
buffersize : float
Size of internal buffer in seconds.
backsize : float
Part of the buffer to be loaded before the requested start index in seconds.
verbose : int
If > 0 show detailed error/warning messages.
rate : float
Sampling rate of the data in Hertz.
channels : int
Number of channels multiplexed in the data.
dtype : str or numpy.dtype
The data type stored in the file.
amax : float
The amplitude range of the data.
unit : str
The unit of the data.
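
Because a raw file carries no header, `channels`, `dtype` and `amax` fully determine how its bytes are interpreted. A minimal sketch of that interpretation, assuming samples are interleaved by channel and that integer types are scaled to the range -amax to +amax as in the source listing above; `load_raw()` is a hypothetical helper that reads the whole file at once instead of buffering:

```python
import io
import numpy as np

def load_raw(f, channels=1, dtype=np.float32, amax=1.0):
    """Read all frames of a headerless file of interleaved samples."""
    dtype = np.dtype(dtype)
    raw = np.frombuffer(f.read(), dtype=dtype)
    frames = raw.size // channels           # one sample per channel and frame
    raw = raw[:frames * channels].reshape(frames, channels)
    if dtype.kind == 'i' and dtype.itemsize in (2, 4, 8):
        # int16, int32, int64: map full integer range onto -amax to +amax
        return raw.astype(float) * (amax / 2**(8 * dtype.itemsize - 1))
    return raw

# made-up int16 recording with 2 channels and 2 frames:
samples = np.array([[0, 16384], [-32768, 8192]], dtype=np.int16)
f = io.BytesIO(samples.tobytes())
data = load_raw(f, channels=2, dtype='int16', amax=5.0)
print(data)
```

With `amax=5.0` the smallest int16 value maps to -5.0 and half of the positive range to 2.5.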

def open_audioio(self, file_path, buffersize=10.0, backsize=0.0, verbose=0, gainkey=['AIMaxVolt', 'gain'], sep='.', amax=None, unit='a.u.')

Open an audio file.

See the audioio package for details.

Parameters

file_path : str
Path to an audio file.
buffersize : float
Size of internal buffer in seconds.
backsize : float
Part of the buffer to be loaded before the requested start index in seconds.
verbose : int
If > 0 show detailed error/warning messages.
gainkey : str or list of str
Key in the file's metadata that holds some gain information. If found, the data will be multiplied with the gain, and if available, the corresponding unit is returned. See the audioio.get_gain() function for details.
sep : str
String that separates section names in gainkey.
amax : None or float
If specified and no gain has been found in the metadata, then use this as the amplitude range.
unit : None or str
If specified and no gain has been found in the metadata, then this is the unit of the data.
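
To illustrate how a gain entry in nested metadata can turn normalized audio data into physical units, here is a minimal sketch. It is not the audioio implementation: `lookup()` and `parse_gain()` are hypothetical helpers, and the metadata layout and the value format `'20mV'` are assumptions made for the example.

```python
import re
import numpy as np

def lookup(metadata, key, sep='.'):
    """Follow a key like 'INFO.gain' through nested dictionaries."""
    node = metadata
    for part in key.split(sep):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node

def parse_gain(value):
    """Split a string like '20mV' into the number 20.0 and the unit 'mV'."""
    m = re.fullmatch(r'\s*([-+]?\d*\.?\d+)\s*(\S*)\s*', str(value))
    if m is None:
        return None, None
    return float(m.group(1)), m.group(2)

# made-up metadata with one section:
metadata = {'INFO': {'gain': '20mV'}}
fac, unit = parse_gain(lookup(metadata, 'INFO.gain'))

# audio data are normalized to the range -1 to 1:
audio = np.array([[-1.0], [0.25], [1.0]])
data = audio * fac
ampl_min, ampl_max = -1.0 * fac, 1.0 * fac
print(data.ravel(), ampl_min, ampl_max, unit)
```

The amplitude range is scaled by the same factor as the data, as `open_audioio()` does with `ampl_min` and `ampl_max`.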

def open(self, file_path, buffersize=10.0, backsize=0.0, verbose=0, **kwargs)

Open file with time-series data for reading.

Parameters

file_path : str or list of str
Path to a data file or directory.
buffersize : float
Size of internal buffer in seconds.
backsize : float
Part of the buffer to be loaded before the requested start index in seconds.
verbose : int
If > 0 show detailed error/warning messages.
**kwargs : dict
Further keyword arguments that are passed on to the format specific opening functions. For example:

  • amax: the amplitude range of the data.
  • unit: the unit of the data.

Raises

ValueError: file_path is an empty string.
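
The source listing of `open()` walks through a list of check functions and uses the first format whose check accepts the file, with audioio as the fallback that has no check. A minimal sketch of this dispatch pattern; the check functions below are simplified stand-ins that only look at the file extension and are not the ones implemented in thunderlab:

```python
import os

def check_container(path):
    """Stand-in check: container files by extension."""
    return os.path.splitext(path)[1].lower() in ('.pkl', '.npz', '.mat')

def check_raw(path):
    """Stand-in check: raw files by extension."""
    return os.path.splitext(path)[1].lower() in ('.raw', '.scandat')

def select_opener(file_path):
    """Return the name of the first format that accepts `file_path`."""
    openers = (
        ('container', check_container),
        ('raw', check_raw),
        ('audioio', None),   # no check: fallback for everything else
    )
    if len(file_path) == 0:
        raise ValueError('file_path is an empty string')
    for name, check in openers:
        if check is None or check(file_path):
            return name

names = [select_opener(p) for p in ('a.npz', 'b.raw', 'c.wav')]
print(names)
```

Because the first match wins, the order of the entries matters: specific formats come first and the catch-all entry last.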