Module thunderlab.dataloader
Load time-series data from files.
data, rate, unit, amax = load_data('data/file.wav')
The function load_data() loads the whole time series from the file
as a numpy array of floats. The first dimension is frames, the second is
channels. In contrast to the audioio.load_audio() function, the
values of the data array are not restricted to the range between -1 and 1. They can
assume any value within the range -amax to +amax, in the returned unit.
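To illustrate the difference to normalized audio data, the following minimal sketch rescales samples from the range -1 to 1 into the range -amax to +amax. The helper `to_physical_units()` and the example values are hypothetical and not part of thunderlab.

```python
def to_physical_units(samples, amax):
    """Scale samples from [-1, 1] to [-amax, +amax] (hypothetical helper)."""
    return [s * amax for s in samples]

normalized = [-1.0, -0.5, 0.0, 0.25, 1.0]   # audio-style values
amax, unit = 10.0, 'mV'                     # assumed amplitude range and unit
data = to_physical_units(normalized, amax)
print(data, unit)
```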
data = DataLoader('data/file.wav', 60.0)
or
with DataLoader('data/file.wav', 60.0) as data:
Create a DataLoader object that loads chunks of 60 seconds of data
on demand. data can be used like a read-only numpy array of floats.
Supported file formats
- python pickle files
- numpy .npz files
- matlab .mat files
- audio files via the audioio package
- LabView .scandat files
- relacs trace*.raw files (https://www.relacs.net)
- fishgrid traces-*.raw files (https://github.com/bendalab/fishgrid)
Metadata
Many file formats can store metadata that further describe the
stored time-series data. They are handled as nested dictionaries of key-value
pairs. Load them with the metadata() function:
metadata = metadata('data/file.mat')
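A nested metadata dictionary of this kind can be traversed recursively. The sketch below is only an illustration: `format_metadata()` is a hypothetical helper and the dictionary content is made up, not read from a real file.

```python
def format_metadata(md, indent=0):
    """Return the lines of an indented listing of a nested metadata dict."""
    lines = []
    for key, value in md.items():
        if isinstance(value, dict):
            # a dictionary value is a section with its own key-value pairs:
            lines.append(' ' * indent + f'{key}:')
            lines.extend(format_metadata(value, indent + 4))
        else:
            lines.append(' ' * indent + f'{key}: {value}')
    return lines

# made-up metadata, not read from any real file:
md = {'Recording': {'Rate': '20kHz', 'Amplifier': {'Gain': '100'}},
      'Comment': 'test'}
lines = format_metadata(md)
print('\n'.join(lines))
```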
Markers
Some file formats can also store markers that mark specific
positions in the time-series data. Load marker positions and spans (in
the 2-D array locs) and label and text strings (in the 2-D array
labels) with the markers() function:
locs, labels = markers('data.wav')
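As a rough sketch of how the two arrays belong together, the following example pairs each row of locs with the matching row of labels and converts positions and spans to seconds. It assumes that positions and spans are counted in frames; the sampling rate, the marker values, and the helper `marker_times()` are made up for illustration.

```python
rate = 1000.0                      # assumed sampling rate in Hertz
locs = [[500, 0], [2000, 250]]     # position and span of each marker
labels = [['start', ''], ['stim', 'first stimulus']]   # label and text

def marker_times(locs, labels, rate):
    """Return (label, start time, duration) in seconds for each marker."""
    return [(label, pos / rate, span / rate)
            for (pos, span), (label, text) in zip(locs, labels)]

times = marker_times(locs, labels, rate)
for label, t, dur in times:
    print(f'{label}: {t:.3f}s, span {dur:.3f}s')
```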
Additional, format-specific functions
- extract_container_metadata(): extract metadata from dictionary loaded from a container file.
- relacs_samplerate_unit(): retrieve sampling rate and unit from a relacs stimuli.dat file.
- relacs_header(): read key-value pairs from relacs *.dat file headers.
- fishgrid_grids(): retrieve grid sizes from a fishgrid.cfg file.
- fishgrid_spacings(): spacing between grid electrodes.
Global variables
var data_loader_funcs
-
List of implemented load functions.
Each element of the list is a tuple with the data format's name, its check and its load function.
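A list of (name, check function, load function) tuples lends itself to a simple dispatch loop. The following sketch shows the general pattern only; the check and load functions are placeholders, and `load_any()` is a hypothetical name that does not claim to reproduce how load_data() uses data_loader_funcs.

```python
import os

def check_wav(path):
    return os.path.splitext(path)[1].lower() == '.wav'

def load_wav(path):
    return f'loaded {path} as wav'    # placeholder for a real loader

def check_npz(path):
    return os.path.splitext(path)[1].lower() == '.npz'

def load_npz(path):
    return f'loaded {path} as npz'    # placeholder for a real loader

# each element: (format name, check function, load function)
loader_funcs = [('wav', check_wav, load_wav),
                ('npz', check_npz, load_npz)]

def load_any(path):
    """Try each format in turn and use the first one whose check succeeds."""
    for name, check, load in loader_funcs:
        if check(path):
            return name, load(path)
    raise ValueError(f'unsupported file format: {path}')

print(load_any('recording.npz'))
```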
Functions
def relacs_samplerate_unit(filepath, channel=0)
-
Retrieve sampling rate and unit from a relacs stimuli.dat file.
Parameters
filepath
:str
- Path to a relacs data directory, or a file in a relacs data directory.
channel
:int
- Channel (trace) number, if filepath does not specify a trace-*.raw file.
Returns
samplerate
:float
- Sampling rate in Hertz
unit
:str
- Unit of the trace, can be empty if not found
Raises
IOError/FileNotFoundError:
- If the stimuli.dat file does not exist.
ValueError:
- If the stimuli.dat file does not contain the sampling rate.
def relacs_header(filepath, store_empty=False, first_only=False, lower_keys=False, flat=False, add_sections=False)
-
Read key-value pairs from a relacs *.dat file header.
Parameters
filepath
:str
- A relacs *.dat file, can be also a zipped .gz file.
store_empty
:bool
- If False, do not add metadata with empty values.
first_only
:bool
- If True, only store the first element of a list.
lower_keys
:bool
- Make all keys lower case.
flat
:bool
- Do not make a nested dictionary. Use this option also to read in very old relacs metadata with ragged left alignment.
add_sections
:bool
- If True, prepend keys with section names separated by '.' to make them unique.
Returns
data
:dict
- Nested dictionary with key-value pairs of the file header.
Raises
IOError/FileNotFoundError:
- If filepath cannot be opened.
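The effect of prefixing keys with section names can be illustrated on a plain nested dictionary. This is a minimal sketch of the idea, not the implementation used by relacs_header(); `flatten_metadata()` and the example keys are hypothetical.

```python
def flatten_metadata(md, sep='.', prefix=''):
    """Flatten a nested dict, prefixing keys with their section names."""
    flat = {}
    for key, value in md.items():
        full_key = prefix + key
        if isinstance(value, dict):
            # descend into the section and extend the prefix:
            flat.update(flatten_metadata(value, sep, full_key + sep))
        else:
            flat[full_key] = value
    return flat

# both sections contain a 'Name' key, which would clash without prefixes:
nested = {'Cell': {'Name': 'A1'}, 'Subject': {'Name': 'fish7'}}
flat = flatten_metadata(nested)
print(flat)
```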
def check_relacs(file_path)
-
Check for valid relacs file.
Parameters
file_path
:str
- Path to a relacs data directory, or a file in a relacs data directory.
Returns
is_relacs
:bool
- True if file_path is a valid relacs directory or a file therein.
def relacs_trace_files(file_path)
-
Expand file path for relacs data to appropriate trace*.raw file names.
Parameters
file_path
:str
- Path to a relacs data directory, or a file in a relacs data directory.
Returns
trace_file_paths
:list of str
- List of relacs trace*.raw files.
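Expanding a directory to its trace files can be sketched with a glob pattern. The example below assumes the naming pattern trace-*.raw and uses a temporary directory with empty files; `find_trace_files()` is a hypothetical helper, not the thunderlab function.

```python
import glob
import os
import tempfile

def find_trace_files(dir_path):
    """Return the sorted trace-*.raw files in dir_path (hypothetical helper)."""
    return sorted(glob.glob(os.path.join(dir_path, 'trace-*.raw')))

with tempfile.TemporaryDirectory() as tmp:
    # create a few empty files that mimic a relacs data directory:
    for name in ['trace-2.raw', 'trace-1.raw', 'stimuli.dat']:
        open(os.path.join(tmp, name), 'wb').close()
    names = [os.path.basename(p) for p in find_trace_files(tmp)]
print(names)
```

Note that plain string sorting places trace-10.raw before trace-2.raw, so directories with ten or more traces would need a numeric sort key.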
def load_relacs(file_path, amax=1.0)
-
Load traces that have been recorded with relacs (https://github.com/relacs/relacs).
Parameters
file_path
:str
- Path to a relacs data directory, or a file in a relacs data directory.
amax
:float
- The amplitude range of the data.
Returns
data
:2-D array
- All data traces as a 2-D numpy array, even for single channel data. First dimension is time, second is channel.
rate
:float
- Sampling rate of the data in Hz
unit
:str
- Unit of the data
amax
:float
- Maximum amplitude of data range.
Raises
ValueError:
- Invalid name for relacs trace-*.raw file.
- Sampling rates of traces differ.
- Units of traces differ.
def metadata_relacs(file_path, store_empty=False, first_only=False, lower_keys=False, flat=False, add_sections=False)
-
Read meta-data of a relacs data set.
Parameters
file_path
:str
- A relacs data directory or a file therein.
store_empty
:bool
- If False, do not add metadata with empty values.
first_only
:bool
- If True, only store the first element of a list.
lower_keys
:bool
- Make all keys lower case.
flat
:bool
- Do not make a nested dictionary. Use this option also to read in very old relacs metadata with ragged left alignment.
add_sections
:bool
- If True, prepend keys with section names separated by '.' to make them unique.
Returns
data
:nested dict
- Nested dictionary with key-value pairs of the meta data.
def fishgrid_spacings(metadata, unit='m')
-
Spacing between grid electrodes.
Parameters
metadata
:dict
- Fishgrid metadata obtained from metadata_fishgrid().
unit
:str
- Unit in which to return the spacings.
Returns
grid_dist
:list of tuple of float
- For each grid the distances between rows and columns in unit.
def fishgrid_grids(metadata)
-
Retrieve grid sizes from a fishgrid.cfg file.
Parameters
metadata
:dict
- Fishgrid metadata obtained from metadata_fishgrid().
Returns
grids
:list of tuple of int
- For each grid the number of rows and columns.
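Given the rows and columns of each grid, the number of channels per grid and the position of each grid's first channel in a combined data array follow from simple arithmetic. The sketch below uses made-up grid sizes; `grid_channel_offsets()` is a hypothetical helper.

```python
def grid_channel_offsets(grids):
    """Return channel counts, first-channel offsets, and total channels."""
    sizes = [rows * cols for rows, cols in grids]
    offsets = []
    offs = 0
    for size in sizes:
        offsets.append(offs)   # index of the first channel of this grid
        offs += size
    return sizes, offsets, offs

grids = [(4, 4), (2, 8), (1, 3)]    # made-up (rows, columns) per grid
sizes, offsets, channels = grid_channel_offsets(grids)
print(sizes, offsets, channels)
```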
def check_fishgrid(file_path)
-
Check for valid fishgrid file (https://github.com/bendalab/fishgrid).
Parameters
file_path
:str
- Path to a fishgrid data directory or a file in a fishgrid data directory.
Returns
is_fishgrid
:bool
- True if file_path is a valid fishgrid data directory or a file therein.
def fishgrid_trace_files(file_path)
-
Expand file paths for fishgrid data to appropriate traces*.raw file names.
Parameters
file_path
:str
- Path to a fishgrid data directory, or a file therein.
Returns
trace_file_paths
:list of str
- List of fishgrid traces*.raw files.
def load_fishgrid(file_path)
-
Load traces that have been recorded with fishgrid (https://github.com/bendalab/fishgrid).
Parameters
file_path
:str
- Path to a fishgrid data directory, or a file therein.
Returns
data
:2-D array
- All data traces as a 2-D numpy array, even for single channel data. First dimension is time, second is channel.
rate
:float
- Sampling rate of the data in Hz.
unit
:str
- Unit of the data.
amax
:float
- Maximum amplitude of data range.
Raises
FileNotFoundError:
- Invalid or non-existent fishgrid files.
def metadata_fishgrid(file_path)
-
Read meta-data of a fishgrid data set.
Parameters
file_path
:str
- A fishgrid data directory or a file therein.
Returns
data
:nested dict
- Nested dictionary with key-value pairs of the meta data.
def markers_fishgrid(file_path)
-
Read markers of a fishgrid data set.
Parameters
file_path
:str
- A fishgrid data directory or a file therein.
Returns
locs
:2-D array of ints
- Marker positions (first column) and spans (second column) for each marker (rows).
labels
:2-D array of string objects
- Labels (first column) and texts (second column) for each marker (rows).
def check_container(filepath)
-
Check if file is a generic container file.
Supported file formats are:
- python pickle files (.pkl)
- numpy files (.npz)
- matlab files (.mat)
Parameters
filepath
:str
- Path of the file to check.
Returns
is_container
:bool
- True if filepath is a supported container format.
def extract_container_data(data_dict, datakey=None, samplekey=['rate', 'Fs', 'fs'], timekey=['time'], amplkey=['amax'], unitkey='unit', amax=1.0, unit='a.u.')
-
Extract data from dictionary loaded from a container file.
Parameters
data_dict
:dict
- Dictionary of the data items contained in the container.
datakey
:None, str, or list of str
- Name of the variable holding the data. If None, take the variable that is a 2-D array and has the largest number of elements.
samplekey
:str or list of str
- Name of the variable holding the sampling rate.
timekey
:str or list of str
- Name of the variable holding sampling times. If no sampling rate is available, the sampling rate is retrieved from the sampling times.
amplkey
:str or list of str
- Name of the variable holding the amplitude range of the data.
unitkey
:str
- Name of the variable holding the unit of the data.
amax
:None or float
- If specified and no amplitude range has been found in data_dict, then this is the amplitude range of the data.
unit
:None or str
- If specified and no unit has been found in data_dict, then return this as the unit of the data.
Returns
data
:2-D array of floats
- All data traces as a 2-D numpy array, even for single channel data. First dimension is time, second is channel.
rate
:float
- Sampling rate of the data in Hz.
unit
:str
- Unit of the data.
amax
:float
- Maximum amplitude of data range in unit.
Raises
ValueError:
- Invalid key requested.
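The key parameters above accept either a single name or a list of candidate names, and the sampling rate may have to be derived from sampling times. The following sketch illustrates both ideas on a plain dictionary. It is not the code of extract_container_data(); `find_key()` and `sampling_rate()` are hypothetical helpers, and taking the mean time step is one assumed way to derive the rate.

```python
def find_key(data_dict, keys):
    """Return the first candidate key present in data_dict, or None."""
    if isinstance(keys, str):
        keys = [keys]
    for key in keys:
        if key in data_dict:
            return key
    return None

def sampling_rate(data_dict, samplekey=('rate', 'Fs', 'fs'),
                  timekey=('time',)):
    """Return the sampling rate, falling back to the sampling times."""
    key = find_key(data_dict, samplekey)
    if key is not None:
        return float(data_dict[key])
    key = find_key(data_dict, timekey)
    if key is None:
        raise ValueError('no sampling rate or time array found')
    time = data_dict[key]
    # mean time step between the first and the last sample:
    dt = (time[-1] - time[0]) / (len(time) - 1)
    return 1.0 / dt

print(sampling_rate({'Fs': 20000, 'data': [0.0, 0.1]}))
print(sampling_rate({'time': [0.0, 0.5, 1.0, 1.5], 'data': [1, 2, 3, 4]}))
```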
def load_container(file_path, datakey=None, samplekey=['rate', 'Fs', 'fs'], timekey=['time'], amplkey=['amax'], unitkey='unit', amax=1.0, unit='a.u.')
-
Load data from a generic container file.
Supported file formats are:
- python pickle files (.pkl)
- numpy files (.npz)
- matlab files (.mat)
Parameters
file_path
:str
- Path of the file to load.
datakey
:None, str, or list of str
- Name of the variable holding the data. If None, take the variable that is a 2-D array and has the largest number of elements.
samplekey
:str or list of str
- Name of the variable holding the sampling rate.
timekey
:str or list of str
- Name of the variable holding sampling times. If no sampling rate is available, the sampling rate is retrieved from the sampling times.
amplkey
:str
- Name of the variable holding the amplitude range of the data.
unitkey
:str
- Name of the variable holding the unit of the data. If unitkey is not a valid key, then return unitkey as the unit.
amax
:None or float
- If specified and no amplitude range has been found in the data container, then this is the amplitude range of the data.
unit
:None or str
- If specified and no unit has been found in the data container, then return this as the unit of the data.
Returns
data
:2-D array of floats
- All data traces as a 2-D numpy array, even for single channel data. First dimension is time, second is channel.
rate
:float
- Sampling rate of the data in Hz.
unit
:str
- Unit of the data.
amax
:float
- Maximum amplitude of data range.
Raises
ValueError:
- Invalid key requested.
def extract_container_metadata(data_dict, metadatakey=['metadata', 'info'])
-
Extract metadata from dictionary loaded from a container file.
Parameters
data_dict
:dict
- Dictionary of the data items contained in the container.
metadatakey
:str or list of str
- Name of the variable holding the metadata.
Returns
metadata
:nested dict
- Nested dictionary with key-value pairs of the meta data.
def metadata_container(file_path, metadatakey=['metadata', 'info'])
-
Read meta-data of a container file.
Parameters
file_path
:str
- A container file.
metadatakey
:str or list of str
- Name of the variable holding the metadata.
Returns
metadata
:nested dict
- Nested dictionary with key-value pairs of the meta data.
def extract_container_markers(data_dict, poskey=['positions'], spanskey=['spans'], labelskey=['labels'], descrkey=['descriptions'])
-
Extract markers from dictionary loaded from a container file.
Parameters
data_dict
:dict
- Dictionary of the data items contained in the container.
poskey
:str or list of str
- Name of the variable holding positions of markers.
spanskey
:str or list of str
- Name of the variable holding spans of markers.
labelskey
:str or list of str
- Name of the variable holding labels of markers.
descrkey
:str or list of str
- Name of the variable holding descriptions of markers.
Returns
locs
:2-D array of ints
- Marker positions (first column) and spans (second column) for each marker (rows).
labels
:2-D array of string objects
- Labels (first column) and texts (second column) for each marker (rows).
def markers_container(file_path, poskey=['positions'], spanskey=['spans'], labelskey=['labels'], descrkey=['descriptions'])
-
Read markers of a container file.
Parameters
file_path
:str
- A container file.
poskey
:str or list of str
- Name of the variable holding positions of markers.
spanskey
:str or list of str
- Name of the variable holding spans of markers.
labelskey
:str or list of str
- Name of the variable holding labels of markers.
descrkey
:str or list of str
- Name of the variable holding descriptions of markers.
Returns
locs
:2-D array of ints
- Marker positions (first column) and spans (second column) for each marker (rows).
labels
:2-D array of string objects
- Labels (first column) and texts (second column) for each marker (rows).
def check_raw(filepath)
-
Check if file is a raw file.
The following extensions are interpreted as raw files:
- raw files (*.raw)
- LabView scandata (*.scandat)
Parameters
filepath
:str
- Path of the file to check.
Returns
is_raw
:bool
- True if filepath is a raw format.
def load_raw(file_path, rate=44000, channels=1, dtype=numpy.float32, amax=1.0, unit='a.u.')
-
Load data from a raw file.
Raw files just contain the data and absolutely no metadata, not even the sampling rate, number of channels, etc. Supported file formats are:
- raw files (*.raw)
- LabView scandata (*.scandat)
Parameters
file_path
:str
- Path of the file to load.
rate
:float
- Sampling rate of the data in Hertz.
channels
:int
- Number of channels multiplexed in the data.
dtype
:str or numpy.dtype
- The data type stored in the file.
amax
:float
- The amplitude range of the data.
unit
:str
- The unit of the data.
Returns
data
:2-D array of floats
- All data traces as a 2-D numpy array, even for single channel data. First dimension is time, second is channel.
rate
:float
- Sampling rate of the data in Hz.
unit
:str
- Unit of the data.
amax
:float
- Maximum amplitude of data range.
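Because a raw file holds nothing but multiplexed samples, the reader has to be told the data type and the number of channels in order to regroup the samples into frames. The sketch below shows this with the standard library only, so it does not depend on numpy. `read_raw()` is a hypothetical helper, and the scaling of 16-bit integers by amax/2**15 is an assumption of this example rather than a documented property of load_raw().

```python
import array
import os
import tempfile

def read_raw(path, channels=1, typecode='h', amax=1.0):
    """Read multiplexed samples and return a list of frames.

    typecode 'h' is a signed 16-bit integer, 'f' a 32-bit float.
    """
    samples = array.array(typecode)
    with open(path, 'rb') as f:
        samples.frombytes(f.read())
    if typecode == 'h':
        # scale 16-bit integers to the range -amax to +amax:
        values = [s * amax / 2**15 for s in samples]
    else:
        values = list(samples)
    frames = len(values) // channels
    # consecutive samples belong to consecutive channels of one frame:
    return [values[i*channels:(i + 1)*channels] for i in range(frames)]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.raw')
    with open(path, 'wb') as f:
        array.array('h', [0, 16384, -16384, 8192, -32768, 0]).tofile(f)
    data = read_raw(path, channels=2, typecode='h', amax=2.0)
print(data)
```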
def load_audioio(file_path, verbose=0, gainkey=['AIMaxVolt', 'gain'], sep='.', amax=1.0, unit='a.u.')
-
Load data from an audio file.
See the load_audio() function of the audioio package for more information.
Parameters
file_path
:str
- Path of the file to load.
verbose
:int
- If > 0 show detailed error/warning messages.
gainkey
:str or list of str
- Key in the file's metadata that holds some gain information. If found, the data will be multiplied with the gain, and if available, the corresponding unit is returned. See the audioio.get_gain() function for details.
sep
:str
- String that separates section names in gainkey.
amax
:float
- If specified and no gain has been found in the metadata, then use this as the amplitude range.
unit
:str
- If specified and no gain has been found in the metadata, then return this as the unit of the data.
Returns
data
:2-D array of floats
- All data traces as a 2-D numpy array, even for single channel data. First dimension is time, second is channel.
rate
:float
- Sampling rate of the data in Hz.
unit
:str
- Unit of the data if found in the metadata (see gainkey), otherwise unit.
amax
:float
- Maximum amplitude of data range.
def load_data(file_path, verbose=0, **kwargs)
-
Load time-series data from a file.
Parameters
file_path
:str
- Path and name of the file to load.
verbose
:int
- If > 0 show detailed error/warning messages.
**kwargs
:dict
- Further keyword arguments that are passed on to the format-specific loading functions. For example:
- amax: the amplitude range of the data.
- unit: the unit of the data.
Returns
data
:2-D array
- All data traces as a 2-D numpy array, even for single channel data. First dimension is time, second is channel.
rate
:float
- Sampling rate of the data in Hz.
unit
:str
- Unit of the data.
amax
:float
- Maximum amplitude of data range.
Raises
ValueError:
- If file_path is an empty string.
def metadata(file_path, **kwargs)
-
Read meta-data from a data file.
Parameters
file_path
:str
- The full path and name of the file to load. For some file formats several files can be provided in a list.
**kwargs
:dict
- Further keyword arguments that are passed on to the format specific loading functions.
Returns
meta_data
:nested dict
- Meta data contained in the file. Keys of the nested dictionaries are always strings. If the corresponding values are dictionaries, then the key is the section name of the metadata contained in the dictionary. All other types of values are values for the respective key. In particular they are strings, or list of strings. But other simple types like ints or floats are also allowed.
Raises
ValueError:
- If file_path is an empty string.
def markers(file_path)
-
Read markers of a data file.
Parameters
file_path
:str or file handle
- The data file.
Returns
locs
:2-D array of ints
- Marker positions (first column) and spans (second column) for each marker (rows).
labels
:2-D array of string objects
- Labels (first column) and texts (second column) for each marker (rows).
Raises
ValueError:
- If file_path is an empty string.
def demo(file_path, plot=False)
def main(*cargs)
-
Call demo with command line arguments.
Parameters
cargs
:list of str
- Command line arguments as provided by sys.argv[1:]
Classes
class DataLoader (file_path=None, buffersize=10.0, backsize=0.0, verbose=0, **meta_kwargs)
-
Buffered reading of time-series data for random access of the data in the file.
This allows for reading very large data files that do not fit into memory. A DataLoader instance can be used like a huge read-only numpy array, i.e.
data = DataLoader('path/to/data/file.dat')
x = data[10000:20000,0]
The first index specifies the frame, the second one the channel.
DataLoader first determines the format of the data file and then opens the file (first line). It then reads data from the file as necessary for the requested data (second line).
Supported file formats are
- audio files via the audioio package
- python pickle files
- numpy .npz files
- matlab .mat files
- relacs trace*.raw files (www.relacs.net)
- fishgrid traces-*.raw files
Reading sequentially through the file is always possible. If previous data are requested, then the file is read from the beginning. This might slow down access to previous data considerably. Use the backsize argument to the open functions to make sure some data are loaded before the requested frame. Then a subsequent access to the data within backsize seconds before that frame can still be handled without the need to reread the file from the beginning.
Usage:
import thunderlab.dataloader as dl
with dl.DataLoader(file_path, 60.0, 10.0) as data:
    # do something with the content of the file:
    x = data[0:10000,0]
    y = data[10000:20000,0]
    z = x + y
Normal open and close:
data = dl.DataLoader(file_path, 60.0)
x = data[:,:]  # read the whole file
data.close()
that is the same as:
data = dl.DataLoader()
data.open(file_path, 60.0)
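The role of the buffer and of the frames kept before the requested position can be illustrated with a small stand-in class. This is a simplified sketch under several assumptions: it works on a Python list instead of a file, supports single integer indices only, and measures sizes in frames rather than seconds. `BufferedSeries` and its attribute `loads` are hypothetical and not part of thunderlab.

```python
class BufferedSeries:
    """Read-only access to a long sequence through a small buffer (sketch)."""

    def __init__(self, source, bufferframes, backframes=0):
        self.source = source              # stands in for the file on disk
        self.bufferframes = bufferframes  # buffer size in frames
        self.backframes = backframes      # frames kept before the request
        self.offset = 0                   # first frame held in the buffer
        self.buffer = []
        self.loads = 0                    # number of buffer reloads

    def _load(self, index):
        # start the buffer some frames before the requested one:
        self.offset = max(0, index - self.backframes)
        self.buffer = self.source[self.offset:self.offset + self.bufferframes]
        self.loads += 1

    def __len__(self):
        return len(self.source)

    def __getitem__(self, index):
        if not 0 <= index < len(self.source):
            raise IndexError(index)
        if not self.offset <= index < self.offset + len(self.buffer):
            self._load(index)
        return self.buffer[index - self.offset]

source = list(range(1000))
data = BufferedSeries(source, bufferframes=100, backframes=20)
a = data[500]     # loads frames 480 to 579
b = data[490]     # within backframes: served from the buffer
c = data[100]     # outside the buffer: triggers a reload
print(a, b, c, data.loads)
```

With backframes set to zero, the access to frame 490 would already require a second reload, which is the situation the backsize argument is meant to avoid.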
Parameters
file_path
:str
- Name of the file.
buffersize
:float
- Size of internal buffer in seconds.
backsize
:float
- Part of the buffer to be loaded before the requested start index in seconds.
verbose
:int
- If larger than zero show detailed error/warning messages.
meta_kwargs
:dict
- Keyword arguments that are passed on to the _load_metadata() function.
Attributes
rate
:float
- The sampling rate of the data in Hertz.
channels
:int
- The number of channels that are read in.
frames
:int
- The number of frames in the file.
format
:str or None
- Format of the audio file.
encoding
:str or None
- Encoding/subtype of the audio file.
shape
:tuple
- Number of frames and channels of the data.
ndim
:int
- Number of dimensions: always 2 (frames and channels).
unit
:str
- Unit of the data.
ampl_min
:float
- Minimum amplitude the file format supports.
ampl_max
:float
- Maximum amplitude the file format supports.
Methods
- len(): the number of frames.
- open(): open a data file.
- open_*(): open a data file of a specific format.
- close(): close the file.
- metadata(): metadata of the file.
- markers(): markers of the file.
- set_unwrap(): set parameters for unwrapping clipped data.
Constructor for initializing 2D arrays (times x channels).
Expand source code
class DataLoader(AudioLoader): """Buffered reading of time-series data for random access of the data in the file. This allows for reading very large data files that do not fit into memory. A `DataLoader` instance can be used like a huge read-only numpy array, i.e. ``` data = DataLoader('path/to/data/file.dat') x = data[10000:20000,0] ``` The first index specifies the frame, the second one the channel. `DataLoader` first determines the format of the data file and then opens the file (first line). It then reads data from the file as necessary for the requested data (second line). Supported file formats are - audio files via `audioio` package - python pickle files - numpy .npz files - matlab .mat files - relacs trace*.raw files (www.relacs.net) - fishgrid traces-*.raw files Reading sequentially through the file is always possible. If previous data are requested, then the file is read from the beginning. This might slow down access to previous data considerably. Use the `backsize` argument to the open functions to make sure some data are loaded before the requested frame. Then a subsequent access to the data within `backsize` seconds before that frame can still be handled without the need to reread the file from the beginning. Usage: ------ ``` import thunderlab.dataloader as dl with dl.DataLoader(file_path, 60.0, 10.0) as data: # do something with the content of the file: x = data[0:10000,0] y = data[10000:20000,0] z = x + y ``` Normal open and close: ``` data = dl.DataLoader(file_path, 60.0) x = data[:,:] # read the whole file data.close() ``` that is the same as: ``` data = dl.DataLoader() data.open(file_path, 60.0) ``` Parameters ---------- file_path: str Name of the file. buffersize: float Size of internal buffer in seconds. backsize: float Part of the buffer to be loaded before the requested start index in seconds. verbose: int If larger than zero show detailed error/warning messages. 
meta_kwargs: dict Keyword arguments that are passed on to the _load_metadata() function. Attributes ---------- rate: float The sampling rate of the data in Hertz. channels: int The number of channels that are read in. frames: int The number of frames in the file. format: str or None Format of the audio file. encoding: str or None Encoding/subtype of the audio file. shape: tuple Number of frames and channels of the data. ndim: int Number of dimensions: always 2 (frames and channels). unit: str Unit of the data. ampl_min: float Minimum amplitude the file format supports. ampl_max: float Maximum amplitude the file format supports. Methods ------- - `len()`: the number of frames - `open()`: open a data file. - `open_*()`: open a data file of a specific format. - `close()`: close the file. - `metadata()`: metadata of the file. - `markers()`: markers of the file. - `set_unwrap()`: Set parameters for unwrapping clipped data. """ def __init__(self, file_path=None, buffersize=10.0, backsize=0.0, verbose=0, **meta_kwargs): super().__init__(None, buffersize, backsize, verbose, **meta_kwargs) if file_path is not None: self.open(file_path, buffersize, backsize, verbose, **meta_kwargs) def __getitem__(self, key): return super(DataLoader, self).__getitem__(key) def __next__(self): return super(DataLoader, self).__next__() # relacs interface: def open_relacs(self, file_path, buffersize=10.0, backsize=0.0, verbose=0, amax=1.0): """Open relacs data files (www.relacs.net) for reading. Parameters ---------- file_path: str Path to a relacs data directory or a file therein. buffersize: float Size of internal buffer in seconds. backsize: float Part of the buffer to be loaded before the requested start index in seconds. verbose: int If > 0 show detailed error/warning messages. amax: float The amplitude range of the data. Raises ------ ValueError: .gz files not supported. 
""" self.verbose = verbose if self.sf is not None: self._close_relacs() trace_file_paths = relacs_trace_files(file_path) # open trace files: self.sf = [] self.frames = None self.rate = None self.unit = '' self.filepath = None if len(trace_file_paths) > 0: self.filepath = os.path.dirname(trace_file_paths[0]) for path in sorted(trace_file_paths): if path[-3:] == '.gz': raise ValueError('.gz files not supported') sf = open(path, 'rb') self.sf.append(sf) if verbose > 0: print(f'open_relacs(file_path) with file_path={path}') # file size: sf.seek(0, os.SEEK_END) frames = sf.tell()//4 if self.frames is None: self.frames = frames elif self.frames != frames: diff = self.frames - frames if diff > 1 or diff < -2: raise ValueError('number of frames of traces differ') elif diff >= 0: self.frames = frames sf.seek(0) # retrieve sampling rate and unit: rate, us = relacs_samplerate_unit(path) if self.rate is None: self.rate = rate elif rate != self.rate: raise ValueError('sampling rates of traces differ') if len(self.unit) == 0: self.unit = us elif us != self.unit: raise ValueError('unit of traces differ') self.channels = len(self.sf) self.shape = (self.frames, self.channels) self.size = self.frames * self.channels self.ndim = len(self.shape) self.format = 'RELACS' self.encoding = 'FLOAT' self.bufferframes = int(buffersize*self.rate) self.backframes = int(backsize*self.rate) self.init_buffer() self.offset = 0 self.close = self._close_relacs self.load_audio_buffer = self._load_buffer_relacs self.ampl_min = -amax self.ampl_max = +amax self._load_metadata = self._metadata_relacs # TODO: load markers: self._locs = np.zeros((0, 2), dtype=int) self._labels = np.zeros((0, 2), dtype=object) self._load_markers = None return self def _close_relacs(self): """Close the relacs data files. """ if self.sf is not None: for file in self.sf: file.close() self.sf = None def _load_buffer_relacs(self, r_offset, r_size, buffer): """Load new data from relacs data file. 
Parameters ---------- r_offset: int First frame to be read from file. r_size: int Number of frames to be read from file. buffer: ndarray Buffer where to store the loaded data. """ for i, file in enumerate(self.sf): file.seek(r_offset*4) data = file.read(r_size*4) buffer[:, i] = np.frombuffer(data, dtype=np.float32) def _metadata_relacs(self, store_empty=False, first_only=False): """ Load meta-data of a relacs data set. """ info_path = os.path.join(self.filepath, 'info.dat') if not os.path.exists(info_path): return {} return relacs_header(info_path, store_empty, first_only) # fishgrid interface: def open_fishgrid(self, file_path, buffersize=10.0, backsize=0.0, verbose=0): """Open fishgrid data files (https://github.com/bendalab/fishgrid) for reading. Parameters ---------- file_path: str Path to a fishgrid data directory, or a file therein. buffersize: float Size of internal buffer in seconds. backsize: float Part of the buffer to be loaded before the requested start index in seconds. verbose: int If > 0 show detailed error/warning messages. 
""" self.verbose = verbose if self.sf is not None: self._close_fishgrid() trace_file_paths = fishgrid_trace_files(file_path) self.filepath = None if len(trace_file_paths) > 0: self.filepath = os.path.dirname(trace_file_paths[0]) self._load_metadata = metadata_fishgrid self._load_markers = markers_fishgrid # open grid files: grids = fishgrid_grids(self.metadata()) grid_sizes = [r*c for r,c in grids] self.channels = 0 for g, path in enumerate(trace_file_paths): self.channels += grid_sizes[g] self.sf = [] self.grid_channels = [] self.grid_offs = [] offs = 0 self.frames = None self.rate = get_number(self.metadata(), 'Hz', 'AISampleRate') v, self.unit = get_number_unit(self.metadata(), 'AIMaxVolt') if v is not None: self.ampl_min = -v self.ampl_max = +v for g, path in enumerate(trace_file_paths): sf = open(path, 'rb') self.sf.append(sf) if verbose > 0: print(f'open_fishgrid(file_path) with file_path={path}') # grid channels: self.grid_channels.append(grid_sizes[g]) self.grid_offs.append(offs) offs += grid_sizes[g] # file size: sf.seek(0, os.SEEK_END) frames = sf.tell()//4//grid_sizes[g] if self.frames is None: self.frames = frames elif self.frames != frames: diff = self.frames - frames if diff > 1 or diff < -2: raise ValueError('number of frames of traces differ') elif diff >= 0: self.frames = frames sf.seek(0) self.shape = (self.frames, self.channels) self.size = self.frames * self.channels self.ndim = len(self.shape) self.format = 'FISHGRID' self.encoding = 'FLOAT' self.bufferframes = int(buffersize*self.rate) self.backframes = int(backsize*self.rate) self.init_buffer() self.offset = 0 self.close = self._close_fishgrid self.load_audio_buffer = self._load_buffer_fishgrid return self def _close_fishgrid(self): """Close the fishgrid data files. """ if self.sf is not None: for file in self.sf: file.close() self.sf = None def _load_buffer_fishgrid(self, r_offset, r_size, buffer): """Load new data from relacs data file. 
Parameters ---------- r_offset: int First frame to be read from file. r_size: int Number of frames to be read from file. buffer: ndarray Buffer where to store the loaded data. """ for file, gchannels, goffset in zip(self.sf, self.grid_channels, self.grid_offs): file.seek(r_offset*4*gchannels) data = file.read(r_size*4*gchannels) buffer[:, goffset:goffset+gchannels] = np.frombuffer(data, dtype=np.float32).reshape((-1, gchannels)) # container interface: def open_container(self, file_path, buffersize=10.0, backsize=0.0, verbose=0, datakey=None, samplekey=['rate', 'Fs', 'fs'], timekey=['time'], amplkey=['amax'], unitkey='unit', metadatakey=['metadata', 'info'], poskey=['positions'], spanskey=['spans'], labelskey=['labels'], descrkey=['descriptions'], amax=1.0, unit='a.u.'): """Open generic container file. Supported file formats are: - python pickle files (.pkl) - numpy files (.npz) - matlab files (.mat) Parameters ---------- file_path: str Path to a container file. buffersize: float Size of internal buffer in seconds. backsize: float Part of the buffer to be loaded before the requested start index in seconds. verbose: int If > 0 show detailed error/warning messages. datakey: None, str, or list of str Name of the variable holding the data. If `None` take the variable that is an 2D array and has the largest number of elements. samplekey: str or list of str Name of the variable holding the sampling rate. timekey: str or list of str Name of the variable holding sampling times. If no sampling rate is available, the sampling rate is retrieved from the sampling times. amplkey: str or list of str Name of the variable holding the amplitude range of the data. unitkey: str Name of the variable holding the unit of the data. metadatakey: str or list of str Name of the variable holding the metadata. poskey: str or list of str Name of the variable holding positions of markers. spanskey: str or list of str Name of the variable holding spans of markers. 
        labelskey: str or list of str
            Name of the variable holding labels of markers.
        descrkey: str or list of str
            Name of the variable holding descriptions of markers.
        amax: None or float
            If specified and no amplitude range has been found in the data
            container, then this is the amplitude range of the data.
        unit: None or str
            If specified and no unit has been found in the data container,
            then return this as the unit of the data.

        Raises
        ------
        ValueError:
            Invalid key requested.
        """
        self.verbose = verbose
        data_dict = {}
        ext = os.path.splitext(file_path)[1]
        if ext == '.pkl':
            import pickle
            with open(file_path, 'rb') as f:
                data_dict = pickle.load(f)
            self.format = 'PKL'
        elif ext == '.npz':
            data_dict = np.load(file_path)
            self.format = 'NPZ'
        elif ext == '.mat':
            from scipy.io import loadmat
            data_dict = loadmat(file_path, squeeze_me=True)
            self.format = 'MAT'
        self.buffer, self.rate, self.unit, amax = \
            extract_container_data(data_dict, datakey, samplekey,
                                   timekey, amplkey, unitkey, amax, unit)
        self.filepath = file_path
        self.channels = self.buffer.shape[1]
        self.frames = self.buffer.shape[0]
        self.shape = self.buffer.shape
        self.ndim = self.buffer.ndim
        self.size = self.buffer.size
        self.encoding = self.numpy_encodings[self.buffer.dtype]
        self.ampl_min = -amax
        self.ampl_max = +amax
        self.offset = 0
        self.buffer_changed = np.zeros(self.channels, dtype=bool)
        self.bufferframes = self.frames
        self.backsize = 0
        self.close = self._close_container
        self.load_audio_buffer = self._load_buffer_container
        self._metadata = extract_container_metadata(data_dict, metadatakey)
        self._load_metadata = None
        self._locs, self._labels = extract_container_markers(data_dict, poskey, spanskey, labelskey, descrkey)
        self._load_markers = None

    def _close_container(self):
        """Close container.
        """
        pass

    def _load_buffer_container(self, r_offset, r_size, buffer):
        """Load new data from container."""
        buffer[:, :] = self.buffer[r_offset:r_offset + r_size, :]

    # raw data interface:
    def open_raw(self, file_path, buffersize=10.0, backsize=0.0,
                 verbose=0, rate=44000, channels=1, dtype=np.float32,
                 amax=1.0, unit='a.u.'):
        """Load data from a raw file.

        Raw files just contain the data and absolutely no metadata,
        not even the sampling rate, number of channels, etc.
        Supported file formats are:

        - raw files (*.raw)
        - LabView scandata (*.scandat)

        Parameters
        ----------
        file_path: str
            Path of the file to load.
        buffersize: float
            Size of internal buffer in seconds.
        backsize: float
            Part of the buffer to be loaded before the requested start index in seconds.
        verbose: int
            If > 0 show detailed error/warning messages.
        rate: float
            Sampling rate of the data in Hertz.
        channels: int
            Number of channels multiplexed in the data.
        dtype: str or numpy.dtype
            The data type stored in the file.
        amax: float
            The amplitude range of the data.
        unit: str
            The unit of the data.
        """
        self.verbose = verbose
        self.filepath = file_path
        self.sf = open(file_path, 'rb')
        if verbose > 0:
            print(f'open_raw(file_path) with file_path={file_path}')
        self.dtype = np.dtype(dtype)
        self.rate = float(rate)
        # file size:
        self.sf.seek(0, os.SEEK_END)
        self.frames = self.sf.tell()//self.dtype.itemsize
        self.sf.seek(0)
        self.channels = int(channels)
        self.shape = (self.frames, self.channels)
        self.ndim = len(self.shape)
        self.size = self.frames*self.channels
        self.format = 'RAW'
        self.encoding = self.numpy_encodings.get(self.dtype, 'UNKNOWN')
        self.unit = unit
        self.ampl_max = float(amax)
        self.ampl_min = -self.ampl_max
        self.offset = 0
        self.bufferframes = int(buffersize*self.rate)
        self.backframes = int(backsize*self.rate)
        self.init_buffer()
        self.close = self._close_raw
        self.load_audio_buffer = self._load_buffer_raw
        self._metadata = None
        self._load_metadata = None
        self._locs = None
        self._labels = None
        self._load_markers = None

    def _close_raw(self):
        """Close raw file.
        """
        self.sf.close()
        self.sf = None

    def _load_buffer_raw(self, r_offset, r_size, buffer):
        """Load new data from raw file."""
        self.sf.seek(r_offset*self.dtype.itemsize)
        raw_data = self.sf.read(r_size*self.dtype.itemsize)
        raw_data = np.frombuffer(raw_data, dtype=self.dtype)
        raw_data = raw_data.reshape(-1, self.channels)
        # recode:
        if self.dtype == np.dtype('int16'):
            data = raw_data.astype('float32')
            data *= self.ampl_max/2**15
        elif self.dtype == np.dtype('int32'):
            data = raw_data.astype(float)
            data *= self.ampl_max/2**31
        elif self.dtype == np.dtype('int64'):
            data = raw_data.astype(float)
            data *= self.ampl_max/2**63
        else:
            data = raw_data
        buffer[:, :] = data

    # audioio interface:
    def open_audioio(self, file_path, buffersize=10.0, backsize=0.0,
                     verbose=0, gainkey=default_gain_keys, sep='.',
                     amax=None, unit='a.u.'):
        """Open an audio file.

        See the [audioio](https://github.com/bendalab/audioio) package
        for details.

        Parameters
        ----------
        file_path: str
            Path to an audio file.
        buffersize: float
            Size of internal buffer in seconds.
        backsize: float
            Part of the buffer to be loaded before the requested start index in seconds.
        verbose: int
            If > 0 show detailed error/warning messages.
        gainkey: str or list of str
            Key in the file's metadata that holds some gain information.
            If found, the data will be multiplied with the gain,
            and if available, the corresponding unit is returned.
            See the [audioio.get_gain()](https://bendalab.github.io/audioio/api/audiometadata.html#audioio.audiometadata.get_gain) function for details.
        sep: str
            String that separates section names in `gainkey`.
        amax: None or float
            If specified and no gain has been found in the metadata,
            then use this as the amplitude range.
        unit: None or str
            If specified and no gain has been found in the metadata,
            then this is the unit of the data.
        """
        self.verbose = verbose
        super(DataLoader, self).open(file_path, buffersize, backsize, verbose)
        md = self.metadata()
        fac, unit = get_gain(md, gainkey, sep, amax, unit)
        if fac is None:
            self.gain_fac = 1.0
        else:
            self.gain_fac = fac
            self._load_buffer_audio_org = self.load_audio_buffer
            self.load_audio_buffer = self._load_buffer_audioio
        self.ampl_min *= self.gain_fac
        self.ampl_max *= self.gain_fac
        self.unit = unit
        return self

    def _load_buffer_audioio(self, r_offset, r_size, buffer):
        """Load and scale new data from an audio file.

        Parameters
        ----------
        r_offset: int
           First frame to be read from file.
        r_size: int
           Number of frames to be read from file.
        buffer: ndarray
           Buffer where to store the loaded data.
        """
        self._load_buffer_audio_org(r_offset, r_size, buffer)
        buffer *= self.gain_fac

    def open(self, file_path, buffersize=10.0, backsize=0.0,
             verbose=0, **kwargs):
        """Open file with time-series data for reading.

        Parameters
        ----------
        file_path: str or list of str
            Path to a data file or directory.
        buffersize: float
            Size of internal buffer in seconds.
        backsize: float
            Part of the buffer to be loaded before the requested start index in seconds.
        verbose: int
            If > 0 show detailed error/warning messages.
        **kwargs: dict
            Further keyword arguments that are passed on to the
            format specific opening functions.
            For example:
            - `amax`: the amplitude range of the data.
            - `unit`: the unit of the data.

        Raises
        ------
        ValueError:
            `file_path` is empty string.
        """
        # list of implemented open functions:
        data_open_funcs = (
            ('relacs', check_relacs, self.open_relacs, 1),
            ('fishgrid', check_fishgrid, self.open_fishgrid, 1),
            ('container', check_container, self.open_container, 1),
            ('raw', check_raw, self.open_raw, 1),
            ('audioio', None, self.open_audioio, 0),
            )
        if len(file_path) == 0:
            raise ValueError('input argument file_path is empty string.')
        # open data:
        for name, check_file, open_file, v in data_open_funcs:
            if check_file is None or check_file(file_path):
                open_file(file_path, buffersize, backsize, verbose, **kwargs)
                if v*verbose > 1:
                    if self.format is not None:
                        print(f' format : {self.format}')
                    if self.encoding is not None:
                        print(f' encoding : {self.encoding}')
                    print(f' sampling rate: {self.rate} Hz')
                    print(f' channels : {self.channels}')
                    print(f' frames : {self.frames}')
                    print(f' range : {self.ampl_max:g}{self.unit}')
                break
        return self
Ancestors
- audioio.audioloader.AudioLoader
- audioio.bufferedarray.BufferedArray
Methods
def open_relacs(self, file_path, buffersize=10.0, backsize=0.0, verbose=0, amax=1.0)
-
Open relacs data files (www.relacs.net) for reading.
Parameters
file_path
:str
- Path to a relacs data directory or a file therein.
buffersize
:float
- Size of internal buffer in seconds.
backsize
:float
- Part of the buffer to be loaded before the requested start index in seconds.
verbose
:int
- If > 0 show detailed error/warning messages.
amax
:float
- The amplitude range of the data.
Raises
ValueError: .gz files not supported.
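The buffersize and backsize parameters recur in every open function. As a rough illustration of what they mean in frames, the following sketch converts both to frame counts with int(buffersize*rate) and int(backsize*rate), as open_raw() does in the source, and computes which frames a buffer would cover for a requested index. The helper buffer_window() and its clamping policy are assumptions made for this example; they are not the actual audioio buffering logic.

```python
def buffer_window(index, rate, frames, buffersize=10.0, backsize=0.0):
    """Return (offset, size) of a hypothetical buffer that covers `index`.

    This is an illustrative helper, not part of thunderlab or audioio.
    """
    bufferframes = int(buffersize*rate)   # buffer length in frames
    backframes = int(backsize*rate)       # frames kept before the index
    # start `backframes` before the requested index,
    # but never before the beginning of the file:
    offset = max(0, index - backframes)
    # never read beyond the end of the file:
    size = min(bufferframes, frames - offset)
    return offset, size


# 60 s of data at 1 kHz, 10 s buffer with 2 s loaded before the index:
print(buffer_window(5000, rate=1000.0, frames=60000,
                    buffersize=10.0, backsize=2.0))
```

With a non-zero backsize, stepping slightly backwards from the requested index would still hit the buffer instead of forcing a reload.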
def open_fishgrid(self, file_path, buffersize=10.0, backsize=0.0, verbose=0)
-
Open fishgrid data files (https://github.com/bendalab/fishgrid) for reading.
Parameters
file_path
:str
- Path to a fishgrid data directory, or a file therein.
buffersize
:float
- Size of internal buffer in seconds.
backsize
:float
- Part of the buffer to be loaded before the requested start index in seconds.
verbose
:int
- If > 0 show detailed error/warning messages.
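The source listing above fills the buffer from several grid files, each of which contributes a block of adjacent columns. The sketch below replays that loop on in-memory files so it runs without any fishgrid data. The two grids, their channel counts and the sample values are made up for illustration; only the seek/read/reshape pattern is taken from the listing.

```python
import io
import numpy as np

# two hypothetical grids with 2 and 3 channels, 4 frames each:
grid_channels = [2, 3]
grid_offs = [0, 2]            # first buffer column of each grid
frames = 4
files = []
for g, nchan in enumerate(grid_channels):
    samples = (100*g + np.arange(frames*nchan)).astype(np.float32)
    files.append(io.BytesIO(samples.tobytes()))

r_offset, r_size = 1, 2       # read frames 1 and 2
buffer = np.zeros((r_size, sum(grid_channels)), dtype=np.float32)
for file, gchannels, goffset in zip(files, grid_channels, grid_offs):
    # each frame holds gchannels float32 samples of 4 bytes:
    file.seek(r_offset*4*gchannels)
    raw = file.read(r_size*4*gchannels)
    buffer[:, goffset:goffset+gchannels] = \
        np.frombuffer(raw, dtype=np.float32).reshape((-1, gchannels))
print(buffer)
```

Columns 0 and 1 then hold the first grid and columns 2 to 4 the second one, so all grids appear as one array with a common frame index.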
def open_container(self,
file_path,
buffersize=10.0,
backsize=0.0,
verbose=0,
datakey=None,
samplekey=['rate', 'Fs', 'fs'],
timekey=['time'],
amplkey=['amax'],
unitkey='unit',
metadatakey=['metadata', 'info'],
poskey=['positions'],
spanskey=['spans'],
labelskey=['labels'],
descrkey=['descriptions'],
amax=1.0,
unit='a.u.')-
Open generic container file.
Supported file formats are:
- python pickle files (.pkl)
- numpy files (.npz)
- matlab files (.mat)
Parameters
file_path
:str
- Path to a container file.
buffersize
:float
- Size of internal buffer in seconds.
backsize
:float
- Part of the buffer to be loaded before the requested start index in seconds.
verbose
:int
- If > 0 show detailed error/warning messages.
datakey
:None, str, or list of str
- Name of the variable holding the data. If None, take the variable that is a 2D array and has the largest number of elements.
samplekey
:str or list of str
- Name of the variable holding the sampling rate.
timekey
:str or list of str
- Name of the variable holding sampling times. If no sampling rate is available, the sampling rate is retrieved from the sampling times.
amplkey
:str or list of str
- Name of the variable holding the amplitude range of the data.
unitkey
:str
- Name of the variable holding the unit of the data.
metadatakey
:str or list of str
- Name of the variable holding the metadata.
poskey
:str or list of str
- Name of the variable holding positions of markers.
spanskey
:str or list of str
- Name of the variable holding spans of markers.
labelskey
:str or list of str
- Name of the variable holding labels of markers.
descrkey
:str or list of str
- Name of the variable holding descriptions of markers.
amax
:None or float
- If specified and no amplitude range has been found in the data container, then this is the amplitude range of the data.
unit
:None or str
- If specified and no unit has been found in the data container, then return this as the unit of the data.
Raises
ValueError
Invalid key requested.
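To make the datakey=None and timekey behaviour more concrete, here is a minimal sketch that writes a small .npz container and then selects the largest 2D array and a sampling rate from it. The helper pick_data_and_rate() is hypothetical and only mimics the documented rules; the real extract_container_data() may differ in its details, in particular in how it derives the rate from the sampling times (here: the inverse of the mean time step, assuming even sampling).

```python
import os
import tempfile
import numpy as np


def pick_data_and_rate(data_dict, samplekey=('rate', 'Fs', 'fs'),
                       timekey=('time',)):
    """Hypothetical helper: pick largest 2D array and a sampling rate."""
    # data: the 2D array with the largest number of elements
    datakey = max((k for k, v in data_dict.items() if np.ndim(v) == 2),
                  key=lambda k: np.size(data_dict[k]))
    rate = None
    for k in samplekey:
        if k in data_dict:
            rate = float(data_dict[k])
            break
    if rate is None:
        for k in timekey:
            if k in data_dict:
                # assumption: evenly spaced sampling times
                rate = 1.0/float(np.mean(np.diff(data_dict[k])))
                break
    return datakey, data_dict[datakey], rate


# container without explicit rate, but with a time vector sampled at 500 Hz:
path = os.path.join(tempfile.mkdtemp(), 'example.npz')
np.savez(path, time=np.arange(1000)/500.0,
         signal=np.zeros((1000, 2)), calib=np.ones((2, 2)))
with np.load(path) as npz:
    data_dict = {k: npz[k] for k in npz.files}
datakey, data, rate = pick_data_and_rate(data_dict)
print(datakey, data.shape, round(rate))
```

Passing an explicit datakey, as the parameter list allows, avoids relying on array sizes when a container holds several 2D arrays.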
def open_raw(self,
file_path,
buffersize=10.0,
backsize=0.0,
verbose=0,
rate=44000,
channels=1,
dtype=numpy.float32,
amax=1.0,
unit='a.u.')-
Load data from a raw file.
Raw files just contain the data and absolutely no metadata, not even the sampling rate, number of channels, etc. Supported file formats are:
- raw files (*.raw)
- LabView scandata (*.scandat)
Parameters
file_path
:str
- Path of the file to load.
buffersize
:float
- Size of internal buffer in seconds.
backsize
:float
- Part of the buffer to be loaded before the requested start index in seconds.
verbose
:int
- If > 0 show detailed error/warning messages.
rate
:float
- Sampling rate of the data in Hertz.
channels
:int
- Number of channels multiplexed in the data.
dtype
:str or numpy.dtype
- The data type stored in the file.
amax
:float
- The amplitude range of the data.
unit
:str
- The unit of the data.
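Because a raw file carries no header, everything has to come from the rate, channels, dtype and amax arguments. The following sketch, which is independent of thunderlab, writes a tiny interleaved int16 file and reads it back with plain numpy. The file name, sample values and amax are made up; the int16 scaling by amax/2**15 follows the recoding step in the source listing. The sketch treats a frame as one sample per channel, so it divides the file size by itemsize*channels.

```python
import os
import tempfile
import numpy as np

channels = 2
amax = 5.0                       # assumed amplitude range of the data
dtype = np.dtype('int16')

# write a small hypothetical raw file with interleaved channels:
samples = np.array([[0, 16384], [-32768, 8192], [32767, -16384]],
                   dtype=dtype)
path = os.path.join(tempfile.mkdtemp(), 'example.raw')
samples.tofile(path)

# number of frames from the file size (one frame = one sample per channel):
frames = os.path.getsize(path)//(dtype.itemsize*channels)

# read everything and de-interleave into (frames, channels):
raw_data = np.fromfile(path, dtype=dtype).reshape(-1, channels)

# recode int16 values to floats between -amax and +amax:
data = raw_data.astype('float32')
data *= amax/2**15
print(frames, data.shape)
print(data[:, 1])
```

Choosing the wrong dtype or channels does not raise an error here; the data just come out scrambled, which is why these arguments need to be known in advance.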
def open_audioio(self,
file_path,
buffersize=10.0,
backsize=0.0,
verbose=0,
gainkey=['AIMaxVolt', 'gain'],
sep='.',
amax=None,
unit='a.u.')-
Open an audio file.
See the audioio package for details.
Parameters
file_path
:str
- Path to an audio file.
buffersize
:float
- Size of internal buffer in seconds.
backsize
:float
- Part of the buffer to be loaded before the requested start index in seconds.
verbose
:int
- If > 0 show detailed error/warning messages.
gainkey
:str or list of str
- Key in the file's metadata that holds some gain information. If found, the data will be multiplied with the gain, and if available, the corresponding unit is returned. See the audioio.get_gain() function for details.
sep
:str
- String that separates section names in gainkey.
amax
:None or float
- If specified and no gain has been found in the metadata, then use this as the amplitude range.
unit
:None or str
- If specified and no gain has been found in the metadata, then this is the unit of the data.
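The gain handling can be pictured as wrapping the original load function with one that scales the buffer afterwards, together with a rescaled amplitude range. The class below is a hypothetical stand-in that produces fake data between -1 and 1; it does not use audioio, and the gain of 20 with unit mV is an assumed value rather than something read from metadata.

```python
import numpy as np


class ScaledLoader:
    """Hypothetical stand-in for a loader with raw values in [-1, 1]."""

    def __init__(self, gain_fac=None, unit='a.u.'):
        self.ampl_min, self.ampl_max = -1.0, 1.0
        self.load_buffer = self._load_buffer_org
        self.gain_fac = 1.0 if gain_fac is None else gain_fac
        if gain_fac is not None:
            # replace the load function by one that scales the data:
            self.load_buffer = self._load_buffer_scaled
        self.ampl_min *= self.gain_fac
        self.ampl_max *= self.gain_fac
        self.unit = unit

    def _load_buffer_org(self, r_offset, r_size, buffer):
        # fake single-channel data: a ramp from -1 to 1
        ramp = np.linspace(-1.0, 1.0, 5)
        buffer[:, 0] = ramp[r_offset:r_offset + r_size]

    def _load_buffer_scaled(self, r_offset, r_size, buffer):
        self._load_buffer_org(r_offset, r_size, buffer)
        buffer *= self.gain_fac      # scale in place


loader = ScaledLoader(gain_fac=20.0, unit='mV')
buffer = np.zeros((5, 1))
loader.load_buffer(0, 5, buffer)
print(buffer[:, 0], loader.ampl_max, loader.unit)
```

Scaling in place inside the load function keeps the rest of the buffering code unaware of the gain.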
def open(self, file_path, buffersize=10.0, backsize=0.0, verbose=0, **kwargs)
-
Open file with time-series data for reading.
Parameters
file_path
:str or list of str
- Path to a data file or directory.
buffersize
:float
- Size of internal buffer in seconds.
backsize
:float
- Part of the buffer to be loaded before the requested start index in seconds.
verbose
:int
- If > 0 show detailed error/warning messages.
**kwargs
:dict
- Further keyword arguments that are passed on to the format specific opening functions. For example:
- amax: the amplitude range of the data.
- unit: the unit of the data.
Raises
ValueError
file_path
is empty string.
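The format detection in open() boils down to a table of check and open functions that is scanned in order, with a final entry that has no check and therefore acts as a fallback. A stripped-down sketch of that pattern follows. The two check functions only look at the file extension and are simplifications invented for this example; the real check_container() and check_raw() may apply other criteria.

```python
import os


def check_container(path):
    # simplified, hypothetical check by file extension
    return os.path.splitext(path)[1] in ('.pkl', '.npz', '.mat')


def check_raw(path):
    # simplified, hypothetical check by file extension
    return os.path.splitext(path)[1] in ('.raw', '.scandat')


def open_any(file_path):
    """Return the name of the first format that accepts `file_path`."""
    open_funcs = (
        ('container', check_container),
        ('raw', check_raw),
        ('audioio', None),       # fallback: no check, always tried last
    )
    if len(file_path) == 0:
        raise ValueError('input argument file_path is empty string.')
    for name, check_file in open_funcs:
        if check_file is None or check_file(file_path):
            return name


print(open_any('data/file.npz'))
print(open_any('data/song.wav'))
```

Because the first match wins, the order of the table matters: specific formats come first and the generic audio loader last.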
- audio files via