thunderfish

Automatically detect and analyze all EOD waveforms in a short recording.

Authors

The Neuroethology-lab at the Institute of Neuroscience at the University of Tübingen:

  • Jan Benda
  • Jörg Henninger (harmonic groups)
  • Juan Sehuanes (best window)
  • Till Raab (annotation)
  • Liz Weerdmeester (pulse clustering)

Principles of operation

The sole task of thunderfish is to automatically detect and analyze all EOD waveforms in a short recording. A short recording is typically no longer than about 30 s. Recordings are made either with a fishfinder (a stick with two electrodes used to find electric fish in the field) or as standardized head-tail recordings in a small tank.

  1. A segment for further waveform analysis is identified in the recording (bestwindow module). In this segment the amplitude of the recording is largest while at the same time most stable and not clipped.
  2. A power spectrum with a given frequency resolution is computed (powerspectrum module), and potential EOD frequencies of wave-type fish are detected in this power spectrum based on their harmonic structure (harmonics module).
  3. EODs of pulse-type fish are detected and clustered according to their width, amplitude, and shape.
  4. For each pulse- and wave-type fish detected in the recording, an averaged waveform is computed and its properties are analyzed (eodanalysis module).
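As a rough illustration of the idea behind step 2, the toy sketch below scores every candidate fundamental frequency by the summed power at its first few harmonics in a synthetic recording. It is not thunderfish's harmonics module, which uses its own peak detection and harmonic grouping; the function toy_fundamental and its parameters are hypothetical.

```python
import numpy as np

def toy_fundamental(x, rate, fmin=100.0, fmax=1000.0, nharmonics=3):
    """Return the candidate fundamental whose harmonics carry the most power.

    Toy illustration only, not the algorithm of the harmonics module.
    """
    # power spectrum of the Hann-windowed signal
    power = np.abs(np.fft.rfft(x*np.hanning(len(x))))**2
    freqs = np.fft.rfftfreq(len(x), 1.0/rate)
    df = freqs[1] - freqs[0]
    best_f, best_score = None, -1.0
    for f0 in np.arange(fmin, fmax, df):
        # bins closest to f0, 2*f0, ..., nharmonics*f0
        idx = np.round(np.arange(1, nharmonics + 1)*f0/df).astype(int)
        idx = idx[idx < len(power)]
        score = power[idx].sum()
        if score > best_score:
            best_f, best_score = f0, score
    return best_f

rate = 20000.0
t = np.arange(20000)/rate   # 1 s of data, 1 Hz frequency resolution
# synthetic wave-type EOD at 600 Hz with two decaying harmonics
x = np.sin(2*np.pi*600*t) + 0.5*np.sin(2*np.pi*1200*t) + 0.2*np.sin(2*np.pi*1800*t)
f0 = toy_fundamental(x, rate)
print(f0)
```

Summing over harmonics is what keeps sub-harmonic candidates such as 300 Hz from winning: they collect power from only one of the true peaks.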

The EOD waveform properties saved by thunderfish can be combined into summary files by means of the collectfish script and then analyzed and explored with the eodexplorer.

Command line arguments

thunderfish --help

returns

usage: thunderfish.py [-h] [--version] [-v] [-V] [-c] [--channel CHANNEL] [-t TIME] [-T] [-m {w,p,wp}] [-a] [-S] [-j [JOBS]]
                      [-s] [-f {dat,ascii,csv,rtai,md,tex,html,py}] [-p] [-P rtpwse] [-M PDFFILE] [-l [MINFREQ]] [-o OUTPATH]
                      [-k] [-i KWARGS] [-b]
                      [file [file ...]]

Analyze EOD waveforms of weakly electric fish.

positional arguments:
  file                  name of a file with time series data of an EOD recording, may include wildcards

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -v                    verbosity level. Increase by specifying -v multiple times, or like -vvv
  -V                    level for debugging plots. Increase by specifying -V multiple times, or like -VVV
  -c                    save configuration to file thunderfish.cfg after reading all configuration files
  --channel CHANNEL     channel to be analyzed (defaults to first channel, negative channel selects all channels)
  -t TIME               start time of analysis window in recording: "beginning", "center", "end", "best", or time in seconds
                        (overwrites "windowPosition" in configuration file)
  -T                    add start time of analysis file to output file names
  -m {w,p,wp}           extract wave "w" and/or pulse "p" fish EODs
  -a                    plot all EOD waveforms
  -S                    plot spectra for all EOD waveforms
  -j [JOBS]             number of jobs run in parallel. Without argument use all CPU cores.
  -s                    save analysis results to files
  -f {dat,ascii,csv,rtai,md,tex,html,py}
                        file format used for saving analysis results, defaults to the format specified in the configuration file
                        or "csv"
  -p                    save output plot of each recording as pdf file
  -P rtpwse             save subplots as separate pdf files: r) recording with best window, t) data trace with detected pulse
                        fish, p) power spectrum with detected wave fish, w/W) mean EOD waveform, s/S) EOD spectrum, e/E) EOD
                        waveform and spectra. Capital letters produce a single multipage pdf containing plots of all detected
                        fish
  -M PDFFILE            save all plots of all recordings in a multi-page pdf file. Disables parallel jobs.
  -l [MINFREQ]          logarithmic frequency axis in power spectrum with optional minimum frequency (defaults to 100 Hz)
  -o OUTPATH            path where to store results and figures (defaults to current working directory)
  -k                    keep path of input file when saving analysis files, i.e. append path of input file to OUTPATH
  -i KWARGS             key-word arguments for the data loader function
  -b                    show the cost function of the best window algorithm

version 1.9.9 by Benda-Lab (2015-2022)

examples:
- analyze the single file data.wav interactively:
  > thunderfish data.wav
- extract wavefish only:
  > thunderfish -m w data.wav
- automatically analyze all wav files in the current working directory and save analysis results and plot to files:
  > thunderfish -s -p *.wav
- analyze all wav files in the river1/ directory, use all CPUs, and write files directly to "results/":
  > thunderfish -j -s -p -o results/ river1/*.wav
- analyze all wav files in the river1/ directory and write files to "results/river1/":
  > thunderfish -s -p -o results/ -k river1/*.wav
- write configuration file:
  > thunderfish -c

Configuration file

Many parameters of the algorithms used by thunderfish can be set via a configuration file.

Generate the configuration file by executing

thunderfish -c

This first reads in all configuration files found (see below) and then writes the file thunderfish.cfg into the current working directory.

Whenever you run thunderfish it searches for configuration files in

  1. the current working directory
  2. the directory of each input file
  3. the parent directories of each input file, up to three levels up.
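The search order listed above can be sketched with pathlib. config_search_dirs is a hypothetical helper, not part of thunderfish, and counting the three parent levels from the directory of the input file is an assumption.

```python
import os
from pathlib import Path

def config_search_dirs(input_files, levels_up=3):
    """Directories that may hold a thunderfish.cfg, in search order (sketch)."""
    dirs = [Path.cwd()]                            # 1. current working directory
    for f in input_files:
        d = Path(os.path.abspath(f)).parent        # 2. directory of the input file
        dirs.append(d)
        dirs.extend(list(d.parents)[:levels_up])   # 3. up to three parent directories
    unique = []                                    # drop duplicates, keep the order
    for d in dirs:
        if d not in unique:
            unique.append(d)
    return unique

# configuration files that would actually be found for an input file:
found = [d/'thunderfish.cfg' for d in config_search_dirs(['data.wav'])
         if (d/'thunderfish.cfg').is_file()]
print(found)
```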

Best practice is to move the configuration file to the root of the file tree where the data files of a recording session are stored.

Use the -v switch to see which configuration files are loaded:

thunderfish -v data.wav

Open the configuration file in your favourite editor and edit the settings. Each parameter is briefly explained in the comment preceding the parameter.

Important configuration parameters

The list of configuration parameters is overwhelming, and you do not need to touch most of them at all. Here is a list of the few that matter (in the order in which they appear in the configuration file):

  • frequencyResolution: sets the nfft parameter for computing the power spectrum such that the requested frequency resolution is achieved. The longer your analysis window, the smaller you can set the resolution (but not smaller than the inverse of the duration of the analysis window).

  • numberPSDWindows: If larger than one then only fish that are present in all windows are reported. If you have very stationary data (from a restrained fish, not from a fishfinder) you may set this to one.

  • lowThresholdFactor, highThresholdFactor: play around with these numbers if not all wavefish are detected or if too many peaks are detected in the power spectrum.

  • mainsFreq: Set it to the frequency of your mains power supply (50 or 60 Hz) or to zero if you have hum-free recordings.

  • maxRelativePower: Usually, the higher a harmonic, the less power it has. To discard signals whose harmonics do not decay in power, set this to -10 or -20 dB.

  • maxGroups: Set to 1 if you know that only a single fish is in your recording.

  • minDataAmplitude, maxDataAmplitude: If the voltage range of your recording device differs from -1 to 1 (the default for WAV files), set these two parameters to its limits, so that clipped recordings are detected as such.

  • windowSize: How much of the data should be used for analysis. If you have stationary data (from a restrained fish, not from a fishfinder) you may want to use the full recording by setting this to zero.

  • windowPosition: Where to place the analysis window: at the "beginning", "center", or "end" of the recording. If set to "best" (default), thunderfish searches for the most stationary data segment of the requested length. It can be overridden from the command line with the -t argument.

  • pulseWidthPercentile: If low frequency pulse fish are missed then reduce this number.

  • eodMaxEODs: The average waveform is estimated by averaging over at most this number of EODs. If wavefish change their frequency, you do not want to set this number too high (10 to 100 EODs are enough to reduce noise). If you have several fish in your recording, this number needs to be high (1000) to average away the other fish. Set it to zero to use all EODs in the data segment selected for analysis.

  • flipWaveEOD, flipPulseEOD: In fishfinder recordings you do not know the orientation of the fish relative to your electrodes, that is, you do not know the polarity of your recording. Setting these to auto flips the sign of the averaged EOD waveform to a standardized polarity (wave-type fish: the larger extremum relative to the mean is positive; pulse-type fish: the first of the two largest peaks is positive).

  • fileFormat: sets the default file format to be used for storing the analysis results.
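Three of the parameters above can be illustrated with small sketches. They are not thunderfish code: the helper names are hypothetical, rounding nfft up to a power of two is an assumed choice, the clipping check simply counts samples at or beyond the amplitude limits, and only the wave-type flip rule is implemented.

```python
import math
import numpy as np

# frequencyResolution: FFT size needed for a requested resolution.
# Assumption: nfft is rounded up to a power of two; thunderfish may differ.
def nfft_for_resolution(rate, freq_resolution):
    n = math.ceil(rate/freq_resolution)
    return 2**math.ceil(math.log2(n))

# minDataAmplitude, maxDataAmplitude: percentage of samples at the limits.
def clipped_percent(data, min_ampl=-1.0, max_ampl=1.0):
    data = np.asarray(data)
    clipped = (data <= min_ampl) | (data >= max_ampl)
    return 100.0*np.count_nonzero(clipped)/len(data)

# flipWaveEOD = auto: the larger deviation from the mean becomes positive.
def auto_flip_wave(eod):
    eod = np.asarray(eod, dtype=float)
    dev = eod - np.mean(eod)
    flipped = bool(abs(np.min(dev)) > abs(np.max(dev)))
    return (-eod if flipped else eod), flipped

rate = 44100.0                                  # sampling rate in Hz
nfft = nfft_for_resolution(rate, 0.5)
print(nfft, rate/nfft)                          # nfft = 131072, resolution about 0.34 Hz
x = np.array([0.2, 1.0, -0.5, -1.0, 0.9, 0.3, 1.0, 0.0])
print(clipped_percent(x))                       # 3 of 8 samples at the limits: 37.5
print(clipped_percent(5.0*x, -5.0, 5.0))        # same data on a +-5 V device: 37.5
eod, flipped = auto_flip_wave([0.0, -2.0, 0.0, 1.0, 0.5])
print(flipped, eod.max())                       # True 2.0
```

With an 8 s analysis window at 44.1 kHz the window holds 352800 samples, so an nfft of 131072 fits; asking for a resolution finer than 1/8 s = 0.125 Hz would require more samples than the window contains.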

Summary plot

In the plot you can press

  • q: Close the plot and show the next one or quit.
  • p: Play the analyzed section of the recording on the default audio device.
  • o: Switch on zoom mode. You can draw a rectangle with the mouse to zoom in.
  • Backspace: Zoom back.
  • f: Toggle full screen mode.

Output files

With the -s switch analysis results are saved to files and no interactive output is generated.

Output files are placed in the current working directory if no path is specified via the -o switch. If the path specified via -o does not exist it is created.

With the -k switch the paths of the input files are appended to the output path. This allows you to analyse recordings organized in a nested directory structure in one step and to write the output files into the same structure. For example:

thunderfish -s -k -o analysis river1/habitatA/*.wav river1/habitatB/*.wav river2/*.wav

will store the files in

analysis/river1/habitatA/
analysis/river1/habitatB/
analysis/river2/

whereas without the -k switch all files are stored in

analysis/
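The effect of -o and -k on the output directory can be sketched with pathlib. output_dir is a hypothetical helper that assumes relative input paths as in the example above; an absolute input path would replace OUTPATH entirely when joined this way.

```python
from pathlib import Path

def output_dir(input_file, outpath='.', keep_path=False):
    """Directory for the analysis files of input_file (sketch of -o and -k)."""
    out = Path(outpath)
    if keep_path:
        out = out/Path(input_file).parent   # -k: append the path of the input file
    return out

for f in ['river1/habitatA/a.wav', 'river1/habitatB/b.wav', 'river2/c.wav']:
    d = output_dir(f, 'analysis', keep_path=True)
    print(d)
    # thunderfish creates a missing output path; the equivalent would be
    # d.mkdir(parents=True, exist_ok=True)
```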

To make use of all the cores of your CPU apply the -j switch.

The following files are generated:

  • RECORDING-CHANNEL-TIME-eodwaveform-N.EXT: averaged EOD waveform
  • RECORDING-CHANNEL-TIME-waveeodfs.EXT: list of all detected EOD frequencies and powers of wave-type fish
  • RECORDING-CHANNEL-TIME-wavefish.EXT: list of properties of good EODs of wave-type fish
  • RECORDING-CHANNEL-TIME-wavespectrum-N.EXT: for each wave-type fish the Fourier spectrum
  • RECORDING-CHANNEL-TIME-pulsefish.EXT: list of properties of good EODs of pulse-type fish
  • RECORDING-CHANNEL-TIME-pulsepeaks-N.EXT: for each pulse-type fish properties of peaks and troughs
  • RECORDING-CHANNEL-TIME-pulsespectrum-N.EXT: for each pulse-type fish the power spectrum of a single pulse

Filenames are composed of the basename of the input file (RECORDING). If the input file contains more than a single channel, a channel specification is appended (CHANNEL): a 'c' followed by the channel number. If the start time of the analysis window was requested in the file name (-T option), this start time is appended as well (TIME): a 't' followed by the start time floored to integer seconds and an 's'. Fish detected in the recording are numbered, starting with 0 (N). The file extension depends on the chosen file format (EXT). The following sections describe the content of the generated files.
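The naming scheme can be sketched as a small function. output_name is hypothetical; using the basename without its extension and joining the parts with hyphens are assumptions read off the RECORDING-CHANNEL-TIME-...-N.EXT pattern.

```python
import math
from pathlib import Path

def output_name(recording, kind, ext, n=None, channel=None, nchannels=1, tstart=None):
    """Compose an output file name following the naming scheme (sketch).

    kind: e.g. 'eodwaveform', 'wavefish', 'pulsepeaks'.
    n: fish index for per-fish files, None for summary files.
    tstart: start time in seconds, only used with the -T option.
    """
    parts = [Path(recording).stem]                # RECORDING without extension
    if nchannels > 1 and channel is not None:
        parts.append(f'c{channel}')               # CHANNEL
    if tstart is not None:
        parts.append(f't{math.floor(tstart)}s')   # TIME, floored to seconds
    parts.append(kind)
    if n is not None:
        parts.append(str(n))                      # N
    return '-'.join(parts) + '.' + ext            # EXT

print(output_name('data/river1/rec.wav', 'eodwaveform', 'csv',
                  n=0, channel=2, nchannels=4, tstart=4.25))
```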

RECORDING-CHANNEL-TIME-eodwaveform-N.EXT

For each fish the average waveform with standard deviation and fit.

time mean std fit
ms a.u. a.u. a.u.
-1.746 -0.34837 0.01194 -0.34562
-1.723 -0.30700 0.01199 -0.30411
-1.701 -0.26664 0.01146 -0.26383
-1.678 -0.22713 0.01153 -0.22426
-1.655 -0.18706 0.01187 -0.18428

The columns contain:

  1. time Time in milliseconds.
  2. mean Averaged waveform in the unit of the input data.
  3. std Corresponding standard deviation.
  4. fit A fit to the averaged waveform. In case of a wave fish this is a Fourier series, for pulse fish it is an exponential fit to the tail of the last peak.
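For pulse fish the fit column holds an exponential fit to the tail of the last peak, whose time constant is reported as tau in the pulsefish file. A simple way to estimate such a time constant is a straight-line fit to the logarithm of the tail; this log-linear fit is a stand-in chosen for brevity and may differ from the nonlinear fit thunderfish uses. fit_tail_tau is a hypothetical helper.

```python
import numpy as np

def fit_tail_tau(t, y):
    """Fit y = a*exp(-t/tau) via a line through log|y|; returns (tau, a)."""
    t = np.asarray(t, dtype=float)
    y = np.abs(np.asarray(y, dtype=float))   # the last peak may be a trough
    slope, intercept = np.polyfit(t, np.log(y), 1)
    return -1.0/slope, np.exp(intercept)

t = np.linspace(0.0, 0.5, 50)      # time after the last peak in ms
y = 0.3*np.exp(-t/0.087)           # synthetic tail with tau = 0.087 ms
tau, a = fit_tail_tau(t, y)
print(tau, a)
```

On noisy data the log transform weights small tail values heavily, which is one reason a nonlinear least-squares fit is often preferred.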

RECORDING-CHANNEL-TIME-waveeodfs.EXT

List of all detected EOD frequencies and powers of wave-type fish. This list may contain more fish than RECORDING-CHANNEL-TIME-wavefish.EXT.

index EODf datapower
- Hz dB
1 111.33 -33.35
2 132.81 -37.86
0 580.08 -22.01
3 608.89 -45.45

The columns contain:

  1. index Index of the fish (the number that is also used to number the files).
  2. EODf EOD frequency in Hertz.
  3. datapower Power of this EOD in decibel (sum over all peaks in the power spectrum of the recording).
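The datapower column is described as a sum over peaks of the power spectrum expressed in decibel. Assuming a plain 10*log10 conversion relative to a reference power of 1 in squared data units (an assumption, the reference is not stated here), it can be computed and inverted like this:

```python
import numpy as np

def power_db(peak_powers):
    """Linear sum of spectral peak powers, converted to decibel (sketch)."""
    return 10.0*np.log10(np.sum(peak_powers))

print(power_db([1e-3, 5e-4]))   # about -28.24 dB
print(10.0**(-22.01/10.0))      # back to linear power for fish 0 in the example
```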

RECORDING-CHANNEL-TIME-wavefish.EXT

Fundamental EOD frequency and other properties of each wave-type fish detected in the recording.

recording waveform timing
tstart twindow index EODf p-p-amplitude power datapower thd dbdiff maxdb noise rmserror clipped flipped n ncrossings peakwidth troughwidth leftpeak rightpeak lefttrough righttrough p-p-distance reltroughampl
s s - Hz a.u. dB dB % dB dB % % % - - - % % % % % % % %
4.25 8.00 0 580.08 0.22755 -21.28 -22.01 149.81 2.93 -9.22 0.3 0.36 0.0 0 3300 1 76.15 23.85 69.12 7.03 11.10 12.75 18.13 312.11
4.25 8.00 1 111.33 0.00713 -50.80 -34.09 67.60 7.18 -29.48 34.7 2.94 0.0 0 888 2 44.00 56.00 19.50 24.51 16.55 39.45 41.05 73.54
4.25 8.00 2 132.81 0.01029 -46.47 -37.87 46.49 8.40 -32.48 22.0 1.55 0.0 0 1059 2 49.11 50.89 25.30 23.82 29.72 21.16 53.54 103.40
4.25 8.00 3 608.89 0.00258 -59.51 -45.45 100.84 15.01 -22.24 40.9 1.37 0.0 0 4868 2 36.29 63.71 22.14 14.15 42.93 20.78 57.08 91.60
4.25 8.00 4 1979.49 0.00177 -61.18 -61.72 58.05 13.10 -24.18 33.0 2.08 0.0 0 15833 2 53.94 46.06 22.94 31.00 24.67 21.38 55.67 131.60

The columns contain:

  1. tstart Start time of the analysis window in the recording in seconds.
  2. twindow Duration of the analysis window in seconds.
  3. index Index of the fish (the number that is also used to number the files).
  4. EODf EOD frequency in Hertz.
  5. p-p-amplitude Peak-to-peak amplitude of the extracted waveform in the units of the input data.
  6. power Power of the extracted EOD waveform, i.e. sum of the squared Fourier amplitudes, in decibel.
  7. datapower Power of the EOD waveform from the spectrum of the original data in decibel.
  8. thd Total harmonic distortion, i.e. the square root of the sum of the squared amplitudes of the harmonics relative to the amplitude of the fundamental, in percent.
  9. dbdiff Smoothness of power spectrum as standard deviation of differences in decibel power.
  10. maxdb Maximum power of higher harmonics relative to peak power in decibel.
  11. noise Root-mean-squared standard error of the averaged EOD waveform relative to the peak-to-peak amplitude in percent.
  12. rmserror Root-mean-squared difference between the averaged EOD waveform and the fit of the Fourier series relative to the peak-to-peak amplitude in percent.
  13. clipped Percentage of recording that is clipped.
  14. flipped Whether the waveform was flipped.
  15. n Number of EODs used for computing the averaged EOD waveform.
  16. ncrossings Number of zero crossings per EOD period.
  17. peakwidth Width of the peak at the averaged amplitude relative to EOD period.
  18. troughwidth Width of the trough at the averaged amplitude relative to EOD period.
  19. leftpeak Time from positive zero crossing to peak relative to EOD period.
  20. rightpeak Time from peak to negative zero crossing relative to EOD period.
  21. lefttrough Time from negative zero crossing to trough relative to EOD period.
  22. righttrough Time from trough to positive zero crossing relative to EOD period.
  23. p-p-distance Time between peak and trough relative to EOD period.
  24. reltroughampl Amplitude of trough relative to peak amplitude.
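The thd, dbdiff, and maxdb columns can be read as simple functions of the Fourier amplitudes of the harmonics. The formulas below are an interpretation of the column descriptions, not thunderfish's code: in particular, maxdb is taken relative to the fundamental, and dbdiff uses the dB powers of successive harmonics. The example uses the first five amplitudes of the wavespectrum example, so the results are not comparable to the rows of the wavefish table.

```python
import numpy as np

def harmonic_measures(ampls):
    """thd, dbdiff, and maxdb from Fourier amplitudes (sketch, assumed formulas).

    ampls[0] is the amplitude of the fundamental, ampls[1:] those of the harmonics.
    """
    ampls = np.asarray(ampls, dtype=float)
    # thd: root of the summed squared harmonic amplitudes relative to the fundamental, in %
    thd = 100.0*np.sqrt(np.sum(ampls[1:]**2))/ampls[0]
    # power of each harmonic in dB relative to the fundamental
    db = 20.0*np.log10(ampls/ampls[0])
    dbdiff = np.std(np.diff(db))   # smoothness: spread of the dB steps between harmonics
    maxdb = np.max(db[1:])         # strongest higher harmonic relative to the fundamental
    return thd, dbdiff, maxdb

thd, dbdiff, maxdb = harmonic_measures([0.32610, 0.22146, 0.03215, 0.03733, 0.02039])
print(f'thd={thd:.1f}% dbdiff={dbdiff:.2f}dB maxdb={maxdb:.2f}dB')
```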

RECORDING-CHANNEL-TIME-wavespectrum-N.EXT

The parameters of the Fourier series fitted to the waveform of a wave-type fish.

harmonics frequency amplitude relampl relpower phase datapower
- Hz a.u. % dB rad a.u.^2/Hz
0 728.16 0.32610 100.00 0.00 0.0000 1.0137e-01
1 1456.32 0.22146 67.91 -3.36 2.4706 4.1881e-02
2 2184.48 0.03215 9.86 -20.12 -1.9333 7.6623e-04
3 2912.63 0.03733 11.45 -18.83 -0.6807 8.6311e-04
4 3640.79 0.02039 6.25 -24.08 3.0997 2.3089e-04

The columns contain:

  1. harmonics Index of the harmonic. The first one, with index 0, is the fundamental frequency.
  2. frequency Frequency of the harmonic in Hertz.
  3. amplitude Amplitude of each harmonic obtained by fitting a Fourier series to the data, in the unit of the input data.
  4. relampl Amplitude of each harmonic relative to the amplitude of the fundamental in percent.
  5. relpower Power of each harmonic relative to the fundamental in decibel.
  6. phase Phase of each harmonic obtained by fitting a Fourier series to the data, in radians ranging from -pi to pi.
  7. datapower Power spectral density of the harmonics from the original power spectrum of the data.
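The frequency, relampl, and relpower columns follow directly from the fundamental frequency and the fitted amplitudes. The sketch below reproduces them from the example rows, assuming that harmonic index k lies at (k+1) times the fundamental and that relpower is 20*log10 of the amplitude ratio; both assumptions agree with the example values.

```python
import numpy as np

f0 = 728.16                                         # fundamental in Hz
ampls = np.array([0.32610, 0.22146, 0.03215, 0.03733, 0.02039])
harmonics = np.arange(len(ampls))                   # index 0 is the fundamental
freqs = (harmonics + 1)*f0                          # frequency of each harmonic
relampl = 100.0*ampls/ampls[0]                      # amplitude in % of the fundamental
relpower = 20.0*np.log10(ampls/ampls[0])            # power in dB relative to the fundamental
for row in zip(harmonics, freqs, relampl, relpower):
    print('%d %8.2f %7.2f %7.2f' % row)
```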

RECORDING-CHANNEL-TIME-pulsefish.EXT

Properties of each pulse-type fish detected in the recording.

recording waveform power spectrum
tstart twindow index EODf period max-ampl min-ampl p-p-amplitude noise clipped flipped tstart tend width P2-P1-dist tau firstpeak lastpeak n peakfreq peakpower poweratt5 poweratt50 lowcutoff
s s - Hz ms a.u. a.u. a.u. % % - ms ms ms ms ms - - - Hz dB dB dB Hz
4.00 8.00 0 32.22 31.03 0.26557 0.21912 0.48469 0.1 0.0 0 4000.000 1.344 1.687 0.250 0.087 1 2 235 1130.86 -81.56 -27.84 -22.84 98.14

The columns contain:

  1. tstart Start time of the analysis window in the recording in seconds.
  2. twindow Duration of the analysis window in seconds.
  3. index Index of the fish (the number that is also used to number the files).
  4. EODf EOD frequency in Hertz.
  5. period Period between two pulses (1/EODf) in milliseconds.
  6. max-ampl Amplitude of the largest peak (P1 peak) in the units of the input data.
  7. min-ampl Amplitude (absolute value) of the largest trough in the units of the input data.
  8. p-p-amplitude Peak-to-peak amplitude in the units of the input data.
  9. noise Root-mean-squared standard error of the averaged EOD waveform relative to the peak-to-peak amplitude in percent.
  10. clipped Percentage of recording that is clipped.
  11. flipped Whether the waveform was flipped.
  12. tstart Time where the pulse starts relative to P1 in milliseconds.
  13. tend Time where the pulse ends relative to P1 in milliseconds.
  14. width Total width of the pulse in milliseconds.
  15. P2-P1-dist Distance between P2 and P1 in milliseconds. Zero if P2 is not present.
  16. tau Time constant of the exponential decay of the tail of the pulse in milliseconds.
  17. firstpeak Index of the first peak in the pulse (e.g. -1 for P-1)
  18. lastpeak Index of the last peak in the pulse (e.g. 3 for P3)
  19. n Number of EODs used for computing the averaged EOD waveform.
  20. peakfreq Frequency at the peak power of the single pulse spectrum in Hertz.
  21. peakpower Peak power of the single pulse spectrum in decibel.
  22. poweratt5 How much the average power below 5 Hz is attenuated relative to the peak power in decibel.
  23. poweratt50 How much the average power below 50 Hz is attenuated relative to the peak power in decibel.
  24. lowcutoff Frequency at which the power reaches half of the peak power relative to the initial power, in Hertz.
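The period and amplitude columns are related in a simple way: period is 1/EODf, and in the example row p-p-amplitude equals max-ampl plus min-ampl, so min-ampl is reported as a positive magnitude. pulse_amplitudes is a hypothetical helper that derives the three amplitudes from an averaged pulse waveform; the waveform values are made up to match the example row.

```python
import numpy as np

def pulse_amplitudes(eod):
    """max-ampl, min-ampl, and p-p-amplitude of an averaged pulse EOD (sketch)."""
    eod = np.asarray(eod, dtype=float)
    max_ampl = np.max(eod)      # P1, the largest peak
    min_ampl = -np.min(eod)     # largest trough, as a positive magnitude (assumed)
    return max_ampl, min_ampl, max_ampl + min_ampl

eodf = 32.22                    # pulse rate in Hz from the example row
period_ms = 1000.0/eodf         # period between two pulses in ms
max_ampl, min_ampl, pp_ampl = pulse_amplitudes([0.0, 0.26557, 0.05, -0.21912, 0.0])
print(period_ms, max_ampl, min_ampl, pp_ampl)
```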

RECORDING-CHANNEL-TIME-pulsepeaks-N.EXT

Properties of peaks and troughs of a pulse-type fish's EOD.

P time amplitude relampl width
- ms a.u. % ms
1 0.000 0.78409 100.00 0.333
2 0.385 -0.85939 -109.60 0.248

The columns contain:

  1. P Name of the peak/trough. Peaks and troughs are numbered sequentially. P1 is the largest peak with positive amplitude.
  2. time Time of the peak/trough relative to P1 in milliseconds.
  3. amplitude Amplitude of the peak/trough in the unit of the input data.
  4. relampl Amplitude of the peak/trough relative to the amplitude of P1 in percent.
  5. width Width of the peak/trough at half height in milliseconds.

RECORDING-CHANNEL-TIME-pulsespectrum-N.EXT

The power spectrum of a single EOD pulse of a pulse-type fish:

frequency power
Hz a.u.^2/Hz
0.00 4.7637e-10
0.34 9.5284e-10
0.67 9.5314e-10
1.01 9.5363e-10
1.35 9.5432e-10
1.68 9.5522e-10

The columns contain:

  1. frequency Frequency in Hertz.
  2. power Power spectral density.
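The peakfreq, peakpower, poweratt5, and poweratt50 columns of the pulsefish file summarize this spectrum. The sketch below computes them under assumed definitions taken from the column descriptions: decibel values are 10*log10 of the power spectral density, and the attenuations are the mean power below 5 Hz and 50 Hz relative to the peak power. lowcutoff is left out because its definition is less clear. The spectrum is synthetic.

```python
import numpy as np

def pulse_spectrum_measures(freqs, power):
    """peakfreq, peakpower, poweratt5, poweratt50 from a pulse spectrum (sketch)."""
    freqs = np.asarray(freqs, dtype=float)
    power = np.asarray(power, dtype=float)
    i = np.argmax(power)
    peakfreq = freqs[i]                       # frequency of maximum power
    peakpower = 10.0*np.log10(power[i])       # peak power in dB
    # mean power below 5 Hz and below 50 Hz relative to the peak, in dB
    att5 = 10.0*np.log10(np.mean(power[freqs < 5.0])/power[i])
    att50 = 10.0*np.log10(np.mean(power[freqs < 50.0])/power[i])
    return peakfreq, peakpower, att5, att50

# synthetic band-pass spectrum peaking at 1000 Hz
freqs = np.arange(0.0, 3000.0, 0.5)
power = 1e-8*np.exp(-((freqs - 1000.0)/300.0)**2) + 1e-14
peakfreq, peakpower, att5, att50 = pulse_spectrum_measures(freqs, power)
print(peakfreq, peakpower, att5, att50)
```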