thunderfish

Detect, analyze, and plot all EOD waveforms in a short recording.

Authors

The Neuroethology-lab at the Institute of Neuroscience at the University of Tübingen:

  • Jan Benda
  • Jörg Henninger (harmonic groups)
  • Juan Sehuanes (best window)
  • Till Raab (annotation)
  • Liz Weerdmeester (pulse clustering)

Principles of operation

thunderfish automatically detects and analyzes all wave- and pulse-type EOD waveforms in a short recording. A short recording is typically no longer than about 30 s. The recordings are made either with a fishfinder (a stick with two electrodes used to find electric fish in the field) or as standardized head-tail recordings in a small tank.

  1. A segment for further waveform analysis is identified in the recording (bestwindow module). In this segment, the amplitude of the recording is largest while at the same time being most stable and not clipped.
  2. A power spectrum with a given frequency resolution is computed (powerspectrum module), and potential EOD frequencies of wave-type fish are detected in this power spectrum based on their harmonic structure (harmonics module).
  3. EODs of pulse-type fish are detected and clustered according to their width, amplitude, and shape (pulse module).
  4. For each pulse- and wave-type fish detected in the recording, an averaged waveform is computed and its properties are analyzed (eodanalysis module).
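The idea behind step 2 can be illustrated with a toy example. The sketch below is not the algorithm of the harmonics module. It only shows the underlying principle under simplifying assumptions: a Welch-like power spectrum is computed with numpy, and a spectral peak f0 is accepted as the fundamental of a wave-type fish if there is also power above a threshold near 2*f0 and 3*f0. The function names, the relative threshold, and the synthetic test signal are all hypothetical.

```python
import numpy as np

def power_spectrum(data, rate, nfft):
    """Average the Hann-windowed periodograms of non-overlapping segments."""
    nseg = len(data) // nfft
    segs = data[:nseg * nfft].reshape(nseg, nfft) * np.hanning(nfft)
    power = np.mean(np.abs(np.fft.rfft(segs, axis=1)) ** 2, axis=0)
    freqs = np.fft.rfftfreq(nfft, 1.0 / rate)
    return freqs, power

def find_fundamentals(freqs, power, nharm=3, rel_thresh=3e-3, fmin=50.0):
    """Toy harmonic-group detection (hypothetical, for illustration only)."""
    df = freqs[1] - freqs[0]
    thresh = rel_thresh * power.max()
    # indices of local maxima of the spectrum above the threshold
    inner = power[1:-1]
    peaks = np.nonzero((inner > power[:-2]) & (inner > power[2:]) & (inner > thresh))[0] + 1
    fundamentals = []
    for i in peaks:  # ascending frequency
        f0 = freqs[i]
        if f0 < fmin:
            continue
        # a peak at an integer multiple of an accepted fundamental is a harmonic
        if any(abs(f0 / f - round(f0 / f)) < 0.02 for f in fundamentals):
            continue
        # require power above threshold within +-1 bin of every k*f0
        ok = True
        for k in range(2, nharm + 1):
            j = int(round(k * f0 / df))
            if j + 1 >= len(power) or power[j - 1:j + 2].max() < thresh:
                ok = False
                break
        if ok:
            fundamentals.append(float(f0))
    return fundamentals

# synthetic recording: two wave-type fish with three harmonics each, plus noise
rate = 32000.0
t = np.arange(int(4 * rate)) / rate
rng = np.random.default_rng(1)
data = (np.sin(2 * np.pi * 580 * t) + 0.5 * np.sin(2 * np.pi * 1160 * t)
        + 0.2 * np.sin(2 * np.pi * 1740 * t)
        + 0.5 * np.sin(2 * np.pi * 733 * t) + 0.25 * np.sin(2 * np.pi * 1466 * t)
        + 0.1 * np.sin(2 * np.pi * 2199 * t)
        + 0.05 * rng.standard_normal(len(t)))
freqs, power = power_spectrum(data, rate, 4096)
fish = find_fundamentals(freqs, power)
print(fish)
```

With a frequency resolution of 7.8 Hz the two fundamentals are reported at the nearest spectral bins, while the peaks at 1160 Hz, 1466 Hz, and so on are recognized as harmonics and not reported as separate fish.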

The files with EOD waveform properties generated by thunderfish can be collected into summary files by means of the collectfish script and then analyzed and explored with the eodexplorer.

Command line arguments

thunderfish --help

returns

usage: thunderfish.py [-h] [--version] [-v] [-V] [-c] [--channel CHANNEL] [-t TIME] [-T] [-m {w,p,wp}] [-a] [-S] [-b]
                      [-l [MINFREQ]] [-p] [-M PDFFILE] [-P rtpwse] [-d PATH] [-j [JOBS]] [-s] [-z]
                      [-f {dat,ascii,csv,rtai,md,tex,html,py}] [-o OUTPATH] [-k] [-i KWARGS]
                      [file [file ...]]

Analyze EOD waveforms of weakly electric fish.

positional arguments:
  file                  name of a file with time series data of an EOD recording, may include wildcards

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -v                    verbosity level. Increase by specifying -v multiple times, or like -vvv
  -V                    level for debugging plots. Increase by specifying -V multiple times, or like -VVV
  -c                    save configuration to file thunderfish.cfg after reading all configuration files
  --channel CHANNEL     channel to be analyzed (defaults to first channel, negative channel selects all channels)
  -t TIME               start time of analysis window in recording: "beginning", "center", "end", "best", or time in seconds
                        (overwrites "windowPosition" in configuration file)
  -T                    add start time of analysis file to output file names
  -m {w,p,wp}           extract wave "w" and/or pulse "p" fish EODs
  -a                    show all EOD waveforms in the summary plot
  -S                    plot spectra for all EOD waveforms in the summary plot
  -b                    indicate bad EODs in legend of power spectrum
  -l [MINFREQ]          logarithmic frequency axis in power spectrum with optional minimum frequency (defaults to 100 Hz)
  -p                    save output plots as pdf files
  -M PDFFILE            save all summary plots of all recordings in a multi page pdf file. Disables parallel jobs.
  -P rtpwse             save subplots as separate pdf files: r) recording with analysis window, t) data trace with detected
                        pulse fish, p) power spectrum with detected wave fish, w/W) mean EOD waveform, s/S) EOD spectrum, e/E)
                        EOD waveform and spectra, d) the default summary plot. Capital letters produce a single multipage pdf
                        containing plots of all detected fish
  -d PATH               path to raw EOD recordings needed for plotting based on analysis results
  -j [JOBS]             number of jobs run in parallel. Without argument use all CPU cores.
  -s                    save analysis results to files
  -z                    save analysis results in a single zip file
  -f {dat,ascii,csv,rtai,md,tex,html,py}
                        file format used for saving analysis results, defaults to the format specified in the configuration file
                        or "csv"
  -o OUTPATH            path where to store results and figures (defaults to current working directory)
  -k                    keep path of input file when saving analysis files, i.e. append path of input file to OUTPATH
  -i KWARGS             key-word arguments for the data loader function

version 1.9.9 by Benda-Lab (2015-2022)

examples:
- analyze the single file data.wav interactively:
  > thunderfish data.wav
- extract wavefish only:
  > thunderfish -m w data.wav
- automatically analyze all wav files in the current working directory and save analysis results and plot to files:
  > thunderfish -s -p *.wav
- analyze all wav files in the river1/ directory, use all CPUs, and write files directly to "results/":
  > thunderfish -j -s -p -o results/ river1/*.wav
- analyze all wav files in the river1/ directory and write files to "results/river1/":
  > thunderfish -s -p -o results/ -k river1/*.wav
- write configuration file:
  > thunderfish -c

Configuration file

Many parameters of the algorithms used by thunderfish can be set via a configuration file.

Generate the configuration file by executing

thunderfish -c

This first reads in all configuration files found (see below) and then writes the file thunderfish.cfg into the current working directory.

Whenever you run thunderfish it searches for configuration files in

  1. the current working directory
  2. the directory of each input file
  3. the parent directories of each input file, up to three levels up.
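The search locations above can be mimicked to check which directories would be consulted for a given input file. This is a sketch under assumptions: it only lists candidate directories, it assumes the file name thunderfish.cfg (the name written by thunderfish -c), and it does not model which file takes precedence when several are found. The helper name is hypothetical; thunderfish -vv gives the authoritative answer.

```python
from pathlib import Path

def config_search_dirs(input_files, levels=3):
    """Candidate directories for configuration files (hypothetical helper).

    Duplicates are removed while keeping the first occurrence.
    """
    dirs = [Path.cwd()]                              # 1. current working directory
    for name in input_files:
        folder = Path(name).resolve().parent
        dirs.append(folder)                          # 2. directory of the input file
        dirs.extend(list(folder.parents)[:levels])   # 3. up to three levels up
    unique = []
    for d in dirs:
        if d not in unique:
            unique.append(d)
    return unique

# assumption: the configuration file is named thunderfish.cfg, as written by -c
for d in config_search_dirs(["river1/habitatA/rec1.wav"]):
    cfg = d / "thunderfish.cfg"
    print(cfg, "found" if cfg.is_file() else "-")
```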

Best practice is to move the configuration file to the root of the file tree where data files of a recording session are stored.

Use the -v switch twice to see which configuration files are loaded:

thunderfish -vv data.wav

Open the configuration file in your favourite editor and edit the settings. Each parameter is briefly explained in the comment preceding the parameter.

Important configuration parameters

The list of configuration parameters is long, and you do not need to touch most of them at all. Here are the few that matter (in the order in which they appear in the configuration file):

  • frequencyResolution: this sets the nfft parameter for computing the power spectrum such that the requested frequency resolution is achieved. The longer your analysis window, the smaller you can set the resolution (but not smaller than the inverse of the duration of the analysis window).

  • numberPSDWindows: If larger than one, only fish that are present in all windows are reported. If you have very stationary data (from a restrained fish, not from a fishfinder), you may set this to one.

  • lowThresholdFactor, highThresholdFactor: play around with these numbers if not all wavefish are detected or if too many peaks are detected in the power spectrum.

  • mainsFreq: Set it to the frequency of your mains power supply (50 or 60 Hz) or to zero if you have hum-free recordings.

  • maxRelativePower: Usually, the higher the harmonic, the less power it has. To discard signals whose power does not decay, set this to -10 or -20 dB.

  • maxGroups: Set to 1 if you know that only a single fish is in your recording.

  • minDataAmplitude, maxDataAmplitude: If the voltage range of your recording device differs from -1 to 1 (the default for WAV files), set these two parameters to its limits, so that clipped recordings are detected as such.

  • windowSize: How much of the data should be used for analysis. If you have stationary data (from a restrained fish, not from a fishfinder), you may want to use the full recording by setting this to zero.

  • windowPosition: Where to place the analysis window: at the "beginning", "center", or "end" of the recording. If set to "best" (default), thunderfish searches for the most stationary data segment of the requested length. This can be overridden from the command line with the -t argument.

  • pulseWidthPercentile: If low-frequency pulse fish are missed, reduce this number.

  • eodMaxEODs: The average waveform is estimated by averaging over at most this number of EODs. If wavefish change their frequency, you do not want to set this number too high (10 to 100 is enough for reducing noise). If you have several fish in your recording, this number needs to be high (1000) to average away the other fish. Set it to zero to use all EODs in the data segment selected for analysis.

  • flipWaveEOD, flipPulseEOD: With recordings from a fishfinder, you do not know the orientation of the fish relative to your electrodes, that is, you do not know the polarity of your recording. Setting this to auto flips the sign of the averaged EOD waveform to a standardized polarity (wave-type fish: the larger peak relative to the average is positive; pulse-type fish: the first of the two largest peaks is positive).

  • fileFormat: sets the default file format to be used for storing the analysis results.
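The relation between frequencyResolution and nfft can be sketched as follows. Rounding nfft up to a power of two is an assumption on our part; it is consistent with the example output in the section on analysis results, which lists a sampling rate of 32 kHz, nfft 65536, and dfreq 0.49 Hz. The function name and the check against the duration of the analysis window (in seconds) are hypothetical.

```python
import math

def nfft_for_resolution(samplerate, resolution, window=None):
    """Smallest power-of-two nfft with samplerate / nfft <= resolution.

    samplerate and resolution in Hz, window (optional) in seconds.
    """
    if window is not None and resolution < 1.0 / window:
        raise ValueError(f"resolution must not be smaller than 1/window = {1.0 / window:g} Hz")
    nfft = 2 ** math.ceil(math.log2(samplerate / resolution))
    return nfft, samplerate / nfft

nfft, dfreq = nfft_for_resolution(32000.0, 0.5, window=8.0)
print(nfft, round(dfreq, 2))  # 65536 0.49
```

Requesting 0.5 Hz at 32 kHz needs at least 64000 samples, which is rounded up to 65536 and therefore yields a slightly finer resolution of 0.488 Hz. For an 8 s analysis window, resolutions below 0.125 Hz are rejected.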

Summary plots

By default, thunderfish simply displays the analysis results in a summary plot.

You can produce these plots either from the recording files or directly from the saved analysis results (.csv or .zip files, see the section on analysis results below). So, to analyze some recordings and save the summary plots to pdf files in the images/ folder, call

thunderfish -p -o images data/*.wav

This may take a while, since the analysis is computationally costly.

Alternatively, you can first analyze the recordings and save the analysis results as zip files in a results/ folder:

thunderfish -j -s -z -o results data/*.wav

Afterwards, you can quickly look at the results by calling

thunderfish -d data/ results/*.zip

and press q to flip through the plots. The -d option tells thunderfish where to find the corresponding recording files.

In the summary plots you can press

  • q: Close the plot window and show the next one or quit.
  • p: Play the analyzed section of the recording on the default audio device.
  • o: Switch on zoom mode. You can draw a rectangle with the mouse to zoom in.
  • Backspace: Zoom back.
  • f: Toggle full screen mode.

By default, the summary plots display at most four EOD waveforms, namely those with the largest amplitudes. If only a single waveform is found, its spectrum is displayed as well. The power spectrum of the recording is shown on a linear frequency scale. These behaviors can be modified by the following command line options:

  • -a: plot all detected EOD waveforms in summary plots.
  • -S: plot spectra for all displayed EOD waveforms.
  • -b: indicate bad EODs in the legend of the power spectrum of the recording.
  • -l [MINFREQ]: plot the power spectrum on a logarithmic frequency scale. The optional argument MINFREQ sets the minimum displayed frequency in Hertz (defaults to 100 Hz).

The plots can alternatively be saved to pdf files via the -p option. They are named RECORDING.pdf, where RECORDING is the base name of the recording file. See the next section on how to define the output folder (-o and -k options).

The summary plots of all analyzed recordings can also be stored in a single, multi-page pdf file, where the results of each recording are plotted on a separate page. For this, supply a name for the pdf file via the -M option (this disables parallel jobs).

The various subplots of the summary plot can also be viewed separately or saved in separate pdf files. For this, use the -P option. It expects as an argument a string whose characters specify what you want to plot:

  • r: plot the whole recording with the analysis window indicated (saved into RECORDING-recording.pdf).
  • t: plot of a small section of the data trace with detected pulse fish (saved into RECORDING-trace.pdf).
  • p: power spectrum of the recording with detected wave fish (saved into RECORDING-psd.pdf).
  • w/W: annotated mean EOD waveforms (saved into RECORDING-waveforms.pdf).
  • s/S: spectrum of the EOD waveform (saved into RECORDING-spectrum.pdf).
  • e/E: Both the annotated EOD waveform and its spectrum (saved into RECORDING-eods.pdf).
  • d: the default summary plot (saved into RECORDING.pdf).

Capital letters produce a single multipage pdf containing the specified plots of all detected fish of a recording. For example,

thunderfish -p -P pE -d data/ -o images/ results/*.zip

produces pdf files with the power spectrum of the recording and with all the EOD waveforms together with their spectra in the folder images/. For computing the power spectrum thunderfish needs the raw data that it finds in the data/ folder.

Output paths

Output files (plots and/or analysis results) are placed in the current working directory if no path is specified via the -o switch. If the path specified via -o does not exist, it is created.

With the -k switch, the paths of the input files are appended to the output path. This allows you to analyze recordings organized in a nested directory structure in one step and write the output files in the same structure. For example:

thunderfish -s -k -o analysis river1/habitatA/*.wav river1/habitatB/*.wav river2/*.wav

will store the files in

analysis/river1/habitatA/
analysis/river1/habitatB/
analysis/river2/

whereas without the -k switch all files are stored in

analysis/
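The directory mapping of this example can be written down as a small function. It is a hypothetical helper covering relative input paths only, as in the example; how thunderfish treats absolute input paths together with -k is not covered here.

```python
import os

def output_dir(input_file, outpath=".", keep_path=False):
    """Folder receiving the results of input_file (hypothetical helper).

    keep_path=True mimics the -k switch for relative input paths.
    """
    if keep_path:
        # append the directory part of the input file to OUTPATH
        return os.path.normpath(os.path.join(outpath, os.path.dirname(input_file)))
    return os.path.normpath(outpath)

for name in ["river1/habitatA/a.wav", "river1/habitatB/b.wav", "river2/c.wav"]:
    print(output_dir(name, "analysis", keep_path=True), output_dir(name, "analysis"))
```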

To make use of all the cores of your CPU apply the -j switch.

Analysis results

With the -s switch analysis results are saved to files and no interactive output plots are generated.

The many output files (see below) can be combined into a single zip archive (one per recording) with the -z option.

The following files are generated:

  • RECORDING-CHANNEL-TIME-eodwaveform-N.EXT: averaged EOD waveform
  • RECORDING-CHANNEL-TIME-waveeodfs.EXT: list of all detected EOD frequencies and powers of wave-type fish
  • RECORDING-CHANNEL-TIME-wavefish.EXT: list of properties of good EODs of wave-type fish
  • RECORDING-CHANNEL-TIME-wavespectrum-N.EXT: for each wave-type fish the Fourier spectrum
  • RECORDING-CHANNEL-TIME-pulsefish.EXT: list of properties of good EODs of pulse-type fish
  • RECORDING-CHANNEL-TIME-pulsepeaks-N.EXT: for each pulse-type fish properties of peaks and troughs
  • RECORDING-CHANNEL-TIME-pulsetimes-N.EXT: for each pulse-type fish the time points of detected EODs
  • RECORDING-CHANNEL-TIME-pulsespectrum-N.EXT: for each pulse-type fish the power spectrum of a single pulse

Filenames start with the base name of the input file (RECORDING). If the input file contains more than a single channel, a channel specification is appended (CHANNEL): a 'c' followed by the channel number. If the start time of the analysis window was requested to be saved in the file name (-T option), this start time is added to the file name (TIME) as a 't' followed by the start time floored to integer seconds and an 's'. Fish detected in the recording are numbered starting with 0 (N). The file extension depends on the chosen file format (EXT). The following sections describe the content of the generated files.
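The naming scheme can be sketched as a small function. It assumes that the parts are joined by '-' as the file patterns above suggest, and that RECORDING is the input file name without directory and extension. The function name is hypothetical, and thunderfish's own naming code may differ in details.

```python
import math
import os

def result_filename(path, kind, ext, channel=None, start_time=None, index=None):
    """Compose RECORDING-CHANNEL-TIME-kind-N.EXT (hypothetical helper)."""
    parts = [os.path.splitext(os.path.basename(path))[0]]   # RECORDING
    if channel is not None:                 # CHANNEL: multi-channel files only
        parts.append(f"c{channel}")
    if start_time is not None:              # TIME: only with the -T option
        parts.append(f"t{math.floor(start_time)}s")
    parts.append(kind if index is None else f"{kind}-{index}")  # N: per-fish files
    return "-".join(parts) + "." + ext

print(result_filename("data/rec.wav", "wavefish", "csv"))
# rec-wavefish.csv
print(result_filename("data/rec.wav", "eodwaveform", "csv", channel=1, start_time=4.25, index=0))
# rec-c1-t4s-eodwaveform-0.csv
```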

RECORDING-CHANNEL-TIME-eodwaveform-N.EXT

For each fish the average waveform with standard deviation and fit.

time mean std fit
ms a.u. a.u. a.u.
-1.746 -0.34837 0.01194 -0.34562
-1.723 -0.30700 0.01199 -0.30411
-1.701 -0.26664 0.01146 -0.26383
-1.678 -0.22713 0.01153 -0.22426
-1.655 -0.18706 0.01187 -0.18428

The columns contain:

  1. time Time in milliseconds.
  2. mean Averaged waveform in the unit of the input data.
  3. std Corresponding standard deviation.
  4. fit A fit to the averaged waveform. In case of a wave fish this is a Fourier series, for pulse fish it is an exponential fit to the tail of the last peak.

RECORDING-CHANNEL-TIME-waveeodfs.EXT

List of all detected EOD frequencies and powers of wave-type fish. This list may contain more fish than RECORDING-CHANNEL-TIME-wavefish.EXT.

index EODf datapower
- Hz dB
1 111.33 -33.35
2 132.81 -37.86
0 580.08 -22.01
3 608.89 -45.45

The columns contain:

  1. index Index of the fish (the number that is also used to number the files).
  2. EODf EOD frequency in Hertz.
  3. datapower Power of this EOD in decibel (sum over all peaks in the power spectrum of the recording).

RECORDING-CHANNEL-TIME-wavefish.EXT

Fundamental EOD frequency and other properties of each wave-type fish detected in the recording.

recording waveform timing
twin window winclipped samplerate nfft dfreq index EODf p-p-amplitude power datapower thd dbdiff maxdb noise rmserror clipped flipped n ncrossings peakwidth troughwidth leftpeak rightpeak lefttrough righttrough p-p-distance reltroughampl
s s % kHz - Hz - Hz a.u. dB dB % dB dB % % % - - - % % % % % % % %
4.25 8.00 0.00 32.000 65536 0.49 0 580.08 0.22755 -21.28 -22.01 149.81 2.93 -9.22 0.3 0.36 0.0 0 3300 1 76.15 23.85 69.12 7.03 11.10 12.75 18.13 312.11
4.25 8.00 0.00 32.000 65536 0.49 1 111.33 0.00713 -50.80 -34.09 67.60 7.18 -29.48 34.7 2.94 0.0 0 888 2 44.00 56.00 19.50 24.51 16.55 39.45 41.05 73.54
4.25 8.00 0.00 32.000 65536 0.49 2 132.81 0.01029 -46.47 -37.87 46.49 8.40 -32.48 22.0 1.55 0.0 0 1059 2 49.11 50.89 25.30 23.82 29.72 21.16 53.54 103.40
4.25 8.00 0.00 32.000 65536 0.49 3 608.89 0.00258 -59.51 -45.45 100.84 15.01 -22.24 40.9 1.37 0.0 0 4868 2 36.29 63.71 22.14 14.15 42.93 20.78 57.08 91.60
4.25 8.00 0.00 32.000 65536 0.49 4 1979.49 0.00177 -61.18 -61.72 58.05 13.10 -24.18 33.0 2.08 0.0 0 15833 2 53.94 46.06 22.94 31.00 24.67 21.38 55.67 131.60

The columns contain:

  1. twin Start time of the analysis window in the recording in seconds.
  2. window Duration of the analysis window in seconds.
  3. winclipped Fraction of the analysis window that is clipped, in percent.
  4. samplerate Sampling rate of the recording in kilohertz.
  5. nfft Number of samples used for the FFT to compute the power spectrum.
  6. dfreq Frequency resolution of the power spectrum in Hertz.
  7. index Index of the fish (the number that is also used to number the files).
  8. EODf EOD frequency in Hertz.
  9. p-p-amplitude Peak-to-peak amplitude of the extracted waveform in the units of the input data.
  10. power Power of the extracted EOD waveform, i.e. sum of the squared Fourier amplitudes, in decibel.
  11. datapower Power of the EOD waveform from the spectrum of the original data in decibel.
  12. thd Total harmonic distortion, i.e. the square root of the sum of the squared amplitudes of the harmonics relative to the amplitude of the fundamental, in percent.
  13. dbdiff Smoothness of power spectrum as standard deviation of differences in decibel power.
  14. maxdb Maximum power of higher harmonics relative to peak power in decibel.
  15. noise Root-mean-squared standard error of the averaged EOD waveform relative to the peak-to-peak amplitude in percent.
  16. rmserror Root-mean-squared difference between the averaged EOD waveform and the fit of the Fourier series relative to the peak-to-peak amplitude in percent.
  17. clipped Percentage of recording that is clipped.
  18. flipped Whether the waveform was flipped.
  19. n Number of EODs used for computing the averaged EOD waveform.
  20. ncrossings Number of zero crossings per EOD period.
  21. peakwidth Width of the peak at the averaged amplitude relative to EOD period.
  22. troughwidth Width of the trough at the averaged amplitude relative to EOD period.
  23. leftpeak Time from positive zero crossing to peak relative to EOD period.
  24. rightpeak Time from peak to negative zero crossing relative to EOD period.
  25. lefttrough Time from negative zero crossing to trough relative to EOD period.
  26. righttrough Time from trough to positive zero crossing relative to EOD period.
  27. p-p-distance Time between peak and trough relative to EOD period.
  28. reltroughampl Amplitude of the trough relative to the peak amplitude in percent.
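The thd definition translates directly into a short function. The sketch takes harmonic amplitudes as they are listed in the wavespectrum file described in the next section. Using only the five harmonics shown in that example is an assumption for illustration; how many harmonics thunderfish actually includes is not specified here, so the result need not match any thd value in the table above.

```python
import math

def thd_percent(amplitudes):
    """Total harmonic distortion in percent.

    amplitudes[0] is the amplitude of the fundamental,
    amplitudes[1:] are the amplitudes of the higher harmonics.
    """
    harmonics_power = sum(a * a for a in amplitudes[1:])
    return 100.0 * math.sqrt(harmonics_power) / amplitudes[0]

# amplitudes of the first five harmonics of the wavespectrum example
thd = thd_percent([0.32610, 0.22146, 0.03215, 0.03733, 0.02039])
print(f"{thd:.1f} %")
```

Here the second harmonic dominates, giving a THD of roughly 70 %.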

RECORDING-CHANNEL-TIME-wavespectrum-N.EXT

The parameters of the Fourier series fitted to the waveform of a wave-type fish.

harmonics frequency amplitude relampl relpower phase datapower
- Hz a.u. % dB rad a.u.^2/Hz
0 728.16 0.32610 100.00 0.00 0.0000 1.0137e-01
1 1456.32 0.22146 67.91 -3.36 2.4706 4.1881e-02
2 2184.48 0.03215 9.86 -20.12 -1.9333 7.6623e-04
3 2912.63 0.03733 11.45 -18.83 -0.6807 8.6311e-04
4 3640.79 0.02039 6.25 -24.08 3.0997 2.3089e-04

The columns contain:

  1. harmonics Index of the harmonic. Index 0 is the fundamental.
  2. frequency Frequency of the harmonic in Hertz.
  3. amplitude Amplitude of each harmonic obtained by fitting a Fourier series to the data, in the unit of the input data.
  4. relampl Amplitude of each harmonic relative to the amplitude of the fundamental in percent.
  5. relpower Power of each harmonic relative to the fundamental in decibel.
  6. phase Phase of each harmonic obtained by fitting a Fourier series to the data, in radians ranging from -pi to pi.
  7. datapower Power spectral density of each harmonic taken from the power spectrum of the original data.
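The relampl and relpower columns follow from the amplitude column. Taking relpower as the decibel value of the squared amplitude ratio, the following sketch reproduces the rows of the example above to the printed precision:

```python
import math

# amplitude column of the wavespectrum example above
amplitudes = [0.32610, 0.22146, 0.03215, 0.03733, 0.02039]
a0 = amplitudes[0]  # amplitude of the fundamental

relampl = [100.0 * a / a0 for a in amplitudes]              # percent
relpower = [20.0 * math.log10(a / a0) for a in amplitudes]  # dB, equals 10*log10((a/a0)**2)
for k, (ra, rp) in enumerate(zip(relampl, relpower)):
    print(k, f"{ra:.2f}", f"{rp:.2f}")
```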

RECORDING-CHANNEL-TIME-pulsefish.EXT

Properties of each pulse-type fish detected in the recording.

recording waveform power spectrum
twin window winclipped samplerate nfft dfreq index EODf period max-ampl min-ampl p-p-amplitude noise clipped flipped tstart tend width P2-P1-dist tau firstpeak lastpeak n peakfreq peakpower poweratt5 poweratt50 lowcutoff
s s % kHz - Hz - Hz ms a.u. a.u. a.u. % % - ms ms ms ms ms - - - Hz dB dB dB Hz
4.00 8.00 0.00 32.000 65536 0.49 0 32.22 31.03 0.26557 0.21912 0.48469 0.1 0.0 0 -0.344 1.344 1.687 0.250 0.087 1 2 235 1130.86 -81.56 -27.84 -22.84 98.14

The columns contain:

  1. twin Start time of the analysis window in the recording in seconds.
  2. window Duration of the analysis window in seconds.
  3. winclipped Fraction of the analysis window that is clipped, in percent.
  4. samplerate Sampling rate of the recording in kilohertz.
  5. nfft Number of samples used for the FFT to compute the power spectrum.
  6. dfreq Frequency resolution of the power spectrum in Hertz.
  7. index Index of the fish (the number that is also used to number the files).
  8. EODf EOD frequency in Hertz.
  9. period Period between two pulses (1/EODf) in milliseconds.
  10. max-ampl Amplitude of the largest peak (P1 peak) in the units of the input data.
  11. min-ampl Amplitude of the largest trough in the units of the input data.
  12. p-p-amplitude Peak-to-peak amplitude in the units of the input data.
  13. noise Root-mean-squared standard error of the averaged EOD waveform relative to the peak-to-peak amplitude in percent.
  14. clipped Percentage of recording that is clipped.
  15. flipped Whether the waveform was flipped.
  16. tstart Time where the pulse starts relative to P1 in milliseconds.
  17. tend Time where the pulse ends relative to P1 in milliseconds.
  18. width Total width of the pulse in milliseconds.
  19. P2-P1-dist Distance between P2 and P1 in milliseconds. Zero if P2 is not present.
  20. tau Time constant of the exponential decay of the tail of the pulse in milliseconds.
  21. firstpeak Index of the first peak in the pulse (e.g. -1 for P-1).
  22. lastpeak Index of the last peak in the pulse (e.g. 3 for P3).
  23. n Number of EODs used for computing the averaged EOD waveform.
  24. peakfreq Frequency at the peak power of the single pulse spectrum in Hertz.
  25. peakpower Peak power of the single pulse spectrum relative to one in decibel.
  26. poweratt5 How much the average power below 5 Hz is attenuated relative to the peak power in decibel.
  27. poweratt50 How much the average power below 50 Hz is attenuated relative to the peak power in decibel.
  28. lowcutoff Frequency at which the power reached half of the peak power relative to the initial power in Hertz.
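Several columns of this file are linked by simple arithmetic, which is handy as a sanity check when processing the files further. The sketch below recomputes period, width, and p-p-amplitude from the example row. That min-ampl is stored as a positive magnitude is inferred from this single example, not a documented guarantee.

```python
# values of the pulsefish example row above
eodf = 32.22                        # EODf in Hz
tstart, tend = -0.344, 1.344        # ms relative to P1
max_ampl, min_ampl = 0.26557, 0.21912

period = 1000.0 / eodf              # ms, about 31.04 (table: 31.03, EODf is rounded)
width = tend - tstart               # ms, about 1.688 (table: 1.687)
pp_amplitude = max_ampl + min_ampl  # min-ampl appears as a magnitude in this example
print(f"{period:.2f} ms, {width:.3f} ms, {pp_amplitude:.5f}")
```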

RECORDING-CHANNEL-TIME-pulsepeaks-N.EXT

Properties of peaks and troughs of a pulse-type fish's EOD.

P time amplitude relampl width
- ms a.u. % ms
1 0.000 0.78409 100.00 0.333
2 0.385 -0.85939 -109.60 0.248

The columns contain:

  1. P Name of the peak/trough. Peaks and troughs are numbered sequentially. P1 is the largest peak with positive amplitude.
  2. time Time of the peak/trough relative to P1 in milliseconds.
  3. amplitude Amplitude of the peak/trough in the unit of the input data.
  4. relampl Amplitude of the peak/trough relative to the amplitude of P1 in percent.
  5. width Width of the peak/trough at half height in milliseconds.
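Once P1 is identified as the largest peak with positive amplitude, the time and relampl columns are simply expressed relative to it. With the two rows of the example above this gives 100.00 and -109.60 percent. A minimal sketch; storing the extrema as a list of (time, amplitude) tuples is an assumption:

```python
# (time in ms, amplitude) of the extrema in the pulsepeaks example above
extrema = [(0.000, 0.78409), (0.385, -0.85939)]

# P1 is the largest peak with positive amplitude
t1, a1 = max((e for e in extrema if e[1] > 0), key=lambda e: e[1])

# time relative to P1 in ms and amplitude relative to P1 in percent
reltime = [t - t1 for t, _ in extrema]
relampl = [100.0 * a / a1 for _, a in extrema]
for rt, ra in zip(reltime, relampl):
    print(f"{rt:.3f} {ra:.2f}")
```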

RECORDING-CHANNEL-TIME-pulsetimes-N.EXT

Time points of detected pulse-type EODs.

time
s
0.1043
0.1353
0.1662
0.1971
0.2279
0.2589
0.2898
0.3207

The columns contain:

  1. time Time of each detected pulse-type EOD in seconds.
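The pulse rate of a pulse-type fish can be estimated from these times, as sketched below with numpy. Whether thunderfish computes the EODf column of the pulsefish file this way is not stated here, so small differences from that column are to be expected.

```python
import numpy as np

# pulse times of the pulsetimes example above, in seconds
times = np.array([0.1043, 0.1353, 0.1662, 0.1971, 0.2279, 0.2589, 0.2898, 0.3207])

intervals = np.diff(times)         # inter-pulse intervals in seconds
rate = 1.0 / np.median(intervals)  # pulse rate in Hz; the median tolerates a few missed pulses
print(f"{1000 * np.median(intervals):.1f} ms, {rate:.2f} Hz")
```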

RECORDING-CHANNEL-TIME-pulsespectrum-N.EXT

The power spectrum of a single EOD pulse of a pulse-type fish:

frequency power
Hz a.u.^2/Hz
0.00 4.7637e-10
0.34 9.5284e-10
0.67 9.5314e-10
1.01 9.5363e-10
1.35 9.5432e-10
1.68 9.5522e-10

The columns contain:

  1. frequency Frequency in Hertz.
  2. power Power spectral density.
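The poweratt5 and poweratt50 columns of the pulsefish file are derived from this kind of single-pulse spectrum. Below is a sketch of their stated definition (mean power below 5 Hz or 50 Hz relative to the peak power, in decibel) applied to a synthetic biphasic pulse. The pulse shape, the zero padding, and the function name are assumptions for illustration, and the sketch is not guaranteed to reproduce thunderfish's numbers exactly.

```python
import numpy as np

def power_attenuation(freqs, power, fmax):
    """Mean power below fmax relative to the peak power, in decibel."""
    return 10.0 * np.log10(np.mean(power[freqs < fmax]) / np.max(power))

# synthetic biphasic pulse: derivative of a Gaussian with sigma = 0.1 ms
rate = 32000.0
t = np.arange(-0.002, 0.002, 1.0 / rate)
sigma = 1e-4
pulse = -t / sigma**2 * np.exp(-0.5 * (t / sigma) ** 2)

nfft = 2**16                                   # zero padding for a fine frequency grid
power = np.abs(np.fft.rfft(pulse, nfft)) ** 2  # unnormalized; the scaling cancels in the ratios
freqs = np.fft.rfftfreq(nfft, 1.0 / rate)

peakfreq = freqs[np.argmax(power)]
att5 = power_attenuation(freqs, power, 5.0)
att50 = power_attenuation(freqs, power, 50.0)
print(f"peakfreq {peakfreq:.0f} Hz, poweratt5 {att5:.1f} dB, poweratt50 {att50:.1f} dB")
```

For this pulse the spectrum peaks near 1/(2*pi*sigma), about 1.6 kHz, and the power close to 0 Hz is strongly attenuated, more so below 5 Hz than below 50 Hz.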