Audio Matcher Configuration
The AudioMatcher reads an audio recording and uses an algorithm to compute a collection of audio fingerprints. This collection of fingerprints is characteristic of the recording. The fingerprints of the recording to be assessed are compared to a collection of fingerprints calculated from reference recordings. Every comparison yields an integer that measures the similarity of the two recordings. A higher number indicates a stronger similarity to the reference recording.
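The comparison can be pictured as counting the fingerprints that two recordings share. The sketch below is only an assumption about how such a score could be computed, not intaQt's actual implementation; the function name `match_score` and the tuple layout of a fingerprint are hypothetical.

```python
from collections import Counter


def match_score(candidate, reference):
    """Count the fingerprints shared by two recordings (multiset intersection)."""
    shared = Counter(candidate) & Counter(reference)
    return sum(shared.values())


# Hypothetical fingerprints, written as (frequency_bin_1, frequency_bin_2, frame_gap).
reference = [(10, 14, 3), (14, 22, 1), (22, 9, 5), (9, 30, 2)]
recording = [(10, 14, 3), (22, 9, 5), (7, 7, 1)]
unrelated = [(1, 2, 3)]

print(match_score(recording, reference))  # two fingerprints in common
print(match_score(unrelated, reference))  # nothing in common
```

The result is an integer, and a recording that shares more fingerprints with the reference scores higher.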
- The Fingerprint Algorithm uses a vector of real numbers that represents the input audio signal. This input signal is subdivided into small, overlapping windows. Every window is Fourier transformed to yield the frequency content of the audio signal at the instant at which the window is located. In total, the transform yields a two-dimensional spectrogram, where each value is indexed by frequency and time.
- The next step is to locate local maxima in the spectrogram, this second representation of the signal. After the local peaks are sorted, pairs of those peaks are formed. Every pair of peaks corresponds to a single audio fingerprint for the signal.
- Finally, the signal is represented by a list of audio fingerprints that is characteristic of the input audio.
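The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not intaQt's implementation: the function names, the 3x3 neighbourhood used to detect local maxima, the sort order (time, then frequency) and the fingerprint layout `(f1, f2, time_gap)` are all hypothetical. A naive DFT from the standard library keeps the example self-contained; a real implementation would use an FFT.

```python
import cmath
import math


def spectrogram(signal, window_size, overlap):
    """Log-scaled power spectrogram: one row per window, one column per frequency bin."""
    hop = window_size - overlap
    # Hann window coefficients.
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / (window_size - 1))
            for n in range(window_size)]
    rows = []
    for start in range(0, len(signal) - window_size + 1, hop):
        frame = [signal[start + n] * hann[n] for n in range(window_size)]
        row = []
        for k in range(window_size // 2 + 1):
            # Naive DFT for frequency bin k.
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / window_size)
                        for n, x in enumerate(frame))
            row.append(10 * math.log10(abs(coeff) ** 2 + 1e-12))
        rows.append(row)
    return rows


def local_peaks(spec, amplitude_cap):
    """Return (time, freq) cells that exceed the cap and all of their direct neighbours."""
    peaks = []
    for t, row in enumerate(spec):
        for f, value in enumerate(row):
            if value <= amplitude_cap:
                continue
            neighbours = [spec[tt][ff]
                          for tt in range(max(t - 1, 0), min(t + 2, len(spec)))
                          for ff in range(max(f - 1, 0), min(f + 2, len(row)))
                          if (tt, ff) != (t, f)]
            if all(value > other for other in neighbours):
                peaks.append((t, f))
    return peaks  # already sorted by time, then frequency


def fingerprints(peaks, pairing_range):
    """Pair each peak with up to `pairing_range` of the peaks that follow it."""
    prints = []
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + pairing_range]:
            prints.append((f1, f2, t2 - t1))
    return prints
```

Chaining the calls as `fingerprints(local_peaks(spectrogram(signal, 1024, 992), 15.0), 10)` would yield the fingerprint list for one recording under these assumptions.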
Configuration
Multiple AudioMatcher profiles can be configured. intaQt always provides the default profile, as well as default values for minimumConfidence and referenceDirectory. Each profile creates its own audio reference file, named <profileName>.audio.
Syntax
    AudioMatcher = {
      profiles = {
        <profileName String> = {
          stftWindowSize: <Number>,
          stftOverlap: <Number>,
          stftWindowType: <Number>,
          afpAmplitudeCap: <Number>,
          afpPairingRange: <Number>
        },
        ...
      },
      minimumConfidence: <Number>
      referenceDirectory : <String>
    }
Parameters
- profileName - The name assigned to the audio profile
- stftWindowSize - The window size of the Short-Time Fourier Transform (STFT)
  - Default value is set to 1024
  - The size of the input signal for audio matching must be larger than or equal to the window size
- stftOverlap - The overlap between consecutive windows in the STFT
  - Default value is set to 992
  - The overlap must be smaller than the window size
- stftWindowType - Defines the averaging window function to be used for the STFT
  - Allowed values are:
    - 0 (Default) - Hann
    - 1 - Hamming
    - 2 - Blackman
    - 3 - No averaging
- afpAmplitudeCap - The threshold that a local maximum must exceed to be considered a peak that is eligible for fingerprinting
  - Default value is set to 15.0
  - The values in the spectrogram are derived from a log-scaled power spectrum
- afpPairingRange - The number of pairs that are considered for a specific peak to form fingerprints
  - Default value is set to 10
  - The larger this number, the more fingerprints are created
- minimumConfidence - The minimum number of matches (that is, equal fingerprints) that an audio recording must yield in order to be considered "equal" (sufficiently similar) to a reference recording
  - Default value is set to 0
  - Raising this number ignores all matches whose matching score (that is, the number of matching fingerprints) is too low
- referenceDirectory - The directory where the reference recordings are stored
  - The default is "../audio-ref" relative to intaQt's bin directory
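The constraints on the STFT parameters can be made concrete with a short Python sketch. It assumes the standard textbook definitions of the Hann, Hamming and Blackman windows and treats "No averaging" as a rectangular window of ones; whether intaQt uses exactly these formulas is not stated here, and the helper names `window_coefficients` and `frame_count` are hypothetical.

```python
import math


def window_coefficients(window_type, size):
    """Coefficients for stftWindowType 0 (Hann), 1 (Hamming), 2 (Blackman), 3 (none)."""
    phase = [2 * math.pi * n / (size - 1) for n in range(size)]
    if window_type == 0:
        return [0.5 - 0.5 * math.cos(p) for p in phase]
    if window_type == 1:
        return [0.54 - 0.46 * math.cos(p) for p in phase]
    if window_type == 2:
        return [0.42 - 0.5 * math.cos(p) + 0.08 * math.cos(2 * p) for p in phase]
    if window_type == 3:
        return [1.0] * size
    raise ValueError(f"unknown stftWindowType: {window_type}")


def frame_count(signal_length, window_size=1024, overlap=992):
    """Number of STFT windows that fit into a signal, enforcing the documented limits."""
    if overlap >= window_size:
        raise ValueError("stftOverlap must be smaller than stftWindowSize")
    if signal_length < window_size:
        raise ValueError("signal must be at least stftWindowSize samples long")
    hop = window_size - overlap  # samples between the starts of two windows
    return 1 + (signal_length - window_size) // hop
```

With the default values, consecutive windows start 1024 - 992 = 32 samples apart, so `frame_count(8000)` gives 219 windows for an 8000-sample signal.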
Important! This is an experimental feature. It is highly recommended to use the default mathematical matching configuration parameters for the AudioMatcher.
Example
    AudioMatcher = {
      profiles = {
        "default" = {
          stftWindowSize: 1024,
          stftOverlap: 992,
          stftWindowType: 0,
          afpAmplitudeCap: 30.0,
          afpPairingRange: 10
        },
        "profile2" = {
          stftWindowSize: 1024,
          stftOverlap: 992,
          stftWindowType: 0,
          afpAmplitudeCap: 30.0,
          afpPairingRange: 20
        },
        ...
      },
      minimumConfidence: 0
      referenceDirectory : "../audio-ref"
    }
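The two profiles in this example differ only in afpPairingRange. As a rough illustration of its effect, the Python sketch below counts the pairs formed when each peak is paired with up to afpPairingRange of the peaks that follow it in sorted order. That pairing rule and the helper name `pair_count` are assumptions made for illustration; the actual pairing strategy may differ.

```python
def pair_count(peak_count, pairing_range):
    """Pairs formed when each peak is paired with up to `pairing_range` later peaks."""
    return sum(min(pairing_range, peak_count - 1 - i) for i in range(peak_count))


# Hypothetical recording with 100 detected peaks.
print(pair_count(100, 10))  # pairing range of the "default" profile
print(pair_count(100, 20))  # pairing range of "profile2"
```

Under this assumption, 100 peaks produce 945 fingerprints with a range of 10 and 1790 with a range of 20, so doubling the range roughly doubles the number of fingerprints.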