Real-time ASR Dev Blog: April 2023

Thursday, 13 April 2023

Speech Recognition / Phoneme Extraction Tool:

Now integrating into a phoneme extraction tool.
Added Frequency Spectrogram to visually output frequency groups/blocks.
Removed Hilbert @ 2000 Hz filter. It was causing some frequencies to be inconsistent.
Removed FIR LP @ 1000 Hz, along with the split samples and split FFT. This did improve frequencies slightly, but not enough to warrant the extra processing time and code complexity.
Updated Fletcher-Munson Curve/Equal loudness.
Updated frequency group ranges.
Updated/fixed a bug with FFT output.
Updated Vowel recognition.
Still no MFC (mel-frequency cepstrum).
Faster (1-5 ms per syllable).

Combined vowels "i" and "ee":
It's unable to tell the difference between the short vowel "i" and the long vowel "ee", as the FFT quantizes frequencies too coarsely to reliably detect a 20 Hz frequency drop. An MFC setup only for low frequencies may help, but the real solution is an FFT with higher resolution in that area (and less resolution in the upper frequency areas).
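
To put rough numbers on the resolution problem, here is a minimal sketch in plain Python. It is not the tool's code: the 16 kHz sample rate, the 512 and 4096 point FFT sizes, and the 300 Hz / 320 Hz test tones are all assumptions picked for illustration. The bin width is the sample rate divided by the FFT size, so two tones 20 Hz apart can only be separated by peak picking if the bins are narrower than that.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def hamming(n):
    """Symmetric Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def bin_width_hz(sample_rate, fft_size):
    """Spacing between adjacent FFT bins in Hz."""
    return sample_rate / fft_size


def peak_bin(freq_hz, sample_rate, fft_size):
    """Index of the strongest bin for a Hamming-windowed sine at freq_hz."""
    window = hamming(fft_size)
    frame = [
        window[i] * math.sin(2 * math.pi * freq_hz * i / sample_rate)
        for i in range(fft_size)
    ]
    spectrum = fft(frame)
    mags = [abs(c) for c in spectrum[: fft_size // 2 + 1]]
    return mags.index(max(mags))


SAMPLE_RATE = 16000  # assumed; the real capture rate is not stated
TONE_A, TONE_B = 300.0, 320.0  # hypothetical tones 20 Hz apart

for size in (512, 4096):
    print(
        size,
        bin_width_hz(SAMPLE_RATE, size),
        peak_bin(TONE_A, SAMPLE_RATE, size),
        peak_bin(TONE_B, SAMPLE_RATE, size),
    )
```

Under these assumptions the 512 point FFT has 31.25 Hz bins and both tones peak in the same bin, while the 4096 point FFT has roughly 3.9 Hz bins and the peaks separate. The catch is that a longer frame costs time resolution and processing, which is why higher resolution only in the low band is the more attractive fix.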

There is an error with the Hamming window, the sine wave sync/frame, or the FFT, which shows up as vertical grey bars in the spectrogram. Otherwise it's looking good.
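
One possible way to narrow this down, sketched in plain Python under assumed parameters (16 kHz sample rate, 512 sample frames, 256 sample hop, a 440 Hz test tone, none of which come from the tool itself): feed a steady sine through a Hamming-windowed frame-by-frame FFT and compare the energy of each spectrogram column. For a steady tone the columns should be nearly identical, so a column that stands out points at the framing, window, or FFT stage rather than at the audio.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def hamming(n):
    """Symmetric Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def spectrogram(samples, frame_size, hop):
    """Magnitude spectrum of each Hamming-windowed frame (one list per column)."""
    window = hamming(frame_size)
    columns = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_size)]
        spectrum = fft(frame)
        columns.append([abs(c) for c in spectrum[: frame_size // 2 + 1]])
    return columns


def column_energy_spread(columns):
    """Relative gap between the strongest and weakest column energy (0 = uniform)."""
    energies = [sum(m * m for m in col) for col in columns]
    return (max(energies) - min(energies)) / max(energies)


SAMPLE_RATE = 16000  # assumed
FRAME_SIZE = 512     # assumed
HOP = 256            # assumed

# Steady 440 Hz tone, 8192 samples long.
tone = [math.sin(2 * math.pi * 440.0 * i / SAMPLE_RATE) for i in range(8192)]
clean_spread = column_energy_spread(spectrogram(tone, FRAME_SIZE, HOP))

# Simulate a framing bug: one whole frame of samples is lost (zeroed).
glitched = list(tone)
for i in range(4096, 4096 + FRAME_SIZE):
    glitched[i] = 0.0
glitch_spread = column_energy_spread(spectrogram(glitched, FRAME_SIZE, HOP))

print(clean_spread, glitch_spread)
```

The clean tone should give a spread close to zero, while the zeroed frame pushes it to 1.0. This only detects that a column is off; it does not say which of the three stages is at fault, so the next step would be swapping each stage for a known-good version in turn.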

Testing sounds "kay", "key", "kai".