Speech Recognition (benchmark of "yes"/"no"):
C version/Console:
Redesigned volume normalisation to improve clarity above 500 Hz and within each frame. Samples are now run through a Hilbert filter to reduce bass below 500 Hz, and peak volume normalisation is now applied at the end of each frame instead of once across all frames.
The new Hilbert filter and frame-based peak volume normalisation address these issues:
1: Deep voices no longer interfere with or reduce the volume normalisation of frequencies above 500 Hz.
2: The initial frame is now normalised to itself. Each later frame is normalised to the peak volume of the current or previous frames, whichever is louder (the peak never adjusts downwards). Since vowels are detected after consonants and are generally louder, both are now maximally normalised.
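A minimal sketch of the frame-based peak normalisation described above, assuming the peak is the largest absolute sample value and the gain scales that peak to 1.0. PeakNorm, peaknorm_init() and peaknorm_frame() are hypothetical names.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical running state: loudest absolute sample seen so far. */
typedef struct {
    double peak; /* never decreases during a recording */
} PeakNorm;

static void peaknorm_init(PeakNorm *s) { s->peak = 0.0; }

/* Scale one frame in place so the running peak maps to 1.0.
   The first frame is normalised to its own peak; each later frame uses
   max(previous running peak, this frame's peak). */
static void peaknorm_frame(PeakNorm *s, double *frame, size_t len)
{
    double frame_peak = 0.0;
    for (size_t i = 0; i < len; i++) {
        double a = fabs(frame[i]);
        if (a > frame_peak)
            frame_peak = a;
    }
    if (frame_peak > s->peak)
        s->peak = frame_peak; /* the peak only ever rises */
    if (s->peak <= 0.0)
        return; /* silence so far: leave the frame untouched */

    double gain = 1.0 / s->peak;
    for (size_t i = 0; i < len; i++)
        frame[i] *= gain;
}
```

Call peaknorm_init() once per recording, then peaknorm_frame() on each frame in order. Because the stored peak never falls, a quieter consonant at the start is scaled to full range, and a louder vowel that follows raises the peak and is scaled to full range as well.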
Removed the 1.5x boost on the first frame.
Updated consonant formants and identification.
Redesigned plosive detection. In frame one, a plosive ("k"/"p"/"t") must reach a fixed minimum loudness. From frame two onwards, a plosive ("n"/"m"/"y"/"r") must reach a minimum loudness of ~50% of the last plosive minimum plus ~50% of the power of the previous frame.
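From frame two onwards the threshold is a 50/50 blend: threshold = 0.5 × last plosive minimum + 0.5 × previous frame power. A minimal sketch, assuming "last plosive minimum" means the threshold used on the previous frame, and that loudness and power are already measured per frame on the same scale. FIRST_FRAME_MIN (and its value 0.3), PlosiveState and plosive_check() are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical fixed minimum loudness for a frame-one plosive (assumption). */
#define FIRST_FRAME_MIN 0.3

/* Hypothetical per-recording state for the plosive check. */
typedef struct {
    double last_min;         /* threshold used on the previous frame (assumed meaning) */
    double last_frame_power; /* power measured on the previous frame */
    int frame_index;         /* 0 for the first frame */
} PlosiveState;

/* Returns true if this frame's loudness reaches the plosive minimum.
   Frame one uses the fixed minimum; later frames use
   0.5 * last minimum + 0.5 * previous frame power. */
static bool plosive_check(PlosiveState *s, double loudness, double frame_power)
{
    double min = (s->frame_index == 0)
                     ? FIRST_FRAME_MIN
                     : 0.5 * s->last_min + 0.5 * s->last_frame_power;
    bool hit = loudness >= min;

    /* Carry this frame's values forward for the next check. */
    s->last_min = min;
    s->last_frame_power = frame_power;
    s->frame_index++;
    return hit;
}
```

Blending the previous threshold with the previous frame's power lets the minimum track the recording's level instead of staying fixed after the first frame.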
Updated vowel F1 and F2 transition detection to gradually favour frequencies towards the end of the sound.
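The notes don't give the weighting curve. A minimal sketch, assuming a linear ramp: frame i of the sound gets weight i + 1, so the weighted mean of per-frame F1 (or F2) estimates leans towards where the vowel ends up. formant_end_weighted() is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical helper: weighted mean of per-frame formant estimates (Hz).
   Weights rise linearly (frame i gets weight i + 1), so later frames
   count more than earlier ones. Returns 0.0 for an empty input. */
static double formant_end_weighted(const double *formant_hz, size_t frames)
{
    double num = 0.0, den = 0.0;
    for (size_t i = 0; i < frames; i++) {
        double w = (double)(i + 1);
        num += w * formant_hz[i];
        den += w;
    }
    return den > 0.0 ? num / den : 0.0;
}
```

For example, estimates of 300 Hz and 600 Hz over two frames give (1 × 300 + 2 × 600) / 3 = 500 Hz, rather than the plain mean of 450 Hz.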
Range tuning: better plosive detection for the consonants "k", "n" and "y", catching both sudden and rolling (gradual) increases in sound at any input volume range and noise level.
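A minimal sketch of detecting both kinds of increase, assuming a per-frame energy track is available. It compares ratios of energies rather than differences, which makes the test independent of the input volume. SUDDEN_RATIO, ROLLING_RATIO, ROLLING_SPAN and onset_at() are hypothetical, and their values are arbitrary.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical thresholds (assumptions, not tuned values). */
#define SUDDEN_RATIO  2.0 /* >= 2x jump from the previous frame */
#define ROLLING_RATIO 2.0 /* >= 2x rise across the rolling window */
#define ROLLING_SPAN  3   /* rolling window length in frames */

/* Returns true if energy at frame t jumps sharply from frame t-1, or rises
   steadily (never falling) over the last ROLLING_SPAN frames by at least
   ROLLING_RATIO. Ratios keep the result independent of input volume. */
static bool onset_at(const double *energy, size_t t)
{
    const double eps = 1e-12; /* avoids division by zero on silence */

    /* Sudden increase: one large step. */
    if (t >= 1 && energy[t] / (energy[t - 1] + eps) >= SUDDEN_RATIO)
        return true;

    /* Rolling increase: every step non-decreasing, large overall rise. */
    if (t >= ROLLING_SPAN) {
        for (size_t k = t - ROLLING_SPAN + 1; k <= t; k++)
            if (energy[k] < energy[k - 1])
                return false;
        return energy[t] / (energy[t - ROLLING_SPAN] + eps) >= ROLLING_RATIO;
    }
    return false;
}
```

Scaling the whole energy track by any positive constant leaves the result unchanged (apart from the tiny eps guard), which is one way to meet the "any input sound volume range" goal.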
Other fixes.
Speech Commands Benchmark:
NO (405 records):
"no" = 48.40% (goal 50%). "n", "oh", "o", "uh" = 82% (goal 90%).
Error | "yes" = 0.99%
YES (419 records):
"ye"/"yer"/"yeah" = 28.64% (goal 50%). "y", "e", "er", "air", "ah", "a", "uh" = 70-80% (goal 90%).
Error | "no" = 2.15%
Time: 1-2 ms per record
+4% for "no" and +6% for "yes". Consonant identification needs to be reworked, as the definitions of "y" and "n" are too close. "kay"/"key"/"kai" works.
The benchmark is more a test of noise rejection and volume normalisation than anything else. Very happy with how robust this is now.
The new consonant identification should give at least a 10% improvement for "yes" with less error. If the error rate stays at or below about 1%, more alternate vowels can be accepted and a higher result is possible.

