Real-time ASR Dev Blog

Monday, 10 July 2023

Speech Recognition / Phoneme Extraction Tool:

Updated equal-loudness
Reduced the dynamic noise floor's step up/down amount
Lowered initial volume pick up
Replaced Inverse Blackman window with a 2x amplified Hann window. Similar quality, large speed gain.
Improved consonant framing (plosive find)
Redesigned consonant identification
Added vowel formant 1 focusing. Frequencies starting at or above 656 Hz are now solely detected as transitioning to at or under this value (to avoid interference from a strong nasal tone).
Reduced nasal tone volume (875-1000 Hz) by half
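
For readers who want to try two of the changes above, here is a minimal sketch of a 2x amplified Hann window and of halving the 875-1000 Hz band. It is not the tool's actual code: the function names, the symmetric window form, and the idea of applying the cut to FFT magnitude bins are all assumptions for illustration. One plausible reason for the 2x gain is that a plain Hann window averages about 0.5, so doubling it brings the mean gain back to roughly 1.

```python
import math


def hann_window(n, gain=2.0):
    """Return an n-point symmetric Hann window scaled by `gain`.

    The gain=2.0 default mirrors the "2x amplified" note; the exact form
    used by the tool is not stated, so this is an assumption.
    """
    if n < 1:
        return []
    if n == 1:
        return [gain]
    return [
        gain * 0.5 * (1.0 - math.cos(2.0 * math.pi * i / (n - 1)))
        for i in range(n)
    ]


def attenuate_band(magnitudes, sample_rate, fft_size,
                   lo_hz=875.0, hi_hz=1000.0, factor=0.5):
    """Scale magnitude bins whose centre frequency lies in [lo_hz, hi_hz].

    `magnitudes` is assumed to hold one value per FFT bin, where bin k
    sits at k * sample_rate / fft_size Hz (hypothetical layout).
    """
    bin_hz = sample_rate / fft_size
    return [
        m * factor if lo_hz <= k * bin_hz <= hi_hz else m
        for k, m in enumerate(magnitudes)
    ]


if __name__ == "__main__":
    window = hann_window(512)
    print("window peak:", max(window))

    flat = [1.0] * 257  # flat spectrum, 512-point FFT at 16 kHz
    cut = attenuate_band(flat, sample_rate=16000, fft_size=512)
    print("bins halved:", [k for k, m in enumerate(cut) if m != 1.0])
```

With a 16 kHz sample rate and a 512-point FFT each bin is 31.25 Hz wide, so the band covers bins 28 through 32. A hard-edged cut like this is the simplest option; a tapered edge may sound smoother but is outside what the notes describe.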

Working:
"kay, kee, kai"
"no" x3 variations