Real-time ASR Dev Blog

Sunday, 15 October 2023

Speech Recognition (benchmark of "yes"/"no"):

C version/Console:
Custom wave file loader for benchmarking
Benchmarking a target word
Noise Floor now based on RMS instead of Volume Peak.
Noise Floor Raise 'step' changed from fixed value to 2x current noise floor RMS.
Voice volume minimum changed to 5x current noise floor RMS (range: 3x - 6x).
Removed one-frame "click/pop" noise.
Combined Vol Peak Normalisation with FFT loading for a speed improvement.
Updated Vowel F1 & F2 transition analysis
Updated Vowel frequency group definitions. First three: 281, 375, 468 Hz, ...
Updated Equal Loudness
Updated Consonant identification
Other fixes

Speech Commands Benchmark:
"no" (405 records):
Recognised as    BEFORE    AFTER     GOAL
"no"             9.14%     36.30%    50%
"n" or "oh"      36.5%     86.9%     90%

Time: 0-1 ms per record.

Very tough benchmark. Many samples are fraught with noise (blips, clicks/pops, static, paper sounds). Switching the noise floor from Peak to RMS alone made a large change to detection, and redesigning consonant identification made another.

More work on noise elimination is needed, and "yes" still needs testing.