Real-time ASR Dev Blog: December 2021

Friday, 3 December 2021

Speech Recognition:

I increased the number of IIR resonator filters from three to eight, and now have 50-100% more SNR with more stable, reproducible numbers, compared to the 20% SNR before.

The numbers in the picture below represent the Complexity (busyness) of the signal at specific frequencies. For different vowels/consonants the busyness should change greatly depending on frequency. However, the filter bands currently overlap, so they're mostly all equally busy. Increasing the Q of the IIR resonator filters should produce more isolation between bands, so I will be changing that.
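To illustrate why a higher Q should help, here is a minimal sketch of how much a neighbouring frequency leaks into one band at different Q values. It is not my actual filter code: it assumes a standard band-pass biquad (the RBJ "Audio EQ Cookbook" form with 0 dB peak gain), a 16 kHz sample rate, and a hypothetical neighbouring frequency of 650 Hz. Only the 300 Hz centre comes from the numbers in this post.

```python
import cmath
import math


def bandpass_coeffs(f0, q, fs):
    """Band-pass biquad (RBJ cookbook form, 0 dB peak gain at f0)."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    # Coefficients normalised so that a[0] == 1.
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a


def gain_at(b, a, f, fs):
    """Magnitude of the filter's frequency response at f Hz."""
    z1 = cmath.exp(-1j * 2.0 * math.pi * f / fs)  # z^-1 on the unit circle
    z2 = z1 * z1                                  # z^-2
    num = b[0] + b[1] * z1 + b[2] * z2
    den = a[0] + a[1] * z1 + a[2] * z2
    return abs(num / den)


def leakage_db(f_centre, f_other, q, fs=16000.0):
    """Gain in dB that the band centred on f_centre gives to f_other.

    fs=16000 is an assumed sample rate, not a value from the post.
    """
    b, a = bandpass_coeffs(f_centre, q, fs)
    return 20.0 * math.log10(gain_at(b, a, f_other, fs))


if __name__ == "__main__":
    # 650 Hz is a hypothetical neighbouring band, chosen for illustration.
    for q in (2.0, 8.0, 32.0):
        db = leakage_db(300.0, 650.0, q)
        print(f"Q={q:5.1f}  650 Hz seen by the 300 Hz band: {db:6.1f} dB")
```

The more negative the number, the less the 300 Hz band responds to energy that belongs to its neighbour. The trade-off is that a higher Q also makes the filter ring for longer, so it reacts more slowly to changes in the signal.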

cF1 is 300 Hz ... cF8 is 2750 Hz.

Frame size is 20 ms, and the processing only takes 1-3 ms of that 20 ms (a 5-15% duty cycle), so the thread is asleep 85-95% of the time.
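The arithmetic behind those percentages, plus a rough sketch of a frame loop that sleeps for whatever is left of each frame, might look like this. This is an illustration only and not my real-time thread: `sleep_fraction`, `run_frames` and `process_frame` are hypothetical names, and Python's `time.sleep` is nowhere near as precise as a proper audio callback.

```python
import time

FRAME_S = 0.020  # 20 ms frame, as stated in the post


def sleep_fraction(busy_ms, frame_ms=20.0):
    """Fraction of each frame the thread spends asleep."""
    return 1.0 - busy_ms / frame_ms


def run_frames(process_frame, n_frames, frame_s=FRAME_S):
    """Call process_frame once per frame, then sleep until the next frame.

    Returns the total time in seconds spent inside process_frame.
    """
    busy = 0.0
    next_deadline = time.monotonic() + frame_s
    for _ in range(n_frames):
        start = time.monotonic()
        process_frame()
        busy += time.monotonic() - start
        # Sleep only for what is left of this frame, if anything.
        remaining = next_deadline - time.monotonic()
        if remaining > 0:
            time.sleep(remaining)
        next_deadline += frame_s
    return busy


if __name__ == "__main__":
    for busy_ms in (1.0, 3.0):
        pct = 100.0 * sleep_fraction(busy_ms)
        print(f"{busy_ms:.0f} ms busy per 20 ms frame: asleep {pct:.0f}% of the time")
```

Tracking an absolute deadline, rather than sleeping a fixed 17-19 ms each time, keeps the frames from drifting when one frame takes longer than usual.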

I am now also using a program called Praat to analyse my voice, so I can see which frequencies should be the busiest for each vowel/consonant.

I also found out a couple of other things about accents that could make vowel/consonant switching possible, but that depends on getting a clearer output first.