Friday, 5 March 2021

NLP / WIC Benchmark:

Increased processing speed by adding Binary Searching/Indexing.

Huge results: processing time dropped from 2600ms to 930ms.

Also added it to Spell Checking (800 words) and Words (3700 words).

The original word lists are unsorted, so they are hashed and sorted at run-time.
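
A minimal sketch of the hash, sort, and binary search idea, in Python for brevity. The hash function (CRC32), the function names, and the sample words are all assumptions for illustration; the log does not say which hash or data layout the real code uses.

```python
import zlib
from bisect import bisect_left

def word_hash(word):
    # Hypothetical hash choice: CRC32 of the lowercased UTF-8 bytes.
    return zlib.crc32(word.lower().encode("utf-8"))

def prepare(words):
    # One-off prepare step: hash every word, then sort the hashes.
    return sorted(word_hash(w) for w in words)

def contains_linear(hashes, word):
    # Linear search: compare the target hash against each entry in turn.
    target = word_hash(word)
    for h in hashes:
        if h == target:
            return True
    return False

def contains_binary(hashes, word):
    # Binary search: only valid because prepare() sorted the hashes.
    target = word_hash(word)
    i = bisect_left(hashes, target)
    return i < len(hashes) and hashes[i] == target

word_list = ["zebra", "apple", "mango", "kiwi"]  # unsorted sample list
hashes = prepare(word_list)
print(contains_binary(hashes, "Mango"))
```

Comparing hashes alone can report a false match if two different words collide, so a stricter version would also compare the original strings on a hit.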

Processing 5428 sentences x2:
Before: 2600ms
After: 76ms prepare time.
After: linear searching the hash lists: 1700ms (900ms faster)
After: binary searching the hash lists: 930ms (1670ms faster)

There are other processes, but for the spell/word search alone, Hashed/Linear gave a 1.5x speed improvement and Hashed/Binary gave a 2.8x improvement.
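
The ratios follow directly from the timings listed above. A small sketch of the arithmetic (the function names are just for illustration, and whether the 76ms prepare time is already included in the "After" figures is not stated, so it is left out here):

```python
# Timings copied from the log above (milliseconds).
BEFORE_MS = 2600
LINEAR_MS = 1700
BINARY_MS = 930

def speedup(before_ms, after_ms):
    # Ratio of old time to new time.
    return before_ms / after_ms

def saved(before_ms, after_ms):
    # Absolute time saved in milliseconds.
    return before_ms - after_ms

for label, after in (("Hashed/Linear", LINEAR_MS), ("Hashed/Binary", BINARY_MS)):
    print(f"{label}: {saved(BEFORE_MS, after)}ms faster, {speedup(BEFORE_MS, after):.1f}x")
# prints:
# Hashed/Linear: 900ms faster, 1.5x
# Hashed/Binary: 1670ms faster, 2.8x
```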

Monday, 1 March 2021

NLP / WIC Benchmark:

There are now 3700 words (+1400) and 900 WIC pattern sentences (+800). Spell-checking has been re-added, so the full WIC test takes about 2.5 seconds to complete.

The scale and pickup are actually immense. Each of the 900 WIC pattern sentences has 4-10 Symbolic Words, and each Symbolic Word represents 10-500 words. So each of the 900 WIC sentences can pick up a very high number of variations.
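
A rough sketch of how large that number can get, assuming every Symbolic Word slot varies independently and every combination counts as a distinct variation (an assumption, since the real patterns may not allow all combinations):

```python
from math import prod

def variations(slot_sizes):
    # Number of concrete sentences one pattern can match:
    # the product of the word counts behind each Symbolic Word.
    return prod(slot_sizes)

# Bounds taken from the ranges quoted above.
smallest = variations([10] * 4)    # 4 Symbolic Words, 10 words each
largest = variations([500] * 10)   # 10 Symbolic Words, 500 words each

print(smallest)          # 10000
print(f"{largest:.2e}")  # 9.77e+26
```

Under these assumptions a single pattern covers somewhere between ten thousand and roughly 10^27 sentences.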

One side effect is that I'll need to drop the old "Intention" categories used for the chatbot and use these new WIC categories instead, as they pick up a much better variety. There are about 50 different groups (I will be merging some), along the lines of:

"person or thing started to move / person or thing has him..."
"the object/concept of a had-thing"
"had the concept when..."
"a motion was taken / apply a rule / have-take the concept-chance to..."
"i play/avoid the / objects moved/ordered/fell to the..."
"logic-action an object"
"moving-action the object"
"an object of objects / vivid objects/objectives of"

These will be better used with the chatbot.

Speech Recognition (benchmark of "yes"/"no"): C version/Console: Redesigned Volume Normalisation to improve clarity abov...