NLP

Use: Synonyms, homophones, sentence start, missing word, intention/direction, chatbot.

Type: Word & Sentence Compression Pattern Matching. Dataless, non-statistical algorithm.

 

Features:

Pros: Offline, no data, no training, no learning, instant response, white-box, multiple languages, name & location privacy (unrecognised words are invisible).

Cons: Small word dictionary size (<5000). Homophones less accurate.

Speed: 5500 sentence comparison tests in ~500ms.

Requirement: Windows/Linux. MCU.

Language: C


Definition:

The NLP type is a word and sentence compressor.

Each word is first matched against a basic spell-correction dictionary, then against a word dictionary. A match in the word dictionary returns three 8-bit symbols: one character symbol for broad sentence matching, and two further symbols for context and uniqueness. There are 49 word groups in total for the first symbol, and up to 256 values each for the second and third.
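The two-stage lookup described above could be sketched in C roughly as follows. This is a minimal illustration, not the project's actual code: the struct layout, the function names (spell_correct, lookup_word), the table contents and the symbol values are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Three 8-bit symbols per dictionary word (field names are assumptions). */
typedef struct {
    const char *word;
    uint8_t group;     /* broad sentence-matching symbol, 1 of 49 groups */
    uint8_t context;   /* context symbol, up to 256 values */
    uint8_t unique_id; /* uniqueness symbol, up to 256 values */
} WordEntry;

/* Hypothetical spell-correction table: misspelling -> corrected word. */
typedef struct {
    const char *wrong;
    const char *right;
} SpellFix;

static const SpellFix spell_fixes[] = {
    { "helo",   "hello"   },
    { "wether", "weather" },
};

/* Hypothetical word dictionary; the symbol values are made up. */
static const WordEntry word_dict[] = {
    { "hello",   'G', 1, 1 },
    { "hi",      'G', 1, 2 },
    { "weather", 'N', 7, 1 },
};

/* Stage 1: return the corrected spelling, or the input if no fix is known. */
const char *spell_correct(const char *w)
{
    for (size_t i = 0; i < sizeof spell_fixes / sizeof spell_fixes[0]; i++)
        if (strcmp(w, spell_fixes[i].wrong) == 0)
            return spell_fixes[i].right;
    return w;
}

/* Stage 2: return the dictionary entry, or NULL for an unrecognised word. */
const WordEntry *lookup_word(const char *w)
{
    const char *fixed = spell_correct(w);
    for (size_t i = 0; i < sizeof word_dict / sizeof word_dict[0]; i++)
        if (strcmp(fixed, word_dict[i].word) == 0)
            return &word_dict[i];
    return NULL;
}
```

In this sketch "hello" and "hi" share a group symbol, so they look identical for broad sentence matching, and only the uniqueness symbol tells them apart. A NULL result corresponds to an unrecognised word, which simply produces no symbol.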

Sentences are made of 4-10 one-character symbols, where each symbol is 1 of 49 options, each containing hundreds to thousands of words. This means one symbol sequence can match hundreds of millions of sentences with similar meaning (for example, four symbols covering 100 words each already span 100^4 = 100 million word combinations). These symbol sequences are grouped and stored in a pre-defined list, which compresses each sentence to an intention symbol.
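As a rough illustration of this sentence compression, the C sketch below reduces a sentence to one symbol per recognised word and then looks the symbol string up in a pre-defined list. The group letters, the pattern list, the intention symbols and the function names (group_of, compress_sentence, match_intention) are assumptions made for this example, and exact string matching is used only to keep it short.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical word -> group symbol table (the real system uses 49 groups). */
typedef struct {
    const char *word;
    char group;
} GroupEntry;

static const GroupEntry groups[] = {
    { "what", 'Q' },    { "which", 'Q' },
    { "is", 'B' },      { "are", 'B' },
    { "the", 'D' },     { "a", 'D' },
    { "time", 'T' },    { "hour", 'T' },
    { "weather", 'W' }, { "forecast", 'W' },
};

/* Hypothetical pre-defined list: sentence symbols -> intention symbol. */
typedef struct {
    const char *symbols;
    char intention;
} SentencePattern;

static const SentencePattern patterns[] = {
    { "QBDT", 't' }, /* asking for the time */
    { "QBDW", 'w' }, /* asking about the weather */
};

/* Return the group symbol of a word, or 0 if the word is unrecognised. */
char group_of(const char *word)
{
    for (size_t i = 0; i < sizeof groups / sizeof groups[0]; i++)
        if (strcmp(word, groups[i].word) == 0)
            return groups[i].group;
    return 0;
}

/* Write one symbol per recognised word into out; return the symbol count.
   Unrecognised words produce no symbol at all. */
size_t compress_sentence(const char *text, char *out, size_t out_size)
{
    char word[32];
    size_t n = 0, len = 0;

    if (out_size == 0)
        return 0;
    for (const char *p = text; ; p++) {
        if (isalpha((unsigned char)*p)) {
            if (len < sizeof word - 1)
                word[len++] = (char)tolower((unsigned char)*p);
        } else {
            if (len > 0) { /* end of a word: look it up */
                word[len] = '\0';
                char g = group_of(word);
                if (g != 0 && n < out_size - 1)
                    out[n++] = g;
                len = 0;
            }
            if (*p == '\0')
                break;
        }
    }
    out[n] = '\0';
    return n;
}

/* Return the intention symbol for a symbol string, or 0 if none matches. */
char match_intention(const char *symbols)
{
    for (size_t i = 0; i < sizeof patterns / sizeof patterns[0]; i++)
        if (strcmp(symbols, patterns[i].symbols) == 0)
            return patterns[i].intention;
    return 0;
}
```

With these tables, "What is the time?" compresses to "QBDT" and maps to intention 't'. "What is the weather in Zanzibar?" compresses to "QBDW", because "in" and "Zanzibar" are not in the table and are therefore skipped.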

For a chatbot, the developer may use one-character intentions, combined with multiple one-character word contexts in lists, to cover practically all spoken interactions with a good level of attentiveness to the original sentence and a white-box response. Output can be varied further by using a lookup table to randomise the text/audio response. A good number of text-literal/audio responses to cover a range of sentence intentions is 50-100, which makes changing or creating chatbot personalities very easy.
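A response lookup table with randomised alternates might look like the C sketch below. The table layout, the symbols, the response strings and the function name pick_response are hypothetical. The random value is passed in by the caller so the function itself stays deterministic.

```c
#include <stddef.h>

#define MAX_ALTERNATES 3

/* Hypothetical response table keyed by intention and one topic symbol. */
typedef struct {
    char intention; /* one-character intention symbol */
    char topic;     /* one-character word context, 0 = matches any topic */
    size_t count;   /* number of alternates actually filled in */
    const char *alternates[MAX_ALTERNATES];
} ResponseEntry;

/* Topic-specific entries come before the wildcard entry for an intention. */
static const ResponseEntry responses[] = {
    { 'g', 0,   2, { "Hello!", "Hi there." } },
    { 'w', 'R', 1, { "Take an umbrella." } },
    { 'w', 0,   2, { "Looks fine outside.", "No idea, I have no window." } },
};

/* Return one alternate for the first matching entry, or NULL if none.
   r is any random number supplied by the caller. */
const char *pick_response(char intention, char topic, unsigned r)
{
    for (size_t i = 0; i < sizeof responses / sizeof responses[0]; i++) {
        const ResponseEntry *e = &responses[i];
        if (e->intention == intention &&
            (e->topic == 0 || e->topic == topic))
            return e->alternates[r % e->count];
    }
    return NULL;
}
```

A caller would typically pass (unsigned)rand() from <stdlib.h> as r. For audio output, the strings could be file names of recorded lines instead of text. Swapping in a different table is then enough to change the personality or the language.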

For chatbots in real-time environments, it addresses the following problems:

  • Too much processing power required.
  • Cannot change the personality/no personality.
  • Cannot change the language/only one language.
  • Chatbot not identifying the intention of the user.
  • Chatbot returning poor responses - only pre-determined responses are used.
  • Chatbot has terrible voice synthesis - a voice actor can record all lines, including randomised alternates.
 

Applications: 

Word in Context testing (intention/direction):

  • Comparing two identical words in two different sentences to determine whether the meaning is similar.

Chatbot/NPC companion (whitebox):

  • NLP Structure: Words are compressed into Sentences, and Sentences are compressed into Intentions. Two generalised words exist as Topic data.
  • Chatbots: Intentions & Topic data are used in lists to associate with a list of chatbot responses. Chatbot responses may be written/recorded in different languages, use randomised alternates, and switch between languages at any time (per sentence).

Speech Recognition (benchmark of "yes"/"no"): C version/Console: Redesigned Volume Normalisation to improve clarity abov...