Use: Synonyms, homophones, sentence start, missing word, intention/direction, chatbot.
Type: Word & Sentence Compression Pattern Matching. Dataless, non-statistical algorithm.
Features:
Pros: Offline, no data, no training, no learning, instant response, white-box, multiple languages, name & location privacy (unrecognised words are invisible).
Cons: Small word dictionary (fewer than 5000 words). Homophone matching is less accurate.
Speed: 5500 sentence comparison tests in ~500ms.
Requirement: Windows, Linux, or MCU.
Language: C
Definition:
The NLP type is a word and sentence compressor.
Words are first matched against a basic spell-correction dictionary, then against a word dictionary. A match in the word dictionary returns three 8-bit character symbols: one for broad sentence matching, and two more for context and uniqueness. There are 49 word groups in total for the first symbol, and up to 256 values each for the second and third.
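The two-stage lookup described above can be sketched as follows. This is a minimal illustration, not the library's actual code: the struct layout, the table contents, the symbol values, and the name `lookup_word` are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical entry layout: three 8-bit symbols per dictionary word. */
typedef struct {
    const char *word;
    uint8_t group;    /* first symbol: broad word group (1 of 49) */
    uint8_t context;  /* second symbol: context */
    uint8_t unique;   /* third symbol: uniqueness within the group */
} WordEntry;

/* Hypothetical spell-correction pairs: misspelling -> dictionary spelling. */
typedef struct { const char *wrong; const char *right; } SpellFix;

static const SpellFix SPELL[] = {
    { "helo", "hello" },
    { "wher", "where" },
};

/* Hypothetical word dictionary; symbol values are invented for the example. */
static const WordEntry WORDS[] = {
    { "hello", 'G', 1, 1 },
    { "hi",    'G', 1, 2 },
    { "where", 'Q', 2, 1 },
    { "shop",  'P', 3, 1 },
    { "store", 'P', 3, 2 },
};

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Returns the entry for a word, or NULL if the word is unrecognised. */
const WordEntry *lookup_word(const char *word)
{
    size_t i;

    /* Stage 1: replace a known misspelling with its dictionary spelling. */
    for (i = 0; i < COUNT(SPELL); i++) {
        if (strcmp(word, SPELL[i].wrong) == 0) {
            word = SPELL[i].right;
            break;
        }
    }
    /* Stage 2: look the word up in the word dictionary. */
    for (i = 0; i < COUNT(WORDS); i++) {
        if (strcmp(word, WORDS[i].word) == 0)
            return &WORDS[i];
    }
    return NULL;
}
```

In this sketch "shop" and "store" share the same first symbol, so they are interchangeable for sentence matching, while the third symbol still tells them apart. A name such as "Alice" returns NULL and never enters the symbol stream.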
Sentences are made of 4-10 one-character symbols, where each symbol is 1 of 49 options and each option covers hundreds to thousands of words. This means a single symbol sequence can match hundreds of millions of sentences with similar meaning. These symbol sequences are grouped and stored in a pre-defined list, which compresses each sentence to an intention symbol.
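A rough sketch of the sentence step, under stated assumptions: the group table, the pattern list, the symbol letters, and the function names `compress_sentence` and `sentence_intention` are hypothetical and only illustrate the idea of reducing words to symbols and symbols to an intention.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical word -> group-symbol table (the real one has 49 groups). */
typedef struct { const char *word; char group; } GroupEntry;
static const GroupEntry GROUPS[] = {
    { "where", 'Q' }, { "is", 'B' }, { "the", 'D' },
    { "shop", 'P' }, { "store", 'P' }, { "market", 'P' },
};

/* Hypothetical pre-defined list: symbol sequence -> intention symbol. */
typedef struct { const char *pattern; char intention; } Pattern;
static const Pattern PATTERNS[] = {
    { "QBDP", 'L' },   /* "where is the <place>" -> asking for a location */
};

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Returns the group symbol of a word, or 0 if the word is unrecognised. */
static char group_of(const char *word)
{
    size_t i;
    for (i = 0; i < COUNT(GROUPS); i++)
        if (strcmp(word, GROUPS[i].word) == 0)
            return GROUPS[i].group;
    return 0;
}

/* Writes one symbol per recognised word into out; returns the symbol count. */
size_t compress_sentence(const char *sentence, char *out, size_t out_size)
{
    char buf[128];
    size_t n = 0;
    char *tok;

    if (out_size == 0)
        return 0;
    strncpy(buf, sentence, sizeof(buf) - 1);   /* strtok modifies its input */
    buf[sizeof(buf) - 1] = '\0';

    for (tok = strtok(buf, " "); tok != NULL && n + 1 < out_size;
         tok = strtok(NULL, " ")) {
        char g = group_of(tok);
        if (g != 0)
            out[n++] = g;   /* unrecognised words are skipped */
    }
    out[n] = '\0';
    return n;
}

/* Returns the intention symbol for a sentence, or 0 if no pattern matches. */
char sentence_intention(const char *sentence)
{
    char symbols[16];
    size_t i;

    compress_sentence(sentence, symbols, sizeof(symbols));
    for (i = 0; i < COUNT(PATTERNS); i++)
        if (strcmp(symbols, PATTERNS[i].pattern) == 0)
            return PATTERNS[i].intention;
    return 0;
}
```

Here "where is the shop", "where is the store" and "where is the market" all compress to the same symbol sequence and therefore the same intention. As a back-of-envelope check on the coverage claim, if each group held an assumed 150 words, one four-symbol sequence would cover 150^4 = 506,250,000 word combinations.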
For a chatbot, the developer may use one-character intentions, with multiple one-character word contexts in lists, to cover practically all spoken interactions with a good level of attentiveness to the original sentence and a white-box response. Output can be further varied by using a lookup table to randomise the text or audio response. Around 50-100 text literals or audio recordings is a good number to cover a range of sentence intentions. This makes changing or creating chatbot personalities very easy.
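The response lookup with randomised alternates might look like the sketch below. The table contents, the `ResponseRow` layout, and the name `pick_response` are assumptions for illustration; a real table would hold the 50-100 lines mentioned above, or indices into recorded audio clips.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_ALTERNATES 3

/* Hypothetical response table: one row per intention symbol. */
typedef struct {
    char intention;
    const char *alternates[MAX_ALTERNATES];
    size_t count;   /* number of alternates actually filled in */
} ResponseRow;

static const ResponseRow RESPONSES[] = {
    { 'G', { "Hello!", "Hi there.", "Good to see you." }, 3 },
    { 'L', { "It is just down the road.", "Follow me, I will show you." }, 2 },
};

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Picks one alternate at random for the intention.
   Returns NULL when the intention has no row in the table. */
const char *pick_response(char intention)
{
    size_t i;
    for (i = 0; i < COUNT(RESPONSES); i++) {
        if (RESPONSES[i].intention == intention)
            return RESPONSES[i].alternates[(size_t)rand() % RESPONSES[i].count];
    }
    return NULL;
}
```

Swapping in a different `RESPONSES` table is enough to change the personality, since the intention symbols stay the same. Call `srand` once at start-up if the alternates should differ between runs.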
For chatbots in realtime environments, it solves the following problems:
- Too much processing power required.
- Cannot change the personality/no personality.
- Cannot change the language/only one language.
- Chatbot fails to identify the intention of the user.
- Chatbot returns poor responses: only pre-determined responses are used.
- Chatbot has terrible voice synthesis: a voice actor can record all lines, including randomised alternates.
Applications:
Word in Context testing (intention/direction):
- Comparing two identical words in two different sentences to determine whether the meaning is similar.
Chatbot/NPC companion (white-box):
- NLP Structure: Words are compressed into Sentences, and Sentences are compressed into Intentions. Two generalised words are kept as Topic data.
- Chatbots: Intentions & Topic data are used in lists that map to a list of chatbot responses. Chatbot responses may be written/recorded in different languages, use randomised alternates, and switch between languages at any time (per sentence).
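One way to picture the association between Intentions, Topic data and responses is a rule list with one text column per language. This is only a sketch: the `Rule` layout, the language ids, the wildcard convention, and the name `respond` are assumptions, and it uses a single topic symbol rather than two to keep the example short.

```c
#include <stddef.h>

/* Hypothetical language ids; the rule table has one text column per language. */
enum { LANG_EN = 0, LANG_FR = 1, LANG_COUNT = 2 };

/* Hypothetical rule: an intention symbol plus a topic symbol select a response.
   A rule topic of 0 acts as a wildcard that matches any topic. */
typedef struct {
    char intention;
    char topic;
    const char *text[LANG_COUNT];
} Rule;

/* Specific rules come before wildcard rules for the same intention. */
static const Rule RULES[] = {
    { 'L', 'P', { "The shop is down the road.", "Le magasin est au bout de la rue." } },
    { 'L', 0,   { "I am not sure where that is.", "Je ne sais pas." } },
    { 'G', 0,   { "Hello!", "Bonjour !" } },
};

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Returns the response text for (intention, topic) in the requested language,
   or NULL if no rule matches or the language id is out of range. */
const char *respond(char intention, char topic, int lang)
{
    size_t i;

    if (lang < 0 || lang >= LANG_COUNT)
        return NULL;
    for (i = 0; i < COUNT(RULES); i++) {
        if (RULES[i].intention == intention &&
            (RULES[i].topic == topic || RULES[i].topic == 0))
            return RULES[i].text[lang];
    }
    return NULL;
}
```

Because the language is an argument to each call, the reply language can change from one sentence to the next without touching the intention or topic matching.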