Available Extensions
A list of supporting tools that I officially maintain
Detectors
Emphasis
Detects the emphasis (if any) on each token. It is based on a dictionary of emphasis tokens and the dependency parser.
Negation
Detects whether a token (an adjective, adverb, verb, etc.) is negated. It is based on a dictionary and the dependency parser.
Sentiment
Detects the positive and negative sentiment of tokens. For example, in `That was a good show`, the token `show` has a positive sentiment. Each token is given a numerical value representing its sentiment: a negative number means a negative sentiment, as in `that was a bad show`; a positive number means a positive sentiment; and `0` means the sentence carries no sentiment description of the token.
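As a rough illustration of how a dictionary and a dependency tree can combine into per-token scores, here is a minimal sketch. It is not the detector's actual API: the `LEXICON` entries, the `heads` array (which would normally come from a dependency parser) and the rule of passing a modifier's score on to its head are all assumptions made for this example.

```typescript
// Hypothetical lexicon: these words and scores are illustrative only,
// not the dictionary the Sentiment detector ships with.
const LEXICON = new Map<string, number>([
  ["good", 1],
  ["great", 2],
  ["bad", -1],
  ["awful", -2],
]);

// heads[i] is the index of the token that token i depends on (-1 for the root).
// A dependency parser would normally supply this; here it is written by hand.
function scoreTokens(tokens: string[], heads: number[]): number[] {
  const scores: number[] = tokens.map(() => 0);
  tokens.forEach((token, i) => {
    const value = LEXICON.get(token.toLowerCase()) ?? 0;
    if (value === 0) return;
    // The sentiment word keeps its own score...
    scores[i] = (scores[i] ?? 0) + value;
    // ...and (assumption) passes it on to the token it modifies.
    const head = heads[i] ?? -1;
    if (head >= 0 && head < tokens.length) {
      scores[head] = (scores[head] ?? 0) + value;
    }
  });
  return scores;
}

// "That was a good show": "good" (index 3) modifies "show" (index 4).
const showScores = scoreTokens(["That", "was", "a", "good", "show"], [1, -1, 4, 4, 1]);
console.log(showScores[4]); // score attached to "show"
```

With this hand-written tree, `show` ends up with a positive score; swapping `good` for `bad` makes it negative, and tokens with no sentiment word attached stay at `0`.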
Sentence Type
Each sentence might be declarative, exclamatory, interrogative or imperative. This detector analyzes the sentence and gives its type based on the tokens, tags and dependency tree.
URLs and Emails
When a sentence contains emails, URLs or IP addresses, they can confuse the lexer, so both the POS tagger and the dependency parser give inaccurate results. This detector adds pre- and post-processors that make the lexer treat each of them as one word.
In addition, it can also be used to detect URLs, emails and IP addresses.
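To make the detection part concrete, here is a minimal regex-based sketch. It assumes nothing about the detector's real implementation: `detectEntities`, `DetectedEntity` and the patterns are hypothetical and deliberately simple (no validation of IP octet ranges, and trailing punctuation may be swallowed by a URL).

```typescript
type EntityKind = "email" | "url" | "ip";

interface DetectedEntity {
  kind: EntityKind;
  text: string;
  index: number;
}

// One alternation, so each span is reported once
// (an IP address inside a URL counts as part of the URL).
const ENTITY_PATTERN =
  /([\w.+-]+@[\w-]+(?:\.[\w-]+)+)|(https?:\/\/[^\s<>"']+)|(\b(?:\d{1,3}\.){3}\d{1,3}\b)/g;

function detectEntities(sentence: string): DetectedEntity[] {
  const found: DetectedEntity[] = [];
  // Fresh copy of the regex so no lastIndex state is shared between calls.
  const pattern = new RegExp(ENTITY_PATTERN.source, "g");
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(sentence)) !== null) {
    // Capture group 1 = email, group 2 = URL, otherwise the IP branch matched.
    const kind: EntityKind = match[1] ? "email" : match[2] ? "url" : "ip";
    found.push({ kind, text: match[0], index: match.index });
  }
  return found;
}

console.log(detectEntities("Mail me@example.com or visit https://example.com/docs from 10.0.0.1"));
```

Each detected span could then be swapped for a single placeholder word before lexing and restored afterwards, which is one plausible way to get the "one word" behaviour described above.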
UK-US Spelling variations
This detector analyzes the given sentence and reports whether it contains a UK- or US-specific spelling variation.
It also gives the corresponding UK/US variation where applicable.
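The idea can be sketched as a two-way dictionary lookup. The word pairs, `detectVariations` and the `SpellingVariation` shape below are assumptions made for illustration, not the detector's actual data or interface.

```typescript
// Hypothetical sample pairs (UK -> US); a real detector needs a far larger dictionary.
const UK_TO_US = new Map<string, string>([
  ["colour", "color"],
  ["centre", "center"],
  ["organise", "organize"],
]);

// Build the reverse lookup (US -> UK) from the same pairs.
const US_TO_UK = new Map<string, string>();
UK_TO_US.forEach((us, uk) => US_TO_UK.set(us, uk));

interface SpellingVariation {
  token: string;
  variety: "UK" | "US";
  counterpart: string;
}

function detectVariations(sentence: string): SpellingVariation[] {
  const result: SpellingVariation[] = [];
  for (const raw of sentence.split(/\s+/)) {
    // Lower-case and drop punctuation so "Colour," still matches "colour".
    const token = raw.toLowerCase().replace(/[^a-z]/g, "");
    const us = UK_TO_US.get(token);
    const uk = US_TO_UK.get(token);
    if (us !== undefined) {
      result.push({ token, variety: "UK", counterpart: us });
    } else if (uk !== undefined) {
      result.push({ token, variety: "US", counterpart: uk });
    }
  }
  return result;
}

console.log(detectVariations("My favourite colour is red"));
```

Words missing from the sample dictionary (such as `favourite` here) are simply not reported.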
Sentence Tense
Detects the exact tense of a given sentence.
Typos and spellchecking
coming soon
Date and time
coming soon
Numbers and units
coming soon
Post/PreProcessors
Slang
The provided POS tagger doesn't play nicely with internet slang. This preprocessor normalizes slang into more formal tokens. For example, `w/o` is converted to `without` and `gr8` is converted to `great`.
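A minimal sketch of this kind of normalization, assuming a plain dictionary lookup per word. The `SLANG` map and `normalizeSlang` are hypothetical names; only the `w/o` and `gr8` entries come from the text above.

```typescript
// Hypothetical slang dictionary; "thx" is an invented extra entry.
const SLANG = new Map<string, string>([
  ["w/o", "without"],
  ["gr8", "great"],
  ["thx", "thanks"],
]);

function normalizeSlang(sentence: string): string {
  return sentence
    .split(" ")
    .map((word) => {
      // Split off trailing punctuation so "gr8!" still matches "gr8".
      const match = /^(.*?)([.,!?;:]*)$/.exec(word);
      const core = match ? match[1] : word;
      const tail = match ? match[2] : "";
      const replacement = SLANG.get(core.toLowerCase());
      // Unknown words are returned untouched.
      return replacement !== undefined ? replacement + tail : word;
    })
    .join(" ");
}

console.log(normalizeSlang("That was gr8!"));
```

Note that this sketch does not preserve capitalization: `Gr8` would come back as `great`.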
HTML entities
When you're accepting data from the internet, that data might be escaped. For example, the greater-than sign `>` might be presented as `&gt;`. This causes inaccuracies in the lexer, and thus in the POS tagger and the dependency parser. This preprocessor decodes all HTML entities, converting them into their normal form.
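The decoding step can be sketched as follows. This is an illustration under stated assumptions, not the preprocessor's code: `decodeEntities` is a hypothetical name and `NAMED_ENTITIES` covers only a handful of the named entities HTML defines. Decimal and hexadecimal numeric references are handled too.

```typescript
// A small, illustrative subset of named entities; the full HTML list is far longer.
const NAMED_ENTITIES = new Map<string, string>([
  ["amp", "&"],
  ["lt", "<"],
  ["gt", ">"],
  ["quot", '"'],
  ["apos", "'"],
  ["nbsp", "\u00a0"],
]);

function decodeEntities(text: string): string {
  return text.replace(
    /&(#x[0-9a-f]+|#\d+|[a-z][a-z0-9]*);/gi,
    (whole: string, body: string) => {
      if (body.charAt(0) === "#") {
        // Numeric reference: "#62" (decimal) or "#x3E" (hexadecimal).
        const isHex = body.charAt(1).toLowerCase() === "x";
        const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
        return code > 0 && code <= 0x10ffff ? String.fromCodePoint(code) : whole;
      }
      // Named reference; unknown names are left as they are.
      return NAMED_ENTITIES.get(body) ?? whole;
    }
  );
}

console.log(decodeEntities("5 &gt; 3 &amp;&amp; 2 &lt; 4"));
```

The replacement runs in a single pass, so doubly escaped input such as `&amp;gt;` is decoded one level only (to `&gt;`).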
Additional Tools
The following tools aren't really extensions to the Fin natural language processor, but they might be useful to you when authoring extensions.
Inflectors
This library can be used for English-language inflections:
Converting one form of a verb to other forms (present to past participle and such).
Detecting whether a given noun is plural or singular.
Converting from plural to singular and vice versa.
Transforming adjectives to their comparative and superlative forms.
If you are looking for a library that covers the above, this one is highly recommended, as it has been thoroughly tested and built to be very accurate.
Stemmer
Stems a token to its root form. It's basically a TypeScript implementation of the Porter stemmer.
Spelling Variations
Gives the spelling variations of a given token and whether it's a UK or US spelling variation.
String distance
An alternative to the Levenshtein Distance (LD) calculator.
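For reference, the classic Levenshtein distance that this tool is positioned against can be sketched as below. This is the textbook dynamic-programming baseline, not the algorithm the library itself uses (which is not described here); `levenshtein` is just an illustrative name.

```typescript
// Classic Levenshtein distance, keeping only two rows of the DP table.
// Compares UTF-16 code units, so characters outside the BMP count as two.
function levenshtein(a: string, b: string): number {
  // previous[j] = distance between the first i-1 chars of a and the first j chars of b.
  let previous: number[] = [];
  for (let j = 0; j <= b.length; j++) previous.push(j);

  for (let i = 1; i <= a.length; i++) {
    const current: number[] = [i]; // turning a[0..i) into "" takes i deletions
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      current.push(
        Math.min(
          (previous[j] ?? 0) + 1, // deletion
          (current[j - 1] ?? 0) + 1, // insertion
          (previous[j - 1] ?? 0) + cost // substitution (or match)
        )
      );
    }
    previous = current;
  }
  return previous[b.length] ?? 0;
}

console.log(levenshtein("kitten", "sitting")); // 3
```

The distance counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other.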
Spell checker
Fast and accurate spellchecker for Node.js
Normalizer
Resolves English contractions (`I'm` to `I am`), replaces confusable characters (`”` to `"`) and normalizes all-caps sentences.
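The three steps can be sketched in a few lines. Everything here is an assumption for illustration rather than the Normalizer's real behaviour: the `normalize` name, the tiny `CONTRACTIONS` and `CONFUSABLES` tables, and the choice to turn an all-caps sentence into sentence case.

```typescript
// Hypothetical sample tables; a real normalizer would cover many more cases.
const CONFUSABLES = new Map<string, string>([
  ["\u201C", '"'], // left double quotation mark
  ["\u201D", '"'], // right double quotation mark
  ["\u2018", "'"], // left single quotation mark
  ["\u2019", "'"], // right single quotation mark
]);

const CONTRACTIONS = new Map<string, string>([
  ["i'm", "I am"],
  ["don't", "do not"],
  ["can't", "cannot"],
]);

function normalize(sentence: string): string {
  // 1. Replace confusable characters with their plain ASCII counterparts.
  let text = sentence.replace(/[\u201C\u201D\u2018\u2019]/g, (ch) => CONFUSABLES.get(ch) ?? ch);

  // 2. All-caps sentence (assumption: has a run of capitals and no lower-case letters):
  //    lower-case everything, then capitalise the first character.
  if (/[A-Z]{2,}/.test(text) && !/[a-z]/.test(text)) {
    text = text.toLowerCase();
    text = text.charAt(0).toUpperCase() + text.slice(1);
  }

  // 3. Expand known contractions, keeping a leading capital letter.
  return text.replace(/[A-Za-z]+'[A-Za-z]+/g, (word) => {
    const expanded = CONTRACTIONS.get(word.toLowerCase());
    if (expanded === undefined) return word;
    const first = word.charAt(0);
    return first === first.toUpperCase()
      ? expanded.charAt(0).toUpperCase() + expanded.slice(1)
      : expanded;
  });
}

console.log(normalize("I\u2019M SO HAPPY"));
```

The order matters in this sketch: curly apostrophes are straightened first so that the contraction lookup in step 3 can match them.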
Lemmatizer
Removes inflections, prefixes and suffixes from the word.
Synonyms
coming soon