Available Extensions

A list of supporting tools that I officially maintain.

Detectors

Emphasis

Detects the emphasis, if any, on each token. It is based on a dictionary of emphasis tokens and the dependency parser.

Negation

Detects whether a token (an adjective, adverb, verb, etc.) is negated. It is based on a dictionary and the dependency parser.

Sentiment

Detects the negative and positive sentiment attached to tokens. For example, in "That was a good show", the token "show" has a positive sentiment. Each token is given a numerical value representing its sentiment: a negative number means negative sentiment (as in "That was a bad show"), a positive number means positive sentiment, and 0 means the sentence does not describe the token with any sentiment.
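As a minimal sketch of how such a per-token score could be read, the helper below maps a number to a label by its sign. The name `polarityOf` and the `Polarity` type are hypothetical and not part of Fin's API.

```typescript
// Hypothetical helper for illustration; not part of Fin's API.
type Polarity = "negative" | "neutral" | "positive";

// Interprets a token's sentiment score by its sign only.
function polarityOf(score: number): Polarity {
  if (score < 0) return "negative";
  if (score > 0) return "positive";
  return "neutral"; // 0: no sentiment description for the token
}
```

The magnitude of the score is ignored here; a consumer that cares about strength would keep the raw number.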

Sentence Type

A sentence can be declarative, exclamatory, interrogative or imperative. This detector analyzes the sentence and reports its type based on the tokens, tags and dependency tree.

URLs and Emails

When a sentence contains emails, URLs or IP addresses, they can confuse the lexer, so both the POS tagger and the dependency parser give inaccurate results. This detector adds a preprocessor and a postprocessor that make the lexer treat each of them as one word.

In addition, it can be used to detect URLs, emails and IP addresses.
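The pre/post-processing idea can be sketched roughly as follows: swap each URL, email or IP address for a single placeholder word before lexing, then swap the originals back afterwards. This is an illustration under assumptions, not the extension's actual code. The names `protect` and `restore`, the `LINK<n>` placeholder and the deliberately simple patterns are all hypothetical.

```typescript
// Hypothetical names throughout; this is not Fin's extension API.
interface Protected {
  text: string;        // input with each match replaced by a placeholder
  originals: string[]; // the matched substrings, in order of appearance
}

// Simple patterns covering common cases only, not every valid URL, email or IP.
const PATTERN =
  /(https?:\/\/[^\s]+|[\w.+-]+@[\w-]+(?:\.[\w-]+)+|\b\d{1,3}(?:\.\d{1,3}){3}\b)/g;

// Preprocessor: replace each match with LINK0, LINK1, ... so it lexes as one word.
// Assumes the input does not already contain words of the form LINK<n>.
function protect(input: string): Protected {
  const originals: string[] = [];
  const text = input.replace(PATTERN, (match: string) => {
    originals.push(match);
    return `LINK${originals.length - 1}`;
  });
  return { text, originals };
}

// Postprocessor: put the original strings back into the token list.
function restore(tokens: string[], originals: string[]): string[] {
  return tokens.map((token) => {
    const m = /^LINK(\d+)$/.exec(token);
    return m ? originals[Number(m[1])] ?? token : token;
  });
}
```

The `originals` array also serves the detection use case, since it lists every URL, email and IP address found in the input.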

UK-US Spelling variations

This detector analyzes the given sentence and reports whether it contains a UK-specific or US-specific spelling.

It also gives the corresponding UK or US variation where applicable.

Sentence Tense

Detects the exact tense of a given sentence.

Typos and spellchecking

coming soon

Date and time

coming soon

Numbers and units

coming soon

Post/PreProcessors

Slang

The provided POS tagger doesn't play nicely with internet slang. This preprocessor normalizes slang into more formal tokens. For example, w/o is converted to without and gr8 is converted to great.
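A dictionary lookup is one simple way to picture this kind of normalization. The sketch below is an assumption about the general approach, not the preprocessor's real implementation. `SLANG` and `normalizeSlang` are hypothetical names, and the two entries are only the examples mentioned above.

```typescript
// Hypothetical, tiny dictionary; the real preprocessor's word list is not shown here.
const SLANG = new Map<string, string>([
  ["w/o", "without"],
  ["gr8", "great"],
]);

// Replaces each whitespace-separated word that appears in the dictionary.
// Words with attached punctuation (e.g. "gr8!") are left unchanged in this sketch.
function normalizeSlang(sentence: string): string {
  return sentence
    .split(/\s+/)
    .map((word) => SLANG.get(word.toLowerCase()) ?? word)
    .join(" ");
}
```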

HTML entities

When you accept data from the internet, it might be escaped. For example, the greater-than sign > might be represented as &gt;. This causes inaccuracies in the lexer, and thus in the POS tagger and the dependency parser. This preprocessor handles all HTML entities by converting them to their normal form.
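To illustrate the conversion, here is a minimal decoder sketch that handles numeric entities and a handful of named ones. It is not the preprocessor's implementation. `decodeEntities` and the short `NAMED` table are hypothetical, and a complete decoder would need the full HTML named character reference list.

```typescript
// Hypothetical subset of named entities; the full HTML list is much longer.
const NAMED = new Map<string, string>([
  ["amp", "&"],
  ["lt", "<"],
  ["gt", ">"],
  ["quot", '"'],
  ["apos", "'"],
]);

// Decodes &name;, &#123; and &#x7B; forms in a single pass.
// Unknown or out-of-range entities are left as they are.
function decodeEntities(input: string): string {
  return input.replace(
    /&(#[xX][0-9a-fA-F]+|#\d+|[a-zA-Z]+);/g,
    (whole: string, body: string) => {
      if (body.startsWith("#")) {
        const isHex = body.charAt(1).toLowerCase() === "x";
        const code = isHex ? parseInt(body.slice(2), 16) : parseInt(body.slice(1), 10);
        return code <= 0x10ffff ? String.fromCodePoint(code) : whole;
      }
      return NAMED.get(body) ?? whole;
    }
  );
}
```

Because the replacement happens in one pass, an input such as `&amp;gt;` becomes `&gt;` rather than being decoded twice.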

Additional Tools

The following tools aren't really extensions to the Fin natural language processor, but they might be useful when authoring extensions.

Inflectors

This library can be used for English language inflections:

  • Converting one form of a verb to other forms (present to past participle and such).

  • Detecting whether a given noun is plural or singular.

  • Converting from plural to singular and vice versa.

  • Transforming adjectives to their comparative and superlative forms.

If you are looking for a library that does the above, this one is highly recommended, as it has been thoroughly tested and built to be very accurate.

Stemmer

Stems a token to its root form. It is basically a TypeScript implementation of the Porter stemmer.

Spelling Variations

Gives the spelling variations of a given token and whether it is a UK or US spelling variation.

String distance

An alternative to the Levenshtein distance (LD) calculator.
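For context, the classic Levenshtein distance that this tool is positioned against can be sketched as below. This is the textbook dynamic-programming algorithm, shown only as a baseline. It is not the algorithm this library uses, and the function name `levenshtein` is just a label for the example.

```typescript
// Textbook Levenshtein distance: the minimum number of single-character
// insertions, deletions and substitutions needed to turn `a` into `b`.
// Keeps only two rows of the DP table at a time.
function levenshtein(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);

  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr.push(Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      ));
    }
    prev = curr;
  }
  return prev[b.length];
}
```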

Spell checker

A fast and accurate spellchecker for Node.js.

Normalizer

Resolves English contractions (I'm to I am), replaces confusable characters (“ to ") and normalizes all-caps sentences.
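The first two steps can be pictured with a short sketch like the one below; all-caps handling is left out. This is an assumption about the general technique rather than the library's code. `CONTRACTIONS` and `normalize` are hypothetical names, and the contraction list is a tiny sample.

```typescript
// Hypothetical sample of contractions; a real list would be far longer.
const CONTRACTIONS = new Map<string, string>([
  ["i'm", "I am"],
  ["we're", "we are"],
  ["don't", "do not"],
  ["can't", "cannot"],
]);

function normalize(sentence: string): string {
  // Replace curly quotes with their plain ASCII counterparts first,
  // so that contractions typed with a curly apostrophe are also found.
  const straightened = sentence
    .replace(/[\u201C\u201D]/g, '"')
    .replace(/[\u2018\u2019]/g, "'");

  // Expand contractions word by word. The original capitalization of an
  // expanded word is not preserved in this sketch.
  return straightened
    .split(" ")
    .map((word) => CONTRACTIONS.get(word.toLowerCase()) ?? word)
    .join(" ");
}
```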

Lemmatizer

Removes inflections, prefixes and suffixes from a word.

Synonyms

coming soon
