Pre and Post Processing
The Problem
The processing steps can only be so smart; they cannot possibly handle every real-world case. This is why Fin has been designed to be extensible. Let's take another real-world problem.
If Fin receives the sentence `Rick & Morty is a good show`, it will be able to do all the processing correctly. However, if it receives an encoded version, `Rick &amp; Morty is a good show`, things won't be so accurate. The `&amp;` is an encoded ampersand (`&`), known as an HTML entity. HTML entities can be expected in text that comes from the web, such as social media posts and comments.
If we run the encoded example, `Rick &amp; Morty is a good show`, in Fin:
- `&` will be considered a coordinating conjunction.
- `amp` will be considered a noun.
- `;` will be considered mid-sentence punctuation.
This is obviously wrong, and it will lead to inaccurate POS tagging, and thus inaccurate dependency parsing.
The Solution
To solve this problem (and other similar problems), we need to use preprocessors. Preprocessors act as intercepting functions: each one intercepts the input string, decodes it, and returns the decoded version.
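As a rough illustration of the idea, here is a minimal sketch of a preprocessor chain. It does not use Fin's actual API. The names `Preprocessor`, `decodeAmpersand`, `preprocessors`, and `preprocess` are hypothetical and only model how an intercepting function could decode the input before it is processed.

```typescript
// Hypothetical names throughout; this models the concept, not Fin's real API.
type Preprocessor = (input: string) => string;

// Decode the HTML entity "&amp;" back to a plain ampersand.
const decodeAmpersand: Preprocessor = (input) => input.split("&amp;").join("&");

// Registered preprocessors run in order, each receiving the previous output.
const preprocessors: Preprocessor[] = [decodeAmpersand];

function preprocess(input: string): string {
  return preprocessors.reduce((text, fn) => fn(text), input);
}

console.log(preprocess("Rick &amp; Morty is a good show"));
// prints "Rick & Morty is a good show"
```

Keeping the preprocessors in an ordered list means further decoders (for other entities, for example) could be added without touching the processing step itself.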
For the problem above, the interceptor takes the input string and replaces all occurrences of `&amp;` with `&`.
Postprocessors
Much like preprocessors intercept the input string, postprocessors intercept the result object before it is returned to the caller.
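The same pattern can be sketched for postprocessors. The example below is an assumption-laden illustration, not Fin's real API: `AnalysisResult`, `analyze`, `lowercaseTokens`, and `runPipeline` are hypothetical names, the result shape is invented for the example, and the whitespace tokenizer merely stands in for the actual processing.

```typescript
// Hypothetical result shape; Fin's real result object may look different.
interface AnalysisResult {
  raw: string;
  tokens: string[];
}

type Postprocessor = (result: AnalysisResult) => AnalysisResult;

// Stand-in for the core processing step: splits the input on whitespace.
function analyze(input: string): AnalysisResult {
  return { raw: input, tokens: input.split(/\s+/).filter((t) => t.length > 0) };
}

// Example postprocessor: returns a copy of the result with lowercased tokens.
const lowercaseTokens: Postprocessor = (result) => ({
  ...result,
  tokens: result.tokens.map((t) => t.toLowerCase()),
});

const postprocessors: Postprocessor[] = [lowercaseTokens];

// Each postprocessor sees the result just before it reaches the caller.
function runPipeline(input: string): AnalysisResult {
  return postprocessors.reduce((res, fn) => fn(res), analyze(input));
}

console.log(runPipeline("Rick & Morty").tokens);
// prints [ 'rick', '&', 'morty' ]
```

Returning a new object from each postprocessor, rather than mutating the one it receives, keeps the steps independent of one another.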