Set up rules for classifying documents or pages
- Updated: 2023/08/04
This topic describes how to set up rules for classifying documents or pages.
Understanding rules and their usage
[
{
"DocumentTypeID": 0,
"Location": 0,
"Distance": 1,
"Score": 80,
"KBGuid": "00000000-0000-0000-0000-000000000000",
"IsEnabled": true,
"ExpectExactSequence": false,
"TextRulePhrases": [
{
"Text": "Annexure",
"IsNegativePhrase": false,
"PhraseType": 1
}
]
}
]
Rules are useful when a classification model needs additional guidance to determine the most relevant document category more accurately. Although it is technically possible to perform all classification using rules, this is not a best practice: managing the rule configuration becomes a significant overhead over time, especially when dealing with a large number of categories.
Example of a rule file
[
{
"DocumentTypeID": 0,
"Location": 1,
"Distance": 3,
"Score": 90,
"KBGuid": "00000000-0000-0000-0000-000000000000",
"IsEnabled": true,
"ExpectExactSequence": true,
"TextRulePhrases": [
{
"Text": "Annexure",
"IsNegativePhrase": false,
"PhraseType": 1
},
{
"Text": "Terms & Conditions",
"IsNegativePhrase": false,
"PhraseType": 1
},
{
"Text": "Payment Terms",
"IsNegativePhrase": false,
"PhraseType": 1
}
]
},
{
"DocumentTypeID": 2,
"Location": 2,
"Distance": 1,
"Score": 95,
"KBGuid": "00000000-0000-0000-0000-000000000000",
"IsEnabled": true,
"ExpectExactSequence": false,
"TextRulePhrases": [
{
"Text": "Addendum",
"IsNegativePhrase": true,
"PhraseType": 5
}
]
}
]
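A rule file can be sanity-checked before it is used. The following is a minimal Python sketch, not part of the product: `validate_rules` is a hypothetical helper name, the checks only cover the value ranges listed in this topic (Location and Distance from 0 to 3, Score from -100 to 100), and the requirement that TextRulePhrases is a non-empty array is an assumption.

```python
import json


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_rules(rules):
    """Return a list of problems found in a parsed rule file (empty if none)."""
    if not isinstance(rules, list):
        return ["rule file must be a JSON array of rule objects"]
    problems = []
    for i, rule in enumerate(rules):
        if not isinstance(rule, dict):
            problems.append(f"rule {i}: must be a JSON object")
            continue
        # Location and Distance are documented as 0, 1, 2, or 3.
        for key in ("Location", "Distance"):
            value = rule.get(key)
            if not (_is_number(value) and value in (0, 1, 2, 3)):
                problems.append(f"rule {i}: {key} must be 0, 1, 2, or 3")
        # Score is documented as ranging from -100 to 100.
        score = rule.get("Score")
        if not (_is_number(score) and -100 <= score <= 100):
            problems.append(f"rule {i}: Score must be between -100 and 100")
        for key in ("IsEnabled", "ExpectExactSequence"):
            if not isinstance(rule.get(key), bool):
                problems.append(f"rule {i}: {key} must be true or false")
        # Assumption: a rule without any phrase is treated as a mistake.
        phrases = rule.get("TextRulePhrases")
        if not isinstance(phrases, list) or not phrases:
            problems.append(f"rule {i}: TextRulePhrases must be a non-empty array")
            continue
        for j, phrase in enumerate(phrases):
            if not isinstance(phrase, dict) or not isinstance(phrase.get("Text"), str):
                problems.append(f"rule {i}, phrase {j}: Text must be a string")
    return problems


example = json.loads("""
[
  {
    "DocumentTypeID": 0,
    "Location": 2,
    "Distance": 1,
    "Score": 95,
    "KBGuid": "00000000-0000-0000-0000-000000000000",
    "IsEnabled": true,
    "ExpectExactSequence": false,
    "TextRulePhrases": [
      {"Text": "Addendum", "IsNegativePhrase": true, "PhraseType": 5}
    ]
  }
]
""")
print(validate_rules(example))
```

Running the check on each rule file before deployment catches structural mistakes, such as a missing bracket or an out-of-range Score, early.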
Configurable properties of a rule file
Configuration | Description |
---|---|
DocumentTypeID | This field is currently not supported. For any rule that you set up, keep the static value 0. |
Location | Specifies the location in the document text to which the rule applies. The value can be 0, 1, 2, or 3. |
Distance | Specifies the distance between phrases when the lookup is done on the document text. The rule matches only if the distance between the phrases is as specified by this configuration. The value can be 0, 1, 2, or 3. |
Score | After a rule matches, a score is assigned to the category (or training folder) associated with that rule. The score can range from -100 to 100. |
KBGuid | This field is currently not supported. For any rule that you set up, keep the static value 00000000-0000-0000-0000-000000000000. |
IsEnabled | Enables or disables the rule. Set it to true to enable the rule or false to disable it. |
ExpectExactSequence | When a rule looks up multiple phrases, this configuration specifies whether the phrases must match in exact sequence. For example, when it is set to true as in the example, "Annexure", "Terms & Conditions", and "Payment Terms" must be present in the document text in this order for the rule to match. Other text can exist between these phrases, but the phrases must appear in the listed order, one following the other. Note: Unless it is very clear that the expected sequence will follow a specific pattern, it is recommended to keep this configuration set to false. |
TextRulePhrases | The list of phrases for the rule. Each phrase has the Text, IsNegativePhrase, and PhraseType properties described in the following rows. |
Text | Specifies the phrase text that is looked up in the document text. |
IsNegativePhrase | Specifies whether the lookup condition is a negative phrase lookup. When it is set to true as in the example, "Text": "Addendum" must not be present in the document text for the rule to match. |
PhraseType | Specifies the type of match that is used when the phrase text is looked up in the document text. |
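The ExpectExactSequence and IsNegativePhrase descriptions can be illustrated with a short Python sketch. This is not the product's matching logic: `phrases_match` is a hypothetical function, plain case-sensitive substring search is an assumption, and Location, Distance, and PhraseType are ignored because their values are not defined in this topic.

```python
def phrases_match(document_text, phrases, expect_exact_sequence):
    """Illustrative check of the phrase conditions (assumed semantics)."""
    positive = [p["Text"] for p in phrases if not p["IsNegativePhrase"]]
    negative = [p["Text"] for p in phrases if p["IsNegativePhrase"]]

    # A negative phrase must not be present in the document text.
    if any(text in document_text for text in negative):
        return False

    if not expect_exact_sequence:
        # Order does not matter: every positive phrase only has to be present.
        return all(text in document_text for text in positive)

    # Order matters: each phrase must be found after the end of the
    # previous one. Other text may sit between the phrases.
    position = 0
    for text in positive:
        found = document_text.find(text, position)
        if found == -1:
            return False
        position = found + len(text)
    return True


sequence_phrases = [
    {"Text": "Annexure", "IsNegativePhrase": False, "PhraseType": 1},
    {"Text": "Terms & Conditions", "IsNegativePhrase": False, "PhraseType": 1},
    {"Text": "Payment Terms", "IsNegativePhrase": False, "PhraseType": 1},
]
in_order = "Annexure A. Terms & Conditions apply. Payment Terms: net 30."
out_of_order = "Payment Terms first, then Terms & Conditions, then Annexure."

print(phrases_match(in_order, sequence_phrases, True))
print(phrases_match(out_of_order, sequence_phrases, True))
print(phrases_match(out_of_order, sequence_phrases, False))
```

In this sketch the out-of-order text fails only when the exact sequence is expected, which reflects why the note recommends keeping ExpectExactSequence set to false unless the phrase order is predictable.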