Set up rules for classifying documents or pages
- Updated: 2023/08/04
This topic describes how to set up rules for classifying documents or pages.
Understanding rules and their usage
[
  {
    "DocumentTypeID": 0,
    "Location": 0,
    "Distance": 1,
    "Score": 80,
    "KBGuid": "00000000-0000-0000-0000-000000000000",
    "IsEnabled": true,
    "ExpectExactSequence": false,
    "TextRulePhrases": [
      {
        "Text": "Annexure",
        "IsNegativePhrase": false,
        "PhraseType": 1
      }
    ]
  }
]
Rules are useful when additional guidance is needed to enhance the accuracy of a classification model in determining the most relevant document category. Although it is technically possible to do all classification using rules, this is not a best practice, because managing the rule configuration becomes a significant overhead over time, especially when dealing with a large number of categories.
Example of a rule file
[
  {
    "DocumentTypeID": 0,
    "Location": 1,
    "Distance": 3,
    "Score": 90,
    "KBGuid": "00000000-0000-0000-0000-000000000000",
    "IsEnabled": true,
    "ExpectExactSequence": true,
    "TextRulePhrases": [
      {
        "Text": "Annexure",
        "IsNegativePhrase": false,
        "PhraseType": 1
      },
      {
        "Text": "Terms & Conditions",
        "IsNegativePhrase": false,
        "PhraseType": 1
      },
      {
        "Text": "Payment Terms",
        "IsNegativePhrase": false,
        "PhraseType": 1
      }
    ]
  },
  {
    "DocumentTypeID": 2,
    "Location": 2,
    "Distance": 1,
    "Score": 95,
    "KBGuid": "00000000-0000-0000-0000-000000000000",
    "IsEnabled": true,
    "ExpectExactSequence": false,
    "TextRulePhrases": [
      {
        "Text": "Addendum",
        "IsNegativePhrase": true,
        "PhraseType": 5
      }
    ]
  }
]
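A rule file such as the one above can be checked mechanically before it is used. The following is a minimal sketch, assuming only the value ranges stated in this topic (Location and Distance from 0 to 3, Score from -100 to 100). The function names `validate_rules` and `_int_in_range`, the requirement that Score is an integer, and the `SAMPLE` data are assumptions made for illustration and are not part of the product.

```python
import json

# Value ranges as stated in this topic. The helper names are hypothetical.
LOCATION_RANGE = (0, 3)
DISTANCE_RANGE = (0, 3)
SCORE_RANGE = (-100, 100)


def _int_in_range(value, bounds):
    """True if value is an integer (not a bool) within the inclusive bounds."""
    low, high = bounds
    return (
        isinstance(value, int)
        and not isinstance(value, bool)
        and low <= value <= high
    )


def validate_rules(rules):
    """Return a list of problems found in a parsed rule file (empty if none)."""
    if not isinstance(rules, list):
        return ["rule file must be a JSON array of rule objects"]
    problems = []
    for index, rule in enumerate(rules):
        if not isinstance(rule, dict):
            problems.append(f"rule {index}: must be a JSON object")
            continue
        if not _int_in_range(rule.get("Location"), LOCATION_RANGE):
            problems.append(f"rule {index}: Location must be 0, 1, 2, or 3")
        if not _int_in_range(rule.get("Distance"), DISTANCE_RANGE):
            problems.append(f"rule {index}: Distance must be 0, 1, 2, or 3")
        # Assumption: scores are whole numbers; the topic only gives the range.
        if not _int_in_range(rule.get("Score"), SCORE_RANGE):
            problems.append(f"rule {index}: Score must be an integer from -100 to 100")
        for flag in ("IsEnabled", "ExpectExactSequence"):
            if not isinstance(rule.get(flag), bool):
                problems.append(f"rule {index}: {flag} must be true or false")
        phrases = rule.get("TextRulePhrases")
        if not isinstance(phrases, list):
            problems.append(f"rule {index}: TextRulePhrases must be an array")
            continue
        for number, phrase in enumerate(phrases):
            if not isinstance(phrase, dict) or not isinstance(phrase.get("Text"), str):
                problems.append(f"rule {index}, phrase {number}: Text must be a string")
    return problems


# Illustrative data modeled on the example rule file.
SAMPLE = """
[
  {
    "DocumentTypeID": 0,
    "Location": 2,
    "Distance": 1,
    "Score": 95,
    "KBGuid": "00000000-0000-0000-0000-000000000000",
    "IsEnabled": true,
    "ExpectExactSequence": false,
    "TextRulePhrases": [
      {"Text": "Addendum", "IsNegativePhrase": true, "PhraseType": 5}
    ]
  }
]
"""

print(validate_rules(json.loads(SAMPLE)))
```

The sketch collects every problem instead of stopping at the first one, so a single pass over a large rule file reports everything that needs fixing.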
Configurable properties of a rule file
| Configuration | Description | 
|---|---|
| DocumentTypeID | Currently, this field is not supported. For any rule being set up, keep the value as 0. |
| Location | Specifies which location of the document text the rule applies to. The value can be 0, 1, 2, or 3. |
| Distance | Specifies the distance between phrases when the lookup is done on the document text. The rule matches only if the distance between the phrases is as specified by this configuration. The value can be 0, 1, 2, or 3. |
| Score | After a rule match is performed, a score is assigned to the category (or training folder) associated with that rule. The score value can range from -100 to 100. | 
| KBGuid | Currently, this field is not supported. For any rule being set up, keep the value as 00000000-0000-0000-0000-000000000000. |
| IsEnabled | Enables or disables the rule when set to true or false, respectively. |
| ExpectExactSequence | When a rule looks up multiple phrases, this configuration specifies whether matching is based on their exact sequence. For example, if it is set to true in the example, "Text": "Annexure", "Text": "Terms & Conditions", and "Text": "Payment Terms" must be present in the document text in this order for the rule to match. Other text can exist between these phrases, but the phrases must appear in this order, one following the other. Note: Unless it is very clear that the expected sequence will follow a specific pattern, it is recommended to keep this configuration as false. |
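| TextRulePhrases | The list of phrases that the rule looks up. Each phrase has the Text, IsNegativePhrase, and PhraseType properties described in the following rows. |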
| Text | Specifies the phrase text value that is looked up against the document text. |
| IsNegativePhrase | Specifies whether the lookup condition is a negative phrase lookup. When set to true, as for "Text": "Addendum" in the example, the phrase must not be present in the document text for the rule to match. |
| PhraseType | Specifies the type of match that is used when the phrase text value is looked up against the document text. |
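The interaction between ExpectExactSequence and IsNegativePhrase described in the table can be illustrated in a few lines of code. The following is only a sketch of the documented behavior, not the product's matching engine. It assumes a plain, case-insensitive substring search and deliberately ignores Location, Distance, and PhraseType, because the meaning of their values is not described here. The function name `rule_matches` is hypothetical.

```python
def rule_matches(rule, document_text):
    """Illustrative check of phrase presence, absence, and order.

    Assumptions: case-insensitive substring search; Location, Distance,
    and PhraseType are ignored.
    """
    if not rule.get("IsEnabled", False):
        return False  # A disabled rule never matches.

    text = document_text.lower()
    ordered = rule.get("ExpectExactSequence", False)
    search_from = 0  # Where the next ordered phrase may start.

    for phrase in rule["TextRulePhrases"]:
        needle = phrase["Text"].lower()
        if phrase.get("IsNegativePhrase", False):
            # A negative phrase must not be present anywhere in the text.
            if needle in text:
                return False
            continue
        # With ordering on, each phrase must occur after the previous one;
        # other text is allowed in between.
        position = text.find(needle, search_from if ordered else 0)
        if position == -1:
            return False
        search_from = position + len(needle)
    return True


sequence_rule = {
    "IsEnabled": True,
    "ExpectExactSequence": True,
    "TextRulePhrases": [
        {"Text": "Annexure", "IsNegativePhrase": False},
        {"Text": "Terms & Conditions", "IsNegativePhrase": False},
        {"Text": "Payment Terms", "IsNegativePhrase": False},
    ],
}

in_order = "Annexure A. Terms & Conditions apply. Payment Terms: net 30."
out_of_order = "Payment Terms first, then Annexure, then Terms & Conditions."

print(rule_matches(sequence_rule, in_order))
print(rule_matches(sequence_rule, out_of_order))
```

Under these assumptions the first document matches and the second does not, because "Payment Terms" appears before the other two phrases. Setting ExpectExactSequence to false would make both documents match, which is why the table recommends false unless the phrase order is predictable.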