Specificity
Specificity refers to the rarity of a word in a corpus. A word that appears in a large number of documents is less representative of any particular subject or document than a word that appears only in that document or in a few documents of the corpus.
The importance of a word in a document should therefore not be measured by its frequency of occurrence alone. It should also be weighted by an indicator of whether the word is common or rare across all documents.
The relevance of a term increases with its rarity within the BigSea Wide Corpus.
Thus, a rare term receives a high specificity score.
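The exact formula Moonfish uses is not stated here. As an illustration only, the sketch below assumes the standard inverse document frequency (IDF), log(N / df), as the rarity indicator, and weights a word's in-document frequency by it (TF-IDF). The function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter

def idf(term, corpus):
    """Inverse document frequency: log(N / df). Rarer terms score higher.

    Assumption: standard IDF stands in for Moonfish's specificity measure.
    """
    n_docs = len(corpus)
    df = sum(1 for doc in corpus if term in doc)  # documents containing the term
    if df == 0:
        return 0.0  # sketch choice: a term absent from the corpus gets no score
    return math.log(n_docs / df)

def tf_idf(term, doc, corpus):
    """Frequency of the term in one document, weighted by its rarity in the corpus."""
    tf = Counter(doc)[term] / len(doc)
    return tf * idf(term, corpus)

# Hypothetical toy corpus: each document is a list of tokens.
corpus = [
    ["the", "whale", "swims"],
    ["the", "ship", "sails"],
    ["the", "whale", "dives", "deep"],
]

print(idf("the", corpus))    # in every document, so log(3/3) = 0
print(idf("whale", corpus))  # in 2 of 3 documents
print(idf("ship", corpus))   # in 1 of 3 documents, so the most specific
```

Here "the" occurs in every document and contributes nothing, while "ship", found in a single document, gets the highest weight.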
In Moonfish, a color code is assigned to each specificity interval as follows:
High specificity - color green
Medium specificity - color orange
Low specificity - color red
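The interval bounds are not given here, so the sketch below uses hypothetical cut-offs and assumes the specificity score has been normalized to [0, 1] (for example, by dividing IDF by its maximum value, log N). It only shows how a score could be mapped to the three color bands.

```python
# Hypothetical interval bounds: Moonfish's actual cut-offs are not documented here.
LOW_MAX = 0.33
HIGH_MIN = 0.66

def specificity_color(score: float) -> str:
    """Map a specificity score normalized to [0, 1] to its color band."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0, 1]")
    if score >= HIGH_MIN:
        return "green"   # high specificity
    if score > LOW_MAX:
        return "orange"  # medium specificity
    return "red"         # low specificity

print(specificity_color(0.9))  # green
print(specificity_color(0.5))  # orange
print(specificity_color(0.1))  # red
```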