

Specificity refers to the rarity of a word in a corpus. A word that appears in a significant number of documents is less representative of a particular subject or document than a word that appears in only that document.

The importance of a word in a document should therefore not be measured by its frequency of occurrence alone. It should also be weighted by an indicator of whether the word is common or rare across all documents.

The relevance of a term increases with its rarity within the BigSea Wide Corpus.

Thus, a rare term receives a high specificity “score”.
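The weighting idea described above can be illustrated with a minimal sketch. The exact formula Moonfish uses is not stated here, so this example swaps in the classic inverse document frequency (IDF) as a stand-in; the function names `specificity_scores` and `weighted_term_scores` and the whitespace tokenization are assumptions made for illustration only.

```python
import math


def specificity_scores(documents):
    """Return an IDF-style rarity score per term (assumed formula, not Moonfish's)."""
    n_docs = len(documents)
    doc_freq = {}
    for doc in documents:
        # Count each term once per document, however often it occurs there.
        for term in set(doc.lower().split()):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    # A term present in every document scores 0; rarer terms score higher.
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def weighted_term_scores(document, specificity):
    """Weight each term's raw frequency in one document by its rarity score."""
    counts = {}
    for term in document.lower().split():
        counts[term] = counts.get(term, 0) + 1
    return {term: n * specificity.get(term, 0.0) for term, n in counts.items()}


corpus = ["the fish swims", "the moon rises", "the sea is wide"]
scores = specificity_scores(corpus)
weights = weighted_term_scores("the fish the fish", scores)
```

In this toy corpus, "the" appears in every document and so contributes nothing to the weighted score, while "fish" appears in only one document and dominates it, even though both words occur equally often in the example document.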

In Moonfish, a color code is assigned by specificity interval as follows:

  • High specificity - color green

  • Medium specificity - color orange

  • Low specificity - color red
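
As a rough sketch of how a score could be mapped to the color code above: the interval boundaries are not given in this article, so the cutoffs below (`low_cutoff`, `high_cutoff`) and the assumption that the score is normalized to the range 0 to 1 are purely hypothetical, as is the function name `specificity_color`.

```python
def specificity_color(score, low_cutoff=0.33, high_cutoff=0.66):
    """Map a specificity score to a color.

    Assumes a score normalized to [0, 1]; both cutoffs are hypothetical
    placeholders, not the intervals Moonfish actually uses.
    """
    if score >= high_cutoff:
        return "green"   # high specificity
    if score >= low_cutoff:
        return "orange"  # medium specificity
    return "red"         # low specificity


colors = [specificity_color(s) for s in (0.9, 0.5, 0.1)]
```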
