Hunpos is an open-source reimplementation of TnT, the well-known part-of-speech tagger by Thorsten Brants.
Features
Free and open source, even for commercial use.
For languages with more complex morphologies, HMM tagging could be quite competitive with the current generation of learning algorithms, such as SVM- and CRF-based methods. A major advantage is that the training/tagging cycle is orders of magnitude faster than with more complex models.
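To make "HMM tagging" concrete, here is a minimal Python sketch of count-based training and Viterbi decoding. It is not Hunpos's code. For brevity it uses a first-order (bigram) model with add-one smoothing, whereas TnT, which Hunpos reimplements, is a second-order (trigram) tagger with more careful smoothing and unknown-word handling. The helper names `train`, `log_prob` and `viterbi` and the toy corpus are hypothetical. The sketch shows why the cycle is cheap: training is just counting, and decoding is linear in sentence length (O(n·T²) for T tags in this bigram version).

```python
import math
from collections import defaultdict


def train(tagged_sentences):
    """Collect transition and emission counts from lists of (word, tag) pairs."""
    trans = defaultdict(lambda: defaultdict(int))  # trans[prev_tag][tag]
    emit = defaultdict(lambda: defaultdict(int))   # emit[tag][word]
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    return trans, emit


def log_prob(table, given, event, n_outcomes):
    """Add-one (Laplace) smoothed log P(event | given); a crude stand-in."""
    counts = table.get(given, {})  # .get() avoids creating new defaultdict keys
    total = sum(counts.values())
    return math.log((counts.get(event, 0) + 1) / (total + n_outcomes))


def viterbi(words, trans, emit):
    """Return the most probable tag sequence for `words` under a bigram HMM."""
    if not words:
        return []
    tags = list(emit)
    vocab = {w for counts in emit.values() for w in counts}
    n_tags = len(tags)
    n_words = len(vocab) + 1  # +1 reserves probability mass for unknown words

    # best[i][t]: best log score of any tag sequence for words[:i+1] ending in t
    best = [{} for _ in words]
    back = [{} for _ in words]
    for t in tags:
        best[0][t] = (log_prob(trans, "<s>", t, n_tags)
                      + log_prob(emit, t, words[0], n_words))

    for i in range(1, len(words)):
        for t in tags:
            # Scan all predecessor tags: O(T) per cell, O(n * T^2) overall.
            prev, score = max(
                ((p, best[i - 1][p] + log_prob(trans, p, t, n_tags)) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + log_prob(emit, t, words[i], n_words)
            back[i][t] = prev

    # Follow back-pointers from the highest-scoring final tag.
    path = [max(best[-1], key=best[-1].get)]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


if __name__ == "__main__":
    corpus = [
        [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
        [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
        [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
    ]
    trans, emit = train(corpus)
    print(viterbi(["the", "fox", "barks"], trans, emit))  # ['DET', 'NOUN', 'VERB']
```

In the usage example, "fox" never occurs in training, so its emission score is identical for every tag and the transition probabilities alone decide that it is a noun. Handling such words well is exactly where a real tagger needs something better than add-one smoothing.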
Precision of tagging on unknown and unseen words was a major priority for us during the development of hunpos.
Works smoothly with large tagsets. In Hungarian, for example, as in other highly inflecting languages, it is important to preserve detailed morphological information in the POS tags so that they provide useful clues for higher-level processing tasks. This leads to a significantly larger tagset than is common for English (744 tags here as opposed to the 36 stan