Resource Overview
The MIT Language Modeling (MITLM) toolkit is a set of tools designed for the efficient estimation of statistical n-gram language models involving iterative parameter estimation. It achieves much of its efficiency through the use of a compact vector representation of n-grams. Details of the data structure and associated algorithms can be found in the following paper.
Bo-June (Paul) Hsu and James Glass. Iterative Language Model Estimation: Efficient Data Structure & Algorithms. In Proc. Interspeech, 2008.
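To make the idea of a compact vector representation more concrete, here is a minimal Python sketch. It is an assumption-laden illustration and is not taken from MITLM's source code: the class name NgramVectors, its fields, and the add method are all hypothetical. The sketch stores each n-gram of order k as a pair (index of its history in the order k-1 vector, word id), so that per-n-gram quantities such as counts live in flat parallel arrays that are cheap to update repeatedly.

```python
from typing import Dict, List, Tuple


class NgramVectors:
    """Hypothetical flat n-gram store (illustration only, not MITLM's API).

    For each order k (0-based), three parallel lists describe the n-grams:
    words[k][i] is the id of the last word, hists[k][i] is the index of the
    history n-gram in order k-1, and counts[k][i] is the observed count.
    """

    def __init__(self, max_order: int) -> None:
        self.vocab: Dict[str, int] = {}
        self.words: List[List[int]] = [[] for _ in range(max_order)]
        self.hists: List[List[int]] = [[] for _ in range(max_order)]
        self.counts: List[List[int]] = [[] for _ in range(max_order)]
        # Lookup from (history index, word id) to position, one dict per order.
        self.index: List[Dict[Tuple[int, int], int]] = [{} for _ in range(max_order)]

    def add(self, tokens: List[str]) -> int:
        """Insert an n-gram (creating its prefixes) and increment its count.

        len(tokens) must not exceed max_order. Returns the n-gram's index
        within the vector for its order.
        """
        hist = 0  # unigrams share a dummy history index of 0
        for k, tok in enumerate(tokens):
            wid = self.vocab.setdefault(tok, len(self.vocab))
            key = (hist, wid)
            idx = self.index[k].get(key)
            if idx is None:
                idx = len(self.words[k])
                self.index[k][key] = idx
                self.words[k].append(wid)
                self.hists[k].append(hist)
                self.counts[k].append(0)
            hist = idx
        self.counts[len(tokens) - 1][hist] += 1
        return hist


store = NgramVectors(max_order=2)
cat = store.add(["the", "cat"])
dog = store.add(["the", "dog"])
store.add(["the", "cat"])
```

In this layout, both bigrams point back to the same unigram entry through hists, and a pass over all n-grams of one order is a pass over contiguous arrays, which is the kind of access pattern that suits iterative parameter estimation.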
Currently, MITLM supports the following features:
Smoothing: Modified Kneser-Ney, Kneser-Ney, maximum likelihood
Interpolation: Linear interpolation, count merging, generalized linear interpolation
Evaluation: Perplexity
File formats: ARPA, binary, gzip, bz2
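Two of the listed features, linear interpolation and perplexity, follow standard textbook definitions that can be sketched in a few lines of Python. The function names below (interpolate, perplexity) are hypothetical and do not correspond to MITLM's own interface; the probabilities are made-up inputs chosen only to show the arithmetic.

```python
import math
from typing import Sequence


def interpolate(p_a: float, p_b: float, weight: float) -> float:
    """Linear interpolation of two model probabilities for the same event.

    weight is the mixture weight given to the first model, in [0, 1].
    """
    return weight * p_a + (1.0 - weight) * p_b


def perplexity(word_probs: Sequence[float]) -> float:
    """Perplexity of a test sequence given the probability assigned to each word.

    Computed as exp of the negative average log probability.
    """
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))


# Illustrative per-word probabilities from two hypothetical component models.
model_a = [0.2, 0.1, 0.4]
model_b = [0.4, 0.3, 0.2]
mixed = [interpolate(a, b, 0.5) for a, b in zip(model_a, model_b)]
mixed_ppl = perplexity(mixed)
```

Lower perplexity on held-out text indicates a better fit, so an interpolation weight would typically be chosen to minimize perplexity on a development set.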
MITLM is available for