Resource Overview
* Media filter graph metaphor
* Workflow manager for parallel language data
* Configuration-driven, modular filters
* Reusable plug-in architecture
* Standardized base classes
Statistical machine translation (SMT) is growing from an academic novelty into a commercially viable capability. High-quality parallel linguistic corpora drive SMT's high-quality translations. If you want to transform your existing translation memories (and other parallel language data) into a valuable training corpus that can drive new, accurate SMT operations, this tool is for you.
This toolbox provides a common framework, reusable filtering interfaces, and an aligned-document workflow to manage the transformation of ad hoc data, spread across thousands of documents with millions of sentence pairs, into a catalogued set of parallel language corpora. The framework can manage the workflow for any open-source NLP component, such as a sentence breaker or a word segmenter (e.g. MeCab for Japanese text).
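To make the filter graph idea more concrete, here is a minimal sketch of a configuration-driven chain of filters over sentence pairs. It is not the toolbox's actual API: the names `PairFilter`, `REGISTRY`, `build_graph`, and `run_graph`, the three example filters, and the shape of the configuration are all assumptions chosen for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Iterator, List, Tuple, Type

# A sentence pair: (source sentence, target sentence).
Pair = Tuple[str, str]


class PairFilter(ABC):
    """Hypothetical standardized base class: consume pairs, yield pairs."""

    @abstractmethod
    def process(self, pairs: Iterable[Pair]) -> Iterator[Pair]:
        ...


class StripWhitespace(PairFilter):
    """Collapse runs of whitespace and trim both sides of each pair."""

    def process(self, pairs: Iterable[Pair]) -> Iterator[Pair]:
        for src, tgt in pairs:
            yield " ".join(src.split()), " ".join(tgt.split())


class DropEmpty(PairFilter):
    """Discard pairs where either side is empty."""

    def process(self, pairs: Iterable[Pair]) -> Iterator[Pair]:
        for src, tgt in pairs:
            if src and tgt:
                yield src, tgt


class MaxLengthRatio(PairFilter):
    """Discard pairs whose character lengths differ by more than `ratio`."""

    def __init__(self, ratio: float = 3.0) -> None:
        self.ratio = ratio

    def process(self, pairs: Iterable[Pair]) -> Iterator[Pair]:
        for src, tgt in pairs:
            longer = max(len(src), len(tgt))
            shorter = min(len(src), len(tgt))
            if longer <= self.ratio * shorter:
                yield src, tgt


# Hypothetical plug-in registry: maps configuration names to filter classes.
REGISTRY: Dict[str, Type[PairFilter]] = {
    "strip_whitespace": StripWhitespace,
    "drop_empty": DropEmpty,
    "max_length_ratio": MaxLengthRatio,
}


def build_graph(config: List[dict]) -> List[PairFilter]:
    """Instantiate filters in the order given by the configuration."""
    filters = []
    for step in config:
        options = {k: v for k, v in step.items() if k != "filter"}
        filters.append(REGISTRY[step["filter"]](**options))
    return filters


def run_graph(filters: List[PairFilter], pairs: Iterable[Pair]) -> List[Pair]:
    """Chain the filters lazily, then collect the surviving pairs."""
    stream: Iterable[Pair] = pairs
    for f in filters:
        stream = f.process(stream)
    return list(stream)


config = [
    {"filter": "strip_whitespace"},
    {"filter": "drop_empty"},
    {"filter": "max_length_ratio", "ratio": 2.0},
]
pairs = [
    ("  Hello   world ", "Bonjour le monde"),
    ("", "orphan target"),
    ("Hi", "A very long and unrelated target sentence"),
]
result = run_graph(build_graph(config), pairs)
```

Because each filter only sees an iterable of pairs, filters can be reordered or swapped through the configuration alone. A wrapper around an external tool such as a word segmenter could be registered the same way under this assumed design.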