Resource description
disfluency-detection-tool.tar.gz
This is an off-the-shelf disfluency detector. It takes an input file with one sentence per line and outputs the POS tag and disfluency tag of each word. See the README for installation and usage instructions.
naaclcode-v2.tar.gz
The second version corrects the data preprocessing in the first version. The preprocessing procedure is now exactly the same as in Mark Johnson's implementation (1): all words are lowercased, and partial words, i.e., words whose POS tag is XX or that end in "-" (e.g., detectio-, spac-), are removed.
With this correction, the detector achieves an F score of 0.8344 on the development set and 0.8383 on the test set.
The author would like to thank Matthew Honnibal and Haofeng Zhou for pointing out issues in the previous version and for their valuable suggestions.
(1) Simon Zwarts and Mark Johnson, "The impact of language models and loss functions on repair disfluency detection," in P
File list
attach_pos.cpp
corpus_convert.cpp
corpus_convert_FP.cpp
create_new_table.h
get_pos.cpp
install.sh
job.sh
loss.txt
m3n
fun.h
m3n.cpp
main.cpp
freelist.h
fun.cpp
m3n.h
template.step2
template.step3
Makefile
model.fp
model.pos
model.step2
model.step3
pocket_crf
fun.h
crf.h
dat.h
main.cpp
lbfgs.cpp
freelist.h
thread.h
lbfgs.h
crf_thread.cpp
fun.cpp
template.fp
crf.cpp
Makefile
crf_thread.h
template.pos
post_convert.cpp
pre_convert.cpp
punc_model
corpus_convert.cpp
model.fp
model.pos
create_new_table.h
model.step3
corpus_convert_FP.cpp
model.step2
README
real_model
corpus_convert.cpp
model.fp
model.pos
create_new_table.h
model.step3
corpus_convert_FP.cpp
model.step2
replace_test_pos.cpp
split_data.cpp
test.txt
tokenize.cpp