Resource Description
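Background: this script was first written for the MovieLens data. It splits the raw data into a training set and a test set in whatever proportion you choose (the script writes them to train.txt and test.txt), mainly for validation experiments. It is short and straightforward, so it suits beginners; if it is not to your taste, please go easy on the criticism.
Key technique:
# -*- coding: cp936 -*-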
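from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20

filename = r"Raw.data"  # raw data
# Read the raw file, skipping blank lines, and split each record on commas
with open(filename) as f:
    c = [line.strip().split(",") for line in f if line.strip()]
# test_size = the fraction of records to put in the test set
c_train, c_test = train_test_split(c, test_size=0.1)
with open(r"train.txt", "w") as out_train:  # training set
    for i in c_train:
        out_train.write(",".join(i) + "\n")
with open(r"test.txt", "w") as out_test:  # test set
    for i in c_test:
        out_test.write(",".join(i) + "\n")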
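If scikit-learn is not installed, or you need exactly the same split every time you rerun an experiment, the shuffle-and-slice step can also be done with the standard library. The sketch below uses a hypothetical helper, split_records (not part of the original script). It accepts only a fraction for test_size and rounds the test count up, ceil(test_size * n), which matches what train_test_split does for fractional test sizes. With scikit-learn, passing random_state to train_test_split gives the same repeatability.

```python
import math
import random


def split_records(records, test_size=0.1, seed=None):
    """Shuffle a copy of records and split it into (train, test).

    Hypothetical helper, not scikit-learn: the test count is rounded up,
    ceil(test_size * n), and the rest goes to the training set.
    """
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be a fraction between 0 and 1")
    shuffled = list(records)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded RNG gives a repeatable split
    n_test = math.ceil(test_size * len(shuffled))
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    # Toy rows standing in for Raw.data: (user, item, rating)
    rows = [[str(u), str(i), "5"] for u in range(5) for i in range(5)]
    train, test = split_records(rows, test_size=0.1, seed=42)
    print(len(train), len(test))  # 22 3
```

With 25 records and test_size=0.1, ceil(2.5) = 3 records go to the test set and 22 to the training set; the two lists can then be written out with the same loops as the script above.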