Resource description
MalGen
MalGen is a set of scripts that generate large, distributed data sets for testing and benchmarking software designed to process large data sets in parallel. The data sets can be thought of as site-entity log files. After an initial seeding step, data generation can be launched from a single central node and then runs concurrently on multiple remote nodes of the cluster.
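A minimal sketch of the seed-then-fan-out idea. It assumes a hypothetical record layout of `(entity_id, site_id, timestamp)` and a per-node seed derived from one master seed, and it uses threads on a single machine as stand-ins for remote nodes. None of these names or choices come from MalGen itself.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical record layout (entity_id, site_id, timestamp); not MalGen's actual format.
Record = tuple[int, int, int]


def generate_shard(node_id: int, master_seed: int, n_records: int,
                   n_entities: int, n_sites: int) -> list[Record]:
    """Generate one node's share of the log from a seed derived from the master seed."""
    # Assumption: per-node seed = function of (master seed, node id), so shards
    # differ from each other but the whole run is reproducible.
    rng = random.Random(master_seed * 1_000_003 + node_id)
    records = []
    for _ in range(n_records):
        entity = rng.randrange(n_entities)
        site = rng.randrange(n_sites)
        timestamp = rng.randrange(86_400)  # seconds within one day
        records.append((entity, site, timestamp))
    return records


def generate_cluster(n_nodes: int, master_seed: int, records_per_node: int,
                     n_entities: int = 1000, n_sites: int = 100) -> dict[int, list[Record]]:
    """Launch every shard from one place and run them concurrently.

    Threads here only simulate remote nodes; a real cluster would run
    generate_shard on each node instead.
    """
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        futures = {
            node: pool.submit(generate_shard, node, master_seed,
                              records_per_node, n_entities, n_sites)
            for node in range(n_nodes)
        }
        return {node: fut.result() for node, fut in futures.items()}
```

Because each node's seed is derived from the master seed, only the seed and a few parameters would need to be sent to each node, and rerunning with the same seed reproduces the same data set.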
The generated data follows certain statistical distributions that we believe present a usable model for such logs.
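The text does not say which distributions MalGen uses. As one illustration only, assuming site popularity follows a Zipf-like power law (a common model for web-style access logs), the sketch below draws site ids with weights proportional to 1/k^s. The function names and the exponent `s` are hypothetical.

```python
import random
from collections import Counter


def zipf_weights(n: int, s: float = 1.0) -> list[float]:
    """Unnormalized Zipf weights: the item of rank k (1-based) gets 1 / k**s."""
    return [1.0 / (k ** s) for k in range(1, n + 1)]


def sample_site_visits(n_visits: int, n_sites: int, s: float = 1.0,
                       seed: int = 0) -> list[int]:
    """Draw site ids so that a few popular sites receive most of the visits.

    Assumption: Zipf-like popularity; MalGen's actual distributions may differ.
    """
    rng = random.Random(seed)
    weights = zipf_weights(n_sites, s)
    # random.choices normalizes the weights internally.
    return rng.choices(range(n_sites), weights=weights, k=n_visits)


def top_sites(visits: list[int], k: int = 3) -> list[tuple[int, int]]:
    """Return the k most-visited site ids with their visit counts."""
    return Counter(visits).most_common(k)
```

A larger `s` concentrates more of the visits on the first few sites; `s = 0` degenerates to a uniform choice.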
MalGen has two intended uses:
1. Generating a large, possibly distributed, data set for use with analytics.
2. Generating data for benchmarking algorithms or applications.
For the first use, records are generated probabilistically, and extra records may be produced so that the entire data set follows the specified distribution. For the second use