Resource Overview
Light Crawler
An open-source crawler for Java. LightCrawler's features are listed below:
LightCrawler can limit the crawl depth: the crawler stops at the configured depth.
LightCrawler is multi-threaded and quick and easy to set up.
LightCrawler decides which URLs to crawl and which to skip based on configurable forbidden and allowed regular expressions.
LightCrawler detects whether a document is an RSS feed or HTML and chooses the right parser automatically.
The LightCrawler fetcher extracts the title, language, encoding, content type, MD5 hash, and SimHash fingerprint of each page, which is key information for users.
The fetch queue is stored in memory, so reads and writes are fast.
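To make the depth limit and the allowed/forbidden regex rules concrete, here is a minimal sketch of how such a check could work. It is not LightCrawler's actual API: the class `UrlFilterSketch` and the method `shouldCrawl` are hypothetical names chosen for illustration. The sketch also assumes that the start URL has depth 0, that a forbidden match always wins over an allowed match, and that an empty allowed list means every URL is allowed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of LightCrawler.
class UrlFilterSketch {
    private final int maxDepth;
    private final List<Pattern> allowed = new ArrayList<>();
    private final List<Pattern> forbidden = new ArrayList<>();

    UrlFilterSketch(int maxDepth, List<String> allowedRegex, List<String> forbiddenRegex) {
        this.maxDepth = maxDepth;
        // Compile each regex once so that every URL check is cheap.
        for (String regex : allowedRegex) {
            allowed.add(Pattern.compile(regex));
        }
        for (String regex : forbiddenRegex) {
            forbidden.add(Pattern.compile(regex));
        }
    }

    // Returns true if a URL found at the given depth should be fetched.
    boolean shouldCrawl(String url, int depth) {
        // Assumption: the start URL has depth 0; anything deeper than maxDepth is skipped.
        if (depth > maxDepth) {
            return false;
        }
        // Assumption: a forbidden pattern takes precedence over any allowed pattern.
        for (Pattern p : forbidden) {
            if (p.matcher(url).matches()) {
                return false;
            }
        }
        // Assumption: an empty allowed list places no restriction on URLs.
        if (allowed.isEmpty()) {
            return true;
        }
        for (Pattern p : allowed) {
            if (p.matcher(url).matches()) {
                return true;
            }
        }
        return false;
    }
}
```

A crawler loop would call something like `shouldCrawl(link, parentDepth + 1)` for each extracted link before adding it to the fetch queue, so filtered URLs never take up queue space.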
Example
Single-threaded use:
```
String start_url = "http://www.###.c