Resource Overview
Introduction
This is a personal project to research and develop an HTML parser in Java.
A number of open-source HTML parsers are written in Java. However, most of them fail to parse some web pages correctly because HTML syntax is ambiguous, and some of them are too heavyweight to use.
Developing an HTML parser is quite different from developing an XML parser, because an HTML parser MUST resolve the ambiguous syntax by itself. For example, the "br" tag is usually written as an opening tag only, whereas XML requires every opened element to be closed. This means that an HTML parse tree can be built in various ways.
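The "br" case can be illustrated with a minimal sketch. This is not this project's parser or its API: the class name VoidElementSketch, the Node type, and the short list of void elements are all hypothetical, chosen only to show one possible rule. The sketch ignores text and attributes, and treats an element such as "br" as complete as soon as its opening tag is seen, so that later tags are not nested inside it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the parser described in this document.
class VoidElementSketch {
    // Assumed subset of HTML elements that never take a closing tag.
    static final Set<String> VOID_ELEMENTS =
            Set.of("br", "hr", "img", "input", "meta", "link");

    // Matches opening and closing tags; attributes are skipped, text is ignored.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>");

    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name) { this.name = name; }

        // Renders the subtree as name(child child ...), e.g. div(br span).
        @Override
        public String toString() {
            StringBuilder sb = new StringBuilder(name);
            if (!children.isEmpty()) {
                sb.append('(');
                for (int i = 0; i < children.size(); i++) {
                    if (i > 0) sb.append(' ');
                    sb.append(children.get(i));
                }
                sb.append(')');
            }
            return sb.toString();
        }
    }

    static Node parse(String html) {
        Node root = new Node("#root");
        Deque<Node> open = new ArrayDeque<>(); // stack of currently open elements
        open.push(root);
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            boolean closing = !m.group(1).isEmpty();
            String name = m.group(2).toLowerCase();
            if (closing) {
                // Close only if a matching element is open; stray closing tags are ignored.
                boolean found = false;
                for (Node n : open) {
                    if (n.name.equals(name)) { found = true; break; }
                }
                if (found) {
                    while (!open.peek().name.equals(name)) open.pop();
                    open.pop();
                }
            } else {
                Node node = new Node(name);
                open.peek().children.add(node);
                // A void element is never pushed, so it cannot receive children.
                if (!VOID_ELEMENTS.contains(name)) open.push(node);
            }
        }
        return root;
    }

    public static void main(String[] args) {
        // "span" becomes a sibling of "br", not its child.
        System.out.println(parse("<div><br><span></span></div>"));
    }
}
```

A strict XML-style rule would instead keep "br" open until a closing tag arrived and nest "span" inside it, which is one of the alternative trees the paragraph above refers to.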
I have been developing this HTML parser with a focus on high speed, a light footprint, and exact, rule-based parsing (building the DOM tree). It can be used on mobile devices with limited hardware resources, and in software modules such as information retrieval (IR) engines that need a fast parsing component.
Features