The Internet Archive is a 501(c)(3) non-profit Internet library that
offers researchers, historians, and scholars permanent access to
historical collections in digital format. The Internet Archive is best
known for its Wayback Machine, which provides access to over 10 years of
archived public websites; for its leading role in the Open Content
Alliance's mass book-digitization effort; and for its free audio and
video collections, including thousands of live music shows.
In partnership with libraries around the world (http://netpreserve.org),
the Internet Archive's web group has developed open-source Java software
to help organizations build their own web archives. This includes the
Heritrix crawler, the Wayback archive browser, and NutchWAX, a set of
tools that use Nutch/Lucene for full-text search of web archives.