首页| JavaScript| HTML/CSS| Matlab| PHP| Python| Java| C/C++/VC++| C#| ASP| 其他|
购买积分 购买会员 激活码充值

您现在的位置是:虫虫源码 > 其他 > PWA保留后代今天的知识。

PWA保留后代今天的知识。

  • 资源大小:104.98 kB
  • 上传时间:2021-06-30
  • 下载次数:0次
  • 浏览次数:1次
  • 资源积分:1积分
  • 标      签: pwa 保留 知识 后代 今天

资 源 简 介

The Portuguese Web Archive (PWA) main goal is the preservation and access of web contents that are no longer available online. During the developing of the PWA IR (information retrieval) system we faced limitations in searching speed, quality of results, scalability and usability. To cope with this, we modified the archive-access project (http://archive-access.sourceforge.net/) to support our web archive IR requirements. Nutchwax, Nutch and Wayback’s code were adapted to meet the requirements. Several optimizations were added, such as simplifications in the way document versions are searched and several bottlenecks were resolved. The PWA search engine is a public service at http://archive.pt and a research platform for web archiving. As it predecessor Nutch, it runs over Hadoop clusters for distributed computing following the map-reduce paradigm. Its major features include fast full-text search, URL search, phrase search, faceted search (date, format, site), and sort

相 关 资 源

您 可 能 感 兴 趣 的

同 类 别 推 荐

VIP VIP