Resource Overview
Researchers working with data from third parties can often obtain it only in PDF format. Tablaam will be a set of tools and a library for extracting tabular data from PDF documents.
This project is in incubation.
| Property | Value |
|:---------------------|:-----|
| Programming language | Java |
| Licensing | Apache 2.0 |
| Build environment | Maven |
| Supported project-development IDE | Eclipse |
| Repository | Git |
Two tools are envisioned:
- An IDE for developing, debugging, and executing mappings from PDF documents to data output files.
- A PDF browser/explorer to assist in developing those mappings, providing easy access to attributes of fragments of interest such as font and page coordinates.
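
To make the idea of a mapping concrete, here is a minimal sketch of how fragment attributes such as page coordinates might be used to turn loose text fragments into table rows. It is not Tablaam's API: the `Fragment` class, the `toRows` method, and the `tolerance` parameter are hypothetical names introduced for illustration, and the sketch assumes the fragments have already been extracted from the PDF by some other means.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class FragmentRowSketch {

    /** Hypothetical text fragment: its text, font name, and page coordinates. */
    static final class Fragment {
        final String text;
        final String font;
        final double x;
        final double y;

        Fragment(String text, String font, double x, double y) {
            this.text = text;
            this.font = font;
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Groups fragments into rows. A fragment joins the current row when its y
     * coordinate is within `tolerance` of the row's first fragment; otherwise it
     * starts a new row. Cells within a row are ordered by x.
     * Assumption: rows are emitted in ascending y order.
     */
    static List<List<String>> toRows(List<Fragment> fragments, double tolerance) {
        List<Fragment> sorted = new ArrayList<>(fragments);
        sorted.sort(Comparator.comparingDouble((Fragment f) -> f.y));

        List<List<Fragment>> rows = new ArrayList<>();
        for (Fragment f : sorted) {
            List<Fragment> last = rows.isEmpty() ? null : rows.get(rows.size() - 1);
            if (last != null && Math.abs(last.get(0).y - f.y) <= tolerance) {
                last.add(f);
            } else {
                List<Fragment> row = new ArrayList<>();
                row.add(f);
                rows.add(row);
            }
        }

        List<List<String>> result = new ArrayList<>();
        for (List<Fragment> row : rows) {
            row.sort(Comparator.comparingDouble((Fragment f) -> f.x));
            List<String> cells = new ArrayList<>();
            for (Fragment f : row) {
                cells.add(f.text);
            }
            result.add(cells);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up fragments standing in for text pulled from one PDF page.
        List<Fragment> page = Arrays.asList(
            new Fragment("Alice", "Helvetica", 50, 100),
            new Fragment("42", "Helvetica", 200, 100.4),
            new Fragment("Bob", "Helvetica", 50, 120),
            new Fragment("37", "Helvetica", 200, 119.8));
        for (List<String> row : toRows(page, 1.0)) {
            System.out.println(String.join(",", row));
        }
    }
}
```

A real mapping would likely also use the font attribute, for example to separate header cells from body cells, and would need to account for the PDF coordinate origin, which this sketch ignores.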
It is unclear whether these will be thick or thin clients. Some people are reluctant to download a thick client for fear of malware. A thin client has potential disadvantages too, such as speed, server a