...

Updating FTS to enable CJK search will let researchers ask exploratory questions of the data and get answers on the fly. Because full-text search answers these questions immediately, it makes many investigations practical that would otherwise require examining every page of a scanned text by hand, which is rarely an option worth contemplating. Updating DRS to allow OCR to be added to existing PDS objects will allow the OCR for ancient Chinese texts, and for many similar projects, to enrich existing PDS documents in DRS as OCR technology for non-Latin languages matures. This is in turn expected to substantially increase the use of DRS collections, as more researchers will be able to ask new questions of the data.

II. Vision and Approach

Describe the solution:
The solution will consist of an upgraded FTS search index using SOLR, additionally tuned to support the CJK requirements. The upgraded index will use the DRS indexing functions to update and full-text index documents in the FTS search index. As part of this solution, existing documents will need to be reindexed to support the updated features of the SOLR environment. The updated FTS search servlet will accept a search submitted through PDS/Mirador or through an FTS form and return its results, in keeping with current FTS functionality. PDS/Mirador will display the search results and the relevant PDS document pages, also in keeping with current FTS functionality. The API used for these functions will be kept as consistent as possible with the existing API.
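To illustrate why the index needs CJK-specific tuning, the following is a minimal sketch of overlapping bigram tokenization, one common way to index text that has no spaces between words. It is an illustration only, not the project's SOLR configuration: whether the upgraded index would use bigrams is an assumption, and the names is_cjk, bigram_tokens, build_index, and search are hypothetical. The character test covers only the main CJK Unified Ideographs block, and matching requires every query bigram to appear on a page, which is looser than true phrase matching.

```python
def is_cjk(ch: str) -> bool:
    """Return True for characters in the main CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def bigram_tokens(text: str) -> list[str]:
    """Emit overlapping two-character tokens for each run of CJK characters."""
    tokens: list[str] = []
    run = ""
    for ch in text + " ":  # the trailing space flushes the final run
        if is_cjk(ch):
            run += ch
            continue
        if len(run) == 1:
            tokens.append(run)  # a lone character is kept as its own token
        else:
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
        run = ""
    return tokens


def build_index(pages: dict[int, str]) -> dict[str, set[int]]:
    """Map each bigram to the set of page ids whose text contains it."""
    index: dict[str, set[int]] = {}
    for page_id, text in pages.items():
        for token in bigram_tokens(text):
            index.setdefault(token, set()).add(page_id)
    return index


def search(index: dict[str, set[int]], query: str) -> list[int]:
    """Return sorted ids of pages that contain every bigram of the query."""
    tokens = bigram_tokens(query)
    if not tokens:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(hits)


# Hypothetical two-page document used only for illustration.
pages = {1: "論語集注", 2: "孟子集注"}
index = build_index(pages)
```

With this sample data, search(index, "集注") matches both pages and search(index, "論語") matches only page 1, without any dictionary of Chinese words being required.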

...

Define how to measure “done”:


This project will iterate in phases, and the project plan will give each phase its own definition of "done". The overall project is considered done once DRS CJK PDS documents can be indexed and searched in FTS, the search results can be viewed in PDS/Mirador and FTS, and the collection of OCR for existing DRS CJK PDS documents has been successfully ingested into the DRS and is available for searching in FTS and for display of results in PDS/Mirador and FTS.


In Scope:


Support for CJK indexing and search in FTS, and delivery of results to PDS/Mirador.

...