...
Describe the solution:
The solution will consist of an upgraded FTS search index built on Solr and tuned to support the CJK requirements. The upgraded index will use the DRS indexing functions to update and full-text index documents in the FTS search index. As part of this solution, existing documents will need to be re-indexed to support the updated features of the Solr environment.
The updated FTS search servlet will accept a search submitted through PDS/Mirador or through an FTS form and return the results, in keeping with current FTS functionality. PDS/Mirador will display the search results and the relevant PDS document pages, also in keeping with current FTS functionality. The API used for these functions will be kept as consistent as possible with the current API.
...
- Upgrade of the indexing engine
- Develop the DRS OCR batch import function
- Upgrade of the index / ingest process
- Includes re-indexing existing objects
- Update of the search APIs
- Update of PDS/Mirador to support the new search solution
- Tuning of search index for CJK materials
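As an illustration of why the index needs CJK-specific tuning: CJK text has no spaces between words, so a whitespace-based tokenizer would treat a whole passage as a single token. One common approach (an assumption here, not a decision recorded in this charter) is to index overlapping character bigrams, which is what Solr's CJKBigramFilterFactory does. The Python sketch below only illustrates the idea; the function name cjk_bigrams is hypothetical and is not part of FTS, DRS, or Solr.

```python
def cjk_bigrams(text):
    """Return overlapping two-character tokens for a run of CJK text.

    Illustrative only: a real Solr analysis chain also handles script
    detection, width normalization, and mixed CJK/Latin input.
    """
    # Drop whitespace so line breaks in OCR output do not split tokens.
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return chars
    # Each token is a character plus its right-hand neighbour.
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


if __name__ == "__main__":
    for token in cjk_bigrams("天地玄黃"):
        print(token)
```

With bigram indexing, a two-character query matches wherever that pair occurs in the OCR text, without requiring a word-segmentation dictionary.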
...
Define how to measure “done”:
The overall project is considered done once DRS CJK PDS documents can be indexed and searched in FTS, search results can be viewed in PDS/Mirador and FTS, and the OCR collected for existing DRS CJK PDS documents has been successfully ingested into the DRS and is available for searching in FTS and for display of results in PDS/Mirador and FTS. Specifically:
- Existing PDS documents are re-indexed with the CJK-enabled indexing engine
- Newly ingested PDS documents are indexed with the CJK-enabled indexing engine
- Full text search queries can be submitted through PDS and Full Text Search stand-alone forms
- Results of the full text searches are correctly displayed in PDS and FTS
- CJK OCR for the corpus of ancient Chinese texts (currently residing on Imaging Services servers) has been successfully bulk-added to existing PDS objects in DRS and these objects have been re-indexed
- FTS searches against these PDS objects are working with CJK and the correct results are being displayed in PDS and FTS
This project will proceed in phases. The project plan will set a separate definition of "done" for each phase.
In Scope:
Support for CJK indexing and search in FTS, and delivery of results to PDS/Mirador.
Enhancement to support bulk-adding CJK OCR to existing PDS objects in the DRS.
Out of Scope (for medium and large projects):
Support for indexing and search in FTS for other languages; support for bulk-adding any other metadata or content to the DRS; any UI changes or enhancements to current FTS or PDS/Mirador.
III. Stakeholders/People
Who is the work being done for? (Sponsor)
...
DRS – Digital Repository Service
FTS – Full Text Search
Mirador – Software used by LTS for delivery and presentation of Page Turned Objects
OCR – Optical Character Recognition
...