Recommendations for Phase 2:
Selection of Service Providers and Funding (July 04 - June 05) 

 

Phase 1 of the California Newspaper Digitization Project had five goals:

  1. Determine how much information can be captured digitally from existing microfilm.
    Conclusion: While the quality of capture varies among service providers, much of the existing microfilm appears to be adequate as a source for digital capture.
  2. Document the strengths and limitations of available search and retrieval software.
    Conclusion: While comparisons among products were frustrated by the absence of a common database (see Recommendations on performance, point 1, below), reviewers felt that the results were sufficiently valuable to justify undertaking a project to create online access to historical California newspapers.
  3. Compare benefits and costs among service providers.
    Conclusion: Features varied widely, as did costs; however, because only a minority of products were considered usable with little or no further development, cost options were narrowed to a relative few.
  4. Estimate production requirements and costs for a one million page database.
    Conclusion: Including costs for project operations, but excluding institutional indirect costs, the project can be estimated at $1.5 million to $3 million, depending on the content management system and service provider. Production requirements were not addressed in detail.
  5. Create a publicly accessible website to report the findings of the study and to showcase the benefits of online access to historical newspapers.
    Conclusion: The California Newspaper Digitization Project website is located at http://cpc.stanford.edu/cndp. The Project has achieved its goal of showcasing the benefits of online access to historical newspapers, and continues to do so as more visitors explore the site. Many thanks are extended to the service providers who participated in the Project and helped create a broad base of public and professional support for this type of information resource.

Recommendations on features

  1. Select a system with careful attention to both "essential" and "important" features identified in the Phase 1 evaluation.
  2. Ensure that the system has a "select publication" search feature when the database consists of several, or hundreds, of newspaper titles.
  3. Explore a capacity to export machine-readable text files of desired articles if text images can be ocr'd with reasonable accuracy.

Recommendations on performance

  1. Require that all respondents to the RFP digitize the entire roll of test microfilm. Side-by-side comparisons of products in terms of image capture, ocr, and retrieval were impossible in Phase 1 because not all service providers digitized the whole roll of test microfilm. As far as the evaluation team could tell, not even a single page was digitized by all service providers, which ruled out visual comparisons, let alone comparisons among search capabilities and results.
  2. With comparable databases from the RFP respondents:
    1. determine the best capture settings (e.g., resolution of film imaging and bit depth) for readability of the facsimile image by visual inspection of the display of a given page digitized by the different systems;
    2. determine ocr accuracy by comparing outputs among samples of ocr'd texts;
    3. compare search results across the several participating service providers for relative comprehensiveness and accuracy.
  3. Ascertain from the service providers that the highly desirable short response times demonstrated at most of the test sites will remain short as the database grows to accommodate full runs of titles and many titles.
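
The ocr and search comparisons in point 2 could be made concrete along the following lines. This is only a minimal sketch under stated assumptions: it assumes a hand-keyed reference transcription exists for each sample page (the text above speaks only of comparing outputs among providers, and in that case the same function can be applied pairwise to two providers' outputs), and it assumes each provider's search results can be reduced to a set of page identifiers. The function names `edit_distance`, `character_error_rate`, and `hit_overlap` are illustrative and not part of any provider's software.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character edits turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, ocr_text: str) -> float:
    """Edit distance divided by reference length; 0.0 means a perfect match."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, ocr_text) / len(reference)


def hit_overlap(hits_a: set[str], hits_b: set[str]) -> float:
    """Jaccard overlap of two sets of page identifiers returned for one query."""
    union = hits_a | hits_b
    return len(hits_a & hits_b) / len(union) if union else 1.0
```

A lower character error rate indicates more accurate ocr for that sample, and a low overlap between two providers' hits for the same query on the same pages flags a difference in retrieval worth inspecting by hand. Both measures are meaningful only if every provider has digitized the same pages, which is the point of recommendation 1.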

Recommendations on cost elements

Some price estimates in response to Phase 1 were confidential, so specifics for individual service providers are not included on this website. However, some guidance can be derived from providers' several estimates, with full recognition that they indeed are estimates:

  1. For a one million page newspaper database project, price estimates for digitization and image processing (including ocr) ranged from about $400,000 to $2,600,000. Generally, the more specialized the search capability and sophistication of retrieval, the more expensive the digitization and image processing services.
  2. The RFP should be specific with regard to a need for hosting services. If hosting services on provider-owned hardware and software are required, some providers may choose not to respond to the RFP. Some service providers offered to host public access to the database only on client-purchased equipment, others offered only digitization and content management software, and yet others offered a service to procure the appropriate hardware and train local staff on systems administration rather than host themselves.
  3. Much additional exploration is needed with regard to preservation repository services for the database. There may prove to be few preservation providers to evaluate relative to conversion providers; the nature of preservation services beyond security, backup, and periodic copying from medium to medium needs to be clearly understood.
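
For budgeting purposes, the estimates quoted above can be restated per page. The short sketch below does only that arithmetic, using the one million page database size and the dollar ranges given in this document; the variable and function names are illustrative.

```python
PAGES = 1_000_000  # database size used for the estimates in this document

# Ranges quoted in the text, in US dollars.
DIGITIZATION_RANGE = (400_000, 2_600_000)     # digitization and image processing, incl. ocr
TOTAL_PROJECT_RANGE = (1_500_000, 3_000_000)  # incl. project operations, excl. indirect costs


def per_page(total_dollars: float, pages: int = PAGES) -> float:
    """Average cost per page for a given total."""
    return total_dollars / pages


for label, (low, high) in [("Digitization", DIGITIZATION_RANGE),
                           ("Total project", TOTAL_PROJECT_RANGE)]:
    print(f"{label}: ${per_page(low):.2f} to ${per_page(high):.2f} per page")
```

This works out to roughly $0.40 to $2.60 per page for digitization and image processing, and $1.50 to $3.00 per page for the project as a whole. The two ranges come from different sets of assumptions, so subtracting one from the other does not give a reliable figure for operations alone.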