Table of Contents
- Embargoed Content
- Navigating Articles and Pages
- Manipulating Articles and Pages
- Premium Accounts
- Technical Matters
The CDNC has partnered with companies to allow them to digitize microfilm in our California Newspaper Microfilm Archive (CNMA). The companies cover the costs of digitization and provide copies of the data they produce to the CDNC. During an "embargo period", the images a company produced can be viewed only on that company's website. The CDNC does, however, index the computer-generated text and make it searchable.
By default, searching embargoed content is disabled in the CDNC. To enable it, click on “Search” and then click on “Availability: Only available”, which will then disappear from the screen.
Alternatively, click on the three horizontal lines to the right of the search bar and then un-select “Search only issues that are available to view” in the pop-up.
See the sections below for more information on searching. Search results for embargoed content are marked with an image of a lock. If you click on such a result, you will see a grey screen with a note stating when the embargoed content will become available in the CDNC and the URL where you can view it in the meantime.
You can perform a simple search by typing keywords in the search box on the home page and clicking the magnifying glass symbol. The search engine will return results that include all of your search terms.
You can search for an exact phrase by placing quotation marks around your search terms. For example, searching for "new plymouth" returns only results that contain that exact phrase.
You can use Boolean operators AND, OR and NOT to refine your search results. AND (include all of the words) and NOT (without the words) narrow your search; OR (with at least one of the words) broadens your search. For example, “plymouth NOT new” will retrieve articles about Plymouth but not New Plymouth.
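The effect of the three operators can be sketched with set operations. This is only an illustration of the logic, not how Veridian implements searching. The sample articles and the `matching` helper are invented for the example, and matching is done on whole words only.

```python
# Toy collection: article id -> text. These articles are invented for illustration.
articles = {
    1: "new plymouth harbour opens",
    2: "plymouth colony anniversary",
    3: "new railway line planned",
}


def matching(term):
    """Return the ids of articles whose text contains the given word."""
    return {doc_id for doc_id, text in articles.items() if term in text.split()}


plymouth = matching("plymouth")
new = matching("new")

and_result = plymouth & new  # AND: both words must be present (narrows)
or_result = plymouth | new   # OR: at least one word is present (broadens)
not_result = plymouth - new  # NOT: "plymouth" present, "new" absent (narrows)

print(sorted(and_result))  # [1]
print(sorted(or_result))   # [1, 2, 3]
print(sorted(not_result))  # [2]
```

In this sketch, “plymouth NOT new” keeps only article 2, because article 1 also contains the word "new".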
To access the advanced search feature, click on “Search” at the top right of the page and then on the three horizontal lines to the right of the search bar (see above).
By default, Veridian searches all publications. To search a specific title, highlight it in the list. To highlight more than one title, hold down the "Shift" key to select a range or the "Control" key to select individual titles.
In addition to searching by exact phrase, Boolean operators, or specific titles, Advanced Search allows you to limit your search by:
A date range: select the beginning and end Day, Month and Year.
Headlines: by default Veridian searches all text. To search only headlines, select “Article headlines” under “Search within”.
Comments and Tags: to search within comments and tags other users have contributed, click either option.
By default, Veridian returns a short summary of the text around the terms you searched for. Within the "Advanced Search" pop-up, you can display a preview image instead of text after each result by selecting “Images”, or have no summary at all by selecting “None”.
After clicking "Search", you can refine your search by "Publication", "Category", "Decade" or "Word count" by choosing one or more options (or filters) listed along the left-hand side.
Selected filters used to refine a search appear in the upper left-hand corner in orange rectangles. To remove a filter click on it, and to remove all filters click "Clear all".
You can browse by title or date by clicking on “Browse” in the upper right-hand corner and selecting the "Titles" or "Dates" option in the pull-down menu.
Navigating Articles and Pages
Once you have selected a result from "Search Results" you will be directed to the newspaper page with the search term highlighted.
Directly to the left of the newspaper is a "contents" pane with two tabs: "Issue" and "Article". Clicking on the "Issue" tab displays either the entire contents of an issue or its pages. Clicking on the "Article" tab displays the text of the highlighted article. You can minimize the "contents" pane by clicking on the arrow in the separator bar to its right. The separator bar can also be dragged left or right to resize the adjoining panes.
Directly above the newspaper pages displayed in the "image viewer" is a row with several navigating options:
One option allows you to go back to the previous issue in the title, see all issues of the title, or advance to the next issue.
Another allows you to go back to the previous result in your search results list, see the entire list of search results, or advance to the next search result.
A third enlarges the image viewer by removing the green header area and the footer.
In the upper right-hand corner of the newspaper page image is a set of three symbols. The magnifying glasses zoom in or out on the page, and the scissors symbol opens the clipping tool.
While hovering over the newspaper a "hand" cursor is displayed. Click and drag the newspaper image to move the page around. Double-clicking anywhere on the newspaper will place the area clicked at the center of the image viewer.
Right-clicking on an article or page will pop up an options pane:
"Zoom to this article" will enlarge and center the article.
"Clip this article" will move the article to its own separate page for easy viewing.
"Text of this article" will display just the text without the associated image.
"Correct article text" allows you to correct the OCR text (see below).
"Add to private list" allows you to store the article in a user-defined list (premium users).
"PDF of this page" will download a PDF file of the page.
"JPG of this page" will display a high-resolution image of the entire page (premium users).
"Text of this page" will display the text of the entire page without the associated image.
"Correct page text" allows you to correct the OCR text (see below).
"Add to private list" allows you to store the page in a user-defined list (premium users).
"Report page problem" allows you to e-mail the CDNC maintainers about page quality problems.
Note: If you are viewing an issue that has been segmented at the page level rather than the article level, you will only see the second part of the menu. See the Technical Matters section for more explanation of page-level and article-level segmentation.
Manipulating Articles and Pages
There are several ways to print an article or page from the options pane. You can print a PDF downloaded using the directions above. Selecting "Text of this article" or "Text of this page" lets you print just the text, without the accompanying image, using the print option in your browser. Selecting "Clip this article" lets you print an image of the article using the print option in your browser. Note that many articles, particularly long ones, are composed of multiple images, each of which might print on a separate page.
Every newspaper page in the CDNC consists of an image and the text associated with that image. Newspaper copy is converted into searchable text using Optical Character Recognition (OCR) software (see below for more detail). Computers often have a hard time reading newspaper print, particularly for papers printed before 1900. The User Text Correction (UTC) feature in the CDNC allows users to correct text that the computer could not properly identify.
To correct text within the CDNC, you must first register and create an account. To register, click on the icon in the upper right-hand corner and then on "Register". Upon registration, a verification email will be sent to the email address you provided. After verifying your email address, you will be able to log in to the CDNC and correct OCR text.
After registration is complete, there are two ways to access the UTC tool. After you've found the newspaper you would like to correct and it is displayed on the screen, you can start correcting text by:
Clicking on the text you want to correct. This will display it in the "contents pane" to the left of the "image viewer". Click on "Correct this text" in the "contents pane".
Right-clicking on the text you want to correct and selecting "Correct article text" or "Correct page text" from the options pop-up window.
Every line of text in the article has a corresponding line in the text correction pane. While you are correcting a line, the matching line in the article image is outlined with a red box. Once you have finished making your corrections, click on "Save" or "Save & exit".
The text you've corrected is saved within Veridian, and is now searchable by other users.
It is currently not possible to add a line of text if a line does not already exist. We hope to add this capability in the future.
Optical Character Recognition
Optical Character Recognition, or OCR, is a process by which software reads a page image and translates it into a text file by recognizing the shapes of the letters.
OCR enables searching of large quantities of full-text data, but it is never 100% accurate. The level of accuracy depends on the print quality of the original issue, its condition at the time of microfilming, the level of detail captured by the microfilm scanner, and the quality of the OCR software. Issues with poor quality paper, small print, mixed fonts, multiple column layouts, or damaged pages may have poor OCR accuracy.
To look at the OCR text, choose a page or article and select "Text of this page" or "Text of this article" from the pop-up pane.
The physical layout or structure of newspapers makes them complicated objects to digitize. Each page usually has numerous columns, often with multiple articles per column. Furthermore, articles sometimes run across pages. Segmentation refers to the degree to which this structural complexity is captured in the digitized representation of the printed newspaper.
Page-level segmentation offers access to each page of an issue, but not to any components within a page such as articles, headlines, advertisements, or birth and death notices. For each page there is an image and the text associated with that page image. Each word has X and Y coordinates that connect it to a location on the image and thus allow it to be highlighted on the image. In the User Text Correction (UTC) tool, text for the entire page is displayed. Almost all post-1923 newspapers in the CDNC are segmented at the page level.
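As a rough illustration of how word coordinates make highlighting possible, the sketch below stores a position for each word and returns the rectangles to draw for a search term. The `Word` fields (including the width and height), the pixel units, and the sample values are assumptions made for this example. They do not describe the CDNC's actual data format.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: int       # left edge on the page image, in pixels (assumed unit)
    y: int       # top edge on the page image
    width: int   # assumed: a box size is stored alongside the X and Y position
    height: int


# Invented sample of OCR words from one page.
page_words = [
    Word("NEW", 120, 340, 60, 22),
    Word("PLYMOUTH", 186, 340, 140, 22),
    Word("harbour", 120, 370, 95, 20),
]


def highlight_boxes(words, term):
    """Return (left, top, right, bottom) boxes for words matching the term, ignoring case."""
    term = term.lower()
    return [
        (w.x, w.y, w.x + w.width, w.y + w.height)
        for w in words
        if w.text.lower() == term
    ]


boxes = highlight_boxes(page_words, "plymouth")
print(boxes)  # [(186, 340, 326, 362)]
```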
Article-level segmentation offers access to components within a page such as individual articles, headlines, and advertisements. In addition to the image, text, and word coordinates found in page-level data, article-level metadata defines the locations of components and their relationships to one another (for example, when an article extends across several pages, or when a headline is connected to a specific article or articles). Among other things, article-level segmentation often makes it easier to identify the location of a search term on a page, particularly when that term appears numerous times. In the UTC tool, text for a specific component of the page is displayed. Almost all pre-1923 newspapers in the CDNC are segmented at the article level.
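One way to picture article-level metadata is as a record that ties a headline to one or more regions, each on a particular page. The sketch below is a hypothetical model with invented names (`Region`, `Article`, `articles_on_page`) and invented values. It only illustrates how an article that continues on a later page might be represented, not how the CDNC stores it.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    page: int    # page number within the issue
    box: tuple   # (left, top, right, bottom) on that page image


@dataclass
class Article:
    headline: str
    regions: list = field(default_factory=list)

    def pages(self):
        """Return the sorted page numbers this article appears on."""
        return sorted({r.page for r in self.regions})


# Invented example: an article that starts on page 1 and continues on page 4.
article = Article(
    headline="HARBOUR OPENS",
    regions=[Region(1, (100, 300, 400, 900)), Region(4, (500, 100, 800, 450))],
)


def articles_on_page(articles, page):
    """Return the headlines of all articles with at least one region on the page."""
    return [a.headline for a in articles if page in a.pages()]


print(article.pages())                 # [1, 4]
print(articles_on_page([article], 4))  # ['HARBOUR OPENS']
```

Page-level data has no such records, which is why only whole pages can be displayed or corrected for page-segmented issues.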
Hardware and Software Requirements
In general, you only need a modern web browser like Firefox, Safari, Chrome, or Internet Explorer to search and browse this collection. To view or print downloaded PDFs, you will also need a PDF viewer like Adobe Reader.