California Digital Newspaper Collection Help
You can perform a simple search by typing keywords in the search box on the home page and clicking "Search". The search engine will return results that include all of your search terms.
You can search for an exact phrase by placing quotation marks around your search terms. For example, searching for "new plymouth" (with the quotation marks) returns only results in which "new" and "plymouth" appear next to each other, in that order.
You can use the Boolean operators AND, OR, and NOT to refine your search results. AND (include all of the words) and NOT (exclude the words) narrow your search; OR (include at least one of the words) broadens it. For example, plymouth NOT new will retrieve articles that contain "plymouth" but not "new". This excludes articles about New Plymouth, and also any other article that happens to contain the word "new".
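The matching rules above can be illustrated with a small Python sketch. It is a toy model, not Veridian's actual search engine: the sample article texts, the word-splitting rule, and the helper names (`tokenize`, `has_word`, `has_phrase`, `matching`) are all hypothetical.

```python
# Toy model of simple, phrase, and Boolean matching. This is NOT Veridian's
# search engine; the articles and helper names are hypothetical.
import re

articles = {
    1: "Steamer arrives at New Plymouth harbor",
    2: "Plymouth mining news from Amador County",
    3: "Plymouth church builds a new schoolhouse",
    4: "New arrivals in the valley",
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def has_word(text: str, word: str) -> bool:
    """True if the word appears anywhere in the text (whole words only)."""
    return word.lower() in tokenize(text)


def has_phrase(text: str, phrase: str) -> bool:
    """True if the phrase's words appear next to each other, in order."""
    words, target = tokenize(text), tokenize(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def matching(predicate) -> list[int]:
    """IDs of the articles whose text satisfies the predicate."""
    return sorted(k for k, text in articles.items() if predicate(text))


# new plymouth (simple search: every term must appear somewhere)
simple = matching(lambda t: has_word(t, "new") and has_word(t, "plymouth"))
# "new plymouth" (exact phrase)
phrase = matching(lambda t: has_phrase(t, "new plymouth"))
# plymouth NOT new
plymouth_not_new = matching(lambda t: has_word(t, "plymouth") and not has_word(t, "new"))
# plymouth OR new
plymouth_or_new = matching(lambda t: has_word(t, "plymouth") or has_word(t, "new"))

print(simple, phrase, plymouth_not_new, plymouth_or_new)
# [1, 3] [1] [2] [1, 2, 3, 4]
```

Article 3 mentions Plymouth but is dropped by plymouth NOT new because it also contains "new". In this sketch whole words are compared exactly, so "news" in article 2 does not count as "new".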
To access the advanced search feature, click the Search tab at the top left of the page and then the Advanced Search tab.
By default, Veridian searches all publications. To search a specific title, highlight it in the list. To highlight more than one title, hold down the "Shift" key to select a range of titles or the "Control" key to select individual titles.
In addition to searching by exact phrase, Boolean operators, or specific titles (like in Basic Search), Advanced Search allows you to limit your search by:
By default, Veridian returns 20 results per page, each with a short summary of the text around the terms you searched for. Within the "Advanced Search" tab you can:
After clicking "Search", you can refine your search by "Publication", "Category", "Decade", or "Word count" by choosing one or more of the options (or filters) listed along the left-hand side under "Refine search".
Selected filters appear in the upper left-hand corner under "Search limited to". To remove a filter, click the "X" to its right; to remove all filters, click "Clear all".
You can browse by title or date by selecting the "Titles" or "Dates" tab at the top of the page.
Navigating Articles and Pages
Selecting a result from "Search Results" takes you to the newspaper page, with your search term highlighted.
Directly to the left of the newspaper is a "contents" pane with two tabs: "Issue Contents" and "Article Text". The "Issue Contents" tab displays either the full contents of the issue or a list of its pages. The "Article Text" tab displays the text of the highlighted article. You can minimize the "contents" pane by clicking the arrow in the separator bar to its right. You can also drag the separator bar left or right to resize the adjoining panes.
Directly above the newspaper pages displayed in the "image viewer" is a row of navigation options:
When you hover over the newspaper, the cursor changes to a "hand". Click and drag the newspaper image to move the page around. Double-clicking anywhere on the newspaper centers the clicked area in the image viewer.
Right-clicking an article or page opens an options pane:
Note: If you are viewing an issue that has been segmented at the page level rather than the article level, you will see only "PDF of this page", "Text of this page", and "Correct page text". See the Technical Section for more explanation of page-level and article-level segmentation.
Manipulating Articles and Pages
There are several ways to print an article or page from the options pane. A PDF downloaded using the directions above can be printed. Selecting "Text of this article" or "Text of this page" will allow you to print just the text without the accompanying image using the print option in your browser. Selecting "Clip this article" will allow you to print out an image of the article using the print option in your browser (Note: many articles, particularly long ones, are actually composed of multiple images, each of which might print on a separate page).
Every newspaper page in the CDNC consists of an image and the text associated with that image. The text is generated from the page image using Optical Character Recognition (OCR) software (see below for more detail). Computers often have a hard time reading newspaper print, particularly in papers printed before 1900. The User Text Correction (UTC) feature in the CDNC allows users to correct text that the computer could not properly identify.
To correct text in the CDNC, you must first register and create an account. To register, click "Log In" in the navigation bar and then "Register" on the login page. After you register, a verification email will be sent to you. Once you have verified your email address, you can log in to the CDNC and correct OCR text.
Once registration is complete, there are two ways to access the UTC tool. With the newspaper page you would like to correct displayed on the screen, you can start correcting text by:
Every line of text in the article has a corresponding line in the text correction pane. As you correct a line, its location in the article image is outlined in a red box. When you have finished making your corrections, click "Save" or "Save & exit".
The corrected text is saved in Veridian and becomes searchable by other users.
It is currently not possible to add a line of text if a line does not already exist. We hope to add this capability in the future.
Optical Character Recognition
Optical Character Recognition, or OCR, is a process by which software reads a page image and translates it into a text file by recognizing the shapes of the letters.
OCR enables searching of large quantities of full-text data, but it is never 100% accurate. The level of accuracy depends on the print quality of the original issue, its condition at the time of microfilming, the level of detail captured by the microfilm scanner, and the quality of the OCR software. Issues with poor-quality paper, small print, mixed fonts, multiple-column layouts, or damaged pages may have poor OCR accuracy.
To view the OCR text, choose a page or article and select "Text of this page" or "Text of this article" from the pop-up options pane.
The physical layout of newspapers makes them complicated objects to digitize. Each page usually has numerous columns, often with multiple articles per column, and articles sometimes continue across pages. Segmentation refers to the degree to which this structural complexity is captured in the digitized version of the printed newspaper.
Page-level segmentation offers access to each page of an issue, but not to any components within a page such as articles, headlines, advertisements, birth and death notices, etc. For each page there is an image and the text associated with that page image. Each word has X and Y coordinates that tie it to a location on the image, which is what allows it to be highlighted on the image. In the User Text Correction (UTC) tool, text for the entire page is displayed. Almost all post-1923 newspapers in the CDNC are segmented at the page level.
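To show why per-word coordinates make highlighting possible, here is a minimal Python sketch. It assumes a hypothetical record in which each OCR word carries its pixel position and size on the page image; the `Word` fields, the sample words, and `highlight_boxes` are illustrative, not the CDNC's actual data format.

```python
# Hypothetical sketch: page-level OCR data as words with image coordinates.
# Field names and sample values are illustrative assumptions, not the
# actual CDNC/Veridian data format.
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: int       # left edge of the word on the page image, in pixels
    y: int       # top edge of the word on the page image, in pixels
    width: int
    height: int


def highlight_boxes(words: list[Word], term: str) -> list[tuple[int, int, int, int]]:
    """Return (x, y, width, height) boxes for every word matching the term."""
    term = term.lower()
    return [
        (w.x, w.y, w.width, w.height)
        for w in words
        # ignore case and trailing/leading punctuation when comparing
        if w.text.lower().strip(".,;:!?\"'") == term
    ]


page = [
    Word("Gold", 120, 340, 60, 22),
    Word("found", 186, 340, 70, 22),
    Word("near", 262, 340, 55, 22),
    Word("Plymouth.", 323, 340, 110, 22),
    Word("Plymouth", 120, 980, 100, 22),
]

boxes = highlight_boxes(page, "Plymouth")
print(boxes)  # [(323, 340, 110, 22), (120, 980, 100, 22)]
```

Given the returned boxes, a viewer would only need to draw a rectangle at each location on the page image.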
Article-level segmentation offers access to components within a page such as individual articles, headlines, and advertisements. In addition to the image, text, and word coordinates found in page-level data, article-level metadata defines the locations of components and their relationships to one another (for example, when an article extends across several pages, or when a headline is connected to a specific article or articles). Among other things, article-level segmentation often makes it easier to identify the location of a search term on a page, particularly when that term appears numerous times. In the UTC tool, text for a specific component of the page is displayed. Almost all pre-1923 newspapers in the CDNC are segmented at the article level.
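The relationships that article-level metadata records can be sketched as a small Python data structure. Everything here is an illustrative assumption, not the CDNC's actual metadata format: an `Article` is a headline plus a list of rectangular `Region`s that may sit on different pages, and the hypothetical `article_at` function looks up which article covers a given point on a page, the kind of lookup that helps place a search hit within a specific article.

```python
# Hypothetical sketch of article-level metadata: each article is a set of
# regions, possibly on several pages. Not the CDNC's actual data format.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Region:
    page: int     # page number within the issue
    x: int        # left edge on the page image, in pixels
    y: int        # top edge on the page image, in pixels
    width: int
    height: int

    def contains(self, page: int, x: int, y: int) -> bool:
        """True if the point (x, y) on the given page falls inside this region."""
        return (self.page == page
                and self.x <= x < self.x + self.width
                and self.y <= y < self.y + self.height)


@dataclass
class Article:
    article_id: str
    headline: str  # the headline linked to this article
    regions: list[Region] = field(default_factory=list)

    def pages(self) -> list[int]:
        """Pages the article appears on, in region order, without repeats."""
        return list(dict.fromkeys(r.page for r in self.regions))


def article_at(articles: list[Article], page: int, x: int, y: int) -> Optional[str]:
    """ID of the article covering a point on a page, or None if no article does."""
    for article in articles:
        if any(r.contains(page, x, y) for r in article.regions):
            return article.article_id
    return None


issue = [
    # This article starts on page 1 and is continued on page 3.
    Article("A1", "Fire on Main Street",
            [Region(1, 100, 200, 400, 900), Region(3, 50, 100, 400, 300)]),
    Article("A2", "Local Notes", [Region(1, 520, 200, 400, 600)]),
]

print(issue[0].pages())                 # [1, 3]
print(article_at(issue, 1, 600, 500))   # A2
print(article_at(issue, 3, 60, 150))    # A1
print(article_at(issue, 2, 60, 150))    # None
```

With page-level data alone, only the page and word coordinates would be known; the region lists are what let a continued article be followed from page 1 to page 3.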
Hardware and Software Requirements
In general, you only need a modern web browser like Firefox, Safari, Chrome, or Internet Explorer to search and browse this collection. To view or print downloaded PDFs, you will also need a PDF viewer like Adobe Reader.