California Digital Newspaper Collection: Help
The CDNC has partnered with companies to allow them to digitize microfilm in our California Newspaper Microfilm Archive (CNMA). The companies cover the costs of digitization and provide copies of the data they produce to the CDNC. During an "embargo period" the images the company produced are only available to view on the company's website. The CDNC does, however, index and make searchable the computer-generated text.
By default, searching embargoed content is disabled in the CDNC. To enable it, go to "Advanced Search" and remove the checkmark next to "Search only issues that are available to view".
See the sections below for more information on searching. Search results for embargoed content are marked with a lock icon. If you click on one of these results, you will see a grey screen with a note stating when the embargoed content will become available in the CDNC and the URL where you can view it in the meantime.
You can perform a simple search by typing keywords in the search box on the home page and clicking "Search". The search engine will return results that include all of your search terms.
You can search for an exact phrase by placing quotation marks around your search terms. For example, enter "new plymouth", including the quotation marks, to find that exact phrase.
You can use Boolean operators AND, OR and NOT to refine your search results. AND (include all of the words) and NOT (without the words) narrow your search; OR (with at least one of the words) broadens your search. For example, plymouth NOT new will retrieve articles about Plymouth but not New Plymouth.
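The effect of the three operators can be illustrated with a minimal Python sketch. This is not how Veridian implements search; the `matches` function and the sample headlines are hypothetical, and matching is done on whole words only.

```python
def matches(text, all_of=(), any_of=(), none_of=()):
    """Return True if text satisfies the AND / OR / NOT term lists."""
    words = set(text.lower().split())
    has_all = all(t.lower() in words for t in all_of)                 # AND
    has_any = not any_of or any(t.lower() in words for t in any_of)   # OR
    has_none = not any(t.lower() in words for t in none_of)           # NOT
    return has_all and has_any and has_none


# Made-up sample headlines, for illustration only.
articles = [
    "Storm damages wharf at New Plymouth",
    "Plymouth council approves road repairs",
    "Harvest fair opens in Sacramento",
]

# plymouth NOT new: narrows the results
plymouth_not_new = [
    a for a in articles if matches(a, all_of=["plymouth"], none_of=["new"])
]

# plymouth OR sacramento: broadens the results
plymouth_or_sacramento = [
    a for a in articles if matches(a, any_of=["plymouth", "sacramento"])
]

print(plymouth_not_new)
print(len(plymouth_or_sacramento))
```

Note that NOT excludes every article containing the word "new", not only those that mention New Plymouth.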
To access the advanced search feature, click on the "Search" tab at the top left of the page and then on the "Advanced Search" tab.
By default, Veridian searches all publications. To search a specific title, highlight it in the list. To highlight more than one title, hold down the "Shift" key to select a range of titles or the "Control" key to select individual titles.
In addition to searching by exact phrase, Boolean operators, or specific titles (as in Basic Search), Advanced Search allows you to limit your search by:
By default Veridian returns 20 results per page, and a short summary of the text around the terms you searched for. Within the "Advanced Search" tab you can:
After clicking "Search", you can refine your search by "Publication", "Category", "Decade" or "Word count" by choosing one or more options (or filters) listed along the left hand side under "Refine search".
Selected filters used to refine a search appear in the upper left hand corner under "Search limited to". To remove a filter click on the "X" to its right, and to remove all filters click "Clear all".
You can browse by title or date by selecting the "Titles" or "Dates" tab at the top of the page.
Navigating Articles and Pages
Once you have selected a result from "Search Results" you will be directed to the newspaper page with the search term highlighted.
Directly to the left of the newspaper is a "contents" pane with two tabs: "Issue Contents" and "Article Text". Clicking on the "Issue Contents" tab displays either the entire contents, or the pages for an issue. Clicking on "Article Text" displays the text for the highlighted article. You can minimize this "contents" pane by clicking on the arrow in the separator bar to the right of the "contents" pane. The separator bar can also be dragged left or right, to increase or decrease the size of the adjoining panes.
Directly above the newspaper pages displayed in the "image viewer" is a row with several navigating options:
While hovering over the newspaper a "hand" cursor is displayed. Click and drag the newspaper image to move the page around. Double-clicking anywhere on the newspaper will place the area clicked at the center of the image viewer.
Right-clicking on an article or page opens a pop-up options pane:
Note: If you are viewing an issue that has been scanned at the page rather than the article level, you will only see the second part of the menu. See the Technical Section for more explanation of page and article level segmentation.
Manipulating Articles and Pages
There are several ways to print an article or page from the options pane. A PDF downloaded using the directions above can be printed. Selecting "Text of this article" or "Text of this page" will allow you to print just the text without the accompanying image using the print option in your browser. Selecting "Clip this article" will allow you to print out an image of the article using the print option in your browser (Note: many articles, particularly long ones, are actually composed of multiple images, each of which might print on a separate page).
Every newspaper page in the CDNC consists of an image and the text associated with that image. Newspaper copy is converted into searchable text using Optical Character Recognition (OCR) software (see below for more detail). Computers often have a hard time reading newspaper print, particularly for papers printed before 1900. The User Text Correction (UTC) feature in the CDNC allows users to correct text that the computer could not properly identify.
To correct text within the CDNC, you must first register and create an account. To register, click on "Log In" in the navigation bar and then on "Register" on the login page. Upon registration, a verification email will be sent to you. After verifying your email address, you will be able to log in to the CDNC and correct OCR text.
After registration is complete, there are two ways to access the UTC tool. After you've found the newspaper you would like to correct and it is displayed on the screen, you can start correcting text by:
Every line of text in the article will have a corresponding line in the text correction pane. While making corrections, the line in the article is outlined within a red box. Once you are finished making your corrections, click on "Save" or "Save & exit".
The text you've corrected is saved within Veridian, and is now searchable by other users.
It is currently not possible to add a line of text if a line does not already exist. We hope to add this capability in the future.
For an annual fee, users get access to the following features not available to free accounts:
Download High-Resolution Images
Free accounts can download PDFs. Premium accounts can download high-resolution images suitable for reproduction in print or digital publications.
Right-click on an article or page to reveal the pop-up pane (see above) and select "JPG of this page". You will be redirected to an image of the page. Right-click on the image, select "Save image as", and save the image to your computer. You can now open the saved image with an image viewer/editor.
Recently Viewed Articles
Click on "My Account" on the toolbar at the top of the page and then click on the "Recent Activity" tab. The top pane displays the 10 most recent articles or pages you've viewed.
Click on "My Account" on the toolbar at the top of the page and then click on the "Recent Activity" tab. The second pane, "Recently searched," displays your most recent 10 searches.
Private lists allow you to save articles or pages to uniquely-named lists.
Right-click on an article or page to reveal the pop-up pane (see above) and select "Add to private list". A pop-up will appear. If you don't have any lists created, or want to create a new one, click on "New private list", type in the name of the list, add a note about the article or page you're saving (if you'd like), and then click "Add". To use an existing list, make sure "Existing private list" is selected, choose the list from the pull-down menu, add an optional note, and click "Add".
To see the lists you've created, and articles/pages you've saved to lists, click on "My Account" on the toolbar at the top of the page and then click on the "Private Lists" tab. You can then delete, email or rename lists, and add/edit notes to entries or move or remove entries.
Optical Character Recognition
Optical Character Recognition, or OCR, is a process by which software reads a page image and translates it into a text file by recognizing the shapes of the letters.
OCR enables searching of large quantities of full-text data, but it is never 100% accurate. The level of accuracy depends on the print quality of the original issue, its condition at the time of microfilming, the level of detail captured by the microfilm scanner, and the quality of the OCR software. Issues with poor quality paper, small print, mixed fonts, multiple column layouts, or damaged pages may have poor OCR accuracy.
To look at the OCR text, choose a page or article and select "Text of this page," or "Text of this article" from the pop-up pane.
The physical layout or structure of newspapers makes them complicated objects to digitize. Each page usually has numerous columns, often with multiple articles per column. Furthermore, articles sometimes run across pages. Segmentation refers to the degree to which this structural complexity is captured in the digitized representation of the printed newspaper.
Page-level segmentation offers access to each page of an issue, but not to any components within a page such as articles, headlines, advertisements, or birth and death notices. For each page there is an image and the text associated with that page image. Each word has X and Y coordinates that connect it to a location on the image and thus allow it to be highlighted on the image. In the User Text Correction (UTC) tool, text for the entire page is displayed. Almost all post-1923 newspapers in the CDNC are segmented at the page level.
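As a rough illustration of how word coordinates make highlighting possible, here is a minimal Python sketch. It is not the CDNC's actual data format: the `Word` class, the `highlight_boxes` function, the width and height fields, and the sample values are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: int       # left edge on the page image, in pixels
    y: int       # top edge on the page image, in pixels
    width: int   # assumed for this sketch: box width in pixels
    height: int  # assumed for this sketch: box height in pixels


def highlight_boxes(words, term):
    """Return an (x, y, width, height) box for every word matching term."""
    wanted = term.lower()
    return [
        (w.x, w.y, w.width, w.height)
        for w in words
        # Ignore case and surrounding punctuation when comparing.
        if w.text.lower().strip(".,;:!?\"'") == wanted
    ]


# Made-up words from a hypothetical page.
page_words = [
    Word("Plymouth", 120, 340, 96, 18),
    Word("council", 222, 340, 80, 18),
    Word("meets", 308, 340, 60, 18),
    Word("in", 120, 362, 20, 18),
    Word("Plymouth.", 146, 362, 102, 18),
]

boxes = highlight_boxes(page_words, "plymouth")
print(boxes)
```

A viewer could then draw one highlight rectangle on the page image for each returned box.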
Article-level segmentation offers access to components within a page such as individual articles, headlines, and advertisements. In addition to the image, and text and word coordinates found in page-level data, article-level metadata defines the locations of components and their relationships to one another (for example, when an article extends across several pages, or when a headline is connected to a specific article or articles). Among other things, article-level segmentation often makes it easier to identify the location of a search term on a page, particularly when that term appears numerous times. In the UTC tool, text for a specific component of the page is displayed. Almost all pre-1923 newspapers in the CDNC are segmented at the article level.
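The extra information that article-level segmentation records can be pictured with a small Python sketch. The `Region` and `Component` classes, the `pages_spanned` function, and the sample values are hypothetical and do not reflect the metadata format actually used by the CDNC.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    page: int    # page number within the issue
    x: int
    y: int
    width: int
    height: int


@dataclass
class Component:
    kind: str                                     # e.g. "article", "headline", "advertisement"
    text: str
    regions: list = field(default_factory=list)   # where the component sits on the page(s)
    headline: "Component | None" = None           # link from an article to its headline


def pages_spanned(component):
    """Return the sorted page numbers on which a component appears."""
    return sorted({r.page for r in component.regions})


# A made-up article that starts on page 1 and continues on page 4.
headline = Component("headline", "Road Repairs Approved", [Region(1, 100, 80, 400, 40)])
article = Component(
    "article",
    "The council voted on Tuesday ...",
    [Region(1, 100, 130, 400, 600), Region(4, 520, 90, 400, 300)],
    headline=headline,
)

print(pages_spanned(article))
print(article.headline.text)
```

Page-level data would stop at the image, text, and word coordinates; the component records and the links between them are what article-level segmentation adds.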
Hardware and Software Requirements
In general, you only need a modern web browser like Firefox, Safari, Chrome, or Internet Explorer to search and browse this collection. To view or print downloaded PDFs, you will also need a PDF viewer like Adobe Reader.