Digitization Guidelines
Specifications for digital files derived from text-based materials (print, microfilm, microfiche) for SAOA's digital collections.
These are SAOA's technical guidelines for digital files derived from text-based materials (in print, microfilm, or microfiche) to be included in SAOA's digital collections. Digitization providers (commercial entities as well as academic institutions) will be expected to conform to these specifications to ensure consistency of the digital materials for ingest into the SAOA digital asset management system.
The following are the ideal specifications for ingesting image-based material into SAOA's collections.
At the outset of each project, the SAOA Project Manager will schedule a phone consultation with the digitization provider...
- Estimates of the total number of images, volumes, and file size.
- Details regarding the condition of the source material.
- Descriptive Metadata – the metadata should:
- Use Dublin Core or MARC21.
- Be in MARC XML or CSV format.
- Conform to SAOA’s metadata templates.
- Include accurate holdings for serials or multipart titles.
- Be submitted as a sample set during the proposal phase.
NOTE: Forum entries must use UTF-8 encoding, the default on SAOA’s hosting platform.
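As an illustration of the CSV and UTF-8 requirements above, the following minimal sketch writes descriptive metadata records to UTF-8 encoded CSV. The column names are Dublin Core element names chosen for illustration only; they are an assumption and do not reproduce SAOA's metadata templates, and the sample record is a placeholder.

```python
import csv
import io

# Assumption: a small subset of Dublin Core elements used as column headers.
# SAOA's actual metadata templates may define different columns.
DC_FIELDS = ["identifier", "title", "creator", "date", "language"]


def records_to_csv_bytes(records, fieldnames=DC_FIELDS):
    """Serialize a list of dicts to CSV and return it as UTF-8 bytes."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)  # keys missing from a record become empty cells
    return buffer.getvalue().encode("utf-8")


# Placeholder record with a non-ASCII character to exercise the encoding.
sample = [{"identifier": "986786411", "title": "Título de ejemplo",
           "date": "1915", "language": "es"}]
csv_bytes = records_to_csv_bytes(sample)
```

Writing `csv_bytes` to a file opened in binary mode keeps the encoding explicit instead of relying on a platform default.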
- Structural Metadata – must support file organization and navigation (e.g., by chapter).
- Asset File Types:
- TIFF for preservation
- JPEG, JPEG2000, or PDF for access
- OCR files (recommended):
- .txt
- OCR XML or hOCR
- Image Capture:
- TIFF Master Files:
- 400–600 ppi resolution
- Uncompressed TIFF 6.0
- Well-formed and valid when checked with JHOVE
- 24-bit color; 8-bit grayscale acceptable in some cases
- One page per image
- JPEG/JP2/PDF Access Files:
- Same resolution as the master file or lower (minimum 300 ppi)
- Compression: 10:1 to 15:1
- File size: 0.5–2.5 MB
- Image Quality:
- Good tone distribution
- Sharp, true-to-original rendering
- Deskewed and cropped to text
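The resolution, bit depth, and compression figures above can be turned into a rough file-size estimate, which may help when preparing the estimates requested at the outset of a project. This is a minimal sketch: the 6 x 9 inch page is a hypothetical example, and the calculation counts pixel data only, ignoring TIFF header and metadata overhead.

```python
def uncompressed_bytes(width_in, height_in, ppi, bits_per_pixel=24):
    """Approximate pixel-data size of one uncompressed page image, in bytes."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    return width_px * height_px * bits_per_pixel // 8


def compressed_bytes(source_bytes, ratio):
    """Approximate size after compression at ratio:1 (e.g. ratio=10 for 10:1)."""
    return source_bytes / ratio


# Hypothetical 6 x 9 inch page, 24-bit color.
master = uncompressed_bytes(6, 9, 400)      # 2400 x 3600 px x 3 bytes = 25,920,000
access_src = uncompressed_bytes(6, 9, 300)  # 1800 x 2700 px x 3 bytes = 14,580,000
access_max = compressed_bytes(access_src, 10)  # 1,458,000 bytes at 10:1
access_min = compressed_bytes(access_src, 15)  # 972,000 bytes at 15:1
```

For this hypothetical page, a 300 ppi access file compressed at 10:1 to 15:1 comes to roughly 1.0 to 1.5 MB, inside the 0.5–2.5 MB range given above. Multiplying the per-page figures by the expected image count gives a project-level estimate.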
- File Naming:
- Monographs (Single Volume)
- Format: titleID_YEAR_seq#.tif
- Example: 986786411_1915_00135.tif
- Monographs (Multi-Volume)
- Format: titleID_YEAR_VOL_seq#.tif
- Example: 990512780_1918_003_00115.tif
- Serials
- Numbered Issues:
- Format: titleID_YEAR_VOL_ISSUE_seq#.tif
- Example: 990312980_1915_002_001_00253.tif
- Dated Issues:
- Format: titleID_YYYY-MM-DD_seq#.tif
- Example: 22123199_1921-12-24_00012.tif
- Quarterly Issues:
- Format: titleID_YEAR_Quarter_seq#.tif
- Example: 226114808_1895_Spring_00005.tif
- General Notes:
- Master and access files must share the same base filename.
- Use the OCLC number (OCLC#) as the title ID.
- Use 5-digit sequence numbers and 3-digit volume/issue codes.
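The naming formats above can be checked mechanically. The following is a minimal sketch, not an official SAOA tool: the regular expressions are derived from the formats and examples listed here, and they assume that title IDs contain only digits and that the quarter label is alphabetic. The function and pattern names are hypothetical.

```python
import re

# Patterns derived from the formats and examples in the guidelines.
# Assumptions: title IDs are all digits; the quarter label is alphabetic.
PATTERNS = {
    "monograph_single": re.compile(r"\d+_\d{4}_\d{5}\.tif"),
    "monograph_multi": re.compile(r"\d+_\d{4}_\d{3}_\d{5}\.tif"),
    "serial_numbered": re.compile(r"\d+_\d{4}_\d{3}_\d{3}_\d{5}\.tif"),
    "serial_dated": re.compile(r"\d+_\d{4}-\d{2}-\d{2}_\d{5}\.tif"),
    "serial_quarterly": re.compile(r"\d+_\d{4}_[A-Za-z]+_\d{5}\.tif"),
}


def classify(filename):
    """Return the name of the matching format, or None if none matches."""
    for kind, pattern in PATTERNS.items():
        if pattern.fullmatch(filename):
            return kind
    return None


def numbered_name(title_id, year, seq, volume=None, issue=None):
    """Build a master filename with zero-padded volume, issue, and sequence."""
    if issue is not None and volume is None:
        raise ValueError("an issue number requires a volume number")
    parts = [str(title_id), f"{year:04d}"]
    if volume is not None:
        parts.append(f"{volume:03d}")
    if issue is not None:
        parts.append(f"{issue:03d}")
    parts.append(f"{seq:05d}")
    return "_".join(parts) + ".tif"
```

For example, `numbered_name(990512780, 1918, 115, volume=3)` reproduces the multi-volume example `990512780_1918_003_00115.tif`, and `classify("22123199_1921-12-24_00012.tif")` returns `"serial_dated"`.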
- Folders:
- For each title, create two folders:
- OCLC#_ShortTitle_TIFF
- OCLC#_ShortTitle_JP2
- Monograph files go into the main folders.
- Serials should use subfolders labeled by vol#/issue#/year.
- Access and master files must match names exactly (only extensions differ).
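The folder naming and the name-matching rule above can be verified before delivery. The sketch below is one possible check, with hypothetical function names; it compares base filenames only and assumes the file lists have already been collected from the two folders.

```python
from pathlib import PurePath


def title_folders(oclc_number, short_title):
    """Return the (TIFF, JP2) folder names for one title."""
    base = f"{oclc_number}_{short_title}"
    return f"{base}_TIFF", f"{base}_JP2"


def unmatched_files(master_files, access_files):
    """Return base names present in only one of the two file lists.

    The result is a pair: (masters without an access file,
    access files without a master). Both are empty when every
    file has a counterpart that differs only in its extension.
    """
    masters = {PurePath(name).stem for name in master_files}
    access = {PurePath(name).stem for name in access_files}
    return sorted(masters - access), sorted(access - masters)
```

With a real delivery, the two lists could be gathered with `pathlib.Path.iterdir()` on each folder; two empty lists in the result mean the master and access sets line up.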
- File Transfer:
- Accepted transfer methods: hard drive, USB, FTP, Dropbox, Google Drive, CD.
Updated on October 16, 2020