Digital Standard 2 – Digital capture & format

Download Digital standard 2: Digital capture and format  (PDF 340.1 KB)

1. Introduction

This document details the State Library of Queensland's standards for the capture of digital objects to meet access and preservation requirements. The standard covers image (photographs, manuscripts, music scores), audio and video files and outlines the file formats for each type of digital object...
The State Library recognises that the application of standards in line with international best practice is critical to the successful implementation and sustainability of digital projects. Digital information is fragile in ways that differ from books or other printed information: digital files are more easily corrupted or altered without detection. Digital asset management (long-term preservation and storage) must be considered at all stages of the life cycle of digital objects. Ensuring quality standards at the point of capture or creation is an important strategy to protect the State Library’s increasing investment in digitisation.
Other institutions have done a great deal of work to define capture standards for digital objects. The State Library acknowledges and draws on this body of work. Documents were reviewed from projects such as American Memory and the Colorado Digital Project, and from the National Libraries of Australia and New Zealand and the State Library of Victoria.

2. Related Documents

  • Directory & File Naming Conventions for Digital Objects
  • State Library of Queensland Digital Standard 1 - Metadata for digital objects and other specified resource types

3. Principles of quality digital objects

The State Library is working towards achieving the following principles for producing quality digital objects:

  1. A digital object will be selected and produced in a way that ensures it supports the Content Strategy.
  2. A digital object is persistent. It is the intention of the State Library that digital objects remain accessible over time despite changing technologies.
  3. An object is digitised from the original version or from the highest quality version available.
  4. An object is digitised in a format that supports intended current and likely future use, including long term preservation, and the creation of derivative copies that support those uses. Consequently, the digital object will be exchangeable across platforms, broadly accessible, and will either be digitised according to a recognised standard or best practice, or it will deviate from standards and practices only for well documented reasons.
  5. A digital object will be named with a persistent, unique identifier that conforms to a well-documented scheme. It will not be named with reference to its absolute file name or address (e.g. as with URLs and other internet addresses), as file names and addresses tend to change.
  6. A digital object can be authenticated. A client should be able to determine that the object is what it purports to be.
  7. A digital object will have, and be associated with, metadata. All digital objects will have descriptive, administrative and technical metadata. Additionally, metadata that supplies information about external relationships to other objects (e.g. the structural metadata that determines how page images from a digitised manuscript or diary relate to one another in some sequence) will be included.

4. File formats

File formats for the State Library’s digital objects have been selected based on such factors as industry standards, sustainability, quality and functionality. Format is particularly significant when capturing a master or archival file. The State Library intends to capture digital objects once, and at the highest quality possible for long term preservation and sustainability.
The following digital formats have been chosen by the State Library of Queensland. (More information on digital formats is available from the National Digital Information Infrastructure and Preservation Program (NDIIPP) at http://www.digitalpreservation.gov/formats.)

4.1. Images

4.1.1. TIFF
TIFF (Tagged Image File Format) 6.0 is used by the State Library as the uncompressed master archival file format for digital reproductions from paper and photographic media such as negatives.
TIFF was developed by Aldus and Microsoft Corp, and the specification was owned by Aldus, which in turn merged with Adobe Systems, Incorporated. Consequently, Adobe Systems now holds the copyright for the TIFF specification. Since it was designed for, and by, developers of printers, scanners and monitors, TIFF is highly flexible and platform-independent and is supported by numerous image processing applications.
TIFF is widely used in library digitisation. It is used as a preservation format by the National Libraries of Australia and New Zealand, all Australian state libraries, the Library of Congress and many other libraries that capture image files for preservation. TIFF is recommended by the Australian Government Information Management Office.
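TIFF's platform independence starts with its header: a classic TIFF 6.0 file opens with a two-byte byte-order mark ("II" for little-endian, "MM" for big-endian), the number 42 in that byte order, and a four-byte offset to the first image file directory (IFD). The Python sketch below reads that header as a quick sanity check on a master file. The function name read_tiff_header is hypothetical; the sketch checks only the first eight bytes and deliberately rejects BigTIFF (magic number 43), which is not part of TIFF 6.0.

```python
import struct


def read_tiff_header(data: bytes) -> tuple[str, int]:
    """Return (byte_order, first_ifd_offset) for a classic TIFF 6.0 header.

    Raises ValueError if the bytes do not start with a valid TIFF 6.0 header.
    """
    if len(data) < 8:
        raise ValueError("too short to be a TIFF file")
    order = data[:2]
    if order == b"II":
        fmt, name = "<HI", "little-endian"
    elif order == b"MM":
        fmt, name = ">HI", "big-endian"
    else:
        raise ValueError("missing TIFF byte-order mark")
    # Bytes 2-3: magic number; bytes 4-7: offset of the first IFD.
    magic, ifd_offset = struct.unpack(fmt, data[2:8])
    if magic != 42:  # 43 would indicate BigTIFF, which is not TIFF 6.0
        raise ValueError(f"unexpected TIFF magic number {magic}")
    return name, ifd_offset
```

For example, read_tiff_header(b"II*\x00\x08\x00\x00\x00") returns ("little-endian", 8), because "*" is byte 0x2A (42).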

4.1.2. JPEG
JPEG (JFIF, the JPEG File Interchange Format) is used by the State Library to deliver images online. JPEG derivatives are produced from the TIFF masters.
JPEG is a standardised image compression mechanism. JPEG stands for Joint Photographic Experts Group, the name of the committee that developed the standard. JPEG is one of the most popular formats for storing and transmitting images on the Internet, mainly because its highly effective compression allows image files to be sent and received quickly.
JPEG compression is lossy: it is performed using the discrete cosine transform (DCT), and some data from the original picture is discarded. Although the achievable compression depends on the original image, ratios of 10:1 to 20:1 typically do not cause noticeable loss of image quality.
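To put the 10:1 to 20:1 range in concrete terms, the uncompressed size of an image is width × height × bits per pixel ÷ 8 bytes, and the JPEG size is roughly that figure divided by the compression ratio. The helper below is a hypothetical sketch of that arithmetic only; real JPEG sizes vary with image content and encoder settings, and the 4000 × 3000 pixel example image is an assumption.

```python
def jpeg_size_range(width: int, height: int, bits_per_pixel: int = 24,
                    ratios: tuple[int, int] = (10, 20)) -> tuple[int, int]:
    """Estimate the (smallest, largest) JPEG size in bytes for an image.

    The uncompressed size is width * height * bits_per_pixel / 8 bytes;
    each estimate is that figure divided by a compression ratio.
    """
    uncompressed = width * height * bits_per_pixel // 8
    low_ratio, high_ratio = ratios
    # A higher ratio gives a smaller file, so the 20:1 case is the lower bound.
    return uncompressed // high_ratio, uncompressed // low_ratio
```

For a 24-bit colour image of 4000 × 3000 pixels (36,000,000 bytes uncompressed), jpeg_size_range(4000, 3000) gives (1800000, 3600000), i.e. roughly 1.8 MB to 3.6 MB.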

4.1.3. JPEG2000
JPEG2000 is an image coding system that uses state-of-the-art compression techniques based on wavelet technology. Its architecture lends itself to a wide range of uses, from portable digital cameras to advanced pre-press, medical imaging and other key sectors. Compared to JPEG, JPEG2000 offers higher compression for comparable quality, progressive image reconstruction, and both lossy and lossless compression, and its JP2 file format (.jp2) can carry XML-based metadata.
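The JP2 file format is organised as a sequence of boxes, each starting with a four-byte big-endian length and a four-byte type. A conforming file begins with a fixed 12-byte signature box, and XML metadata is carried in boxes of type "xml ". The sketch below, using the hypothetical name jp2_xml_boxes, checks the signature and returns the payloads of any top-level XML boxes. It is a minimal illustration, not a validating parser, and it does not look inside nested boxes.

```python
import struct

# Signature box: length 12, type "jP  ", content 0x0D0A870A.
JP2_SIGNATURE = b"\x00\x00\x00\x0cjP  \r\n\x87\n"


def jp2_xml_boxes(data: bytes) -> list[bytes]:
    """Return the payloads of top-level "xml " boxes in a JP2 file."""
    if not data.startswith(JP2_SIGNATURE):
        raise ValueError("not a JP2 file (signature box missing)")
    found = []
    pos = 0
    while pos + 8 <= len(data):
        length, box_type = struct.unpack(">I4s", data[pos:pos + 8])
        header = 8
        if length == 1:  # a 64-bit extended length follows the type field
            length = struct.unpack(">Q", data[pos + 8:pos + 16])[0]
            header = 16
        elif length == 0:  # the box runs to the end of the file
            length = len(data) - pos
        if length < header:
            raise ValueError(f"corrupt box length at offset {pos}")
        if box_type == b"xml ":
            found.append(data[pos + header:pos + length])
        pos += length
    return found
```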

4.1.4. PDF (Portable Document Format)
PDF (Portable Document Format) is used by the State Library to provide a version of documents, e.g. music scores, whose primary purpose is for downloading and printing. PDF files ensure that documents designed for print retain the same layout and design elements as the original.
The State Library is currently investigating preservation formats for PDF documents.
When providing PDF files the State Library follows the Queensland Government’s CUE standard, specifically Module 6: Non-HTML documents.

4.2. Audio

4.2.1. Broadcast WAVE Audio File Format, WAVE_LPCM_BWF
The State Library uses the Broadcast Wave Audio File Format for uncompressed master archival files of audio materials.
The Broadcast WAVE file format was developed by the European Broadcasting Union (EBU) in 1997 as a file format intended for the exchange of audio material between different broadcast environments and equipment. It is based on the Microsoft WAVE audio file format but adds a “Broadcast Audio Extension” chunk to hold additional metadata.
The Broadcast Wave File format has been widely adopted in both the recording industry and sound archives, and is recommended by the Delivery Specifications Committee of the Producers & Engineers Wing of the National Academy of Recording Arts and Sciences as the preferred format for delivering music recordings to record companies. It is also the file format for audio materials recommended by IASA (International Association of Sound and Audiovisual Archives).
The format specification and its supplements are part of the EBU Tech 3000 series of publications and are available at: http://tech.ebu.ch/docs/tech/tech3285.pdf
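A Broadcast WAVE file is a RIFF/WAVE file: a "RIFF" header with the form type "WAVE", followed by chunks that each carry a four-byte ID and a four-byte little-endian size, padded to an even length. The Broadcast Audio Extension is the chunk with ID "bext", whose first fields are fixed-width ASCII strings (Description 256 bytes, Originator 32, OriginatorReference 32, OriginationDate 10, OriginationTime 8). The Python sketch below, using the hypothetical name read_bext, extracts only those leading text fields; the later fields defined in EBU Tech 3285 (time reference, version, UMID, loudness values and coding history) are not decoded.

```python
import struct

# Leading fixed-width ASCII fields of the bext chunk, in order (EBU Tech 3285).
BEXT_TEXT_FIELDS = [("description", 256), ("originator", 32),
                    ("originator_reference", 32), ("origination_date", 10),
                    ("origination_time", 8)]


def read_bext(data: bytes) -> dict[str, str]:
    """Return the leading text fields of the bext chunk in a RIFF/WAVE file."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack("<4sI", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + size]
        if chunk_id == b"bext":
            out, offset = {}, 0
            for name, width in BEXT_TEXT_FIELDS:
                raw = body[offset:offset + width]
                # Fields are NUL-padded; keep the text before the first NUL.
                out[name] = raw.split(b"\x00", 1)[0].decode("ascii", "replace")
                offset += width
            return out
        pos += 8 + size + (size & 1)  # chunks are padded to an even length
    raise ValueError("no bext chunk found")
```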

4.2.2. MP3
MP3 files are used by the State Library to provide compressed audio files for download from the internet.
MP3, more correctly MPEG-1 Audio Layer III, is both the name of a file extension and a type of file. The Moving Picture Experts Group specified three coding schemes for the compression of audio signals (Layer 1, Layer 2 and Layer 3). Layer 3 (MP3) uses perceptual audio coding and psychoacoustic compression to remove some information from a sound signal. This lossy compression reduces the size of the audio file while maintaining good sound quality (although with some detectable loss of fidelity), making it well suited to transmission over the internet.
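An MP3 stream is a sequence of frames, each introduced by a four-byte header that records, among other things, the bit rate and sample rate. For MPEG-1 Layer III the frame length in bytes is 144 × bit rate ÷ sample rate, rounded down, plus one byte when the padding bit is set. The Python sketch below decodes such a header to show how a 128 kbps download file is divided into frames. mp3_frame_info is a hypothetical name, and the sketch handles MPEG-1 Layer III only (no MPEG-2/2.5 and no free-format streams).

```python
# MPEG-1 Layer III lookup tables (bit-rate index 0 = free format, 15 = invalid).
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]


def mp3_frame_info(header: bytes) -> tuple[int, int, int]:
    """Decode a 4-byte MPEG-1 Layer III frame header.

    Returns (bitrate_kbps, sample_rate_hz, frame_length_bytes).
    """
    if len(header) < 4:
        raise ValueError("need at least 4 header bytes")
    word = int.from_bytes(header[:4], "big")
    if word >> 21 != 0x7FF:                 # 11-bit frame sync, all ones
        raise ValueError("no frame sync")
    version = (word >> 19) & 0b11           # 0b11 means MPEG-1
    layer = (word >> 17) & 0b11             # 0b01 means Layer III
    if version != 0b11 or layer != 0b01:
        raise ValueError("not an MPEG-1 Layer III frame")
    bitrate = BITRATES_KBPS[(word >> 12) & 0xF]
    sample_rate = SAMPLE_RATES_HZ[(word >> 10) & 0b11]
    if bitrate is None or sample_rate is None:
        raise ValueError("free-format or reserved value not handled")
    padding = (word >> 9) & 1
    frame_length = 144 * bitrate * 1000 // sample_rate + padding
    return bitrate, sample_rate, frame_length
```

For example, mp3_frame_info(bytes.fromhex("FFFB9064")) returns (128, 44100, 417): a 128 kbps frame at 44.1 kHz is 417 bytes, or 418 bytes when padded.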

4.2.3. WMA (Windows Media Audio)
WMA file format is used by the State Library to provide access to streamed audio.
Windows Media Audio (WMA) is a proprietary compressed audio file format developed by Microsoft as part of the Windows Media framework. Files in this format can be played using Windows Media Player, Winamp and many other media players. Windows Media Player is the default player on Windows and is therefore widely available.
Further information about Windows Media Audio is available through the Windows Media website at:
https://support.microsoft.com/en-us/help/14209/get-windows-media-player

4.3. Video

AVI and QuickTime .mov are the current digital master/archival formats. The State Library is still investigating the suitability of these formats and is considering other available formats for use as a long-term preservation master (archival) format.

4.3.1. AVI
AVI (Audio Video Interleaved) is a file format for moving image content that wraps a video bitstream with other data chunks. The State Library of Queensland uses .avi as the video source file when producing streaming versions of video content. Uncompressed AVI files of motion picture film scanned at 1920x1080 or below are regarded as digital masters, not preservation (archival) masters.
More information on this file format is available on the Library of Congress Digital Preservation website at: http://www.digitalpreservation.gov/formats/fdd/fdd000059.shtml

4.3.2. MOV (QuickTime File Format)
The .mov multimedia container format was created specifically for Apple's QuickTime software and is sometimes referred to as the Apple QuickTime Movie Format.
The format specifies a multimedia container file that contains one or more tracks, each of which stores a particular type of data: audio, video, effects, or text (e.g. for subtitles). Because a QuickTime file can contain abstract data references for the media data, and keeps the media data separate from the media offsets and track edit lists, the format is particularly suited to editing: it can import and edit in place, without copying data.
Uncompressed MOV files of motion picture film scanned at 1920x1080 or below are regarded as digital masters, not preservation (archival) masters.
More information on this file format is available on the Apple website at: http://developer.apple.com/library/mac/#documentation/QuickTime/QTFF/QTFFPreface/qtffPreface.html
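QuickTime files are built from atoms, each starting with a four-byte big-endian size and a four-byte type. Media samples usually sit in an "mdat" atom while the "moov" atom holds the track structure, sample offsets and edit lists, which is the separation of media data from offsets and edit lists described above. The sketch below, using the hypothetical name top_level_atoms, lists the top-level atoms of a file already read into memory; it does not descend into child atoms such as the tracks inside "moov".

```python
import struct


def top_level_atoms(data: bytes) -> list[tuple[str, int]]:
    """List (type, size) for each top-level atom in a QuickTime file."""
    atoms = []
    pos = 0
    while pos + 8 <= len(data):
        size, kind = struct.unpack(">I4s", data[pos:pos + 8])
        if size == 1:  # a 64-bit extended size follows the type field
            size = struct.unpack(">Q", data[pos + 8:pos + 16])[0]
        elif size == 0:  # the atom extends to the end of the file
            size = len(data) - pos
        if size < 8:
            raise ValueError(f"invalid atom size {size} at offset {pos}")
        atoms.append((kind.decode("latin-1"), size))
        pos += size
    return atoms
```

For a typical .mov master the result commonly includes atoms such as "ftyp", "mdat" and "moov", although their order, and whether "ftyp" is present at all, depends on the application that wrote the file.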

4.3.3. WMV (Windows Media Video)
WMV file format is used by the State Library to provide access to streamed video.
Windows Media Video (WMV) is a proprietary compressed video file format developed by Microsoft as part of the Windows Media framework. Files in this format can be played using Windows Media Player, Winamp and many other media players. Windows Media Player is the default player on Windows and is therefore widely available.
Further information about Windows Media Video and Windows Media Player is available through the Windows Media website at: https://support.microsoft.com/en-us/help/14209/get-windows-media-player

4.3.4. MPEG-4 Version 2
The MP4 file format is the second MPEG-4 file format developed by the Moving Picture Experts Group (MPEG). It is a container format that can hold a mix of multimedia objects (audio, video, images, animations). The format is intended to serve web and other online applications; mobile devices, e.g. mobile phones and PDAs; and broadcasting and other professional applications.
More information about this file format is available at the National Digital Information Infrastructure and Preservation Program (NDIIPP), a collaborative project managed by the Library of Congress, at:
http://www.digitalpreservation.gov/formats/fdd/fdd000155.shtml
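MP4 files use a box structure derived from the QuickTime file format and normally begin with an "ftyp" box that names a major brand and a list of compatible brands (for example "mp42" for MP4 version 2, or "isom"). Checking these brands is one quick way to confirm that a downloadable derivative really is an MP4 file before it is published. The sketch below uses the hypothetical name mp4_brands and assumes the "ftyp" box is the first box in the file, which is usual but is an assumption here.

```python
import struct


def mp4_brands(data: bytes) -> tuple[str, list[str]]:
    """Return (major_brand, compatible_brands) from a leading ftyp box."""
    if len(data) < 16:
        raise ValueError("too short for an ftyp box")
    size, kind = struct.unpack(">I4s", data[:8])
    if kind != b"ftyp" or size < 16 or size > len(data):
        raise ValueError("file does not start with a valid ftyp box")
    major = data[8:12].decode("latin-1")
    # Bytes 12-15 hold the minor version; compatible brands follow, 4 bytes each.
    compatible = [data[i:i + 4].decode("latin-1") for i in range(16, size, 4)]
    return major, compatible
```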

5. Capture standards - images

When capturing images from paper and photographic material, the State Library of Queensland will produce a set of digital objects suitable for optimal viewing by clients.
Types of material from which images are produced include photographic copy prints, published works, negatives, music scores and archive and manuscript material.
The set of objects will be drawn from the following:

  1. TIFF file (archival image) – uncompressed master
  2. JPEG file (preview image) – display image with description on the web
  3. JPEG (research image) – larger display on the web
  4. JPEG (thumbnail image) – small display on the web
  5. JPEG 2000 (zoom image) – capability to zoom into image without loss of quality

The usage of each derivative is determined by its format and is outlined in Appendix A (see the PDF version of this standard).
In addition, for some digital objects, such as published works, music scores, transcripts, finding aids and archive and manuscript materials, a PDF derivative will be produced to provide an easy print solution for clients.

5.1. Image standards

The following table details the image capture quality, size and file format for digital image objects at the State Library of Queensland.

 
Use | Description | Resolution | Size (across longest dimension) | Format | File extension
Archival | 8-bit greyscale; 24-bit colour | 600 ppi; 400 ppi | 6000 pixels; 4000 pixels | TIFF | .tif
Preview | 8-bit greyscale; 24-bit colour | 100 ppi | 500 pixels | JPEG | .jpg
Research | 8-bit greyscale; 24-bit colour | 100 ppi | 1000 pixels | JPEG | .jpg
Zoom | 8-bit component; 6 decomposition layers; 25 quality layers; 8:1 compression |  | Same as TIFF master file | JPEG 2000 | .jp2
Thumbnail | 8-bit greyscale; 24-bit colour | 72 ppi | 150 pixels | JPEG | .jpg
PDF | 24-bit colour | 72 ppi | 150 pixels | PDF | .pdf
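Each JPEG derivative in the table is specified by the pixel length of its longest dimension, so its other dimension follows from the master's aspect ratio. The Python sketch below computes derivative dimensions on that basis and a rough size for the 8:1 JPEG 2000 zoom image; the function names and the 6000 × 4800 pixel example master are hypothetical. The same arithmetic relates the resolution and size columns: pixels equal inches multiplied by ppi, so a 10-inch longest dimension gives 6000 pixels at 600 ppi and 4000 pixels at 400 ppi.

```python
# Longest-dimension targets in pixels, taken from the image standards table.
DERIVATIVE_LONG_EDGE = {"preview": 500, "research": 1000, "thumbnail": 150}


def fit_longest(width: int, height: int, longest: int) -> tuple[int, int]:
    """Scale (width, height) so the longer side equals `longest`, keeping aspect ratio."""
    if width >= height:
        return longest, max(1, round(height * longest / width))
    return max(1, round(width * longest / height)), longest


def derivative_sizes(width: int, height: int) -> dict[str, tuple[int, int]]:
    """Pixel dimensions of each JPEG derivative for a master of the given size."""
    return {name: fit_longest(width, height, edge)
            for name, edge in DERIVATIVE_LONG_EDGE.items()}


def zoom_size_bytes(width: int, height: int, components: int, ratio: int = 8) -> int:
    """Approximate JPEG 2000 size at a fixed ratio, assuming 8 bits per component."""
    return width * height * components // ratio
```

For a 6000 × 4800 greyscale master, derivative_sizes(6000, 4800) gives 500 × 400 (preview), 1000 × 800 (research) and 150 × 120 (thumbnail), and zoom_size_bytes(6000, 4800, 1) gives 3,600,000 bytes.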


Directory structure and file naming conventions for all digital images are outlined in the Directory and File Naming Conventions.

5.2. Post scan editing of digital images

Digital technology makes it easy to alter images, and this capability can be misused. The State Library of Queensland is particularly concerned with maintaining the look and feel of an original image, including any defects or imperfections, both for ethical reasons and to preserve image integrity.
Post-scan editing is kept to a minimum. Essentially, the only edits made are adjustments to brightness and contrast. Images are generally not retouched, and are cropped only where this does not interfere with the integrity of the object, e.g. to remove white borders on a photograph or mount board that add no aesthetic or historical value to the image. Decisions on cropping will be made by content selectors at the time the request for digitisation is made. This basic editing complies with long-standing traditional photographic procedures at SLQ.
Retouching, including the removal of labels, may occur where a label compromises the integrity of the object, for example where a contemporary library label is defacing an object or image. Where a contemporary label or similar covers a significant part of an image, original material or artwork, the object will be sent to Conservation for removal of the label, where possible, before digitisation.
Any editing that has taken place will be reflected in the online record.

6. Capture standards - audio

The digitisation of audio materials is of interest to institutions and organisations building digital repositories. Music, oral history and ethnological resources have many possible applications in libraries, museums and archives. The current technological environment, the expected obsolescence of analogue formats and the deterioration of analogue materials require that these resources be held in digital form. Digitising audio materials maximises their use and potential, particularly when the Internet is used to disseminate them.

6.1. Analogue to digital conversion

The conversion of analogue audio materials to digital formats will be undertaken using the capture standards outlined in this document. No editing to de-noise, de-crackle or de-thump, or to remove clicks, will be applied to the master audio file. Excerpts for delivery over the internet may be edited to remove noise, clicks, thumps and similar artefacts for web presentation.
The State Library may produce a set of three digital audio objects when capturing audio material. These digital audio objects are:

  1. an uncompressed master WAV file
  2. a derivative MP3 version for download from the web
  3. a streamed version for clients who use Windows Media Player on broadband connections.

6.2. Audio standards

 
Use | Description | Format | File extension
Archival | Lossless, uncompressed master audio file. Sample rate/bit depth: complex/music 96 kHz/24-bit; simple/voice 48 kHz/24-bit | WAVE_LPCM_BWF | .wav
Downloadable | Lossy, compressed, downloadable audio file; 128 kbps | MPEG-1 Layer 3 | .mp3
Streamed | Lossy, compressed, streamed audio file; broadband version encoded for 256 kbps and 512 kbps | Windows Media Audio, Version 9 | .wma
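The sample rates, bit depths and bit rates in the table translate directly into storage requirements, which can help when planning capacity for archival masters. The sketch below computes payload bytes per hour of audio. The function names are hypothetical, header chunks are ignored, and the channel counts used in the example (stereo for music, mono for voice) are assumptions, since the standard does not specify them.

```python
SECONDS_PER_HOUR = 3600


def pcm_bytes_per_hour(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Uncompressed LPCM payload per hour (excludes WAVE/BWF header chunks)."""
    return sample_rate_hz * (bit_depth // 8) * channels * SECONDS_PER_HOUR


def encoded_bytes_per_hour(bitrate_kbps: int) -> int:
    """Payload per hour for a constant-bit-rate encoding (1 kbps = 1000 bit/s)."""
    return bitrate_kbps * 1000 // 8 * SECONDS_PER_HOUR
```

On these assumptions an hour of complex/music capture at 96 kHz/24-bit stereo is 2,073,600,000 bytes (about 2.07 GB), an hour of voice at 48 kHz/24-bit mono is 518,400,000 bytes, and the 128 kbps MP3 derivative is 57,600,000 bytes.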


Directory structure and file naming conventions for digital audio files are outlined in the Directory and File Naming Conventions.

6.3. Post scan editing of audio files

The State Library of Queensland will produce a master audio file that will be preserved as the archival master copy. Derivative versions of the audio file for delivery over the Internet may be edited to reduce noise, clicks and similar artefacts, and to present excerpts of the master audio file. Editing derivatives to present sound for the web is accepted practice, but the master archival copy will be preserved unedited at all times.

7. Capture standards - video

The State Library may produce a set of three digital video objects for the delivery of video materials via the Internet. The uncompressed AVI and QuickTime formats will be regarded as preservation masters (archival) for video, and as digital masters for films scanned at 1920x1080 and below.

The digital video objects are:

  1. an uncompressed master AVI file
  2. a streamed version for clients who use Windows Media Player on broadband connections.
  3. a downloadable video file for clients who wish to download video to play on their PC at a later time or onto a mobile device.

Minimum specifications for the following formats are outlined in Appendix B (see the PDF version of this standard).

 
Use | Description | Format | File extension
Digital master/archival | Lossless, uncompressed master video file | AVI; QuickTime | .avi; .mov
Streamed | Lossy, compressed, streamed video file; broadband version encoded for 256 kbps and 512 kbps in a multiple bitrate stream | Windows Media Video, Version 9 | .wmv
Downloadable | Compressed video file format, suitable for download | MP4 | .mp4


Directory structure and file naming conventions for digital video files are outlined in the Directory and File Naming Conventions.

9. Management and control of digital objects

Before capturing a new digital object format, e.g. audio or video files, collection specialists and project managers must discuss options with the chair of the Resource Discovery Standards Group to determine file names and directories according to the conventions outlined in Directory and File Naming Conventions.

10. Preservation and data integrity

The State Library's Digital Preservation Policy outlines the key strategies to be employed to enable the effective preservation of its digital content.
Data integrity, backup and data recovery procedures for digital objects comply with current ICTS policies and procedures.
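Data integrity for digital masters is commonly checked with fixity values: a cryptographic checksum recorded when the file is created and recomputed later to detect silent corruption or alteration. The Python sketch below illustrates that idea using SHA-256 from the standard hashlib module. The choice of SHA-256, the function names and the manifest that maps file paths to recorded checksums are all assumptions for illustration; the actual procedures are those set by the ICTS policies referred to above.

```python
import hashlib
from collections.abc import Iterable
from pathlib import Path


def sha256_of_chunks(chunks: Iterable[bytes]) -> str:
    """Hash a sequence of byte chunks with SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Fixity value for a file, read in 1 MiB chunks so large masters need not fit in memory."""
    with path.open("rb") as fh:
        return sha256_of_chunks(iter(lambda: fh.read(chunk_size), b""))


def changed_files(manifest: dict[Path, str]) -> list[Path]:
    """Files whose current SHA-256 differs from the value recorded at capture."""
    return [p for p, recorded in manifest.items() if sha256_of_file(p) != recorded]
```

Reading in chunks keeps memory use flat regardless of file size, which matters for uncompressed TIFF, WAVE and AVI masters.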

11. Review process

This standard will be reviewed annually or sooner if required. The Resource Discovery Standards Group will lead the review in consultation with Collection Preservation, Queensland Memory and Information Communications & Telecommunications.
The Resource Discovery Standards Group will manage all additions and amendments to this standard. Amendments will be endorsed by the Content Working Group.

12. Sources consulted
