JTF-HCM Guidelines 2018: Digital Space Occupied


While some collection material in digital formats may occupy physical space because of the media on which it is stored, the management of such material, including projecting future storage and preservation requirements, requires an understanding of the space it occupies in multiples of bytes.


Because the acquisition, description, management, and delivery of born-digital collection material differ, often significantly, from those of collection material that has been digitized for online exhibition, service as a surrogate, or the generation of derivatives, the guidelines encourage repositories to distinguish, whenever possible, “Born Digital” from “Digitized” collection material when measuring Digital Space Occupied. A third characterization -- “Digital of Mixed or Unknown Origin” -- acknowledges that some repositories may, in some cases, find it difficult to distinguish accurately or confidently between files representing born-digital collection material and files representing digitized or reformatted collection material.


In the context of these guidelines, Born Digital refers to collection material that was created and is managed in a digital form. As such, all of the following should be categorized as Born Digital collection material:

  • Content such as email, spreadsheets, documents, websites, and other files of any format created, maintained, and acquired from within a computing environment, obtained via server-to-server transfer, forensic imaging, or other process.
  • Audio, video, and other file formats imaged, extracted, or otherwise copied from floppy disks, zip disks, external drives, digital cassettes, computer hard drives, or other storage media, in association with the migration of files to new external media, a server, or a cloud storage environment.
  • Online exhibitions in which born-digital or reformatted digital collection material has been contextualized by additional content (curatorial interpretation, narration, annotations, etc.) such that it constitutes a new resource that will be retained and preserved in perpetuity as collection material.

Similarly, in the context of these guidelines, Digitized refers to collection material that has been converted to and is managed in a digital form. As such, all of the following should be categorized as Digitized collection material:

  • Analog audio and video that has been converted to a digital format.
  • Books, manuscripts, maps, photographs, posters, etc. that have been digitized for preservation, publication, online exhibition, or another purpose and retained and preserved in perpetuity as collection material.

When it cannot be determined if the files represent Born Digital or Digitized collection material, they should be categorized as Digital of Mixed or Unknown Origin.


A fundamental assumption underlying the measure of Digital Space Occupied called for in these guidelines is that it includes only files that are actively managed as collection material and for which the repository provides sustained stewardship. Digital files that are produced during the course of service provision, such as scans created in response to patron requests, are not included, nor are digital files created or received by the repository as part of routine operations (correspondence, administrative files, etc.) unless they have been formally accessioned and are being managed as inactive institutional records.


“Actively managed” implies that the files are in a preservation repository or other regularly backed-up storage environment -- that is, any configuration of hard drives, networked servers, and/or cloud-based storage for which measures to extend or ensure the viability of its contents are undertaken. Also implicit in this characterization of “actively managed” is the expectation that files that exist only on external media as acquired or received by the repository, and that have not yet been imaged or extracted to a managed preservation environment, are not to be included in a count of Digital Space Occupied.
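
As a rough illustration of what taking such a measure might look like in practice, the following Python sketch sums the sizes, in bytes, of all files under one directory of a managed storage environment. The function name `total_bytes` and the idea of pointing it at a single root directory are assumptions made for illustration only; the guidelines do not prescribe any particular tool or method, and many storage systems report usage directly.

```python
import os


def total_bytes(root):
    """Sum the sizes, in bytes, of all regular files under root.

    `root` is a hypothetical path to a directory in a managed
    preservation environment; it is not named in the guidelines.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symbolic links so that linked files are not counted twice.
            if os.path.islink(path):
                continue
            total += os.path.getsize(path)
    return total
```

A sketch like this counts every file it finds, so it would need to be run only against storage that holds actively managed collection material, not against unprocessed media or working directories.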


The following points provide guidance when measuring Digital Space Occupied.

  1. Digital Space Occupied is reported in multiples of bytes -- bytes, megabytes, gigabytes, and/or terabytes -- at the discretion of the repository.
  2. All collection material in digital formats should be categorized as one of the following: Born Digital, Digitized, or Digital of Mixed or Unknown Origin.
  3. Digital files that are described online and therefore discoverable should be distinguished from digital files that have not yet been described online and are therefore not discoverable. Digital files do not need to be described at the file level to be considered “Discoverable.” When it is not possible or practical to discern discoverability, report the Digital Space Occupied as “Discoverability Mixed/Unknown.”
  4. The recommended counts for Digital Space Occupied do not require the categorization of digital files by types of collection material; this categorization is explicitly called for in the optional counts. The types include an “Other Collection Material” category for measuring Digital Space Occupied by files for which one cannot accurately and/or confidently discern the type of collection material represented by the files. 
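
The points above could be combined into a simple tally, sketched below in Python under several assumptions. The record layout (origin, discoverability, size in bytes), the function names `tally` and `format_bytes`, and the label "Not Discoverable" are hypothetical and do not come from the guidelines. The sketch also assumes decimal multiples (1 megabyte = 1,000,000 bytes); the guidelines do not state whether decimal or binary multiples are intended.

```python
from collections import defaultdict

# Origin categories named in the guidelines.
ORIGINS = {"Born Digital", "Digitized", "Digital of Mixed or Unknown Origin"}

# "Not Discoverable" is an assumed label for files not yet described online.
DISCOVERABILITY = {
    "Discoverable",
    "Not Discoverable",
    "Discoverability Mixed/Unknown",
}

# Decimal multiples of bytes (an assumption), largest first.
UNITS = [("terabytes", 10**12), ("gigabytes", 10**9), ("megabytes", 10**6)]


def format_bytes(n):
    """Express a byte count in the largest unit that it reaches."""
    for name, size in UNITS:
        if n >= size:
            return f"{n / size:.2f} {name}"
    return f"{n} bytes"


def tally(records):
    """Total bytes per (origin, discoverability) pair.

    `records` is an iterable of (origin, discoverability, size_in_bytes).
    """
    totals = defaultdict(int)
    for origin, discoverability, size in records:
        if origin not in ORIGINS:
            raise ValueError(f"unknown origin: {origin!r}")
        if discoverability not in DISCOVERABILITY:
            raise ValueError(f"unknown discoverability: {discoverability!r}")
        totals[(origin, discoverability)] += size
    return dict(totals)
```

For example, `tally([("Born Digital", "Discoverable", 1_500_000), ("Born Digital", "Discoverable", 500_000)])` yields a single total of 2,000,000 bytes for that pair, which `format_bytes` renders as "2.00 megabytes".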

The following resources may be helpful for calculating a measure of Digital Space Occupied: 
