XI. Processing optical discs

Schlesinger collects optical discs in a variety of formats with varying writeability. “Optical discs” generally refers to CDs and DVDs, which are highly common in born-digital processing. Because of the history of optical media manufacturing and marketing, viewing optical discs as disk images can present some confusing patterns.

History of optical disc file systems

Optical discs have been under development and in circulation for decades, and a common use case for them has been audiovisual playback. Because a standard was needed for playback on proprietary digital players in a rapidly expanding market, the ISO 9660 file system was developed in the late 1980s. In the years immediately following its introduction, several major manufacturers and developers found the file system inadequate, particularly regarding file names. Several extensions and replacements materialized in the following years, including Microsoft’s Joliet system and the standardized Universal Disk Format (UDF). These file systems focused on fixing the file name issue and improving interoperability.

Manufacturers seeking to make their discs marketable found that they needed to include several file systems to ensure that discs could be read by common consumer operating systems and playback devices. As a result, many optical discs include more than one file system, each with its own references to the same set of files. Since optical discs were a heavily used type of storage media from the 1990s through the 2010s, this phenomenon of multiple file systems is common in Schlesinger collections.

Compact_Disc.png
Figure 1. An optical disc, such as a CD or a DVD, is characterized by the iridescent surface of its underside.

Optical discs in FTK

Because of the structure of many optical disc file systems, disk images of optical media tend to reflect a large amount of duplication, with some discrepancies between duplicate files. Disk images of optical discs almost always end in the extension ISO, for example, “Data-999_999.iso.” On rare occasions, a disk image failure can mean that we need to duplicate files using a logical capture method. In these cases, optical disc disk images will normally end in the extension AD1.
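Outside of FTK, one way to check which file systems an ISO image carries is to read its volume descriptors. The following is a minimal sketch, not part of the Schlesinger workflow, that assumes the standard ISO 9660 layout: descriptors start at sector 16, each occupies 2,048 bytes, and a supplementary descriptor with a Joliet escape sequence marks a Joliet file system. The function name detect_file_systems is hypothetical, and the sketch does not attempt to detect UDF, HFS, or HFS+.

```python
SECTOR = 2048  # logical sector size assumed for the image
JOLIET_ESCAPES = (b"%/@", b"%/C", b"%/E")  # Joliet levels 1, 2, and 3


def detect_file_systems(stream):
    """Return labels for the ISO 9660 volume descriptors found in a binary stream."""
    found = []
    stream.seek(16 * SECTOR)  # volume descriptors begin at sector 16
    while True:
        sector = stream.read(SECTOR)
        # Stop at the end of the data or when the "CD001" identifier is missing.
        if len(sector) < SECTOR or sector[1:6] != b"CD001":
            break
        kind = sector[0]
        if kind == 255:  # volume descriptor set terminator
            break
        if kind == 1:  # primary volume descriptor
            found.append("ISO9660")
        elif kind == 2:  # supplementary volume descriptor
            # The escape sequences field sits at byte offset 88 of the descriptor.
            if sector[88:91] in JOLIET_ESCAPES:
                found.append("Joliet")
            else:
                found.append("ISO9660 supplementary")
        # Other descriptor types (boot record, partition) are skipped.
    return found
```

To try it on a disk image, open the file in binary mode, for example `with open("Data-999_999.iso", "rb") as image: print(detect_file_systems(image))`.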

Viewing two CDs in FTK’s explorer tree shows duplication in the expanded folder hierarchies. In some cases, file systems even truncate the name of the top folder.

optical-explorer.png
Figure 2. Two CD disk images, Data-193_044 and Data-193_049, show how the folders containing the CD data are duplicated and how their folder names are labeled with the file system in brackets.

The topmost file system, 040923_1052 [ISO9660], in the above screen capture includes files in a photo CD format. Optical discs are common carriers of digital photos. Because the file system is ISO 9660, the files, shown below, have truncated file names and normalized capitalization. Examples of file names below include 2NDCLA~1.TIF and ADBARE~1.JPG.

By comparison, the second file system, 040923_1052 [Joliet], contains exactly the same files (note that the total count in both file lists is 91) with the same MD5 hash signatures, showing absolute bit-for-bit parity between the Joliet file system and ISO 9660. The important metadata difference between the two file systems is that in Joliet the file names are more complete.
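The same comparison can be approximated outside of FTK. The sketch below assumes that the contents of each file system have been exported to separate folders. It hashes every file with MD5 and compares the two sets of hashes while ignoring file names, since the names are expected to differ. The function names md5_of, hash_counts, and same_content are hypothetical.

```python
import hashlib
from collections import Counter
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of one file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_counts(root):
    """Count how often each MD5 digest occurs among the files under root."""
    return Counter(md5_of(p) for p in Path(root).rglob("*") if p.is_file())


def same_content(root_a, root_b):
    """Return True when both folders hold the same file contents, regardless of names."""
    return hash_counts(root_a) == hash_counts(root_b)
```

Using a Counter rather than a set means that two folders only match when they contain the same number of copies of each file, which mirrors the check on total file counts described above.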

Using the Quick Pick feature, we can show the entire disk image contents of Data-193_044.iso, including both file systems in its child folders. Adding the filter No Duplicate, we can refine the file list view to show only files that have not been flagged as duplicates during FTK’s evidence processing procedure. This is illuminating, because it shows that all ISO 9660 files in Figure 2 have been flagged as primary items, while the Joliet files in Figure 4 are flagged as duplicates. Some processing archivists may prefer to surface files with complete file names, in which case filtering by duplicates on optical discs requires close attention to the file path.

In another example, Data-193_049.iso contains even more file systems, with a total of five file systems containing identical files: Joliet, HFS, UDF, ISO 9660, and HFS+. Here, additional quirks can occur, such as the Joliet folder truncating the name of the file system folder from “Boston Women’s Fund JPEGS” to “Boston Women’s F.” In addition, while the ISO 9660 folder name and file names have normalized capitalization and some removed characters, the file names have not been truncated in length. Finally, the HFS+ file system is located on an entirely different disk partition entitled DiskRecording. While some of these phenomena are consistent across disk images, others are not, resulting in some unpredictability when using the Explore tab to appraise files by disk image.

By comparison, Data-193_049.iso’s HFS+ file system hierarchy again contains the same files, although there are some differences in the file names.

When we Quick Pick Data-193_049.iso and filter for No Duplicate, we see that the HFS+ file system is privileged this time, possibly because the file system is closer in the file hierarchy to the top folder. Proximity to the top folder in a disk image is one of several criteria FTK uses to select primary files and duplicates during its indexing process.
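The proximity criterion can be illustrated with a short sketch. This is not FTK's actual algorithm, which uses several criteria. It assumes only that, among files with the same hash, the file with the shortest path is treated as primary and that ties go to the first file listed. The function name flag_primaries and the input format are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def flag_primaries(entries):
    """Flag each path as 'primary' or 'duplicate'.

    entries is a list of (path, md5) pairs using forward-slash paths.
    Assumption for illustration: within each group of identical hashes,
    the path with the fewest components is the primary item.
    """
    by_hash = defaultdict(list)
    for path, md5 in entries:
        by_hash[md5].append(path)

    flags = {}
    for paths in by_hash.values():
        # min() keeps the first path when several share the same depth.
        primary = min(paths, key=lambda p: len(PurePosixPath(p).parts))
        for path in paths:
            flags[path] = "primary" if path == primary else "duplicate"
    return flags
```

Under this assumption, a copy of a file that sits one folder closer to the top of the disk image is flagged as primary, and every deeper copy with the same hash is flagged as a duplicate.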

 

Copyright © 2024 The President and Fellows of Harvard College