Glossary relating to born-digital materials
AIP (Archival Information Package): The internal representation of an object in the Digital Preservation Repository, including all data generated upon ingest (e.g., descriptive metadata) needed to manage and preserve it. See also DIP and SIP.
BIT: The fundamental unit of digital information storage, which can have a binary value of either 1 or 0.
BITCURATOR Access: A project that developed tools to help libraries, archives, and museums provide web-based and local access to born-digital materials held on disk images.
BITCURATOR Environment is an Ubuntu-derived Linux distribution geared towards the needs of archivists and librarians. It includes a suite of open source digital forensics and data analysis tools to help collecting institutions process born-digital materials.
BITSTREAM: A sequence of bytes, which has meaningful common properties for the purposes of preservation. A bitstream may be a file or a component of a file.
BORN-DIGITAL are assets that originated in digital form, such as Web sites, wikis, e-books, digital sound recordings, and email.
BYTE: A unit of digital information and measure of data volume, normally equivalent to eight bits. Generally, files, storage devices, and storage capacity are measured in bytes, while data transfer rates are measured in bits. For instance, an SSD may have a storage capacity of 240 GB, while a download may transfer at 10 Mbps. Bits are also used to describe processor architecture, such as a 32-bit or 64-bit processor.
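Because capacity is quoted in bytes and transfer rates in bits, estimating a transfer time requires a factor of eight. The following is a minimal sketch using the figures from the entry above; it assumes decimal prefixes (1 GB = 10^9 bytes, 1 Mbps = 10^6 bits per second), and the function name is illustrative only.

```python
BITS_PER_BYTE = 8


def transfer_seconds(size_bytes: int, rate_bits_per_second: float) -> float:
    """Return the time needed to move size_bytes over a link rated in bits/s."""
    return size_bytes * BITS_PER_BYTE / rate_bits_per_second


ssd_capacity_bytes = 240 * 10**9   # 240 GB, decimal prefix assumed
link_rate_bps = 10 * 10**6         # 10 Mbps, decimal prefix assumed

seconds = transfer_seconds(ssd_capacity_bytes, link_rate_bps)
print(f"{seconds / 3600:.1f} hours")  # prints "53.3 hours"
```

Real transfers are usually slower than this ideal figure because of protocol overhead.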
CHECKSUM is a value, computed by a function applied to data, used for validating data integrity. An algorithm or formula is applied against the source (typically a file and its content, such as the image of a scanned page from a book) in order to generate a fixed-length hash value, often called a checksum. MD5 (Message-Digest algorithm 5) is one commonly used algorithm and produces a 128-bit value; the terms MD5 and checksum are sometimes used interchangeably, although MD5 is only one of several checksum algorithms. In digital preservation processes, the checksum from when the content was created is compared to another checksum created after the content has been received or stored over a period of time. If the values match, this indicates that the data (e.g., the scanned page image) is intact and has not been altered.
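The comparison described above can be sketched with Python's standard hashlib module. This is an illustration only: the byte strings stand in for real file content, the helper name is hypothetical, and MD5 is used because the entry names it (other algorithms, such as SHA-256, are available through the same module).

```python
import hashlib


def md5_checksum(data: bytes) -> str:
    """Return the MD5 digest of data as a 32-character hexadecimal string."""
    return hashlib.md5(data).hexdigest()


# Stand-in for the content at creation time (e.g., a scanned page image).
original = b"scanned page image bytes"
recorded = md5_checksum(original)

# Stand-ins for the content after transfer or storage.
received_intact = original
received_altered = original[:-1] + b"X"   # last byte changed

print(md5_checksum(received_intact) == recorded)    # True: content unchanged
print(md5_checksum(received_altered) == recorded)   # False: content differs
```

For large files, the digest would normally be built up by reading the file in chunks and calling update() on the hash object rather than loading the whole file into memory.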
CRAWL: The activity of using software to recursively download web documents by following links. There are a variety of crawl methods, including: focused crawl, smart crawl, incremental crawl, targeted crawl, and customized crawl.
CRAWLER: Also known as a spider or robot. Software that automatically traverses the web by downloading documents and following links from page to page.
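The link-following behavior described in the CRAWL and CRAWLER entries can be sketched as a breadth-first traversal. To stay self-contained, the example below crawls a small in-memory dictionary of pages instead of the live web; the URLs, page contents, and function names are all hypothetical, and a production crawler would also handle robots.txt, politeness delays, and scope rules.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical in-memory "web": URL -> HTML body.
SITE = {
    "http://example.org/": '<a href="/about">About</a> <a href="/news">News</a>',
    "http://example.org/about": '<a href="/">Home</a>',
    "http://example.org/news": '<a href="/about">About</a> <a href="/missing">Gone</a>',
}


def crawl(seed, fetch):
    """Visit pages breadth-first from seed; return URLs in download order."""
    seen = {seed}
    queue = deque([seed])
    downloaded = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:          # page could not be retrieved
            continue
        downloaded.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(url, href)   # resolve relative links
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return downloaded


print(crawl("http://example.org/", SITE.get))
```

The seen set is what keeps the crawl from looping when pages link back to each other.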
CURATION LIFECYCLE MODEL: A curation lifecycle model documents the relationships between all the stages in the existence of digital information, to enable active management of the resource over time, thus maintaining accessibility and usability.
DESCRIPTIVE METADATA: Metadata used for the discovery and interpretation of the digital object. Descriptive metadata may be referred to externally or indirectly by pointing from the digital wrapper to a metadata object, a MARC record, or an EAD instance located elsewhere. Or, descriptive metadata may be embedded in the appropriate section of the digital wrapper.
DIGITAL ART may be as simple as digital photography or it may be much more complex in that it could be mixed media, dynamic, or could require recreation of an entire installation to render it effectively. More complex forms of digital art will likely require one-off solutions.
DIGITAL OBJECT: An entity in which one or more content files and their corresponding metadata are united, physically and/or logically, through the use of a digital wrapper.
DIGITAL WRAPPER: A structured text file that binds digital object content files and their associated metadata together and that specifies the logical relationship of the content files. METS is an emerging, XML-based international standard for wrapping digital library materials. All of the content files and corresponding metadata may be embedded in the digital wrapper and stored with the wrapper. This is physical wrapping or embedding. Or, the content files and metadata may be stored independently of the wrapper and referred to by file pointers from within the wrapper. This is logical wrapping or referencing. A digital object may partake of both kinds of wrapping.
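Logical wrapping (referencing) can be illustrated with a very small METS-style wrapper built using Python's standard XML library. This is a sketch under stated assumptions: the file path, identifiers, and attribute values are hypothetical, only a handful of METS elements are shown, and the output has not been validated against the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(namespace, tag):
    """Build a namespace-qualified name in ElementTree's {uri}tag form."""
    return f"{{{namespace}}}{tag}"


mets = ET.Element(q(METS_NS, "mets"))

# File section: the content file is referenced by a pointer, not embedded.
file_sec = ET.SubElement(mets, q(METS_NS, "fileSec"))
file_grp = ET.SubElement(file_sec, q(METS_NS, "fileGrp"), {"USE": "master"})
file_el = ET.SubElement(
    file_grp, q(METS_NS, "file"), {"ID": "FILE001", "MIMETYPE": "image/tiff"}
)
ET.SubElement(
    file_el,
    q(METS_NS, "FLocat"),
    {"LOCTYPE": "URL", q(XLINK_NS, "href"): "file:///hypothetical/page001.tif"},
)

# Structural map: states the logical relationship of the content files.
struct_map = ET.SubElement(mets, q(METS_NS, "structMap"), {"TYPE": "physical"})
page = ET.SubElement(struct_map, q(METS_NS, "div"), {"TYPE": "page", "ORDER": "1"})
ET.SubElement(page, q(METS_NS, "fptr"), {"FILEID": "FILE001"})

print(ET.tostring(mets, encoding="unicode"))
```

Physical wrapping would instead place the content (typically encoded as text) inside the wrapper itself rather than pointing to it.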
DIP (Dissemination Information Package): An external representation of an object exported from the Digital Preservation Repository, optionally including an Archival Information Package, Submission Information Package, and object metadata. See also AIP and SIP.
DISK IMAGE: A computer file containing the complete contents and structure of a data storage medium or device, such as a hard drive, floppy disk, optical disc, or USB flash drive. A disk image is usually made by creating a sector-by-sector copy of the source medium, thereby replicating the structure and contents of a storage device independent of the file system.
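The idea of a sector-by-sector copy can be sketched as a loop that reads fixed-size blocks from a source and writes them unchanged to an image, without interpreting any file system. The example uses an in-memory buffer as a stand-in for a device and assumes a 512-byte sector size; actual imaging is normally done with dedicated forensic tools and a write blocker, and must cope with unreadable sectors.

```python
import hashlib
import io

SECTOR_SIZE = 512  # assumed sector size; real media may differ


def image_device(source, image, sector_size=SECTOR_SIZE):
    """Copy source to image block by block; return (block count, MD5 hex)."""
    digest = hashlib.md5()
    sectors = 0
    while True:
        block = source.read(sector_size)
        if not block:            # end of the source reached
            break
        image.write(block)
        digest.update(block)     # checksum is computed while copying
        sectors += 1
    return sectors, digest.hexdigest()


# Stand-in for a storage device: 2048 bytes, i.e. four 512-byte sectors.
fake_device = io.BytesIO(bytes(range(256)) * 8)
image = io.BytesIO()

count, checksum = image_device(fake_device, image)
print(count, image.getvalue() == fake_device.getvalue())  # prints "4 True"
```

Computing the checksum during the copy gives a fixity value for the image at the moment of creation.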
ePADD is a software package developed by Stanford University's Special Collections & University Archives that supports archival processes around the appraisal, ingest, processing, discovery, and delivery of email archives. ePADD Phase 2 is being developed from 2015-2018 by staff of the Department of Special Collections and University Archives, Stanford University Libraries (SUL), Stanford University, in collaboration with partners at Harvard University, the Metropolitan New York Library Council (METRO), University of Illinois at Urbana-Champaign, and University of California, Irvine.
EMULATION: The imitation of a computer system, performed by a combination of hardware and software, that allows programs to run between incompatible systems. Or, the ability of a program or device to imitate another program or device.
FC5025: Device Side Data's FC5025 USB 5.25" floppy controller plugs into any computer's USB port and enables you to attach a 5.25" floppy drive. Compatible with TEAC FD-55GFR or equivalent drive (not included with the FC5025).
FEDORA (http://www.fedora-commons.org/) (Flexible Extensible Digital Object Repository Architecture) is a software framework to construct and maintain repositories of digital objects.
FILE: A bitstream which is managed by a file system as a single, named entity.
FIREWIRE: A way to connect different pieces of equipment so they can quickly and easily share information. FireWire (also referred to as the IEEE 1394 High Performance Serial Bus) is very similar to USB. Originally developed by Apple and standardized as IEEE 1394 in 1995, it preceded USB. FireWire devices are hot pluggable, which means they can be connected and disconnected any time, even with the power on.
FLASH DRIVE: A small device that plugs into a computer's USB port and functions as a portable hard drive.
FLIPPY DISKS: People would sometimes fill one side of a 5.25" disk and then flip it over to store more on the other side. Disks used this way are called "flippy" disks. 5.25" disks have a hole, called the index hole, that lets the drive know if the disk is rotating. (The index hole has other purposes also.) The problem with flippy disks is that when the disk is inserted upside-down, the drive cannot see the index hole. Many drives won't read from the disk unless they can see the index hole. There is no recommended drive for reading flippy disks at this time.
FLOPPY DISK, also called a floppy, diskette, or just disk, is a type of disk storage composed of a disk of thin and flexible magnetic storage medium, sealed in a rectangular plastic enclosure lined with fabric that removes dust particles. Floppy disks, initially as 8-inch (200 mm) media and later in 5¼-inch (133 mm) and 3½-inch (90 mm) sizes, were a ubiquitous form of data storage and exchange from the mid-1970s into the first years of the 21st century.
FRED (Forensic Recovery of Evidence Device): The FRED family of forensic workstations (produced by Digital Intelligence of New Berlin, WI) are highly integrated, flexible and modular forensic platforms. Available in mobile, stationary and laboratory configurations, these systems are designed for both the acquisition and examination of computer evidence.
GUI: Graphical user interface; a mouse-based system that contains icons, drop-down menus, and windows where you point and click to indicate what you want to do. All new Windows and Macintosh computers currently being sold utilize this technology.
HYDRA (http://projecthydra.org/) is an open source repository software solution.
INGEST: The process by which a digital object or metadata package is absorbed by a different system than the one that produced it.
JAZ DRIVE is a removable hard disk storage system sold by the Iomega company from 1996 to 2002. Following the success of the Iomega Zip drive, which stored data on removable magnetic cartridges with 100MB nominal capacity, the company developed and released the Jaz drive with 1GB capacity per removable disk, increased to 2GB in 1998.
JPEG (Joint Photographic Experts Group) is the name of the group that developed the standard, and of the lossy image compression method the standard defines. JPG is a common filename extension for JPEG files.
JPEG 2000 is a wavelet-based image compression standard. It was created by the Joint Photographic Experts Group committee in the year 2000 with the intention of superseding their original discrete cosine transform-based JPEG standard (created about 1991). The standardized filename extension is JP2.
KRYOFLUX is a USB-based hardware and software solution for preserving software on floppy disks. It was developed by the Software Preservation Society. Works with all major 3.5" and 5.25" drives; works well with selected 3" (e.g. Amstrad FDI-1) drives; also works with 8" (e.g. Shugart 851; might require additional adapter) drives; other types of drives and media currently under investigation. See guide (in process).
LOGICAL (vs. PHYSICAL): High-level versus low-level. Logical implies a higher view than the physical. Users relate to data logically by data element name; however, the actual fields of data are physically located in sectors on a disk. A logical disk image is a copy of specific files made from a storage device, retaining their hierarchical organization within directories or folders. The full path of each file is recorded. Deleted files and un-partitioned space are not copied. Also referred to as a logical copy.
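A logical copy, as described above, works at the level of files and folders rather than sectors. The sketch below copies a directory tree while preserving its hierarchy and records the relative path of every file copied. The directory names and the function name are hypothetical, and a temporary directory stands in for the storage device.

```python
import os
import shutil
import tempfile


def logical_copy(source_root, dest_root):
    """Copy all files under source_root, keeping folders; return their paths."""
    manifest = []
    for dirpath, _dirnames, filenames in os.walk(source_root):
        rel_dir = os.path.relpath(dirpath, source_root)
        dest_dir = os.path.normpath(os.path.join(dest_root, rel_dir))
        os.makedirs(dest_dir, exist_ok=True)
        for name in filenames:
            # copy2 also attempts to preserve timestamps
            shutil.copy2(os.path.join(dirpath, name), os.path.join(dest_dir, name))
            manifest.append(os.path.normpath(os.path.join(rel_dir, name)))
    return sorted(manifest)


with tempfile.TemporaryDirectory() as tmp:
    # Build a small stand-in for the source medium.
    source = os.path.join(tmp, "source")
    os.makedirs(os.path.join(source, "letters"))
    with open(os.path.join(source, "letters", "draft.txt"), "w") as handle:
        handle.write("Dear editor")

    dest = os.path.join(tmp, "copy")
    manifest = logical_copy(source, dest)
    copied_ok = os.path.isfile(os.path.join(dest, "letters", "draft.txt"))
    print(manifest, copied_ok)
```

Unlike a sector-level disk image, this approach sees only what the file system exposes, which is why deleted files and un-partitioned space are not captured.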
MASTER (copy) is also known as Preservation Master or Archival Master.
METS (Metadata Encoding and Transmission Standard): An XML-based standard for encoding descriptive, administrative, and structural metadata about objects within a digital library. METS is used for wrapping digital library materials; it was developed as an initiative of the Digital Library Federation (DLF) and is maintained by the Library of Congress. See the METS web site for more information.
MIGRATION: The transfer of digital objects from one hardware or software configuration to another, or from one generation of computer technology to a subsequent generation. The purpose of migration is to preserve the integrity of digital objects and to retain the ability for clients to retrieve, display, and use them in the face of constantly changing technology. Migration includes refreshing as a means of digital preservation; however, it is not always possible to make an exact digital copy of a database or other information object and still maintain the compatibility of the object with a new generation of technology.
MODS (Metadata Object Description Schema): An XML schema for descriptive metadata, serving as both a data structure and an interchange standard. It is used for the creation of original resource description records and may also be used as an alternative method for representing MARC data. MODS was developed by the Library of Congress' Network Development and MARC Standards Office. See the MODS web site for more information.
OAIS (Open Archival Information System): A conceptual framework for an archival system dedicated to preserving and maintaining access to digital information over the long term. An OAIS is an archive, consisting of an organization of people and systems, that has accepted the responsibility to preserve information and make it available for a Designated Community. It meets a set of responsibilities that allows an OAIS archive to be distinguished from other uses of the term 'archive'. The term 'Open' in OAIS is used to imply that this Recommendation and future related Recommendations and standards are developed in open forums, and it does not imply that access to the archive is unrestricted. See the OAIS reference model: PDF.
OCR (Optical Character Recognition): Computer software designed to convert images of text (usually captured by a scanner) into machine-editable text.
PDF (Portable Document Format) is a file format, created by Adobe Systems, for document exchange in a manner independent of the application software, hardware, and operating system.
PHYSICAL (vs. LOGICAL): see logical.
PRESERVATION DESCRIPTION INFORMATION (PDI): The information which is necessary for adequate preservation of the Content Information and which can be categorized as Provenance, Reference, Fixity, and Context information.
SIP (Submission Information Package): An external object representation prepared by the producer for the purpose of ingest into the Digital Preservation Repository, where it will be converted automatically to an Archival Information Package. See also AIP and DIP.
TIFF (Tagged Image File Format) is an image file format widely regarded as a strong choice for preservation and technical longevity.
TXT (.txt) is a file format used for textual documents usually containing very little formatting.
UTF-8 (Unicode Transformation Format, 8-bit) is a variable-width character encoding for Unicode that is backwards compatible with ASCII. It can represent the standard 128 ASCII characters for English as well as Latin alphabet characters with diacritics, Greek, Cyrillic, Coptic, Armenian, Hebrew, and Arabic characters, among others, and is widely supported in email and in Internet browsers.
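Backwards compatibility with ASCII means that ASCII text has exactly the same bytes in UTF-8, while characters outside ASCII take more than one byte. A short sketch follows; the sample characters are chosen here only for illustration.

```python
# Sample characters, written as escapes so the source itself stays ASCII.
samples = [
    "A",        # Latin capital letter A (ASCII)
    "\u00e9",   # Latin small letter e with acute
    "\u03a9",   # Greek capital letter omega
    "\u0416",   # Cyrillic capital letter zhe
    "\u05d0",   # Hebrew letter alef
]

for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X}", len(encoded), encoded.hex())

# Pure ASCII text is byte-for-byte identical in both encodings.
text = "plain ASCII"
print(text.encode("utf-8") == text.encode("ascii"))   # True
```

The first sample encodes to a single byte and each of the others to two bytes; characters further along in Unicode take three or four.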
WEB HARVESTING is performed using open-source tools developed by the Internet Archive to crawl and provide access to the content. The data can be kept in the ISO standard WARC (WebARChive) file format. The approach to Web harvesting is fairly stable.
ZIP DRIVE is a medium-to-high-capacity (at the time of its release) removable floppy disk storage system that was introduced by Iomega in late 1994. Originally, Zip disks launched with capacities of 100 MB, but later versions increased this to first 250 MB and then 750 MB.