AIP (Archival Information Package): The internal representation of an object within the Digital Preservation Repository, including all data generated upon ingest (e.g., descriptive metadata) that is needed to manage and preserve it. See also DIP and SIP.
BIT: The fundamental unit of digital information storage, which can have a binary value of either 1 or 0.
BITSTREAM: A sequence of bytes, which has meaningful common properties for the purposes of preservation. A bitstream may be a file or a component of a file.
BORN-DIGITAL describes assets that originated in digital form, such as websites, wikis, e-books, digital sound recordings, and email.
BYTE: A unit of digital information and measure of data volume, normally equivalent to eight bits. Generally, files, storage devices, and storage capacity are measured in bytes, while data transfer rates are measured in bits. For instance, an SSD may have a storage capacity of 240 GB (gigabytes), while a download may transfer at 10 Mbps (megabits per second). Bits are also used to describe processor architecture, as in a 32-bit or 64-bit processor.
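Because capacity is quoted in bytes and transfer rates in bits, estimating how long a transfer takes means multiplying by eight before dividing. The sketch below works through the SSD and download figures from the entry above, assuming decimal units (1 GB = 10^9 bytes, 1 Mbps = 10^6 bits per second) and ignoring protocol overhead; the function name is illustrative only.

```python
# Assumption: decimal (SI) units, as storage vendors and network providers usually quote them.
BITS_PER_BYTE = 8
GB = 10**9     # bytes in one gigabyte
MBPS = 10**6   # bits per second in one megabit per second

def transfer_seconds(size_bytes: int, rate_bits_per_second: int) -> float:
    """Seconds needed to move size_bytes at the given bit rate (protocol overhead ignored)."""
    return size_bytes * BITS_PER_BYTE / rate_bits_per_second

seconds = transfer_seconds(240 * GB, 10 * MBPS)
print(f"{seconds:,.0f} s = {seconds / 3600:.1f} h")  # 192,000 s = 53.3 h
```

Filling the 240 GB drive over a 10 Mbps link therefore takes a little over two days, which is why the byte/bit distinction matters when planning large transfers.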
CHECKSUM is a value used for validating data integrity. MD5 (Message-Digest algorithm 5) is a commonly used checksum algorithm; others include SHA-1 and SHA-256. The algorithm is applied to the source (typically a file and its content, such as the image of a scanned page from a book) to generate a fixed-length value (128 bits for MD5), often itself called a checksum, that is for practical purposes unique to that content. In digital preservation processes, the checksum generated when the content was created is compared to another checksum generated after the content has been received or stored over a period of time. If the two values match, the data (e.g., the scanned page image) is intact and has not been altered.
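The fixity check described above can be sketched with Python's standard `hashlib` module. This is a minimal illustration, not a prescribed procedure: the function names are hypothetical, and `hashlib.sha256` can be swapped in for `hashlib.md5` where a stronger algorithm is preferred (MD5 is no longer considered resistant to deliberate tampering).

```python
import hashlib

def md5_checksum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks by default."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def is_intact(path: str, recorded_checksum: str) -> bool:
    """Fixity check: True if the file still matches the checksum recorded earlier."""
    return md5_checksum(path) == recorded_checksum.strip().lower()

if __name__ == "__main__":
    import os
    import tempfile

    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"image of a scanned page")
    recorded = md5_checksum(tmp.name)      # stored when the content is created
    print(is_intact(tmp.name, recorded))   # True: unchanged
    with open(tmp.name, "ab") as f:
        f.write(b"\x00")                   # simulate a single-byte change
    print(is_intact(tmp.name, recorded))   # False: the content was altered
    os.remove(tmp.name)
```

Reading in chunks keeps memory use constant, which matters for large preservation masters.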
CRAWL: The activity of using software to recursively download web documents by following links. There are a variety of crawl methods, including: focused crawl, smart crawl, incremental crawl, targeted crawl, and customized crawl.
CRAWLER: Also known as a spider or robot. Software that automatically traverses the web by downloading documents and following links from page to page.
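The crawl-and-follow-links behavior described in the two entries above can be sketched as a breadth-first traversal. This is a simplified illustration under stated assumptions: the `crawl` function and its same-host restriction are hypothetical choices, and page fetching is passed in as a function so the traversal logic can run without network access.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urldefrag, urljoin, urlparse

class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags."""
    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed: str, fetch: Callable[[str], str], max_pages: int = 100) -> list[str]:
    """Breadth-first crawl from seed, following links that stay on the seed's host.

    fetch(url) returns the page's HTML; it is injected so the traversal can be
    tested offline. Returns the URLs visited, in crawl order.
    """
    host = urlparse(seed).netloc
    queue = deque([seed])
    seen = {seed}
    visited: list[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that cannot be fetched
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments so each page is queued once.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited
```

A real `fetch` could wrap `urllib.request.urlopen`; production crawlers such as Heritrix also honor robots.txt and politeness delays, which this sketch omits.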
CURATION LIFECYCLE MODEL: A curation lifecycle model documents the relationships between all the stages in the existence of digital information, enabling active management of the resource over time and thereby maintaining its accessibility and usability.
DESCRIPTIVE METADATA: Metadata used for the discovery and interpretation of the digital object. Descriptive metadata may be referenced externally (indirectly) by a pointer from the digital wrapper to a metadata object, a MARC record, or an EAD instance located elsewhere. Alternatively, descriptive metadata may be embedded in the appropriate section of the digital wrapper.
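In a METS wrapper, the two options above correspond to a descriptive metadata section (`dmdSec`) containing either an `mdRef` pointer or an `mdWrap` with the metadata embedded. The sketch below builds both with Python's `xml.etree.ElementTree`; the catalog URL and title are placeholders, and the result is only a fragment, since a complete METS document also requires a `structMap`.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
MODS = "http://www.loc.gov/mods/v3"
XLINK = "http://www.w3.org/1999/xlink"
for prefix, uri in (("mets", METS), ("mods", MODS), ("xlink", XLINK)):
    ET.register_namespace(prefix, uri)

def q(ns: str, tag: str) -> str:
    """Build an ElementTree qualified name of the form {namespace}tag."""
    return f"{{{ns}}}{tag}"

mets = ET.Element(q(METS, "mets"))

# 1) Referenced descriptive metadata: a pointer to a MARC record held elsewhere.
dmd_ref = ET.SubElement(mets, q(METS, "dmdSec"), ID="DMD1")
ET.SubElement(dmd_ref, q(METS, "mdRef"), {
    "LOCTYPE": "URL",
    "MDTYPE": "MARC",
    q(XLINK, "href"): "https://catalog.example.org/record/12345.xml",  # hypothetical URL
})

# 2) Embedded descriptive metadata: a minimal MODS record inside the wrapper itself.
dmd_wrap = ET.SubElement(mets, q(METS, "dmdSec"), ID="DMD2")
md_wrap = ET.SubElement(dmd_wrap, q(METS, "mdWrap"), MDTYPE="MODS")
xml_data = ET.SubElement(md_wrap, q(METS, "xmlData"))
mods = ET.SubElement(xml_data, q(MODS, "mods"))
title_info = ET.SubElement(mods, q(MODS, "titleInfo"))
ET.SubElement(title_info, q(MODS, "title")).text = "Example scanned book"  # placeholder title

print(ET.tostring(mets, encoding="unicode"))
```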
DIGITAL ART may be as simple as digital photography or it may be much more complex in that it could be mixed media, dynamic, or could require recreation of an entire installation to render it effectively. More complex forms of digital art will likely require one-off solutions.
DIGITAL OBJECT: An entity in which one or more content files and their corresponding metadata are united, physically and/or logically, through the use of a digital wrapper.
DIGITAL WRAPPER: A structured text file that binds digital object content files and their associated metadata together and that specifies the logical relationship of the content files. METS is a widely used, XML-based standard for wrapping digital library materials. All of the content files and corresponding metadata may be embedded in the digital wrapper and stored with it; this is physical wrapping, or embedding. Alternatively, the content files and metadata may be stored independently of the wrapper and referred to by file pointers from within it; this is logical wrapping, or referencing. A digital object may use both kinds of wrapping.
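The difference between logical and physical wrapping shows up in a METS file section: a `file` element can either point to content stored elsewhere (`FLocat`) or carry the content inside the wrapper (`FContent`, here as base64 in `binData`). The sketch below is illustrative only; the function name, file IDs, and path are hypothetical.

```python
import base64
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)

def q(ns: str, tag: str) -> str:
    """Build an ElementTree qualified name of the form {namespace}tag."""
    return f"{{{ns}}}{tag}"

def build_file_sec(referenced_href: str, embedded_bytes: bytes) -> ET.Element:
    """Return a METS fileSec with one referenced file and one embedded file."""
    file_sec = ET.Element(q(METS, "fileSec"))
    grp = ET.SubElement(file_sec, q(METS, "fileGrp"), USE="master")

    # Logical wrapping (referencing): the wrapper only points at the file.
    ref_file = ET.SubElement(grp, q(METS, "file"), ID="FILE1", MIMETYPE="image/tiff")
    ET.SubElement(ref_file, q(METS, "FLocat"),
                  {"LOCTYPE": "URL", q(XLINK, "href"): referenced_href})

    # Physical wrapping (embedding): the file's bytes travel inside the wrapper.
    emb_file = ET.SubElement(grp, q(METS, "file"), ID="FILE2", MIMETYPE="text/plain")
    content = ET.SubElement(emb_file, q(METS, "FContent"))
    ET.SubElement(content, q(METS, "binData")).text = (
        base64.b64encode(embedded_bytes).decode("ascii"))
    return file_sec

# Hypothetical path and content, for illustration only.
print(ET.tostring(build_file_sec("file:///archive/page001.tif", b"OCR text"), encoding="unicode"))
```

Embedding keeps everything in one file but makes the wrapper grow with its content; referencing keeps the wrapper small at the cost of depending on the pointers remaining valid.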
DIP (Dissemination Information Package): An external representation of an object exported from the Digital Preservation Repository, optionally including an Archival Information Package, Submission Information Package, and object metadata. See also AIP and SIP.
EMULATION: The imitation of a computer system, performed by a combination of hardware and software, that allows programs written for one system to run on another, incompatible system. Also, the ability of a program or device to imitate another program or device.
FEDORA (Flexible Extensible Digital Object Repository Architecture; http://www.fedora-commons.org/) is a software framework for constructing and maintaining repositories of digital objects.
FILE: A bitstream that is managed by a file system as a single, named entity.
FIREWIRE: An interface for connecting different pieces of equipment so they can quickly and easily share information. FireWire (also referred to as the IEEE 1394 High Performance Serial Bus) is very similar to USB, which it predates: it was developed by Apple and standardized as IEEE 1394 in 1995. FireWire devices are hot-pluggable, which means they can be connected and disconnected at any time, even with the power on.
FLASH DRIVE: A small device that plugs into a computer's USB port and functions as a portable hard drive.
FLOPPY DISK, also called a floppy, diskette, or just disk, is a type of disk storage composed of a disk of thin and flexible magnetic storage medium, sealed in a rectangular plastic enclosure lined with fabric that removes dust particles. Floppy disks, initially as 8-inch (200 mm) media and later in 5¼-inch (133 mm) and 3½-inch (90 mm) sizes, were a ubiquitous form of data storage and exchange from the mid-1970s into the first years of the 21st century.
GUI: Graphical user interface; a system of icons, drop-down menus, and windows in which the user points and clicks, typically with a mouse, to indicate what to do. Windows and Macintosh computers use this type of interface.
HYDRA (http://projecthydra.org/) is an open source repository software solution.
INGEST: The process by which a digital object or metadata package is absorbed by a different system than the one that produced it.
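JPEG (Joint Photographic Experts Group) is the name of the committee that developed the standard and, by extension, of the lossy image compression method that the standard defines. JPG (.jpg, also .jpeg) is the common filename extension for images compressed this way.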
JPEG 2000 is a wavelet-based image compression standard. It was created by the Joint Photographic Experts Group committee in 2000 with the intention of superseding the committee's original JPEG standard, which is based on the discrete cosine transform and was published in 1992. The standardized filename extension is .jp2.
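MASTER (copy) is the highest-quality version of a digital object, retained for long-term preservation and used to derive access copies. It is also known as a Preservation Master or Archival Master.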
METS (Metadata Encoding and Transmission Standard): A standard for encoding descriptive, administrative, and structural metadata about objects within a digital library, expressed using XML. METS is a widely adopted standard for wrapping digital library materials; it was developed as an initiative of the Digital Library Federation (DLF) and is maintained by the Library of Congress. See the METS web site for more information.
MIGRATION: The transfer of digital objects from one hardware or software configuration to another, or from one generation of computer technology to a subsequent generation. The purpose of migration is to preserve the integrity of digital objects and to retain the ability for clients to retrieve, display, and use them in the face of constantly changing technology. Migration includes refreshing as a means of digital preservation; however, it is not always possible to make an exact digital copy of a database or other information object and still maintain the compatibility of the object with a new generation of technology.
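The last point above, that a migrated object is often not a bit-for-bit copy, can be illustrated with a small text migration. The sketch assumes a legacy file in the Windows-1252 encoding (an assumption chosen for illustration) and re-encodes it as UTF-8. The bytes, and therefore the checksums, change, so the migration is verified at the level of content (the decoded characters) rather than by comparing checksums. The function names are hypothetical.

```python
import hashlib

def migrate_text(legacy_bytes: bytes, source_encoding: str = "cp1252") -> bytes:
    """Re-encode legacy text as UTF-8; raises UnicodeDecodeError on undecodable bytes."""
    return legacy_bytes.decode(source_encoding).encode("utf-8")

def verify_migration(legacy_bytes: bytes, migrated: bytes,
                     source_encoding: str = "cp1252") -> bool:
    """Content-level check: both versions must decode to exactly the same characters."""
    return legacy_bytes.decode(source_encoding) == migrated.decode("utf-8")

legacy = "Café déjà vu".encode("cp1252")
migrated = migrate_text(legacy)
print(verify_migration(legacy, migrated))  # True: same characters
print(hashlib.md5(legacy).hexdigest() == hashlib.md5(migrated).hexdigest())  # False: bytes differ
```

Recording both the old and new checksums, together with the content-level verification result, documents that the change in bytes was an intended consequence of migration rather than corruption.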
MODS (Metadata Object Description Schema): An XML schema and interchange standard for bibliographic description, used to create original resource description records; it may also be used as an alternative way of representing MARC data. MODS was developed by the Library of Congress' Network Development and MARC Standards Office. See the MODS web site for more information.
OAIS (Open Archival Information System): A conceptual framework for an archival system dedicated to preserving and maintaining access to digital information over the long term. An OAIS is an archive, consisting of an organization of people and systems, that has accepted the responsibility to preserve information and make it available for a Designated Community. It meets a set of responsibilities that distinguish an OAIS archive from other uses of the term 'archive'. The term 'Open' in OAIS means that the OAIS Recommendation and future related Recommendations and standards are developed in open forums; it does not imply that access to the archive is unrestricted. See the OAIS reference model.
OCR (Optical Character Recognition): Computer software designed to convert images of text (usually captured by a scanner) into machine-editable text.
PDF (Portable Document Format) is a file format, created by Adobe Systems, for document exchange in a manner independent of the application software, hardware, and operating system.
PRESERVATION DESCRIPTION INFORMATION (PDI): The information necessary for adequate preservation of the Content Information, which can be categorized as Provenance, Reference, Fixity, and Context information.
SIP (Submission Information Package): An external object representation prepared by the producer for the purpose of ingest into the Digital Preservation Repository, where it will be converted automatically to an Archival Information Package. See also AIP and DIP.
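The relationship between the three information packages (SIP in, AIP stored, DIP out) and the Preservation Description Information can be sketched as plain data structures. This is a deliberately simplified model under stated assumptions: the class and function names, the choice of SHA-256 for fixity, and the fields chosen are hypothetical, and OAIS itself does not prescribe any implementation. A real DIP may also include AIP or SIP content, which this sketch leaves out.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class PDI:
    """Preservation Description Information, grouped by its four OAIS categories."""
    provenance: list[str] = field(default_factory=list)
    reference: str = ""
    fixity: dict[str, str] = field(default_factory=dict)  # filename -> SHA-256 hex digest
    context: str = ""

@dataclass
class SIP:
    producer: str
    files: dict[str, bytes]             # filename -> content
    descriptive_metadata: dict[str, str]

@dataclass
class AIP:
    identifier: str
    files: dict[str, bytes]
    descriptive_metadata: dict[str, str]
    pdi: PDI

@dataclass
class DIP:
    identifier: str
    files: dict[str, bytes]
    descriptive_metadata: dict[str, str]

def ingest(sip: SIP, identifier: str) -> AIP:
    """Convert a SIP into an AIP, generating fixity and provenance information on ingest."""
    fixity = {name: hashlib.sha256(data).hexdigest() for name, data in sip.files.items()}
    received = datetime.now(timezone.utc).isoformat()
    pdi = PDI(provenance=[f"Received from {sip.producer} at {received}"],
              reference=identifier, fixity=fixity)
    return AIP(identifier, dict(sip.files), dict(sip.descriptive_metadata), pdi)

def disseminate(aip: AIP, wanted: set[str]) -> DIP:
    """Export a DIP with the requested files, checking fixity before release."""
    for name in wanted:
        if hashlib.sha256(aip.files[name]).hexdigest() != aip.pdi.fixity[name]:
            raise ValueError(f"fixity check failed for {name}")
    return DIP(aip.identifier, {n: aip.files[n] for n in wanted},
               dict(aip.descriptive_metadata))
```

Computing fixity at ingest and re-checking it on dissemination mirrors the checksum comparison described under CHECKSUM.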
TIFF (Tagged Image File Format) is widely regarded as a preferred format for preservation master images because of its technical longevity.
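In practice, repositories often identify the image and document formats described above (JPEG, JPEG 2000, PDF, TIFF) by their leading "signature" bytes rather than by filename extension, which can be wrong or missing. The sketch below uses a small, hand-written signature table; it is an illustration, not a substitute for format-identification tools such as DROID or the Unix `file` command, and the function names are hypothetical.

```python
# Simplified signature table: (leading bytes, format name).
SIGNATURES: list[tuple[bytes, str]] = [
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", "JPEG 2000 (JP2)"),  # JP2 signature box
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"%PDF-", "PDF"),
]

def identify(header: bytes) -> str:
    """Return the name of the first format whose signature matches, else 'unknown'."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"

def identify_file(path: str) -> str:
    """Read the first 16 bytes of a file and identify its format."""
    with open(path, "rb") as f:
        return identify(f.read(16))
```

Plain text (TXT) has no signature at all, which is why it is absent from the table and why text files are usually identified by other heuristics.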
TXT (.txt) is a file format used for plain-text documents, which contain little or no formatting.
UTF-8 (Unicode Transformation Format, 8-bit) is a variable-width character encoding that is backward compatible with ASCII. It can encode every Unicode character: the standard 128 ASCII characters for English use one byte each; Latin alphabet characters with diacritics, Greek, Cyrillic, Coptic, Armenian, Hebrew, and Arabic characters use two bytes; and the remaining characters use three or four bytes. It is widely supported in email and Internet browsers.
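The variable byte widths and ASCII compatibility described above can be checked directly with Python's built-in string encoding. The helper name is illustrative; Hebrew, Arabic, and the emoji are written as escape sequences to avoid right-to-left display issues in the source.

```python
def utf8_width(char: str) -> int:
    """Number of bytes UTF-8 uses to encode a single character."""
    return len(char.encode("utf-8"))

samples = {
    "A": "ASCII letter",
    "é": "Latin letter with diacritic",
    "Ж": "Cyrillic letter",
    "\u05d0": "Hebrew letter alef",
    "\u0639": "Arabic letter ain",
    "\u20ac": "euro sign",
    "\U0001F600": "emoji outside the Basic Multilingual Plane",
}

for char, label in samples.items():
    print(f"U+{ord(char):04X} ({label}): {utf8_width(char)} byte(s)")

# Backward compatibility: pure-ASCII text has identical bytes in ASCII and UTF-8.
assert "plain ASCII".encode("ascii") == "plain ASCII".encode("utf-8")
```

Running this shows one byte for the ASCII letter, two for the Latin, Cyrillic, Hebrew, and Arabic letters, three for the euro sign, and four for the emoji.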
WEB HARVESTING is commonly performed using open-source tools developed by the Internet Archive (such as the Heritrix crawler) to crawl content and provide access to it. The harvested data can be stored in the ISO-standard WARC (Web ARChive) file format. The approach to Web harvesting is fairly stable.
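A WARC file is a sequence of records, each made of a version line, named header fields, a blank line, the content block, and two closing CRLFs. The sketch below writes one WARC/1.0 "resource" record by hand to show that layout; the function name and example URI are hypothetical. In practice, crawlers or a library such as warcio write these files, and captures of live HTTP traffic are usually stored as "response" records containing the full HTTP response.

```python
import uuid
from datetime import datetime, timezone

def warc_resource_record(target_uri: str, payload: bytes, content_type: str) -> bytes:
    """Build one WARC/1.0 'resource' record: headers, blank line, payload, two CRLFs."""
    headers = [
        "WARC/1.0",
        "WARC-Type: resource",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",  # globally unique record identifier
        f"WARC-Date: {datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')}",
        f"WARC-Target-URI: {target_uri}",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(payload)}",                # length of the block in bytes
    ]
    header_block = ("\r\n".join(headers) + "\r\n\r\n").encode("utf-8")
    return header_block + payload + b"\r\n\r\n"

record = warc_resource_record("http://example.org/", b"<html>Hello</html>", "text/html")
print(record.decode("utf-8"))
```

Appending such records one after another to a file opened in binary mode produces a (minimal) WARC file.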