DSpace

Introduction

DSpace is an open-source digital repository system that captures, stores, indexes, preserves, and redistributes an organization's research data.

What Can DSpace Do?

Jointly developed by MIT Libraries and Hewlett-Packard Labs, the DSpace software platform serves a variety of digital archiving needs.

  • Institutional Repositories (IRs)
  • Learning Object Repositories (LORs)
  • E-theses
  • Electronic Records Management (ERM)
  • Digital Preservation
  • Publishing
  • and more

What Kinds of Content Does DSpace Accept?

DSpace accepts all manner of digital formats. Some examples of items that DSpace can accommodate are:

  • Articles and preprints
  • Technical reports
  • Working papers
  • Conference papers
  • E-theses
  • Datasets: statistical, geospatial, MATLAB, etc.
  • Images: visual, scientific, etc.
  • Audio files
  • Video files
  • Learning objects
  • Reformatted digital library collections

Conceptual Model

Image:Dspace-diagram_large.jpg

  1. A Web-based interface makes it easy for a submitter to create an archival item by depositing files. DSpace was designed to handle any format, from simple text documents to datasets and digital video.
  2. Data files, also called bitstreams, are organized together into related sets. Each bitstream has a technical format and other technical information.
  3. An item is an "archival atom" consisting of grouped, related content and associated descriptions (metadata). An item's exposed metadata is indexed for browsing and searching. Items are organized into collections of logically-related material.
  4. Communities are the highest level of the DSpace content hierarchy. They correspond to parts of the organization such as departments, labs, research centers or schools.
  5. DSpace's modular architecture allows for creation of large, multi-disciplinary repositories that ultimately can be expanded across institutional boundaries.
  6. DSpace aims to go beyond reliable file preservation and offer functional preservation, in which files of as many types as possible are kept accessible as formats, media, and paradigms evolve over time.
  7. The end-user interface supports browsing and searching the archives. Once an item is located, files in Web-native formats can be displayed in a Web browser, while other formats can be downloaded and opened with a suitable application program.

Data Model

Image:Data-model.gif

  • The way data is organized in DSpace is intended to reflect the structure of the organization using the DSpace system. Each DSpace site is divided into communities; these typically correspond to a laboratory, research center or department. As of DSpace version 1.2, these communities can be organized into a hierarchy.
  • Communities contain collections, which are groupings of related content. A collection may appear in more than one community.
  • Each collection is composed of items, which are the basic archival elements of the archive. Every item has exactly one owning collection, but it may also appear in additional collections.
  • Items are further subdivided into named bundles of bitstreams. Bitstreams are, as the name suggests, streams of bits, usually ordinary computer files. Bitstreams that are closely related, for example the HTML files and images that compose a single HTML document, are organized into bundles.

Objects in the DSpace Data Model

  • Object - Example
  • Community - Laboratory of Computer Science; Oceanographic Research Center
  • Collection - LCS Technical Reports; ORC Statistical Data Sets
  • Item - A technical report; a data set with accompanying description; a video recording of a lecture
  • Bundle - A group of HTML and image bitstreams making up an HTML document
  • Bitstream - A single HTML file; a single image file; a source code file
  • Bitstream Format - Microsoft Word version 6.0; JPEG encoded image format
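
The containment rules above can be sketched in a few lines of Java. This is only an illustration of the relationships described in this section: the class and method names (DataModelSketch, include, and so on) are hypothetical and are not the DSpace Java API, the file names are made up, and the code is written for a current JDK rather than Java 1.4.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical classes for illustration only; they are not the DSpace Java API.
final class DataModelSketch {

    /** A single file plus its technical format. */
    static final class Bitstream {
        final String name;
        final String format;

        Bitstream(String name, String format) {
            this.name = name;
            this.format = format;
        }
    }

    /** A named group of closely related bitstreams. */
    static final class Bundle {
        final String name;
        final List<Bitstream> bitstreams = new ArrayList<>();

        Bundle(String name) {
            this.name = name;
        }
    }

    /** A grouping of related items. */
    static final class Collection {
        final String name;
        final List<Item> items = new ArrayList<>();

        Collection(String name) {
            this.name = name;
        }

        /** Makes an item appear here without changing its owning collection. */
        void include(Item item) {
            if (!items.contains(item)) {
                items.add(item);
            }
        }
    }

    /** The basic archival element; it has exactly one owning collection. */
    static final class Item {
        final String title;
        final Collection owner;
        final List<Bundle> bundles = new ArrayList<>();

        Item(String title, Collection owner) {
            this.title = title;
            this.owner = owner;
            owner.include(this); // an item always appears in its owner
        }
    }

    /** Top of the hierarchy; may hold sub-communities and collections. */
    static final class Community {
        final String name;
        final List<Community> subCommunities = new ArrayList<>();
        final List<Collection> collections = new ArrayList<>();

        Community(String name) {
            this.name = name;
        }
    }

    public static void main(String[] args) {
        Community lab = new Community("Laboratory of Computer Science");
        Collection reports = new Collection("LCS Technical Reports");
        lab.collections.add(reports);

        Item report = new Item("A technical report", reports);
        Bundle html = new Bundle("HTML document");
        html.bitstreams.add(new Bitstream("index.html", "HTML"));
        html.bitstreams.add(new Bitstream("figure1.jpg", "JPEG encoded image format"));
        report.bundles.add(html);

        System.out.println(lab.name + " > " + report.owner.name + " > " + report.title);
    }
}
```

Making the owning collection a final field set in the constructor is one way to express "one and only one owning collection", while include lets the same item be listed in further collections.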

Customization and Manipulation

There are a number of ways in which DSpace can be configured and/or customized:

  • Altering the configuration files in [dspace]/config
  • Creating modified versions of the JSPs; these can be placed separately from and override the default installed JSPs, so that future updates of the code won't overwrite your changes
  • Implementing a custom 'authenticator' class, so that user authentication in the Web UI can be adapted and integrated with any existing mechanisms your organization might use
  • Editing the source code
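
As a small illustration of the first option, the sketch below reads a few settings with java.util.Properties. It assumes the configuration file uses Java properties syntax (key = value); the class name ConfigSketch, the keys, and the values are illustrative placeholders rather than a description of any particular installation.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Illustrative helper, not part of DSpace.
final class ConfigSketch {

    /** Parses properties-style text into key/value pairs. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Not expected for an in-memory reader; rethrown unchecked.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Placeholder settings standing in for a file under [dspace]/config.
        String sample =
                "# illustrative values only\n"
              + "dspace.dir = /dspace\n"
              + "dspace.url = http://localhost:8080/dspace\n";

        Properties props = parse(sample);
        System.out.println("Install directory: " + props.getProperty("dspace.dir"));
        System.out.println("Base URL: " + props.getProperty("dspace.url"));
    }
}
```

In practice the text would come from a file in [dspace]/config rather than an in-memory string; the string keeps the example self-contained.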

  • DSpace has documented Java APIs that can be customized to allow interoperation with other systems an institution might be running (for example, a department's web document system auto-depositing in DSpace, or a campus data warehouse).
  • DSpace also uses the Handle System from CNRI to assign and resolve persistent identifiers for each digital item. Handles are URN-compliant identifiers. The Handle resolver is an open-source system used in conjunction with DSpace. The developers chose to use handles instead of persistent URLs to support citations to items in DSpace over very long time spans... Handles in DSpace are currently implemented as URLs, but can also be modified to work with future protocols.
  • Metadata can be entered into DSpace, stored in the database, indexed appropriately, and made searchable through the public user interface. This currently applies mainly to descriptive metadata, although as standards emerge it could also include technical, rights, preservation, structural, and behavioral metadata. Currently DSpace supports only the Dublin Core metadata element set with a few qualifications conforming to the library application profile. The DSpace team hopes to support a subset of the IMS/SCORM element set (for describing educational material) sometime in the near future. HP and MIT also have a research project called SIMILE, which is investigating how to support arbitrary metadata schemas using RDF as applied by the Haystack research project in the Lab for Computer Science and some of the Semantic Web technologies being developed by the W3C.
  • DSpace supports the Open Archives Initiative's Protocol for Metadata Harvesting (OAI-PMH) v2.0 as a data provider. OAI support was implemented using OCLC's OAICat open-source software to make DSpace item records available for harvesting. DSpace@MIT is registered as a data provider with the Open Archives Initiative. Other institutions running DSpace may choose to turn on OAI or not, and to register as a data provider or not.
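
To make the identifier and harvesting points more concrete, here is a minimal sketch of how an external client might build two URLs: one that resolves a handle through the public proxy at hdl.handle.net, and one that issues an OAI-PMH ListRecords request for unqualified Dublin Core (the oai_dc metadata prefix defined by the protocol). The class name, the handle 123456789/42, and the base URL repository.example.edu are hypothetical placeholders; the actual OAI base URL depends on how a given site is deployed.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Illustrative client-side helper, not part of DSpace. Requires Java 10 or later.
final class HarvestUrlSketch {

    /** Builds a resolvable URL for a handle of the form "prefix/suffix". */
    static String handleUrl(String handle) {
        if (!handle.matches("[^/]+/.+")) {
            throw new IllegalArgumentException("Expected prefix/suffix, got: " + handle);
        }
        return "http://hdl.handle.net/" + handle;
    }

    /** Builds an OAI-PMH ListRecords request against the given base URL. */
    static String listRecordsUrl(String baseUrl, String metadataPrefix) {
        return baseUrl
                + "?verb=ListRecords&metadataPrefix="
                + URLEncoder.encode(metadataPrefix, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical handle and repository address, for illustration only.
        System.out.println(handleUrl("123456789/42"));
        System.out.println(listRecordsUrl("http://repository.example.edu/oai/request", "oai_dc"));
    }
}
```

A harvester would then fetch the second URL over HTTP and parse the XML response; that step is omitted here.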

Prerequisites

Building an Institutional Repository with DSpace requires installing the following technologies:

  • UNIX-like OS (Linux, HP-UX, etc.)
  • Java 1.4 or later (the standard SDK is fine; you don't need J2EE)
  • Apache Ant 1.5 or later (a make-like build tool for Java)
  • PostgreSQL 7.3 or later (an open-source relational database), or Oracle 9 or later
  • Jakarta Tomcat 4.x/5.x or an equivalent servlet container, such as Jetty or Caucho Resin

Creating an Institutional Repository

Each university has a unique culture and assets that require a customized approach. The information model that best suits your university may not fit another campus.

It’s important to define precisely how you intend to use the system and what type of services you will offer. For example, some universities build their institutional repository to hold only academic research. Others expand the service definition to include student theses, learning materials, or university records. Ideally, you want to decide this before you build the technical infrastructure of an institutional repository.
