Jackrabbit
From CSWiki
Introduction
Apache Jackrabbit is a fully conforming implementation of the Content Repository for Java Technology API (JCR). A content repository is a hierarchical content store with support for structured and unstructured content, full text search, versioning, transactions, observation, and more. Typical applications that use content repositories include content management, document management, and records management systems.
Why develop this new JCR standard [JSR 170]?
Large organizations have data in many different formats, platforms and locations, and many different producers and consumers of that data. Data access problems occur because interfaces, data conversion, versioning, access authorization, synchronization and storage are each handled by different applications and platforms. Maintaining access and upgrading infrastructure is therefore difficult, expensive and time consuming. JSR 170 addresses the need for a standard, uniform way to access and control all of this information.
Conceptual model
Concepts and objects
The general architecture of Jackrabbit can be described in three layers: a Content Application Layer, an API Layer, and a Content Repository Implementation Layer. See general architecture overview.
Components
- Jackrabbit API
- Jackrabbit JCR Commons
- Jackrabbit JCR Tests
- Jackrabbit Core
- Jackrabbit Index Filters
- Jackrabbit JCR-RMI
- Jackrabbit WebDAV Library
- Jackrabbit JCR Server
- Jackrabbit Web Application
- Jackrabbit JCA Resource Adapter
In addition, there are a number of contributed components in the contrib folder of the Jackrabbit trunk. These components are not yet considered stable enough to be included in the official Apache Jackrabbit releases.
Prerequisites
Download a binary release and all the required dependencies or build the Jackrabbit sources.
Once you have Jackrabbit available locally, you should make sure that you have at least version 1.4 of the Java 2 Platform, Standard Edition (J2SE) installed and the following libraries configured in your Java classpath:
- jackrabbit-1.0.jar
- jcr-1.0.jar
- slf4j-log4j12-1.0.jar
- log4j-1.2.8.jar
- commons-collections-3.1.jar
- xercesImpl-2.6.2.jar
- xmlParserApis-2.0.2.jar
- derby-10.1.3.1.jar
- concurrent-1.3.4.jar
- lucene-1.4.3.jar
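A quick way to confirm that the classpath is set up is to try loading one representative class from several of the jars above. The following is a minimal sketch, not part of Jackrabbit: the class name ClasspathCheck is hypothetical, the mapping of class names to jars is an assumption based on the usual contents of those libraries, and the code uses Java 5+ syntax rather than J2SE 1.4.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Jackrabbit.
class ClasspathCheck {

    // Representative class names. Which jar ships each class is an assumption.
    static final String[] EXPECTED = {
        "javax.jcr.Repository",                           // assumed: jcr-1.0.jar
        "org.apache.jackrabbit.core.TransientRepository", // assumed: jackrabbit-1.0.jar
        "org.apache.log4j.Logger",                        // assumed: log4j-1.2.8.jar
        "org.apache.lucene.index.IndexReader",            // assumed: lucene-1.4.3.jar
        "org.apache.derby.jdbc.EmbeddedDriver"            // assumed: derby-10.1.3.1.jar
    };

    // Returns the names that cannot be loaded from the current classpath.
    static List<String> missing(String[] classNames) {
        List<String> result = new ArrayList<String>();
        ClassLoader loader = ClasspathCheck.class.getClassLoader();
        for (String name : classNames) {
            try {
                // 'false' means: locate the class but do not run its static initializers.
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException e) {
                result.add(name);
            } catch (LinkageError e) {
                // The class was found but one of its own dependencies is missing.
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> missing = missing(EXPECTED);
        if (missing.isEmpty()) {
            System.out.println("All expected classes were found.");
        } else {
            for (String name : missing) {
                System.out.println("Missing from classpath: " + name);
            }
        }
    }
}
```

Running the class with the same classpath as the application lists every expected class that could not be loaded. An empty result only shows that these particular classes are present, not that every jar has the right version.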
How accessed/used
A content repository is an information management system that provides various services for storing, accessing, and managing content. In addition to a hierarchically structured storage system, common services of a content repository are versioning, access control, full text searching, and event monitoring.
A content repository is not a content management system (CMS), although most existing CMSs contain a custom content repository implementation with a larger or smaller feature set. A CMS uses a content repository as an underlying component for presentation, business logic, and other features. Jackrabbit is a content repository that implements all of the JCR API.
The diagram below explains which components of Jackrabbit are used when a user of the JCR API modifies content in the content repository. This is a simple and very common operation that touches a large portion of the components in the Jackrabbit implementation. This implementation architecture is not mandated by JCR, but has been designed from scratch based on JCR.
- Transient Item State Manager: Once content items are read by a session, they are cached in the Transient Item State Manager. When those items are modified, the modification is visible only to that same session, in the so-called "transient" space.
- Transactional Item State Manager: When the application persists the modified items using the JCR Item.save() or Session.save(), the transient items are promoted into the Transactional Item State Manager (ISM). The modifications are still visible only within the scope of this transaction, meaning that other sessions will not see them until the transaction is committed. The commit may happen implicitly if the content repository is not running in an XA environment.
- Shared Item State Manager: Once a transaction is committed, the Shared Item State Manager receives the changelog and publishes the changes to all the sessions logged into the same workspace. This means that all the item states that are cached and referenced by other sessions are notified and possibly updated or invalidated. The Shared Item State Manager also triggers the observation and hands the changelog over to the persistence manager that is configured for this workspace.
- Persistence Manager: The Persistence Manager persists all the item states in the changelog passed by the Shared ISM. The persistence manager is a simple, fast, transactional, low-level interface. It does not need to understand the complexities of repository operations; it only needs to be able to persist and retrieve a given item based on its item ID.
- Observation: When a transaction is committed, the Shared Item State Manager triggers the observation mechanism. This allows applications to subscribe asynchronously to changes in the workspace. Jackrabbit also offers synchronous observation, which is not part of the standard.
- Query Manager / Index: Through a synchronous observation event, the Query Manager is instructed to index the new or modified items. A content repository index is much more complex than a classical relational database (RDB) index, since it deals with content repository features such as the item hierarchy, node type inheritance, and full-text searches.
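The visibility rules described in the list above can be illustrated with a small toy model. This is a sketch under stated assumptions, not Jackrabbit's implementation: every class and method name below is hypothetical, the transactional tier is collapsed into save() (as if the repository were not running in an XA environment, so the commit is implicit), sessions do not cache reads (so there is nothing to invalidate), and the listener is called synchronously, in the way the text describes for the index update.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: all names are hypothetical, none are Jackrabbit classes.
class ItemStateSketch {

    // Called with the changelog after a commit (stand-in for observation).
    interface ChangeListener {
        void onCommit(Map<String, String> changeLog);
    }

    // Stand-in for the persistence manager: stores and loads values by item id.
    static class PersistenceStore {
        private final Map<String, String> data = new HashMap<String, String>();
        void store(Map<String, String> changeLog) { data.putAll(changeLog); }
        String load(String itemId) { return data.get(itemId); }
    }

    // Stand-in for the shared item state manager of one workspace.
    static class SharedStates {
        private final PersistenceStore store;
        private final List<ChangeListener> listeners = new ArrayList<ChangeListener>();

        SharedStates(PersistenceStore store) { this.store = store; }

        void addListener(ChangeListener listener) { listeners.add(listener); }

        String read(String itemId) { return store.load(itemId); }

        // Persists the changelog, then notifies every listener synchronously.
        void commit(Map<String, String> changeLog) {
            store.store(changeLog);
            for (ChangeListener listener : listeners) {
                listener.onCommit(changeLog);
            }
        }
    }

    // Stand-in for a session and its private transient space.
    static class SessionSketch {
        private final SharedStates shared;
        private final Map<String, String> transientChanges = new HashMap<String, String>();

        SessionSketch(SharedStates shared) { this.shared = shared; }

        // The change stays in this session's transient space until save().
        void set(String itemId, String value) { transientChanges.put(itemId, value); }

        // Unsaved changes of this session shadow the shared state.
        String get(String itemId) {
            if (transientChanges.containsKey(itemId)) {
                return transientChanges.get(itemId);
            }
            return shared.read(itemId);
        }

        // Hands the transient changes to the shared tier as one changelog.
        void save() {
            Map<String, String> changeLog = new HashMap<String, String>(transientChanges);
            transientChanges.clear();
            shared.commit(changeLog);
        }
    }

    public static void main(String[] args) {
        SharedStates shared = new SharedStates(new PersistenceStore());
        List<String> indexed = new ArrayList<String>();
        shared.addListener(changeLog -> indexed.addAll(changeLog.keySet()));

        SessionSketch a = new SessionSketch(shared);
        SessionSketch b = new SessionSketch(shared);

        a.set("/hello", "world");
        System.out.println(b.get("/hello")); // null: the change is still transient in session a
        a.save();
        System.out.println(b.get("/hello")); // world
        System.out.println(indexed);         // [/hello]
    }
}
```

The model keeps one design point from the description: the persistence stand-in knows nothing about sessions or hierarchy and only stores and retrieves values by item id, while all visibility logic lives in the tiers above it.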



