

About this collection

This collection demonstrates Greenstone's ImportFrom feature. Using the Open Archives Initiative (OAI) protocol, version 1.1, it retrieves metadata from a remote OAI data provider that holds a collection of photographs taken at the inaugural Joint Conference on Digital Libraries. A Greenstone collection is then built from the records exported by this data provider. The implementation is flexible enough to cope with the minor syntax differences between OAI 1.1 and OAI 2.0.
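To illustrate how a harvester can tolerate syntax differences between OAI versions, here is a minimal Python sketch (not Greenstone's own code) that matches elements by local name, so the namespace of the OAI envelope does not matter, and keeps only Dublin Core fields. The sample response, its URLs, and the function names are hypothetical, and the Dublin Core namespace is assumed to be the same in OAI 1.1 and 2.0 records.

```python
import xml.etree.ElementTree as ET

# Dublin Core element namespace (assumed identical in the OAI 1.1 and 2.0 records handled here).
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def local_name(tag: str) -> str:
    """Drop any '{namespace}' prefix so OAI 1.1 and 2.0 tags compare equal."""
    return tag.rsplit("}", 1)[-1]


def extract_records(xml_text: str) -> list:
    """Return one dict per <record>, mapping Dublin Core element names to values."""
    root = ET.fromstring(xml_text)
    records = []
    for elem in root.iter():
        if local_name(elem.tag) != "record":
            continue
        fields = {}
        for child in elem.iter():
            text = (child.text or "").strip()
            # Keep only leaf Dublin Core elements with text; OAI header fields are ignored.
            if child.tag.startswith(DC_NS) and len(child) == 0 and text:
                fields.setdefault(local_name(child.tag), []).append(text)
        records.append(fields)
    return records


# Hypothetical OAI 2.0 ListRecords response with a single record.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:subject>Opening reception</dc:subject>
          <dc:identifier>http://example.org/photos/img001.jpg</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(extract_records(SAMPLE))
```

Matching on local names rather than full namespaced tags is what lets the same loop read envelopes from either protocol version.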

How the collection works

The collection configuration file includes an acquire line that is interpreted by a special program called importfrom. Like other Greenstone programs, importfrom takes the name of the collection as an argument, and prints a summary of its other arguments when invoked with -help. It reads the collection configuration file, finds the acquire line, and processes it. In this case, it is run with the collection's name, oai-e, as its argument.

The acquire line in the configuration file specifies the OAI protocol and gives the base URL of an OAI repository. The importfrom program downloads all the metadata in that repository into the collection's import directory. The getdoc argument instructs it to also download the collection's source documents, whose URLs are given in each record's Dublin Core Identifier field (a common convention). The metadata files, each containing an XML record for one source document, are placed in the import directory structure alongside the documents themselves, and each document is saved under the same filename as in its URL. The Identifier field is then overwritten with the local filename, and its original value is kept in a new field called OrigURL.
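The Identifier-to-OrigURL rewrite can be sketched in a few lines of Python. This is an illustration under stated assumptions, not importfrom's implementation: the record is modelled as a plain dictionary, the function name localise_identifier is hypothetical, and the local filename is taken to be the last component of the URL path.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def localise_identifier(record: dict) -> dict:
    """Return a copy of `record` whose Identifier is the local filename,
    with the original URL kept under OrigURL."""
    url = record["Identifier"]
    # The downloaded document is assumed to be saved under the last component of the URL path.
    filename = PurePosixPath(urlsplit(url).path).name
    updated = dict(record)
    updated["OrigURL"] = url
    updated["Identifier"] = filename
    return updated


record = {"Identifier": "http://example.org/photos/img001.jpg", "Subject": "Keynote"}
print(localise_identifier(record))
```

Returning a copy rather than mutating the input keeps the harvested record available if the download step has to be retried.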

Here is an example of a downloaded metadata file.

Once the OAI information has been imported, the collection is processed in the usual way. The configuration file specifies the OAI plugin, which processes OAI metadata, and the image plugin (ImagePlug), because in this case the collection's source documents are image files. The OAI plugin is given an input_encoding argument because data in this archive contains extended characters. It also has a default_language argument: Greenstone normally determines the language of documents automatically, but these metadata records are too short for this to be done reliably, so English is specified explicitly. The OAI plugin parses the metadata and passes it on to the corresponding source document file, which is then processed by the appropriate plugin, in this case ImagePlug. The ImagePlug entry in the configuration file specifies the resolution of the screen versions of the images.
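The hand-off from the OAI plugin to ImagePlug amounts to choosing a plugin from the source file's type and filling in the language when it is missing. The sketch below models that with a hypothetical extension-to-plugin table; the Language field name, the "en" code, and route_record are assumptions, and real Greenstone plugins are Perl modules that differ in detail.

```python
from pathlib import PurePosixPath

# Hypothetical registry: which plugin handles which source-file extension.
PLUGIN_FOR_EXTENSION = {".jpg": "ImagePlug", ".jpeg": "ImagePlug", ".gif": "ImagePlug", ".png": "ImagePlug"}


def route_record(metadata: dict, default_language: str = "en") -> tuple:
    """Attach the default language if none is present, then choose the plugin
    for the source document named in the Identifier field."""
    meta = dict(metadata)
    # Records too short for automatic language detection get the default.
    meta.setdefault("Language", default_language)
    extension = PurePosixPath(meta["Identifier"]).suffix.lower()
    plugin = PLUGIN_FOR_EXTENSION.get(extension)
    if plugin is None:
        raise ValueError(f"no plugin registered for {extension!r}")
    return plugin, meta


print(route_record({"Identifier": "img001.jpg", "Description": "Panel session"}))
```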

The collection configuration file has a single full-text index, which contains Description metadata. When a document is displayed, the DocumentHeading format statement shows its Subject. The DocumentText statement follows this with screenicon, a screen-resolution version of the image produced by ImagePlug; it is hyperlinked to the OrigURL metadata, that is, to the original version of the image on the remote OAI site. Next come the image's Description, also hyperlinked; the image's size and type, again generated as metadata by ImagePlug; and finally the Subject, Publisher, and Rights metadata. This is the result.
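As a rough model of what these format statements produce, this Python sketch assembles a document page in the order just described. Everything here is an assumption for illustration: the HTML markup, the Description link target (assumed to be OrigURL as well), and the SizeAndType key standing in for ImagePlug's size and type metadata.

```python
from html import escape


def render_document(meta: dict, screenicon_html: str) -> str:
    """Assemble a document page in the order the format statements describe."""
    link = escape(meta["OrigURL"], quote=True)
    parts = [
        # DocumentHeading: the Subject.
        f"<h3>{escape(meta['Subject'])}</h3>",
        # DocumentText: screen-resolution image linked to the original on the remote site.
        f'<a href="{link}">{screenicon_html}</a>',
        # Description, here assumed to link to the same OrigURL.
        f'<a href="{link}">{escape(meta["Description"])}</a>',
        # Hypothetical stand-in for the size and type metadata generated by ImagePlug.
        escape(meta["SizeAndType"]),
    ]
    # Trailing metadata, shown only when present.
    for field in ("Subject", "Publisher", "Rights"):
        if field in meta:
            parts.append(f"{field}: {escape(meta[field])}")
    return "<br>\n".join(parts)
```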

There are two browsing classifiers: one based on Subject metadata and the other on Description metadata (whose button is labelled "Captions"). Recall that the AZCompactList classifier is like AZList but groups duplicate items under a bookshelf. This suits the Subject classifier, because the collection has many images but only a few distinct Subject values.

It's a little surprising that AZCompactList is used (instead of AZList) for the Description classifier too, because Description metadata is usually unique for each image. However, in this collection the same description has occasionally been given to several images, and with AZList some of the alphabetical divisions would contain so many images that the page would be slow to transmit. To avoid this, the compact version of the list is used with arguments (mincompact, maxcompact, mingroup, minnesting) that control the display: for example, groups (represented by bookshelves) are not formed unless they contain at least 5 items (mingroup). To find out what the other arguments mean, run Greenstone's classifier information program with AZCompactList as its argument; this program and its counterpart for plugins are useful tools for learning about the capabilities of Greenstone modules. Note, incidentally, the backslash in the configuration file, which indicates that a line continues on the next one.
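The effect of mingroup can be shown with a simplified grouping function. This is not Greenstone's AZCompactList algorithm: it ignores mincompact, maxcompact, minnesting, and the alphabetical divisions, and only models the rule that a value becomes a bookshelf once at least mingroup items share it. The function name compact_list is hypothetical.

```python
from collections import defaultdict


def compact_list(items: list, key: str, mingroup: int = 5) -> list:
    """Simplified compact listing: values shared by at least `mingroup` items
    become one bookshelf; all other items are listed individually."""
    groups = defaultdict(list)
    for item in items:
        groups[item[key]].append(item)
    entries = []
    # Case-insensitive alphabetical order of the distinct values.
    for value in sorted(groups, key=str.lower):
        members = groups[value]
        if len(members) >= mingroup:
            entries.append(("bookshelf", value, members))
        else:
            entries.extend(("item", value, member) for member in members)
    return entries
```

With mingroup=5, a caption shared by four images still produces four separate entries, while one shared by five collapses into a single bookshelf.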

The VList format specification shows the image thumbnail, hyperlinked to the associated document, followed by its Description metadata; the result can be seen here. The VList specifications for the classifiers use numleafdocs to switch between an icon representing several documents (which appears as a bookshelf) and the thumbnail itself when there is only one image.
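The numleafdocs switch reduces to a single test. In this hypothetical Python model, a classifier node is a dictionary that carries numleafdocs only when it groups several documents, and the placeholder string "bookshelf" stands for the bookshelf icon; neither detail is taken from Greenstone's code.

```python
def vlist_icon(node: dict) -> str:
    """Choose what a classifier entry shows: a bookshelf icon when the node
    groups several documents (numleafdocs is set), else the image thumbnail."""
    if node.get("numleafdocs"):
        return "bookshelf"
    return node["thumbicon"]


print(vlist_icon({"numleafdocs": 7}))
print(vlist_icon({"thumbicon": "img001-thumb.gif"}))
```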

The Greenstone OAI server

From version 2.52, Greenstone comes with a built-in OAI data provider. This runs as a CGI program called "oaiserver", and is installed in the Greenstone cgi-bin directory. It can be accessed via the same URL as the Greenstone library (replacing "library" with "oaiserver"). If you are using the Windows local library server, you must install a web server (such as Apache) to run the OAI server.
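Because the OAI server shares the library's URL apart from the program name, a client can derive its base URL mechanically. This sketch assumes a URL whose last path component is exactly library (the host and path shown are hypothetical) and builds standard OAI-PMH requests such as verb=Identify, or verb=ListRecords with metadataPrefix=oai_dc.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def oai_base_url(library_url: str) -> str:
    """Derive the OAI server URL by replacing the final 'library' path
    component of a Greenstone library URL with 'oaiserver'."""
    parts = urlsplit(library_url)
    directory, program = parts.path.rsplit("/", 1)
    if program != "library":
        raise ValueError(f"expected a URL ending in /library, got {library_url!r}")
    # Any query string or fragment belongs to the library, not the OAI server.
    return urlunsplit(parts._replace(path=f"{directory}/oaiserver", query="", fragment=""))


def oai_request(base_url: str, verb: str, **params: str) -> str:
    """Build an OAI-PMH request URL such as ?verb=Identify."""
    return f"{base_url}?{urlencode({'verb': verb, **params})}"


base = oai_base_url("http://localhost/gsdl/cgi-bin/library")
print(oai_request(base, "ListRecords", metadataPrefix="oai_dc"))
```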

Configuration of the server is done via the oai.cfg file in the Greenstone etc directory. This file specifies general information about the repository, and lists collections to be made accessible to OAI clients. By default, collections are not accessible. To enable a collection, add its name to the oaicollection list. Collections built with versions of Greenstone earlier than 2.52 must be rebuilt before they can be served.
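To illustrate the opt-in rule, here is a sketch that reads collection names from oaicollection lines. The line format assumed here (the keyword followed by whitespace-separated names, with # starting a comment) is a simplification, and the sample file is hypothetical; the comments in your own oai.cfg give the exact syntax.

```python
def served_collections(cfg_text: str) -> set:
    """Collect collection names from 'oaicollection' lines (assumed format:
    the keyword followed by one or more whitespace-separated names)."""
    names = set()
    for raw_line in cfg_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and surrounding space
        if not line:
            continue
        keyword, *values = line.split()
        if keyword == "oaicollection":
            names.update(values)
    return names


SAMPLE_CFG = """
# Hypothetical oai.cfg excerpt; only listed collections are served.
oaicollection oai-e
oaicollection demo
# oaicollection private
"""
print(sorted(served_collections(SAMPLE_CFG)))
```

A collection missing from the returned set corresponds to the default case in which it is not accessible to OAI clients.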

Greenstone's OAI server only supports Dublin Core metadata at present. For collections that use other metadata sets, metadata mapping rules should be provided to map the existing metadata to Dublin Core. See the oai.cfg file for details.
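A metadata mapping can be thought of as a table from source fields to Dublin Core elements. In this sketch the ex. metadata set, the rule table, and the convention that dc.-prefixed fields pass through unchanged are all hypothetical; unmapped fields are simply not exposed.

```python
def map_to_dublin_core(metadata: dict, rules: dict) -> dict:
    """Expose only Dublin Core: dc.* fields pass through, other fields are
    renamed by `rules`, and fields without a rule are left out."""
    dc = {}
    for field, values in metadata.items():
        target = field if field.startswith("dc.") else rules.get(field)
        if target is None:
            continue  # no mapping rule: not served over OAI
        dc.setdefault(target, []).extend(values)
    return dc


# Hypothetical rules for a collection that uses its own "ex." metadata set.
RULES = {"ex.Caption": "dc.Description", "ex.Photographer": "dc.Creator"}
print(map_to_dublin_core({"ex.Caption": ["Poster session"], "ex.Internal": ["x"]}, RULES))
```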

How to find information in the OAI demo collection

There are three ways to find information in this collection:

  • search for particular words that appear in the photo captions by clicking the Search button
  • browse documents by Subject by clicking the Subjects button
  • browse documents by caption by clicking the Captions button