September 14, 2008

RIB - Repository in a Box

UPEI's Model for a "Shrink-Wrapped" Institutional Repository

I have been meaning to post on UPEI's approach to building an institutional repository for a number of months, but like so many things it always seemed to go on the back burner. I thought I would take the opportunity of a cloudy day to provide a brief description of our Repository in a Box project (RIB), which we will be launching this Fall.

RIB is built using UPEI's evolving Drupal/Fedora framework, which is also the basis of our VRE project. RIB is based on a series of workflows on top of the repository backbone:

  • A collection of citation data in an appropriate collecting database (currently RefWorks, but most likely to migrate to Zotero) which represents as complete a collection of the scholarly output of the campus community (at this point faculty) as we can generate. This is generated by harvesting existing databases and adding metadata from CVs.
  • A Fedora content model that defines the nature of the RIB disseminators, citation objects and associated datastreams, including: Qualified Dublin Core record; Original RefWorks XML record; Sherpa-Romeo record; document thumbnail; document PDF

  • A script (or as we call it, in a play on Fedora vocabulary, an inseminator) that takes a RefWorks XML file of the complete citation database, converts it to FOXML and inserts into a Fedora collection, storing each citation as a separate object.
  • A special disseminator that performs a live search of the Sherpa-Romeo database of publisher open access policies and adds/updates a Sherpa datastream in the Fedora object for the article being viewed.
  • An OpenURL button that sends the citation data to our CUFTS linker and enables discovery of the publisher version of the article.

  • A series of XSLTs that convert the metadata and other datastreams for display in the Drupal interface.
  • A search interface, using Drupal's built-in search, that searches the complete Fedora collection, returning results to Drupal.
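To make the "inseminator" step more concrete, here is a minimal Python sketch of splitting a RefWorks XML export into one FOXML object per citation. It is not our production script. The RefWorks element names (`reference`, `t1`, `a1`, `jf`, `yr`), the `rib:` PID prefix, the datastream IDs and the mapping to simple (rather than Qualified) Dublin Core are all assumptions made for illustration, and the actual ingest into Fedora is left out.

```python
import copy
import xml.etree.ElementTree as ET

FOXML_NS = "info:fedora/fedora-system:def/foxml#"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

ET.register_namespace("foxml", FOXML_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)

# Assumed mapping from RefWorks XML element names to simple Dublin Core terms.
REFWORKS_TO_DC = {"t1": "title", "a1": "creator", "jf": "source", "yr": "date"}


def _foxml(tag):
    """Return a namespace-qualified FOXML tag name."""
    return f"{{{FOXML_NS}}}{tag}"


def _add_xml_datastream(obj, ds_id, label):
    """Add an inline XML datastream and return its xmlContent element."""
    ds = ET.SubElement(
        obj, _foxml("datastream"),
        {"ID": ds_id, "STATE": "A", "CONTROL_GROUP": "X"},
    )
    version = ET.SubElement(
        ds, _foxml("datastreamVersion"),
        {"ID": f"{ds_id}.0", "MIMETYPE": "text/xml", "LABEL": label},
    )
    return ET.SubElement(version, _foxml("xmlContent"))


def reference_to_foxml(reference, pid):
    """Build one FOXML digital object from one RefWorks <reference> element."""
    obj = ET.Element(_foxml("digitalObject"), {"VERSION": "1.1", "PID": pid})
    props = ET.SubElement(obj, _foxml("objectProperties"))
    ET.SubElement(
        props, _foxml("property"),
        {"NAME": "info:fedora/fedora-system:def/model#state", "VALUE": "Active"},
    )

    # Dublin Core datastream derived from the citation fields.
    dc_content = _add_xml_datastream(obj, "DC", "Dublin Core record")
    dc = ET.SubElement(dc_content, f"{{{OAI_DC_NS}}}dc")
    for field in reference:
        term = REFWORKS_TO_DC.get(field.tag)
        if term and field.text:
            ET.SubElement(dc, f"{{{DC_NS}}}{term}").text = field.text.strip()

    # Keep the original RefWorks record as its own datastream.
    original = _add_xml_datastream(obj, "REFWORKS", "Original RefWorks XML record")
    original.append(copy.deepcopy(reference))
    return obj


def split_refworks_export(xml_text, pid_prefix="rib"):
    """Turn a whole RefWorks export into a list of FOXML objects, one per citation."""
    root = ET.fromstring(xml_text)
    return [
        reference_to_foxml(ref, f"{pid_prefix}:{i}")
        for i, ref in enumerate(root.iter("reference"), start=1)
    ]
```

Each returned element can be serialized with `ET.tostring(obj, encoding="unicode")` and handed to whatever ingest mechanism the repository uses. Storing the untouched RefWorks record alongside the derived Dublin Core means the mapping can be revised later without going back to the citation database.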

The end result will be an IR that launches with an almost complete collection of scholarly output for the institution. All the faculty member has to do is log in and the system will display their publications (this is the final piece we are currently working on before we launch) and all associated data.

list_small.jpg

By viewing the detailed record, the user can view and edit metadata, look for the online version and add datastreams.

general_small.jpg

The individual can click the "Get-It @ UPEI" button to retrieve the final version, if desired, read the publisher open access policy (including a link to the full policy) and add the appropriate version of the article (pre-print, post-print or final published version).

sherpa_small.jpg
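For anyone curious what a button like "Get-It @ UPEI" sends to the link resolver, here is a rough Python sketch of building an OpenURL (Z39.88-2004, key/encoded-value format) from citation data. The resolver address and the citation field names are placeholders invented for this example, not our actual CUFTS configuration.

```python
from urllib.parse import urlencode

# Assumed mapping from our own citation field names to OpenURL journal keys.
CITATION_TO_OPENURL = {
    "title": "rft.atitle",
    "journal": "rft.jtitle",
    "issn": "rft.issn",
    "year": "rft.date",
    "volume": "rft.volume",
    "issue": "rft.issue",
    "start_page": "rft.spage",
}


def build_openurl(resolver_base, citation):
    """Return an OpenURL for a journal article, skipping any empty fields."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
    ]
    for field, key in CITATION_TO_OPENURL.items():
        value = citation.get(field)
        if value:
            params.append((key, str(value)))
    return f"{resolver_base}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical resolver address and citation, for illustration only.
    print(build_openurl(
        "https://resolver.example.edu/openurl",
        {"title": "A Title", "journal": "Some Journal", "issn": "1234-5678", "year": 2008},
    ))
```

Leaving out empty fields keeps the link short and lets the resolver match on whatever metadata the citation actually has, which matters when records harvested from CVs are incomplete.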

Our hope is that, with a minimum of training, presenting faculty with an IR record that is already 90% complete will encourage them to finish the process. Some future enhancements will include parsing the Sherpa-Romeo record to automatically grab the publisher version where appropriate and implementing disseminators to convert word-processing formats. We will be providing RIB as an example in our packaged open source Drupal/Fedora module, which will be available at the end of September from SourceForge.

Comments

Have you done any tests to see the percentage of your institution's output that this workflow would successfully deal with in practice?

Hi Les - with respect to the percentage of output the workflow would deal with, I would expect the following: metadata only (100% - library staff do this, and it is largely done now); publisher copy deposit allowed (80% of items in this category - an estimate based on the number of publishers not in Sherpa); pre/post-print deposit required (70% - an estimate based on current processes we are implementing to assist faculty with adding content). These are currently estimates, as we are still in process with the second and third categories in particular. They are based on a couple of faculty we are working with now, so are not yet hard numbers - I will post firm numbers as we do more of these.

This also does not cover the deposit of raw research data, which we are also starting to work on. I don't expect to have any good numbers on this for 6-12 months, but I suspect it would be in the 10-20% range for newer publications.
