Wednesday, September 24, 2014

DSpace: Harvesting an external collection using OAI-ORE

Would you like to create a collection in DSpace that automatically mirrors content from some other source? If that external source supports OAI, you're in luck. First, a quick primer: OAI harvesting has two modes, OAI-PMH (metadata only) and OAI-ORE (metadata plus the bitstreams / content files). If you only need metadata records, OAI-PMH is enough. If you also want to reference the bitstreams, or store them in DSpace, your data provider needs to support OAI-ORE. DSpace itself supports both OAI-PMH and OAI-ORE, so you can harvest a DSpace collection and get both metadata and files.

First: Create a new Collection in DSpace.

Second: Edit Collection - Content Source - OAI Provider
In DSpace, go to Edit Collection, then click the tab for Content Source.
Then choose the option "This collection harvests its content from an external source".
Once you save, you can enter the OAI provider base URL and the set ID. You also choose what to harvest: "Harvest Metadata Only", or, if the data source supports ORE, either metadata plus references to the files, or metadata plus the files themselves, which DSpace downloads and stores locally.
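
Before entering the provider details, it can help to check the base URL and set ID by hand. The sketch below builds the standard OAI-PMH request URLs. The base URL and set ID are placeholders, not values from a real provider, and the `oai_url` helper is only an illustration.

```shell
#!/bin/sh
# Hypothetical provider details; replace with the values you plan to enter in DSpace.
BASE_URL="http://demo.example.org/oai/request"
SET_ID="hdl_123456789_2"

# Build an OAI-PMH request URL.
#   $1 = OAI-PMH verb (Identify, ListSets, ListMetadataFormats, ListRecords)
#   $2 = optional extra query arguments
oai_url() {
  printf '%s?verb=%s%s\n' "$BASE_URL" "$1" "${2:+&$2}"
}

oai_url Identify              # does the base URL answer at all?
oai_url ListSets              # is the set ID listed?
oai_url ListMetadataFormats   # which metadata formats does the provider offer?
oai_url ListRecords "metadataPrefix=oai_dc&set=$SET_ID"

# To actually query the provider (needs network access):
#   curl -s "$(oai_url ListSets)"
```

Opening the printed URLs in a browser, or passing them to `curl -s`, should return XML if the provider is reachable. The ListMetadataFormats response is where an ORE-capable provider would be expected to advertise an ORE format.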

Once you save, you get the option to "Import Now".

Import Now starts the harvest immediately. Reset and Reimport deletes the previously harvested content and harvests it again.

You can also see all of the collections that have OAI harvesting enabled from the Control Panel.

Lastly, if you are harvesting from a source that regularly updates its content, and you want your copy to stay current, set up a cron job so that the DSpace command line harvests the collection each day.

peterdietz:dspace peterdietz$ /dspace/bin/dspace harvest --start
Starting harvest loop... running. 
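
A daily schedule could look roughly like the following. This is a sketch under assumptions: the 02:00 run time and the log path are made up for illustration, and the command is the same `harvest --start` invocation shown above. Since that command reports starting a harvest loop, it is worth checking how it behaves in your DSpace version before scheduling it to run every day.

```shell
#!/bin/sh
# Path to the DSpace launcher, taken from the session above.
DSPACE_BIN="/dspace/bin/dspace"
# Hypothetical log location; pick one that suits your server.
HARVEST_LOG="/tmp/dspace-harvest.log"

# Crontab fields: minute hour day-of-month month day-of-week command
# "0 2 * * *" means every day at 02:00 (an assumed schedule).
CRON_LINE="0 2 * * * $DSPACE_BIN harvest --start >> $HARVEST_LOG 2>&1"

# Print the entry so it can be reviewed before it is installed.
echo "$CRON_LINE"
```

To install it, add the printed line to the crontab of the user that runs DSpace, for example with `crontab -e`.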
