Friday, November 21, 2014

How to replace all tab and newline characters in a Google Docs Spreadsheet

I've been handed a spreadsheet that is riddled with tab characters and newlines. It's so bad that the reader of the file can't process it. So, if I could just remove all the tab and newline characters from the spreadsheet, I'd be golden.

My first thought was: how do I paste in a tab character or a newline character? What's the five-key sequence to do that? Well, Google Sheets has a much easier way: regular expressions (regex).

"\t" is regex for tab, and "\n" is regex for newline. In Find and replace:

Find: \n to find newlines. Replace with a space, and ensure regular expressions is checked.

Find: \t to find tabs. Replace with a space, and ensure regular expressions is checked.
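
If the data lives outside Google Sheets (say, in an exported file), the same two patterns work in most regex engines. Here is a minimal Python sketch of the idea; the function name clean_cell and the sample text are made up for illustration:

```python
import re

def clean_cell(text: str) -> str:
    """Replace every tab or newline character with a single space."""
    # [\t\n] matches one tab or one newline, the same two patterns
    # used in the Find and replace dialog above.
    return re.sub(r"[\t\n]", " ", text)

if __name__ == "__main__":
    cell = "first line\nsecond\tcolumn"
    print(clean_cell(cell))  # prints: first line second column
```

Note that this replaces each character one-for-one, so two newlines in a row become two spaces, just as the spreadsheet replace does.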

I found a full list of regular expression characters at:

Thursday, October 09, 2014

Play Framework: IntelliJ IDEA cannot find declaration to go to

I'm doing a Play! Framework project, and all of a sudden, IntelliJ IDEA forgets about everything, doesn't provide any autocomplete / intellisense, doesn't check syntax, doesn't check my imports, nada, kinda just a text-editor like Sublime at that point.

What I'm using:
- play! framework 2.3.5
- IntelliJ IDEA 13.5.1
- SBT 0.13.5
- Mix of Java and Scala
- Activator UI
- OS X Mavericks

Basically my issue/symptom is: IntelliJ cannot find declaration to go to in Play Framework, and provides no autocomplete / syntax check support.

Stack Overflow had me Invalidate Caches and Restart. Ehh, that wasn't enough.

Solution: Specify JDK for Scala to JDK 1.7

I'm on OSX, so Linux/Windows users will have to ad-lib.
IntelliJ -> Preferences -> IDE Settings -> Scala -> JVM SDK
Mine was oddly set to , thus nothing worked, so I flipped it to JDK 1.7, and then re-ran Invalidate Caches and Restart. A few minutes later, and I'm back in business.

Friday, October 03, 2014

Using PDFBox to create a PDF and PdfLayoutManager

I'm looking for documentation for creating a PDF with Apache PDFBox, and I'm hitting some limits. Either I'm not figuring out how to use this tool, or it doesn't have APIs for drawing what should be basic things.

So, there's a project from Glen Peterson to add PdfLayoutManager, which should be contributed upstream to PDFBox. Anyways, I was testing out his additions to the project, and here's the PDF it generates (I've removed the image it adds to the PDF, since I didn't want to include the resource bundle):

Here is a series of screenshots of the output of this. It can wrap text, make tables, draw shapes, insert an image, and specify colors.

Wednesday, September 24, 2014

DSpace: Harvesting an external collection using OAI-ORE

Would you like to create a collection in DSpace that automatically mirrors content from some other source? Well, if that external source supports OAI, you're in luck. First a quick primer: OAI has two modes, OAI-PMH (metadata only) and OAI-ORE (also gets the bitstreams / content files). If you only need metadata-only records, you'll be fine with OAI-PMH. If you also want to reference the bitstreams, or store them in DSpace, then hopefully your data provider supports OAI-ORE. DSpace, by the way, supports both OAI-PMH and OAI-ORE, so you can harvest a DSpace collection and get both metadata and files.

First: Create a new Collection in DSpace.

Second: Edit Collection - Content Source - OAI Provider
In DSpace, go to Edit Collection, then click the tab for Content Source.
Then choose the option that "This collection harvests its content from an external source".
Once you save, you can enter the OAI provider base URL and the set ID. There is also an option to choose "Harvest Metadata Only" or, if the data source supports ORE, to either keep a reference to the files or have DSpace download the files and store them in DSpace.
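
To get a feel for what the base URL and set ID turn into, here is a rough Python sketch of the kind of OAI-PMH ListRecords request a harvester sends. The verb, metadataPrefix, and set parameters come from the OAI-PMH protocol; the base URL and set ID below are made-up placeholders, and this is not necessarily how DSpace builds its requests internally:

```python
from urllib.parse import urlencode

def list_records_url(base_url: str, set_id: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL for one set."""
    query = urlencode({
        "verb": "ListRecords",              # standard OAI-PMH verb
        "metadataPrefix": metadata_prefix,  # oai_dc is the format every provider must offer
        "set": set_id,                      # the set ID entered in the Content Source tab
    })
    return f"{base_url}?{query}"

if __name__ == "__main__":
    # Hypothetical provider and set ID, for illustration only.
    print(list_records_url("https://repository.example.org/oai/request", "col_123456789_1"))
```

Pasting a URL like this into a browser is a quick way to check that the provider and set ID are right before you save them in DSpace.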

Once you Save, then you get the option to "Import Now".

Import Now will import this right now. Reset and Reimport will delete the previously harvested contents, and reimport.

You can also see all of the collections that have OAI Harvesting enabled from your Control Panel:

Lastly, if you find yourself harvesting from a source that regularly updates its contents, and you want to keep harvesting that content, then set up a cron task to have the DSpace command line harvest the collection each day.

peterdietz:dspace peterdietz$ /dspace/bin/dspace harvest --start
Starting harvest loop... running. 

Thursday, September 11, 2014

DSpace Additions: Author page and Altmetric statistics badge

It's a mixture of big things and little things that can add additional value to your DSpace site. Two interesting additions that I've recently stumbled upon are: Researcher Pages, and Altmetric statistics badge.

Researcher Pages

A project between @Mire and The World Bank's Open Knowledge Repository is adding author pages to DSpace. Thus far, it appears to show the author's name, a photo of the author, their biography, and a list of the items in DSpace that they authored.
This is in use at:

Altmetrics Statistics Badge

For articles that have a DOI, you can integrate with the Altmetric statistics service to display a badge showing alternative measures of that article's usage. Altmetrics are things like people citing the paper, mentioning it on a social network or blog, or adding it to their Mendeley library. I've seen this integrated into DSpace by Longsight's Sam Ottenhoff for the Marine Biological Laboratory / Woods Hole Oceanographic Institution Open Access Server.

See DSpace and Altmetric in use at:

Monday, August 25, 2014

DSpace OAI profiles

By default, DSpace will share all of your publicly accessible Items through OAI-PMH. If you want to restrict or modify the set of results that get shared, you have to customize the output. Luckily, recent versions of DSpace have an easily modifiable configuration that essentially gives you "profiles" in OAI.

The default profile is called "request". It doesn't filter the results, and it allows harvesting in many different metadata formats. Note: only publicly accessible items/objects can be disseminated through OAI.

The other profiles in DSpace are OpenAIRE (Open Access Infrastructure for Research in Europe) and DRIVER (Digital Repository Infrastructure Vision for European Research). By default your repository won't disseminate any objects in OpenAIRE or DRIVER format because the filters in place require some specific metadata to be collected for those profiles/guidelines.

The DRIVER profile declares a number of filters, which restrict the items disseminated under that profile to those that match the requirements of DRIVER. In this case the filters require: that there is a title (dc.title), that there is an author, that the document type (dc.type) is one of article, thesis, book, etc., that dc.rights is equal to "open access", and lastly that there is a publicly accessible bitstream, which hopefully means that the full text is available.
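
To make the combined effect of those filters concrete, here is a small Python sketch of the logic. It is only an illustration, not the XOAI implementation: the dictionary keys, the function name, and the short list of allowed types are all assumptions made up for this example.

```python
# Illustrative subset only; the real DRIVER list of document types is longer.
ALLOWED_TYPES = {"article", "thesis", "book"}

def passes_driver_filters(item: dict) -> bool:
    """Return True only if the item satisfies every DRIVER-style filter."""
    return (
        bool(item.get("title"))                         # has a title
        and bool(item.get("author"))                    # has an author
        and item.get("type") in ALLOWED_TYPES           # accepted document type
        and item.get("rights") == "open access"         # rights statement matches
        and bool(item.get("has_public_bitstream"))      # a public file exists
    )

if __name__ == "__main__":
    item = {
        "title": "A Sample Thesis",
        "author": "Doe, Jane",
        "type": "thesis",
        "rights": "open access",
        "has_public_bitstream": True,
    }
    print(passes_driver_filters(item))  # prints: True
```

Because every condition must hold, an item missing just one of these fields drops out of the profile, which is why a stock repository typically shows nothing under DRIVER.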

So, in case you wanted to customize your default "request" profile to restrict the output to all items in the repository that also had full-text available, you would customize:
 <context baseurl="request">  
 To add:  
 <filter refid="bitstreamaccessFilter"/>  

In addition to this information about DSpace OAI profiles, I did run into some bugs or potential issues in the DSpace XOAI code base. For one, there are two modes to run DSpace XOAI in: database mode, where the database responds to all OAI queries, and a performance-optimized mode, where Solr indexes your repository. One of the bugs was that the Solr mode had a slightly different interpretation of "bitstreamaccessFilter": the database mode required that there was a bitstream in the original bundle, while the Solr mode only required that the item was public. To correct this I've patched our code at Longsight, and have contacted the XOAI author to confirm and test the issue.

Monday, July 21, 2014

Improving DSpace Presentation: Video Player and Document Viewer / BookReader

One of the fun goals for DSpace that we have at Longsight, is to make using DSpace a great experience. We've got some more ground to cover, but today we have a BookReader and a Video Player to demonstrate.

The BookReader for DSpace uses the Internet Archive BookReader player to present scanned pages of a book in a format that looks like a book. It puts the image of the left page on the left side and the right page on the right side, and when you turn the page, both pages change. It's a simple idea, and when executed properly, it makes the content look much nicer.

An example of this BookReader can be found at:

The Video Player for DSpace uses Flash, and plays the video right in your browser. It is not a true streaming solution; rather, it makes use of progressive download, so you can play what has been downloaded, but you won't be able to skip ahead in the video beyond what has been downloaded.

You can view an example of the video player at:

Longsight provides Hosting, Support, and custom development solutions for DSpace, the Digital Repository / Digital Asset Management system for libraries and institutions.

Wednesday, May 28, 2014

DSpace Development at Longsight

I've been a DSpace Developer at Ohio State University for about 5 years, and recently I changed jobs to work at Longsight, a Registered Service Provider for DSpace. They also support a few other stacks, such as Sakai LMS, and LAMP applications such as Drupal and WordPress. Basically it revolves around solving problems for higher ed using open source software. They provide hosting, training, consultation, and custom development. So far I've been on a few consultation-setup-hosting-training-development adventures in my time with Longsight, and it's fun.

Peter Dietz


One thing I really enjoy at Longsight is that there are always problems to solve, and it becomes my job to come up with a creative way to solve them. Also, I feel like I score bonus points when the solution that works for a client can also be contributed upstream into the next release of DSpace, to benefit everyone. Additionally, I get to meet with clients, sometimes over the phone, other times in person, and I also meet people at conferences. This year, I'll be in Helsinki, Finland for Open Repositories 2014, so come say hi. I'll be presenting on the REST API for DSpace. I'm also taking a bit of a traveling/working holiday across parts of Scandinavia; you can be "at work" anywhere with internet.

Anyways, I have a number of development projects that I've been working on.
  • Better Request-Restricted-Item workflow, adding a helpdesk workflow with buttons to contact the requester and the author of the work
  • Mime-Type-Icons for content without thumbnails
  • Document Viewer
  • Displaying Thumbnails of Restricted Content, instead of showing a broken image
  • Statistics converter from Solr to Elasticsearch
  • Customizing and extending the Mirage2 XMLUI theme, and making custom-branded derivative themes to match each client's design palette

Not to sound like a sales pitch, but if you need DSpace hosting or DSpace custom development, keep Longsight in mind. We do good work, have fair prices, and we love contributing our work back upstream to improve the DSpace community. Plus, it will give me some fun work to do.