Friday, November 04, 2011

Enhancing DSpace Statistics - Collection Report

Currently, we're in the process of enhancing our statistics for our local DSpace instance.
The first step is to go through and make sure we have good data. This involves filtering out robots and other usage abuse, as well as ensuring we've captured information from the log activity.

The second step is reporting what we've got. Thus far, we're working in a few directions to add more reporting information. This post is the first in a series explaining some of our new reports.

Collection Statistics
The collection statistics page in DSpace 1.6+ (i.e., the Solr-based statistics) doesn't show you very much. At least, it doesn't show you much that you're interested in. It's almost irrelevant how many hits the collection page received; you are mostly interested in the usage of the content within the collection.


Thus far, we've added Top Bitstreams and Top Items.

Top Items for the past month shows total for the time period and daily hits.
Top Bitstreams for the past month shows total for the time period and daily hits.


We've also added the ability to download a CSV report of the bitstreams and items within the collection right from the statistics page. Offering the CSV lets users do what they want with data that we're not exposing through our web interface, and it lets us deliver more information in a spreadsheet than we could reasonably display in the browser.

Statistics Report of all items in the collection as CSV

Statistics Report of all bitstreams in the collection as a CSV
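
To give a feel for the CSV side, here is a minimal sketch of turning hit counts into CSV text. It is not our servlet code: the class name UsageCsv, the "name,hits" header, and the sample rows are all made up for illustration, and in the real report the counts would come from the Solr statistics core rather than a hand-built map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of DSpace): renders hit counts as CSV text. */
final class UsageCsv {

    /** Quotes a field if it contains a comma, quote, or line break. */
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r")) {
            // Embedded quotes are doubled, then the whole field is wrapped in quotes.
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    /** Builds one header row plus one row per entry, in the map's iteration order. */
    static String toCsv(Map<String, Long> hitsByName) {
        // Column names are an assumption; the real report may use different ones.
        StringBuilder sb = new StringBuilder("name,hits\n");
        for (Map.Entry<String, Long> e : hitsByName.entrySet()) {
            sb.append(escape(e.getKey())).append(',').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample data only; real counts would be queried from the statistics store.
        Map<String, Long> hits = new LinkedHashMap<>();
        hits.put("Thesis, vol. 1", 42L);
        hits.put("plain.pdf", 7L);
        System.out.print(toCsv(hits));
    }
}
```

The quoting matters because item titles and bitstream names routinely contain commas, which would otherwise shift columns when the file is opened in a spreadsheet.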


We don't have source code publicly available for how to do this, but in XMLUI we've just altered StatisticsTransformer.java to add the additional "views" of top items within a collection. For the CSV reports, we've added a servlet that responds at the URL "usage-report". An example would be dspace.example.com/usage-report?owningType=4&owningID=148&reportType=0, which generates a CSV report for the community with community_id 148, and it reports on bitstreams.

DSpace Types are:
0 = Bitstream, 1 = Bundle, 2 = Item, 3 = Collection, 4 = Community.

owningType is the type of the parent
owningID is the internal ID of the parent, once you've determined which type it is
reportType is the type of object to report on
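
As an illustration of how those three parameters fit together, here is a rough sketch that parses the query string from the example URL and maps the numeric codes back to the type names listed above. The class UsageReportRequest and its methods are hypothetical names for this post, not our servlet and not a DSpace API. A real servlet would read the values from the request object instead of splitting a string by hand.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not part of DSpace): holds the three report parameters. */
final class UsageReportRequest {

    // Type codes as listed in this post: the array index is the code.
    static final String[] TYPE_NAMES = {"Bitstream", "Bundle", "Item", "Collection", "Community"};

    final int owningType;
    final int owningId;
    final int reportType;

    UsageReportRequest(int owningType, int owningId, int reportType) {
        typeName(owningType);   // validates the code, throws if unknown
        typeName(reportType);
        this.owningType = owningType;
        this.owningId = owningId;
        this.reportType = reportType;
    }

    /** Returns the type name for a code, or throws if the code is not in the list. */
    static String typeName(int type) {
        if (type < 0 || type >= TYPE_NAMES.length) {
            throw new IllegalArgumentException("Unknown DSpace type: " + type);
        }
        return TYPE_NAMES[type];
    }

    /** Parses a query string such as "owningType=4&owningID=148&reportType=0". */
    static UsageReportRequest parse(String query) {
        Map<String, String> params = new HashMap<>();
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return new UsageReportRequest(
                intParam(params, "owningType"),
                intParam(params, "owningID"),
                intParam(params, "reportType"));
    }

    private static int intParam(Map<String, String> params, String name) {
        String value = params.get(name);
        if (value == null) {
            throw new IllegalArgumentException("Missing parameter: " + name);
        }
        return Integer.parseInt(value);
    }

    String describe() {
        return "Report on " + typeName(reportType) + "s within "
                + typeName(owningType) + " " + owningId;
    }

    public static void main(String[] args) {
        UsageReportRequest r = parse("owningType=4&owningID=148&reportType=0");
        System.out.println(r.describe()); // Report on Bitstreams within Community 148
    }
}
```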

Thus far, our only gripe about writing the reporting servlet is the strong coupling between dspace-statistics, Solr, and XMLUI, so we had to keep the servlet in the dspace-xmlui-api module as opposed to the preferred dspace-api.