Friday, May 22, 2015

Changing DSpace domain name

At Longsight, we've been hosting a DSpace site at saylor.longsight.com for over a year. The .longsight.com domain is an internal placeholder for spinning up new sites, since we control that domain space. When it comes time for a DSpace site to move to a proper domain name, such as library.saylor.org, there are a few steps.

1) Set up SSL
Get the SSL .crt and .key files, add them to your nginx server, and configure your /etc/nginx/conf.d/.conf

You can test that the server is listening for the proper hostname on ports 80 and 443 by adding a line like this to your local development computer's /etc/hosts:
IP.OF.WEB.SERVER library.saylor.org
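If you would rather not touch /etc/hosts, the same check can be sketched with curl's --resolve option, which pins a hostname to an IP address for a single request. This is only a rough sketch: the function name check_vhost is made up for illustration, and it assumes curl is installed on your machine.

```shell
#!/usr/bin/env bash
# Sketch: ask the web server for the new hostname on ports 80 and 443
# without editing /etc/hosts. check_vhost is a hypothetical helper name.
check_vhost() {
  local host="$1" ip="$2" scheme port
  for scheme in http https; do
    if [ "$scheme" = https ]; then port=443; else port=80; fi
    printf '%s (port %s): ' "$scheme" "$port"
    # --resolve maps host:port to the given IP for this request only.
    # -w prints the HTTP status code; the response body is discarded.
    curl -sS -o /dev/null --max-time 10 -w '%{http_code}\n' \
      --resolve "${host}:${port}:${ip}" "${scheme}://${host}/"
  done
}
```

You would call it as check_vhost library.saylor.org IP.OF.WEB.SERVER. Because the HTTPS request does not skip certificate verification, it should also fail if the new certificate does not cover the new hostname.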

Then ask the Saylor sysadmin to create a CNAME record pointing library.saylor.org to saylor.longsight.com.

2) Change all mentions of saylor.longsight.com to library.saylor.org in the config directory for this instance.
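A minimal sketch of that search-and-replace, assuming the config directory is passed in as an argument (the path depends on your install) and that plain grep and sed are available. The function name replace_hostname is hypothetical. Review the list of matching files before running anything like this on a live instance.

```shell
#!/usr/bin/env bash
# Sketch: replace the old hostname in every file under a config directory.
# replace_hostname is a hypothetical helper; the directory is whatever
# holds the config for your instance.
replace_hostname() {
  local config_dir="$1" old_host="$2" new_host="$3" old_pattern file
  # Escape the dots so sed matches the old hostname literally.
  old_pattern=$(printf '%s' "$old_host" | sed 's/\./\\./g')
  # List the files that mention the old hostname (skipping earlier backups),
  # then edit each one in place, keeping a .bak copy of the original.
  grep -rlF --exclude='*.bak' -- "$old_host" "$config_dir" |
    while IFS= read -r file; do
      sed -i.bak "s/${old_pattern}/${new_host}/g" "$file"
    done
}
```

For example: replace_hostname /path/to/your/config saylor.longsight.com library.saylor.org. The .bak copies make it easy to diff or roll back afterwards.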

3) Write a SQL query to change the site URL in all the handle metadata.


select * from metadatavalue where text_value like '%saylor.longsight.com%';
14000+ results


select * from metadatavalue where text_value like '%library.saylor.org%';
0 results

BEGIN;
update metadatavalue set text_value = replace(text_value, 'https://saylor.longsight.com', 'https://library.saylor.org') where text_value like '%https://saylor.longsight.com%';
COMMIT;

select * from metadatavalue where text_value like '%saylor.longsight.com%';
0 results


select * from metadatavalue where text_value like '%library.saylor.org%';
14000+ results

4) Reindex Discovery
Because you changed the metadata outside of DSpace, no Events were fired, so you'll have to manually force DSpace to rebuild its Discovery index.

bin/dspace index-discovery -b

5) Regenerate Sitemaps
Your sitemaps (for search engines) are now outdated and still link to your old domain. Re-run the sitemap generator so that search engines crawl your site with the updated URLs.

bin/dspace generate-sitemaps
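One way to confirm that the regenerated sitemaps no longer mention the old domain is to scan the generated files. The sketch below assumes you know which directory your install writes sitemaps to and that some of the files may be gzip-compressed. The function name find_stale_sitemaps is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print every sitemap file in a directory that still mentions the
# old hostname. find_stale_sitemaps is a hypothetical helper, and the
# directory is an assumption; pass the one your install uses.
find_stale_sitemaps() {
  local sitemap_dir="$1" old_host="$2" file
  for file in "$sitemap_dir"/*; do
    [ -f "$file" ] || continue
    # Decompress .gz files on the fly; read everything else as plain text.
    case "$file" in
      *.gz) gzip -cd -- "$file" ;;
      *)    cat -- "$file" ;;
    esac | grep -qF -- "$old_host" && printf '%s\n' "$file"
  done
  return 0
}
```

No output means none of the files in that directory reference the old hostname.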


6) Measure success of Google picking up the redirect
Search: site:saylor.longsight.com, there are 65,900 results
https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=site%3Asaylor.longsight.com

Search: site:library.saylor.org, there is 1 result.
https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=site%3Alibrary.saylor.org&qscrl=1

It will take several weeks for the search engines to recrawl the site and update their search index.

I add the site to Google Analytics and Google Webmaster Tools, and re-upload the sitemap, to ensure that robots pick this up ASAP.