About a month and a half ago I migrated this blog from an Azure-hosted WordPress setup to GitHub Pages, which runs on Jekyll. While I wouldn’t describe the migration process as painful, I wish I would have had a guide such as this one when I began the transition as it would have certainly saved me a few headaches along the way. Hopefully this will be able to help someone in the same place that I was!
Before getting into the migration process, I will say that migrating is totally worth it. The process described herein may scare you a bit, but I promise you that once you’re done you’ll be very happy that you did it. Just to hammer the point home, here are a few things that I really love about GitHub pages now that I’m using them.
- You’re the master of your own content. Your blog becomes a GitHub repository that you can clone, fork and do what you want with. Everything is at your control. This isn’t necessarily a very good system for non-programmers, but as a software developer I consider having this fine-grained control to be a great feature.
- There is no black magic behind the scenes. Posts are written in Markdown with a bit of YAML header information, thrown into a folder, and that’s it. This system is both easy to understand, and easy to work with.
- GitHub handles the hosting! Hosting was easy on Azure, don’t get me wrong, but GitHub completely removes hosting from the equation. The only thing that you have to worry about when on GitHub pages is writing.
- It’s free!
Starting the migration
Alright, let’s get down to business. If you’ve read up a bit on Jekyll you’ll probably already know that there is a Jekyll “importer” for WordPress which will migrate your blog for you. Although it’s not perfect, running the importer is an essential first step. The rest of the work will essentially consist of doing cleanup, fixing things here and there that the importer missed.
I won’t go into installing and running the importer in this post. You can find that information here. Follow the instructions provided, and before you know it you’ll have in your hands a rough translation of your WordPress blog in Jekyll.
The HTML problem
Now that you’ve got your blog in Jekyll format, it’s time to start cleaning it up a bit. One of the biggest headaches when migrating is converting old HTML posts into Markdown. Many tools exist which automate this. The one that I used was to-markdown. Pandoc, as suggested by Phil Haack is apparently decent as well.
to-markdown did a pretty decent job of converting the majority of my content, but a few HTML tags did slip through the cracks. In addition, the old
[sourcecode] blocks that I used for syntax highlighting while under WordPress were also, understandably, still present after the transformation.
To try and automate the residual cleanup as much as possible I ended up writing a sed script.
To use the script, create a folder in your Jekyll blog root called
_migration. Create a new text file called
sedscript.txt and paste the following code into it.
Now in the console navigate to your
_posts folder and run the following command.
Note that if you’re on Windows you can run this via the Git Bash window and all should be well.
Once run, about 95% of the rubbish that was left behind after the Markdown conversion should be cleaned up. You’ll still want to run through your posts by hand just to make sure that everything looks good at this point, but by now you should be very close to having fully Markdownified posts.
Once your posts’ content has been cleaned up, the next step is to attack their metadata. WordPress stores a lot of metadata about your blog posts. The Jekyll importer when run according to its instructions should retrieve this metadata and insert it into each post as YAML header information. Right after running the importer, your post headers will look something like the following.
You essentially want to transform them into this.
These are the changes that I made when transforming my posts’ metadata:
- I stripped out all unnecessary tags. This isn’t strictly necessary, but it’s nice to get rid of all that extra information that you just don’t need.
- I renamed the
category, and assigned only one category to each post. Alternatively, I apparently could have kept
categoriesand passed the list of categories as an array. The important thing here is that the hyphenated list as returned by the importer is not valid.
- I bundled the
tagselements into an array. Same problem here with the hyphenated list.
- I changed the
commentsvalue from an array of the comments left on the post to
true. This is because the Jekyll theme that I use for my blog allows for Disqus comments which can be enabled or disabled on any given post by setting this value. (Disqus, in case you aren’t aware, is a third-party service which essentially handles the comment section of your blog on your behalf.)
- I added a
shareelement and set its value to
true. This is also specific to my theme, so like with
comments, this may not apply to you.
- I added the
redirect_fromelement, which states which URLs should redirect to this post. This is essential as I need people who favourited urls on my old blog to be able to find the corresponding posts now that they have moved. This redirection functionality is not handled by Jekyll by default, but instead by a Jekyll plugin called Redirect From. Very simple instructions on how to enable Redirect From on your GitHub pages site can be found here.
Testing it out
Once you think that you’ve got everything worked out, you’ll want to test out your blog locally before pushing it to GitHub. To get Jekyll running on Windows, you’ll need to follow the instructions outlined here. Note that if you doing this on Windows you may need to execute
chcp 65001 in the command prompt before executing Jekyll to solve a character encoding issue. (Thanks to José F. Romaniello for the fix.).
Fix up any remaining errors, and once you’ve got a local copy running you can push to GitHub! Configure your custom domain if you have one using the instructions found here, and you’re good to go. Good luck!