May 11, 2018

Moving Away From WordPress To Serverless

I had been using WordPress for well over a decade. It is a great product with a lot of capabilities, but it is also heavy and complex if all you really need is a simple blog. I generally write very technical content, which makes Markdown preferable to a WYSIWYG editor or even raw HTML. I had also still been hosting my blog and my domain names on GoDaddy, from back when that was the sensible choice. Having had a great deal of experience with AWS over the past several years, I decided to look for a solution that would be simple and built entirely on AWS.

It's not surprising to me that AWS does not have a direct clone of managed WordPress blogs, nor should they. I've said it many times: AWS is not for the novice. They created Lightsail (which I had never used) as an attempt to appease the novice. It offers an instance image with some pre-configured options. I chose the WordPress option and clicked the "Next" button. It really is pretty simple; it is probably more complicated than what you would see in GoDaddy, but it's a big step forward. They use a Bitnami image, and it comes with directions on how to manage your account.

Now, I had no intention of actually using WordPress or Lightsail, but I did want to take a look. Instead, I wanted to simply use S3 and somehow use Markdown. I had a bit of experience with Jekyll in the past and was positive it would satisfy my needs. I had at one time looked into a service called SiteLeaf, which was nice, but for what you are paying it seemed like I could figure out another solution. GitHub is great and all, but I wanted to use AWS end to end, and CodeCommit charges next to nothing.

What I wanted was a simple way to write Markdown and take advantage of any Markdown plugins. Because this content is generated and served statically, the only processing needed happens when creating or updating content. I wanted a development environment where I could write drafts as well as make changes to the site as needed. Ideally, I wanted to be able to go to any computer, log in somewhere, and add an article on a whim, without needing an IDE and an entire setup. This was much less important, but something I did think about.

I started out using AWS CodePipeline, which would listen for commits on the respective master and staging branches. It is possible I am mistaken, but I didn't find a way to make a pipeline conditional on the branch so that it would only rebuild and deploy the branch that changed. In fact, I had to make two pipelines, one triggered by each branch. I decided that whether or not you can do this with CodePipeline, if I just couldn't figure it out, that is an inherent flaw!

Let me go into what I was doing in the build process to give you some clarity. There is only a very slight difference between the development and production environments, and that really didn't need more than an environment variable to be flipped. I was hoping I could override an environment variable in the pipeline prior to executing the build, but alas, I could not find the means.

I knew that I was going to use a Docker image for the build process. I also quickly realized that I would need the awscli package in that image. Technically I could just produce the output and have CodeBuild stick the artifact somewhere, but I wanted to use CodeBuild to both "build" and "deploy". This is a very simple blog that didn't require the several stages a more sophisticated system would. Luckily, Jekyll maintains Docker images that are kept up to date. Here is my Dockerfile below:

Dockerfile

# Start from the maintained Jekyll image so the site builds as-is
FROM jekyll/jekyll:3.8.0

# Add Python 3 and the build tooling needed to install the AWS CLI
RUN apk --no-cache update && \
    apk --no-cache add gcc g++ make build-base python3 py3-pip ca-certificates curl groff less git py-yaml && \
    pip3 --no-cache-dir install --upgrade pip && \
    rm -rf /var/cache/apk/*

# The AWS CLI lets the build push directly to S3
RUN pip3 --no-cache-dir install awscli

# Pre-install the site's gems so a build only has to pick up changes
ADD Gemfile /root/Gemfile
ADD Gemfile.lock /root/Gemfile.lock

RUN cd /root && \
    /usr/gem/bin/bundle install

CMD sh

All I really did here was add Python support and the aws tool. I also added the Gemfile and Gemfile.lock; if you don't know Ruby, those list the dependencies. You may ask why I would install all of the dependencies now, wouldn't I want to install them at the time of the build? Yes, of course. This installs them ahead of time, and then at build time only changed or new packages get installed. This reduced the build time immensely and was a perfect enhancement.

Initially, my buildspec.yml had two simple steps: I ran the jekyll build command, and then I used aws s3 cp to upload the generated files to S3. I had a simple environment variable BUCKET with a default of dovidkopel.com. The idea was that when running a development build, the BUCKET environment variable would be overridden to staging.dovidkopel.com. Unfortunately, I realized right away that being able to have pretty URLs with no extension, while preserving any URLs that have existed in the past, would be difficult.

Here is my buildspec.yml:

buildspec.yml

version: 0.2

phases:
  install:
    commands:
      # Build the site (with drafts when targeting staging)
      - python3 jekyll-build.py
      # Collapse the archive pages' index.html files into flat .html files
      - python3 pre-upload.py
      # Upload the generated site to the target bucket
      - aws s3 cp _site s3://$BUCKET/ --acl public-read --recursive
      # Strip the .html extensions on S3 while keeping the text/html content type
      - python3 rename-html.py

I wrote up a small Python script that was used for the build to give me a little more flexibility.

jekyll-build.py

import os

# The target bucket is supplied by CodeBuild as an environment variable
bucket = os.environ['BUCKET']
base_cmd = '/usr/local/bundle/bin/jekyll {}'

# Include drafts when building for the staging site
if bucket == 'staging.dovidkopel.com':
    os.system(base_cmd.format('build --drafts'))
else:
    os.system(base_cmd.format('build'))

The way I was going to serve the static content was to upload the data with an ACL of public-read and then enable website hosting for that bucket. That gets you most of the way, you might have thought. Initially I thought the only thing I would be missing was SSL, since if I just used the S3 URL and made my domain record a CNAME pointing to it, it would work. I quickly realized that was a small problem in comparison to the index issue. I knew I was going to use CloudFront, more than anything, as an easy way to add SSL to my blog. CloudFront supports a "default root object"; in other words, if you go to dovidkopel.com it will attempt to fetch dovidkopel.com/index.html. The problem is that this is only supported at the root level. So a link like this: https://dovidkopel.com/2016/11/mankind-is-not-a-simple-feature-vector, which is a really great article I wrote a few years back, would not function properly. I googled around a bit and found that I am not the first person to hit this "annoyance" of CloudFront. There were some solutions proposed, but I came up with my own.
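Before getting to that solution, for completeness, here is a minimal sketch of the bucket hosting piece using boto3; the bucket name and the 404.html error document are assumptions for illustration, not details pulled from my actual setup.

import boto3

s3 = boto3.client('s3')

# Assumed bucket name, matching the BUCKET default used in the build
bucket = 'dovidkopel.com'

# Turn on static website hosting for the bucket; the error document is an assumption
s3.put_bucket_website(
    Bucket=bucket,
    WebsiteConfiguration={
        'IndexDocument': {'Suffix': 'index.html'},
        'ErrorDocument': {'Key': '404.html'}
    }
)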

In the case of the URL listed earlier, WordPress handled it with an HTTP rewrite rule. Neither S3 (without extreme measures) nor CloudFront easily supports this. However, if I uploaded the mankind-is-not-a-simple-feature-vector.html file into the 2016/11 directory and then renamed the file to drop the extension, it worked! I tried a few different variations, such as uploading the file without an extension and specifying the content type. The issue with S3 was that I could not change the content type of the file once it was uploaded. Therefore, I uploaded all of the files first and then renamed them as needed.

I wanted some additional features that were pretty-URL related. I needed to support pagination of the main page: https://dovidkopel.com/archives/page/2. I also wanted to support general archives by tag and chronology, with indices. I am using the jekyll-archives plugin. Here is part of my configuration:

jekyll-archives:
  enabled: all
  layout: 'archive'
  layouts:
    year: year-archive
    month: month-archive
    day: day-archive
    tag: tag-archive
    category: category-archive
  permalinks:
    title: '/:year/:month/:title/'
    year: '/archives/date/:year/'
    month: '/archives/date/:year-:month/'
    day: '/archives/date/:year-:month-:day/'
    tag: '/archives/tag/:name/'
    category: '/archives/category/:name/'

Now I wrote a little Python script to handle some local renaming after generating the files but before the upload. This made the renaming process on S3 much simpler.

pre-upload.py

import glob
import os

def rename(f):
	# f is a path like _site/archives/page/2/index.html
	old = f.replace('/index.html', '')
	# Flatten it to a sibling file, e.g. _site/archives/page/2.html
	ff = old + '.html'
	print('{} --> {}'.format(f, ff))
	os.rename(f, ff)
	# Remove the now-empty directory
	os.rmdir(old)

for f in glob.glob('_site/archives/page/*/index.html', recursive=True):
	rename(f)

for f in glob.glob('_site/archives/date/*/index.html', recursive=True):
	rename(f)

for f in glob.glob('_site/archives/tag/*/index.html', recursive=True):
	rename(f)

Lastly, here is the rename script that I use for getting rid of the .html extensions while preserving the content type.

rename-html.py

import glob
import os

bucket = os.environ['BUCKET']

def handle_file(f):
	# Strip the local _site/ prefix to get the key as it exists in the bucket
	f = f.replace('_site/', '')
	# The destination key is the same path without the .html extension
	ff = f.replace('.html', '')

	# Move the object in place, keeping it public and forcing the text/html content type
	os.system('aws s3 mv s3://{}/{} s3://{}/{} --acl public-read --content-type "text/html"'.format(bucket, f, bucket, ff))

for f in glob.glob('_site/**/*.html', recursive=True):
	handle_file(f)

The one piece I haven't gotten to yet is how I got rid of CodePipeline. I created a very simple Lambda function. This Lambda had a trigger set up from CodeCommit on the updateReference event for the master and staging branches. The Lambda needs to have permission to invoke the build, of course.
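I haven't published that function here, but a minimal sketch of the idea looks roughly like the following; the project name, the branch-to-bucket mapping, and the exact event parsing are assumptions for illustration rather than my exact code. Overriding BUCKET per build is also what flips a build between production and staging.

import boto3

codebuild = boto3.client('codebuild')

# Hypothetical mapping of branches to the bucket each build should target
BRANCHES = {
    'refs/heads/master': 'dovidkopel.com',
    'refs/heads/staging': 'staging.dovidkopel.com'
}

# Hypothetical CodeBuild project name
PROJECT = 'blog-build'

def handler(event, context):
    # A CodeCommit trigger delivers the updated references in the event records
    for record in event.get('Records', []):
        for ref in record['codecommit']['references']:
            bucket = BRANCHES.get(ref['ref'])
            if bucket is None:
                continue

            # Start the build at the pushed commit, overriding BUCKET for this run
            codebuild.start_build(
                projectName=PROJECT,
                sourceVersion=ref['commit'],
                environmentVariablesOverride=[
                    {'name': 'BUCKET', 'value': bucket, 'type': 'PLAINTEXT'}
                ]
            )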

As for being able to write at any time, I had been playing around with Cloud9. AWS bought Cloud9 recently, and I initially really hated the idea. I think it still needs a lot of work, but it helps strike a balance between unifying the development environment and simplifying collaboration. It is linked to an EC2 instance, so it is capable of invoking commands as needed. It was sluggish (which could have been the tiny instance) and had no built-in Git support. I even figured out how to share the workspace when authenticating via a SAML IdP.

For the time being, I have been using StackEdit for jotting down Markdown, or I jump on Cloud9.

All of the source code referenced here is available on GitHub via this link.