Making a simple Static Site Generator

Static Site Generators can be very simple. You take your markdown files, you convert them to HTML and you stick them in a template. Sure, there are some extras that are nice to have, and some of them can also be easily scripted. In this article I will explain how I generate this site using a Makefile and a shell script.

TL;DR

I show you how I build this site with a Makefile that calls a shell script, which converts markdown files to HTML with some additional processing.

Converting Markdown to HTML

If you're feeling adventurous, implement it from scratch. That's not what I did here; I felt this was a wheel not worth reinventing. There are several tools you can use for this. I chose cmark-gfm (GitHub's fork of cmark).

The syntax is as simple as:

cmark-gfm file.md > file.html

It can also read from stdin in case you need to do some preprocessing:

my-preprocessing-function | cmark-gfm > file.html
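As a sketch of what such a preprocessing step could look like, here is a made-up filter that drops lines starting with "%%" before the markdown reaches the converter. Both the drop_notes name and the "%%" marker are assumptions for illustration, not something this site uses. The cmark-gfm call is guarded so the snippet still runs where the tool isn't installed.

```shell
#!/bin/sh
# Hypothetical preprocessing filter: drop lines that start with "%%".
# The "%%" marker is invented for this example.
drop_notes() {
  grep -v '^%%' || true   # grep exits 1 when no lines remain; not an error here
}

# Feed the filtered markdown to the converter through stdin.
if command -v cmark-gfm >/dev/null 2>&1; then
  printf '%s\n' '# Title' '%% private note' 'Body text.' | drop_notes | cmark-gfm
fi
```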

Putting it in a Makefile

Makefiles are great since they only rebuild files if their source was updated. Check out the great tutorial in the references¹ to learn more. Let's start putting one together with what we've learned so far. What we want is to take all the markdown files in the current directory and convert them to HTML files with the same name and different extension. We start by listing the files that we will be working with:

MD   := $(wildcard *.md)
HTML := $(MD:%.md=%.html)

Here we're first using a wildcard to match all .md files in the current directory and put them in the $(MD) variable. Then we use a Substitution Reference² to replace all the .md extensions in $(MD) with .html, and save it in the $(HTML) variable, which now contains all our targets.
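If the substitution reference looks cryptic, it may help to see the same mapping written in shell. This is only an analogue to show what make computes, and html_name is a name I made up for it:

```shell
#!/bin/sh
# Shell analogue of $(MD:%.md=%.html): swap the extension on each name.
html_name() {
  case "$1" in
    *.md) printf '%s\n' "${1%.md}.html" ;;
    *)    printf '%s\n' "$1" ;;   # like make, leave non-matching names alone
  esac
}

for md in index.md about.md; do   # stand-ins for what $(wildcard *.md) finds
  html_name "$md"
done
```

The loop prints index.html and about.html, which is what $(HTML) would hold for those two sources.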

Now we need a rule to build our targets:

all: $(HTML)

%.html: %.md
	cmark-gfm $< > $@

Here we're saying that the .html files should be built from the corresponding .md files using cmark-gfm. The special variables $< and $@ mean the name of the first prerequisite (.md file) and the name of the target (.html file), respectively.³

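The "all" target (or whatever you decide to name it) must depend on the files in the $(HTML) variable. It also has to be the first rule in the Makefile, since running make with no arguments builds the first target it finds.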
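To make the "only rebuild what changed" behaviour concrete, here is roughly the timestamp check make performs for each target, sketched in shell. The needs_rebuild name is mine, and the -nt test is not in older POSIX but is supported by the common shells (dash, bash, busybox sh).

```shell
#!/bin/sh
# Rough shell version of make's staleness check for a single target.
needs_rebuild() {
  src=$1
  target=$2
  # Rebuild when the target is missing or older than its source.
  [ ! -e "$target" ] || [ "$src" -nt "$target" ]
}

dir=$(mktemp -d)
touch -t 202401010000 "$dir/page.md"     # source last edited in January
touch -t 202402010000 "$dir/page.html"   # target built in February

if needs_rebuild "$dir/page.md" "$dir/page.html"; then
  echo "rebuild"
else
  echo "up to date"
fi
rm -rf "$dir"
```

Here the target is newer than its source, so the check reports "up to date" and make would skip the recipe.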

Preprocessing

That's cool and all, but our HTML files are still lacking some important stuff, like hmm... I don't know, a <html> tag maybe? I suppose you could do something as ugly as echoing/cating some template HTML to prepend/append to the generated one, with all your headers and stuff, but let's do something slightly less ugly. Oh wait, never mind, that's almost exactly what I did. Let's put it in a shell script, we'll get back to the Makefile soon. Here is the main thing (we'll get to the functions later):

cat <<-EOF
<!DOCTYPE html>
<html lang="en">
<head>
	<meta charset="utf-8">
	<link rel="stylesheet" type="text/css" href="/css/style.css"/>
	<title>~jlucas/$(gettitle "$input")</title>
</head>
<body>
	<header>
		$(cat "${srcdir}/templates/header.html")
	</header>
<main>
	$(preprocess "$input" | cmark-gfm --unsafe -e strikethrough -e table -e footnotes)
</main>
<footer>
	$(cat "${srcdir}/templates/footer.html")
</footer>
</body>
EOF

Yes, a lot of it is hardcoded. Yes, I could've used a real template engine. But this is specific to this site and I don't plan on changing this structure for specific files, so who cares?

With that out of the way, let's break it down. In the middle of this hardcoded mess, you can find some $(command substitutions). The shell runs each one and splices its output into the heredoc. They are used to insert the page title in the <head>, the navigation bar in the <header>, the converted HTML in <main> and the footer in the <footer>.
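The substitutions only work because the heredoc delimiter is unquoted. A small sketch of the difference, with page_title as a made-up stand-in for gettitle:

```shell
#!/bin/sh
# page_title is a placeholder for gettitle, invented for this sketch.
page_title() { printf '%s\n' 'Hello'; }

# Unquoted delimiter: $(...) is executed and its output inserted.
expanded() {
cat <<EOF
<title>~jlucas/$(page_title)</title>
EOF
}

# Quoted delimiter: the body is copied literally, nothing is expanded.
literal() {
cat <<'EOF'
<title>~jlucas/$(page_title)</title>
EOF
}

expanded
literal
```

The first call prints the title with "Hello" filled in, the second prints the $(page_title) text untouched. The "-" in "<<-EOF" is a separate feature: it strips leading tabs from each line, which is what lets the template be indented inside the script.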

The header and footer are pretty simple. I just do a little catception to include them from static manually written files.

As for the title, I'm taking it from the heading on the first line of the markdown file. Here's the function:

gettitle() {
  sed -n '1s/^#* *//p' "$1"
}

The sed expression prints only the first line, after stripping any leading '#' symbols and the spaces that follow them.
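A quick way to convince yourself it works is to run it against a throwaway file. The file contents below are just sample input:

```shell
#!/bin/sh
gettitle() {
  sed -n '1s/^#* *//p' "$1"
}

tmp=$(mktemp)
printf '%s\n' '# Making a simple Static Site Generator' '' 'Body text.' > "$tmp"
gettitle "$tmp"   # prints: Making a simple Static Site Generator
rm -f "$tmp"
```

Only line 1 is ever considered, so a file whose heading is not on the first line would get whatever that first line contains as its title.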

The final piece we're missing here is the preprocess function. This will depend on the use case and might not always be needed. All I use it for at the time of writing is to auto generate a list with all posts. For that I make it compare each line of the input file with the placeholder "@POSTS@" and, when a line is exactly that, replace it with the actual list. Here it is:

preprocess() {
  while IFS= read -r line; do
    case "$line" in
      '@POSTS@') listposts ;;
              *) printf '%s\n' "$line" ;;
    esac
  done < "$1"
}

Basically it reads each line of the markdown file, and either prints it as is or lists the posts. The "IFS=" is needed to keep read from stripping leading and trailing whitespace, and "-r" stops it from treating backslashes as escapes. And what's that listposts thing? Oh, right, it's yet another function. I promise it's the last one. Here it is:

listposts() {
  find "${srcdir}/posts" -type f -regextype posix-extended \
                         -regex '.*/[0-9]{8}-.*\.md' -printf '%f\n' |
  sort -r |
  while read -r file; do
    filedate="$(date -d "${file%%-*}" +%F)"
    printf '+ [%s] [%s](%s)\n' \
      "$filedate" \
      "$(gettitle "${srcdir}/posts/${file}")" \
      "/posts/${file%md}html"
  done
}

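I decided to name my posts with a prepended date, in the format YYYYMMDD-title-goes-here.md. That way I avoid having to store some metadata elsewhere. Then I just list all the files in the posts dir, sort them latest first, and write the date, title and link in a markdown list. Note that -regextype and -printf for find, as well as date -d, are GNU extensions, so this needs GNU findutils and coreutils even though the script itself runs under /bin/sh.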
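Here is how one file name gets taken apart, as a standalone sketch. The post name and title are invented, and isodate is a helper I made up as a stand-in for the date call, using sed so the snippet also runs without GNU date. Unlike date, it does not check that the date is a real one.

```shell
#!/bin/sh
# Turn YYYYMMDD into YYYY-MM-DD with sed instead of "date -d".
isodate() {
  printf '%s\n' "$1" |
    sed 's/^\([0-9]\{4\}\)\([0-9]\{2\}\)\([0-9]\{2\}\)$/\1-\2-\3/'
}

file='20240131-title-goes-here.md'   # hypothetical post file name
stamp=${file%%-*}                    # everything before the first "-"
link="/posts/${file%md}html"         # swap the trailing "md" for "html"

printf '+ [%s] [%s](%s)\n' "$(isodate "$stamp")" 'Title goes here' "$link"
```

This prints one markdown list item: "+ [2024-01-31] [Title goes here](/posts/20240131-title-goes-here.html)".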

Then this preprocessed markdown gets fed to cmark-gfm with some additional options to enable some nice extensions and allow raw HTML.
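To see what actually reaches cmark-gfm, you can run preprocess on its own. In this sketch listposts is replaced by a stub that prints one invented entry, so it doesn't need a posts directory:

```shell
#!/bin/sh
# Stub standing in for the real listposts; the entry is made up.
listposts() {
  printf '%s\n' '+ [2024-01-31] [Example post](/posts/20240131-example.html)'
}

preprocess() {
  while IFS= read -r line; do
    case "$line" in
      '@POSTS@') listposts ;;
              *) printf '%s\n' "$line" ;;
    esac
  done < "$1"
}

tmp=$(mktemp)
printf '%s\n' '# Posts' '' '@POSTS@' '    indented code stays indented' > "$tmp"
preprocess "$tmp"
rm -f "$tmp"
```

The placeholder line is swapped for the list and every other line passes through unchanged, indentation included. A line that merely contains "@POSTS@" next to other text is not replaced, because the case pattern has to match the whole line.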

Putting it all together

So far we have an outdated Makefile and a shell script that takes a markdown file, does some preprocessing to it and spits out the resulting HTML to stdout. We'll name the shell script src/generate.sh. Let's update that Makefile.

Previously we were calling cmark-gfm directly in the Makefile. That's now handled by the script, so just replace cmark-gfm with src/generate.sh. And that's all there is to it, unless of course you want to go a little bit further with organizing the directory structure. You do? Cool, let's do that.

Since I wanted to have the HTML and markdown in separate directories while mirroring the directory structure, I had to make some changes to the way I get the list of targets, as well as the rule to build them. Instead of a wildcard like we had before, we can invoke a shell with the find command to list markdown files recursively. Also I keep the markdown files in the src directory, so that also needs to be handled when we do the substitution for the HTML targets:

MD   := $(shell find src -type f -name '*.md')
HTML := $(MD:src/%.md=%.html)

And finally the rule to build our HTML should look like this:

$(HTML): %.html: src/%.md
	@mkdir -p $(@D)
	src/generate.sh $< > $@

Notice I added a mkdir command to create directories if needed. The $(@D) variable evaluates to the directory containing $@. Also I had to use Static Pattern Rules⁴ to allow the pattern matching to work properly with subdirectories.
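For comparison, this is the path mapping the static pattern rule performs, written out in shell. It is only an analogue to show what make works out for each file. target_for is a made-up name and the post file is an example.

```shell
#!/bin/sh
# Shell analogue of "$(HTML): %.html: src/%.md": map src/x/y.md to x/y.html.
target_for() {
  rel=${1#src/}                      # drop the leading src/
  printf '%s\n' "${rel%.md}.html"    # swap the extension
}

target=$(target_for src/posts/20240131-example.md)
printf '%s\n' "$target"              # posts/20240131-example.html
dirname "$target"                    # posts, the same value $(@D) expands to
```

For a top-level page such as index.html both dirname and $(@D) give ".", so the mkdir -p is a harmless no-op there.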

Now add some more dependencies to have certain files rebuild when scripts or templates change, and we're all set (check out the complete Makefile below).

Wrapping up

Wow, that came out a bit longer than I was expecting. Still, the point is that if you don't need all the complexity of modern SSGs, you might be better off making your own thing. All we did here was convert markdown to HTML with a little {pre,post}processing and the help of a Makefile. Oh, and you might want to sprinkle some nice CSS on top. But not JS. Don't do that. That's bad, mkay?

I'm leaving the complete Makefile and script here for convenience, though you can also view the source on Codeberg (or even here and here since I don't keep the source in a separate repo).

Makefile

MD        := $(shell find src -type f -name '*.md')
HTML      := $(MD:src/%.md=%.html)
TEMPLATES := $(wildcard src/templates/*.html)
SCRIPTS   := $(wildcard src/*.sh)

.PHONY: all clean

all: $(HTML)

# Rebuild post lists when posts are updated
index.html posts/index.html: $(wildcard src/posts/*.md)

$(HTML): %.html: src/%.md $(TEMPLATES) $(SCRIPTS)
	@mkdir -p $(@D)
	src/generate.sh $< > $@

clean:
	rm -f $(HTML)

src/generate.sh

#!/bin/sh
# Directory containing this script; "$0" is quoted so paths with spaces work
srcdir="$(dirname "$0")"

die() {
  printf '%s\n' "$1" >&2
  exit "${2:-1}"
}

gettitle() {
  sed -n '1s/^#* *//p' "$1"
}

listposts() {
  find "${srcdir}/posts" -type f -regextype posix-extended \
                         -regex '.*/[0-9]{8}-.*\.md' -printf '%f\n' |
  sort -r |
  while read -r file; do
    filedate="$(date -d "${file%%-*}" +%F)"
    printf '+ [%s] [%s](%s)\n' \
      "$filedate" \
      "$(gettitle "${srcdir}/posts/${file}")" \
      "/posts/${file%md}html"
  done
}

preprocess() {
  while IFS= read -r line; do
    case "$line" in
      '@POSTS@') listposts ;;
              *) printf '%s\n' "$line" ;;
    esac
  done < "$1"
}

input="$1"
echo "$input" | grep -sq '\.md$' || die "Usage: $0 INPUT_FILE.md"

cat <<-EOF
<!DOCTYPE html>
<html lang="en">
<head>
	<meta charset="utf-8">
	<link rel="stylesheet" type="text/css" href="/css/style.css"/>
	<title>~jlucas/$(gettitle "$input")</title>
</head>
<body>
	<header>
		$(cat "${srcdir}/templates/header.html")
	</header>
<main>
	$(preprocess "$input" | cmark-gfm --unsafe -e strikethrough -e table -e footnotes)
</main>
<footer>
	$(cat "${srcdir}/templates/footer.html")
</footer>
</body>
EOF