One of the main drawbacks I can think of to a static blog is the lack of an easy search function. Because all the files are pre-generated HTML, CSS, and JS, there's no server-side interpreted language that can perform actions and no database of posts that can be filtered. I decided to change this and put together a little proof of concept on my local machine for how it would work. If you're running a [Hugo](http://gohugo.io) blog, you can repeat my little experiment yourself! If not, you can follow along and modify parts of it to work with your own system, as this should work with platforms like Jekyll too.

## Elasticsearch And Loading Posts

Firstly, set up an Ubuntu virtual machine with its own private IP and install Oracle's Java 8 JDK (I recommend using the [webupd8team PPA](https://launchpad.net/~webupd8team/+archive/ubuntu/java)), then install a recent version of [Elasticsearch](https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html). Following the linked docs should help you out. Lastly, make sure Python 3, pip, and Varnish are installed too. You'll need two Python packages, which you can install with the following command:

```
pip3 install python-frontmatter elasticsearch
```

Now check out your Hugo blog repository. Change into the directory and create a new file, `blog.py` (don't name it `elasticsearch.py`, or Python will import your script instead of the package you just installed), and fill it with the following:

```
import frontmatter
import glob
from elasticsearch import Elasticsearch

es = Elasticsearch()

# Drop any existing index so we start fresh; ignore the 404
# this raises on the very first run, when there's no index yet.
es.indices.delete(index='blog', ignore=404)

# Read every Markdown file under content/ and index it as a document.
for file in glob.iglob('content/**/*.md', recursive=True):
    post = frontmatter.load(file)
    doc = {
        'title': post['title'],
        'text': post.content,
    }
    es.index(index='blog', doc_type='post', body=doc)
```

You can now run this file with `python3 blog.py`. What's it doing? It goes through your blog's content directory, and every Markdown file gets read and has its "frontmatter" (metadata in YAML, TOML, or JSON format) turned into a dictionary. This dictionary is then turned into an Elasticsearch document and indexed for future searching.

## Configuring Varnish

Next up, you'll want to edit your `/etc/varnish/default.vcl` file, change the backend port number to `9200`, and insert the following inside the `vcl_recv` section:

```
if (req.method != "GET" && req.method != "OPTIONS" && req.method != "HEAD") {
    /* We only deal with GET and OPTIONS and HEAD by default */
    return (synth(405));
}
```

This limits us to performing only simple GET queries (and preflight checks for CORS), preventing anyone in the world from indexing or deleting documents on our Elasticsearch node. You'll also want to add some Access-Control headers, so insert the following into the `vcl_deliver` portion:

```
set resp.http.Access-Control-Allow-Origin = "*";
set resp.http.Access-Control-Allow-Methods = "GET, OPTIONS, HEAD";
set resp.http.Access-Control-Allow-Headers = "Origin, Accept, Content-Type, X-Requested-With, X-CSRF-Token";
```

This will allow us to make search queries in JavaScript from our statically hosted blog to our Elasticsearch node without the need for any server in the middle (except our reverse proxy, Varnish). Restart Varnish so it picks up the new configuration and continue on to the next section.

## JavaScript-Based Search

Next up, let's create our search page. If you're running a Hugo blog, you'll probably want to create a file in the `static` top-level directory called `search.html` and fill it with something like so:

```
<!DOCTYPE html>
<html>
<head>
    <title>Search</title>
</head>
<body>
    <input type="text" id="query" placeholder="Search terms">
    <button id="search">Search</button>
    <ul id="results"></ul>
    <script src="https://code.jquery.com/jquery-3.2.1.min.js"></script>
    <script>
    $('#search').on('click', function () {
        // Replace the IP with your VM's address; 6081 is Varnish's
        // default listen port, so adjust it if you've moved Varnish.
        $.getJSON('http://192.168.56.101:6081/blog/post/_search', {
            q: 'text:' + $('#query').val(),
            size: 10
        }).done(function (data) {
            // List the title of each matching post.
            $('#results').empty();
            $.each(data.hits.hits, function (i, hit) {
                $('#results').append($('<li>').text(hit._source.title));
            });
        });
    });
    </script>
</body>
</html>
```

You'll notice I'm using [jQuery](http://jquery.com/) for this example, so make sure you have a similar version if you want to follow along. You'll also want to replace the IP address with your own virtual machine's IP address. You should now be able to generate your site and upload it to your usual host. When you browse to the search page, you can enter searches and it will return up to 10 results showing the titles of blog posts containing those terms, ranked by Elasticsearch's full-text search algorithm.
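If the page comes back empty, it helps to run the same query outside the browser to rule out Varnish or CORS problems. Here's a minimal sketch using the same `elasticsearch` package from earlier; the IP, port, and the search term "varnish" are placeholders for your own setup:

```
from elasticsearch import Elasticsearch

# Placeholder address: point this at Varnish on your own VM.
es = Elasticsearch(['http://192.168.56.101:6081'])

# The same URI search the page performs: match the given terms
# against post bodies and return up to 10 hits.
res = es.search(index='blog', doc_type='post', q='text:varnish', size=10)

for hit in res['hits']['hits']:
    print(hit['_score'], hit['_source']['title'])
```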
## Further Work

Of course, there are bits missing here. We only have post titles, no links! That's an exercise left up to the reader; it'll depend on the structure of your blog, but you may want to look at how URLs are generated and then match that in your Python script (there's a sketch of one approach at the end of this post). You'll also need to work in your own method of handling pagination of results.

Lastly, consider how you'd turn this into a production setup. You'd probably want it hooked into CI/CD so that when you commit a new post, it rebuilds your site, SSHes into the Elasticsearch server, updates the local repo there, and runs the Python script again.

## Conclusion

I'm not turning this into a production system for my own site, mostly because Elasticsearch is expensive to run compared to my blog (from US$0.75 to US$10.75 per month) and it was mostly an exercise to see what was possible. There are other alternatives though: if you have a relatively small blog, you could use a JavaScript-based, Lucene-inspired search framework like [Lunr](https://lunrjs.com/guides/getting_started.html). As part of your build process, a script could pull all the metadata, content, and links into a single JSON document that your search page then loads, processes, and searches client-side (also sketched below). Keep in mind this scales worse the more posts you have, since they'll all be loaded at once.

Otherwise, you can do what I remembered after all of this: go to Google and type in "site:adamogrady.id.au keywords i wanted to look for". Bam. Assuming your site is indexed and crawled by Google, you'll be able to find what you seek.
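For the missing links mentioned under Further Work, one approach is to derive each post's URL from its file path while indexing. This sketch assumes Hugo's default permalink layout, where `content/posts/my-post.md` ends up at `/posts/my-post/`; swap in your own URL scheme:

```
import frontmatter
import glob
import os
from elasticsearch import Elasticsearch

es = Elasticsearch()

for file in glob.iglob('content/**/*.md', recursive=True):
    post = frontmatter.load(file)
    # content/posts/my-post.md -> /posts/my-post/ under Hugo's
    # default permalinks; adjust this line for your own config.
    rel = os.path.relpath(file, 'content')
    url = '/' + os.path.splitext(rel)[0] + '/'
    es.index(index='blog', doc_type='post', body={
        'title': post['title'],
        'text': post.content,
        'url': url,
    })
```

The search page can then read `hit._source.url` and wrap each title in a link.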
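And if you take the Lunr route, the build-step JSON document could come from a similar walk over the content directory. The `static/search.json` path and the field names here are my own assumptions, not anything Lunr mandates:

```
import frontmatter
import glob
import json
import os

docs = []
for file in glob.iglob('content/**/*.md', recursive=True):
    post = frontmatter.load(file)
    rel = os.path.relpath(file, 'content')
    docs.append({
        'title': post['title'],
        'text': post.content,
        'url': '/' + os.path.splitext(rel)[0] + '/',
    })

# Hugo copies anything in static/ into the site root, so the search
# page can fetch /search.json and hand it to Lunr client-side.
with open('static/search.json', 'w') as f:
    json.dump(docs, f)
```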