Making the leap: porting our Python data science blog to Quarto

Quarto is the hot new technical publishing system, built by the company formerly known as RStudio. I decided to see how it would work for this Python-oriented blog.

Author

Brian Kent

Published

October 12, 2022

Quarto is the hot new thing in data science blogging. It hit my radar in late July via a Fast.ai blog post and RStudio’s announcement of its rebranding to Posit, just as I happened to be looking for a new blog system to restart this site.

It’s always hard to decide whether to invest the time to learn a new tool and overhaul the current (functioning) system. Is Quarto worth the trouble? Here’s my thought process and how it turned out for this “Python-forward” blog, plus some how-to tips to help your own transition go smoothly.

What is Quarto?

Quarto is a new open-source system for literate programming and publishing. Just as with a Jupyter notebook, you write markdown text interleaved with executable code blocks. In fact, for Pythonistas like me, Quarto uses Jupyter to execute the code and generate code output. Quarto then uses Pandoc to style the code and render it all in a very nice looking document. As data science celeb Sean Taylor said:

Quarto the new hotness for rendering documents from Jupyter notebooks. If you’re reading this and think “hey this document looks nice!” it’s 100% because of Quarto.

Formulating the decision

There are a million static site generators and it’s impossible to evaluate them all and to find the perfect one. Instead, I asked myself:

  1. What are my requirements?
  2. How well does Quarto meet these requirements?
  3. Is Quarto substantially better than what I currently use?

So what are my requirements for this blog? At a high level, I want:

  • Speed. I want to spend my time thinking and writing about data science, not blog publishing systems.
  • Nice looking output, including for technical material like code and math. I want my clients to know that I’m a professional.
  • Low cost. It’s just a static site, so it shouldn’t cost much.

For me, these general principles generated more concrete requirements:

  • Write in markdown, not HTML or Jinja2 templates.

  • Literate programming, as much as possible. Text, code, and code output should live together in the same document. For me, it’s primarily Python code plus a little SQL.

  • The text and code source should be under version control.

  • The blog is one part of a larger website and should share the navigation and style of the overall site.

  • I should be able to customize the output with HTML and CSS, if necessary.

I originally chose Webflow as the anchor for my blog system. At the time, I had aspirations to create content more complex than linear blog posts and Webflow offers a content management system side by side with low-code design tools and hosting. To make Webflow work with my requirements, though, I had to write a lot of custom Python code to render and upload content. Here’s how I compared that system with Quarto, a priori.

| Factor | Old System | Quarto Plan | Winner |
|---|---|---|---|
| Host | Webflow | Netlify | Tie |
| Cost | $23/month | Free | Quarto |
| Authoring language | Markdown with HTML for figures and footnotes | Markdown | Quarto |
| Authoring environment | VS Code | VS Code | Tie |
| Version control | Yes | Yes | Tie |
| Code execution | Separate Python scripts and Jupyter notebooks, with manually copied output | quarto render or quarto preview | Quarto |
| Site design | Manual, with Webflow low-code tools, extensive custom CSS | Choose a Bootswatch theme. Small tweaks with CSS files | Quarto |
| Publishing | Ad hoc scripts to render content and push to Webflow CMS | quarto publish | Quarto |
| Appearance | OK | Good | Quarto |

I didn’t bother to quantify the ROI; it was clear that if Quarto lived up to its billing, it would be much faster and simpler than my old site, and the output would look better. The biggest question mark was how to nest a Quarto blog inside a larger website, but I was assured by the Quarto documentation:

You can create websites that consist entirely of a single blog, websites that have multiple blogs, or you can add a blog to a website that contains other content.

Making it happen

Bottom line: the transition to Quarto was smooth and mostly painless. Looking back, I think there were a few keys to the switch:

  • Most importantly, I committed to trusting Quarto’s defaults and avoiding customization unless absolutely necessary.
  • Quarto’s tutorials and user guide are good, and got me 80% of the way there. When I got stuck, I turned to a couple of examples of sites built with Quarto: Fast.ai’s nbdev and Quarto’s own website.[1]
  • My old blog content was already mostly in markdown, so porting it was largely copy and paste.
  • This site is pretty simple, with just a landing page and blog.

I did hit some snags. Here’s how I worked through them, to help make your own transition a little easier.

Set up and file hierarchy

The first decision I made was to build the whole Crosstab site with Quarto, not just the blog. Although the Quarto documentation says this is possible, it doesn’t show directly how to do it. So here’s what I did.

First, I used Quarto’s create-project command to create a website project, not a blog.

Terminal
quarto create-project mysite --type website
cd mysite

Then I mimicked a Quarto blog project inside mysite. I created a folder blog for the posts and, inside it, a listing page and a metadata file.

Terminal
mkdir blog
cd blog
touch index.qmd
touch _metadata.yml

Each post is then its own folder, with an index.qmd file and optionally a CSS file, requirements.txt, and images.

Terminal
mkdir my-new-python-post
cd my-new-python-post
touch index.qmd
touch requirements.txt
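Since this blog is Python-forward, the same per-post scaffolding can be scripted. The following is only a sketch: the helper name new_post is hypothetical, and the starter front matter (just title and date) is my assumption about a reasonable minimum, not something Quarto requires.

```python
from datetime import date
from pathlib import Path


def new_post(blog_dir: Path, slug: str, title: str) -> Path:
    """Create <blog_dir>/<slug>/ with a starter index.qmd and an empty requirements.txt."""
    post_dir = Path(blog_dir) / slug
    # exist_ok=False: refuse to overwrite a post folder that already exists.
    post_dir.mkdir(parents=True, exist_ok=False)

    # Minimal YAML front matter (assumed fields; titles containing
    # double quotes would need escaping, which this sketch skips).
    front_matter = (
        "---\n"
        f'title: "{title}"\n'
        f"date: {date.today().isoformat()}\n"
        "---\n\n"
    )
    (post_dir / "index.qmd").write_text(front_matter, encoding="utf-8")
    (post_dir / "requirements.txt").touch()
    return post_dir
```

Calling new_post(Path("blog"), "my-new-python-post", "My new Python post") from the mysite folder should give the same layout as the shell commands above, with the front matter already filled in.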

Ultimately, the overall file structure looks like this:[2]

📦mysite
 ┣ 📂.quarto
 ┣ 📂_site
 ┣ 📂blog
 ┃ ┣ 📂my-new-python-post
 ┃ ┃ ┣ 📜index.qmd
 ┃ ┃ ┗ 📜requirements.txt
 ┃ ┣ 📜_metadata.yml
 ┃ ┗ 📜index.qmd
 ┣ 📜_quarto.yml
 ┣ 📜about.qmd
 ┣ 📜index.qmd
 ┗ 📜styles.css

Sitemaps and canonical URLs

Quarto can automatically generate a sitemap, although it doesn’t seem to be mentioned in the documentation at all. Following this blog post, just set site-url under the website key in the _quarto.yml file.

Quarto’s auto-generated sitemap is also a little funky in that it lists each page’s URL with the index.html filename, which isn’t necessary and is not typically part of the canonical URL.[3]

For example, the address of the page you’re reading right now is canonically

https://www.crosstab.io/articles/porting-to-quarto

But Quarto’s sitemap lists it as

https://www.crosstab.io/articles/porting-to-quarto/index.html

Not a huge deal, but I wrote a tiny Python script to remove this suffix from each entry in the sitemap.

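# Post-render cleanup: drop the "/index.html" suffix from sitemap URLs.
SITEMAP_PATH = "_site/sitemap.xml"

with open(SITEMAP_PATH, "r") as f:
    sitemap = f.readlines()

# Remove the suffix wherever it appears in the file.
sitemap = [line.replace("/index.html", "") for line in sitemap]

with open(SITEMAP_PATH, "w") as f:
    f.writelines(sitemap)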

Quarto will run this script automatically after each render if it is listed in the post-render field under the project key in _quarto.yml.
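If a plain string replace feels too blunt, a slightly more careful variant parses the XML and only touches loc entries that actually end in /index.html. This is a sketch, not what I run in production. It assumes the sitemap uses the standard sitemaps.org namespace, and the function name strip_index_html is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the sitemap declares the standard sitemaps.org namespace.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
SUFFIX = "/index.html"


def strip_index_html(sitemap_xml: bytes) -> bytes:
    """Return the sitemap with a trailing /index.html removed from each <loc>."""
    # Keep the default namespace unprefixed when the tree is written back out.
    ET.register_namespace("", SITEMAP_NS)
    root = ET.fromstring(sitemap_xml)
    for loc in root.iter(f"{{{SITEMAP_NS}}}loc"):
        url = (loc.text or "").strip()
        if url.endswith(SUFFIX):
            loc.text = url[: -len(SUFFIX)]
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```

To use it as a post-render step, read _site/sitemap.xml as bytes, pass the contents through strip_index_html, and write the result back to the same path. Unlike the replace version, a URL that merely contains the string index.html somewhere in the middle is left alone.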

Speaking of canonical URLs, Quarto doesn’t have a built-in place to specify them, so I include them manually in each page’s header. In the markdown file for this page, for example, the format field of the metadata looks like

format:
  html:
    include-in-header: 
      text: <link rel="canonical" href="https://www.crosstab.io/articles/porting-to-quarto/">
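Because the canonical link is added by hand on every page, it is easy to forget one. A small check like the following could be run over rendered pages to confirm the tag made it into the output. It is a sketch using only the standard library, and find_canonical is a hypothetical helper name.

```python
from html.parser import HTMLParser
from typing import Optional


class _CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> tag seen."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        is_canonical = tag == "link" and (attr.get("rel") or "").lower() == "canonical"
        if is_canonical and self.href is None:
            self.href = attr.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in an HTML page, or None if absent."""
    finder = _CanonicalFinder()
    finder.feed(html)
    return finder.href
```

Looping over the HTML files in _site and printing any page where find_canonical returns None would flag posts whose header is missing the link.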

Working with virtual environments is clunky

I use a separate conda virtual environment for each Python article and I don’t see a super smooth way to manage this with Quarto, unfortunately.

My workaround is to pick a slug for each article and use it for both the article folder name (which becomes the URL) and the conda environment name. Then I just have to remember to activate the virtual environment for the article I’m working on. Fortunately, Quarto can freeze other articles so they don’t get overwritten in a different execution environment. Still, I wouldn’t call this a solution, and it makes me nervous. I hope the Quarto devs think of a more elegant workflow.
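One way to take the "remember to activate" step off my plate is a guard that compares the active conda environment with the article folder name. This is a sketch under two assumptions: conda exposes the active environment name in the CONDA_DEFAULT_ENV variable (true for named environments), and the environment is named exactly after the slug. The function name env_matches_post is hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def env_matches_post(post_dir: Path, environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if the active conda env is named after the post folder (the slug)."""
    # Allow a custom mapping to be passed in so the check is easy to test.
    environ = os.environ if environ is None else environ
    active_env = environ.get("CONDA_DEFAULT_ENV", "")
    return active_env == Path(post_dir).name
```

A wrapper script could call env_matches_post(Path("blog/my-new-python-post")) and refuse to run quarto render when it returns False. It doesn't make the workflow elegant, but it does turn a silent mistake into a loud one.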

Custom CSS

While I did commit to trusting Quarto’s design defaults, it was helpful to know some CSS for final style tweaks. Knowing Bootstrap is especially useful for laying out landing pages, although not critical for blog posts. Even for blog posts, though, I made some small style changes. For example, to change table headers to match the site theme I added this snippet to a file styles.css in the blog folder.

tr.header {
    background-color: #2C3E50;
    color: white;
    vertical-align: middle;
}

How it turned out

I’m quite happy with my decision to port the site to Quarto. As anticipated, the site looks better, and it’s much faster to create and edit new content. I suspect Quarto’s rough edges will be smoothed out in the coming months as Posit gathers feedback and iterates. In any event, the Quarto snags I hit were minor compared to the time I spent tweaking and writing custom rendering scripts in my previous Webflow-based system.

References

  • Listing image by Kimberly Farmer on Unsplash.

Footnotes

  1. This site is now another exemplar, of course.↩︎

  2. I’ve omitted the contents of the .quarto and _site folders for clarity, and in my case I deleted the about page.↩︎

  3. I am not at all an expert in SEO, so please send me a note if you think I’ve got this wrong.↩︎