Writing my own markdown parser

I wrote my own JavaScript-based Markdown parser: for fun, not for profit 🤪
In fact, you're looking at the final result right now!
First, I should be clear that there are plenty of great Markdown parsers out there already. I wasn't trying to build anything to compete with them, because I would have done a worse job. And I hate reinventing the wheel.
The main reason was to challenge myself and hopefully learn something by shipping a working solution.
Early in the process, I realised that to keep things simple and testable, I should split the work into two distinct parts:
  1. Lex (or parse) a block of text to turn it into an annotated structure
  2. Iterate over the structured representation and render suitable components for each part of the document
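To make that split concrete, here's a minimal sketch. It isn't the code running on this blog: lex and render are placeholder names, the lexer only understands paragraphs, and the renderer builds an HTML string where a real one would return components. The point is simply that stage 1 produces plain objects and stage 2 consumes them.

```javascript
// Stage 1 (hypothetical lex): turn raw text into plain node objects.
// This toy version only knows about paragraphs separated by blank lines.
function lex(text) {
  return text
    .split(/\n\s*\n/)
    .map((chunk) => chunk.trim())
    .filter((chunk) => chunk.length > 0)
    .map((chunk) => ({ type: "paragraph", children: [chunk] }));
}

// Escape the characters that would otherwise be read as HTML markup.
function escapeHtml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Stage 2 (hypothetical render): walk the nodes and build output.
// Strings are leaf text; objects are rendered according to their type.
function render(nodes) {
  return nodes
    .map((node) => {
      if (typeof node === "string") return escapeHtml(node);
      switch (node.type) {
        case "paragraph":
          return `<p>${render(node.children)}</p>`;
        default:
          throw new Error(`Unknown node type: ${node.type}`);
      }
    })
    .join("");
}

const tree = lex("Hello\n\nWorld");
console.log(render(tree)); // <p>Hello</p><p>World</p>
```

Because stage 1 returns plain data, it can be unit-tested with a deep-equality check on the objects, without rendering anything at all.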
In the first stage, my goal was to build an array of objects, each with a type (eg link) and whatever attributes that type needs (eg href and children).
The "children" attribute was special. Each time an array of elements was generated, I would iterate over each new object and re-run the process on its "children" attribute, which built up the nested structure of the document.
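Here's a rough sketch of that idea for line-level parsing, using a hypothetical parseInline function that only knows about [text](href) links and *italic* text (no escapes, no nested brackets). It recurses as soon as it creates each node rather than in a separate pass over the array, which should give the same nested shape.

```javascript
// Hypothetical parseInline: finds [text](href) links and *italic* runs in
// one line of text. Every match becomes a node whose children are produced
// by re-running parseInline on the matched inner text, so markup nested
// inside a link (or a link inside italics) ends up as a nested node.

// Groups 1 and 2: link text and href. Group 3: italic text.
const INLINE_PATTERN = /\[([^\]]+)\]\(([^)\s]+)\)|\*([^*]+)\*/;

function parseInline(text) {
  const nodes = [];
  let rest = text;
  while (rest.length > 0) {
    const match = INLINE_PATTERN.exec(rest);
    if (!match) {
      nodes.push(rest); // no more markup: the remainder is plain text
      break;
    }
    if (match.index > 0) nodes.push(rest.slice(0, match.index)); // text before the match
    if (match[1] !== undefined) {
      nodes.push({ type: "link", href: match[2], children: parseInline(match[1]) });
    } else {
      nodes.push({ type: "italic", children: parseInline(match[3]) });
    }
    rest = rest.slice(match.index + match[0].length);
  }
  return nodes;
}

console.log(JSON.stringify(parseInline("See [my *old* post](/blog/old) for more")));
// ["See ",{"type":"link","href":"/blog/old","children":["my ",{"type":"italic","children":["old"]}," post"]}," for more"]
```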
In the end, I further divided part 1 above into 3 sub-steps:
  1. Parse the lines for section, subsection and sub-subsection headings, outputting a nested structure of sections, each containing its heading and its direct children
  2. Parse the children of each section, identifying any multi-line elements (eg block quotes, code blocks, lists)
  3. Parse each individual line to find inline elements (eg links, italics etc)
Sub-step 1 has one caveat: if no section headings appear in the provided string, we skip sectioning and jump straight to sub-step 2.
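A sketch of sub-step 1, caveat included, might look like this. It's an illustration rather than this blog's code: parseSections is a hypothetical name, it assumes the text has already been split into lines, and it only recognises ATX-style "#", "##" and "###" headings. Each call looks for headings at one level, then passes every section's body back into itself to find the next level down. If no headings turn up at any level, the lines come back untouched, ready for sub-step 2.

```javascript
// Hypothetical parseSections: nest lines under "#", "##" and "###" headings.
// Non-heading lines stay as raw strings for the block-level step.
function parseSections(lines, level = 1) {
  if (level > 3) return lines; // past "###": hand the raw lines on unchanged
  const marker = "#".repeat(level) + " ";
  if (!lines.some((line) => line.startsWith(marker))) {
    return parseSections(lines, level + 1); // no headings at this level, look deeper
  }

  const preamble = []; // lines before the first heading at this level
  const sections = [];
  for (const line of lines) {
    if (line.startsWith(marker)) {
      sections.push({ heading: line.slice(marker.length).trim(), lines: [] });
    } else if (sections.length > 0) {
      sections[sections.length - 1].lines.push(line);
    } else {
      preamble.push(line);
    }
  }

  // Recurse: the preamble and each section body may contain deeper headings.
  return [
    ...parseSections(preamble, level + 1),
    ...sections.map(({ heading, lines: body }) => ({
      type: "section",
      level,
      heading,
      children: parseSections(body, level + 1),
    })),
  ];
}

console.log(JSON.stringify(parseSections("Intro\n# A\ntext a\n## A.1\ntext a1".split("\n"))));
```

Checking one level at a time keeps each call simple, and the "no headings" case falls out naturally: the function runs out of levels and returns what it was given.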
Pretty much all three of these sub-steps involved recursion. Future me will be pleased I added comments and tests 🤩
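As an example of where recursion shows up in sub-step 2, here's a hypothetical parseBlocks function (not the code behind this blog). It only handles block quotes, unordered lists and paragraphs, with no code blocks. It strips the "> " prefix from a quote's lines and feeds them back into itself, which is what lets a quote contain a list or even another quote. The text inside paragraphs and list items is left as raw strings for the line-level step.

```javascript
// Hypothetical parseBlocks: group a section's raw lines into block nodes.
function parseBlocks(lines) {
  const blocks = [];
  let i = 0;
  while (i < lines.length) {
    const line = lines[i];
    if (line.trim() === "") {
      i++; // blank lines only separate blocks
    } else if (line.startsWith(">")) {
      // Collect consecutive quoted lines, strip the marker, then recurse.
      const inner = [];
      while (i < lines.length && lines[i].startsWith(">")) {
        inner.push(lines[i].replace(/^> ?/, ""));
        i++;
      }
      blocks.push({ type: "blockquote", children: parseBlocks(inner) });
    } else if (line.startsWith("- ")) {
      // Consecutive "- " lines form one list; item text stays raw for now.
      const items = [];
      while (i < lines.length && lines[i].startsWith("- ")) {
        items.push({ type: "listItem", children: [lines[i].slice(2)] });
        i++;
      }
      blocks.push({ type: "list", children: items });
    } else {
      // Anything else joins the current paragraph until a blank line or new block.
      const text = [];
      while (
        i < lines.length &&
        lines[i].trim() !== "" &&
        !lines[i].startsWith(">") &&
        !lines[i].startsWith("- ")
      ) {
        text.push(lines[i]);
        i++;
      }
      blocks.push({ type: "paragraph", children: [text.join(" ")] });
    }
  }
  return blocks;
}

console.log(JSON.stringify(parseBlocks(["Intro", "", "> Quoted", "> - one", "> - two"]), null, 2));
```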
I didn't follow the Markdown specification closely or attempt to implement the full set of features. I only built the subset of features that I want to use right here on this blog. For example, I didn't support escape characters or tables.
Once the basics were working, I added a few extra features that I wanted to use. For example, I sometimes use images with separate src attributes for light and dark modes. I also show a placeholder while the image is loading, which means my image component needs to know the image's aspect ratio up front so the placeholder takes up the right amount of space.
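On the rendering side, one way to cover both requirements is sketched below. It assumes the lexer has already produced an image node with alt, srcLight, srcDark, width and height fields (all hypothetical names), and it outputs HTML rather than a component: a picture element whose source only applies when the browser reports a dark colour scheme, with the CSS aspect-ratio property reserving the box so a grey background can act as the placeholder while the image loads.

```javascript
// Make a value safe to place inside a double-quoted HTML attribute.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;");
}

// Hypothetical renderImage for an already-parsed image node.
function renderImage({ alt, srcLight, srcDark, width, height }) {
  // The aspect ratio reserves the image's box before it loads, so the
  // grey background acts as a placeholder and the page does not jump.
  const box = `display:block; aspect-ratio:${width} / ${height}; background:#ddd`;
  // Optional dark-mode source, chosen by the browser via a media query.
  const dark = srcDark
    ? `<source srcset="${escapeAttr(srcDark)}" media="(prefers-color-scheme: dark)">`
    : "";
  const img =
    `<img src="${escapeAttr(srcLight)}" alt="${escapeAttr(alt)}"` +
    ` width="${width}" height="${height}" style="width:100%; height:auto" loading="lazy">`;
  return `<picture style="${box}">${dark}${img}</picture>`;
}

console.log(
  renderImage({ alt: "A diagram", srcLight: "/diagram-light.png", srcDark: "/diagram-dark.png", width: 1600, height: 900 })
);
```

This relies on the operating system's colour scheme; a site-level theme toggle would need the dark source to be chosen some other way.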
Traditional Markdown image syntax:
My Markdown image syntax:
It seems to perform pretty well for my limited use case. Of course, performance and features weren't my main goal (otherwise I would have used one of the many tried-and-tested libraries that already exist). I've kept concerns well separated and tested the hell out of it, so adding more features in the future should be easy enough!
Most of my photos are licensed under Creative Commons BY-SA 3.0.
If you are unsure about your right to use them please contact me.