Formatinator:

An extensible publishing system


Jan Černohorský

Supervisor: Mgr. Martin Mareš, Ph.D.

What is Formatinator?

Goals

  • Capable markup language
    • Common elements
    • Arbitrary functions
  • Modularity
  • Extensibility

Markdown


                            # Hello world

                            This is a simple document with **bold**
                            and *italic* text, `code` snippets,
                            [links](https://example.com) and

                            - lists
                            - of
                            - things

                            1. even
                            2. ordered
                        

Any sequence of characters is a valid CommonMark document.


_**_**

In Formatinator, such ambiguous syntax is invalid.


                                *Emphasized* and
                                **strongly emphasized** text.
                            

                                _Emphasized_ and
                                __strongly emphasized__ text.
                            

                                Level 1 Setext heading
                                ======================
                                Level 2 Setext heading
                                ----------------------
                            

                                # Level 1 ATX heading
                                ## Level 2 ATX heading
                            

                                - a list
                                - with dashes
                            

                                * a list
                                * with asterisks
                            

                                + a list
                                + with pluses
                            

                                1. One
                                2. Two
                                3. Three
                            

                                2. Two
                                3. Three
                                4. Four
                            

                                1. One
                                4. Four
                                5. Five
                            

                                _Emphasized_ and
                                **strongly emphasized** text.
                            

                                # Level 1 heading
                                ## Level 2 heading
                            

                                * a list
                                * with asterisks
                            

                                #. One
                                #. Two
                                #. Three
                            

                                @variable=Bold(Word("Hi!"))

                                Some text

                                @variable
                            

Some text

Hi!


                                @add(1, 2)

                                @color("red", content: [Some text])

                                @sum((1, 2, 3, 4, 5))

                                @matrix((1, 2, 3; 4, 5, 6))

                                @func({a: 1, b: 2, c})
                            

3

Some text

15

123
456

                                @repeat(2, content: [_Hello_ world!])

                                @repeat(2)[_Hello_ world!]

                                @double[_Hello_ world!]
                            

Hello world! Hello world!

Hello world! Hello world!

Hello world! Hello world!

Mechanisms of extension

  • User-supplied functions
  • User-supplied transformations

                            from collections.abc import Sequence

                            # Node and Word are Formatinator node types.
                            def foxes_are_dogs(node: Node, **_) -> Sequence[Node]:
                                if isinstance(node, Word) and node.text == "fox":
                                    return [Word("dog", loc=node.loc)]
                                else:
                                    return [node.transform_children(foxes_are_dogs)]
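
To try the transformation above without Formatinator installed, the following is a minimal sketch built on stand-in classes. `Node`, `Word`, `Paragraph`, the integer `loc`, and the helper `words` are hypothetical simplifications written for this example, not Formatinator's actual types; only the body of `foxes_are_dogs` follows the slide.

```python
from __future__ import annotations

from collections.abc import Callable, Sequence

Transformation = Callable[..., Sequence["Node"]]


class Node:
    """Hypothetical base node; `loc` stands in for a source location."""

    def __init__(self, loc: int = 0) -> None:
        self.loc = loc

    def transform_children(self, fn: Transformation) -> Node:
        # Leaf nodes have no children, so they are returned unchanged.
        return self


class Word(Node):
    def __init__(self, text: str, loc: int = 0) -> None:
        super().__init__(loc)
        self.text = text


class Paragraph(Node):
    def __init__(self, children: Sequence[Node], loc: int = 0) -> None:
        super().__init__(loc)
        self.children = tuple(children)

    def transform_children(self, fn: Transformation) -> Paragraph:
        # Each child may expand to zero or more nodes, so results are flattened.
        new_children = [out for child in self.children for out in fn(child)]
        return Paragraph(new_children, loc=self.loc)


def foxes_are_dogs(node: Node, **_) -> Sequence[Node]:
    if isinstance(node, Word) and node.text == "fox":
        return [Word("dog", loc=node.loc)]
    return [node.transform_children(foxes_are_dogs)]


def words(node: Node) -> list[str]:
    """Collect the text of all Word nodes, left to right."""
    if isinstance(node, Word):
        return [node.text]
    return [w for child in getattr(node, "children", ()) for w in words(child)]


doc = Paragraph([Word("quick", loc=1), Word("fox", loc=2)])
(result,) = foxes_are_dogs(doc)
print(words(result))  # ['quick', 'dog']
```

The transformation returns a sequence so that one node can be replaced by zero, one, or several nodes. In this sketch the input tree `doc` is left untouched.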
                        

Implementation

Tree-Sitter

  • Parser generator
  • Reads context-free grammar
  • Generates C parser with bindings
  • GLR parser with context-aware lexing
  • Additional external lexer written in C

Internal representation

  • Abstract Syntax Tree
  • Path copying algorithm preserves
    results of transformations
  • Base types for nodes
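
As an illustration of path copying in general, the sketch below replaces one node in an immutable tree. It copies only the ancestors on the path to that node and shares every other subtree with the old version. The `Tree` class and `replace_at` function are hypothetical names made up for this example; Formatinator's node types and API may look different.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Tree:
    """Hypothetical immutable node: a label plus a tuple of children."""

    label: str
    children: tuple[Tree, ...] = ()


def replace_at(root: Tree, path: tuple[int, ...], new: Tree) -> Tree:
    """Return a new root in which the node reached by `path` is `new`."""
    if not path:
        return new
    index, rest = path[0], path[1:]
    updated = replace_at(root.children[index], rest, new)
    # Copy only this ancestor; its other children are reused by reference.
    children = root.children[:index] + (updated,) + root.children[index + 1:]
    return replace(root, children=children)


old = Tree("doc", (Tree("para", (Tree("fox"),)), Tree("para", (Tree("cat"),))))
new = replace_at(old, (0, 0), Tree("dog"))

print(new.children[0].children[0].label)   # dog
print(old.children[0].children[0].label)   # fox
print(new.children[1] is old.children[1])  # True
```

Because the old root stays valid, both the tree before a transformation and the tree after it remain available.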

Transformations

  • Functions with the right signature
  • Or classes that define __call__
  • Output generation is itself implemented
    as a transformation
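
A class-based transformation is useful when the transformation needs configuration or state. The sketch below assumes the same call shape as the function-based example, `(node, **_) -> Sequence[Node]`. The `Word` stand-in and the `ReplaceWords` class are hypothetical and exist only for this illustration.

```python
from __future__ import annotations

from collections.abc import Sequence


class Word:
    """Hypothetical leaf node holding one word."""

    def __init__(self, text: str) -> None:
        self.text = text


class ReplaceWords:
    """Callable transformation configured with a replacement table."""

    def __init__(self, table: dict[str, str]) -> None:
        self.table = table
        self.replaced = 0  # state kept across calls

    def __call__(self, node: Word, **_) -> Sequence[Word]:
        if node.text in self.table:
            self.replaced += 1
            return [Word(self.table[node.text])]
        return [node]


transform = ReplaceWords({"fox": "dog"})
words = [Word("quick"), Word("fox")]
result = [out for w in words for out in transform(w)]
print([w.text for w in result], transform.replaced)  # ['quick', 'dog'] 1
```

An instance of such a class can be passed anywhere a plain function is accepted, since both are called the same way.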

Test suite

  • Corpus-based
  • Self-updating
  • 100+ test cases
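
One way a corpus-based, self-updating test can work is sketched below. Each case file stores an input and the expected output; in update mode a failing case has its expectation rewritten from the actual output. The file layout with a `---` separator, the `render` stand-in, and `run_case` are assumptions made for this example and do not describe Formatinator's actual test format.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

SEPARATOR = "\n---\n"


def render(source: str) -> str:
    """Hypothetical stand-in for the system under test."""
    return source.upper()


def run_case(case: Path, update: bool = False) -> bool:
    """Compare render(input) with the stored expectation.

    Returns True on a match. On a mismatch, returns False and, if
    `update` is set, rewrites the stored expectation.
    """
    source, _, expected = case.read_text().partition(SEPARATOR)
    actual = render(source)
    if actual == expected:
        return True
    if update:
        case.write_text(f"{source}{SEPARATOR}{actual}")
    return False


with tempfile.TemporaryDirectory() as tmp:
    case = Path(tmp) / "hello.txt"
    case.write_text(f"hello{SEPARATOR}stale output")
    first = run_case(case, update=True)  # mismatch; expectation is rewritten
    second = run_case(case)              # matches the updated expectation

print(first, second)  # False True
```

Updated expectations should still be reviewed by hand (for example in a version-control diff) before they are accepted.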

Future work

  • Standard library
  • Global identifiers
  • Tools for editors

Goals

  • Capable markup language
    • Common elements
    • Arbitrary functions
  • Modularity
  • Extensibility