Atac Template Engine

Markus Jahn, University of Linz

Last update: April 03, 2012


Downloads | Data Structures | Output Generation

The Atac Template Engine generates text from tree-like data structures such as abstract syntax trees. It traverses a tree guided by templates, which provide the output for the tree nodes and link to potential child nodes. Additionally the user can implement handlers for text generation, in order to generate text for nodes, where the output cannot be specified declaratively in a template.


Basic Idea

To generate text from a tree-like data structure, the user has to specify a template for every node type in the tree. A template consists of a name that matches the name of the node type, and a body that provides the output for the node. The output is composed of static text and placeholders. During text generation, the placeholders get filled by the template engine, either by the output of a referenced child node, or by a value that is stored in a data field of the current node.

The example in Fig. 1 shows how the template engine uses templates to generate C# code from an abstract syntax tree (AST). The AST represents a declaration of a variable x1 of type Integer. The root of the AST is a VarDecl node that has references to a type node Integer and a name node VarId (note, that type could also point to some other type node such as Char or Boolean). The VarId node stores the variable name x1 in its data field value. As we have three different node types, we have to specify three templates for code generation. Each template starts with a template name, followed by a '='. The end of a template is marked with a '.' in a new line. In between, the template body provides the output with text and placeholders. For example, the template with the name VarDecl has a body that consists of the text "public ", followed by the placeholder <type>, the text " ", the placeholder <name>, and finally the text ";".

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
VarDecl =
public <type> <name>;
.

Integer =
int
.

VarId =
<value>
.
>>>
>>>
1
public int x1;
Fig. 1: Using templates to generate code from an abstract syntax tree

The template engine starts at the root node of the AST and generates output for the whole tree. It applies the matching template VarDecl to the root node VarDecl and composes the output of the AST by performing the following steps:

  Put the text "public ".
  Process the placeholder <type>:
      Check which node is referenced by type -> Integer.
      Apply the matching template Integer to the node Integer:
          Put the text "int".
  Put the text " ".
  Process the placeholder <name>:
      Check which node is referenced by name -> VarId.
      Apply the matching template VarId to the node VarId:
          Process the placeholder <value>:
              Check which node is referenced by value -> none.
              Check if data field value exists and is not null -> true.
              Put the data field value "x1".
  Put the text ";".

Thus, the template engine has generated the code for a C# variable declaration: public int x1;


Data Structures

The template engine supports different implementations of tree-like data structures, by allowing individual tree navigators. Users can implement their own tree navigator for their own tree implementation, however, the template engine already comes with tree navigators for common tree implementations.

This section shows a common tree implementation which can be passed to the template engine by using its default tree navigator. Every node type has to be implemented as an individual class (usually derived from a common base class). Nodes can have any number of child nodes and any number data fields. Both, child nodes and data fields, are implemented as fields or properties of a node class. If the type of a field is a node class the field represents a child node else it is an ordinary data field.

The left box in Fig. 2 shows a set of classes that is used for the implementation of the AST in the examples of this document. Class Node is the common base class for nodes. Class VarDecl derives from Node and has two fields: type and name. Their data types are subclasses of Node, so both fields are references to child nodes. Since Integer is a subclass of Type it can be referenced by type as well. Class VarId has a field value of type String and is therefore an ordinary data field.

The right box in Fig. 2 shows how the tree nodes can be instantiated, initialized, and connected in order to produce the AST shown in Fig. 1. Further information about building an AST during parsing can be obtained from the tutorial How to Build Abstract Syntax Trees with Coco/R. Furthermore, the Downloads section of this document provides some template specifications that generate executable C# code from the AST built in that tutorial.

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
abstract class Node { ... }

class VarDecl : Node {
  Type type;
  VarId name;
}

abstract class Type : Node { ... }

class Integer : Type { ... }

class VarId : Node {
  String value;
}
 
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
Type type = new Integer();
...

VarId varId = new VarId();
varId.value = "x1";
...

Node varDecl = new VarDecl();
varDecl.type = type;
varDecl.name = varId;
...
Fig. 2: Implementation of an abstract syntax tree with individual classes for different node types

Output Generation

Once we have built an AST and have written templates for all node types in the tree, we need to initiate the template engine to generate the output from these ingredients. Fig. 3 shows how to create an instance of the template engine. The constructor uses the generic type argument <Node> to specify the common base type of the nodes. The templates are set by the constructor's template file argument "Templates.txt". Next, we create a StreamWriter object, which writes the output to a file "Output.cs". Finally, we call Start and pass the writer and the root of the AST to the method.

 1
 2
 3
 4
 5
 6
 7
Node root = new VarDecl();
...

var templateEngine = new TemplateEngine<Node>("Templates.txt");
using(TextWriter writer = new StreamWriter("Ouput.cs")) {
  templateEngine.Start(writer, root)
}
Fig. 3: Starting the template engine

Template Handlers

Sometimes it is not possible to specify an output declaratively in a template. For such cases, the template engine allows the user to implement a custom template handler instead. A template handler is a method that is called during output generation in order to generate the node's output by code. The template handler gets a reference to the currently visited node as well as a reference to the template engine's code generator. A template handler is called if the name of the node type matches the name of the handler. If the user specifies both a template and a template handler the template engine just calls the template handler. However, the implementation of the template handler can instruct the code generator to generate the output from the template.

Fig. 4 shows a scenario that cannot be realized without a template handler. In C#, variable names can be equal to C# keywords if they are prefixed with '@'. In order to guarantee that the output of the VarDecl AST is valid C# code, we need to check if the variable name equals a C# keyword and possibly put a '@' prefix before the name. Thus, we implement a template handler for the node VarId. The template handler casts the passed node to VarId and checks if the name is a C# keyword. Since "event" is a C# keyword, the handler emits the prefix '@' by calling the code generator's method PutText. Calling PutTemplate and passing it the VarId node finally applies the specified VarId template to the node and emits the variable name "event".

 1
 2
 3
 4
 5
 6
 7
 8
 1
 2
 3
 4
 5
 6
 7
 8
 9
void VarId(Node n, CodeGenerator<Node> cg) {
  var varId = (VarId) n;
  if (IsKeyword(varId.value)) {
    cg.PutText("@");
  }
  cg.PutTemplate(varId);
}
...
VarDecl =
public <type> <name>;
.

VarId =
<value>
.

...
>>>
>>>
1
public int @event;
Fig. 4: Using template handlers combined with templates for code generation

All template handlers have to be implemented as void methods of a single template handlers class that can have an arbitrary name. In order to register the template handlers, the template engine provides several overloaded constructors. The user can register the template handlers by passing either a path to the source file of the template handlers class or by passing the template handlers class or an instance of it. Fig. 5 shows how the class TemplateHandlers can be passed to the template engine. Note, that this class must provide also a parameterless constructor.

 1
 2
 3
...
var templateEngine = new TemplateEngine<Node>("Templates.txt", typeof(TemplateHandlers));
...
Fig. 5: Registering template handlers to the template engine

Node Collections

In some tree data structures it is useful to reference a whole collection of child nodes instead of a single child node. In an AST, for example, this is often used to reference a sequence of statement nodes or a sequence of declaration nodes. The following VarDecl example uses a List<VarId> to reference a list of VarId child nodes, thus it can hold a list of variables with the same data type in a single VarDecl node. Fig. 6 shows how the VarDecl class should be changed for that purpose.

 1
 2
 3
 4
 5
 6
...
class VarDecl : Node {
  Type type;
  List<VarId> names;
}
...
Fig. 6: Referencing a list of variable names

The template engine processes a referenced node collection by repeatedly applying the matching template (or template handler) to every node in the collection and writing their individual output to the output stream. In order to put a separator between the node outputs, the user can modify the placeholder in the VarDecl template that references the child node collection. The placeholder can be augmented by a separator expression that consists of the keyword sep and a string representing the separator. Fig. 7 shows the modified VarDecl template, where the placeholder names specifies a separator expression that puts the string "" between every output of a VarId child node.

 1
 2
 3
 4
 5
VarDecl =
public <type> <names sep", ">;
.

...
>>>
>>>
1
public int x1, x2;
Fig. 7: Generating output of a node collection with a separator between the individual node outputs

Conditional Output

Sometimes users want to emit certain parts of a template only if optional child nodes exist or if certain data fields have non-null values. This can be done by attaching a string prefix and/or a string suffix to a placeholder. Such strings are only emitted if the node corresponding to the placeholder exists or if the data field corresponding to the placeholder has a non-null value. As an example, consider the initialization part of a variable declaration. The initialization part should only be generated if the declaration has an initialization expression. As shown in Fig. 8, the initialization expression can be represented in the AST by a child <expr> of the VarId node. In the VarId template the <expr> placeholder is prefixed with a string " = ", so this string and the code for expr are only emitted if expr references a node.

 1
 2
 3
 4
 5
 6
 7
...

VarId =
<value><" = " expr>
.

...
>>>
>>>
1
public int x1 = 23;
Fig. 8: Generating conditional output depending on the existence of a placeholder value

Alternative Output

In some cases it is useful to emit alternative output for an optional child node that does not exist or for a data field that has the value null. In order to do this, the user can define alternative output in a placeholder. The node MethodDecl in the AST in Fig. 9 has an optional child node type, which references the node of the method's return type. If the node does not exist the method has no return type. Thus, the MethodDecl template specifies in the placeholder <type> the alternative output "void" that is used if the type node is not available.

 1
 2
 3
 4
 5
 6
 7
...

MethodDecl =
<type | "void"> <name>(<pars sep", ">) <block>
.

...
>>>
>>>
 1
 2
 3
void Foo() {
  ...
}
Fig. 9: Generating alternative output depending on the existence of a placholder value

Template Links

Normally, the template engine matches the names of node types against the names of templates and template handlers. However, in order to allow users to reuse a template for different node types, the template engine supports so-called template links. A template link forces the template engine to apply a specific template for a node that is referenced by a placeholder. Fig. 10 shows two applications for template links:

In the MethodDecl template, the placeholder <pars> for the method's parameters uses the template link <pars:VarDecl> to force the template engine to use the template VarDecl for <pars>. Although <pars> references a collection of ParDecl nodes, the user does not need to specify a ParDecl template. Due to the template link, the engine applies the VarDecl template to every ParDecl node in the collection. Similarly, in the VarDecl template the template link <names:VarId> forces the engine to apply the VarId template to all nodes in the names collection, independent of their actual node type. Thus, the VarDecl template works with child nodes of types VarId and ParId.

The second example of a template link shows how to reuse a template from within other templates in order to generate common parts of these templates. In Fig. 10 every declaration node (ClassDecl, MethodDecl, VarDecl, ...) of the AST has a data field visibility with the possible values '+', '-', '#', or ' ' (none), which should be translated to the C# access modifiers public, private, protected, or none. Since the visibility is a feature of all declaration nodes, we use an auxiliary template handler Visibility and invoke it with the template link <this:Visibility> in every declaration template to generate the desired access modifiers.

 1
 2
 3
 4
 5
 6
 7
 8
 9
 10
 11
...

MethodDecl =
<this:Visibility><type | "void"> <name>(<pars:VarDecl sep", ">) <block>
.

VarDecl =
<this:Visibility><type> <names:VarId sep", ">;
.

...
  1
  2
  3
  4
  5
  6
  7
  8
  9
void Visibility(Node n, CodeGenerator<Node> cg) {
  var decl = (Declaration) n;
  switch (decl.visibility) {
    case '+': cg.PutText("public "); break;
    case '-': cg.PutText("private "); break;
    case '#': cg.PutText("protected "); break;
  }
}
...
>>>
>>>
 1
 2
 3
public void Foo(int p1) {
  ...
}
Fig. 10: Using template links to apply a specific template for various nodes

Output Format

In order to generate formatted output, the template engine considers spaces, tabs and line feeds in the templates. Furthermore, it maintains the indentation of the template in the generated output. This means, if a placeholder is indented and the output of a referenced node or data value has multiple lines, each line starts with the same indentation as the placeholder. To control how many line feeds are emitted before and after a template output, the user can specify the minimum number of preceding and succeeding line feeds in the template header. On applying these line feed specifications, the template engine takes into account how many line feeds are actually emitted to the output stream. Finally, the template engine removes empty lines, caused by a missing child node that is referenced by a placeholder in a single line.

Fig. 11 shows an example for formatted C# code produced by the template engine. The template UnitDecl for the root node in the AST has a placeholder <usings> for optional unit imports. Since the actual AST has no unit imports, the output of the AST would start with an empty line. However, the template engine removes empty lines caused by empty placeholders and starts the output with the declaration of the namespace "Test".

The indented placeholder <decls> references the ClassDecl node "Prog", whose output is emitted to the output stream using the same indentation as the placeholder. The ClassDecl template specifies that its output should start after two preceding line feeds. The first line feed is already emitted to the output stream by the line feed in the UnitDecl template in the end of line 3. Therefore, the template engine only puts one more line feed before emitting the class declaration to the output stream. The template MethodDecl specifies two preceding and two succeeding line feeds, and the template Block specifies one succeeding line feed. Although, the output of the Block template is at the end of the MethodDecl template and specifies a succeeding line feed, the template engine emits just two succeeding line feeds in total after the output of the method declaration.

 1
 2
 3
 4
 5
 6
 7
 8
 9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
UnitDecl =
<usings>
namespace <name> {
  <decls>
}
.

ClassDecl[2] =
class <name><" : " base> {
  <decls>
}
.

MethodDecl[2,2] =
<type | "void"> <name>(<pars sep", ">) <block>
.

Block[,1] =
{
  ...
}
.

...
>>>
>>>
 1
 2
 3
 4
 5
 6
 7
 8
 9
 10
namespace Test {

  class Prog : Obj {

    int Main() {
      ...
    }

  }
}
Fig. 11: Generating formatted output by considering indentations and line feeds


Downloads

Source (C#)
Binary (.NET)