cmTemplate

A Template-Based Content Generator for Python

by Chris Monson: shiblon@yahoo.com


Abstract

The cmTemplate module was originally designed to help writers of CGI programs keep HTML tags out of their code. It is a template parsing and realization engine that allows coders to focus on logic, and designers to focus on format. It sports a very expressive and powerful syntax that is reminiscent of (and borrowed heavily from) Python itself, which we all know is the best language around :-).

Templates have special PHP-style tags embedded in them. These tags are processed by the cmTemplate engine, and are replaced with appropriate text. Simple text replacement, loops, file inclusion, if-elif-else, and other such structures are available in the template language, as well as the ability to execute arbitrary Python code.

cmTemplate was written with performance in mind. It parses files only when necessary, and is lightning fast at producing output from a parsed file. The memory footprint is minimal once the file has been parsed. It can and will play very nicely with things like FastCGI and the various mod_python variants that exist in the wild today.

Tutorial

In order to use the template engine, you will need three things:

  1. The cmTemplate module (check)
  2. A template file
  3. Some Python code to realize the output

Examples seem to be the best way to introduce new concepts, so here is an example. First we have a template file (save this as 'ex1.ctpl'):

    <?=comment ex1.ctpl ?>
    \<?=echo 'stuff' \?>
    \\<?=echo 'stuff' ?>

    \\\<?=echo 'stuff' \\\?>

    Var1 is '<?=echo var1 ?>'

    <?=for v in var2:?>
    v is <?=echo v ?>

    <?=endfor?>
Next, we have some Python code. In this code, we set variables (you can see them in the template above: var1 and var2) and then we instruct the template to display itself using these variables:
    import cmTemplate
    import sys

    # Load and parse the template file.
    tpl = cmTemplate.Template('ex1.ctpl')

    # Create a namespace for it so that we can set variables
    t = tpl.new_namespace()

    # Set variables in the template
    t.var1 = 'hello'
    t.var2 = [1,2,3,4,5]

    # Finally, output the template
    t.output(sys.stdout)
Run this little program, and you will find that the output looks like this:
    <?=echo 'stuff' ?>
    \stuff
    \<?=echo 'stuff' \?>

    Var1 is 'hello'

    v is 1
    v is 2
    v is 3
    v is 4
    v is 5

So, what did we do here? First of all, we created a template. This template described the format of the output, but not the actual values that would be displayed. It used a special syntax to describe where and how those values would be displayed, but not the values themselves.

Let's take a closer look at the template:

    <?=comment ex1.ctpl ?>
Notice that the comment didn't appear anywhere in the output. That's because it's a comment. The entire comment tag, from <?= to ?>, including the newline that follows it, was consumed and discarded by the template parser. This brings up an important point: whenever the parser encounters a special tag, the entire tag is consumed by the parser. The tag is replaced, when appropriate, with values specified by the Python program that realizes the template. In this case, we are dealing with a comment, which makes use of no variables, so it is simply discarded.

Note also that there was no newline in the output where the comment appeared. This brings up another important point: if an ending tag symbol (?>) is followed directly by a newline, the newline is also consumed. If followed by more than one newline, only the first will be consumed.
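
The newline rule can be illustrated with a small regular expression. This is only a sketch of the behavior described above, written for Python 3, and not the parser that cmTemplate actually uses. The COMMENT_TAG pattern and the strip_comments helper are hypothetical names, and the sketch ignores escaped symbols entirely.

```python
import re

# Hypothetical pattern: a comment tag plus at most ONE trailing newline.
COMMENT_TAG = re.compile(r"<\?=comment\b.*?\?>(?:\r\n|\n|\r)?", re.DOTALL)


def strip_comments(template_text):
    """Remove comment tags and the single newline directly after each one."""
    return COMMENT_TAG.sub("", template_text)


# The first tag is followed by one newline, the second by two.
sample = "<?=comment one ?>\nfirst\n<?=comment two ?>\n\nsecond\n"
result = strip_comments(sample)
print(repr(result))
```

Only the first of the two newlines after the second comment disappears, so a blank line survives between the two words.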

You may wonder why I did it this way. The answer is simple: experience. I have used template engines in the past that did not have this behavior, and it became very ugly to try to get the desired formatting out of them. In fact, in some cases (as in those of loops), it was nearly impossible to achieve the desired output formatting. PHP uses this concept, and I have found that it works very well, and gives the template designer complete control over the output while leaving the template syntax fairly readable (as Python users, we know how important whitespace can be when aiming for readability).

The next few lines are there to illustrate a point about escaped tags:

    \<?=echo 'stuff' \?>
    \\<?=echo 'stuff' ?>

    \\\<?=echo 'stuff' \\\?>
If you want to display the text of a starting or ending tag symbol in your output, you need to precede that symbol with a backslash. But, what if you want to display a single backslash and then the contents of a variable? What if you want to display the text of a symbol preceded by a backslash?

Note that I showed examples of 1, 2 and 3 backslashes preceding start and end symbols. Here is how they behave (by number of preceding backslashes):

  1. \<?= The text of the symbol is sent to the output.
  2. \\<?= Display one backslash, then process the tag.
  3. \\\<?= Display one backslash, followed by the symbol itself.
You can see this behavior in the output of the program.
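
One way to remember the three cases is that backslashes pair up, and a leftover backslash escapes the symbol. The sketch below encodes that reading in Python 3. The resolve_escape function is a hypothetical helper, and extending the rule beyond three backslashes is an assumption; only the first three cases are documented here.

```python
def resolve_escape(backslash_count):
    """Return (backslashes to emit, whether the symbol is literal text).

    Assumption: each pair of backslashes yields one backslash in the
    output, and an odd one left over escapes the symbol that follows.
    """
    emitted = "\\" * (backslash_count // 2)
    symbol_is_literal = backslash_count % 2 == 1
    return emitted, symbol_is_literal


for count in (1, 2, 3):
    print(count, resolve_escape(count))
```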

The rest of the template code, now that we have those basics out of the way, should be easy to explain. Let's go on to the next line:

    Var1 is '<?=echo var1 ?>'
When the template is realized (the output function does this), anything that is not inside of a special tag is printed just as it is. Thus, no matter what values the var1 and var2 variables take on in the Python code, the output will contain the string "Var1 is '".

Notice that after that string is a special tag: <?=echo expression ?>. This special tag is always replaced with the value of expression. For example, you could print out 'hello' by using the following tag: <?=echo 'hello' ?>. In this particular case, we were interested in printing the contents of var1, which is what happened.

Notice that the character after ?> in this case is a single quote, not a newline. That's why a blank line is printed following the 'Var1 is' line. If we had not surrounded the special tag in single quotes, the newline after the tag would have been consumed and there would be no blank line after it.

The next construct is a for loop:

    <?=for v in var2:?>
    v is <?=echo v ?>

    <?=endfor?>
This behaves exactly as a Python for loop. The variable 'v' is a temporary created for the loop, and is set to each of the elements of var2, just like in Python.

What's different is the fact that this requires an endfor tag. Since we are dealing with text-based templates that are used in HTML and other unspecified formats, I couldn't really make the template engine use whitespace as a delimiter. Sorry about that. I would have loved to do so in keeping with Python's great tradition. Alas, we don't live in a perfect world :-).

Another notable difference is that the for loop contains text, not code. Remember, anything that is not inside of a special tag is simply sent to the output. In this case, however, the innards of the for loop are sent to the output as many times as the loop runs. Notice that there is an echo tag inside of the for loop. It has access to the local variable 'v'. It should be obvious from looking at the output how this works.

Finally, notice that there is a blank line below the 'v is' line in the template, but that blank lines do not appear between the lines of the output! This is because the newline after the echo tag was consumed, so the blank line supplies the only newline that each pass through the loop emits. Remember that? Good. It can bite you at times. It has bitten me in the past.
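
As a rough picture of what realizing this loop amounts to, here is a hand-written Python 3 equivalent. It is not the code that cmTemplate generates; the realize_loop function is a hypothetical stand-in that only mirrors the order in which text reaches the writer.

```python
import io


def realize_loop(var2, writer):
    """Hand-written stand-in for the for/endfor block in ex1.ctpl."""
    for v in var2:
        writer.write("v is ")  # literal text before the echo tag
        writer.write(str(v))   # the echo tag itself
        writer.write("\n")     # the blank template line; the newline
                               # right after ?> was consumed


buffer = io.StringIO()
realize_loop([1, 2, 3, 4, 5], buffer)
print(buffer.getvalue(), end="")
```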

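Well, with that, you are ready to learn about all of the wonders of cmTemplate! The syntax has many more constructs that will be of interest to you, notably file inclusion (inc and rawinc), template functions (def and call), conditionals (if, elif, and else), while loops, and arbitrary code execution (exec).

All of these will be explained in detail in the reference section, coming up next.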

Reference: Template Syntax

echo

The echo tag is the most basic, fundamental construct in the template engine. Most template engines out there provide only this functionality.

The echo construct uses the following tags:

        <?=echo expression ?>
        <?= expression ?>
        

These two tags are basically the same thing. The latter is merely a shortcut notation for the former. Both will be replaced by the value of 'expression'. The expression can be any valid, single-line Python expression.

comment

The comment tag gives the template designer an opportunity to insert comments in the template that will never see the light of day in the final output. The comment tag and all of its contents are simply discarded by the parser.

        <?=comment This is a comment ?>
        <?=comment
            They can also span multiple lines.
            Like this
        ?>
        
inc
rawinc

The inc tag is replaced with the contents of another template wherever it is found. The included template is parsed just like the main template, and any includes that it has are also parsed. Recursive includes are detected and disallowed.

The rawinc tag is similar to the inc tag, except that the contents of the file are sent to the output exactly as they are, without any processing. It includes the file "raw". Even if a raw file contains special template tags, they will not be processed. They will be treated as though they were escaped tags.

The tags for these are as follows:

        <?=inc 'filename' ?>
        <?=rawinc "filename" ?>
        

Quotes are required, but they may be either single or double quotes. The filename can be absolute or relative. If it is relative, it will be searched for in the current directory, followed by the path indicated when the template object is created. If it is not found, an exception will be raised.

NOTE: The filename must be a string literal. The template engine cannot interpret Python expressions in include constructs, as they are evaluated at compile time (more on template compilation later).
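
The search order described above can be sketched in a few lines of Python 3. This is an illustration under stated assumptions, not the module's own lookup code: find_include is a hypothetical helper, and raising IOError is a guess, since the text only says that an exception is raised.

```python
import os
import tempfile


def find_include(filename, path=()):
    """Look in the current directory first, then in each path entry."""
    if os.path.isabs(filename):
        candidates = [filename]
    else:
        search_dirs = (os.curdir,) + tuple(path)
        candidates = [os.path.join(d, filename) for d in search_dirs]
    for candidate in candidates:
        if os.path.isfile(candidate):
            return os.path.abspath(candidate)
    # Assumption: the real exception type is not documented.
    raise IOError("include file not found: %r" % (filename,))


# Small demonstration using a throwaway directory as the search path.
with tempfile.TemporaryDirectory() as include_dir:
    header = os.path.join(include_dir, "cmtemplate_demo_header.ctpl")
    with open(header, "w") as handle:
        handle.write("<?=comment header ?>\n")
    found = find_include("cmtemplate_demo_header.ctpl", path=[include_dir])
    found_ok = os.path.samefile(found, header)
    try:
        find_include("cmtemplate_demo_missing.ctpl", path=[include_dir])
        missing_raised = False
    except IOError:
        missing_raised = True
```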

def
call

The def construct allows you to create a 'template function'. This function can be called using the call tag. It is useful to think of the def as a template file that can be passed parameters, and the call tag as a way of including that file.

The tags used for these constructs follow:

        <?=def defname( arg1, arg2, ... ):?>
        <?=enddef?>
        <?=call defname( expr1, expr2, ... )?>
        

The def construct is like a mini-template that you can define anywhere inside of any template. The def and enddef tags and all content in between them will be removed when your template is processed. Wherever a call tag appears, the mini-template will be processed and the resulting text will appear there.

An example would be instructive:

Here is a template:

        <?=def mini( a, b ):?>
        A = <?=echo a?>

        B = <?=echo b?>

        <?=enddef?>
        <?=call mini( 1, 2 )?>

        <?=call mini( 3, 4 )?>

        <?=call mini( 'hello', 'there' )?>
        
Here is the output:
        A = 1
        B = 2

        A = 3
        B = 4

        A = hello
        B = there
        
if
elif
else

These behave exactly as you would think they should. if and elif can take expressions that, if they evaluate to true, will cause the block to output. As in normal Python, the elif and else tags are optional.

Here is an example of a complete if construct:
        <?=if var1:?>
        'struth!
        <?=elif var2:?>
        This is an ex-parrot!
        <?=else:?>
        It's getting-hit-on-the head lessons in here.
        <?=endif?>
        

As you would expect, the output of this construct will depend on the values of var1 and var2. If var1 is true, then the output will be

        'struth!
        
If var1 is false and var2 is true, then the output will be
        This is an ex-parrot!
        
And, of course, if they are both false, then the last one is displayed:
        It's getting-hit-on-the head lessons in here.
        

for
else
break
continue
for_count( depth=0 )
for_list( depth=0 )
for_index( depth=0 )
for_is_first( depth=0 )
for_is_last( depth=0 )

The for construct works just like a Python 'for' loop. Just like Python's 'for' loop, it can have an else tag, which will output if the loop exits normally. Two other tags are also allowed: break and continue. These behave exactly the same as their Python counterparts.

An example:
        <?=for x in range(1,10):?>
        X is <?= x ?>

        <?=else:?><?=comment This is optional ?>
        <?=endfor?>
        
And the output would be
        X is 1
        X is 2
        X is 3
        X is 4
        X is 5
        X is 6
        X is 7
        X is 8
        X is 9
        
Note that you can use any list expression in the for loop.

Finally, while inside of a for loop, you have access to several functions, all of which can take an optional depth parameter. The functions all start with for_:

for_count( depth=0 ): The size of the iterated sequence. This may not work with iterators.
for_list( depth=0 ): The list expression used in the loop. May not work with iterators or xrange objects.
for_index( depth=0 ): The current position in the loop, as though it started counting at 0. Always works.
for_is_first( depth=0 ): Returns true if this is the first time through the loop. May not work with iterators.
for_is_last( depth=0 ): Returns true if this is the last time through the loop. May not work with iterators.
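
To make the depth parameter concrete, here is one way such helpers could be built on a stack of loop frames. This is a Python 3 sketch, not cmTemplate's implementation. The LoopStack class is hypothetical, and it assumes that depth=0 means the innermost loop and depth=1 the loop enclosing it, which the descriptions above do not state. Unlike the real helpers, it copies the sequence into a list so that the size is always known.

```python
class LoopStack:
    """Tracks nested loops; depth=0 is ASSUMED to be the innermost one."""

    def __init__(self):
        self._frames = []

    def iterate(self, sequence):
        items = list(sequence)  # copy so the length is always available
        frame = {"list": items, "index": 0}
        self._frames.append(frame)
        try:
            for index, item in enumerate(items):
                frame["index"] = index
                yield item
        finally:
            self._frames.pop()

    def _frame(self, depth):
        return self._frames[-1 - depth]

    def for_count(self, depth=0):
        return len(self._frame(depth)["list"])

    def for_index(self, depth=0):
        return self._frame(depth)["index"]

    def for_is_first(self, depth=0):
        return self.for_index(depth) == 0

    def for_is_last(self, depth=0):
        return self.for_index(depth) == self.for_count(depth) - 1


loops = LoopStack()
seen = []
for outer in loops.iterate("ab"):
    for inner in loops.iterate([10, 20, 30]):
        seen.append((outer, inner, loops.for_index(),
                     loops.for_index(1), loops.for_is_last()))
```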

while
else
break
continue

This is of dubious utility, since it is somewhat difficult to make the expression in the while tag become false. However, it exists to make things like generating tables from databases easier. You are allowed to set function variables in the template's namespace, allowing you to effectively create a callback so that you don't have to iterate over an in-memory list of items. This can help with memory efficiency as the template is realized.

As is the case for the for construct, the while construct can have an optional else block and has access to the break and continue tags.

        <?=while not_finished():?>
        <?=echo get_next_item() ?>
        <?=endwhile?>
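
On the Python side, the two callbacks used in that template could be built around any iterator, such as a database cursor. The sketch below is in Python 3 and uses a plain generator in place of a cursor. The make_callbacks helper is hypothetical; only the idea of setting functions as namespace variables comes from the text above.

```python
def make_callbacks(iterable):
    """Build not_finished/get_next_item closures over one iterator."""
    iterator = iter(iterable)
    sentinel = object()
    # Look one item ahead so that not_finished() can answer truthfully.
    state = {"next": next(iterator, sentinel)}

    def not_finished():
        return state["next"] is not sentinel

    def get_next_item():
        item = state["next"]
        state["next"] = next(iterator, sentinel)
        return item

    return not_finished, get_next_item


not_finished, get_next_item = make_callbacks(
    row for row in ["row 1", "row 2", "row 3"])

# Stand-in for the template's while loop.
collected = []
while not_finished():
    collected.append(get_next_item())
```

With a real namespace t, the functions would be attached as t.not_finished = not_finished and t.get_next_item = get_next_item before calling output.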
        
exec

This allows you to execute arbitrary Python code. Beware of the scope, as it may not be what you think it is. I try to stay away from this construct as much as possible, since the whole idea of the template engine is to separate code from text in a clean way. It is provided for the sake of completeness.

An example:
        <?=exec
        import sys
        sys.stderr.write('This is a debug message\n')
        ?>
        

That will print a debug message to stderr, which is not something you can do in any other way from a template.

That wraps up the syntactical details.

Reference: Template Objects

There is a lot more of this story to be told. The engine has some interesting features that aren't obvious on the surface. Here we discuss template object creation, functions that help us to check file freshness, and viewing generated code, among others.

Template Methods

When creating a template, it is important to note that an instance of cmTemplate.Template represents one template file. When creating a template object, you specify the filename, and you never change it thereafter. It is rarely necessary to have more than one object with the same filename, since you can create as many template namespaces as you like from that object.

In fact, it is useful to think of a cmTemplate.Template instance as the template's class. It defines how the template will work when a namespace is created for it. The namespace in this case is analogous to an instance of a class. The namespace allows you to set variables and call functions on it. The template just allows you to create similar namespaces based on a particular file.

With that, let's get to the meat of this section:

Template( filename, path=None, keep_code=0 )

The Template constructor takes a filename as its first argument. This is the minimum requirement. Two keyword arguments are also available (they aren't strictly keyword arguments, but I like to treat them that way to increase readability): path and keep_code.

path: A list or tuple of directory names, either absolute or relative to the current working directory. Defaults to the current working directory.
keep_code: A true value indicates that the generated Python code should not be discarded once parsing is complete. Defaults to false.

new_namespace( )

As previously explained, this creates a new namespace for the template. It can also be viewed as an instantiation of the template.

Note that you can call this as many times as you like to get different instantiations of the template, all with their own namespaces.

        import sys
        import cmTemplate

        tpl = cmTemplate.Template('somefile.ctpl')

        # create an "instance" of this template
        n1 = tpl.new_namespace()
        n1.a = 'hello'

        # create another instance of this template
        n2 = tpl.new_namespace()
        n2.a = 'good-bye'

        # Now, output them:
        n1.output(sys.stdout)
        n2.output(sys.stdout)
        
Each of the two output calls produces different values, since the namespaces are completely separate.

is_root_modified( )

Returns true if the root template file has been modified since it was last parsed.

is_include_modified( )

Returns true if any of the files that this template includes have been modified since the template was last parsed.

is_modified( )

Returns true if the template or any of its included files have been modified since the last time they were parsed.

reload( )

The reload method simply reparses and recompiles the template. This can be called in case the template or any of its included files have changed. A common idiom would be something like this:

        if template.is_modified():
            template.reload()
        

This function and the modified tests are provided so that you can check a file (cheap) before reparsing it (expensive). If you were using something like FastCGI to display script output, you would probably do something like this:

        import sys
        import cmTemplate

        def initialization():
            global tpl
            # This is expensive and should only be done once
            tpl = cmTemplate.Template('mytemplate.ctpl', path=['path/1'])

        def main_loop():
            # Check to see if the template has been modified, and if it has,
            # reparse it (expensive, but only done if modified).
            if tpl.is_modified():
                tpl.reload()

            # This is a very cheap operation, and can be done each time.
            # In fact, it SHOULD be done each time, since you usually don't
            # want stale values from the last execution to remain in memory
            # (they could be passwords, etc).

            t = tpl.new_namespace()

            t.var1 = 'hello'

            t.output( sys.stdout )
        
If you are using FastCGI, you would call initialization once at startup, then call main_loop inside of your request loop. If you are using one of the mod_python variants, you'll have to figure out how to call the initialization function once, then call the main_loop function once per request after that.

filename( )

Returns the name of the file that was parsed. This cannot be changed once the object has been instantiated.

filetime( )

Returns the mtime of the file at the time it was parsed.

includes( )

Returns a dictionary of the included files. The key is the absolute filename, and the value is the mtime of the file when it was parsed.
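
A dictionary of that shape is all that is needed for a cheap freshness test. The Python 3 sketch below shows the idea with os.path.getmtime. It is not the module's own code, and any_modified is a hypothetical helper name.

```python
import os
import tempfile


def any_modified(recorded):
    """recorded maps absolute filename -> mtime seen at parse time."""
    for filename, parsed_mtime in recorded.items():
        if os.path.getmtime(filename) != parsed_mtime:
            return True
    return False


with tempfile.TemporaryDirectory() as workdir:
    filename = os.path.join(workdir, "inc.ctpl")
    with open(filename, "w") as handle:
        handle.write("some text\n")
    recorded = {filename: os.path.getmtime(filename)}
    fresh = any_modified(recorded)

    # Simulate a later edit by moving the mtime forward ten seconds.
    later = recorded[filename] + 10
    os.utime(filename, (later, later))
    stale = any_modified(recorded)
```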

python_code( )

Returns the Python code (text) that was generated from parsing the template. This is only available if keep_code=1 was passed into the constructor.

remove_python_code( )

Frees up the memory used to keep the Python code around. This allows you to pass keep_code=1 to the constructor, and then delete the code after viewing it. It does not affect the template's operation, since the template keeps a compiled code object around.

python_code_object( )

Returns a code object that was obtained by calling the compile function on the generated Python code. This code object is used to generate new namespaces. It cannot be deleted.

Reference: Namespaces

Remember, the template object is like a template class, and the namespace is like an instance of that class. The template object provides the instructions for template behavior in the abstract. The template namespace allows us to specify that behavior exactly by setting variables and calling methods.

Namespace Methods

The namespace is an object of type cmTemplate.Template.Namespace. This is how it is obtained:

    class __Namespace: pass

    def new_namespace( self ):
        n = self.__Namespace()
        exec self._code_object in n.__dict__

        return n
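
The snippet above uses the Python 2 exec statement. For readers on Python 3, where exec is a function, the same idea can be sketched as follows. The generated source in this example is invented purely for illustration and does not resemble what cmTemplate really emits; Namespace, GENERATED_SOURCE, and CODE_OBJECT are hypothetical names.

```python
import io

# Made-up "generated" code; it reads var1 from its global scope.
GENERATED_SOURCE = '''
def output(writer):
    writer.write("Var1 is '%s'\\n" % (var1,))
'''
CODE_OBJECT = compile(GENERATED_SOURCE, "<template>", "exec")


class Namespace(object):
    pass


def new_namespace():
    n = Namespace()
    # Run the compiled code with the instance dictionary as its globals,
    # so output() and any variables set later share one namespace.
    exec(CODE_OBJECT, n.__dict__)
    return n


n1 = new_namespace()
n1.var1 = "hello"
n2 = new_namespace()
n2.var1 = "good-bye"

buf1, buf2 = io.StringIO(), io.StringIO()
n1.output(buf1)
n2.output(buf2)
```

Because each namespace object has its own dictionary, setting var1 on one has no effect on the other.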

The compiled code has some methods that are worth talking about. These get imported into the namespace object's own namespace, which makes them callable through that namespace.

output( writer )

This function causes the template to do its thing. This is where it actually runs the code that was generated from the template. Depending on what variables have been set, this will do different things, as described in the tutorial above.

The writer object that must be passed in can be any object with a write method. A file object will do, such as sys.stdout. You may also define your own objects to have their own behavior. Anytime the template code decides that it wants to offload some generated text, the write method of the writer is called.
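
For instance, a writer can be any small class of your own. The Python 3 sketch below defines a hypothetical CountingWriter that collects the text and counts how often write is called. It is fed by hand here, since the example does not load a real template.

```python
class CountingWriter:
    """Minimal writer: anything with a write(text) method will do."""

    def __init__(self):
        self.chunks = []
        self.calls = 0

    def write(self, text):
        self.calls += 1
        self.chunks.append(text)

    def getvalue(self):
        return "".join(self.chunks)


writer = CountingWriter()
# Stand-in for the pieces of text a template might hand over.
for chunk in ("Var1 is '", "hello", "'\n"):
    writer.write(chunk)
```

With a real namespace t, the same object would simply be passed as t.output(writer).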

output_str( )

This function calls the output function with a default writer. The default writer simply collects all of the output into a single string, and then returns that string. Thus, you can do the following:

        tpl = cmTemplate.Template( 'ex1.ctpl' )
        t = tpl.new_namespace()

        # Other stuff goes here
        print t.output_str()
        

Parsing and compilation:

Templates are broken into two basic sections: text and tags. Text is sent to the output, provided that the surrounding tags evaluate correctly (text inside of an 'if' block, for example, will only be sent to the output if the expression is true).
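
The split into text and tags can be pictured with a short tokenizer. This Python 3 sketch is deliberately simplified and is not the real parser: it ignores escaped symbols, and the TAG pattern and split_template function are hypothetical names.

```python
import re

# A tag runs from <?= to ?> and takes one following newline with it.
TAG = re.compile(r"<\?=(.*?)\?>(?:\r\n|\n|\r)?", re.DOTALL)


def split_template(text):
    """Return a list of ("text", ...) and ("tag", ...) pairs."""
    parts = []
    position = 0
    for match in TAG.finditer(text):
        if match.start() > position:
            parts.append(("text", text[position:match.start()]))
        parts.append(("tag", match.group(1).strip()))
        position = match.end()
    if position < len(text):
        parts.append(("text", text[position:]))
    return parts


loop_parts = split_template(
    "<?=for v in var2:?>\nv is <?=echo v ?>\n\n<?=endfor?>\n")
for part in loop_parts:
    print(part)
```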

Each template is parsed, and then compiled into Python code. If you are curious as to how that code looks, you can do the following:

    tpl = cmTemplate.Template( 'mytemplate.ctpl', keep_code=1 )
    print tpl.python_code()

When a special tag is encountered, Python code to represent that construct is generated. Two of these tags behave in a very different way, however: inc, and rawinc. If an inc or rawinc tag is encountered, the parser saves its state and immediately begins to process that file (it's not quite that simple, but close enough). Hence, the inc or rawinc filename is processed at compile time, making it different from the other constructs, which are processed during the execution of the output function.

There are several reasons for this behavior. One of the most important reasons is efficiency in template output. If the include files were evaluated at runtime, it could potentially be very expensive to output a template. This is simply unacceptable. So, the templates have a static relationship to all of their included files, and that relationship is realized during the compilation phase.

Another benefit of this is that all of the template defs can be placed into the global scope of a template namespace, regardless of what include file they came from. Doing that required some knowledge of the overall structure, which included the defs in included files.

Finally, the test for recursion can be done during compile time, further eliminating potentially expensive tests at runtime. All of these factors contribute to a template that is both fast to realize and intuitive to develop.
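
A compile-time recursion test of this kind can be as simple as remembering which files are currently being processed. The Python 3 sketch below works on a plain dictionary of include relationships rather than real files. The check_includes function and the use of ValueError are assumptions made for the example, not the module's actual behavior.

```python
def check_includes(name, includes_of, active=()):
    """Walk the include graph; fail if a file includes itself indirectly."""
    if name in active:
        chain = " -> ".join(active + (name,))
        raise ValueError("recursive include: " + chain)
    for child in includes_of.get(name, ()):
        check_includes(child, includes_of, active + (name,))


# A harmless include tree passes silently.
acyclic = {
    "page.ctpl": ["header.ctpl", "footer.ctpl"],
    "header.ctpl": ["logo.ctpl"],
}
check_includes("page.ctpl", acyclic)

# Two files that include each other are rejected.
cyclic = {"a.ctpl": ["b.ctpl"], "b.ctpl": ["a.ctpl"]}
try:
    check_includes("a.ctpl", cyclic)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```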

Once the Python code (text) is generated from the template, it is then sent through the 'compile' function, resulting in a Python code object. The text is usually discarded after that to save memory, but, as in the above example, you can instruct the parser to keep it around for your viewing pleasure. This is often useful for debugging purposes.

This next section will make a lot more sense if you print out the generated Python code for a template and look at it from top to bottom.

The generated template code is organized as follows:

So, that's the Python code generated from our template. What is done with it? It is compiled using the Python compile function. At that point, the generated text is discarded to save memory.

There's really not much more to tell. The final section of this document details the template syntax in a formal way. You can probably skip that section unless you are interested in writing your own template engine with similar syntax.

Reference: Formal Syntax Definition

Note that I have neglected to add the formal rules for escaping. I will get around to that someday....

template :==
text block template
| NULL
 
text :==
ANY_CHAR_LITERAL
| NULL
block :==
if_block
| for_block
| while_block
| def_block
| comment_tag
| echo_tag
| call_tag
| inc_tag
| rawinc_tag
| exec_tag
| break_tag
| continue_tag
| NULL
 
if_block :==
if_tag template [ elif_tag template ]* [ else_tag template ]? endif_tag
for_block :==
for_tag template [ else_tag template ]? endfor_tag
while_block :==
while_tag template [ else_tag template ]? endwhile_tag
def_block :==
def_tag template enddef_tag
comment_tag :==
START_SYMBOL OP_COMMENT WS TEXT WS? end_symbol
echo_tag :==
START_SYMBOL OP_ECHO WS simple_expr WS? end_symbol
if_tag :==
START_SYMBOL OP_IF WS simple_expr WS? end_symbol_block
elif_tag :==
START_SYMBOL OP_ELIF WS simple_expr WS? end_symbol_block
else_tag :==
START_SYMBOL OP_ELSE WS? end_symbol_block
endif_tag :==
START_SYMBOL OP_ENDIF WS? end_symbol
for_tag :==
START_SYMBOL OP_FOR WS var_name WS OP_IN WS simple_expr WS? end_symbol_block
endfor_tag :==
START_SYMBOL OP_ENDFOR WS? end_symbol
while_tag :==
START_SYMBOL OP_WHILE WS simple_expr WS? end_symbol_block
endwhile_tag :==
START_SYMBOL OP_ENDWHILE WS? end_symbol
break_tag :==
START_SYMBOL OP_BREAK WS? end_symbol
continue_tag :==
START_SYMBOL OP_CONTINUE WS? end_symbol
def_tag :==
START_SYMBOL OP_DEF WS def_name def_param_expression WS? end_symbol_block
enddef_tag :==
START_SYMBOL OP_ENDDEF WS? end_symbol
call_tag :==
START_SYMBOL OP_CALL WS def_name call_param_expression WS? end_symbol
inc_tag :==
START_SYMBOL OP_INC WS QUOTE? FILENAME QUOTE? WS? end_symbol
rawinc_tag :==
START_SYMBOL OP_RAWINC WS QUOTE? FILENAME QUOTE? WS? end_symbol
exec_tag :==
START_SYMBOL OP_EXEC WS expression WS? end_symbol
 
var_name :==
CHAR_NAME_LITERAL
def_name :==
CHAR_NAME_LITERAL
def_param_expression :==
OP_OPEN_PAREN WS? def_param_list OP_CLOSE_PAREN
def_param_list :==
CHAR_NAME_LITERAL WS? [ OP_LIST_SEP WS? CHAR_NAME_LITERAL WS? ]*
call_param_expression :==
OP_OPEN_PAREN WS? call_param_list OP_CLOSE_PAREN
call_param_list :==
simple_expr WS? [ OP_LIST_SEP WS? simple_expr WS? ]*
simple_expr :==
SINGLE_STATEMENT_EXPR
expression :==
MULTI_STATEMENT_EXPR
end_symbol_block :==
OP_BLOCK_TERMINAL WS? end_symbol
end_symbol :==
END_SYM_TEXT END_SYM_WS?
 
WS :== \s+
END_SYM_WS :== \012\015|\012|\015
CHAR_NAME_LITERAL :== [a-zA-Z_][a-zA-Z0-9_]*
QUOTE :== ["']
 
START_SYMBOL :== '<?='
END_SYM_TEXT :== '?>'
 
OP_BLOCK_TERMINAL :== ':'
OP_LIST_SEP :== ','
 
OP_OPEN_PAREN :== '('
OP_CLOSE_PAREN :== ')'
OP_COMMENT :== 'comment'
OP_ECHO :== 'echo'
OP_IF :== 'if'
OP_ELIF :== 'elif'
OP_ELSE :== 'else'
OP_ENDIF :== 'endif'
OP_FOR :== 'for'
OP_IN :== 'in'
OP_ENDFOR :== 'endfor'
OP_WHILE :== 'while'
OP_ENDWHILE :== 'endwhile'
OP_BREAK :== 'break'
OP_CONTINUE :== 'continue'
OP_DEF :== 'def'
OP_ENDDEF :== 'enddef'
OP_CALL :== 'call'
OP_INC :== 'inc'
OP_RAWINC :== 'rawinc'
OP_EXEC :== 'exec'
 
SINGLE_STATEMENT_EXPR :==
Any valid Python expression that is a single statement and evaluates to a single return value. For example, the internals of an 'if' statement should evaluate to something akin to a boolean and would have the same rules as a normal 'if' statement.
MULTI_STATEMENT_EXPR :==
Any valid Python code, which may consist of one statement or several. This basically leaves the door wide open for a generic exec.
FILENAME :==
This is NOT a Python expression, but an actual filename, surrounded by quotes.
TEXT :==
This is just text. No parsing is done. Just text.