Edgewall Software

Version 41 (modified by cmlenz, 17 years ago)


Frequently Asked Questions

Here you can find answers to frequently asked questions about Genshi.

Overview

  1. General
    1. What is Genshi?
    2. Why yet another template engine?
    3. Why XML-based?
    4. So then why not just use Kid?
    5. What are the main differences between Kid and Genshi?
    6. What license governs the use of Genshi?
    7. What do I need to use Genshi?
    8. Why is it called “Genshi”?
    9. How do you pronounce “Genshi”?
  2. Features and Design
    1. What other features does the toolkit provide?
    2. Why use includes instead of inheritance?
    3. Is Genshi “sandboxable”?
  3. Usage
    1. How can I include literal XML in template output?
    2. What is Genshi doing with the whitespace in my markup template?
    3. Why does Genshi raise a UnicodeDecodeError when I try to render non-ASCII strings?
    4. Can I add custom directives?

General

What is Genshi?

We like to call it a “toolkit for stream-based generation of output for the web”. The largest feature provided by Genshi is an XML-based template engine that is heavily inspired by Kid. But it also provides a text-based template engine, as well as a collection of tools for working with markup.

Why yet another template engine?

We'll let Ryan Tomayko, the author of Kid, answer this one:

“There's at least four billion and nine text based template languages for Python but there aren't a lot of options that fit nicely into the XML tool-chain. Or, if they do fit nicely into the XML tool-chain, they don't fit nicely with Python.”

See his article “In search of a Pythonic, XML-based Templating Language” for the details.

Why XML-based?

Most template engines for web applications are character-stream based: they know nothing about the format of the response body that is being generated. They simply substitute variable expressions, and provide some directives for looping, conditionals, etc. Thus they can be used to generate any kind of textual output, be it HTML, plain text emails, program code, or really anything else.

However, 99% of the templates used by web applications generate some kind of XML/HTML-based markup. We believe that web applications can benefit from a template engine that “knows what it's doing” when it comes to markup. You don't need to worry about generating output that is not well-formed, nor do you need to worry about accidentally not escaping some data, which greatly reduces the risk of introducing XSS attack vectors. Furthermore, your templates look a lot more like the targeted output format: an HTML template looks like HTML, a template for an RSS feed looks like RSS. Directives in text-based template languages often result in rather messy templates, or produce excessive amounts of unnecessary white space.

See also HOWTO Avoid Being Called a Bozo When Producing XML, which has this to say about text-based templating systems:

“Don’t use these systems for producing XML. Making mistakes with them is extremely easy and taking all cases into account is hard. These systems have failed smart people who have actively tried to get things right.”

This advice extends to HTML, of course.
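The escaping point can be made concrete with a small sketch. It uses only the Python standard library (string.Template and html.escape) as a stand-in for a character-stream engine. It does not use Genshi, and the variable names are purely illustrative.

```python
from html import escape
from string import Template

# Untrusted input, e.g. a comment submitted through a web form.
user_input = '<script>alert("xss")</script>'

# A character-stream engine substitutes text blindly: the markup in the
# input ends up in the page unless the author remembers to escape it.
page = Template('<p>$comment</p>')
unsafe = page.substitute(comment=user_input)

# A markup-aware engine escapes substituted data by default. Here the
# escaping is done by hand to show the effect.
safe = page.substitute(comment=escape(user_input))

print(unsafe)
print(safe)
```

The first line printed contains a live script element. The second contains only escaped text that a browser displays rather than executes.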

In addition, text-based templates don't even work all that well for many text formats. Imagine you want to generate a plain text email or an iCalendar file. How do you deal with important concerns such as line-wrapping and white-space in your templates? You may be better off using specialized formatters.

So then why not just use Kid?

We think that Kid represents a huge step forward for XML-based templating in Python. Match templates and the generator-based processing model are extremely powerful concepts.

But arguably Kid also has some basic design problems. For example, Kid generates Python code from templates, which adds a lot of complexity to the code and can make the process of locating and fixing template errors a true nightmare. A syntax error in a template expression will cause an exception that points somewhere in the generated code. In addition, as Kid is based on ElementTree, and the ElementTree API doesn't provide location information for parse events, exceptions reported by Kid often don't include information about what part of the template caused the error. (To be fair, this kind of location tracking wasn't even available in the Python bindings for Expat before Python 2.4.)
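As an illustration of the location information mentioned above, the following sketch uses the Expat bindings from the Python standard library to record where each start tag occurs. The helper name element_positions is hypothetical, and the code is not taken from Genshi or Kid.

```python
import xml.parsers.expat


def element_positions(source):
    """Return (tag, line, column) for every start tag in `source`."""
    parser = xml.parsers.expat.ParserCreate()
    found = []

    def start_element(name, attrs):
        # During a callback, these attributes describe the position of
        # the event being reported (lines count from 1, columns from 0).
        found.append((name, parser.CurrentLineNumber,
                      parser.CurrentColumnNumber))

    parser.StartElementHandler = start_element
    parser.Parse(source, True)
    return found


positions = element_positions("<root>\n  <item/>\n</root>")
print(positions)
```

An engine that keeps this position alongside each parse event can point error messages at the offending line of the template.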

We felt these problems would best be addressed by developing a new engine from scratch, as opposed to trying to “fix” Kid.

What are the main differences between Kid and Genshi?

Genshi executes templates directly; there's no code generation phase. Expressions are evaluated in a more forgiving way using AST transformation. Template variables are stored on a stack, which means that a variable set in a loop deep in the template won't leak into the rest of the template. And even though Genshi doesn't generate Python code for templates, it generally performs better than Kid (over 2x in our benchmarks, but the exact difference depends on a lot of factors).
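The idea of keeping template variables on a stack can be sketched with collections.ChainMap from the standard library. This is only a model of the scoping behaviour described above, not Genshi's own context implementation, and the data is made up.

```python
from collections import ChainMap

# Outermost frame: the data passed to the template.
context = ChainMap({"title": "Report", "items": ["a", "b"]})

seen = []
for item in context["items"]:
    # Entering the loop body pushes a new frame for the loop variable...
    loop_scope = context.new_child({"item": item})
    # ...through which outer variables remain visible.
    seen.append((loop_scope["title"], loop_scope["item"]))
    # Leaving the body drops the frame; `context` is never modified.

# The loop variable did not leak into the enclosing scope.
leaked = "item" in context
print(seen, leaked)
```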

Genshi does not depend on ElementTree. It uses Expat for parsing XML, and is based on streaming slightly abstracted parse events through the processing pipeline. It uses XInclude – instead of Kid's py:extends – to allow template authors to factor out common bits. For match templates, it uses XPath expressions instead of the ElementTree API.
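A rough sketch of what streaming parse events through a pipeline can look like, using only the standard library. The (kind, data) tuples and the function names are simplifications invented for this example and do not reproduce Genshi's actual event format. Attributes are ignored to keep it short.

```python
import xml.parsers.expat
from xml.sax.saxutils import escape


def parse_events(source):
    """Parse XML with Expat and return a list of (kind, data) events."""
    out = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # deliver adjacent text as one event
    parser.StartElementHandler = lambda name, attrs: out.append(("START", name))
    parser.EndElementHandler = lambda name: out.append(("END", name))
    parser.CharacterDataHandler = lambda text: out.append(("TEXT", text))
    parser.Parse(source, True)
    return out


def upper_text(stream):
    """A filter: upper-case text events, pass everything else through."""
    for kind, data in stream:
        yield (kind, data.upper() if kind == "TEXT" else data)


def serialize(stream):
    """Turn the event stream back into markup, escaping text content."""
    parts = []
    for kind, data in stream:
        if kind == "START":
            parts.append("<%s>" % data)
        elif kind == "END":
            parts.append("</%s>" % data)
        else:
            parts.append(escape(data))
    return "".join(parts)


result = serialize(upper_text(parse_events("<p>hello <b>world</b></p>")))
print(result)
```

Each stage consumes and produces events, so filters can be chained without building a full document tree in between.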

For more details about what's different see GenshiVsKid.


What license governs the use of Genshi?

Genshi is released under the revised BSD license, which is a liberal open source license that has been approved by the Open Source Initiative (OSI).

What do I need to use Genshi?

Python 2.3 or later. Python 2.4 is recommended for better performance, plus error messages will include template line numbers and column offsets. Python 3 does not work. Alternative implementations of Python such as PyPy, Jython, or IronPython are also unlikely to work due to the Genshi code using some rather advanced features of the CPython implementation and standard library.

Setuptools is optional and only used for installation if it's available. The template engine plugin, which enables usage of Genshi in frameworks such as TurboGears or Pylons, depends on Setuptools at runtime and installation time. Use of the plugin implementation is optional, though: Setuptools is not required for using Genshi directly.

Why is it called “Genshi”?

“Genshi” (原糸) is Japanese and roughly translates to “thread for weaving”. Basically, Genshi is a thread for weaving web output. The thread meaning also fits nicely with how markup is presented as streams of events in Genshi.

Prior to release 0.3, the project was called “Markup”. The name was changed because:

  • there was another Python project using the name,
  • it made communication somewhat awkward, because it was sometimes unclear whether someone was talking about the project “Markup”, or just plain markup, and
  • it was not easy to find the project via search engines.

How do you pronounce “Genshi”?

The “official” pronunciation is “gen” (as in “get”) and “shi” (as in “she”).

Features and Design

What other features does the toolkit provide?

Beyond the XML-based template engine, Genshi provides:

  • A unified stream-based processing model for markup, where streams can come from XML or HTML text, or be generated programmatically using a very simple syntax.
  • XPath queries against any stream, not just in templates.
  • Different serialization methods (XML, HTML, and plain text) for streams.
  • An HTML “sanitizing” filter to strip potentially dangerous elements or attributes from user-submitted HTML markup.
  • A simple text-based template engine that can be used for generating plain text output.

Why use includes instead of inheritance?

We think that includes are both simpler and more natural for templating.

Template inheritance is a concept that fits well with template languages where a master template provides “slots” that are “filled” by the inheriting templates. However, Genshi has no such feature, and instead uses the more powerful and flexible concept of match templates.

Furthermore, XInclude is a W3C standard, which means that it is more likely to be supported in authoring tools than some esoteric custom notation for including external resources.

See also GenshiRecipes/PyExtendsEquivalent and GenshiRecipes/PyLayoutEquivalent to find out how the Kid directives py:extends and py:layout map to includes in Genshi.

Is Genshi “sandboxable”?

Or: “can I use Genshi to allow untrusted users to create and modify templates?”

No. Genshi allows embedding Python expressions in templates. That code runs with the permissions of the process running your web application. Malicious or misguided users can quite easily construct expressions that result in exposing sensitive information or even destroying data on your server.

Unfortunately, Python does not yet provide a mode for restricted execution. Sandboxing Genshi will probably be possible as soon as that changes, but not before. Apparently some progress is being made, but we'll have to see how that develops.
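To see why arbitrary Python expressions are hard to contain, consider the following sketch. Plain eval stands in here for a template expression evaluator. This is an assumption made for illustration and not how Genshi evaluates expressions internally, but the evaluated code runs inside the server process either way.

```python
import os

# Even with an empty namespace, the built-ins remain reachable, so an
# expression can import modules and query the host system.
expression = "__import__('os').getcwd()"
value = eval(expression, {})
same_as_process = (value == os.getcwd())

# Removing the built-ins does not help much either: every class loaded
# in the interpreter is still reachable from any object.
restricted = {"__builtins__": {}}
classes = eval("().__class__.__base__.__subclasses__()", restricted)

print(same_as_process, len(classes) > 0)
```

Both expressions are harmless, but the same routes lead to the file system and to whatever else the process is allowed to touch.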

Usage

How can I include literal XML in template output?

Unless explicitly told otherwise, Genshi escapes any data you substitute into template output so that it is safe for being parsed and displayed by web browsers and other tools. This saves you from the work of having to tediously escape every variable by hand, and greatly reduces the risk of introducing vectors for cross-site scripting (XSS) attacks.

However, sometimes what you want is to include text in the template output that should not get escaped. For example, if you allow users to enter HTML verbatim (or provide a rich-text editor of sorts), you want that HTML to appear as actual markup in the output, not as escaped text.

Genshi provides a number of ways to do that:

  • The Markup class in the genshi.core module can be used to flag strings that should not be escaped. Strings wrapped in a Markup instance get copied to the output unchanged.
  • The XML and HTML functions in the genshi.input module parse XML and HTML strings, respectively, and produce a markup stream. Note that this option can be rather expensive, as the text needs to be parsed just to be serialized again. Also, this method fails on bad markup that cannot be parsed by either HTMLParser or Expat.
  • If you are generating the snippets in question yourself, you may want to use the genshi.builder module to generate markup streams programmatically. Just like the results of the XML and HTML functions discussed above, the stream produced using genshi.builder will not be escaped in the template output.
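The first option, flagging strings that must not be escaped, follows a simple pattern that can be sketched without Genshi. The Literal class and substitute function below are hypothetical names invented for this illustration. In Genshi, the Markup class from genshi.core plays the role of the marker type.

```python
from html import escape


class Literal(str):
    """Hypothetical marker type: text that is already valid markup."""


def substitute(value):
    """Escape plain strings; pass flagged markup through unchanged."""
    if isinstance(value, Literal):
        return str(value)
    return escape(str(value))


plain = substitute("<b>bold</b>")
literal = substitute(Literal("<b>bold</b>"))

print(plain)
print(literal)
```

Because the flag travels with the value, the decision not to escape is made where the data is produced and not in every template that displays it. Only flag content that you trust or have sanitized.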

What is Genshi doing with the whitespace in my markup template?

When you serialize to XML/HTML, Genshi by default attempts to remove excessive whitespace: basically, it strips all trailing spaces and removes any empty lines. This is done to help keep down the size of generated content to be transmitted over the web. Genshi will not do any reformatting (such as indenting tags), as that would be too expensive and often of no practical use.
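As a rough approximation of the behaviour just described (trailing spaces stripped, empty lines removed), consider the following standard-library sketch. The function name is made up, and Genshi's real whitespace handling works on the event stream and treats some tags specially, so take this only as a model of the visible effect.

```python
def strip_excess_whitespace(text):
    """Drop trailing whitespace on each line and remove empty lines."""
    lines = (line.rstrip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)


rendered = "<ul>   \n\n  <li>one</li>  \n\n</ul>\n"
compact = strip_excess_whitespace(rendered)
print(compact)
```

Note that indentation is left alone: only trailing spaces and blank lines disappear.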

If you're rendering XHTML or HTML output, Genshi tries not to mess with certain tags that are known to be sensitive to whitespace (such as <textarea> and <pre>). However, in general it is not trivial to determine whether whitespace is significant or not in markup, so you may see unexpected results in some cases. If that is a problem, the simplest solution is to disable stripping of whitespace entirely (a side effect is that serialization will be sped up):

stream.render('html', strip_whitespace=False)

If you'd like to keep the whitespace removal in general, but need to disable it inside certain elements, you can use the xml:space attribute:

  … whitespace is stripped here …

  <div xml:space="preserve">

     … whitespace is kept intact inside this tag …

  </div>

  … whitespace is stripped here, again …

You'll need to do that when:

  1. you're rendering to XML
  2. you're rendering to XHTML/HTML, and have set the element to display as white-space: pre via CSS
  3. you have <pre> or <textarea> tags that are embedded as opaque Markup instances, in which case Genshi never actually sees the tags (see #78)

Why does Genshi raise a UnicodeDecodeError when I try to render non-ASCII strings?

Genshi refuses to work with bytestrings that do not use the system default encoding, which on most systems is ASCII. This behavior is by design. If you want to render strings that contain characters outside of the ASCII range (or don't use the system default encoding), you will have to pass unicode objects to Genshi.

This doesn't mean that Genshi has bad support for non-ASCII strings. Quite the contrary -- Genshi works beautifully with unicode. It just requires that you do the decoding of your bytestrings before passing the data to the templates. If this data comes from a database, it's likely that your database connector provides an option to automatically decode any data to unicode. SQLAlchemy, for example, provides a convert_unicode flag you can specify when creating an engine. Set that flag to True, and you get rid of a whole class of potentially nasty problems.
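The decoding step itself is short. The sketch below is written for Python 3, where the unicode type discussed above is called str and bytestrings are bytes. It does not involve Genshi and only shows why decoding with the right encoding has to happen before the data reaches a template.

```python
# Bytes as they might arrive from a file, socket, or database driver:
# the UTF-8 encoding of "café".
raw = b"caf\xc3\xa9"

# Treating the bytes as ASCII fails, which is the error this question
# is about.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding with the encoding the data was actually written in yields
# proper text that can be handed to a template.
text = raw.decode("utf-8")
print(ascii_ok, text)
```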

For more information on unicode and using unicode in Python applications, see:

Can I add custom directives?

Yes and no. Technically, it is possible to add custom directives by making a subclass of MarkupTemplate or TextTemplate. However, think twice before doing that. Genshi has been designed to get by with a standard set of generic directives; adding additional ones should not be necessary.

Match templates provide much of the functionality you may want to get from custom directives, and are more convenient to boot. For example, let's assume you want a widget library. Instead of writing your own template directives, add an include file defining a couple of match templates. As a simple example, assume the following is stored in a file called widgets.html:

<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:w="http://www.example.org/widgets"
      xmlns:py="http://genshi.edgewall.org/" py:strip="">
  <py:match path="w:pane">
    <div class="pane">
      <h4>${pane.label}</h4>
      ${pane.render()}
    </div>
  </py:match>
</html>

That defines one “widget” (<w:pane>) that can be used in including templates as follows:

<html xmlns="http://www.w3.org/1999/xhtml" lang="en"
      xmlns:w="http://www.example.org/widgets"
      xmlns:xi="http://www.w3.org/2001/XInclude"
      xmlns:py="http://genshi.edgewall.org/">
  <xi:include href="widgets.html" />
  <py:for each="pane in panes">
    <w:pane />
  </py:for>
</html>

One advantage is that you're using markup to define those reusable components. Another is that you're not polluting the py: namespace.


See also: Documentation, GenshiRecipes