Edgewall Software



This module provides different kinds of serialization methods for XML event streams.

encode(iterator, method='xml', encoding='utf-8', out=None)

Encode serializer output into a string.

param iterator:the iterator returned from serializing a stream (basically any iterator that yields unicode objects)
param method:the serialization method; determines how characters not representable in the specified encoding are treated
param encoding:how the output string should be encoded; if set to None, this method returns a unicode object
param out:a file-like object that the output should be written to instead of being returned as one big string; note that if this is a file or socket (or similar), the encoding must not be None (that is, the output must be encoded)
return:a str or unicode object (depending on the encoding parameter), or None if the out parameter is provided
since:version 0.4.1
note:Changed in 0.5: added the out parameter
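
The interaction between the method, encoding, and out parameters can be illustrated with a small standalone sketch. It does not import Genshi: encode_sketch is a hypothetical stand-in, and the choice of error handlers (character references for markup methods, '?' replacement for plain text) is an assumption about how unrepresentable characters are treated, not something the description above spells out.

```python
import io


def encode_sketch(chunks, method="xml", encoding="utf-8", out=None):
    """Simplified stand-in for the behaviour described above (not Genshi's code)."""
    if encoding is not None:
        # Assumption: markup output falls back to numeric character
        # references, plain text output falls back to '?'.
        errors = "replace" if method == "text" else "xmlcharrefreplace"

    def convert(text):
        # With encoding=None the text is passed through unencoded.
        return text if encoding is None else text.encode(encoding, errors)

    if out is None:
        return convert("".join(chunks))   # return one big string
    for chunk in chunks:
        out.write(convert(chunk))         # write chunk by chunk instead
    return None


chunks = ["<p>", "caf\u00e9", "</p>"]
as_ascii_xml = encode_sketch(chunks, method="xml", encoding="ascii")
as_ascii_text = encode_sketch(["caf\u00e9"], method="text", encoding="ascii")
as_text = encode_sketch(chunks, encoding=None)

buffer = io.BytesIO()
result = encode_sketch(chunks, encoding="utf-8", out=buffer)
```

Because a binary file or socket only accepts bytes, the sketch would fail on buffer.write if encoding were None, which mirrors the restriction noted for the out parameter.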

get_serializer(method='xml', **kwargs)

Return a serializer object for the given method.

param method:the serialization method; can be either "xml", "xhtml", "html", "text", or a custom serializer class

Any additional keyword arguments are passed to the serializer, and thus depend on the method parameter value.

see:XMLSerializer, XHTMLSerializer, HTMLSerializer, TextSerializer
since:version 0.4.1
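
The dispatch described above (a method name or a custom class, plus pass-through keyword arguments) might look roughly like the following. This is a standalone sketch, not Genshi's implementation: XmlLike, TextLike, Shouting, and get_serializer_sketch are hypothetical names, and the doctype and strip_markup keywords stand in for whatever options a given serializer accepts.

```python
class XmlLike:
    """Hypothetical stand-in for a markup serializer."""

    def __init__(self, doctype=None):
        self.doctype = doctype


class TextLike:
    """Hypothetical stand-in for a plain text serializer."""

    def __init__(self, strip_markup=False):
        self.strip_markup = strip_markup


_REGISTRY = {"xml": XmlLike, "text": TextLike}


def get_serializer_sketch(method="xml", **kwargs):
    # A string is looked up by name; anything else is assumed to be
    # a serializer class and is used directly.
    factory = _REGISTRY[method] if isinstance(method, str) else method
    # The remaining keyword arguments go straight to the constructor,
    # so which ones are valid depends on the chosen method.
    return factory(**kwargs)


class Shouting(TextLike):
    """A custom serializer class passed in place of a method name."""


by_name = get_serializer_sketch("text", strip_markup=True)
by_class = get_serializer_sketch(Shouting)
```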


Defines a number of commonly used DOCTYPE declarations as constants.

get(cls, name)

Return the (name, pubid, sysid) tuple of the DOCTYPE declaration for the specified name.

The following names are recognized in this version:
  • "html" or "html-strict" for the HTML 4.01 strict DTD
  • "html-transitional" for the HTML 4.01 transitional DTD
  • "html-frameset" for the HTML 4.01 frameset DTD
  • "html5" for the DOCTYPE proposed for HTML5
  • "xhtml" or "xhtml-strict" for the XHTML 1.0 strict DTD
  • "xhtml-transitional" for the XHTML 1.0 transitional DTD
  • "xhtml-frameset" for the XHTML 1.0 frameset DTD
  • "xhtml11" for the XHTML 1.1 DTD
  • "svg" or "svg-full" for the SVG 1.1 DTD
  • "svg-basic" for the SVG Basic 1.1 DTD
  • "svg-tiny" for the SVG Tiny 1.1 DTD
param name:the name of the DOCTYPE
return:the (name, pubid, sysid) tuple for the requested DOCTYPE, or None if the name is not recognized
since:version 0.4.1
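
As a rough illustration of the lookup, the sketch below maps a few of the names listed above to (name, pubid, sysid) tuples and renders the corresponding declaration. It is a standalone approximation: get_doctype and format_doctype are hypothetical helpers, only a subset of the names is covered, and the identifiers are the standard W3C ones rather than values copied from this module.

```python
HTML_STRICT = ("html", "-//W3C//DTD HTML 4.01//EN",
               "http://www.w3.org/TR/html4/strict.dtd")
XHTML_STRICT = ("html", "-//W3C//DTD XHTML 1.0 Strict//EN",
                "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd")
HTML5 = ("html", None, None)

_BY_NAME = {
    "html": HTML_STRICT, "html-strict": HTML_STRICT,
    "xhtml": XHTML_STRICT, "xhtml-strict": XHTML_STRICT,
    "html5": HTML5,
}


def get_doctype(name):
    # Unknown names yield None rather than raising an error.
    return _BY_NAME.get(name)


def format_doctype(doctype):
    """Render a (name, pubid, sysid) tuple as a DOCTYPE declaration."""
    name, pubid, sysid = doctype
    parts = ["<!DOCTYPE " + name]
    if pubid:
        parts.append(' PUBLIC "%s"' % pubid)
    elif sysid:
        parts.append(" SYSTEM")
    if sysid:
        parts.append(' "%s"' % sysid)
    parts.append(">")
    return "".join(parts)


html5_decl = format_doctype(get_doctype("html5"))
strict_decl = format_doctype(get_doctype("html-strict"))
unknown = get_doctype("no-such-doctype")
```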


Produces XML text from an event stream.

>>> from genshi.builder import tag
>>> elem = tag.div(tag.a(href='foo'), tag.br, tag.hr(noshade=True))
>>> print(''.join(XMLSerializer()(elem.generate())))
<div><a href="foo"/><br/><hr noshade="True"/></div>


Produces XHTML text from an event stream.

>>> from genshi.builder import tag
>>> elem = tag.div(tag.a(href='foo'), tag.br, tag.hr(noshade=True))
>>> print(''.join(XHTMLSerializer()(elem.generate())))
<div><a href="foo"></a><br /><hr noshade="noshade" /></div>


Produces HTML text from an event stream.

>>> from genshi.builder import tag
>>> elem = tag.div(tag.a(href='foo'), tag.br, tag.hr(noshade=True))
>>> print(''.join(HTMLSerializer()(elem.generate())))
<div><a href="foo"></a><br><hr noshade></div>
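
The three doctest outputs above differ only in how empty elements and boolean attributes are written. The standalone sketch below reproduces those differences for a single empty element. render_empty and VOID_ELEMENTS are hypothetical names, treating an attribute value of True as "boolean" is a simplification, and attribute escaping is left out entirely.

```python
# Assumption: a small subset of HTML elements that never have content.
VOID_ELEMENTS = {"br", "hr", "img", "input"}


def render_empty(tag, attrs, method):
    """Render an element without content for 'xml', 'xhtml' or 'html'."""
    parts = []
    for name, value in attrs:
        if value is True and method == "html":
            parts.append(" %s" % name)                 # minimized: noshade
        elif value is True and method == "xhtml":
            parts.append(' %s="%s"' % (name, name))    # noshade="noshade"
        else:
            parts.append(' %s="%s"' % (name, value))   # value written as-is
    head = "<%s%s" % (tag, "".join(parts))
    if method == "xml":
        return head + "/>"                  # every empty element collapses
    if tag in VOID_ELEMENTS:
        return head + (" />" if method == "xhtml" else ">")
    return head + "></%s>" % tag            # non-void elements keep both tags


hr_outputs = [render_empty("hr", [("noshade", True)], m)
              for m in ("xml", "xhtml", "html")]
a_outputs = [render_empty("a", [("href", "foo")], m)
             for m in ("xml", "xhtml", "html")]
```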


Produces plain text from an event stream.

Only text events are included in the output. Unlike the other serializers, special XML characters are not escaped:

>>> from genshi.builder import tag
>>> elem = tag.div(tag.a('<Hello!>', href='foo'), tag.br)
>>> print(elem)
<div><a href="foo">&lt;Hello!&gt;</a><br/></div>
>>> print(''.join(TextSerializer()(elem.generate())))
<Hello!>

If text events contain literal markup (instances of the Markup class), that markup is by default passed through unchanged:

>>> from genshi.core import Markup
>>> elem = tag.div(Markup('<a href="foo">Hello &amp; Bye!</a><br/>'))
>>> print(elem.generate().render(TextSerializer, encoding=None))
<a href="foo">Hello &amp; Bye!</a><br/>

You can use the strip_markup option to change this behavior, so that tags are stripped from the output and entities are replaced with the equivalent characters:

>>> print(elem.generate().render(TextSerializer, strip_markup=True,
...                              encoding=None))
Hello & Bye!
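
The effect of stripping markup can be approximated with the standard library alone. The sketch below is an assumption about the general technique (remove anything that looks like a tag, then resolve entities), not Genshi's own code, and strip_markup_sketch is a hypothetical name.

```python
import html
import re

# Anything between '<' and the next '>' is treated as a tag.
_TAG = re.compile(r"<[^>]*>")


def strip_markup_sketch(text):
    without_tags = _TAG.sub("", text)     # drop <a ...>, </a>, <br/>
    return html.unescape(without_tags)    # turn &amp; into &


stripped = strip_markup_sketch('<a href="foo">Hello &amp; Bye!</a><br/>')
```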


Combines START and END events into EMPTY events for elements that have no contents.
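
One way to implement such a filter is to hold back each START event until the next event shows whether the element is empty. The sketch below works on simplified (kind, data) tuples with string constants for the event kinds. Both are assumptions made to keep the example standalone, and combine_empty is a hypothetical name.

```python
START, END, EMPTY, TEXT = "START", "END", "EMPTY", "TEXT"


def combine_empty(stream):
    """Merge a START that is directly followed by its END into one EMPTY."""
    pending = None                      # a START event not yet emitted
    for kind, data in stream:
        if pending is not None:
            if kind == END:
                # Nothing came between the two events: element is empty.
                yield (EMPTY, pending[1])
                pending = None
                continue
            yield pending               # element has content after all
            pending = None
        if kind == START:
            pending = (kind, data)      # hold back until the next event
        else:
            yield (kind, data)
    if pending is not None:
        yield pending                   # stream ended right after a START


events = [(START, "div"), (START, "br"), (END, "br"),
          (TEXT, "hi"), (END, "div")]
combined = list(combine_empty(events))
```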


Output stream filter that removes namespace information from the stream, instead adding namespace attributes and prefixes as needed.

param prefixes:optional mapping of namespace URIs to prefixes

>>> from genshi.input import XML
>>> xml = XML('''<doc xmlns="NS1" xmlns:two="NS2">
...   <two:item/>
... </doc>''')
>>> for kind, data, pos in NamespaceFlattener()(xml):
...     print('%s %r' % (kind, data))
START (u'doc', Attrs([('xmlns', u'NS1'), (u'xmlns:two', u'NS2')]))
TEXT u'\n  '
START (u'two:item', Attrs())
END u'two:item'
TEXT u'\n'
END u'doc'


A filter that removes extraneous ignorable white space from the stream.
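
What counts as extraneous is not spelled out here. As an assumption, the sketch below treats trailing spaces before a line break and runs of consecutive line breaks as removable, and operates on a plain string rather than on stream events. squeeze_whitespace is a hypothetical name.

```python
import re

_TRAILING = re.compile(r"[ \t]+(?=\n)")   # spaces or tabs before a newline
_BLANK_RUNS = re.compile(r"\n{2,}")       # two or more newlines in a row


def squeeze_whitespace(text):
    text = _TRAILING.sub("", text)        # trim line endings first
    return _BLANK_RUNS.sub("\n", text)    # then collapse empty lines


squeezed = squeeze_whitespace("<ul>   \n\n\n  <li>one</li>\t\n</ul>")
```

Leading indentation is kept, so the structure of the markup remains readable while the output shrinks.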


A filter that inserts the DOCTYPE declaration in the correct location, after the XML declaration.
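
The placement rule can be sketched as follows: if the stream starts with an XML declaration, the DOCTYPE goes right after it, otherwise it goes first. The example uses simplified (kind, data) tuples and string constants for the event kinds, which are assumptions made to keep it standalone, and insert_doctype is a hypothetical name.

```python
XML_DECL, DOCTYPE, START, END = "XML_DECL", "DOCTYPE", "START", "END"


def insert_doctype(stream, doctype):
    """Yield the stream with a DOCTYPE event placed after any XML declaration."""
    inserted = False
    for kind, data in stream:
        if not inserted:
            inserted = True
            if kind == XML_DECL:
                yield (kind, data)          # declaration stays first
                yield (DOCTYPE, doctype)
                continue
            yield (DOCTYPE, doctype)        # no declaration: DOCTYPE leads
        yield (kind, data)
    if not inserted:
        yield (DOCTYPE, doctype)            # empty stream still gets one


html5 = ("html", None, None)
with_decl = list(insert_doctype(
    [(XML_DECL, ("1.0", "utf-8", -1)), (START, "html"), (END, "html")], html5))
without_decl = list(insert_doctype([(START, "html"), (END, "html")], html5))
```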
