= Markup Streams =

A [wiki:ApiDocs/MarkupCore#markup.core:Stream stream] is the common representation of markup as a ''stream of events''.

A stream can be obtained in a number of ways. It can be:
 * the result of parsing XML or HTML text, or
 * [wiki:MarkupBuilder programmatically generated], or
 * the result of selecting a subset of another stream filtered by an XPath expression.

For example, the functions `XML()` and `HTML()` can be used to convert literal XML or HTML text to a markup stream:
{{{
>>> from markup import XML
>>> stream = XML('<p class="intro">Some text and '
...              '<a href="http://example.org/">a link</a>.'
...              '<br/></p>')
>>> stream
<markup.core.Stream object at 0x...>
}}}

The stream is the result of parsing the text into events. Each event is a tuple of the form `(kind, data, pos)`, where:
 * `kind` defines what kind of event it is (such as the start of an element, text, a comment, etc).
 * `data` is the actual data associated with the event. How this looks depends on the event kind.
 * `pos` is a `(filename, lineno, column)` tuple that describes where the event “comes from”.

{{{
>>> for kind, data, pos in stream:
...     print kind, `data`, pos
...
START (u'p', [(u'class', u'intro')]) ('', 1, 0)
TEXT u'Some text and ' ('', 1, 31)
START (u'a', [(u'href', u'http://example.org/')]) ('', 1, 31)
TEXT u'a link' ('', 1, 67)
END u'a' ('', 1, 67)
TEXT u'.' ('', 1, 72)
START (u'br', []) ('', 1, 72)
END u'br' ('', 1, 77)
END u'p' ('', 1, 77)
}}}

== Filtering ==

One important feature of markup streams is that you can apply ''filters'' to the stream, either filters that come with Markup, or your own custom filters.

A filter is simply a callable that accepts the stream as a parameter, and returns the filtered stream:
{{{
#!python
def noop(stream):
    """A filter that doesn't actually do anything with the stream."""
    for kind, data, pos in stream:
        yield kind, data, pos
}}}

Filters can be applied in a number of ways. The simplest is to just call the filter directly:
{{{
#!python
stream = noop(stream)
}}}

The `Stream` class also provides a `filter()` method, which takes an arbitrary number of filter callables and applies them all:
{{{
#!python
stream = stream.filter(noop)
}}}

Finally, filters can also be applied using the ''bitwise or'' operator (`|`), which allows a syntax similar to pipes on Unix shells:
{{{
#!python
stream = stream | noop
}}}

''Note: this is only available in the current development version (0.3)''

One example of a filter included with Markup is the `HTMLSanitizer` in `markup.filters`. It processes a stream of HTML markup, and strips out any potentially dangerous constructs, such as JavaScript event handlers.
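
As a rough illustration of a filter that strips constructs from a stream, the following sketch drops attributes whose names begin with `on` (such as `onclick`) from element start events. It is not the `HTMLSanitizer` implementation and does not import Markup at all: the class name `StripEventHandlers`, the string constants standing in for the event kinds, and the sample events are all assumptions made for the example. Only the `(kind, data, pos)` event shape is taken from the text above.
{{{
#!python
# Hypothetical stand-ins for Markup's event kinds (plain strings, for illustration only).
START, END, TEXT = 'START', 'END', 'TEXT'


class StripEventHandlers(object):
    """Callable filter that removes on* attributes from START events."""

    def __call__(self, stream):
        for kind, data, pos in stream:
            if kind == START:
                # START data is a (tag, attributes) pair; keep only safe attributes.
                tag, attrs = data
                attrs = [(name, value) for name, value in attrs
                         if not name.lower().startswith('on')]
                data = (tag, attrs)
            yield kind, data, pos


# A hand-written event list; the positions are placeholders, not parser output.
events = [
    (START, ('a', [('href', 'http://example.org/'), ('onclick', 'evil()')]), ('', 1, 0)),
    (TEXT, 'a link', ('', 1, 0)),
    (END, 'a', ('', 1, 0)),
]

# Materialize the generator so the result can be inspected more than once.
filtered = list(StripEventHandlers()(events))
}}}

Because an instance is just a callable that takes a stream and yields events, it has the same shape as the `noop` filter and should be usable wherever a filter function is accepted.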

`HTMLSanitizer` is not a function, but rather a class that implements `__call__`, which means instances of the class are callable.

Both the `filter()` method and the pipe operator allow easy chaining of filters:
{{{
#!python
from markup.filters import HTMLSanitizer
stream = stream.filter(noop, HTMLSanitizer())
}}}

That is equivalent to:
{{{
#!python
stream = stream | noop | HTMLSanitizer()
}}}

== Serialization ==

The `Stream` class provides two methods for serializing this list of events: [wiki:ApiDocs/MarkupCore#markup.core:Stream:serialize serialize()] and [wiki:ApiDocs/MarkupCore#markup.core:Stream:render render()]. The former is a generator that yields chunks of `Markup` objects (which are basically unicode strings). The latter returns a single string, by default UTF-8 encoded.

Here's the output from `serialize()`:
{{{
>>> for output in stream.serialize():
...     print `output`
...
<Markup u'<p class="intro">'>
<Markup u'Some text and '>
<Markup u'<a href="http://example.org/">'>
<Markup u'a link'>
<Markup u'</a>'>
<Markup u'.'>
<Markup u'<br/>'>
<Markup u'</p>'>
}}}

And here's the output from `render()`:
{{{
>>> print stream.render()
<p class="intro">Some text and <a href="http://example.org/">a link</a>.<br/></p>
}}}

Both methods can be passed a `method` parameter that determines how exactly the events are serialized to text. This parameter can be either “xml” (the default), “xhtml”, “html”, “text”, or a custom serializer class:
{{{
>>> print stream.render('html')
<p class="intro">Some text and <a href="http://example.org/">a link</a>.<br></p>
}}}

''(Note how the `<br>` element isn't closed, which is the right thing to do for HTML.)''

In addition, the `render()` method takes an `encoding` parameter, which defaults to “UTF-8”. If set to `None`, the result will be a unicode string.

The different serializer classes in `markup.output` can also be used directly:
{{{
>>> from markup.filters import HTMLSanitizer
>>> from markup.output import TextSerializer
>>> print TextSerializer()(HTMLSanitizer()(stream))
Some text and a link.
}}}

The pipe operator (added in 0.3) allows a nicer syntax:
{{{
>>> print stream | HTMLSanitizer() | TextSerializer()
Some text and a link.
}}}

== Using XPath ==

XPath can be used to extract a specific subset of the stream via the `select()` method:
{{{
>>> substream = stream.select('a')
>>> substream
<markup.core.Stream object at 0x...>
>>> print substream
<a href="http://example.org/">a link</a>
}}}

Often, streams cannot be reused: in the above example, the sub-stream is based on a generator. Once it has been serialized, it will have been fully consumed, and cannot be rendered again. To work around this, you can wrap such a stream in a `list`:
{{{
>>> from markup import Stream
>>> substream = Stream(list(stream.select('a')))
>>> substream
<markup.core.Stream object at 0x...>
>>> print substream
<a href="http://example.org/">a link</a>
>>> print substream.select('@href')
http://example.org/
>>> print substream.select('text()')
a link
}}}

----
See also: MarkupGuide