= Markup Streams =
A [wiki:ApiDocs/MarkupCore#markup.core:Stream stream] is the common representation of markup as a ''stream of events''.
A stream can be obtained in a number of ways. It can be:
* the result of parsing XML or HTML text, or
* [wiki:MarkupBuilder programmatically generated], or
* the result of selecting a subset of another stream using an XPath expression.
For example, the functions `XML()` and `HTML()` can be used to convert literal XML or HTML text to a markup stream:
{{{
>>> from markup import XML
>>> stream = XML('<p class="intro">Some text and '
...              '<a href="http://example.org/">a link</a>.'
...              '<br/></p>')
>>> stream
<markup.core.Stream object at ...>
}}}
The stream is the result of parsing the text into events. Each event is a tuple of the form `(kind, data, pos)`, where:
* `kind` defines what kind of event it is (such as the start of an element, text, a comment, etc.).
* `data` is the actual data associated with the event. How this looks depends on the event kind.
* `pos` is a `(filename, lineno, column)` tuple that describes where the event “comes from”.
{{{
>>> for kind, data, pos in stream:
...     print kind, `data`, pos
...
START (u'p', [(u'class', u'intro')]) ('', 1, 0)
TEXT u'Some text and ' ('', 1, 31)
START (u'a', [(u'href', u'http://example.org/')]) ('', 1, 31)
TEXT u'a link' ('', 1, 67)
END u'a' ('', 1, 67)
TEXT u'.' ('', 1, 72)
START (u'br', []) ('', 1, 72)
END u'br' ('', 1, 77)
END u'p' ('', 1, 77)
}}}
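The `pos` member makes it easy to report where something occurred in the source. The following is only a sketch in plain Python: it uses hypothetical string constants in place of Markup's own event kinds and a hand-copied list of the events shown above, so it runs without the library.

{{{
#!python
# Hypothetical stand-ins for Markup's event kinds; plain strings are used
# so that this sketch runs without the library.
START, END, TEXT = 'START', 'END', 'TEXT'

# The events printed above, copied as plain (kind, data, pos) tuples.
events = [
    (START, (u'p', [(u'class', u'intro')]), ('', 1, 0)),
    (TEXT, u'Some text and ', ('', 1, 31)),
    (START, (u'a', [(u'href', u'http://example.org/')]), ('', 1, 31)),
    (TEXT, u'a link', ('', 1, 67)),
    (END, u'a', ('', 1, 67)),
    (TEXT, u'.', ('', 1, 72)),
    (START, (u'br', []), ('', 1, 72)),
    (END, u'br', ('', 1, 77)),
    (END, u'p', ('', 1, 77)),
]

def element_positions(stream):
    """Return (tag, lineno, column) for every start tag in the stream."""
    result = []
    for kind, data, pos in stream:
        if kind == START:
            tag, attrs = data              # START data is (tag, attributes)
            filename, lineno, column = pos
            result.append((tag, lineno, column))
    return result
}}}

For these events, `element_positions(events)` reports `p` at line 1, column 0, `a` at column 31 and `br` at column 72.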
== Filtering ==
One important feature of markup streams is that you can apply ''filters'' to the stream, either filters that come with Markup, or your own custom filters.
A filter is simply a callable that accepts a stream as its parameter and returns the filtered stream:
{{{
#!python
def noop(stream):
    """A filter that doesn't actually do anything with the stream."""
    for kind, data, pos in stream:
        yield kind, data, pos
}}}
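A filter is free to modify, drop or add events, as long as it yields `(kind, data, pos)` tuples. As a hypothetical example (not a filter shipped with Markup), the following upper-cases all text and passes every other event through unchanged; the string `'TEXT'` stands in for the library's actual text event kind.

{{{
#!python
TEXT = 'TEXT'  # hypothetical stand-in for Markup's text event kind

def uppercase_text(stream):
    """A filter that upper-cases the data of every text event."""
    for kind, data, pos in stream:
        if kind == TEXT:
            data = data.upper()
        yield kind, data, pos
}}}

Like `noop`, it is a generator function, so no work is done until the filtered stream is actually iterated.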
Filters can be applied in a number of ways. The simplest is to just call the filter directly:
{{{
#!python
stream = noop(stream)
}}}
The `Stream` class also provides a `filter()` method, which takes an arbitrary number of filter callables and applies them all:
{{{
#!python
stream = stream.filter(noop)
}}}
Finally, filters can also be applied using the ''bitwise or'' operator (`|`), which allows a syntax similar to pipes in Unix shells:
{{{
#!python
stream = stream | noop
}}}
''Note: this is only available in the current development version (0.3)''
One example of a filter included with Markup is the `HTMLSanitizer` in `markup.filters`. It processes a stream of HTML markup and strips out any potentially dangerous constructs, such as JavaScript event handlers. `HTMLSanitizer` is not a function, but rather a class that implements `__call__`, which means instances of the class are callable.
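The same pattern can be sketched with a filter class of your own. The class below is hypothetical and far simpler than `HTMLSanitizer` (it is not its implementation and is not adequate protection against script injection on its own); it only removes attributes whose names start with `on`, such as `onclick`, to show how a callable instance is used as a filter. The string `'START'` stands in for Markup's start-tag event kind.

{{{
#!python
START = 'START'  # hypothetical stand-in for Markup's start-tag event kind

class EventHandlerStripper(object):
    """Hypothetical filter class that drops on* attributes (onclick, ...)."""

    def __call__(self, stream):
        # Calling an instance makes it behave exactly like a filter function.
        for kind, data, pos in stream:
            if kind == START:
                tag, attrs = data
                attrs = [(name, value) for name, value in attrs
                         if not name.lower().startswith('on')]
                data = (tag, attrs)
            yield kind, data, pos
}}}

Note the two pairs of parentheses in `EventHandlerStripper()(stream)`: the first creates the instance, the second calls it with the stream.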
Both the `filter()` method and the pipe operator allow easy chaining of filters:
{{{
#!python
from markup.filters import HTMLSanitizer
stream = stream.filter(noop, HTMLSanitizer())
}}}
That is equivalent to:
{{{
#!python
stream = stream | noop | HTMLSanitizer()
}}}
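Both spellings apply the filters from left to right, so the output of `noop` becomes the input of `HTMLSanitizer()`. The sketch below shows one way such an API could be built; `MiniStream`, `double_text` and `exclaim` are hypothetical names, and `MiniStream` is not Markup's actual `Stream` class, only an illustration of `filter()` and `|` being the same operation.

{{{
#!python
class MiniStream(object):
    """Hypothetical, minimal stand-in for Markup's Stream class."""

    def __init__(self, events):
        self.events = events

    def __iter__(self):
        return iter(self.events)

    def filter(self, *filters):
        # Apply the filters in order; each one consumes the previous output.
        stream = self
        for func in filters:
            stream = MiniStream(func(stream))
        return stream

    def __or__(self, func):
        # "stream | func" is another spelling of "stream.filter(func)".
        return self.filter(func)


def double_text(stream):
    """Hypothetical filter: doubles the data of every text event."""
    for kind, data, pos in stream:
        if kind == 'TEXT':
            data = data * 2
        yield kind, data, pos


def exclaim(stream):
    """Hypothetical filter: appends '!' to the data of every text event."""
    for kind, data, pos in stream:
        if kind == 'TEXT':
            data = data + u'!'
        yield kind, data, pos


events = [('TEXT', u'ab', None)]
chained = list(MiniStream(events).filter(double_text, exclaim))
piped = list(MiniStream(events) | double_text | exclaim)
# Both are [('TEXT', u'abab!', None)]; with the filters swapped the
# text would be u'ab!ab!' instead, so the order matters.
}}}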
== Serialization ==
The `Stream` class provides two methods for serializing this stream of events: [wiki:ApiDocs/MarkupCore#markup.core:Stream:serialize serialize()] and [wiki:ApiDocs/MarkupCore#markup.core:Stream:render render()]. The former is a generator that yields the output in chunks, as `Markup` objects (which are essentially unicode strings). The latter returns the complete output as a single string, UTF-8 encoded by default.
Here's the output from `serialize()`:
{{{
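>>> for output in stream.serialize():
...     print `output`
...
<Markup u'<p class="intro">'>
<Markup u'Some text and '>
<Markup u'<a href="http://example.org/">'>
<Markup u'a link'>
<Markup u'</a>'>
<Markup u'.'>
<Markup u'<br/>'>
<Markup u'</p>'>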
}}}
And here's the output from `render()`:
{{{
>>> print stream.render()
<p class="intro">Some text and <a href="http://example.org/">a link</a>.<br/></p>
}}}
Both methods can be passed a `method` parameter that determines how exactly the events are serialized to text. This parameter can be either “xml” (the default), “xhtml”, “html”, “text”, or a custom serializer class:
{{{
>>> print stream.render('html')
<p class="intro">Some text and <a href="http://example.org/">a link</a>.<br></p>
}}}
''(Note how the `<br>` element isn't closed, which is the right thing to do for HTML.)''
In addition, the `render()` method takes an `encoding` parameter, which defaults to “UTF-8”. If set to `None`, the result will be a unicode string.
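To make the difference between the two methods and the effect of the `method` and `encoding` parameters more concrete, here is a rough sketch of an event serializer in plain Python. It is not Markup's implementation: the event kind strings, the function names and the small `VOID_ELEMENTS` set are assumptions made for illustration, and only the “xml” and “html” methods are handled.

{{{
#!python
from xml.sax.saxutils import escape

# Hypothetical stand-ins for Markup's event kinds.
START, END, TEXT = 'START', 'END', 'TEXT'

# A small, assumed subset of the HTML elements that never get an end tag.
VOID_ELEMENTS = frozenset(['br', 'hr', 'img', 'input', 'link', 'meta'])

def serialize(stream, method='xml'):
    """Generator yielding markup chunks for a stream of events."""
    pending = None  # start tag whose closing '>' or '/>' is not yet known
    for kind, data, pos in stream:
        if pending is not None:
            if kind == END and method == 'xml':
                yield pending + u'/>'      # empty XML element, e.g. <br/>
                pending = None
                continue
            yield pending + u'>'
            pending = None
        if kind == START:
            tag, attrs = data
            pending = u'<' + tag + u''.join(
                u' %s="%s"' % (name, escape(value, {'"': '&quot;'}))
                for name, value in attrs)
        elif kind == END:
            # In HTML, void elements such as <br> have no end tag.
            if not (method == 'html' and data in VOID_ELEMENTS):
                yield u'</%s>' % data
        elif kind == TEXT:
            yield escape(data)
    if pending is not None:
        yield pending + u'>'

def render(stream, method='xml', encoding='utf-8'):
    """Join all chunks; encode them unless encoding is None."""
    text = u''.join(serialize(stream, method))
    if encoding is None:
        return text                  # a unicode string
    return text.encode(encoding)     # an encoded byte string

events = [
    (START, (u'p', [(u'class', u'intro')]), None),
    (TEXT, u'Some text and ', None),
    (START, (u'a', [(u'href', u'http://example.org/')]), None),
    (TEXT, u'a link', None),
    (END, u'a', None),
    (TEXT, u'.', None),
    (START, (u'br', []), None),
    (END, u'br', None),
    (END, u'p', None),
]
}}}

With these events, `serialize(events)` yields eight chunks, `render(events)` returns the UTF-8 encoded form of `<p class="intro">Some text and <a href="http://example.org/">a link</a>.<br/></p>`, and `render(events, 'html', None)` returns a unicode string ending in `.<br></p>`.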
The different serializer classes in `markup.output` can also be used directly:
{{{
>>> from markup.filters import HTMLSanitizer
>>> from markup.output import TextSerializer
>>> print TextSerializer()(HTMLSanitizer()(stream))
Some text and a link.
}}}
The pipe operator (added in 0.3) allows a nicer syntax:
{{{
>>> print stream | HTMLSanitizer() | TextSerializer()
Some text and a link.
}}}
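For illustration, assume that a serializer is simply a callable that accepts a stream, like a filter, but yields strings instead of events. The class below is a hypothetical, simplified sketch in that spirit, not the actual `TextSerializer`: it keeps only the text events and drops everything else. The string `'TEXT'` again stands in for Markup's text event kind.

{{{
#!python
TEXT = 'TEXT'  # hypothetical stand-in for Markup's text event kind

class PlainTextSerializer(object):
    """Hypothetical serializer that yields only the text of a stream."""

    def __call__(self, stream):
        for kind, data, pos in stream:
            if kind == TEXT:
                yield data

events = [
    ('START', (u'p', []), None),
    (TEXT, u'Some text and ', None),
    ('START', (u'a', [(u'href', u'http://example.org/')]), None),
    (TEXT, u'a link', None),
    ('END', u'a', None),
    (TEXT, u'.', None),
    ('END', u'p', None),
]
text = u''.join(PlainTextSerializer()(events))  # u'Some text and a link.'
}}}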
== Using XPath ==
XPath can be used to extract a specific subset of the stream via the `select()` method:
{{{
>>> substream = stream.select('a')
>>> substream
<markup.core.Stream object at ...>
>>> print substream
<a href="http://example.org/">a link</a>
}}}
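Conceptually, selecting `'a'` means picking out the events that belong to each `a` element, from its start tag through its matching end tag. The function below is a much simplified, hypothetical sketch of that idea for a single tag name; it does not implement XPath, matches the tag at any depth, and uses plain strings for the event kinds.

{{{
#!python
# Hypothetical stand-ins for Markup's event kinds.
START, END, TEXT = 'START', 'END', 'TEXT'

def select_tag(stream, name):
    """Yield the events of every element called `name`, including content."""
    depth = 0  # greater than zero while inside a matching element
    for kind, data, pos in stream:
        if depth == 0:
            # Outside any match: only a matching start tag is of interest.
            if kind == START and data[0] == name:
                depth = 1
                yield kind, data, pos
            continue
        # Inside a match: track nesting so the right END closes it.
        if kind == START:
            depth += 1
        elif kind == END:
            depth -= 1
        yield kind, data, pos

events = [
    (START, (u'p', [(u'class', u'intro')]), None),
    (TEXT, u'Some text and ', None),
    (START, (u'a', [(u'href', u'http://example.org/')]), None),
    (TEXT, u'a link', None),
    (END, u'a', None),
    (TEXT, u'.', None),
    (START, (u'br', []), None),
    (END, u'br', None),
    (END, u'p', None),
]
selected = list(select_tag(events, u'a'))  # the START, TEXT and END of <a>
}}}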
Streams often cannot be reused: the sub-stream returned by `select('a')` above is based on a generator. Once it has been serialized, the generator has been fully consumed, and the sub-stream cannot be rendered again. To work around this, you can collect the events in a `list` and wrap that in a new `Stream`:
{{{
>>> from markup import Stream
>>> substream = Stream(list(stream.select('a')))
>>> substream
<markup.core.Stream object at ...>
>>> print substream
<a href="http://example.org/">a link</a>
>>> print substream.select('@href')
http://example.org/
>>> print substream.select('text()')
a link
}}}
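The need for `list` comes from ordinary Python generator behaviour rather than anything specific to Markup. In the sketch below, `text_of` is a hypothetical generator function: iterating its result a second time yields nothing, while a list built from it can be iterated as often as needed.

{{{
#!python
def text_of(events):
    """Hypothetical generator yielding the data of each text event."""
    for kind, data, pos in events:
        if kind == 'TEXT':
            yield data

events = [('TEXT', u'a link', None)]

generated = text_of(events)
first = u''.join(generated)    # u'a link'
second = u''.join(generated)   # u'': the generator is already exhausted

cached = list(text_of(events))
again = u''.join(cached) + u'|' + u''.join(cached)   # u'a link|a link'
}}}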
----
See also: MarkupGuide