= Markup Streams =
A [wiki:ApiDocs/MarkupCore#markup.core:Stream stream] is the common representation of markup as a ''stream of events''.
A stream can be obtained in a number of ways. It can be:
* the result of parsing XML or HTML text, or
* [wiki:MarkupBuilder programmatically generated], or
* the result of selecting a subset of another stream filtered by an XPath expression.
For example, the functions `XML()` and `HTML()` can be used to convert literal XML or HTML text to a markup stream:
{{{
>>> from markup import XML
>>> stream = XML('<p class="intro">Some text and '
...              '<a href="http://example.org/">a link</a>.'
...              '<br/></p>')
>>> stream
<markup.core.Stream object at 0x...>
}}}
The stream is the result of parsing the text into events. Each event is a tuple of the form `(kind, data, pos)`, where:
 * `kind` defines what kind of event it is (such as the start of an element, text, a comment, etc.).
* `data` is the actual data associated with the event. How this looks depends on the event kind.
* `pos` is a `(filename, lineno, column)` tuple that describes where the event “comes from”.
{{{
>>> for kind, data, pos in stream:
...     print kind, `data`, pos
...
START (u'p', [(u'class', u'intro')]) ('', 1, 0)
TEXT u'Some text and ' ('', 1, 31)
START (u'a', [(u'href', u'http://example.org/')]) ('', 1, 31)
TEXT u'a link' ('', 1, 67)
END u'a' ('', 1, 67)
TEXT u'.' ('', 1, 72)
START (u'br', []) ('', 1, 72)
END u'br' ('', 1, 77)
END u'p' ('', 1, 77)
}}}
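Because each event is a plain `(kind, data, pos)` tuple, a stream can be processed with ordinary iteration. The following is a minimal sketch in Python 3 that does not use the library itself: the `events` list is hand-written to mirror the listing above, the kind markers are plain strings standing in for the library's own constants, and `text_of()` and `start_tags()` are hypothetical helpers introduced only for illustration.

```python
# Kind markers: plain strings standing in for the library's constants (assumption).
START, END, TEXT = "START", "END", "TEXT"

# Hand-written events mirroring the listing above; positions are copied from it.
events = [
    (START, ("p", [("class", "intro")]), ("", 1, 0)),
    (TEXT, "Some text and ", ("", 1, 31)),
    (START, ("a", [("href", "http://example.org/")]), ("", 1, 31)),
    (TEXT, "a link", ("", 1, 67)),
    (END, "a", ("", 1, 67)),
    (TEXT, ".", ("", 1, 72)),
    (START, ("br", []), ("", 1, 72)),
    (END, "br", ("", 1, 77)),
    (END, "p", ("", 1, 77)),
]


def text_of(stream):
    """Concatenate the data of all TEXT events (hypothetical helper)."""
    return "".join(data for kind, data, pos in stream if kind == TEXT)


def start_tags(stream):
    """Return the tag names of all START events, in document order."""
    return [data[0] for kind, data, pos in stream if kind == START]


print(text_of(events))     # Some text and a link.
print(start_tags(events))  # ['p', 'a', 'br']
```

Note that the shape of `data` depends on `kind`: a `(tag, attributes)` pair for `START`, but a bare string for `END` and `TEXT`.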
== Serialization ==
The `Stream` class provides two methods for serializing the events of a stream: [wiki:ApiDocs/MarkupCore#markup.core:Stream:serialize serialize()] and [wiki:ApiDocs/MarkupCore#markup.core:Stream:render render()]. The former is a generator that yields the output in chunks, as `Markup` objects (which are basically unicode strings). The latter returns a single string, by default UTF-8 encoded.
Here's the output from `serialize()`:
{{{
>>> for output in stream.serialize():
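...     print `output`
...
<Markup u'<p class="intro">'>
<Markup u'Some text and '>
<Markup u'<a href="http://example.org/">'>
<Markup u'a link'>
<Markup u'</a>'>
<Markup u'.'>
<Markup u'<br/>'>
<Markup u'</p>'>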
}}}
And here's the output from `render()`:
{{{
>>> print stream.render()
<p class="intro">Some text and <a href="http://example.org/">a link</a>.<br/></p>
}}}
Both methods can be passed a `method` parameter that determines how exactly the events are serialized to text. This parameter can be either “xml” (the default) or “html”, or a subclass of the `markup.output.Serializer` class:
{{{
>>> print stream.render('html')
<p class="intro">Some text and <a href="http://example.org/">a link</a>.<br></p>
}}}
''(Note how the `<br>` element isn't closed, which is the right thing to do for HTML.)''
In addition, the `render()` method takes an `encoding` parameter, which defaults to “UTF-8”. If set to `None`, the result will be a unicode string.
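The relationship between the two methods can be sketched in a few lines of Python 3. This is an illustration under stated assumptions, not the library's implementation: `serialize()` and `render()` here are hypothetical free functions working on a hand-written list of event tuples, the `HTML_VOID` set is an assumed list of elements that take no closing tag, and only `START`, `END` and `TEXT` events are handled.

```python
from xml.sax.saxutils import escape, quoteattr

START, END, TEXT = "START", "END", "TEXT"

# Assumption: a small set of HTML elements that never get a closing tag.
HTML_VOID = {"br", "hr", "img", "input", "link", "meta"}

events = [
    (START, ("p", [("class", "intro")]), ("", 1, 0)),
    (TEXT, "Some text and ", ("", 1, 31)),
    (START, ("a", [("href", "http://example.org/")]), ("", 1, 31)),
    (TEXT, "a link", ("", 1, 67)),
    (END, "a", ("", 1, 67)),
    (TEXT, ".", ("", 1, 72)),
    (START, ("br", []), ("", 1, 72)),
    (END, "br", ("", 1, 77)),
    (END, "p", ("", 1, 77)),
]


def serialize(events, method="xml"):
    """Yield the output one chunk at a time (simplified sketch)."""
    events = list(events)
    skip_end = False  # True when the upcoming END was folded into its START
    for i, (kind, data, pos) in enumerate(events):
        if kind == START:
            tag, attrs = data
            head = "<" + tag + "".join(
                " %s=%s" % (name, quoteattr(value)) for name, value in attrs)
            nxt = events[i + 1] if i + 1 < len(events) else None
            is_empty = nxt is not None and nxt[0] == END and nxt[1] == tag
            if is_empty and (method == "xml" or tag in HTML_VOID):
                skip_end = True
                yield head + ("/>" if method == "xml" else ">")
            else:
                yield head + ">"
        elif kind == END:
            if skip_end:
                skip_end = False
            else:
                yield "</" + data + ">"
        elif kind == TEXT:
            yield escape(data)


def render(events, method="xml", encoding="utf-8"):
    """Join the chunks; encode unless encoding is None."""
    text = "".join(serialize(events, method))
    return text if encoding is None else text.encode(encoding)


print(render(events, encoding=None))
print(render(events, "html", encoding=None))
```

In this sketch, `render()` is just `serialize()` followed by a join and an optional encode, which is why passing `encoding=None` gives back text rather than bytes.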
== Using XPath ==
XPath can be used to extract a specific subset of the stream via the `select()` method:
{{{
>>> substream = stream.select('a')
>>> substream
<markup.core.Stream object at 0x...>
>>> print substream
<a href="http://example.org/">a link</a>
}}}
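Selecting a substream amounts to filtering the events and passing through only those that belong to matching elements. The sketch below, in Python 3, shows the idea for the simplest possible case. It is far less capable than real XPath and is not how the library implements `select()`: `select_elements()` is a hypothetical function that matches by tag name only, and the event list is hand-written.

```python
START, END, TEXT = "START", "END", "TEXT"

events = [
    (START, ("p", [("class", "intro")]), ("", 1, 0)),
    (TEXT, "Some text and ", ("", 1, 31)),
    (START, ("a", [("href", "http://example.org/")]), ("", 1, 31)),
    (TEXT, "a link", ("", 1, 67)),
    (END, "a", ("", 1, 67)),
    (TEXT, ".", ("", 1, 72)),
    (START, ("br", []), ("", 1, 72)),
    (END, "br", ("", 1, 77)),
    (END, "p", ("", 1, 77)),
]


def select_elements(events, name):
    """Yield the events of every element called `name`, including its content."""
    depth = 0  # greater than zero while inside a matching element
    for kind, data, pos in events:
        if kind == START:
            if depth or data[0] == name:
                depth += 1
                yield kind, data, pos
        elif kind == END:
            if depth:
                depth -= 1
                yield kind, data, pos
        elif depth:
            # Text (or any other event) is kept only inside a match.
            yield kind, data, pos


substream = list(select_elements(events, "a"))
for kind, data, pos in substream:
    print(kind, repr(data))
```

Because the function is a generator that consumes and produces event tuples, its result is itself a stream of events and can be filtered or serialized again.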