Edgewall Software

Changes between Version 10 and Version 11 of MarkupStream


Timestamp: Sep 11, 2006, 6:03:45 PM
Author: cmlenz
Comment: Use Include macro to pull reST doc from the repos

  • MarkupStream

    v10 v11  
    1 {{{
    2 #!rst
    3 ==============
    4 Markup Streams
    5 ==============
    6 
    7 A stream is the common representation of markup as a *stream of events*.
    8 
    9 
    10 .. contents:: Contents
    11    :depth: 2
    12 .. sectnum::
    13 
    14 
    15 Basics
    16 ======
    17 
    18 A stream can be obtained in a number of ways. It can be:
    19 
    20 * the result of parsing XML or HTML text, or
    21 * programmatically generated, or
    22 * the result of selecting a subset of another stream filtered by an XPath
    23   expression.
    24 
    25 For example, the functions ``XML()`` and ``HTML()`` can be used to convert
    26 literal XML or HTML text to a markup stream::
    27 
    28   >>> from markup import XML
    29   >>> stream = XML('<p class="intro">Some text and '
    30   ...              '<a href="http://example.org/">a link</a>.'
    31   ...              '<br/></p>')
    32   >>> stream
    33   <markup.core.Stream object at 0x6bef0>
    34 
    35 The stream is the result of parsing the text into events. Each event is a tuple
    36 of the form ``(kind, data, pos)``, where:
    37 
    38 * ``kind`` defines what kind of event it is (such as the start of an element,
    39   text, a comment, etc.).
    40 * ``data`` is the actual data associated with the event. How this looks depends
    41   on the event kind.
    42 * ``pos`` is a ``(filename, lineno, column)`` tuple that describes where the
    43   event “comes from”.
    44 
    45 ::
    46 
    47   >>> for kind, data, pos in stream:
    48   ...     print kind, `data`, pos
    49   ...
    50   START (u'p', [(u'class', u'intro')]) ('<string>', 1, 0)
    51   TEXT u'Some text and ' ('<string>', 1, 31)
    52   START (u'a', [(u'href', u'http://example.org/')]) ('<string>', 1, 31)
    53   TEXT u'a link' ('<string>', 1, 67)
    54   END u'a' ('<string>', 1, 67)
    55   TEXT u'.' ('<string>', 1, 72)
    56   START (u'br', []) ('<string>', 1, 72)
    57   END u'br' ('<string>', 1, 77)
    58   END u'p' ('<string>', 1, 77)
    59 
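Because events are plain tuples, a stream can be processed with ordinary iteration. The following is a minimal sketch that models the events listed above as a Python list; it does not use the Markup library, and the helper names ``text_of`` and ``element_names`` are hypothetical, chosen only for illustration.

```python
# Plain-Python stand-in for a markup stream: a list of (kind, data, pos)
# tuples. The kind names mirror the listing above; nothing here comes from
# the Markup library itself.
START, END, TEXT = "START", "END", "TEXT"

events = [
    (START, ("p", [("class", "intro")]), ("<string>", 1, 0)),
    (TEXT, "Some text and ", ("<string>", 1, 31)),
    (START, ("a", [("href", "http://example.org/")]), ("<string>", 1, 31)),
    (TEXT, "a link", ("<string>", 1, 67)),
    (END, "a", ("<string>", 1, 67)),
    (TEXT, ".", ("<string>", 1, 72)),
    (START, ("br", []), ("<string>", 1, 72)),
    (END, "br", ("<string>", 1, 77)),
    (END, "p", ("<string>", 1, 77)),
]


def text_of(events):
    """Concatenate the data of all TEXT events."""
    return "".join(data for kind, data, pos in events if kind == TEXT)


def element_names(events):
    """Return the tag name of every START event, in document order."""
    return [data[0] for kind, data, pos in events if kind == START]
```

Note that the shape of ``data`` depends on ``kind``: a ``(tag, attributes)`` pair for ``START``, and a plain string for ``END`` and ``TEXT``.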
    60 
    61 Filtering
    62 =========
    63 
    64 One important feature of markup streams is that you can apply *filters* to the
    65 stream, either filters that come with Markup, or your own custom filters.
    66 
    67 A filter is simply a callable that accepts the stream as parameter, and returns
    68 the filtered stream::
    69 
    70   def noop(stream):
    71       """A filter that doesn't actually do anything with the stream."""
    72       for kind, data, pos in stream:
    73           yield kind, data, pos
    74 
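A filter that actually changes the stream follows the same pattern as ``noop``. The sketch below is a hypothetical ``upper_text`` filter that upper-cases text and passes every other event through unchanged. It assumes events are ``(kind, data, pos)`` tuples with the string ``"TEXT"`` as the kind of text events, and it runs on a plain list rather than a real ``Stream``.

```python
def upper_text(stream):
    """Hypothetical filter: upper-case TEXT data, pass other events through."""
    for kind, data, pos in stream:
        if kind == "TEXT":
            data = data.upper()
        yield kind, data, pos


# A plain list of event tuples stands in for a real stream here.
events = [
    ("START", ("p", []), None),
    ("TEXT", "hello", None),
    ("END", "p", None),
]
filtered = list(upper_text(events))
```

Since the filter is a generator, nothing is processed until the result is iterated.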
    75 Filters can be applied in a number of ways. The simplest is to just call the
    76 filter directly::
    77 
    78   stream = noop(stream)
    79 
    80 The ``Stream`` class also provides a ``filter()`` method, which takes an
    81 arbitrary number of filter callables and applies them all::
    82 
    83   stream = stream.filter(noop)
    84 
    85 Finally, filters can also be applied using the *bitwise or* operator (``|``),
    86 which allows a syntax similar to pipes on Unix shells::
    87 
    88   stream = stream | noop
    89 
    90 One example of a filter included with Markup is the ``HTMLSanitizer`` in
    91 ``markup.filters``. It processes a stream of HTML markup, and strips out any
    92 potentially dangerous constructs, such as Javascript event handlers.
    93 ``HTMLSanitizer`` is not a function, but rather a class that implements
    94 ``__call__``, which means instances of the class are callable.
    95 
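The callable-class pattern is useful when a filter needs configuration. The following is a minimal sketch of such a class. ``AttributeStripper`` is a hypothetical name, and the logic is far simpler than what ``HTMLSanitizer`` does; it only illustrates how ``__call__`` lets a configured instance be used wherever a filter function is expected.

```python
class AttributeStripper:
    """Hypothetical filter class: drops the named attributes from START events."""

    def __init__(self, names):
        # Configuration is stored on the instance when it is created.
        self.names = set(names)

    def __call__(self, stream):
        # Calling the instance filters a stream, just like a filter function.
        for kind, data, pos in stream:
            if kind == "START":
                tag, attrs = data
                attrs = [(name, value) for name, value in attrs
                         if name not in self.names]
                data = (tag, attrs)
            yield kind, data, pos


# A plain list of event tuples stands in for a real stream here.
events = [
    ("START", ("a", [("href", "http://example.org/"),
                     ("onclick", "evil()")]), None),
    ("TEXT", "a link", None),
    ("END", "a", None),
]
strip_handlers = AttributeStripper(["onclick"])
cleaned = list(strip_handlers(events))
```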
    96 Both the ``filter()`` method and the pipe operator allow easy chaining of
    97 filters::
    98 
    99   from markup.filters import HTMLSanitizer
    100   stream = stream.filter(noop, HTMLSanitizer())
    101 
    102 That is equivalent to::
    103 
    104   stream = stream | noop | HTMLSanitizer()
    105 
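One way to see why the two spellings are equivalent is to sketch how a stream class could wire them together. ``MiniStream`` below is a hypothetical stand-in written for this illustration, not Markup's actual implementation; it assumes that ``|`` simply delegates to ``filter()``, and that ``filter()`` applies its arguments from left to right.

```python
from functools import reduce


class MiniStream:
    """Hypothetical stand-in for Stream; not Markup's implementation."""

    def __init__(self, events):
        self.events = events

    def __iter__(self):
        return iter(self.events)

    def filter(self, *filters):
        # Feed the output of each filter into the next, left to right.
        return MiniStream(reduce(lambda events, f: f(events),
                                 filters, self.events))

    def __or__(self, function):
        # `stream | f` is shorthand for `stream.filter(f)`.
        return self.filter(function)


def noop(stream):
    for event in stream:
        yield event


def drop_text(stream):
    """Hypothetical filter that removes TEXT events."""
    for kind, data, pos in stream:
        if kind != "TEXT":
            yield kind, data, pos


events = [("START", ("p", []), None), ("TEXT", "hi", None), ("END", "p", None)]
via_call = list(drop_text(noop(events)))
via_filter = list(MiniStream(events).filter(noop, drop_text))
via_pipe = list(MiniStream(events) | noop | drop_text)
```

All three spellings produce the same events, because each one calls the filters in the same order.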
    106 
    107 Serialization
    108 =============
    109 
    110 The ``Stream`` class provides two methods for serializing a stream's events:
    111 ``serialize()`` and ``render()``. The former is a generator that yields the
    112 output in chunks, as ``Markup`` objects (which are basically unicode strings).
    113 The latter returns a single string, by default UTF-8 encoded.
    114 
    115 Here's the output from ``serialize()``::
    116 
    117   >>> for output in stream.serialize():
    118   ...     print `output`
    119   ...
    120   <Markup u'<p class="intro">'>
    121   <Markup u'Some text and '>
    122   <Markup u'<a href="http://example.org/">'>
    123   <Markup u'a link'>
    124   <Markup u'</a>'>
    125   <Markup u'.'>
    126   <Markup u'<br/>'>
    127   <Markup u'</p>'>
    128 
    129 And here's the output from ``render()``::
    130 
    131   >>> print stream.render()
    132   <p class="intro">Some text and <a href="http://example.org/">a link</a>.<br/></p>
    133 
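The relationship between the two methods can be sketched in plain Python: a generator that emits one chunk of text per event, and a wrapper that joins and optionally encodes the chunks. This is a simplified illustration under stated assumptions, not the library's serializer. The function names mirror the methods only for readability, events are modelled as ``(kind, data, pos)`` tuples, and only ``START``, ``END`` and ``TEXT`` are handled.

```python
from xml.sax.saxutils import escape, quoteattr


def serialize(events):
    """Yield one chunk of XML text per event (simplified sketch)."""
    events = list(events)
    skip_end = False
    for index, (kind, data, pos) in enumerate(events):
        if kind == "START":
            tag, attrs = data
            attr_text = "".join(" %s=%s" % (name, quoteattr(value))
                                for name, value in attrs)
            next_kind = events[index + 1][0] if index + 1 < len(events) else None
            if next_kind == "END":
                # Empty element: emit the self-closing form, swallow its END.
                skip_end = True
                yield "<%s%s/>" % (tag, attr_text)
            else:
                yield "<%s%s>" % (tag, attr_text)
        elif kind == "END":
            if skip_end:
                skip_end = False
            else:
                yield "</%s>" % data
        elif kind == "TEXT":
            yield escape(data)


def render(events, encoding="utf-8"):
    """Join the chunks; encode to bytes unless encoding is None."""
    text = "".join(serialize(events))
    return text if encoding is None else text.encode(encoding)


events = [
    ("START", ("p", [("class", "intro")]), None),
    ("TEXT", "Some text and ", None),
    ("START", ("a", [("href", "http://example.org/")]), None),
    ("TEXT", "a link", None),
    ("END", "a", None),
    ("TEXT", ".", None),
    ("START", ("br", []), None),
    ("END", "br", None),
    ("END", "p", None),
]
chunks = list(serialize(events))
```

Yielding chunks lets a caller write output incrementally, while the joined form is convenient when the whole document is needed at once.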
    134 Both methods can be passed a ``method`` parameter that determines how exactly
    135 the events are serialized to text. This parameter can be either “xml” (the
    136 default), “xhtml”, “html”, “text”, or a custom serializer class::
    137 
    138   >>> print stream.render('html')
    139   <p class="intro">Some text and <a href="http://example.org/">a link</a>.<br></p>
    140 
    141 Note how the ``<br>`` element isn't closed, which is the right thing to do for
    142 HTML.
    143 
    144 In addition, the ``render()`` method takes an ``encoding`` parameter, which
    145 defaults to “UTF-8”. If set to ``None``, the result will be a unicode string.
    146 
    147 The different serializer classes in ``markup.output`` can also be used
    148 directly::
    149 
    150   >>> from markup.filters import HTMLSanitizer
    151   >>> from markup.output import TextSerializer
    152   >>> print TextSerializer()(HTMLSanitizer()(stream))
    153   Some text and a link.
    154 
    155 The pipe operator allows a nicer syntax::
    156 
    157   >>> print stream | HTMLSanitizer() | TextSerializer()
    158   Some text and a link.
    159 
    160 Using XPath
    161 ===========
    162 
    163 XPath can be used to extract a specific subset of the stream via the
    164 ``select()`` method::
    165 
    166   >>> substream = stream.select('a')
    167   >>> substream
    168   <markup.core.Stream object at 0x7118b0>
    169   >>> print substream
    170   <a href="http://example.org/">a link</a>
    171 
    172 Often, streams cannot be reused: in the above example, the sub-stream is based
    173 on a generator. Once it has been serialized, it will have been fully consumed,
    174 and cannot be rendered again. To work around this, you can wrap such a stream
    175 in a ``list``::
    176 
    177   >>> from markup import Stream
    178   >>> substream = Stream(list(stream.select('a')))
    179   >>> substream
    180   <markup.core.Stream object at 0x7118b0>
    181   >>> print substream
    182   <a href="http://example.org/">a link</a>
    183   >>> print substream.select('@href')
    184   http://example.org/
    185   >>> print substream.select('text()')
    186   a link
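The one-shot behaviour described above is a property of Python generators in general, and can be reproduced without the library. In this minimal sketch, ``select_text`` is a hypothetical generator-based selection over a plain list of event tuples. It shows that a second pass over the generator yields nothing, while a ``list`` copy can be iterated repeatedly.

```python
def select_text(events):
    """Hypothetical generator-based selection: yield only TEXT events."""
    for kind, data, pos in events:
        if kind == "TEXT":
            yield kind, data, pos


events = [("START", ("a", []), None), ("TEXT", "a link", None), ("END", "a", None)]

# A generator can be consumed only once.
one_shot = select_text(events)
first_pass = list(one_shot)
second_pass = list(one_shot)  # already exhausted, so this is empty

# Materializing the events in a list makes them reusable.
reusable = list(select_text(events))
again = [list(reusable), list(reusable)]
```

The cost of the ``list`` wrapper is that all selected events are held in memory at once.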
    187 }}}
    188 
        1 [[Include(trunk/doc/streams.txt)]]
    189 2 ----
    190 3 See also: GenshiGuide