Changes between Version 8 and Version 9 of MarkupStream


Timestamp:
Sep 8, 2006, 10:08:45 AM
Author:
cmlenz
Comment:

Move to reStructuredText

  • MarkupStream

    v8 v9  
    1 = Markup Streams =
     1{{{
     2#!rst
     3==============
     4Markup Streams
     5==============
    26
    3 A [wiki:ApiDocs/MarkupCore#markup.core:Stream stream] is the common representation of markup as a ''stream of events''.
     7A stream is the common representation of markup as a *stream of events*.
     8
     9
     10.. contents:: Contents
     11   :depth: 2
     12.. sectnum::
     13
     14
     15Basics
     16======
    417
    518A stream can be obtained in a number of ways. It can be:
    6  * the result of parsing XML or HTML text, or
    7  * [wiki:MarkupBuilder programmatically generated], or
    8  * the result of selecting a subset of another stream filtered by an XPath expression.
    919
    10 For example, the functions `XML()` and `HTML()` can be used to convert literal XML or HTML text to a markup stream:
     20* the result of parsing XML or HTML text, or
     21* programmatically generated, or
     22* the result of selecting a subset of another stream filtered by an XPath
     23  expression.
    1124
    12 {{{
    13 >>> from markup import XML
    14 >>> stream = XML('<p class="intro">Some text and '
    15 ...              '<a href="http://example.org/">a link</a>.'
    16 ...              '<br/></p>')
    17 >>> stream
    18 <markup.core.Stream object at 0x6bef0>
    19 }}}
     25For example, the functions ``XML()`` and ``HTML()`` can be used to convert
     26literal XML or HTML text to a markup stream::
    2027
    21 The stream is the result of parsing the text into events. Each event is a tuple of the form `(kind, data, pos)`, where:
    22  * `kind` defines what kind of event it is (such as the start of an element, text, a comment, etc.).
    23  * `data` is the actual data associated with the event. How this looks depends on the event kind.
    24  * `pos` is a `(filename, lineno, column)` tuple that describes where the event “comes from”.
     28  >>> from markup import XML
     29  >>> stream = XML('<p class="intro">Some text and '
     30  ...              '<a href="http://example.org/">a link</a>.'
     31  ...              '<br/></p>')
     32  >>> stream
     33  <markup.core.Stream object at 0x6bef0>
    2534
    26 {{{
    27 >>> for kind, data, pos in stream:
    28 ...     print kind, `data`, pos
    29 ...
    30 START (u'p', [(u'class', u'intro')]) ('<string>', 1, 0)
    31 TEXT u'Some text and ' ('<string>', 1, 31)
    32 START (u'a', [(u'href', u'http://example.org/')]) ('<string>', 1, 31)
    33 TEXT u'a link' ('<string>', 1, 67)
    34 END u'a' ('<string>', 1, 67)
    35 TEXT u'.' ('<string>', 1, 72)
    36 START (u'br', []) ('<string>', 1, 72)
    37 END u'br' ('<string>', 1, 77)
    38 END u'p' ('<string>', 1, 77)
    39 }}}
     35The stream is the result of parsing the text into events. Each event is a tuple
     36of the form ``(kind, data, pos)``, where:
    4037
    41 == Filtering ==
     38* ``kind`` defines what kind of event it is (such as the start of an element,
     39  text, a comment, etc.).
     40* ``data`` is the actual data associated with the event. How this looks depends
     41  on the event kind.
     42* ``pos`` is a ``(filename, lineno, column)`` tuple that describes where the
     43  event “comes from”.
    4244
    43 One important feature of markup streams is that you can apply ''filters'' to the stream, either filters that come with Markup, or your own custom filters.
     45::
    4446
    45 A filter is simply a callable that accepts the stream as parameter, and returns the filtered stream:
     47  >>> for kind, data, pos in stream:
     48  ...     print kind, `data`, pos
     49  ...
     50  START (u'p', [(u'class', u'intro')]) ('<string>', 1, 0)
     51  TEXT u'Some text and ' ('<string>', 1, 31)
     52  START (u'a', [(u'href', u'http://example.org/')]) ('<string>', 1, 31)
     53  TEXT u'a link' ('<string>', 1, 67)
     54  END u'a' ('<string>', 1, 67)
     55  TEXT u'.' ('<string>', 1, 72)
     56  START (u'br', []) ('<string>', 1, 72)
     57  END u'br' ('<string>', 1, 77)
     58  END u'p' ('<string>', 1, 77)
    4659
    47 {{{
    48 #!python
    49 def noop(stream):
    50     """A filter that doesn't actually do anything with the stream."""
    51     for kind, data, pos in stream:
    52         yield kind, data, pos
    53 }}}
    5460
    55 Filters can be applied in a number of ways. The simplest is to just call the filter directly:
     61Filtering
     62=========
    5663
    57 {{{
    58 #!python
    59 stream = noop(stream)
    60 }}}
     64One important feature of markup streams is that you can apply *filters* to the
     65stream, either filters that come with Markup, or your own custom filters.
    6166
    62 The `Stream` class also provides a `filter()` method, which takes an arbitrary number of filter callables and applies them all:
     67A filter is simply a callable that accepts the stream as parameter, and returns
     68the filtered stream::
    6369
    64 {{{
    65 #!python
    66 stream = stream.filter(noop)
    67 }}}
     70  def noop(stream):
     71      """A filter that doesn't actually do anything with the stream."""
     72      for kind, data, pos in stream:
     73          yield kind, data, pos
    6874
    69 Finally, filters can also be applied using the ''bitwise or'' operator (`|`), which allows a syntax similar to pipes on Unix shells:
     75Filters can be applied in a number of ways. The simplest is to just call the
     76filter directly::
    7077
    71 {{{
    72 #!python
    73 stream = stream | noop
    74 }}}
     78  stream = noop(stream)
    7579
    76  ''Note: this is only available in the current development version (0.3)''
     80The ``Stream`` class also provides a ``filter()`` method, which takes an
     81arbitrary number of filter callables and applies them all::
    7782
    78 One example of a filter included with Markup is the `HTMLSanitizer` in `markup.filters`. It processes a stream of HTML markup, and strips out any potentially dangerous constructs, such as JavaScript event handlers. `HTMLSanitizer` is not a function, but rather a class that implements `__call__`, which means instances of the class are callable.
     83  stream = stream.filter(noop)
    7984
    80 Both the `filter()` method and the pipe operator allow easy chaining of filters:
    81 {{{
    82 #!python
    83 from markup.filters import HTMLSanitizer
    84 stream = stream.filter(noop, HTMLSanitizer())
    85 }}}
     85Finally, filters can also be applied using the *bitwise or* operator (``|``),
     86which allows a syntax similar to pipes on Unix shells::
    8687
    87 That is equivalent to:
    88 {{{
    89 #!python
    90 stream = stream | noop | HTMLSanitizer()
    91 }}}
     88  stream = stream | noop
    9289
    93 == Serialization ==
     90One example of a filter included with Markup is the ``HTMLSanitizer`` in
     91``markup.filters``. It processes a stream of HTML markup, and strips out any
     92potentially dangerous constructs, such as JavaScript event handlers.
     93``HTMLSanitizer`` is not a function, but rather a class that implements
     94``__call__``, which means instances of the class are callable.
    9495
    95 The `Stream` class provides two methods for serializing this list of events: [wiki:ApiDocs/MarkupCore#markup.core:Stream:serialize serialize()] and [wiki:ApiDocs/MarkupCore#markup.core:Stream:render render()]. The former is a generator that yields chunks of `Markup` objects (which are basically unicode strings). The latter returns a single string, by default UTF-8 encoded.
     96Both the ``filter()`` method and the pipe operator allow easy chaining of
     97filters::
    9698
    97 Here's the output from `serialize()`:
     99  from markup.filters import HTMLSanitizer
     100  stream = stream.filter(noop, HTMLSanitizer())
    98101
    99 {{{
    100 >>> for output in stream.serialize():
    101 ...     print `output`
    102 ...
    103 <Markup u'<p class="intro">'>
    104 <Markup u'Some text and '>
    105 <Markup u'<a href="http://example.org/">'>
    106 <Markup u'a link'>
    107 <Markup u'</a>'>
    108 <Markup u'.'>
    109 <Markup u'<br/>'>
    110 <Markup u'</p>'>
    111 }}}
     102That is equivalent to::
    112103
    113 And here's the output from `render()`:
     104  stream = stream | noop | HTMLSanitizer()
    114105
    115 {{{
    116 >>> print stream.render()
    117 <p class="intro">Some text and <a href="http://example.org/">a link</a>.<br/></p>
    118 }}}
    119106
    120 Both methods can be passed a `method` parameter that determines how exactly the events are serialized to text. This parameter can be either “xml” (the default), “xhtml”, “html”, “text”, or a custom serializer class:
     107Serialization
     108=============
    121109
    122 {{{
    123 >>> print stream.render('html')
    124 <p class="intro">Some text and <a href="http://example.org/">a link</a>.<br></p>
    125 }}}
     110The ``Stream`` class provides two methods for serializing this list of events:
     111``serialize()`` and ``render()``. The former is a generator that yields chunks
     112of ``Markup`` objects (which are basically unicode strings). The latter returns
     113a single string, by default UTF-8 encoded.
    126114
    127 ''(Note how the `<br>` element isn't closed, which is the right thing to do for HTML.)''
     115Here's the output from ``serialize()``::
    128116
    129 In addition, the `render()` method takes an `encoding` parameter, which defaults to “UTF-8”. If set to `None`, the result will be a unicode string.
     117  >>> for output in stream.serialize():
     118  ...     print `output`
     119  ...
     120  <Markup u'<p class="intro">'>
     121  <Markup u'Some text and '>
     122  <Markup u'<a href="http://example.org/">'>
     123  <Markup u'a link'>
     124  <Markup u'</a>'>
     125  <Markup u'.'>
     126  <Markup u'<br/>'>
     127  <Markup u'</p>'>
    130128
    131 The different serializer classes in `markup.output` can also be used directly:
     129And here's the output from ``render()``::
    132130
    133 {{{
    134 >>> from markup.filters import HTMLSanitizer
    135 >>> from markup.output import TextSerializer
    136 >>> print TextSerializer()(HTMLSanitizer()(stream))
    137 Some text and a link.
    138 }}}
     131  >>> print stream.render()
     132  <p class="intro">Some text and <a href="http://example.org/">a link</a>.<br/></p>
    139133
    140 The pipe operator (added in 0.3) allows a nicer syntax:
     134Both methods can be passed a ``method`` parameter that determines how exactly
     135the events are serialized to text. This parameter can be either “xml” (the
     136default), “xhtml”, “html”, “text”, or a custom serializer class::
    141137
    142 {{{
    143 >>> print stream | HTMLSanitizer() | TextSerializer()
    144 Some text and a link.
    145 }}}
     138  >>> print stream.render('html')
     139  <p class="intro">Some text and <a href="http://example.org/">a link</a>.<br></p>
    146140
    147 == Using XPath ==
     141Note how the ``<br>`` element isn't closed, which is the right thing to do for
     142HTML.
    148143
    149 XPath can be used to extract a specific subset of the stream via the `select()` method:
     144In addition, the ``render()`` method takes an ``encoding`` parameter, which
     145defaults to “UTF-8”. If set to ``None``, the result will be a unicode string.
    150146
    151 {{{
    152 >>> substream = stream.select('a')
    153 >>> substream
    154 <markup.core.Stream object at 0x7118b0>
    155 >>> print substream
    156 <a href="http://example.org/">a link</a>
    157 }}}
     147The different serializer classes in ``markup.output`` can also be used
     148directly::
    158149
    159 Often, streams cannot be reused: in the above example, the sub-stream is based on a generator. Once it has been serialized, it will have been fully consumed, and cannot be rendered again. To work around this, you can wrap such a  stream in a `list`:
     150  >>> from markup.filters import HTMLSanitizer
     151  >>> from markup.output import TextSerializer
     152  >>> print TextSerializer()(HTMLSanitizer()(stream))
     153  Some text and a link.
    160154
    161 {{{
    162 >>> from markup import Stream
    163 >>> substream = Stream(list(stream.select('a')))
    164 >>> substream
    165 <markup.core.Stream object at 0x7118b0>
    166 >>> print substream
    167 <a href="http://example.org/">a link</a>
    168 >>> print substream.select('@href')
    169 http://example.org/
    170 >>> print substream.select('text()')
    171 a link
     155The pipe operator allows a nicer syntax::
     156
     157  >>> print stream | HTMLSanitizer() | TextSerializer()
     158  Some text and a link.
     159
     160Using XPath
     161===========
     162
     163XPath can be used to extract a specific subset of the stream via the
     164``select()`` method::
     165
     166  >>> substream = stream.select('a')
     167  >>> substream
     168  <markup.core.Stream object at 0x7118b0>
     169  >>> print substream
     170  <a href="http://example.org/">a link</a>
     171
     172Often, streams cannot be reused: in the above example, the sub-stream is based
     173on a generator. Once it has been serialized, it will have been fully consumed,
     174and cannot be rendered again. To work around this, you can wrap such a stream
     175in a ``list``::
     176
     177  >>> from markup import Stream
     178  >>> substream = Stream(list(stream.select('a')))
     179  >>> substream
     180  <markup.core.Stream object at 0x7118b0>
     181  >>> print substream
     182  <a href="http://example.org/">a link</a>
     183  >>> print substream.select('@href')
     184  http://example.org/
     185  >>> print substream.select('text()')
     186  a link
    172187}}}
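The filtering section of the page describes three equivalent ways to apply a filter: calling it directly, passing it to ``filter()``, and using the ``|`` operator. The following is a minimal, library-independent sketch of how such chaining could work. ``SketchStream`` and ``upper_text`` are hypothetical names introduced for illustration only, the event positions are made up, and none of this is the actual Markup implementation.

```python
# Hypothetical stand-in for the library's Stream class; NOT the Markup API.
class SketchStream:
    """Wraps an iterable of (kind, data, pos) event tuples."""

    def __init__(self, events):
        self.events = events

    def __iter__(self):
        return iter(self.events)

    def filter(self, *filters):
        """Apply each filter callable in turn and wrap the result."""
        events = self.events
        for func in filters:
            events = func(events)
        return SketchStream(events)

    def __or__(self, func):
        """Support ``stream | func`` as shorthand for ``stream.filter(func)``."""
        return self.filter(func)


def noop(stream):
    """A filter that passes every event through unchanged."""
    for kind, data, pos in stream:
        yield kind, data, pos


def upper_text(stream):
    """Hypothetical filter that upper-cases the data of TEXT events only."""
    for kind, data, pos in stream:
        if kind == "TEXT":
            data = data.upper()
        yield kind, data, pos


# Illustrative events; the positions are invented for this sketch.
EVENTS = [
    ("START", ("p", []), ("<string>", 1, 0)),
    ("TEXT", "a link", ("<string>", 1, 3)),
    ("END", "p", ("<string>", 1, 9)),
]

via_filter = list(SketchStream(EVENTS).filter(noop, upper_text))
via_pipe = list(SketchStream(EVENTS) | noop | upper_text)
```

In this sketch both spellings produce the same event list, because ``__or__`` simply delegates to ``filter()``. Each filter is a generator, so no events are processed until the result is iterated.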
    173188
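The note about streams that cannot be reused comes down to ordinary Python generator behaviour. The sketch below illustrates it without the Markup library: ``select_text`` is a hypothetical stand-in for a selection such as ``stream.select(...)``, and the events are invented for the example.

```python
def select_text(events):
    """Hypothetical selection: yield only TEXT events from the input."""
    for kind, data, pos in events:
        if kind == "TEXT":
            yield kind, data, pos


# Illustrative events; the positions are invented for this sketch.
EVENTS = [
    ("START", ("a", []), ("<string>", 1, 0)),
    ("TEXT", "a link", ("<string>", 1, 3)),
    ("END", "a", ("<string>", 1, 9)),
]

# A generator-backed selection can be walked only once.
one_shot = select_text(EVENTS)
first_pass = list(one_shot)
second_pass = list(one_shot)  # the generator is already exhausted

# Buffering the events in a list makes the selection reusable.
buffered = list(select_text(EVENTS))
reused_once = list(buffered)
reused_twice = list(buffered)
```

This is the same reason the page wraps the selection in ``list`` before handing it to ``Stream``: the list holds the events in memory, so every later pass starts from the beginning.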