| [283] | 1 | .. -*- mode: rst; encoding: utf-8 -*- |
|---|
| 2 | |
|---|
| 3 | ============== |
|---|
| 4 | Markup Streams |
|---|
| 5 | ============== |
|---|
| 6 | |
|---|
| 7 | A stream is the common representation of markup as a *stream of events*. |
|---|
| 8 | |
|---|
| 9 | |
|---|
| 10 | .. contents:: Contents |
|---|
| [869] | 11 | :depth: 2 |
|---|
| [283] | 12 | .. sectnum:: |
|---|
| 13 | |
|---|
| 14 | |
|---|
| 15 | Basics |
|---|
| 16 | ====== |
|---|
| 17 | |
|---|
| 18 | A stream can be obtained in a number of ways. It can be: |
|---|
| 19 | |
|---|
| 20 | * the result of parsing XML or HTML text, or |
|---|
| [530] | 21 | * the result of selecting a subset of another stream using XPath, or |
|---|
| 22 | * programmatically generated. |
|---|
| [283] | 23 | |
|---|
| 24 | For example, the functions ``XML()`` and ``HTML()`` can be used to convert |
|---|
| [612] | 25 | literal XML or HTML text to a markup stream: |
|---|
| [283] | 26 | |
|---|
| [614] | 27 | .. code-block:: pycon |
|---|
| [612] | 28 | |
|---|
| [287] | 29 | >>> from genshi import XML |
|---|
| [283] | 30 | >>> stream = XML('<p class="intro">Some text and ' |
|---|
| 31 | ... '<a href="http://example.org/">a link</a>.' |
|---|
| 32 | ... '<br/></p>') |
|---|
| 33 | >>> stream |
|---|
| [464] | 34 | <genshi.core.Stream object at ...> |
|---|
| [283] | 35 | |
|---|
| 36 | The stream is the result of parsing the text into events. Each event is a tuple |
|---|
| 37 | of the form ``(kind, data, pos)``, where: |
|---|
| 38 | |
|---|
| 39 | * ``kind`` defines what kind of event it is (such as the start of an element, |
|---|
| 40 | text, a comment, etc.). |
|---|
| 41 | * ``data`` is the actual data associated with the event. How this looks depends |
|---|
| [464] | 42 | on the event kind (see `event kinds`_). |
|---|
| [283] | 43 | * ``pos`` is a ``(filename, lineno, column)`` tuple that describes where the |
|---|
| 44 | event “comes from”. |
|---|
| 45 | |
|---|
| [614] | 46 | .. code-block:: pycon |
|---|
| [283] | 47 | |
|---|
| 48 | >>> for kind, data, pos in stream: |
|---|
| [1076] | 49 | ... print('%s %r %r' % (kind, data, pos)) |
|---|
| [283] | 50 | ... |
|---|
| [1080] | 51 | START (QName('p'), Attrs([(QName('class'), u'intro')])) (None, 1, 0) |
|---|
| [464] | 52 | TEXT u'Some text and ' (None, 1, 17) |
|---|
| [1080] | 53 | START (QName('a'), Attrs([(QName('href'), u'http://example.org/')])) (None, 1, 31) |
|---|
| [464] | 54 | TEXT u'a link' (None, 1, 61) |
|---|
| [1080] | 55 | END QName('a') (None, 1, 67) |
|---|
| [464] | 56 | TEXT u'.' (None, 1, 71) |
|---|
| [1080] | 57 | START (QName('br'), Attrs()) (None, 1, 72) |
|---|
| 58 | END QName('br') (None, 1, 77) |
|---|
| 59 | END QName('p') (None, 1, 77) |
|---|
| [283] | 60 | |
|---|
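To make the ``(kind, data, pos)`` shape concrete, here is a minimal sketch that walks a hand-written copy of the events shown above and tracks how deeply elements are nested. It deliberately does not import Genshi: plain strings stand in for the event kinds and for ``QName``, a plain list of pairs stands in for ``Attrs``, and ``max_depth`` is a hypothetical helper name.

```python
# Hand-written events mirroring the example above. Assumption: plain strings
# and lists stand in for Genshi's kind constants, QName and Attrs objects.
events = [
    ('START', ('p', [('class', 'intro')]), (None, 1, 0)),
    ('TEXT', 'Some text and ', (None, 1, 17)),
    ('START', ('a', [('href', 'http://example.org/')]), (None, 1, 31)),
    ('TEXT', 'a link', (None, 1, 61)),
    ('END', 'a', (None, 1, 67)),
    ('TEXT', '.', (None, 1, 71)),
    ('START', ('br', []), (None, 1, 72)),
    ('END', 'br', (None, 1, 77)),
    ('END', 'p', (None, 1, 77)),
]


def max_depth(events):
    """Return the deepest element nesting level seen in the events."""
    depth = deepest = 0
    for kind, data, pos in events:
        if kind == 'START':
            depth += 1
            deepest = max(deepest, depth)
        elif kind == 'END':
            depth -= 1
    return deepest


print(max_depth(events))
```

Because every element contributes exactly one ``START`` and one ``END`` event, a simple counter is enough to recover the tree structure from the flat sequence.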
| 61 | |
|---|
| 62 | Filtering |
|---|
| 63 | ========= |
|---|
| 64 | |
|---|
| 65 | One important feature of markup streams is that you can apply *filters* to the |
|---|
| [287] | 66 | stream, either filters that come with Genshi, or your own custom filters. |
|---|
| [283] | 67 | |
|---|
| 68 | A filter is simply a callable that accepts a stream as its parameter and returns |
|---|
| [612] | 69 | the filtered stream: |
|---|
| [283] | 70 | |
|---|
| [612] | 71 | .. code-block:: python |
|---|
| 72 | |
|---|
| [283] | 73 | def noop(stream): |
|---|
| 74 | """A filter that doesn't actually do anything with the stream.""" |
|---|
| 75 | for kind, data, pos in stream: |
|---|
| 76 | yield kind, data, pos |
|---|
| 77 | |
|---|
| 78 | Filters can be applied in a number of ways. The simplest is to just call the |
|---|
| [612] | 79 | filter directly: |
|---|
| [283] | 80 | |
|---|
| [612] | 81 | .. code-block:: python |
|---|
| 82 | |
|---|
| [283] | 83 | stream = noop(stream) |
|---|
| 84 | |
|---|
| 85 | The ``Stream`` class also provides a ``filter()`` method, which takes an |
|---|
| [612] | 86 | arbitrary number of filter callables and applies them all: |
|---|
| [283] | 87 | |
|---|
| [612] | 88 | .. code-block:: python |
|---|
| 89 | |
|---|
| [283] | 90 | stream = stream.filter(noop) |
|---|
| 91 | |
|---|
| 92 | Finally, filters can also be applied using the *bitwise or* operator (``|``), |
|---|
| [612] | 93 | which allows a syntax similar to pipes on Unix shells: |
|---|
| [283] | 94 | |
|---|
| [612] | 95 | .. code-block:: python |
|---|
| 96 | |
|---|
| [283] | 97 | stream = stream | noop |
|---|
| 98 | |
|---|
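The pipe syntax relies on ordinary Python operator overloading. The sketch below shows the general technique with a hypothetical ``MiniStream`` class; it is an illustration of how ``__or__`` can hand the stream to a filter, not a copy of Genshi's ``Stream`` implementation.

```python
class MiniStream:
    """Hypothetical stand-in for a stream; not Genshi's Stream class."""

    def __init__(self, events):
        self.events = events

    def __iter__(self):
        return iter(self.events)

    def __or__(self, function):
        # ``stream | f`` calls ``f(stream)`` and wraps the result again,
        # so that further filters can be chained with another ``|``.
        return MiniStream(function(self))


def noop(stream):
    """A filter that passes every event through unchanged."""
    for kind, data, pos in stream:
        yield kind, data, pos


events = [('TEXT', 'Hello', (None, 1, 0))]
piped = MiniStream(events) | noop | noop
result = list(piped)
print(result)
```

Each ``|`` returns a new wrapper around a generator, so nothing is actually processed until the final stream is iterated.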
| [287] | 99 | One example of a filter included with Genshi is the ``HTMLSanitizer`` in |
|---|
| 100 | ``genshi.filters``. It processes a stream of HTML markup, and strips out any |
|---|
| [283] | 101 | potentially dangerous constructs, such as JavaScript event handlers. |
|---|
| 102 | ``HTMLSanitizer`` is not a function, but rather a class that implements |
|---|
| [612] | 103 | ``__call__``, which means instances of the class are callable: |
|---|
| [283] | 104 | |
|---|
| [612] | 105 | .. code-block:: python |
|---|
| 106 | |
|---|
| [530] | 107 | stream = stream | HTMLSanitizer() |
|---|
| 108 | |
|---|
| [283] | 109 | Both the ``filter()`` method and the pipe operator allow easy chaining of |
|---|
| [612] | 110 | filters: |
|---|
| [283] | 111 | |
|---|
| [612] | 112 | .. code-block:: python |
|---|
| 113 | |
|---|
| [287] | 114 | from genshi.filters import HTMLSanitizer |
|---|
| [283] | 115 | stream = stream.filter(noop, HTMLSanitizer()) |
|---|
| 116 | |
|---|
| [612] | 117 | That is equivalent to: |
|---|
| [283] | 118 | |
|---|
| [612] | 119 | .. code-block:: python |
|---|
| 120 | |
|---|
| [283] | 121 | stream = stream | noop | HTMLSanitizer() |
|---|
| 122 | |
|---|
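The ``noop`` filter passes events through untouched. As a sketch of filters that actually change the stream, the two generators below upper-case text and drop comments. The names ``upper_text`` and ``drop_comments`` are hypothetical, and the events are plain tuples with string kinds so that the example runs without Genshi installed; with a real stream you would normally compare against the kind constants that Genshi exports.

```python
def upper_text(stream):
    """Hypothetical filter: upper-case the data of every TEXT event."""
    for kind, data, pos in stream:
        if kind == 'TEXT':
            # Emit a new event tuple rather than changing the old one.
            yield kind, data.upper(), pos
        else:
            yield kind, data, pos


def drop_comments(stream):
    """Hypothetical filter: remove COMMENT events from the stream."""
    for kind, data, pos in stream:
        if kind != 'COMMENT':
            yield kind, data, pos


events = [
    ('START', ('p', []), (None, 1, 0)),
    ('COMMENT', 'internal note', (None, 1, 3)),
    ('TEXT', 'hello', (None, 1, 20)),
    ('END', 'p', (None, 1, 25)),
]

# Calling the filters directly, innermost first, chains them.
filtered = list(upper_text(drop_comments(events)))
print(filtered)
```

Both filters are generators, so chaining them does not build intermediate lists; each event flows through the whole chain before the next one is read.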
| [530] | 123 | For more information about the built-in filters, see `Stream Filters`_. |
|---|
| [283] | 124 | |
|---|
| [530] | 125 | .. _`Stream Filters`: filters.html |
|---|
| 126 | |
|---|
| 127 | |
|---|
| [283] | 128 | Serialization |
|---|
| 129 | ============= |
|---|
| 130 | |
|---|
| [530] | 131 | Serialization means producing some kind of textual output from a stream of |
|---|
| 132 | events, which you'll need when you want to transmit or store the results of |
|---|
| 133 | generating or otherwise processing markup. |
|---|
| [283] | 134 | |
|---|
| [869] | 135 | The ``Stream`` class provides two methods for serialization: ``serialize()`` |
|---|
| 136 | and ``render()``. The former is a generator that yields chunks of ``Markup`` |
|---|
| 137 | objects (which are basically unicode strings that are considered safe for |
|---|
| 138 | output on the web). The latter returns a single string, by default UTF-8 |
|---|
| 139 | encoded. |
|---|
| [530] | 140 | |
|---|
| [612] | 141 | Here's the output from ``serialize()``: |
|---|
| [283] | 142 | |
|---|
| [614] | 143 | .. code-block:: pycon |
|---|
| [612] | 144 | |
|---|
| [283] | 145 | >>> for output in stream.serialize(): |
|---|
| [1076] | 146 | ... print(repr(output)) |
|---|
| [283] | 147 | ... |
|---|
| 148 | <Markup u'<p class="intro">'> |
|---|
| 149 | <Markup u'Some text and '> |
|---|
| 150 | <Markup u'<a href="http://example.org/">'> |
|---|
| 151 | <Markup u'a link'> |
|---|
| 152 | <Markup u'</a>'> |
|---|
| 153 | <Markup u'.'> |
|---|
| 154 | <Markup u'<br/>'> |
|---|
| 155 | <Markup u'</p>'> |
|---|
| 156 | |
|---|
| [612] | 157 | And here's the output from ``render()``: |
|---|
| [283] | 158 | |
|---|
| [614] | 159 | .. code-block:: pycon |
|---|
| [612] | 160 | |
|---|
| [1076] | 161 | >>> print(stream.render()) |
|---|
| [283] | 162 | <p class="intro">Some text and <a href="http://example.org/">a link</a>.<br/></p> |
|---|
| 163 | |
|---|
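The relationship between the two methods can be sketched in a few lines: one function yields a chunk of text per event, and the other joins those chunks and optionally encodes them. This is a toy model under stated assumptions, not Genshi's serializer: ``toy_serialize`` and ``toy_render`` are hypothetical names, only ``START``, ``TEXT`` and ``END`` events are handled, and empty elements are not collapsed to ``<br/>``.

```python
from xml.sax.saxutils import escape, quoteattr


def toy_serialize(events):
    """Yield one text chunk per event (START, TEXT and END only)."""
    for kind, data, pos in events:
        if kind == 'START':
            tag, attrs = data
            rendered = ''.join(' %s=%s' % (name, quoteattr(value))
                               for name, value in attrs)
            yield '<%s%s>' % (tag, rendered)
        elif kind == 'TEXT':
            yield escape(data)
        elif kind == 'END':
            yield '</%s>' % data


def toy_render(events, encoding='utf-8'):
    """Join all chunks; encode unless ``encoding`` is None."""
    text = ''.join(toy_serialize(events))
    return text if encoding is None else text.encode(encoding)


events = [
    ('START', ('a', [('href', 'http://example.org/')]), (None, 1, 0)),
    ('TEXT', 'a link', (None, 1, 30)),
    ('END', 'a', (None, 1, 36)),
]
chunks = list(toy_serialize(events))
print(chunks)
print(toy_render(events, encoding=None))
```

The generator form is useful when output should be sent as it is produced, while the joined form is convenient when the whole result is needed as one value.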
| 164 | Both methods can be passed a ``method`` parameter that determines how exactly |
|---|
| [869] | 165 | the events are serialized to text. This parameter can be either a string or a |
|---|
| 166 | custom serializer class: |
|---|
| [283] | 167 | |
|---|
| [614] | 168 | .. code-block:: pycon |
|---|
| [612] | 169 | |
|---|
| [1076] | 170 | >>> print(stream.render('html')) |
|---|
| [283] | 171 | <p class="intro">Some text and <a href="http://example.org/">a link</a>.<br></p> |
|---|
| 172 | |
|---|
| 173 | Note how the ``<br>`` element isn't closed, which is the right thing to do for |
|---|
| [869] | 174 | HTML. See `serialization methods`_ for more details. |
|---|
| [283] | 175 | |
|---|
| 176 | In addition, the ``render()`` method takes an ``encoding`` parameter, which |
|---|
| 177 | defaults to “UTF-8”. If set to ``None``, the result will be a unicode string. |
|---|
| 178 | |
|---|
| [287] | 179 | The different serializer classes in ``genshi.output`` can also be used |
|---|
| [612] | 180 | directly: |
|---|
| [283] | 181 | |
|---|
| [614] | 182 | .. code-block:: pycon |
|---|
| [612] | 183 | |
|---|
| [287] | 184 | >>> from genshi.filters import HTMLSanitizer |
|---|
| 185 | >>> from genshi.output import TextSerializer |
|---|
| [1076] | 186 | >>> print(''.join(TextSerializer()(HTMLSanitizer()(stream)))) |
|---|
| [283] | 187 | Some text and a link. |
|---|
| 188 | |
|---|
| [612] | 189 | The pipe operator allows a nicer syntax: |
|---|
| [283] | 190 | |
|---|
| [614] | 191 | .. code-block:: pycon |
|---|
| [612] | 192 | |
|---|
| [1076] | 193 | >>> print(stream | HTMLSanitizer() | TextSerializer()) |
|---|
| [283] | 194 | Some text and a link. |
|---|
| 195 | |
|---|
| [464] | 196 | |
|---|
| [869] | 197 | .. _`serialization methods`: |
|---|
| 198 | |
|---|
| 199 | Serialization Methods |
|---|
| 200 | --------------------- |
|---|
| 201 | |
|---|
| 202 | Genshi supports different serialization methods for creating a text |
|---|
| 203 | representation of a markup stream. |
|---|
| 204 | |
|---|
| 205 | ``xml`` |
|---|
| 206 | The ``XMLSerializer`` is the default serialization method and results in |
|---|
| 207 | proper XML output including namespace support, the XML declaration, CDATA |
|---|
| 208 | sections, and so on. It is generally not suitable for serving HTML or |
|---|
| 209 | XHTML web pages (unless you want to use true XHTML 1.1), for which the |
|---|
| 210 | ``xhtml`` and ``html`` serializers described below should be preferred. |
|---|
| 211 | |
|---|
| 212 | ``xhtml`` |
|---|
| 213 | The ``XHTMLSerializer`` is a specialization of the generic ``XMLSerializer`` |
|---|
| 214 | that understands the peculiarities of producing XML-compliant output that can |
|---|
| 215 | also be parsed without problems by the HTML parsers found in modern web |
|---|
| 216 | browsers. Thus, the output of this serializer should be usable whether sent |
|---|
| 217 | as "text/html" or "application/xhtml+xml" (although there are a lot of |
|---|
| 218 | subtle issues to pay attention to when switching between the two, in |
|---|
| 219 | particular with respect to differences in the DOM and CSS). |
|---|
| 220 | |
|---|
| 221 | For example, instead of rendering a script tag as ``<script/>`` (which |
|---|
| 222 | confuses the HTML parser in many browsers), it will produce |
|---|
| 223 | ``<script></script>``. Also, it will normalize any boolean attribute values |
|---|
| 224 | that are minimized in HTML, so that for example ``<hr noshade="1"/>`` |
|---|
| 225 | becomes ``<hr noshade="noshade" />``. |
|---|
| 226 | |
|---|
| 227 | This serializer supports the use of namespaces for compound documents, for |
|---|
| 228 | example to use inline SVG inside an XHTML document. |
|---|
| 229 | |
|---|
| 230 | ``html`` |
|---|
| 231 | The ``HTMLSerializer`` produces proper HTML markup. The main differences |
|---|
| 232 | compared to ``xhtml`` serialization are that boolean attributes are |
|---|
| 233 | minimized, empty tags are not self-closing (so it's ``<br>`` instead of |
|---|
| 234 | ``<br />``), and that the contents of ``<script>`` and ``<style>`` elements |
|---|
| 235 | are not escaped. |
|---|
| 236 | |
|---|
| 237 | ``text`` |
|---|
| 238 | The ``TextSerializer`` produces plain text from markup streams. This is |
|---|
| 239 | useful primarily for `text templates`_, but can also be used to produce |
|---|
| 240 | plain text output from markup templates or other sources. |
|---|
| 241 | |
|---|
| 242 | .. _`text templates`: text-templates.html |
|---|
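The differences between the ``xml``, ``xhtml`` and ``html`` methods are easiest to see on elements without content. The function below is a rough sketch of the rules described above, not the serializers' actual code; ``empty_element`` is a hypothetical helper and ``VOID_ELEMENTS`` is an assumed, incomplete set of HTML elements that never have content.

```python
# Assumption: a small, incomplete set of HTML elements that never have content.
VOID_ELEMENTS = {'br', 'hr', 'img', 'input', 'link', 'meta'}


def empty_element(tag, method):
    """Render an element without content the way each method is described."""
    if method == 'xml':
        return '<%s/>' % tag
    if method == 'xhtml':
        # Only void elements are collapsed; others get an explicit end tag.
        if tag in VOID_ELEMENTS:
            return '<%s />' % tag
        return '<%s></%s>' % (tag, tag)
    if method == 'html':
        # Void elements have no end tag at all in HTML.
        if tag in VOID_ELEMENTS:
            return '<%s>' % tag
        return '<%s></%s>' % (tag, tag)
    raise ValueError('unknown method: %r' % method)


for method in ('xml', 'xhtml', 'html'):
    print(method, empty_element('br', method), empty_element('script', method))
```

The sketch ignores attributes, escaping and namespaces; it only illustrates why the same stream can produce ``<br/>``, ``<br />`` or ``<br>`` depending on the method.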
| 243 | |
|---|
| 244 | |
|---|
| [530] | 245 | Serialization Options |
|---|
| 246 | --------------------- |
|---|
| 247 | |
|---|
| 248 | Both ``serialize()`` and ``render()`` support additional keyword arguments that |
|---|
| 249 | are passed through to the initializer of the serializer class. The following |
|---|
| 250 | options are supported by the built-in serializers: |
|---|
| 251 | |
|---|
| 252 | ``strip_whitespace`` |
|---|
| [869] | 253 | Whether the serializer should remove trailing spaces and empty lines. |
|---|
| 254 | Defaults to ``True``. |
|---|
| [530] | 255 | |
|---|
| 256 | (This option is not available for serialization to plain text.) |
|---|
| 257 | |
|---|
| 258 | ``doctype`` |
|---|
| 259 | A ``(name, pubid, sysid)`` tuple defining the name, public identifier, and |
|---|
| 260 | system identifier of a ``DOCTYPE`` declaration to prepend to the generated |
|---|
| 261 | output. If provided, this declaration will override any ``DOCTYPE`` |
|---|
| 262 | declaration in the stream. |
|---|
| 263 | |
|---|
| [869] | 264 | The parameter can also be specified as a string to refer to commonly used |
|---|
| 265 | doctypes: |
|---|
| 266 | |
|---|
| 267 | +-----------------------------+-------------------------------------------+ |
|---|
| 268 | | Shorthand | DOCTYPE | |
|---|
| 269 | +=============================+===========================================+ |
|---|
| 270 | | ``html`` or | HTML 4.01 Strict | |
|---|
| 271 | | ``html-strict`` | | |
|---|
| 272 | +-----------------------------+-------------------------------------------+ |
|---|
| 273 | | ``html-transitional`` | HTML 4.01 Transitional | |
|---|
| 274 | +-----------------------------+-------------------------------------------+ |
|---|
| 275 | | ``html-frameset`` | HTML 4.01 Frameset | |
|---|
| 276 | +-----------------------------+-------------------------------------------+ |
|---|
| 277 | | ``html5`` | DOCTYPE proposed for the work-in-progress | |
|---|
| 278 | | | HTML5 standard | |
|---|
| 279 | +-----------------------------+-------------------------------------------+ |
|---|
| 280 | | ``xhtml`` or | XHTML 1.0 Strict | |
|---|
| 281 | | ``xhtml-strict`` | | |
|---|
| 282 | +-----------------------------+-------------------------------------------+ |
|---|
| 283 | | ``xhtml-transitional`` | XHTML 1.0 Transitional | |
|---|
| 284 | +-----------------------------+-------------------------------------------+ |
|---|
| 285 | | ``xhtml-frameset`` | XHTML 1.0 Frameset | |
|---|
| 286 | +-----------------------------+-------------------------------------------+ |
|---|
| 287 | | ``xhtml11`` | XHTML 1.1 | |
|---|
| 288 | +-----------------------------+-------------------------------------------+ |
|---|
| 289 | | ``svg`` or ``svg-full`` | SVG 1.1 | |
|---|
| 290 | +-----------------------------+-------------------------------------------+ |
|---|
| 291 | | ``svg-basic`` | SVG 1.1 Basic | |
|---|
| 292 | +-----------------------------+-------------------------------------------+ |
|---|
| 293 | | ``svg-tiny`` | SVG 1.1 Tiny | |
|---|
| 294 | +-----------------------------+-------------------------------------------+ |
|---|
| 295 | |
|---|
| [530] | 296 | (This option is not available for serialization to plain text.) |
|---|
| 297 | |
|---|
| 298 | ``namespace_prefixes`` |
|---|
| 299 | The namespace prefixes to use for namespaces that are not bound to a prefix |
|---|
| 300 | in the stream itself. |
|---|
| 301 | |
|---|
| 302 | (This option is not available for serialization to HTML or plain text.) |
|---|
| 303 | |
|---|
| [853] | 304 | ``drop_xml_decl`` |
|---|
| 305 | Whether to remove the XML declaration (the ``<?xml ?>`` part at the |
|---|
| 306 | beginning of a document) when serializing. This defaults to ``True`` as an |
|---|
| 307 | XML declaration throws some older browsers into "Quirks" rendering mode. |
|---|
| [530] | 308 | |
|---|
| [853] | 309 | (This option is only available for serialization to XHTML.) |
|---|
| [530] | 310 | |
|---|
| [869] | 311 | ``strip_markup`` |
|---|
| 312 | Whether the text serializer should detect and remove any tags or entity |
|---|
| 313 | encoded characters in the text. |
|---|
| [853] | 314 | |
|---|
| [869] | 315 | (This option is only available for serialization to plain text.) |
|---|
| [853] | 316 | |
|---|
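For the ``doctype`` option, it can help to see how a ``(name, pubid, sysid)`` tuple maps onto the text of a declaration. The helper below is a sketch following the usual XML ``DOCTYPE`` syntax; ``format_doctype`` is a hypothetical name and this is not the code the serializers use.

```python
def format_doctype(name, pubid=None, sysid=None):
    """Build a DOCTYPE declaration from a (name, pubid, sysid) tuple."""
    parts = ['<!DOCTYPE %s' % name]
    if pubid:
        parts.append(' PUBLIC "%s"' % pubid)
    elif sysid:
        # A system identifier without a public one needs the SYSTEM keyword.
        parts.append(' SYSTEM')
    if sysid:
        parts.append(' "%s"' % sysid)
    parts.append('>')
    return ''.join(parts)


xhtml_strict = (
    'html',
    '-//W3C//DTD XHTML 1.0 Strict//EN',
    'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd',
)
print(format_doctype(*xhtml_strict))
print(format_doctype('html'))
```

The second call shows that a tuple with neither identifier reduces to the short ``<!DOCTYPE html>`` form.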
| [869] | 317 | |
|---|
| 318 | |
|---|
| [283] | 319 | Using XPath |
|---|
| 320 | =========== |
|---|
| 321 | |
|---|
| 322 | XPath can be used to extract a specific subset of the stream via the |
|---|
| [612] | 323 | ``select()`` method: |
|---|
| [283] | 324 | |
|---|
| [614] | 325 | .. code-block:: pycon |
|---|
| [612] | 326 | |
|---|
| [283] | 327 | >>> substream = stream.select('a') |
|---|
| 328 | >>> substream |
|---|
| [464] | 329 | <genshi.core.Stream object at ...> |
|---|
| [1076] | 330 | >>> print(substream) |
|---|
| [283] | 331 | <a href="http://example.org/">a link</a> |
|---|
| 332 | |
|---|
| 333 | Often, streams cannot be reused: in the above example, the sub-stream is based |
|---|
| 334 | on a generator. Once it has been serialized, it will have been fully consumed, |
|---|
| 335 | and cannot be rendered again. To work around this, you can wrap such a stream |
|---|
| [612] | 336 | in a ``list``: |
|---|
| [283] | 337 | |
|---|
| [614] | 338 | .. code-block:: pycon |
|---|
| [612] | 339 | |
|---|
| [287] | 340 | >>> from genshi import Stream |
|---|
| [283] | 341 | >>> substream = Stream(list(stream.select('a'))) |
|---|
| 342 | >>> substream |
|---|
| [464] | 343 | <genshi.core.Stream object at ...> |
|---|
| [1076] | 344 | >>> print(substream) |
|---|
| [283] | 345 | <a href="http://example.org/">a link</a> |
|---|
| [1076] | 346 | >>> print(substream.select('@href')) |
|---|
| [283] | 347 | http://example.org/ |
|---|
| [1076] | 348 | >>> print(substream.select('text()')) |
|---|
| [283] | 349 | a link |
|---|
| [464] | 350 | |
|---|
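The reuse problem is a property of Python generators in general, not of Genshi. The sketch below reproduces it with plain tuples; ``select_text`` is a hypothetical stand-in for a generator-based selection.

```python
def select_text(events):
    """Hypothetical generator-based selection: keep only TEXT events."""
    for kind, data, pos in events:
        if kind == 'TEXT':
            yield kind, data, pos


events = [
    ('START', ('a', []), (None, 1, 0)),
    ('TEXT', 'a link', (None, 1, 3)),
    ('END', 'a', (None, 1, 9)),
]

lazy = select_text(events)
first_pass = list(lazy)
second_pass = list(lazy)       # the generator is already exhausted

buffered = list(select_text(events))
first_copy = list(buffered)
second_copy = list(buffered)   # a list can be iterated any number of times

print(len(first_pass), len(second_pass))
print(len(first_copy), len(second_copy))
```

Buffering trades memory for reusability: the list holds every selected event at once, which is fine for small fragments but worth keeping in mind for large documents.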
| 351 | See `Using XPath in Genshi`_ for more information about the XPath support in |
|---|
| 352 | Genshi. |
|---|
| 353 | |
|---|
| 354 | .. _`Using XPath in Genshi`: xpath.html |
|---|
| 355 | |
|---|
| 356 | |
|---|
| 357 | .. _`event kinds`: |
|---|
| 358 | |
|---|
| 359 | Event Kinds |
|---|
| 360 | =========== |
|---|
| 361 | |
|---|
| 362 | Every event in a stream is of one of several *kinds*, which also determines |
|---|
| 363 | what the ``data`` item of the event tuple looks like. The different kinds of |
|---|
| 364 | events are documented below. |
|---|
| 365 | |
|---|
| [478] | 366 | .. note:: The ``data`` item is generally immutable. If the data is to be |
|---|
| [464] | 367 | modified when processing a stream, it must be replaced by a new tuple. |
|---|
| 368 | Effectively, this means the entire event tuple is immutable. |
|---|
| 369 | |
|---|
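In practice, "replacing" an event means yielding a freshly built tuple from a filter. The sketch below renames an element that way. ``rename_tag`` is a hypothetical filter, and plain strings stand in for ``QName`` instances so that the example runs without Genshi.

```python
def rename_tag(stream, old, new):
    """Hypothetical filter: rename elements called ``old`` to ``new``."""
    for kind, data, pos in stream:
        if kind == 'START' and data[0] == old:
            tag, attrs = data
            # Build a fresh data tuple; the original event is left untouched.
            yield kind, (new, attrs), pos
        elif kind == 'END' and data == old:
            yield kind, new, pos
        else:
            yield kind, data, pos


original = [
    ('START', ('b', ()), (None, 1, 0)),
    ('TEXT', 'bold', (None, 1, 3)),
    ('END', 'b', (None, 1, 7)),
]
renamed = list(rename_tag(original, 'b', 'strong'))
print(renamed)
print(original[0])
```

The input list still contains the ``b`` events afterwards, which is what makes it safe to run several independent filters over the same buffered stream.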
| 370 | START |
|---|
| 371 | ----- |
|---|
| 372 | The opening tag of an element. |
|---|
| 373 | |
|---|
| 374 | For this kind of event, the ``data`` item is a tuple of the form |
|---|
| 375 | ``(tagname, attrs)``, where ``tagname`` is a ``QName`` instance describing the |
|---|
| 376 | qualified name of the tag, and ``attrs`` is an ``Attrs`` instance containing |
|---|
| 377 | the attribute names and values associated with the tag (excluding namespace |
|---|
| [612] | 378 | declarations): |
|---|
| [464] | 379 | |
|---|
| [612] | 380 | .. code-block:: python |
|---|
| 381 | |
|---|
| [1080] | 382 | START, (QName('p'), Attrs([(QName('class'), u'intro')])), pos |
|---|
| [464] | 383 | |
|---|
| 384 | END |
|---|
| 385 | --- |
|---|
| 386 | The closing tag of an element. |
|---|
| 387 | |
|---|
| 388 | The ``data`` item of end events consists of just a ``QName`` instance |
|---|
| [612] | 389 | describing the qualified name of the tag: |
|---|
| [464] | 390 | |
|---|
| [612] | 391 | .. code-block:: python |
|---|
| 392 | |
|---|
| [1080] | 393 | END, QName('p'), pos |
|---|
| [464] | 394 | |
|---|
| 395 | TEXT |
|---|
| 396 | ---- |
|---|
| [478] | 397 | Character data outside of elements and comments. |
|---|
| [464] | 398 | |
|---|
| [612] | 399 | For text events, the ``data`` item should be a unicode object: |
|---|
| [464] | 400 | |
|---|
| [612] | 401 | .. code-block:: python |
|---|
| 402 | |
|---|
| [464] | 403 | TEXT, u'Hello, world!', pos |
|---|
| 404 | |
|---|
| 405 | START_NS |
|---|
| 406 | -------- |
|---|
| 407 | The start of a namespace mapping, binding a namespace prefix to a URI. |
|---|
| 408 | |
|---|
| 409 | The ``data`` item of this kind of event is a tuple of the form |
|---|
| 410 | ``(prefix, uri)``, where ``prefix`` is the namespace prefix and ``uri`` is the |
|---|
| 411 | full URI to which the prefix is bound. Both should be unicode objects. If the |
|---|
| [612] | 412 | namespace is not bound to any prefix, the ``prefix`` item is an empty string: |
|---|
| [464] | 413 | |
|---|
| [612] | 414 | .. code-block:: python |
|---|
| 415 | |
|---|
| [464] | 416 | START_NS, (u'svg', u'http://www.w3.org/2000/svg'), pos |
|---|
| 417 | |
|---|
| 418 | END_NS |
|---|
| 419 | ------ |
|---|
| 420 | The end of a namespace mapping. |
|---|
| 421 | |
|---|
| 422 | The ``data`` item of such events consists of only the namespace prefix (a |
|---|
| [612] | 423 | unicode object): |
|---|
| [464] | 424 | |
|---|
| [612] | 425 | .. code-block:: python |
|---|
| 426 | |
|---|
| [464] | 427 | END_NS, u'svg', pos |
|---|
| 428 | |
|---|
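Because ``START_NS`` and ``END_NS`` events bracket the region in which a prefix is bound, a consumer can track the active bindings with one stack per prefix. The following is a minimal sketch with a hypothetical ``bindings_at_text`` helper and plain-tuple events; it assumes the events are properly paired.

```python
def bindings_at_text(events):
    """Yield (text, prefix-to-URI mapping) for every TEXT event."""
    stacks = {}  # prefix -> list of URIs, innermost binding last
    for kind, data, pos in events:
        if kind == 'START_NS':
            prefix, uri = data
            stacks.setdefault(prefix, []).append(uri)
        elif kind == 'END_NS':
            stacks[data].pop()
        elif kind == 'TEXT':
            yield data, {p: uris[-1] for p, uris in stacks.items() if uris}


events = [
    ('START_NS', ('svg', 'http://www.w3.org/2000/svg'), (None, 1, 0)),
    ('TEXT', 'inside', (None, 1, 10)),
    ('END_NS', 'svg', (None, 1, 20)),
    ('TEXT', 'outside', (None, 1, 30)),
]
seen = list(bindings_at_text(events))
print(seen)
```

Using a stack per prefix means that a prefix rebound in a nested element reverts to its outer URI once the inner mapping ends.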
| 429 | DOCTYPE |
|---|
| 430 | ------- |
|---|
| 431 | A document type declaration. |
|---|
| 432 | |
|---|
| 433 | For this type of event, the ``data`` item is a tuple of the form |
|---|
| 434 | ``(name, pubid, sysid)``, where ``name`` is the name of the root element, |
|---|
| 435 | ``pubid`` is the public identifier of the DTD (or ``None``), and ``sysid`` is |
|---|
| [612] | 436 | the system identifier of the DTD (or ``None``): |
|---|
| [464] | 437 | |
|---|
| [612] | 438 | .. code-block:: python |
|---|
| 439 | |
|---|
| [464] | 440 | DOCTYPE, (u'html', u'-//W3C//DTD XHTML 1.0 Transitional//EN', |
|---|
| 441 | u'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'), pos |
|---|
| 442 | |
|---|
| 443 | COMMENT |
|---|
| 444 | ------- |
|---|
| 445 | A comment. |
|---|
| 446 | |
|---|
| 447 | For such events, the ``data`` item is a unicode object containing all character |
|---|
| [612] | 448 | data between the comment delimiters: |
|---|
| [464] | 449 | |
|---|
| [612] | 450 | .. code-block:: python |
|---|
| 451 | |
|---|
| [464] | 452 | COMMENT, u'Commented out', pos |
|---|
| 453 | |
|---|
| 454 | PI |
|---|
| 455 | -- |
|---|
| 456 | A processing instruction. |
|---|
| 457 | |
|---|
| 458 | The ``data`` item is a tuple of the form ``(target, data)`` for processing |
|---|
| 459 | instructions, where ``target`` is the target of the PI (used to identify the |
|---|
| 460 | application by which the instruction should be processed), and ``data`` is the text |
|---|
| [612] | 461 | following the target (excluding the terminating question mark): |
|---|
| [464] | 462 | |
|---|
| [612] | 463 | .. code-block:: python |
|---|
| 464 | |
|---|
| [464] | 465 | PI, (u'php', u'echo "Yo" '), pos |
|---|
| 466 | |
|---|
| 467 | START_CDATA |
|---|
| 468 | ----------- |
|---|
| 469 | Marks the beginning of a ``CDATA`` section. |
|---|
| 470 | |
|---|
| [612] | 471 | The ``data`` item for such events is always ``None``: |
|---|
| [464] | 472 | |
|---|
| [612] | 473 | .. code-block:: python |
|---|
| 474 | |
|---|
| [464] | 475 | START_CDATA, None, pos |
|---|
| 476 | |
|---|
| 477 | END_CDATA |
|---|
| 478 | --------- |
|---|
| 479 | Marks the end of a ``CDATA`` section. |
|---|
| 480 | |
|---|
| [612] | 481 | The ``data`` item for such events is always ``None``: |
|---|
| [464] | 482 | |
|---|
| [612] | 483 | .. code-block:: python |
|---|
| 484 | |
|---|
| [464] | 485 | END_CDATA, None, pos |
|---|