Last change on this file was 1194, checked in by hodgestar, 11 years ago

Fix doc examples so that test_doc passes.

.. -*- mode: rst; encoding: utf-8 -*-

==============
Stream Filters
==============

`Markup Streams`_ showed how to write filters and how they are applied to
markup streams. This page describes the features of the various filters that
come with Genshi itself.

.. _`Markup Streams`: streams.html

.. contents:: Contents
   :depth: 1
.. sectnum::


HTML Form Filler
================

The filter ``genshi.filters.html.HTMLFormFiller`` can automatically populate an
HTML form from values provided as a simple dictionary. When using this filter,
you can omit any ``value``, ``selected``, or ``checked`` attributes
from form controls in your templates, and let the filter do all that work for
you.

``HTMLFormFiller`` takes a dictionary of data to populate the form with, where
the keys should match the names of form elements, and the values determine the
values of those controls. For example:

.. code-block:: pycon

  >>> from genshi.filters import HTMLFormFiller
  >>> from genshi.template import MarkupTemplate

  >>> template = MarkupTemplate("""<form>
  ...   <p>
  ...     <label>User name:
  ...       <input type="text" name="username" />
  ...     </label><br />
  ...     <label>Password:
  ...       <input type="password" name="password" />
  ...     </label><br />
  ...     <label>
  ...       <input type="checkbox" name="remember" /> Remember me
  ...     </label>
  ...   </p>
  ... </form>""")
  >>> filler = HTMLFormFiller(data=dict(username='john', remember=True))
  >>> print(template.generate() | filler)
  <form>
    <p>
      <label>User name:
        <input type="text" name="username" value="john"/>
      </label><br/>
      <label>Password:
        <input type="password" name="password"/>
      </label><br/>
      <label>
        <input type="checkbox" name="remember" checked="checked"/> Remember me
      </label>
    </p>
  </form>

.. note:: This processing is done without reparsing the template output in
          any way. Like any stream filter, it operates after the template
          output is generated but *before* that output is actually serialized.

The filter also handles radio buttons as well as ``<select>`` and
``<textarea>`` elements. For radio buttons to be marked as checked, the value in
the data dictionary needs to match the ``value`` attribute of the ``<input>``
element, or evaluate to a truth value if the element has no such attribute. For
options in a ``<select>`` box to be marked as selected, the value in the data
dictionary needs to match the ``value`` attribute of the ``<option>`` element,
or the text content of the option if it has no ``value`` attribute. Password and
file input fields are not populated, as most browsers would ignore such values
anyway for security reasons.
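
The matching rules in the preceding paragraph can be restated as a small
plain-Python sketch. It mirrors the prose only, not the library's source, and
does not import Genshi; the helper names ``radio_is_checked`` and
``option_is_selected`` are hypothetical and exist purely for illustration.

.. code-block:: python

  def radio_is_checked(data_value, value_attr=None):
      """Rule for a radio button, following the wording above.

      ``value_attr`` is the element's ``value`` attribute, or None if the
      element has no such attribute.
      """
      if value_attr is not None:
          # The dictionary value must match the declared value.
          return str(data_value) == value_attr
      # No ``value`` attribute: fall back to the truth value of the data.
      return bool(data_value)


  def option_is_selected(data_value, value_attr=None, text=''):
      """Rule for an ``<option>``: compare with ``value``, else with the text."""
      declared = value_attr if value_attr is not None else text
      return str(data_value) == declared

With these helpers, ``option_is_selected('Two', text='Two')`` is true because
the option has no ``value`` attribute, while
``option_is_selected('Two', value_attr='2', text='Two')`` is false because the
attribute takes precedence over the text content.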

You'll want to make sure that the values in the data dictionary have already
been converted to strings. While the filter may be able to deal with non-string
data in some cases (such as check boxes), in most cases it will either not
attempt any conversion or not produce the desired results.
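
One way to prepare the dictionary before handing it to the filter is sketched
below. The helper name ``stringify_form_data`` is hypothetical, and two
choices in it are assumptions rather than library behavior: booleans are kept
as they are (so check boxes can still rely on their truth value), and ``None``
entries are dropped.

.. code-block:: python

  def stringify_form_data(raw):
      """Return a copy of ``raw`` whose values are ready for form filling."""
      prepared = {}
      for name, value in raw.items():
          if value is None:
              # Assumption: a missing value means "leave the control alone".
              continue
          if isinstance(value, bool):
              # Assumption: keep booleans for check boxes.
              prepared[name] = value
          else:
              prepared[name] = str(value)
      return prepared


  form_data = stringify_form_data({'age': 42, 'remember': True, 'nick': None})

The resulting ``form_data`` can then be passed as the ``data`` argument shown
in the first example of this section.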

You can restrict the form filler to operate only on a specific ``<form>`` by
passing either the ``id`` or the ``name`` keyword argument to the initializer.
If either of those is specified, the filter will only apply to form tags with
an attribute matching the specified value.


HTML Sanitizer
==============

The ``genshi.filters.html.HTMLSanitizer`` filter can be used to clean up
user-submitted HTML markup, removing potentially dangerous constructs that could
be used for various kinds of abuse, such as cross-site scripting (XSS) attacks:

.. code-block:: pycon

  >>> from genshi.filters import HTMLSanitizer
  >>> from genshi.input import HTML

  >>> html = HTML(u"""<div>
  ...   <p>Innocent looking text.</p>
  ...   <script>alert("Danger: " + document.cookie)</script>
  ... </div>""")
  >>> sanitize = HTMLSanitizer()
  >>> print(html | sanitize)
  <div>
    <p>Innocent looking text.</p>
  </div>

In this example, the ``<script>`` tag was removed from the output.

You can determine which tags and attributes should be allowed by initializing
the filter with corresponding sets. See the API documentation for more
information.

Inline ``style`` attributes are forbidden by default. If you allow them, the
filter will still sanitize the contents of any inline styles it encounters:
the proprietary ``expression()`` function (supported only by Internet
Explorer) is removed, and any property using a ``url()`` with a potentially
dangerous URL scheme (such as ``javascript:``) is also stripped out:

.. code-block:: pycon

  >>> from genshi.filters import HTMLSanitizer
  >>> from genshi.input import HTML

  >>> html = HTML(u"""<div>
  ...   <br style="background: url(javascript:alert(document.cookie); color: #000" />
  ... </div>""")
  >>> sanitize = HTMLSanitizer(safe_attrs=HTMLSanitizer.SAFE_ATTRS | set(['style']))
  >>> print(html | sanitize)
  <div>
    <br style="color: #000"/>
  </div>

.. warning:: You should probably not rely on the ``style`` filtering, as
             sanitizing mixed HTML, CSS, and JavaScript is very complicated and
             subject to various browser bugs. If you can get away with
             not allowing inline styles in user-submitted content, that is
             definitely the safer route to follow.


Transformer
===========

The filter ``genshi.filters.transform.Transformer`` provides a convenient way to
transform or otherwise work with markup event streams. It allows you to specify
which parts of the stream you're interested in with XPath expressions, and then
attach a variety of transformations to the parts that match:

.. code-block:: pycon

  >>> from genshi.builder import tag
  >>> from genshi.core import TEXT
  >>> from genshi.filters import Transformer
  >>> from genshi.input import HTML

  >>> html = HTML(u'''<html>
  ...   <head><title>Some Title</title></head>
  ...   <body>
  ...     Some <em>body</em> text.
  ...   </body>
  ... </html>''')

  >>> print(html | Transformer('body/em').map(unicode.upper, TEXT)
  ...                                    .unwrap().wrap(tag.u).end()
  ...                                    .select('body/u')
  ...                                    .prepend('underlined '))
  <html>
    <head><title>Some Title</title></head>
    <body>
      Some <u>underlined BODY</u> text.
    </body>
  </html>

This example sets up a transformation that:

 1. matches any `<em>` element anywhere in the body,
 2. uppercases any text nodes in the element,
 3. strips off the `<em>` start and close tags,
 4. wraps the content in a `<u>` tag, and
 5. inserts the text `underlined` inside the `<u>` tag.

A number of commonly useful transformations are available for this filter.
Please consult the API documentation for a complete list.

In addition, you can also perform custom transformations. For example, the
following defines a transformation that changes the name of a tag:

.. code-block:: pycon

  >>> from genshi import QName
  >>> from genshi.filters.transform import ENTER, EXIT

  >>> class RenameTransformation(object):
  ...    def __init__(self, name):
  ...        self.name = QName(name)
  ...    def __call__(self, stream):
  ...        for mark, (kind, data, pos) in stream:
  ...            if mark is ENTER:
  ...                data = self.name, data[1]
  ...            elif mark is EXIT:
  ...                data = self.name
  ...            yield mark, (kind, data, pos)

A transformation can be any callable object that accepts an augmented event
stream. In this case we define a class, so that we can initialize it with the
tag name.
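
Because any callable is accepted, a plain generator function is enough when no
configuration is needed. The sketch below shows the shape of such a function.
To stay runnable without Genshi installed, it uses string stand-ins for the
event kinds and marks; those values and the name ``shout`` are assumptions for
illustration only. Real code would import ``TEXT`` from ``genshi.core`` and
the marks from ``genshi.filters.transform``. The sketch also assumes that
events inside a selected element carry the ``INSIDE`` mark and that unselected
events carry ``None``.

.. code-block:: python

  # Stand-ins for Genshi's constants (assumption: illustration only).
  START, END, TEXT = 'START', 'END', 'TEXT'
  ENTER, INSIDE, EXIT = 'ENTER', 'INSIDE', 'EXIT'


  def shout(stream):
      """Upper-case text inside the selection; pass all other events through."""
      for mark, (kind, data, pos) in stream:
          if mark == INSIDE and kind == TEXT:
              data = data.upper()
          # Every event is re-emitted with its mark so later steps still work.
          yield mark, (kind, data, pos)


  # A hand-written augmented stream standing in for "Some <em>body</em> text."
  events = [
      (None, (TEXT, 'Some ', None)),
      (ENTER, (START, 'em', None)),
      (INSIDE, (TEXT, 'body', None)),
      (EXIT, (END, 'em', None)),
      (None, (TEXT, ' text.', None)),
  ]
  result = list(shout(events))

Only the event marked ``INSIDE`` is changed; the surrounding text events and
the start and end events of the element are yielded untouched.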

Custom transformations can be applied using the `apply()` method of a
transformer instance:

.. code-block:: pycon

  >>> xform = Transformer('body//em').map(unicode.upper, TEXT)
  >>> xform = xform.apply(RenameTransformation('u'))
  >>> print(html | xform)
  <html>
    <head><title>Some Title</title></head>
    <body>
      Some <u>BODY</u> text.
    </body>
  </html>

.. note:: The transformation filter was added in Genshi 0.5.


Translator
==========

The ``genshi.filters.i18n.Translator`` filter implements basic support for
internationalizing and localizing templates. When used as a filter, it
translates a configurable set of text nodes and attribute values using a
``gettext``-style translation function.
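
A ``gettext``-style translation function is a callable that takes a message
string and returns its translation, falling back to the message itself when
none is known. The sketch below builds one from the standard library alone;
the ``CATALOG`` dictionary and the ``translate`` name are hypothetical, and how
such a function is passed to ``Translator`` is left to the API documentation.

.. code-block:: python

  import gettext

  # NullTranslations is the standard library's do-nothing catalog: its
  # gettext() method returns every message unchanged.
  fallback = gettext.NullTranslations()

  # Hypothetical in-memory catalog, for illustration only.
  CATALOG = {'Hello': 'Hallo', 'Password': 'Passwort'}


  def translate(message):
      """Return the translation of ``message``, or the message itself."""
      return CATALOG.get(message, fallback.gettext(message))

In a real application the catalog would normally come from compiled message
files loaded with ``gettext.translation()`` rather than from a dictionary.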

The ``Translator`` class also defines the ``extract`` class method, which can
be used to extract localizable messages from a template.

Please refer to the API documentation for more information on this filter.

.. note:: The translation filter was added in Genshi 0.4.