| [530] | 1 | .. -*- mode: rst; encoding: utf-8 -*- |
|---|
| 2 | |
|---|
| 3 | ============== |
|---|
| 4 | Stream Filters |
|---|
| 5 | ============== |
|---|
| 6 | |
|---|
| 7 | `Markup Streams`_ showed how to write filters and how they are applied to |
|---|
| 8 | markup streams. This page describes the features of the various filters that |
|---|
| 9 | come with Genshi itself. |
|---|
| 10 | |
|---|
| 11 | .. _`Markup Streams`: streams.html |
|---|
| 12 | |
|---|
| 13 | .. contents:: Contents |
|---|
| 14 | :depth: 1 |
|---|
| 15 | .. sectnum:: |
|---|
| 16 | |
|---|
| 17 | |
|---|
| 18 | HTML Form Filler |
|---|
| 19 | ================ |
|---|
| 20 | |
|---|
| [609] | 21 | The filter ``genshi.filters.html.HTMLFormFiller`` can automatically populate an |
|---|
| [799] | 22 | HTML form from values provided as a simple dictionary. When using this filter, |
|---|
| [609] | 23 | you can basically omit any ``value``, ``selected``, or ``checked`` attributes |
|---|
| 24 | from form controls in your templates, and let the filter do all that work for |
|---|
| 25 | you. |
|---|
| [530] | 26 | |
|---|
| 27 | ``HTMLFormFiller`` takes a dictionary of data to populate the form with, where |
|---|
| 28 | the keys should match the names of form elements, and the values determine the |
|---|
| [614] | 29 | values of those controls. For example: |
|---|
| [530] | 30 | |
|---|
| [614] | 31 | .. code-block:: pycon |
|---|
| 32 | |
|---|
| [530] | 33 | >>> from genshi.filters import HTMLFormFiller |
|---|
| 34 | >>> from genshi.template import MarkupTemplate |
|---|
| [609] | 35 | |
|---|
| [530] | 36 | >>> template = MarkupTemplate("""<form> |
|---|
| 37 | ... <p> |
|---|
| 38 | ... <label>User name: |
|---|
| 39 | ... <input type="text" name="username" /> |
|---|
| 40 | ... </label><br /> |
|---|
| 41 | ... <label>Password: |
|---|
| 42 | ... <input type="password" name="password" /> |
|---|
| 43 | ... </label><br /> |
|---|
| 44 | ... <label> |
|---|
| 45 | ... <input type="checkbox" name="remember" /> Remember me |
|---|
| 46 | ... </label> |
|---|
| 47 | ... </p> |
|---|
| 48 | ... </form>""") |
|---|
| 49 | >>> filler = HTMLFormFiller(data=dict(username='john', remember=True)) |
|---|
| [1076] | 50 | >>> print(template.generate() | filler) |
|---|
| [530] | 51 | <form> |
|---|
| 52 | <p> |
|---|
| 53 | <label>User name: |
|---|
| 54 | <input type="text" name="username" value="john"/> |
|---|
| 55 | </label><br/> |
|---|
| 56 | <label>Password: |
|---|
| 57 | <input type="password" name="password"/> |
|---|
| 58 | </label><br/> |
|---|
| 59 | <label> |
|---|
| 60 | <input type="checkbox" name="remember" checked="checked"/> Remember me |
|---|
| 61 | </label> |
|---|
| 62 | </p> |
|---|
| 63 | </form> |
|---|
| 64 | |
|---|
| 65 | .. note:: This processing is done without reparsing the template output in |
|---|
| 66 | any way. Like any stream filter, it operates after the template output is |
|---|
| 67 | generated but *before* that output is actually serialized. |
|---|
| 68 | |
|---|
| 69 | The filter will of course also handle radio buttons as well as ``<select>`` and |
|---|
| 70 | ``<textarea>`` elements. For radio buttons to be marked as checked, the value in |
|---|
| 71 | the data dictionary needs to match the ``value`` attribute of the ``<input>`` |
|---|
| 72 | element, or evaluate to a truth value if the element has no such attribute. For |
|---|
| 73 | options in a ``<select>`` box to be marked as selected, the value in the data |
|---|
| 74 | dictionary needs to match the ``value`` attribute of the ``<option>`` element, |
|---|
| 75 | or the text content of the option if it has no ``value`` attribute. Password and |
|---|
| 76 | file input fields are not populated, as most browsers would ignore that anyway |
|---|
| 77 | for security reasons. |
|---|
| 78 | |
|---|
| 79 | You'll want to make sure that the values in the data dictionary have already |
|---|
| 80 | been converted to strings. While the filter may be able to deal with non-string |
|---|
| 81 | data in some cases (such as check boxes), in most cases it will either not |
|---|
| 82 | attempt any conversion or not produce the desired results. |
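
One way to prepare the data is a small conversion step before constructing the filter. The helper below is a hypothetical sketch, not part of Genshi: the name ``stringify_form_data`` and the decision to leave booleans and ``None`` untouched (so that check boxes can still be driven by truth values) are assumptions made for this example.

```python
def stringify_form_data(data):
    """Return a copy of ``data`` with values converted to strings.

    Hypothetical helper for illustration; it is not part of Genshi.
    """
    result = {}
    for name, value in data.items():
        if value is None or isinstance(value, bool):
            # Leave truth values alone so check boxes keep working.
            result[name] = value
        elif isinstance(value, (list, tuple)):
            # Multi-valued fields: convert each item individually.
            result[name] = [str(item) for item in value]
        else:
            result[name] = str(value)
    return result


form_data = stringify_form_data({'age': 42, 'remember': True, 'tags': (1, 2)})
print(form_data)
```

The resulting dictionary can then be passed as the ``data`` argument of ``HTMLFormFiller``.
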
|---|
| 83 | |
|---|
| 84 | You can restrict the form filler to operate only on a specific ``<form>`` by |
|---|
| 85 | passing either the ``id`` or the ``name`` keyword argument to the initializer. |
|---|
| 86 | If either of those is specified, the filter will only apply to form tags with |
|---|
| 87 | an attribute matching the specified value. |
|---|
| 88 | |
|---|
| 89 | |
|---|
| 90 | HTML Sanitizer |
|---|
| 91 | ============== |
|---|
| 92 | |
|---|
| [609] | 93 | The ``genshi.filters.html.HTMLSanitizer`` filter can be used to clean up |
|---|
| [530] | 94 | user-submitted HTML markup, removing potentially dangerous constructs that could |
|---|
| [614] | 95 | be used for various kinds of abuse, such as cross-site scripting (XSS) attacks: |
|---|
| [530] | 96 | |
|---|
| [614] | 97 | .. code-block:: pycon |
|---|
| 98 | |
|---|
| [530] | 99 | >>> from genshi.filters import HTMLSanitizer |
|---|
| 100 | >>> from genshi.input import HTML |
|---|
| [609] | 101 | |
|---|
| [1194] | 102 | >>> html = HTML(u"""<div> |
|---|
| [530] | 103 | ... <p>Innocent looking text.</p> |
|---|
| 104 | ... <script>alert("Danger: " + document.cookie)</script> |
|---|
| 105 | ... </div>""") |
|---|
| 106 | >>> sanitize = HTMLSanitizer() |
|---|
| [1076] | 107 | >>> print(html | sanitize) |
|---|
| [530] | 108 | <div> |
|---|
| 109 | <p>Innocent looking text.</p> |
|---|
| 110 | </div> |
|---|
| 111 | |
|---|
| 112 | In this example, the ``<script>`` tag was removed from the output. |
|---|
| 113 | |
|---|
| 114 | You can determine which tags and attributes should be allowed by initializing |
|---|
| 115 | the filter with corresponding sets. See the API documentation for more |
|---|
| 116 | information. |
|---|
| 117 | |
|---|
| 118 | Inline ``style`` attributes are forbidden by default. If you allow them, the |
|---|
| 119 | filter will still sanitize the contents of any inline styles it encounters: |
|---|
| 120 | the proprietary ``expression()`` function (supported only by Internet |
|---|
| 121 | Explorer) is removed, and any property using a ``url()`` with a potentially |
|---|
| [614] | 122 | dangerous URL scheme (such as ``javascript:``) is also stripped out: |
|---|
| [530] | 123 | |
|---|
| [614] | 124 | .. code-block:: pycon |
|---|
| 125 | |
|---|
| [530] | 126 | >>> from genshi.filters import HTMLSanitizer |
|---|
| 127 | >>> from genshi.input import HTML |
|---|
| [609] | 128 | |
|---|
| [1194] | 129 | >>> html = HTML(u"""<div> |
|---|
| [530] | 130 | ... <br style="background: url(javascript:alert(document.cookie); color: #000" /> |
|---|
| 131 | ... </div>""") |
|---|
| 132 | >>> sanitize = HTMLSanitizer(safe_attrs=HTMLSanitizer.SAFE_ATTRS | set(['style'])) |
|---|
| [1076] | 133 | >>> print(html | sanitize) |
|---|
| [530] | 134 | <div> |
|---|
| 135 | <br style="color: #000"/> |
|---|
| 136 | </div> |
|---|
| 137 | |
|---|
| 138 | .. warning:: You should probably not rely on the ``style`` filtering, as |
|---|
| 139 | sanitizing mixed HTML, CSS, and JavaScript is very complicated and |
|---|
| 140 | subject to various browser bugs. If you can somehow get away with |
|---|
| 141 | not allowing inline styles in user-submitted content, that would |
|---|
| 142 | definitely be the safer route to follow. |
|---|
| [609] | 143 | |
|---|
| 144 | |
|---|
| 145 | Transformer |
|---|
| 146 | =========== |
|---|
| 147 | |
|---|
| 148 | The filter ``genshi.filters.transform.Transformer`` provides a convenient way to |
|---|
| 149 | transform or otherwise work with markup event streams. It allows you to specify |
|---|
| 150 | which parts of the stream you're interested in with XPath expressions, and then |
|---|
| [614] | 151 | attach a variety of transformations to the parts that match: |
|---|
| [609] | 152 | |
|---|
| [614] | 153 | .. code-block:: pycon |
|---|
| 154 | |
|---|
| [609] | 155 | >>> from genshi.builder import tag |
|---|
| 156 | >>> from genshi.core import TEXT |
|---|
| 157 | >>> from genshi.filters import Transformer |
|---|
| 158 | >>> from genshi.input import HTML |
|---|
| 159 | |
|---|
| [1194] | 160 | >>> html = HTML(u'''<html> |
|---|
| [609] | 161 | ... <head><title>Some Title</title></head> |
|---|
| 162 | ... <body> |
|---|
| 163 | ... Some <em>body</em> text. |
|---|
| 164 | ... </body> |
|---|
| 165 | ... </html>''') |
|---|
| 166 | |
|---|
| [1076] | 167 | >>> print(html | Transformer('body/em').map(unicode.upper, TEXT) |
|---|
| 168 | ... .unwrap().wrap(tag.u).end() |
|---|
| 169 | ... .select('body/u') |
|---|
| 170 | ... .prepend('underlined ')) |
|---|
| [609] | 171 | <html> |
|---|
| 172 | <head><title>Some Title</title></head> |
|---|
| 173 | <body> |
|---|
| [623] | 174 | Some <u>underlined BODY</u> text. |
|---|
| [609] | 175 | </body> |
|---|
| 176 | </html> |
|---|
| 177 | |
|---|
| 178 | This example sets up a transformation that: |
|---|
| 179 | |
|---|
| 180 | 1. matches any ``<em>`` element that is a direct child of the body, |
|---|
| 181 | 2. uppercases any text nodes in the element, |
|---|
| [623] | 182 | 3. strips off the ``<em>`` start and close tags, |
|---|
| 183 | 4. wraps the content in a ``<u>`` tag, and |
|---|
| [641] | 184 | 5. inserts the text ``underlined`` at the start of the ``<u>`` tag's content. |
|---|
| [609] | 185 | |
|---|
| 186 | A number of commonly useful transformations are available for this filter. |
|---|
| 187 | Please consult the API documentation for a complete list. |
|---|
| 188 | |
|---|
| 189 | In addition, you can also perform custom transformations. For example, the |
|---|
| [614] | 190 | following defines a transformation that changes the name of a tag: |
|---|
| [609] | 191 | |
|---|
| [614] | 192 | .. code-block:: pycon |
|---|
| 193 | |
|---|
| [609] | 194 | >>> from genshi import QName |
|---|
| 195 | >>> from genshi.filters.transform import ENTER, EXIT |
|---|
| 196 | |
|---|
| 197 | >>> class RenameTransformation(object): |
|---|
| 198 | ...     def __init__(self, name): |
|---|
| 199 | ...         self.name = QName(name) |
|---|
| 200 | ...     def __call__(self, stream): |
|---|
| 201 | ...         for mark, (kind, data, pos) in stream: |
|---|
| 202 | ...             if mark is ENTER: |
|---|
| 203 | ...                 data = self.name, data[1] |
|---|
| 204 | ...             elif mark is EXIT: |
|---|
| 205 | ...                 data = self.name |
|---|
| 206 | ...             yield mark, (kind, data, pos) |
|---|
| 207 | |
|---|
| 208 | A transformation can be any callable object that accepts an augmented event |
|---|
| 209 | stream. In this case we define a class, so that we can initialize it with the |
|---|
| 210 | tag name. |
|---|
| 211 | |
|---|
| [641] | 212 | Custom transformations can be applied using the ``apply()`` method of a |
|---|
| 213 | transformer instance: |
|---|
| [609] | 214 | |
|---|
| [614] | 215 | .. code-block:: pycon |
|---|
| 216 | |
|---|
| [641] | 217 | >>> xform = Transformer('body//em').map(unicode.upper, TEXT) |
|---|
| 218 | >>> xform = xform.apply(RenameTransformation('u')) |
|---|
| [1076] | 219 | >>> print(html | xform) |
|---|
| [609] | 220 | <html> |
|---|
| 221 | <head><title>Some Title</title></head> |
|---|
| 222 | <body> |
|---|
| 223 | Some <u>BODY</u> text. |
|---|
| 224 | </body> |
|---|
| 225 | </html> |
|---|
| 226 | |
|---|
| [614] | 227 | .. note:: The transformation filter was added in Genshi 0.5. |
|---|
| [609] | 228 | |
|---|
| 229 | |
|---|
| 230 | Translator |
|---|
| 231 | ========== |
|---|
| 232 | |
|---|
| 233 | The ``genshi.filters.i18n.Translator`` filter implements basic support for |
|---|
| 234 | internationalizing and localizing templates. When used as a filter, it |
|---|
| 235 | translates a configurable set of text nodes and attribute values using a |
|---|
| 236 | ``gettext``-style translation function. |
|---|
| 237 | |
|---|
| 238 | The ``Translator`` class also defines the ``extract`` class method, which can |
|---|
| 239 | be used to extract localizable messages from a template. |
|---|
| 240 | |
|---|
| 241 | Please refer to the API documentation for more information on this filter. |
|---|
| [614] | 242 | |
|---|
| 243 | .. note:: The translation filter was added in Genshi 0.4. |
|---|