Edgewall Software

Changes between Version 71 and Version 72 of GenshiTutorial


Timestamp: Sep 4, 2007, 7:48:40 PM
Author: cmlenz
Comment: Starting XSS section

  • GenshiTutorial

In this tutorial we'll create a simple Python web application based on [http://cherrypy.org/ CherryPy 3]. !CherryPy was chosen because it provides a convenient level of abstraction over raw CGI or [http://wsgi.org/wsgi WSGI] development, but is less ambitious than full-stack web frameworks such as [http://pylonshq.com/ Pylons] or [http://www.djangoproject.com/ Django], which tend to come with a preferred templating language and often show a significant bias towards it.

The application we'll build here is a stripped-down version of sites such as [http://reddit.com/ reddit] or [http://digg.com/ digg]: it lets users submit links to online articles they find interesting, and then lets other users comment on those stories. Just for kicks, we'll call that application '''Geddit'''.

We'll keep the project as simple as possible, while still showing many of Genshi's features and how best to use them:

=== Allowing Markup in Comments ===

At this point we allow users to post plain text comments, but those comments can't include niceties such as hyperlinks or inline HTML formatting (emphasis, etc.). A very naive application would simply accept HTML tags in the input and pass them through to the output. That is generally a bad idea, however, as it opens your site up to cross-site scripting (XSS) attacks, which can undermine any security measures you try to put into effect (including SSL). Because passing tags through is generally not the behavior you want, Genshi XML-escapes everything by default.

  (''Note that as Geddit allows anyone to do anything, we don't actually have any valuable assets to protect, so this exercise is somewhat theoretical. For the rest of this section, just imagine we required users to register and log in to submit links or post comments.'')

So what we want to do in this section is allow users to include HTML tags in their comments, but in a safe manner. We do not want to enable malicious users to include JavaScript code, CSS styles that turn the whole page black, or anything else that may be considered harmful. In other words, we need to “sanitize” the markup in the comments.

But let's ignore that aspect for now and first make Genshi stop escaping HTML tags in comments. To do that, edit `geddit/template/_comment.html` as follows:

{{{
#!genshi
<?python from genshi import HTML ?>
<li id="comment$num">
  <strong>${comment.username}</strong> at ${comment.time.strftime('%x %X')}
  <blockquote>${HTML(comment.content)}</blockquote>
</li>
}}}

Here, we've added an import for the Genshi `HTML()` function, using a [wiki:Documentation/templates.html#code-blocks Python code block] in the form of a `<?python ?>` processing instruction. We've already seen that we can use complex Python expressions in templates; with `<?python ?>` we can also embed arbitrary Python statements directly in the template, for example to define classes or functions. In this case we simply import a function we need.

The `HTML()` function parses a snippet of HTML and returns a Genshi markup stream, correcting invalid HTML along the way where it can (for example, by fixing the nesting of tags). We then use that function to render the content of the comment. So what does this do, exactly? The comment text is parsed using an HTML parser, fixed up if necessary (and possible), and injected into the template as a markup stream. A template expression that evaluates to a markup stream is treated differently from other data types: it is inserted directly into the template output stream, so its tags do not get escaped.

 '''TODO: Mention Markup class'''

So at this point our users can include HTML tags in their comments, and they will be rendered as HTML. But as noted above, that approach is very dangerous for most real-world applications, so we've got more work to do: we need to sanitize the markup in the comment so that only markup that can be considered safe is let through. Genshi provides a stream filter to help us here: [wiki:Documentation/filters.html#html-sanitizer HTMLSanitizer].

In `geddit/controller.py`, first add the imports for the `HTML` function and the `HTMLSanitizer` filter, so that the imports at the top of the file look something like this:

{{{
#!python
import cherrypy
from formencode import Invalid
from genshi.input import HTML
from genshi.filters import HTMLFormFiller, HTMLSanitizer
}}}

Then we'll update the `Root.comment()` method so that it sanitizes comments as they are submitted:

{{{
#!python
    @cherrypy.expose
    @template.output('comment.html')
    def comment(self, id, cancel=False, **data):
        link = self.data.get(id)
        if not link:
            raise cherrypy.NotFound()
        if cherrypy.request.method == 'POST':
            if cancel:
                raise cherrypy.HTTPRedirect('/info/%s' % link.id)
            form = CommentForm()
            try:
                data = form.to_python(data)
                markup = HTML(data['content']) | HTMLSanitizer()
                data['content'] = markup.render('xhtml')
                comment = link.add_comment(**data)
                if not ajax.is_xhr():
                    raise cherrypy.HTTPRedirect('/info/%s' % link.id)
                return template.render('_comment.html', comment=comment,
                                       num=len(link.comments))
            except Invalid, e:
                errors = e.unpack_errors()
        else:
            errors = {}

        if ajax.is_xhr():
            stream = template.render('_form.html', link=link, errors=errors)
        else:
            stream = template.render(link=link, comment=None, errors=errors)
        return stream | HTMLFormFiller(data=data)
}}}

We've just added two lines here, namely:

{{{
#!python
                markup = HTML(data['content']) | HTMLSanitizer()
                data['content'] = markup.render('xhtml')
}}}

This parses the comment text, runs it through the sanitizer, and serializes the result to XHTML, which is what we save to our “database”. Why are we using XHTML here, when we use HTML almost everywhere else? Because we want to be able to include the comment text in Atom feeds, too, and for that it needs to be well-formed XML.