1346 | | At this point we allow users to post plain text comments, but those comments can't include niceties such as hyperlinks or HTML inline formatting (emphasis, etc). A very naive application would simply accept HTML tags in the input, and pass those tags through in the output. That is generally a bad thing, however, as it opens up your site to cross-site scripting (XSS) attack, which can undermine any security measures you try put into effect. |
1347 | | |
1348 | | (''Note that as Geddit allows anyone to do anything, we don't actually have any valuable assets to protect, so this exercise is somewhat theoretical. Just imagine we required users to register to submit links or post comments for the next sections.'') |
1349 | | |
1350 | | '''TODO''': |
1351 | | * Details on escaping |
1352 | | * Ways to include literal markup in output |
1353 | | * Using HTMLSanitizer |
| 1346 | At this point we allow users to post plain text comments, but those comments can't include niceties such as hyperlinks or inline HTML formatting (emphasis, etc.). A very naive application would simply accept HTML tags in the input and pass those tags through in the output. That is generally a bad idea, however, as it opens up your site to cross-site scripting (XSS) attacks, which can undermine any security measures you try to put into effect (including SSL). Because passing markup through is rarely the behavior you want, Genshi XML-escapes everything by default. |
| 1347 | |
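To make the effect of escaping concrete, here is a minimal sketch that uses `html.escape` from the Python standard library as a stand-in. It is not Genshi's own escaping code, and the comment text is made up for illustration.

```python
from html import escape

# Hypothetical comment text submitted by a malicious user.
comment = '<script>alert("XSS")</script> & more'

# Escaping replaces the characters that are special in markup, so a
# browser displays the tags as text instead of interpreting them.
escaped = escape(comment)
print(escaped)
```

The escaped string no longer contains any literal `<` or `>`, which is why a comment rendered this way cannot inject a script element into the page.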
| 1348 | (''Note that as Geddit allows anyone to do anything, we don't actually have any valuable assets to protect, so this exercise is somewhat theoretical. For the rest of this section, just imagine we required users to register and log in to submit links or post comments.'') |
| 1349 | |
| 1350 | So what we want to do in this section is to allow users to include HTML tags in their comments, but do so in a safe manner. We do not want to enable malicious users to include JavaScript code, or CSS styles that turn the whole page black, or other things that may be considered harmful. In other words, we need to “sanitize” the markup in the comments. |
| 1351 | |
| 1352 | But let's ignore that aspect for now, and start by making Genshi not escape HTML tags in comments. We'll start by editing `geddit/templates/_comment.html`: |
| 1353 | |
| 1354 | {{{ |
| 1355 | #!genshi |
| 1356 | <?python from genshi import HTML ?> |
| 1357 | <li id="comment$num"> |
| 1358 |   <strong>${comment.username}</strong> at ${comment.time.strftime('%x %X')} |
| 1359 |   <blockquote>${HTML(comment.content)}</blockquote> |
| 1360 | </li> |
| 1361 | }}} |
| 1362 | |
| 1363 | Here, we've added an import for the Genshi `HTML()` function. This is done using a [wiki:Documentation/templates.html#code-blocks Python code block] via the `<?python ?>` processing instruction. We've already seen that we can use complex Python expressions in templates. By using the `<?python ?>` processing instruction, we can embed any Python statements directly in the template, for example to define classes or functions. In this case we simply import a function that we need to use. |
| 1364 | |
| 1365 | The `HTML()` function parses a snippet of HTML and returns a Genshi markup stream. It tries to do this in a way that invalid HTML is corrected (for example by fixing the nesting of tags). We then use that function to render the content of the comment. So what does this do, exactly? Well, the comment text is parsed using an HTML parser, fixed up if necessary (and possible), and injected into the template as a markup stream. A template expression that evaluates to a markup stream is treated differently than other data types: it is injected directly into the template output stream, effectively resulting in tags not getting escaped. |
| 1366 | |
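The idea of a "markup stream" can be pictured as a sequence of parse events rather than a string. The sketch below only mimics that idea with the standard-library `html.parser`. It is not how Genshi's `HTML()` is implemented, and the `EventCollector` class and `parse_events` helper are hypothetical names. Genshi's real streams are produced by its own parser and carry `(kind, data, pos)` tuples.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Collect parse events as simplified (kind, data) tuples."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("START", (tag, attrs)))

    def handle_endtag(self, tag):
        self.events.append(("END", tag))

    def handle_data(self, data):
        self.events.append(("TEXT", data))


def parse_events(text):
    """Parse an HTML snippet and return the list of collected events."""
    parser = EventCollector()
    parser.feed(text)
    parser.close()  # flush any text that is still buffered
    return parser.events


events = parse_events("Nice <em>link</em>!")
for kind, data in events:
    print(kind, data)
```

Because the tags arrive as structured events rather than as text, a serializer can write them out as markup without escaping them, while still escaping the text events in between.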
| 1367 | '''TODO: Mention Markup class''' |
| 1368 | |
| 1369 | So at this point our users can include HTML tags in their comments, and it will be rendered as HTML. But as noted above, that approach is very dangerous for most real-world applications, so we've got more work to do: |
| 1370 | we need to sanitize the markup in the comment so that only markup that can be considered safe is let through. Genshi provides a stream filter to help us here: [wiki:Documentation/filters.html#html-sanitizer HTMLSanitizer]. |
| 1371 | |
| 1372 | In `geddit/controller.py`, first add the imports for the `HTML` function and the `HTMLSanitizer` filter, so that the imports at the top of the file look something like this: |
| 1373 | |
| 1374 | {{{ |
| 1375 | #!python |
| 1376 | import cherrypy |
| 1377 | from formencode import Invalid |
| 1378 | from genshi.input import HTML |
| 1379 | from genshi.filters import HTMLFormFiller, HTMLSanitizer |
| 1380 | }}} |
| 1381 | |
| 1382 | Then we'll update the `Root.comment()` method so that it sanitizes comments as they are submitted: |
| 1383 | |
| 1384 | {{{ |
| 1385 | #!python |
| 1386 |     @cherrypy.expose |
| 1387 |     @template.output('comment.html') |
| 1388 |     def comment(self, id, cancel=False, **data): |
| 1389 |         link = self.data.get(id) |
| 1390 |         if not link: |
| 1391 |             raise cherrypy.NotFound() |
| 1392 |         if cherrypy.request.method == 'POST': |
| 1393 |             if cancel: |
| 1394 |                 raise cherrypy.HTTPRedirect('/info/%s' % link.id) |
| 1395 |             form = CommentForm() |
| 1396 |             try: |
| 1397 |                 data = form.to_python(data) |
| 1398 |                 markup = HTML(data['content']) | HTMLSanitizer() |
| 1399 |                 data['content'] = markup.render('xhtml') |
| 1400 |                 comment = link.add_comment(**data) |
| 1401 |                 if not ajax.is_xhr(): |
| 1402 |                     raise cherrypy.HTTPRedirect('/info/%s' % link.id) |
| 1403 |                 return template.render('_comment.html', comment=comment, |
| 1404 |                                        num=len(link.comments)) |
| 1405 |             except Invalid as e: |
| 1406 |                 errors = e.unpack_errors() |
| 1407 |         else: |
| 1408 |             errors = {} |
| 1409 | |
| 1410 |         if ajax.is_xhr(): |
| 1411 |             stream = template.render('_form.html', link=link, errors=errors) |
| 1412 |         else: |
| 1413 |             stream = template.render(link=link, comment=None, errors=errors) |
| 1414 |         return stream | HTMLFormFiller(data=data) |
| 1415 | }}} |
| 1416 | |
| 1417 | We've just added two lines here, namely: |
| 1418 | |
| 1419 | {{{ |
| 1420 | #!python |
| 1421 | markup = HTML(data['content']) | HTMLSanitizer() |
| 1422 | data['content'] = markup.render('xhtml') |
| 1423 | }}} |
| 1424 | |
| 1425 | This parses the comment text, runs it through the sanitizer, and serializes it to XHTML. The result of that transformation is what we'll save to our “database”. Why are we using XHTML here, when we use HTML almost everywhere else? Well, we want to be able to include the comment text in Atom feeds, too, and for that it needs to be well-formed XML. |
| 1426 | |
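For readers who want a feel for what "parse, sanitize, serialize" means, here is a rough, standard-library-only sketch of an allowlist filter. It is emphatically not Genshi's `HTMLSanitizer` and should not be used to protect a real site. The `AllowlistSanitizer` class, the `sanitize` helper, and the three allowlists are all assumptions chosen for illustration.

```python
from html import escape
from html.parser import HTMLParser

# Illustrative allowlists (assumptions, not HTMLSanitizer's defaults).
SAFE_TAGS = {"a", "b", "blockquote", "code", "em", "i", "p", "strong"}
SAFE_ATTRS = {"href", "title"}
SAFE_SCHEMES = ("http://", "https://", "mailto:")


class AllowlistSanitizer(HTMLParser):
    """Re-serialize markup, keeping only allowlisted tags and attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []  # allowed tags that are currently open
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in SAFE_TAGS:
            return  # drop the tag itself; its text content is still kept
        kept = []
        for name, value in attrs:
            value = value or ""
            if name not in SAFE_ATTRS:
                continue
            if name == "href" and not value.strip().lower().startswith(SAFE_SCHEMES):
                continue  # e.g. javascript: URLs
            kept.append(' %s="%s"' % (name, escape(value)))
        self.out.append("<%s%s>" % (tag, "".join(kept)))
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in self.open_tags:
            return
        # Close anything opened after this tag so the output stays well-formed.
        while self.open_tags:
            open_tag = self.open_tags.pop()
            self.out.append("</%s>" % open_tag)
            if open_tag == tag:
                break

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(text):
    """Return a sanitized, well-formed serialization of an HTML snippet."""
    parser = AllowlistSanitizer()
    parser.feed(text)
    parser.close()
    while parser.open_tags:  # close tags the author left open
        parser.out.append("</%s>" % parser.open_tags.pop())
    return "".join(parser.out)


print(sanitize('Hi <em onclick="evil()">there</em><script>alert(1)</script>'))
```

In this sketch, the counterpart of the two lines added to `Root.comment()` would be a single call such as `data['content'] = sanitize(data['content'])`. Closing every open tag on output loosely mirrors the reason the tutorial serializes to XHTML: the stored comment should be well-formed enough to embed in an XML document.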