Edgewall Software

Opened 10 years ago

Closed 9 years ago

#501 closed defect (fixed)

test_sanitize_remove_src_javascript fails due to HTMLParser bugfixes in cpython

Reported by: stefanor@… Owned by: cmlenz
Priority: major Milestone: 0.6.1
Component: Parsing Version: 0.6
Keywords: Cc:


A patch to HTMLParser, for a bunch of issues related to attribute handling, landed in the cpython 2.7 branch and breaks test_sanitize_remove_src_javascript:

FAIL: test_sanitize_remove_src_javascript (__main__.HTMLSanitizerTestCase)
Traceback (most recent call last):
  File "genshi/filters/tests/test_html.py", line 485, in test_sanitize_remove_src_javascript
    u'<IMG SRC=`javascript:alert("RSnake says, \'foo\'")`>')
AssertionError: ParseError not raised

Ran 61 tests in 0.033s

FAILED (failures=1)

Change History (5)

comment:1 Changed 9 years ago by vicho@…

It doesn't raise a ParserError? but, if I understand it correctly, the parsed string is not right:

>>> HTML('<IMG SRC=`javascript:alert("RSnake says, \'foo\'")`>')
'<img src="`javascript:alert(&#34;RSnake" says,="says," \'foo\'")`="\'foo\'&#34;)`"/>'

Is this a bug in HTMLParser in cpython?

comment:2 Changed 9 years ago by stefanor@…

Apparently IE implemented backticks as attribute delimiters. I don't see any reference in any of the issues that commit fixed to the removal of backticks as valid attribute delimiters. But it's also something that's fairly far outside the scope of defined HTML. I don't think this parse is wrong per se, just not as good as it could be.

comment:3 Changed 9 years ago by antonio.rosales@…

Hello,any update on the resolution of this ticket? It is currently breaking building and packing of Genshi.

-thanks, Antonio

comment:4 Changed 9 years ago by barry@…

Here's what I'm proposing to fix the failure in Ubuntu 12.10.

--- a/genshi/filters/tests/html.py
+++ b/genshi/filters/tests/html.py
@@ -365,9 +365,12 @@
         self.assertEquals('', (html | HTMLSanitizer()).render())
         html = HTML('<SCRIPT SRC="http://example.com/"></SCRIPT>')
         self.assertEquals('', (html | HTMLSanitizer()).render())
-        self.assertRaises(ParseError, HTML, '<SCR\0IPT>alert("foo")</SCR\0IPT>')
-        self.assertRaises(ParseError, HTML,
-                          '<SCRIPT&XYZ SRC="http://example.com/"></SCRIPT>')
+        html = HTML('<SCR\0IPT>alert("foo")</SCR\0IPT>')
+        self.assertEquals('&lt;SCR\x00IPT&gt;alert("foo")',
+                          (html | HTMLSanitizer()).render())
+        html = HTML('<SCRIPT&XYZ SRC="http://example.com/"></SCRIPT>')
+        self.assertEquals('&lt;SCRIPT&amp;XYZ; SRC="http://example.com/"&gt;',
+                          (html | HTMLSanitizer()).render())
     def test_sanitize_remove_onclick_attr(self):
         html = HTML('<div onclick=\'alert("foo")\' />')

comment:5 Changed 9 years ago by hodgestar

  • Resolution set to fixed
  • Status changed from new to closed

I've committed fixes to trunk and 0.6.x in r1187 and r1188 respectively. I opted not to simply delete the tests (I think users rely on these strange tags not making it through and so Genshi can't really just hand responsibility off to Python without testing) but I also needed to support older Python versions so I wrapped Barry's change into a helper method that accepts ParseError? as a valid alternative.

Note: See TracTickets for help on using tickets.