Edgewall Software

Ticket #501 (closed defect: fixed)

Opened 3 years ago

Last modified 2 years ago

test_sanitize_remove_src_javascript fails due to HTMLParser bugfixes in cpython

Reported by: stefanor@… Owned by: cmlenz
Priority: major Milestone: 0.6.1
Component: Parsing Version: 0.6
Keywords: Cc:


A  patch to HTMLParser, for a bunch of issues related to attribute handling, landed in the cpython 2.7 branch and breaks test_sanitize_remove_src_javascript:

FAIL: test_sanitize_remove_src_javascript (__main__.HTMLSanitizerTestCase)
Traceback (most recent call last):
  File "genshi/filters/tests/test_html.py", line 485, in test_sanitize_remove_src_javascript
    u'<IMG SRC=`javascript:alert("RSnake says, \'foo\'")`>')
AssertionError: ParseError not raised

Ran 61 tests in 0.033s

FAILED (failures=1)


Change History

Changed 2 years ago by vicho@…

It doesn't raise a ParserError? but, if I understand it correctly, the parsed string is not right:

>>> HTML('<IMG SRC=`javascript:alert("RSnake says, \'foo\'")`>')
'<img src="`javascript:alert(&#34;RSnake" says,="says," \'foo\'")`="\'foo\'&#34;)`"/>'

Is this a bug in HTMLParser in cpython?

Changed 2 years ago by stefanor@…

Apparently IE implemented backticks as attribute delimiters. I don't see any reference in any of the issues that commit fixed to the removal of backticks as valid attribute delimiters. But it's also something that's fairly far outside the scope of defined HTML. I don't think this parse is wrong per se, just not as good as it could be.

Changed 2 years ago by antonio.rosales@…

Hello,any update on the resolution of this ticket? It is currently breaking building and packing of Genshi.

-thanks, Antonio

Changed 2 years ago by barry@…

Here's what I'm proposing to fix the failure in Ubuntu 12.10.

--- a/genshi/filters/tests/html.py
+++ b/genshi/filters/tests/html.py
@@ -365,9 +365,12 @@
         self.assertEquals('', (html | HTMLSanitizer()).render())
         html = HTML('<SCRIPT SRC="http://example.com/"></SCRIPT>')
         self.assertEquals('', (html | HTMLSanitizer()).render())
-        self.assertRaises(ParseError, HTML, '<SCR\0IPT>alert("foo")</SCR\0IPT>')
-        self.assertRaises(ParseError, HTML,
-                          '<SCRIPT&XYZ SRC="http://example.com/"></SCRIPT>')
+        html = HTML('<SCR\0IPT>alert("foo")</SCR\0IPT>')
+        self.assertEquals('&lt;SCR\x00IPT&gt;alert("foo")',
+                          (html | HTMLSanitizer()).render())
+        html = HTML('<SCRIPT&XYZ SRC="http://example.com/"></SCRIPT>')
+        self.assertEquals('&lt;SCRIPT&amp;XYZ; SRC="http://example.com/"&gt;',
+                          (html | HTMLSanitizer()).render())
     def test_sanitize_remove_onclick_attr(self):
         html = HTML('<div onclick=\'alert("foo")\' />')

Changed 2 years ago by hodgestar

  • status changed from new to closed
  • resolution set to fixed

I've committed fixes to trunk and 0.6.x in r1187 and r1188 respectively. I opted not to simply delete the tests (I think users rely on these strange tags not making it through and so Genshi can't really just hand responsibility off to Python without testing) but I also needed to support older Python versions so I wrapped Barry's change into a helper method that accepts ParseError? as a valid alternative.

Add/Change #501 (test_sanitize_remove_src_javascript fails due to HTMLParser bugfixes in cpython)


E-mail address and user name can be saved in the Preferences.

Change Properties
<Author field>
as closed
The resolution will be deleted. Next status will be 'reopened'
Note: See TracTickets for help on using tickets.