Edgewall Software

Ticket #501 (closed defect: fixed)

Opened 2 years ago

Last modified 16 months ago

test_sanitize_remove_src_javascript fails due to HTMLParser bugfixes in cpython

Reported by: stefanor@… Owned by: cmlenz
Priority: major Milestone: 0.6.1
Component: Parsing Version: 0.6
Keywords: Cc:

Description

A  patch to HTMLParser, for a bunch of issues related to attribute handling, landed in the cpython 2.7 branch and breaks test_sanitize_remove_src_javascript:

======================================================================
FAIL: test_sanitize_remove_src_javascript (__main__.HTMLSanitizerTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "genshi/filters/tests/test_html.py", line 485, in test_sanitize_remove_src_javascript
    u'<IMG SRC=`javascript:alert("RSnake says, \'foo\'")`>')
AssertionError: ParseError not raised

----------------------------------------------------------------------
Ran 61 tests in 0.033s

FAILED (failures=1)

Attachments

Change History

Changed 22 months ago by vicho@…

It doesn't raise a ParserError? but, if I understand it correctly, the parsed string is not right:

>>> HTML('<IMG SRC=`javascript:alert("RSnake says, \'foo\'")`>')
'<img src="`javascript:alert(&#34;RSnake" says,="says," \'foo\'")`="\'foo\'&#34;)`"/>'

Is this a bug in HTMLParser in cpython?

Changed 21 months ago by stefanor@…

Apparently IE implemented backticks as attribute delimiters. I don't see any reference in any of the issues that commit fixed to the removal of backticks as valid attribute delimiters. But it's also something that's fairly far outside the scope of defined HTML. I don't think this parse is wrong per se, just not as good as it could be.

Changed 19 months ago by antonio.rosales@…

Hello,any update on the resolution of this ticket? It is currently breaking building and packing of Genshi.

-thanks, Antonio

Changed 19 months ago by barry@…

Here's what I'm proposing to fix the failure in Ubuntu 12.10.

--- a/genshi/filters/tests/html.py
+++ b/genshi/filters/tests/html.py
@@ -365,9 +365,12 @@
         self.assertEquals('', (html | HTMLSanitizer()).render())
         html = HTML('<SCRIPT SRC="http://example.com/"></SCRIPT>')
         self.assertEquals('', (html | HTMLSanitizer()).render())
-        self.assertRaises(ParseError, HTML, '<SCR\0IPT>alert("foo")</SCR\0IPT>')
-        self.assertRaises(ParseError, HTML,
-                          '<SCRIPT&XYZ SRC="http://example.com/"></SCRIPT>')
+        html = HTML('<SCR\0IPT>alert("foo")</SCR\0IPT>')
+        self.assertEquals('&lt;SCR\x00IPT&gt;alert("foo")',
+                          (html | HTMLSanitizer()).render())
+        html = HTML('<SCRIPT&XYZ SRC="http://example.com/"></SCRIPT>')
+        self.assertEquals('&lt;SCRIPT&amp;XYZ; SRC="http://example.com/"&gt;',
+                          (html | HTMLSanitizer()).render())
 
     def test_sanitize_remove_onclick_attr(self):
         html = HTML('<div onclick=\'alert("foo")\' />')

Changed 16 months ago by hodgestar

  • status changed from new to closed
  • resolution set to fixed

I've committed fixes to trunk and 0.6.x in r1187 and r1188 respectively. I opted not to simply delete the tests (I think users rely on these strange tags not making it through and so Genshi can't really just hand responsibility off to Python without testing) but I also needed to support older Python versions so I wrapped Barry's change into a helper method that accepts ParseError? as a valid alternative.

Add/Change #501 (test_sanitize_remove_src_javascript fails due to HTMLParser bugfixes in cpython)

Author


E-mail address and user name can be saved in the Preferences.


Change Properties
<Author field>
Action
as closed
The resolution will be deleted. Next status will be 'reopened'
 
Note: See TracTickets for help on using tickets.