Opened 13 years ago
Closed 12 years ago
#501 closed defect (fixed)
test_sanitize_remove_src_javascript fails due to HTMLParser bugfixes in cpython
Reported by: | stefanor@… | Owned by: | cmlenz |
---|---|---|---|
Priority: | major | Milestone: | 0.6.1 |
Component: | Parsing | Version: | 0.6 |
Keywords: | Cc: |
Description
A patch to HTMLParser, for a bunch of issues related to attribute handling, landed in the cpython 2.7 branch and breaks test_sanitize_remove_src_javascript:
====================================================================== FAIL: test_sanitize_remove_src_javascript (__main__.HTMLSanitizerTestCase) ---------------------------------------------------------------------- Traceback (most recent call last): File "genshi/filters/tests/test_html.py", line 485, in test_sanitize_remove_src_javascript u'<IMG SRC=`javascript:alert("RSnake says, \'foo\'")`>') AssertionError: ParseError not raised ---------------------------------------------------------------------- Ran 61 tests in 0.033s FAILED (failures=1)
Change History (5)
comment:1 Changed 12 years ago by vicho@…
comment:2 Changed 12 years ago by stefanor@…
Apparently IE implemented backticks as attribute delimiters. I don't see any reference in any of the issues that commit fixed to the removal of backticks as valid attribute delimiters. But it's also something that's fairly far outside the scope of defined HTML. I don't think this parse is wrong per se, just not as good as it could be.
comment:3 Changed 12 years ago by antonio.rosales@…
Hello,any update on the resolution of this ticket? It is currently breaking building and packing of Genshi.
-thanks, Antonio
comment:4 Changed 12 years ago by barry@…
Here's what I'm proposing to fix the failure in Ubuntu 12.10.
--- a/genshi/filters/tests/html.py +++ b/genshi/filters/tests/html.py @@ -365,9 +365,12 @@ self.assertEquals('', (html | HTMLSanitizer()).render()) html = HTML('<SCRIPT SRC="http://example.com/"></SCRIPT>') self.assertEquals('', (html | HTMLSanitizer()).render()) - self.assertRaises(ParseError, HTML, '<SCR\0IPT>alert("foo")</SCR\0IPT>') - self.assertRaises(ParseError, HTML, - '<SCRIPT&XYZ SRC="http://example.com/"></SCRIPT>') + html = HTML('<SCR\0IPT>alert("foo")</SCR\0IPT>') + self.assertEquals('<SCR\x00IPT>alert("foo")', + (html | HTMLSanitizer()).render()) + html = HTML('<SCRIPT&XYZ SRC="http://example.com/"></SCRIPT>') + self.assertEquals('<SCRIPT&XYZ; SRC="http://example.com/">', + (html | HTMLSanitizer()).render()) def test_sanitize_remove_onclick_attr(self): html = HTML('<div onclick=\'alert("foo")\' />')
comment:5 Changed 12 years ago by hodgestar
- Resolution set to fixed
- Status changed from new to closed
I've committed fixes to trunk and 0.6.x in r1187 and r1188 respectively. I opted not to simply delete the tests (I think users rely on these strange tags not making it through and so Genshi can't really just hand responsibility off to Python without testing) but I also needed to support older Python versions so I wrapped Barry's change into a helper method that accepts ParseError? as a valid alternative.
It doesn't raise a ParserError? but, if I understand it correctly, the parsed string is not right:
Is this a bug in HTMLParser in cpython?