#384 new defect

HTMLParser does not work with comments that include non-ascii characters

Reported by:	robert.hoelzl@…	Owned by:	cmlenz
Priority:	major	Milestone:	0.9
Component:	Parsing	Version:	0.5.1
Keywords:		Cc:

Description

Hello,

When parsing an HTML file that contains a comment with a non-ASCII character (like ""), the HTMLParser() object throws a UnicodeDecodeError.

The cause of this bug is in the module genshi.input, class HTMLParser, method handle_comment:

current implementation:

    def handle_comment(self, text):
        self._enqueue(COMMENT, text)

correct implementation:

    def handle_comment(self, text):
        # Decode byte strings to unicode, replacing undecodable bytes
        if not isinstance(text, unicode):
            text = text.decode(self.encoding, 'replace')
        self._enqueue(COMMENT, text)
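For illustration, the guard proposed above can be tried outside of Genshi. This is only a rough sketch, not Genshi code: the helper name to_text and the sample comment text are made up for this example, and it uses Python 3 spelling, where Python 2's unicode corresponds to str and byte strings are bytes.

```python
# Illustrative sketch only; `to_text` and the sample data are hypothetical,
# not part of Genshi. Python 3 spelling of the guard from the proposed fix.

def to_text(text, encoding="utf-8"):
    """Return text as str, decoding bytes and replacing undecodable bytes."""
    if not isinstance(text, str):
        text = text.decode(encoding, "replace")
    return text


# A comment body as raw bytes, containing one non-ASCII character.
raw = " caf\u00e9 ".encode("utf-8")

# Strict ASCII decoding of those bytes fails with UnicodeDecodeError.
try:
    raw.decode("ascii")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# With the guard, the bytes are decoded using the declared encoding.
decoded = to_text(raw, "utf-8")

# If the declared encoding is wrong, 'replace' substitutes U+FFFD
# for each undecodable byte instead of raising.
lossy = to_text(raw, "ascii")

# Text that is already str passes through unchanged.
unchanged = to_text("plain comment")
```

The 'replace' error handler trades fidelity for robustness: a comment in an unexpected encoding no longer aborts parsing, but its undecodable bytes are lost.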

Change History (1)

comment:1 Changed 8 years ago by hodgestar

Milestone changed from 0.7 to 0.9

Moved to milestone 0.9.
