Edgewall Software

Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#538 closed defect (fixed)

HTMLParser fails if a multi-byte character falls on a 4K boundary

Reported by: hodgestar Owned by: hodgestar
Priority: major Milestone: 0.7
Component: Parsing Version: devel
Keywords: Cc:


If one does:

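from io import BytesIO
from genshi.input import HTMLParser
text = u'a' * ((4 * 1024) - 1) + u'\xe6'  # 4,095 ASCII bytes, then a 2-byte UTF-8 character
events = list(HTMLParser(BytesIO(text.encode('utf-8')), encoding='utf-8'))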

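it fails with a truncated-input decoding error. The string is 4,095 ASCII characters followed by U+00E6, which UTF-8 encodes as the two bytes 0xC3 0xA6. The first 4K (4,096-byte) read from the input file therefore ends halfway through that character, and decoding the chunk on its own fails.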
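The failure mode can be reproduced without Genshi. The following is a minimal, standard-library-only sketch. It is not the code from r1189, and the helper names `read_naive` and `read_incremental` are hypothetical. It contrasts decoding each 4K chunk independently with using an incremental decoder from `codecs.getincrementaldecoder`, which carries an incomplete trailing byte sequence over to the next chunk. This is one common way to handle the problem; the actual fix may differ.

```python
import codecs
from io import BytesIO

CHUNK_SIZE = 4 * 1024  # read size matching the 4K boundary described in the ticket


def read_naive(stream, encoding='utf-8'):
    """Hypothetical helper: decode every chunk on its own (breaks on split characters)."""
    parts = []
    while True:
        chunk = stream.read(CHUNK_SIZE)
        if not chunk:
            break
        # Raises UnicodeDecodeError if the chunk ends mid-character.
        parts.append(chunk.decode(encoding))
    return ''.join(parts)


def read_incremental(stream, encoding='utf-8'):
    """Hypothetical helper: buffer incomplete byte sequences between chunks."""
    decoder = codecs.getincrementaldecoder(encoding)()
    parts = []
    while True:
        chunk = stream.read(CHUNK_SIZE)
        if not chunk:
            # Flush; this raises only if the input itself ends mid-character.
            parts.append(decoder.decode(b'', final=True))
            break
        parts.append(decoder.decode(chunk))
    return ''.join(parts)


text = u'a' * ((4 * 1024) - 1) + u'\xe6'
data = text.encode('utf-8')  # 4,097 bytes: the first read stops after 0xC3

try:
    read_naive(BytesIO(data))
except UnicodeDecodeError as exc:
    print('naive read failed:', exc.reason)

assert read_incremental(BytesIO(data)) == text
```

Here the incremental decoder returns the 4,095 `a` characters from the first chunk and holds back the lone 0xC3 byte. It then emits `æ` once 0xA6 arrives in the second read.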


Change History (1)

comment:1 Changed 3 years ago by hodgestar

  • Resolution set to fixed
  • Status changed from new to closed

Fixed in r1189.
