Edgewall Software

Ticket #538 (closed defect: fixed)

Opened 2 years ago

Last modified 2 years ago

HTMLParser fails if a multi-byte character falls on a 4K boundary

Reported by: hodgestar
Owned by: hodgestar
Priority: major
Milestone: 0.7
Component: Parsing
Version: devel
Keywords:
Cc:

Description

If one does:

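from io import BytesIO
from genshi.input import HTMLParser  # assumed import; not shown in the ticket

text = u'a' * ((4 * 1024) - 1) + u'\xe6'
events = list(HTMLParser(BytesIO(text.encode('utf-8')),
                         encoding='utf-8'))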

it produces a truncated-input error because the two-byte UTF-8 encoding of u'\xe6' straddles the 4K boundary between two reads from the input file, so the first read ends in the middle of the character.
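
The mechanism can be reproduced without the parser. The following is a minimal sketch, not the change made in r1189: it assumes the input is read in 4096-byte chunks (inferred from the "4K" in the ticket title), and the names CHUNK_SIZE, decode_naive and decode_incremental are hypothetical. It contrasts decoding each chunk on its own with using an incremental decoder from the standard library, which buffers a trailing partial character until the next chunk arrives.

```python
import codecs
from io import BytesIO

# Assumed read size, inferred from the ticket title; not taken from the parser source.
CHUNK_SIZE = 4 * 1024


def decode_naive(stream, encoding='utf-8'):
    """Decode every chunk independently (hypothetical helper).

    Raises UnicodeDecodeError when a chunk ends partway through a character.
    """
    parts = []
    while True:
        chunk = stream.read(CHUNK_SIZE)
        if not chunk:
            break
        parts.append(chunk.decode(encoding))
    return u''.join(parts)


def decode_incremental(stream, encoding='utf-8'):
    """Decode with an incremental decoder (hypothetical helper).

    The decoder keeps incomplete trailing bytes and joins them with the
    start of the next chunk.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    parts = []
    while True:
        chunk = stream.read(CHUNK_SIZE)
        if not chunk:
            break
        parts.append(decoder.decode(chunk))
    # Flush; this raises if the input really is truncated.
    parts.append(decoder.decode(b'', final=True))
    return u''.join(parts)


# Same input as in the ticket: 4095 one-byte characters, then one two-byte character.
text = u'a' * ((4 * 1024) - 1) + u'\xe6'
data = text.encode('utf-8')

try:
    decode_naive(BytesIO(data))
    naive_failed = False
except UnicodeDecodeError:
    naive_failed = True

result = decode_incremental(BytesIO(data))
```

With this input the per-chunk version fails on the first chunk, while the incremental version returns the original text. Whether the parser's fix uses an incremental decoder or another approach is not stated in the ticket.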

Change History

Changed 2 years ago by hodgestar

  • status changed from new to closed
  • resolution set to fixed

Fixed in r1189.
