Edgewall Software

Opened 11 years ago

Closed 11 years ago

Last modified 11 years ago

#538 closed defect (fixed)

HTMLParser fails if a multi-byte character falls on a 4K boundary

Reported by: hodgestar
Owned by: hodgestar
Priority: major
Milestone: 0.7
Component: Parsing
Version: devel
Keywords:
Cc:

Description

If one does:

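from io import BytesIO
from genshi.input import HTMLParser
# 4095 ASCII bytes, then a two-byte UTF-8 character that straddles the 4K mark
text = u'a' * ((4 * 1024) - 1) + u'\xe6'
events = list(HTMLParser(BytesIO(text.encode('utf-8')),
                         encoding='utf-8'))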

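it produces a truncated-input error. In UTF-8, u'\xe6' encodes to two bytes (0xC3 0xA6), and here they fall on either side of the 4K boundary between two reads from the input file, so the first chunk ends partway through the character.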
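
The failure mode can be reproduced without the parser. The following is a minimal standard-library sketch, not the parser's code and not necessarily the change made in r1189. It assumes a read size of 4096 bytes (taken from the ticket title), and the helper names decode_chunks_naively and decode_chunks_incrementally are hypothetical. The first helper decodes every chunk on its own and fails on the split character. The second uses an incremental decoder, which holds back an incomplete trailing sequence until the next chunk arrives.

```python
import codecs
from io import BytesIO

CHUNK_SIZE = 4 * 1024  # assumed read size, matching the 4K boundary in the title


def decode_chunks_naively(stream, encoding='utf-8'):
    """Decode each chunk independently; raises if a character is split."""
    parts = []
    while True:
        data = stream.read(CHUNK_SIZE)
        if not data:
            break
        parts.append(data.decode(encoding))
    return u''.join(parts)


def decode_chunks_incrementally(stream, encoding='utf-8'):
    """Decode chunk by chunk, carrying incomplete byte sequences forward."""
    decoder = codecs.getincrementaldecoder(encoding)()
    parts = []
    while True:
        data = stream.read(CHUNK_SIZE)
        if not data:
            break
        parts.append(decoder.decode(data))
    # final=True makes the decoder raise if the input really is truncated.
    parts.append(decoder.decode(b'', final=True))
    return u''.join(parts)


text = u'a' * ((4 * 1024) - 1) + u'\xe6'
raw = text.encode('utf-8')

try:
    decode_chunks_naively(BytesIO(raw))
    naive_failed = False
except UnicodeDecodeError:
    naive_failed = True

roundtrip = decode_chunks_incrementally(BytesIO(raw))
```

With the input from the description, naive_failed ends up True and roundtrip equals text, because the incremental decoder buffers the lone 0xC3 byte until 0xA6 arrives in the second read.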

Change History (1)

comment:1 Changed 11 years ago by hodgestar

  • Resolution set to fixed
  • Status changed from new to closed

Fixed in r1189.
