Ticket #538 (closed defect: fixed)
HTMLParser fails if a multi-byte character falls on a 4K boundary
Reported by: hodgestar | Owned by: hodgestar
If one does:
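    from io import BytesIO
    from genshi.input import HTMLParser  # assumption: the HTMLParser in question is Genshi's
    text = u'a' * ((4 * 1024) - 1) + u'\xe6'
    events = list(HTMLParser(BytesIO(text.encode('utf-8')), encoding='utf-8'))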
it produces a truncated-input error because the multi-byte character crosses the boundary of a read from the input file.
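
The failure mode can be reproduced without the parser. The sketch below is not the fix that was applied for this ticket. It only illustrates the boundary problem and one common way around it, the standard library's incremental decoder, which holds back an incomplete byte sequence until the next read. The name `decode_in_chunks` and the `CHUNK_SIZE` constant are hypothetical. The 4K read size is assumed from the ticket title.

```python
import codecs
from io import BytesIO

CHUNK_SIZE = 4 * 1024  # assumed read size, taken from the "4K boundary" in the title


def decode_in_chunks(fileobj, encoding='utf-8', chunk_size=CHUNK_SIZE):
    """Yield decoded text, carrying incomplete byte sequences over to the next read."""
    decoder = codecs.getincrementaldecoder(encoding)()
    while True:
        data = fileobj.read(chunk_size)
        if not data:
            # final=True raises if the input genuinely ends mid-character
            tail = decoder.decode(b'', final=True)
            if tail:
                yield tail
            break
        chunk = decoder.decode(data)
        if chunk:
            yield chunk


# Same input as in the report: the two-byte character straddles the 4K boundary
text = u'a' * ((4 * 1024) - 1) + u'\xe6'
raw = text.encode('utf-8')

# Decoding the first 4K bytes in isolation fails, since the last byte is only
# the first half of the character
try:
    raw[:CHUNK_SIZE].decode('utf-8')
    naive_ok = True
except UnicodeDecodeError:
    naive_ok = False

# The incremental decoder buffers the dangling byte and completes it on the next read
result = u''.join(decode_in_chunks(BytesIO(raw)))
```

Here `naive_ok` ends up `False` while `result` equals the original `text`, which suggests that the reader has to keep decoder state across reads rather than decode each chunk independently.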