Edgewall Software

Opened 17 years ago

Closed 17 years ago

#136 closed defect (fixed)

Transformer.filter returns inconsistently marked stream

Reported by: Dave Abrahams <dave@…> Owned by: athomas
Priority: major Milestone: 0.5
Component: General Version: 0.4.2
Keywords: Cc:

Description

The resulting stream ends up with a mixture of marked and unmarked events. Because the current tests just use print on the stream, the error is hidden. You can see the problem by doing print(list( resultstream )). You need to decide whether the result is supposed to be a marked or an unmarked stream. I suggest changing all filtering doctests by appending a new filter that asserts the correctness of the stream format before passing it through to the output, or updating Stream.render() to do the invariant check.

Attachments (1)

transform.py.patch (940 bytes) - added by Dave Abrahams <dave@…> 17 years ago.
This patch appears to fix it, but I don't know whether you want None marks

Download all attachments as: .zip

Change History (8)

Changed 17 years ago by Dave Abrahams <dave@…>

This patch appears to fix it, but I don't know whether you want None marks

comment:1 Changed 17 years ago by athomas

  • Summary changed from Transformer.filter broken to Transformer.filter returns inconsistently marked stream

comment:2 Changed 17 years ago by athomas

  • Status changed from new to assigned

It's not clear to me what the exact problem is. Do you have an example filter that triggers this behaviour?

comment:3 Changed 17 years ago by Dave Abrahams <dave@…>

As I tried to make clear above, your own doctest case triggers it.

comment:4 follow-up: Changed 17 years ago by athomas

Right, and I'm asking for more information. Like which doctest? And which marks are you talking about? StreamFilter marks or Transformer marks?

The doctest I assume you're referring to is this one (modified to use list()):

        >>> from genshi.filters.html import HTMLSanitizer
        >>> html = HTML('<html><body>Some text<script>alert(document.cookie)'
        ...             '</script> and some more text</body></html>')
        >>> import pprint
        >>> pprint.pprint(list(html | Transformer('body/*').filter(HTMLSanitizer())))

Which prints this when using list() as you suggest:

    [('START', (QName(u'html'), Attrs()), (None, 1, 0)),
     ('START', (QName(u'body'), Attrs()), (None, 1, 6)),
     ('TEXT', u'Some text', (None, 1, 12)),
     ('TEXT', u' and some more text', (None, 1, 60)),
     ('END', QName(u'body'), (None, 1, 79)),
     ('END', QName(u'html'), (None, 1, 86))]

I don't see a mix of marked/unmarked events. If there's a bug, I'd like to fix it, but I need more information.

Also, the flush() function looks correct to me. self.filter is a "normal" stream filter, not a transformer filter, so it must return a normal stream event ((kind, data, pos)), not a transformer event ((mark, (kind, data, pos))). That is why flush() prepends OUTSIDE.

comment:5 in reply to: ↑ 4 ; follow-up: Changed 17 years ago by Dave Abrahams <dave@…>

Replying to athomas:

Right, and I'm asking for more information. Like which doctest?

I meant (AFAICT) the only one there is (the one you assumed below). But clearly I was slightly mistaken.

And which marks are you talking about? StreamFilter marks or Transformer marks?

Heck, I didn't know there was a distinction. I think more docs may be needed to describe all these marks.

I don't see a mix of marked/unmarked events. If there's a bug, I'd like to fix it, but I need more information.

>>> def noop(s):
...     for e in s: yield e
...
>>> pprint.pprint(list(html | Transformer('body/*').filter(noop)))
[('START', (QName(u'html'), Attrs()), (None, 1, 0)),
 ('START', (QName(u'body'), Attrs()), (None, 1, 6)),
 ('TEXT', u'Some text', (None, 1, 12)),
 ('START', (QName(u'script'), Attrs()), (None, 1, 21)),
 ('TEXT', u'alert(document.cookie)', (None, 1, 29)),
 ('END', QName(u'script'), (None, 1, 51)),
 ('OUTSIDE', ('END', QName(u'script'), (None, 1, 51))),
 ('END', QName(u'body'), (None, 1, 79)),
 ('END', QName(u'html'), (None, 1, 86))]

Also, the flush() function looks correct to me. self.filter is a "normal" stream filter, not a transformer filter, so it must return a normal stream event ((kind, data, pos)), not a transformer event ((mark, (kind, data, pos))). That is why flush() prepends OUTSIDE.

Well, then, I must be doing something wrong.

comment:6 in reply to: ↑ 5 Changed 17 years ago by athomas

Replying to Dave Abrahams <dave@boost-consulting.com>:

>>> def noop(s):
...     for e in s: yield e
... 
>>> pprint.pprint(list(html | Transformer('body/\*').filter(noop)))
[('START', (QName(u'html'), Attrs()), (None, 1, 0)),
 ('START', (QName(u'body'), Attrs()), (None, 1, 6)),
 ('TEXT', u'Some text', (None, 1, 12)),
 ('START', (QName(u'script'), Attrs()), (None, 1, 21)),
 ('TEXT', u'alert(document.cookie)', (None, 1, 29)),
 ('END', QName(u'script'), (None, 1, 51)),
 **('OUTSIDE', ('END', QName(u'script'), (None, 1, 51)))**,
 ('END', QName(u'body'), (None, 1, 79)),
 ('END', QName(u'html'), (None, 1, 86))]

Perfect, thanks!

Try this patch:

  • genshi/filters/transform.py

     
    794794            if mark:
    795795                queue.append(event)
    796796            else:
    797                 for event in flush(queue):
    798                     yield event
     797                for qevent in flush(queue):
     798                    yield qevent
    799799                yield None, event
    800800        for event in flush(queue):
    801801            yield event

comment:7 Changed 17 years ago by athomas

  • Resolution set to fixed
  • Status changed from assigned to closed

This fix seems to work for me, committed in r691.

Note: See TracTickets for help on using tickets.