#186 closed defect (fixed)
py:match not applied correctly in recursion
Reported by: | matt@… | Owned by: | cmlenz |
---|---|---|---|
Priority: | major | Milestone: | 0.5 |
Component: | Template processing | Version: | 0.4.4 |
Keywords: | match py:match recursive | Cc: |
Description
In this example, I have a template that matches b[@type = 'bullet'] and a template that matchers just b. Normally the b[@type = 'bullet'] match is applied correctly, but inside a b match it's not.
from genshi.template import MarkupTemplate tmpl = MarkupTemplate("""<?xml version="1.0" ?> <test xmlns:py="http://genshi.edgewall.org/"> <py:match path="b[@type = 'bullet']"> <bullet>${select('*|text()')}</bullet> </py:match> <py:match path="group[@type = 'bullet']"> <ul> ${select('*')} </ul> </py:match> <py:match path="b"> <generic> ${select('*|text()')} </generic> </py:match> <!-- This works --> <group type="bullet"> <b type="bullet">1</b> <b type="bullet">2</b> <b type="bullet">3</b> </group> <!-- This should have identical output to the previous example but be wrapped in a "generic" element. Instead, it stops matching the @type aware matches and uses generic for everything --> <b> <group type="bullet"> <b type="bullet">1</b> <b type="bullet">2</b> <b type="bullet">3</b> </group> </b> </test> """) print unicode(tmpl.generate())
Attachments (1)
Change History (14)
comment:1 Changed 17 years ago by matt@…
- Version changed from 0.4.4 to devel
comment:2 Changed 17 years ago by matt@…
Or, I'm completely wrong about that and just don't understand what's going on in MarkupTemplate?._match(). :/
comment:3 Changed 17 years ago by matt@…
The line:
if 'match_once' not in hints:
seems to have the logic reversed, but that doesn't seem to be the problem. It seems like somehow match templates are being dropped from consideration on recursion, but I can't see where or why.
comment:4 Changed 17 years ago by anonymous
Can anyone explain to me what the line:
inner = self._match(inner, ctxt, [match_templates[idx]])
...in template/markup.py is meant to accomplish? Removing it fixes the problem in some of my test cases, but I don't know enough about how the template machinery works to know why.
comment:5 Changed 17 years ago by matt@…
Seems to prevent infinite recursion in some cases. I really need to figure out what's going on in here...
comment:6 Changed 17 years ago by Matt Chaput <matt@…>
That line (well, the logic of making a pass through _match with only the current matcher) certainly seems to be the source of the problem, though.
comment:7 Changed 17 years ago by Matt Chaput <matt@…>
OK, forget what I just said about infinite recursion; I had forgotten to reverse something I did while testing. Removing the whole inner = self._match(inner, ctxt, [match_templates[idx]]) thing seems to fix the problem, as far as I can tell so far.
comment:8 Changed 17 years ago by Matt Chaput <matt@…>
My guess is that what the code is meant to do is apply the matcher that matched recursively, and then later apply all the other matchers recursively. This fails when the order of the matchers matters (where multiple matchers could match the same element).
The fix seems to be to remove the inner = self._match(inner, ctxt, [match_templates[idx]]) bit, and later in the code (around line 292), only remove the matched matcher from "remaining" if it's "once", otherwise apply the full list of all matchers.
I'll try to make a patch and attach it when I have some time.
comment:9 Changed 17 years ago by cmlenz
Thanks for looking into this, and be sure to run the test suite to verify that such as change doesn't cause any regressions.
comment:10 Changed 17 years ago by cmlenz
- Status changed from new to assigned
Okay, I've been looking into this one a bit.
The problem was introduced in [444] in response to #77. Prior to that, we had the problem that a match template could get applied to the element that it produced itself, because it was getting applied to both the matched content (inner), and the output stream. This problem is demonstrated by the test test_not_match_self.
So what's going on here, exactly? Of the three match templates in your example, the third one (path="b") matches first due to the outer <b> tag. It then applies itself to the inner content, which results in the <b type="bullet"> tags being being replaced <generic> tags. Only after that has happened do the other match templates get a chance to kick in: they are applied to the result of the first match template. But of course that result no longer contains the <b type="bullet"> tags, so that match template never matches.
How can this be fixed? I'm not sure :/
- If we only remove the inner = self._match(inner, ctxt, [match_templates[idx]]) part, a match template can't be applied to recursively (see test_recursive_match_1)
- If, in addition to the above, we don't remove the current match template from those that process the results of the match, we'll get infinite recursion due to match templates matching their own output over and over.
- If we apply the full set of match templates to the inner content, we break test_not_match_self.
So I'm not sure how to proceed. The underlying problem is that Genshi somewhat schizophrenically tries to apply match templates both to the input and to the generated output. This is very much related to #187.
Still thinking/experimenting…
comment:11 Changed 17 years ago by cmlenz
I've attached a patch containing a potential fix for this issue. All the current unit tests still succeed, and the example in the description of this ticket is added as a new unit tests, and also succeeds.
The exact change made is the following:
- For every matching match template, all match templates found before the matching one in source order are applied to the original content, including the matching template itself (if it's not flagged non-recursive)
- The match templates found after the matching template in source order are then applied to the output stream of the matching template.
Effectively this means no single match template is applied both to the input and the output. Which of the two it is applied to depends on the order in which match templates are declared in the markup template.
This seems to work pretty well (better than I expected), but I'd really like some feedback on whether this breaks anything in real-world projects. Although we may need to just live with that breakage anyway.
comment:12 Changed 17 years ago by cmlenz
- Resolution set to fixed
- Status changed from assigned to closed
Patch applied in [810].
comment:13 Changed 17 years ago by cmlenz
- Version changed from devel to 0.4.4
The problem seems to be this line in template/markup.py:
inner = self._match(inner, ctxt, [match_templates[idx]])
I'm not sure why this only uses [match_templates[idx]] when matching inside, instead of all match_templates. It seems incorrect -- it doesn't allow proper processing of hierarchical XML using py:match.