Edgewall Software

Ticket #186 (closed defect: fixed)

Opened 6 years ago

Last modified 2 years ago

py:match not applied correctly in recursion

Reported by: matt@… Owned by: cmlenz
Priority: major Milestone: 0.5
Component: Template processing Version: 0.4.4
Keywords: match py:match recursive Cc:

Description

In this example, I have a template that matches b[@type = 'bullet'] and a template that matchers just b. Normally the b[@type = 'bullet'] match is applied correctly, but inside a b match it's not.

from genshi.template import MarkupTemplate

tmpl = MarkupTemplate("""<?xml version="1.0" ?>
<test xmlns:py="http://genshi.edgewall.org/">

    <py:match path="b[@type = 'bullet']">
        <bullet>${select('*|text()')}</bullet>
    </py:match>

    <py:match path="group[@type = 'bullet']">
        <ul>
            ${select('*')}
        </ul>
    </py:match>
    
    <py:match path="b">
        <generic>
            ${select('*|text()')}
        </generic>
    </py:match>
    
    <!-- This works -->
    
    <group type="bullet">
        <b type="bullet">1</b>
        <b type="bullet">2</b>
        <b type="bullet">3</b>
    </group>
    
    <!-- This should have identical output to the
    previous example but be wrapped in a "generic"
    element. Instead, it stops matching the @type
    aware matches and uses generic for everything -->
    
    <b>
        <group type="bullet">
            <b type="bullet">1</b>
            <b type="bullet">2</b>
            <b type="bullet">3</b>
        </group>
    </b>
</test>
""")

print unicode(tmpl.generate())

Attachments

ticket186.diff Download (2.9 KB) - added by cmlenz 6 years ago.
Potential fix

Change History

Changed 6 years ago by matt@…

  • version changed from 0.4.4 to devel

The problem seems to be this line in template/markup.py:

inner = self._match(inner, ctxt, [match_templates[idx]])

I'm not sure why this only uses [match_templates[idx]] when matching inside, instead of all match_templates. It seems incorrect -- it doesn't allow proper processing of hierarchical XML using py:match.

Changed 6 years ago by matt@…

Or, I'm completely wrong about that and just don't understand what's going on in MarkupTemplate?._match(). :/

Changed 6 years ago by matt@…

The line:

if 'match_once' not in hints:

seems to have the logic reversed, but that doesn't seem to be the problem. It seems like somehow match templates are being dropped from consideration on recursion, but I can't see where or why.

Changed 6 years ago by anonymous

Can anyone explain to me what the line:

inner = self._match(inner, ctxt, [match_templates[idx]])

...in template/markup.py is meant to accomplish? Removing it fixes the problem in some of my test cases, but I don't know enough about how the template machinery works to know why.

Changed 6 years ago by matt@…

Seems to prevent infinite recursion in some cases. I really need to figure out what's going on in here...

Changed 6 years ago by Matt Chaput <matt@…>

That line (well, the logic of making a pass through _match with only the current matcher) certainly seems to be the source of the problem, though.

Changed 6 years ago by Matt Chaput <matt@…>

OK, forget what I just said about infinite recursion; I had forgotten to reverse something I did while testing. Removing the whole inner = self._match(inner, ctxt, [match_templates[idx]]) thing seems to fix the problem, as far as I can tell so far.

Changed 6 years ago by Matt Chaput <matt@…>

My guess is that what the code is meant to do is apply the matcher that matched recursively, and then later apply all the other matchers recursively. This fails when the order of the matchers matters (where multiple matchers could match the same element).

The fix seems to be to remove the inner = self._match(inner, ctxt, [match_templates[idx]]) bit, and later in the code (around line 292), only remove the matched matcher from "remaining" if it's "once", otherwise apply the full list of all matchers.

I'll try to make a patch and attach it when I have some time.

Changed 6 years ago by cmlenz

Thanks for looking into this, and be sure to run the test suite to verify that such as change doesn't cause any regressions.

Changed 6 years ago by cmlenz

  • status changed from new to assigned

Okay, I've been looking into this one a bit.

The problem was introduced in [444] in response to #77. Prior to that, we had the problem that a match template could get applied to the element that it produced itself, because it was getting applied to both the matched content (inner), and the output stream. This problem is demonstrated by the test test_not_match_self.

So what's going on here, exactly? Of the three match templates in your example, the third one (path="b") matches first due to the outer <b> tag. It then applies itself to the inner content, which results in the <b type="bullet"> tags being being replaced <generic> tags. Only after that has happened do the other match templates get a chance to kick in: they are applied to the result of the first match template. But of course that result no longer contains the <b type="bullet"> tags, so that match template never matches.

How can this be fixed? I'm not sure :/

  • If we only remove the inner = self._match(inner, ctxt, [match_templates[idx]]) part, a match template can't be applied to recursively (see test_recursive_match_1)
  • If, in addition to the above, we don't remove the current match template from those that process the results of the match, we'll get infinite recursion due to match templates matching their own output over and over.
  • If we apply the full set of match templates to the inner content, we break test_not_match_self.

So I'm not sure how to proceed. The underlying problem is that Genshi somewhat schizophrenically tries to apply match templates both to the input and to the generated output. This is very much related to #187.

Still thinking/experimenting…

Changed 6 years ago by cmlenz

Potential fix

Changed 6 years ago by cmlenz

I've attached a patch containing a potential fix for this issue. All the current unit tests still succeed, and the example in the description of this ticket is added as a new unit tests, and also succeeds.

The exact change made is the following:

  • For every matching match template, all match templates found before the matching one in source order are applied to the original content, including the matching template itself (if it's not flagged non-recursive)
  • The match templates found after the matching template in source order are then applied to the output stream of the matching template.

Effectively this means no single match template is applied both to the input and the output. Which of the two it is applied to depends on the order in which match templates are declared in the markup template.

This seems to work pretty well (better than I expected), but I'd really like some feedback on whether this breaks anything in real-world projects. Although we may need to just live with that breakage anyway.

Changed 6 years ago by cmlenz

  • status changed from assigned to closed
  • resolution set to fixed

Patch applied in [810].

Changed 6 years ago by cmlenz

  • version changed from devel to 0.4.4

Add/Change #186 (py:match not applied correctly in recursion)

Author


E-mail address and user name can be saved in the Preferences.


Change Properties
<Author field>
Action
as closed
The resolution will be deleted. Next status will be 'reopened'
 
Note: See TracTickets for help on using tickets.