Edgewall Software

Opened 16 years ago

Last modified 7 years ago

#253 assigned defect

Improve handling of default namespaces

Reported by: llasram@… Owned by: cmlenz
Priority: major Milestone: 0.9
Component: Serialization Version: devel
Keywords: Cc:

Description

The NamespaceFlattener class attempts to resolve all uses of a particular namespace to a single prefix at any given point, but makes no allowance for the common practice of binding the same namespace as both the default and to an explicit prefix. The attached patch implements this support.

With the patch applied one test (MatchDirectiveTestCase.test_namespace_context) fails. I've modified the test in the patch to show the difference as I'm not sure what the correct output should in fact be. Modifying NamespaceFlattener to print out events before interpreting them shows the following for this test case:

('START_NS', (u'x', u'http://www.example.org/'))
('START', (QName(u'html'), Attrs()))
('TEXT', <Markup u'\n          '>)
('START_NS', ('', u'http://www.example.org/'))
('START', (QName(u'div'), Attrs()))
('TEXT', <Markup u'Foo'>)
('END', QName(u'div'))
('END_NS', '')
('TEXT', <Markup u'\n        '>)
('END', QName(u'html'))
('END_NS', u'x')}}}

Which to me implies that my patch is actually generating the correct behavior?

Thanks and hope this helps!

Attachments (2)

genshi-default-namespaces.patch (3.6 KB) - added by llasram@… 16 years ago.
genshi-namespace-flattening.patch (10.3 KB) - added by cmlenz 16 years ago.
Added patch by Marshall Vandegrift

Download all attachments as: .zip

Change History (13)

Changed 16 years ago by llasram@…

comment:1 Changed 16 years ago by cmlenz

I don't really see how the current behavior can be considered incorrect. Why is it a problem that Genshi omits the redundant namespace rebinding?

comment:2 Changed 16 years ago by llasram@…

Here's an example (formatted for legibility):

>>> from genshi.input import XML
>>> from genshi.output import NamespaceFlattener
>>> from genshi.output import XMLSerializer
>>> xml = XML('''
<doc xmlns="http://www.example.com/a">
  <data xmlns:a="http://www.example.com/a"
        xmlns:b="http://www.example.com/b">
    <b:item a:key="value"/>
  </data>
</doc>''')
>>> print ''.join(XMLSerializer()(xml))
<doc xmlns="http://www.example.com/a">
  <a:data xmlns:b="http://www.example.com/b">
    <b:item a:key="value"/>
  </a:data>
</doc>

Which on reflection is both buggy and sub-optimal. The NamespaceFlattener suppresses declarations of an existing namespace on a new prefix, which creates two issues:

  • The easily-spotted bug, that it suppresses declaration of the the new prefix, then uses the new prefix.
  • The more subtle bug, that it shouldn't suppress declaration of a new prefix when the existing prefix is the default namespace, as explicit prefixes are necessary to namespace-scope attributes.

If you agree with this analysis I'll revise my patch to address both issues.

comment:3 Changed 16 years ago by cmlenz

Both good points, a revised patch would be appreciated. Ideally, with added test cases (in genshi.tests.output) :)

Thanks!

comment:4 Changed 16 years ago by llasram@…

  • Type changed from enhancement to defect

Ok, attaching a new patch, complete with several additional test cases. Let me know if any of them look incorrect. Also changing this issue 'type' to 'defect' as namespace flattening is in fact buggy. Thanks!

comment:5 Changed 16 years ago by llasram@…

Oh, sweet graciousness. All the "http://example.org/"s in my patch are leading Akismet to believe my patch is URI-laden spam. Is there some other way I can submit this?

Changed 16 years ago by cmlenz

Added patch by Marshall Vandegrift

comment:6 Changed 16 years ago by cmlenz

  • Milestone changed from 0.6 to 0.5.2
  • Status changed from new to assigned

I've attached the patch you sent me via email, sorry for the trouble.

comment:7 Changed 16 years ago by llasram@…

I'm wondering what the status on this is... I'd really like to see a fix for the issue integrated into genshi. If there's a problem with this patch I am willing to re-work it yet again.

comment:8 Changed 16 years ago by cmlenz

Sorry for not getting back to you. I've tested the patch, and it looks good, but unfortunately it had quite a negative effect on serialization performance. Don't remember the exact numbers as it's been around 2 months now.

Anyway, I think there will need to be some kind of option to disable/enable the extra processing required to handle more complex namespacing. It's probably not needed by many/most sites using the XHTMLSerializer for example, so they shouldn't need to suffer the overhead.

comment:9 Changed 15 years ago by cmlenz

  • Milestone changed from 0.5.2 to 0.6

comment:10 Changed 14 years ago by cmlenz

  • Milestone changed from 0.6 to 0.6.1

comment:11 Changed 7 years ago by hodgestar

  • Milestone changed from 0.6.1 to 0.9

Move to milestone 0.9.

Note: See TracTickets for help on using tickets.