Edgewall Software

Ticket #253 (assigned defect)

Opened 4 years ago

Last modified 2 years ago

Improve handling of default namespaces

Reported by: llasram@… Owned by: cmlenz
Priority: major Milestone: 0.6.1
Component: Serialization Version: devel
Keywords: Cc:

Description

The NamespaceFlattener class attempts to resolve all uses of a particular namespace to a single prefix at any given point, but makes no allowance for the common practice of binding the same namespace as both the default and to an explicit prefix. The attached patch implements this support.

With the patch applied one test (MatchDirectiveTestCase.test_namespace_context) fails. I've modified the test in the patch to show the difference as I'm not sure what the correct output should in fact be. Modifying NamespaceFlattener to print out events before interpreting them shows the following for this test case:

('START_NS', (u'x', u'http://www.example.org/'))
('START', (QName(u'html'), Attrs()))
('TEXT', <Markup u'\n          '>)
('START_NS', ('', u'http://www.example.org/'))
('START', (QName(u'div'), Attrs()))
('TEXT', <Markup u'Foo'>)
('END', QName(u'div'))
('END_NS', '')
('TEXT', <Markup u'\n        '>)
('END', QName(u'html'))
('END_NS', u'x')}}}

Which to me implies that my patch is actually generating the correct behavior?

Thanks and hope this helps!

Attachments

genshi-default-namespaces.patch Download (3.6 KB) - added by llasram@… 4 years ago.
genshi-namespace-flattening.patch Download (10.3 KB) - added by cmlenz 4 years ago.
Added patch by Marshall Vandegrift

Change History

Changed 4 years ago by llasram@…

Changed 4 years ago by cmlenz

I don't really see how the current behavior can be considered incorrect. Why is it a problem that Genshi omits the redundant namespace rebinding?

Changed 4 years ago by llasram@…

Here's an example (formatted for legibility):

>>> from genshi.input import XML
>>> from genshi.output import NamespaceFlattener
>>> from genshi.output import XMLSerializer
>>> xml = XML('''
<doc xmlns="http://www.example.com/a">
  <data xmlns:a="http://www.example.com/a"
        xmlns:b="http://www.example.com/b">
    <b:item a:key="value"/>
  </data>
</doc>''')
>>> print ''.join(XMLSerializer()(xml))
<doc xmlns="http://www.example.com/a">
  <a:data xmlns:b="http://www.example.com/b">
    <b:item a:key="value"/>
  </a:data>
</doc>

Which on reflection is both buggy and sub-optimal. The NamespaceFlattener suppresses declarations of an existing namespace on a new prefix, which creates two issues:

  • The easily-spotted bug, that it suppresses declaration of the the new prefix, then uses the new prefix.
  • The more subtle bug, that it shouldn't suppress declaration of a new prefix when the existing prefix is the default namespace, as explicit prefixes are necessary to namespace-scope attributes.

If you agree with this analysis I'll revise my patch to address both issues.

Changed 4 years ago by cmlenz

Both good points, a revised patch would be appreciated. Ideally, with added test cases (in genshi.tests.output) :)

Thanks!

Changed 4 years ago by llasram@…

  • type changed from enhancement to defect

Ok, attaching a new patch, complete with several additional test cases. Let me know if any of them look incorrect. Also changing this issue 'type' to 'defect' as namespace flattening is in fact buggy. Thanks!

Changed 4 years ago by llasram@…

Oh, sweet graciousness. All the "http://example.org/"s in my patch are leading Akismet to believe my patch is URI-laden spam. Is there some other way I can submit this?

Changed 4 years ago by cmlenz

Added patch by Marshall Vandegrift

Changed 4 years ago by cmlenz

  • status changed from new to assigned
  • milestone changed from 0.6 to 0.5.2

I've attached the patch you sent me via email, sorry for the trouble.

Changed 3 years ago by llasram@…

I'm wondering what the status on this is... I'd really like to see a fix for the issue integrated into genshi. If there's a problem with this patch I am willing to re-work it yet again.

Changed 3 years ago by cmlenz

Sorry for not getting back to you. I've tested the patch, and it looks good, but unfortunately it had quite a negative effect on serialization performance. Don't remember the exact numbers as it's been around 2 months now.

Anyway, I think there will need to be some kind of option to disable/enable the extra processing required to handle more complex namespacing. It's probably not needed by many/most sites using the XHTMLSerializer for example, so they shouldn't need to suffer the overhead.

Changed 3 years ago by cmlenz

  • milestone changed from 0.5.2 to 0.6

Changed 2 years ago by cmlenz

  • milestone changed from 0.6 to 0.6.1

Add/Change #253 (Improve handling of default namespaces)

Author


E-mail address and user name can be saved in the Preferences.


Change Properties
<Author field>
Action
as assigned
as The resolution will be set. Next status will be 'closed'
to The owner will change from cmlenz. Next status will be 'new'
 
Note: See TracTickets for help on using tickets.