Edgewall Software

Opened 18 years ago

Last modified 17 years ago

#70 new enhancement

Genshi Markup to lxml fast converter

Reported by: ianb@… Owned by: cmlenz
Priority: major Milestone:
Component: General Version: 0.3.3
Keywords: helpwanted Cc: ianb@…


I'm doing a lot of stuff with lxml now, much of which takes the form of a pipeline, transforming output through multiple stages. There are opportunities to do this very efficiently if the markup isn't constantly serialized and reparsed. lxml itself is uniquely qualified for this role -- in part because of the tools it has, but also largely because it has a pretty good HTML parser.

Anyway, the sad part is that nothing produces lxml output currently except other lxml tools. Genshi doesn't either, for reasons I understand (even if I'm a little skeptical that they really apply to realistic situations). But this wouldn't be too big a problem if Genshi had a fast way to transform its markup to lxml without a serialization step. (Pyrex, even? I'm sure even a pure-Python transformation would be fast.)

Anyway, that's what I'm suggesting here.

Change History (3)

comment:1 Changed 18 years ago by cmlenz

  • Keywords helpwanted added
  • Milestone 0.4 deleted

Would be nice... patch, anyone? :-)

comment:2 Changed 17 years ago by matt@…

Do you mean a function/class that would take a Genshi stream and return an lxml ElementTree? If so, it shouldn't be too hard to write, but it should probably be solved in the general case as a Genshi to SAX event converter. Then you could use lxml's lxml.sax.ElementTreeContentHandler interface to do Genshi to lxml...


But, given the slowness of Python looping, it might actually be slower than serializing and reparsing.

comment:3 Changed 17 years ago by ianb@…

After doing some benchmarks, I think serialization and re-parsing could very well be the fastest way of creating an lxml tree. The lxml parsing will probably be a very small part of the time involved, and the Genshi serialization will be most of it. A more specific technique will only be advantageous if it saves time over serialization.
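A claim like this can be checked by timing the two stages separately. The harness below is a minimal sketch, not the benchmark actually run for this ticket: `time_stages` is a hypothetical helper, and the standard-library ElementTree stands in for both libraries so the example runs anywhere. For a real measurement the `serialize` callable would be something like a Genshi stream's `render()` call and `parse` would be `lxml.etree.fromstring`.

```python
import timeit
import xml.etree.ElementTree as ET


def time_stages(serialize, parse, repeat=200):
    """Return total seconds spent in each stage over `repeat` runs."""
    markup = serialize()  # produce the markup once so parsing has input
    return {
        "serialize": timeit.timeit(serialize, number=repeat),
        "parse": timeit.timeit(lambda: parse(markup), number=repeat),
    }


# Stand-in document: a list with 100 items (not real Genshi output).
root = ET.Element("ul")
for i in range(100):
    ET.SubElement(root, "li").text = "item %d" % i

timings = time_stages(lambda: ET.tostring(root), ET.fromstring)
print(timings)
```

If the "parse" figure turns out to be small next to "serialize", a direct converter can only win by being cheaper than serialization itself.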
