#65 closed defect (fixed)
TurboGears templates with latin-1 encoding
Reported by: | cito@… | Owned by: | cmlenz |
---|---|---|---|
Priority: | major | Milestone: | 0.3.4 |
Component: | Parsing | Version: | 0.3.3 |
Keywords: | input encoding | Cc: |
Description
I'd like to use German umlaut characters in latin-1 (ISO-8859-1) encoding in my HTML Genshi templates under TurboGears. Currently, it seems I can only use UTF-8 (hard-coded in input.XMLParser). It would be nice if some parameter could be passed from TurboGears to Genshi telling it which default input encoding to use (for the output encoding, this is already possible). Maybe the HTMLParser could also determine the encoding from the meta content-type tag if no encoding is explicitly set. Then everything after the meta tag would be parsed with that encoding.
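The meta-tag idea could be sketched roughly as follows. This is a minimal sketch, not Genshi's parser: `sniff_meta_charset` and `decode_html_template` are hypothetical helpers, and as a simplification the whole file (not just the part after the tag) is decoded with the sniffed charset, which assumes everything before the `<meta>` tag is plain ASCII.

```python
import re
from typing import Optional

# Matches the charset in either <meta charset="..."> or
# <meta http-equiv="Content-Type" content="text/html; charset=...">
_META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""",
    re.IGNORECASE,
)


def sniff_meta_charset(data: bytes, limit: int = 1024) -> Optional[str]:
    """Return the charset declared in a <meta> tag near the start of data, if any."""
    match = _META_CHARSET.search(data[:limit])
    return match.group(1).decode("ascii").lower() if match else None


def decode_html_template(data: bytes, default: str = "utf-8") -> str:
    """Decode template bytes with the sniffed charset, falling back to default.

    An unknown charset name raises LookupError from bytes.decode().
    """
    return data.decode(sniff_meta_charset(data) or default)
```

Only the first kilobyte is scanned, on the assumption that the `<meta>` tag sits in the document head.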
Change History (8)
comment:1 Changed 18 years ago by cmlenz
- Component changed from General to Parsing
- Milestone changed from 0.4 to 0.3.4
- Status changed from new to assigned
comment:2 Changed 18 years ago by cmlenz
- Resolution set to fixed
- Status changed from assigned to closed
comment:3 Changed 18 years ago by cmlenz
#66 has been marked as duplicate of this ticket.
comment:4 Changed 18 years ago by anonymous
Thanks, that was quick. But as far as I can see, there is still no way to pass the encoding parameter through the plugin API to the template loader. My idea is to set an input encoding parameter in TurboGears, like genshi.input_encoding = 'latin-1', that is stored as a template option in the _load_engines() function in turbogears/view/base.py, recognized by the Genshi plugin API, and passed to the XML/HTML parser.
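On the plugin side, the proposal amounts to reading one more key from the options dict. The sketch below is hypothetical: `EncodingAwarePlugin` is not the real Genshi plugin class, `genshi.input_encoding` is only the key name proposed in this comment, and the `extra_vars_func`/`options` constructor is assumed to follow the usual Buffet-style template plugin shape.

```python
from typing import Callable, Dict, Optional


class EncodingAwarePlugin:
    """Hypothetical template plugin that honours a 'genshi.input_encoding' option.

    It only shows where the option would be read; it is not Genshi's real plugin.
    """

    def __init__(self, extra_vars_func: Optional[Callable[[], dict]] = None,
                 options: Optional[Dict[str, str]] = None):
        options = options or {}
        self.extra_vars_func = extra_vars_func
        # Proposed input-encoding option; UTF-8 stays the default.
        self.input_encoding = options.get("genshi.input_encoding", "utf-8")
        # Output encoding, which TurboGears already passes as genshi.encoding.
        self.output_encoding = options.get("genshi.encoding", "utf-8")

    def decode_source(self, data: bytes) -> str:
        """Decode raw template bytes with the configured input encoding."""
        return data.decode(self.input_encoding)

    def read_template_source(self, path: str) -> str:
        """Read a template file from disk and decode it."""
        with open(path, "rb") as f:
            return self.decode_source(f.read())
```

With `options={"genshi.input_encoding": "latin-1"}`, a Latin-1 template containing umlauts decodes correctly, while templates without the option keep the UTF-8 default.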
comment:5 Changed 18 years ago by cmlenz
Well, TurboGears currently doesn't actually support passing any config options to template plugins (except for Kid). See:
http://groups.google.com/group/turbogears-trunk/browse_thread/thread/a309604c2d6f3dea
Also, I'm not sure whether this'd be a good idea. If you're not going to use UTF-8 in your template (which is the default encoding for XML files when no encoding is declared and no BOM is found), you should probably specify the encoding in the XML declaration anyway, no?
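A quick way to see why the XML declaration is enough: Expat honours the declared encoding and assumes UTF-8 when nothing is declared. The sketch uses Python's standard `xml.parsers.expat` module directly rather than Genshi, and `parse_text` is a hypothetical helper.

```python
import xml.parsers.expat


def parse_text(data: bytes) -> str:
    """Parse an XML document with Expat and return its character data."""
    parser = xml.parsers.expat.ParserCreate()  # no override: use the declared encoding
    chunks = []
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)
    return "".join(chunks)


# Latin-1 bytes with a matching XML declaration: Expat decodes the umlauts.
declared = '<?xml version="1.0" encoding="iso-8859-1"?><p>Gr\u00fc\u00dfe</p>'.encode("latin-1")
text = parse_text(declared)

# The same bytes without a declaration are read as UTF-8 and rejected,
# because the Latin-1 byte 0xFC is never valid in UTF-8.
undeclared = "<p>Gr\u00fc\u00dfe</p>".encode("latin-1")
try:
    parse_text(undeclared)
    rejected = False
except xml.parsers.expat.ExpatError:
    rejected = True
```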
comment:6 follow-up: ↓ 7 Changed 18 years ago by cito@…
The current TurboGears trunk passes genshi.encoding to the template plugin, and that seems to work. If Genshi supports it, it would be easy to adapt TurboGears to pass an additional option.
Concerning the XML declaration, what about HTML templates (not XHTML, but HTML)? I must admit I still haven't had a closer look at Genshi, but as far as I understand, Genshi is able to parse HTML soup using a more tolerant HTML parser. Such HTML templates should not have an XML declaration, since that would be misleading (the XML declaration implicitly claims that the file is valid XML).
The TurboGears standard Genshi templates (http://trac.turbogears.org/turbogears/browser/trunk/turbogears/qstemplates/quickstart/%2Bpackage%2B/templates/welcome.html) have no XML declaration either. Since they are XHTML, one could (actually should) add a declaration there, but this may be avoided because some browsers do not recognize the doctype if it's preceded by the XML declaration. By the way, is it possible to control whether the XML declaration will appear in the output?
comment:7 in reply to: ↑ 6 Changed 18 years ago by cmlenz
Replying to cito@online.de:
The current TurboGears trunk passes genshi.encoding to the template plugin, and that seems to work. If Genshi supports it, it would be easy to adapt TurboGears to pass an additional option.
AFAIK TG only supports pluginname.outputformat. Maybe it also supports pluginname.encoding; I haven't tried (but that'd be the output encoding, I think). TG does not appear to pass any additional options to plugins, so I can add as many options as I'd like, and you wouldn't be able to use them from TG. I must admit I don't have much insider knowledge about how TG dispatches those options, so I'm relying on what others have said about this. I'd love to be proven wrong on this.
Anyway, I probably still should add an option for that, I just doubt that you'll be able to use it from TG. And the genshi.encoding parameter should be used to set the output encoding if I'm not mistaken.
Concerning the XML declaration, what about HTML templates (not XHTML, but HTML)? I must admit I still haven't had a closer look at Genshi, but as far as I understand, Genshi is able to parse HTML soup using a more tolerant HTML parser. Such HTML templates should not have an XML declaration, since that would be misleading (the XML declaration implicitly claims that the file is valid XML).
Genshi doesn't actually support HTML files as templates. That is, it can parse HTML and lets you do all kinds of things with it, but you can't use HTML for templates, simply because HTML doesn't support namespaces, which are needed for directives.
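The namespace point can be illustrated with the standard library's ElementTree (not Genshi's parser): a `py:` attribute only parses once the prefix is bound with `xmlns:py`, and HTML has no syntax for that binding. `parses` is a hypothetical helper; the URI is the one Genshi templates bind the `py:` prefix to.

```python
import xml.etree.ElementTree as ET

# Namespace URI that Genshi templates bind the py: prefix to.
GENSHI_NS = "http://genshi.edgewall.org/"


def parses(source: str) -> bool:
    """Return True if source is namespace-well-formed XML."""
    try:
        ET.fromstring(source)
        return True
    except ET.ParseError:
        return False


# Without xmlns:py the prefix is unbound, so a namespace-aware parser rejects it.
unbound = '<p py:if="user">Hello</p>'
# With the declaration, the directive becomes an ordinary namespaced attribute.
bound = '<p xmlns:py="%s" py:if="user">Hello</p>' % GENSHI_NS

element = ET.fromstring(bound)
# ElementTree stores namespaced attribute names as "{uri}localname".
directive = element.get("{%s}if" % GENSHI_NS)
```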
The TurboGears standard Genshi templates have no XML declaration either,
I assume they don't need any because they're UTF-8 (or even plain ASCII) encoded.
Since they are XHTML, one could (actually should) add a declaration there, but this may be avoided because some browsers do not recognize the doctype if it's preceded by the XML declaration. By the way, is it possible to control whether the XML declaration will appear in the output?
The XML decl is currently not passed through (i.e. only the parser sees it). That's something I intend to address in the long term, but it's not high priority (because as you indicate, including the decl triggers quirks mode in IE6, so real-world usefulness is limited in web-apps).
comment:8 Changed 18 years ago by cito@…
Yes, currently TurboGears does not pass additional options. But they could easily be added here, in _load_engines() at the bottom of the file: http://trac.turbogears.org/turbogears/browser/trunk/turbogears/view/base.py As you can see, genshi.encoding is already supported (yes, that would be the output encoding; I have set it to latin-1 and that seemed to work).
That's right, HTML does not support namespaces. I thought the Genshi parser didn't care about that and that you could use HTML files as templates anyway, i.e. including the attribute language (though they would no longer be proper HTML, that's right). But if Genshi templates always have to be XML anyway, then it would probably suffice to read the encoding from the XML declaration.
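Something along these lines could replace a fixed list of known options in `_load_engines()`. This is a hypothetical sketch, not the actual TurboGears code: `plugin_options` forwards every config key that starts with the plugin's name, so a new key such as `genshi.input_encoding` would reach the plugin without further changes. The config keys in the check below are illustrative.

```python
from typing import Dict, Mapping


def plugin_options(config: Mapping[str, object], plugin_name: str) -> Dict[str, object]:
    """Collect every config key that starts with '<plugin_name>.' into an options dict.

    Hypothetical helper: not TurboGears' _load_engines(), only a sketch of
    forwarding all plugin-prefixed keys instead of a fixed whitelist.
    """
    prefix = plugin_name + "."
    return {key: value for key, value in config.items() if key.startswith(prefix)}
```

Forwarding by prefix keeps TurboGears ignorant of which options each engine understands; the trade-off is that misspelled option names pass through silently.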
Yeah, that'd be good. Expat only seems to support Latin-1 in addition to UTF-8/16, so you're lucky ;-)
About detecting the encoding from the Content-type <meta /> tag, well, that's a slightly different issue that'd deserve its own ticket.