#124 closed defect (wontfix)
Problem with replace() on unicode string
| Reported by: | anonymous | Owned by: | cmlenz |
|---|---|---|---|
| Priority: | major | Milestone: | |
| Component: | Template processing | Version: | 0.4 |
| Keywords: | needinfo | Cc: | |
Description (last modified by cmlenz)
I am using Genshi 0.4.1 with TurboGears 1.0.2.2 and I'm getting a problem with replace() on a unicode string:
Traceback (most recent call last):
  File "/var/lib/python-support/python2.4/cherrypy/_cphttptools.py", line 105, in _run
    self.main()
  File "/var/lib/python-support/python2.4/cherrypy/_cphttptools.py", line 254, in main
    body = page_handler(*virtual_path, **self.params)
  File "<string>", line 3, in default
  File "/var/lib/python-support/python2.4/turbogears/controllers.py", line 334, in expose
    output = database.run_with_transaction(
  File "<string>", line 5, in run_with_transaction
  File "/var/lib/python-support/python2.4/turbogears/database.py", line 260, in so_rwt
    retval = func(*args, **kw)
  File "<string>", line 5, in _expose
  File "/var/lib/python-support/python2.4/turbogears/controllers.py", line 351, in <lambda>
    mapping, fragment, args, kw)))
  File "/var/lib/python-support/python2.4/turbogears/controllers.py", line 391, in _execute_func
    return _process_output(output, template, format, content_type, mapping, fragment)
  File "/var/lib/python-support/python2.4/turbogears/controllers.py", line 82, in _process_output
    fragment=fragment)
  File "/var/lib/python-support/python2.4/turbogears/view/base.py", line 131, in render
    return engine.render(**kw)
  File "/var/lib/python-support/python2.4/genshi/plugin.py", line 78, in render
    return self.transform(info, template).render(method=format)
  File "/var/lib/python-support/python2.4/genshi/core.py", line 141, in render
    output = u''.join(list(generator))
  File "/var/lib/python-support/python2.4/genshi/output.py", line 332, in __call__
    for kind, data, pos in stream:
  File "/var/lib/python-support/python2.4/genshi/output.py", line 499, in __call__
    text = escape(pop_text(), quotes=False)
  File "/var/lib/python-support/python2.4/genshi/core.py", line 420, in escape
    text = unicode(text).replace('&', '&amp;') \
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 8: ordinal not in range(128)
Change History (6)
comment:1 Changed 18 years ago by cmlenz
- Description modified (diff)
comment:2 Changed 18 years ago by cmlenz
- Component changed from General to Template processing
- Keywords needinfo added
comment:3 Changed 18 years ago by cmlenz
Also: you're probably passing the template some non-ASCII string that is actually not a unicode object, but a bytestring using some encoding, which is unknown to Genshi. If you want to be dealing with non-ASCII strings, you absolutely need to be using true unicode objects everywhere.
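The failure described above can be reproduced in isolation. The following is a minimal sketch in Python 3, not the reporter's actual code: Python 3's `bytes` stands in for the Python 2 bytestring and `str` for `unicode`, and the helper name `implicit_ascii_decode` is made up for illustration. It assumes the implicit conversion done by Python 2's `unicode(bytestring)` uses the default ASCII codec, which is what the traceback suggests.

```python
# UTF-8 encoded bytes, matching the value shown in the report.
raw = b"<p>Despu\xc3\xa9s de varias semanas ..."


def implicit_ascii_decode(data):
    """Hypothetical helper: mimic Python 2's unicode(bytestring),
    which decodes using the default (ASCII) codec."""
    return data.decode("ascii")


error_message = None
error_position = None
try:
    implicit_ascii_decode(raw)
except UnicodeDecodeError as exc:
    # The first non-ASCII byte (0xc3, the start of the UTF-8 encoded e-acute)
    # cannot be decoded as ASCII.
    error_message = str(exc)
    error_position = exc.start

print(error_message)
```

The byte at index 8 is the first byte of the UTF-8 sequence for the accented character, which is why the decode fails at that position.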
comment:4 in reply to: ↑ description Changed 18 years ago by anonymous
That was quick ! Thanks :-)
Well, I'm just using a straight TG object (the backend is PostgreSQL/SQLAlchemy):
$ tg-admin shell
Python 2.4.4 (#2, Apr 5 2007, 20:11:18)
[GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(CustomShell)
>>> import sqlalchemy as sa
>>> from betta.model import session
>>> session.get(Foo, 2)
>>> unicode(a.content).replace('&', '&amp;')
Traceback (most recent call last):
  File "<console>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 8: ordinal not in range(128)
>>> a.content.replace('&', '&amp;')
'<p>Despu\xc3\xa9s de varias semanas ...'
The first replace is what Genshi does, and the second one is how I would expect it to be done. How should this really be handled? Should it be reported to TurboGears instead?
The database was filled from a Unicode text file, normally encoded in UTF-8.
Thanks for your help.
comment:5 follow-up: ↓ 6 Changed 18 years ago by cmlenz
- Milestone 0.4.2 deleted
- Resolution set to wontfix
- Status changed from new to closed
You'll need to make sure the database module and/or SQLAlchemy returns unicode objects for strings.
SQLAlchemy provides two ways to do this AFAICT:
- the convert_unicode flag on the create_engine() function (see Database Engine Options). I'm not sure how you set that up with TurboGears.
- using Unicode as the type for all string columns that should support non-ASCII values
I'm closing this ticket because I don't intend to change Genshi to allow bytestrings using non-ASCII encodings. Using unicode is the right thing to do anyway, so you can consider Genshi's strict behavior in that area as a hint/reminder :-)
comment:6 in reply to: ↑ 5 Changed 18 years ago by anonymous
Oh! I understand now; I was confused. It should be something more like this:
unicode(a.content.decode('utf-8')).replace('&', '&amp;')
The a.content.decode('utf-8') part is irrelevant to Genshi and should be done up front. That makes more sense now.
Thanks for your help.
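The "decode up front" approach from this comment can be sketched as follows. This is a Python 3 illustration under stated assumptions, not Genshi or TurboGears code: `to_text` and `escape_amp` are hypothetical helpers, the sample value is made up, and UTF-8 is assumed as the source encoding because the reporter says the data was loaded from a UTF-8 file.

```python
def to_text(value, encoding="utf-8"):
    """Hypothetical boundary helper: decode bytes to text once, on the way in.

    The encoding is an assumption; it must match how the data was stored.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def escape_amp(text):
    # Simplified stand-in for the '&' step of an HTML escape function.
    # It expects real text, never raw bytes.
    return text.replace("&", "&amp;")


# Made-up sample: UTF-8 bytes as a database driver might return them.
content = b"Caf\xc3\xa9 & t\xc3\xa9"

result = escape_amp(to_text(content))
print(result)
```

Decoding at the boundary keeps the template layer free of encoding guesses: everything past `to_text` only ever sees text.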

Need more information here. What does the template look like? What's in the data you're passing the template?