#124 closed defect (wontfix)

Problem with replace() on unicode string

Reported by:	anonymous	Owned by:	cmlenz
Priority:	major	Milestone:
Component:	Template processing	Version:	0.4
Keywords:	needinfo	Cc:

Description (last modified by cmlenz)

I am using Genshi 0.4.1 with TurboGears 1.0.2.2 and I'm running into a problem with replace() on a unicode string:

Traceback (most recent call last):
  File "/var/lib/python-support/python2.4/cherrypy/_cphttptools.py", line 105, in _run
    self.main()
  File "/var/lib/python-support/python2.4/cherrypy/_cphttptools.py", line 254, in main
    body = page_handler(*virtual_path, **self.params)
  File "<string>", line 3, in default
  File "/var/lib/python-support/python2.4/turbogears/controllers.py", line 334, in expose
    output = database.run_with_transaction(
  File "<string>", line 5, in run_with_transaction
  File "/var/lib/python-support/python2.4/turbogears/database.py", line 260, in so_rwt
    retval = func(*args, **kw)
  File "<string>", line 5, in _expose
  File "/var/lib/python-support/python2.4/turbogears/controllers.py", line 351, in <lambda>
    mapping, fragment, args, kw)))
  File "/var/lib/python-support/python2.4/turbogears/controllers.py", line 391, in _execute_func
    return _process_output(output, template, format, content_type, mapping, fragment)
  File "/var/lib/python-support/python2.4/turbogears/controllers.py", line 82, in _process_output
    fragment=fragment)
  File "/var/lib/python-support/python2.4/turbogears/view/base.py", line 131, in render
    return engine.render(**kw)
  File "/var/lib/python-support/python2.4/genshi/plugin.py", line 78, in render
    return self.transform(info, template).render(method=format)
  File "/var/lib/python-support/python2.4/genshi/core.py", line 141, in render
    output = u''.join(list(generator))
  File "/var/lib/python-support/python2.4/genshi/output.py", line 332, in __call__
    for kind, data, pos in stream:
  File "/var/lib/python-support/python2.4/genshi/output.py", line 499, in __call__
    text = escape(pop_text(), quotes=False)
  File "/var/lib/python-support/python2.4/genshi/core.py", line 420, in escape
    text = unicode(text).replace('&', '&amp;') \
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 8: ordinal not in range(128)

Change History (6)

comment:1 Changed 18 years ago by cmlenz

Description modified

comment:2 Changed 18 years ago by cmlenz

Component changed from General to Template processing
Keywords needinfo added

Need more information here. What does the template look like? What's in the data you're passing the template?

comment:3 Changed 18 years ago by cmlenz

Also: you're probably passing the template some non-ASCII string that is actually not a unicode object, but a bytestring using some encoding, which is unknown to Genshi. If you want to be dealing with non-ASCII strings, you absolutely need to be using true unicode objects everywhere.
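The failure described above can be reproduced in isolation. This is a minimal sketch, not Genshi's code: on Python 2, unicode(value) applied to a bytestring decodes it with the default codec, which is ASCII. The hypothetical helper implicit_unicode() below emulates that with an explicit .decode('ascii') so the example also runs on Python 3. The sample bytes are the ones quoted later in this ticket.

```python
# UTF-8 encoded bytestring, as returned by the database in comment:4.
raw = b'<p>Despu\xc3\xa9s de varias semanas ...'


def implicit_unicode(value):
    """Emulate Python 2's unicode(value): bytestrings are decoded as ASCII.

    Hypothetical helper for illustration only; it is not part of Genshi.
    """
    if isinstance(value, bytes):
        return value.decode('ascii')
    return value


try:
    implicit_unicode(raw)
    error = None
except UnicodeDecodeError as exc:
    # 0xc3 is the first byte of the UTF-8 sequence for "é"; ASCII cannot decode it.
    error = exc

print(error)

# Decoding with the encoding the bytes were actually written in succeeds.
text = raw.decode('utf-8')
print(repr(text))
```

The error surfaces inside Genshi only because that is the first place the bytestring gets coerced to unicode; the encoding mismatch itself originates wherever the data enters the application.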

comment:4 in reply to: ↑ description Changed 18 years ago by anonymous

That was quick! Thanks :-)

Well, I'm just using a plain TG object (the backend is PostgreSQL/SQLAlchemy):

$ tg-admin shell
Python 2.4.4 (#2, Apr 5 2007, 20:11:18)
[GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(CustomShell)
>>> import sqlalchemy as sa
>>> from betta.model import session
>>> a = session.get(Foo, 2)
>>> unicode(a.content).replace('&', '&amp;')
Traceback (most recent call last):
  File "<console>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 8: ordinal not in range(128)
>>> a.content.replace('&', '&amp;')
'<p>Despu\xc3\xa9s de varias semanas ...'

The first replace() is what Genshi does, and the second is how I would expect it to be done. How should this really be handled? Should it be reported to TurboGears instead?

The database was filled from a text file that should be encoded in UTF-8.

Thanks for your help.

comment:5 follow-up: ↓ 6 Changed 18 years ago by cmlenz

Milestone 0.4.2 deleted
Resolution set to wontfix
Status changed from new to closed

You'll need to make sure the database module and/or SQLAlchemy returns unicode objects for strings.

SQLAlchemy provides two ways to do this AFAICT:

* The convert_unicode flag on the create_engine() function (see Database Engine Options). I'm not sure how you set that up with TurboGears.
* Using Unicode as the type for all string columns that should support non-ASCII values.
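The general idea of having the database layer hand back unicode objects can be illustrated without SQLAlchemy. The sketch below uses the standard-library sqlite3 module as a stand-in; it is not the reporter's PostgreSQL/SQLAlchemy setup, and the table and column names are made up for the example. The connection's text_factory attribute plays a role loosely comparable to the options listed above: it decides whether TEXT values come back as raw bytes or as decoded text.

```python
import sqlite3

# In-memory database with one hypothetical table, for illustration only.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE foo (id INTEGER PRIMARY KEY, content TEXT)')
conn.execute(
    'INSERT INTO foo (content) VALUES (?)',
    (u'<p>Despu\xe9s de varias semanas ...',),
)

# Driver configured to return undecoded bytes: this is the kind of value
# that breaks once a template engine tries to coerce it to unicode.
conn.text_factory = bytes
as_bytes = conn.execute('SELECT content FROM foo').fetchone()[0]

# Driver configured to decode text columns: the application only ever
# sees unicode strings and no implicit ASCII decoding takes place later.
conn.text_factory = str
as_text = conn.execute('SELECT content FROM foo').fetchone()[0]

print(repr(as_bytes))
print(repr(as_text))
conn.close()
```

Whichever library is in use, the point is the same: decoding is configured once at the database boundary rather than patched into every consumer of the data.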

I'm closing this ticket because I don't intend to change Genshi to allow bytestrings using non-ASCII encodings. Using unicode is the right thing to do anyway, so you can consider Genshi's strict behavior in that area as a hint/reminder :-)

comment:6 in reply to: ↑ 5 Changed 18 years ago by anonymous

Oh! I understand now; I was confused. It should be something more like this:

unicode(a.content.decode('utf-8')).replace('&', '&amp;')

The a.content.decode('utf-8') part is irrelevant to Genshi and should be done up front. That makes more sense now.
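The "decode up front" approach can be sketched as follows. Both helpers are hypothetical: to_text() stands for whatever boundary code converts incoming bytestrings, and escape_amp() is a deliberately simplified stand-in for the replace() call seen in the traceback, not Genshi's actual escape() function. The UTF-8 default is an assumption based on how the reporter says the database was filled.

```python
def to_text(value, encoding='utf-8'):
    """Decode bytestrings at the application boundary (hypothetical helper).

    The encoding must be the one the data was really stored in; UTF-8 is
    assumed here.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def escape_amp(text):
    """Simplified stand-in for the ampersand escaping step."""
    return text.replace('&', '&amp;')


# Sample value in the style of the ticket, with an ampersand added so the
# escaping step has something to do.
raw = b'<p>Despu\xc3\xa9s de varias semanas & m\xc3\xa1s</p>'

# Decode first, then let the template layer work on unicode only.
escaped = escape_amp(to_text(raw))
print(escaped)
```

Values that are already unicode pass through to_text() unchanged, so the helper can be applied uniformly to everything handed to a template.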

Thanks for your help.
