Edgewall Software

Opened 8 years ago

Closed 8 years ago

Last modified 8 years ago

#124 closed defect (wontfix)

Problem with replace() on unicode string

Reported by: anonymous
Owned by: cmlenz
Priority: major
Milestone:
Component: Template processing
Version: 0.4
Keywords: needinfo
Cc:

Description (last modified by cmlenz)

I am using Genshi 0.4.1 with TurboGears and I'm getting a problem with replace() on a unicode string:

Traceback (most recent call last):
  File "/var/lib/python-support/python2.4/cherrypy/_cphttptools.py", line 105, in _run
  File "/var/lib/python-support/python2.4/cherrypy/_cphttptools.py", line 254, in main
    body = page_handler(*virtual_path, **self.params)
  File "<string>", line 3, in default
  File "/var/lib/python-support/python2.4/turbogears/controllers.py", line 334, in expose
    output = database.run_with_transaction(
  File "<string>", line 5, in run_with_transaction
  File "/var/lib/python-support/python2.4/turbogears/database.py", line 260, in so_rwt
    retval = func(*args, **kw)
  File "<string>", line 5, in _expose
  File "/var/lib/python-support/python2.4/turbogears/controllers.py", line 351, in <lambda>
    mapping, fragment, args, kw)))
  File "/var/lib/python-support/python2.4/turbogears/controllers.py", line 391, in _execute_func
    return _process_output(output, template, format, content_type, mapping, fragment)
  File "/var/lib/python-support/python2.4/turbogears/controllers.py", line 82, in _process_output
  File "/var/lib/python-support/python2.4/turbogears/view/base.py", line 131, in render
    return engine.render(**kw)
  File "/var/lib/python-support/python2.4/genshi/plugin.py", line 78, in render
    return self.transform(info, template).render(method=format)
  File "/var/lib/python-support/python2.4/genshi/core.py", line 141, in render
    output = u''.join(list(generator))
  File "/var/lib/python-support/python2.4/genshi/output.py", line 332, in __call__
    for kind, data, pos in stream:
  File "/var/lib/python-support/python2.4/genshi/output.py", line 499, in __call__
    text = escape(pop_text(), quotes=False)
  File "/var/lib/python-support/python2.4/genshi/core.py", line 420, in escape
    text = unicode(text).replace('&', '&amp;') \
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 8: ordinal not in range(128)

Change History (6)

comment:1 Changed 8 years ago by cmlenz

  • Description modified (diff)

comment:2 Changed 8 years ago by cmlenz

  • Component changed from General to Template processing
  • Keywords needinfo added

Need more information here. What does the template look like? What's in the data you're passing the template?

comment:3 Changed 8 years ago by cmlenz

Also: you're probably passing the template some non-ASCII string that is actually not a unicode object, but a bytestring using some encoding, which is unknown to Genshi. If you want to be dealing with non-ASCII strings, you absolutely need to be using true unicode objects everywhere.
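To illustrate the point above: on Python 2, calling unicode() on a bytestring decodes it with the default ASCII codec, so any byte above 0x7f raises. The sketch below makes that implicit step explicit with .decode('ascii') so that it also runs on current Python. The helper name implicit_ascii_decode is made up for this example, and the sample bytes are taken from the value shown later in this ticket.

```python
# UTF-8 encoded bytes, as a database driver might hand them back.
# The value is copied (shortened) from the report in this ticket.
raw = b'<p>Despu\xc3\xa9s de varias semanas'


def implicit_ascii_decode(data):
    """Hypothetical helper: mimic Python 2's unicode(bytestring),
    which decodes with the default ASCII codec."""
    return data.decode('ascii')


try:
    implicit_ascii_decode(raw)
    error = None
except UnicodeDecodeError as exc:
    # Byte 0xc3 (first byte of the UTF-8 sequence for "é") is not ASCII.
    error = exc

print(error)
```

The failing offset (error.start) is 8, the same position reported in the traceback, which suggests the template data contains UTF-8 bytes rather than a unicode object.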

comment:4 in reply to: ↑ description Changed 8 years ago by anonymous

That was quick! Thanks :-)

Well, I'm just using plain TurboGears objects (the backend is PostgreSQL/SQLAlchemy):

$ tg-admin shell
Python 2.4.4 (#2, Apr 5 2007, 20:11:18)
[GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(CustomShell)
>>> import sqlalchemy as sa
>>> from betta.model import session
>>> session.get(Foo, 2)
>>> unicode(a.content).replace('&', '&amp;')
Traceback (most recent call last):
  File "<console>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 8: ordinal not in range(128)
>>> a.content.replace('&', '&amp;')
'<p>Despu\xc3\xa9s de varias semanas ...'

The first replace is what Genshi does, and the second is how I would expect it to be done. How should this really be handled? Should it be reported to TurboGears instead?

The database was filled from a Unicode text file, normally encoded in UTF-8.

Thanks for your help.

comment:5 follow-up: Changed 8 years ago by cmlenz

  • Milestone 0.4.2 deleted
  • Resolution set to wontfix
  • Status changed from new to closed

You'll need to make sure the database module and/or SQLAlchemy return unicode objects for strings.

SQLAlchemy provides two ways to do this AFAICT:

  • the convert_unicode flag on the create_engine() function (see Database Engine Options). I'm not sure how you set that up with TurboGears.
  • using Unicode as the type for all string columns that should support non-ASCII values

I'm closing this ticket because I don't intend to change Genshi to allow bytestrings using non-ASCII encodings. Using unicode is the right thing to do anyway, so you can consider Genshi's strict behavior in that area as a hint/reminder :-)
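If the database layer cannot be configured to return unicode objects, one alternative is to decode at the boundary, before the data reaches the template. This is only a rough sketch written for current Python: ensure_text and normalize_row are hypothetical helpers, not part of Genshi, SQLAlchemy, or TurboGears, and the UTF-8 default is an assumption based on what the reporter says about the data.

```python
def ensure_text(value, encoding='utf-8'):
    """Hypothetical helper: return text, decoding bytestrings with a
    known encoding and leaving every other value untouched."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def normalize_row(row, encoding='utf-8'):
    """Hypothetical helper: decode each bytestring field of a dict-like
    row so the template only ever sees text."""
    return {key: ensure_text(val, encoding) for key, val in row.items()}


# Sample row; the content value is the one quoted in this ticket.
row = {'id': 2, 'content': b'<p>Despu\xc3\xa9s de varias semanas ...'}
clean = normalize_row(row)
print(repr(clean['content']))
```

Doing the conversion in one place keeps the encoding assumption visible and means the template engine never has to guess.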

comment:6 in reply to: ↑ 5 Changed 8 years ago by anonymous

Oh! I understand now; I was confused. It should be something more like this:

unicode(a.content.decode('utf-8')).replace('&', '&amp;')

The a.content.decode('utf-8') part is not Genshi's concern and should be done up front. That makes more sense now.

Thanks for your help.
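For reference, the decode-first approach described in this comment can be sketched as follows for current Python. The escape_text function is a simplified stand-in written for this example and is not Genshi's actual escape() implementation; the sample bytes are invented so that they contain both an ampersand and non-ASCII characters.

```python
def escape_text(text):
    """Simplified stand-in for an escaping step (not Genshi's code):
    replace the characters that are special in markup text."""
    return (text.replace('&', '&amp;')
                .replace('<', '&lt;')
                .replace('>', '&gt;'))


# Invented sample: UTF-8 bytes containing "é", "á" and an ampersand.
raw = b'Despu\xc3\xa9s de varias semanas & m\xc3\xa1s'

# Decode up front with the known encoding, then escape the text.
text = raw.decode('utf-8')
escaped = escape_text(text)
print(escaped)
```

Once the bytes are decoded with the right codec, the replace() calls operate on text and no implicit ASCII conversion takes place.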
