Edgewall Software

Opened 17 years ago

Closed 17 years ago

Last modified 17 years ago

#124 closed defect (wontfix)

Problem with replace() on unicode string

Reported by: anonymous Owned by: cmlenz
Priority: major Milestone:
Component: Template processing Version: 0.4
Keywords: needinfo Cc:

Description (last modified by cmlenz)

I am using Genshi 0.4.1 with TurboGears 1.0.2.2, and I'm getting a problem with replace() on a unicode string:

Traceback (most recent call last):
  File "/var/lib/python-support/python2.4/cherrypy/_cphttptools.py", line 105, in _run
    self.main()
  File "/var/lib/python-support/python2.4/cherrypy/_cphttptools.py", line 254, in main
    body = page_handler(*virtual_path, **self.params)
  File "<string>", line 3, in default
  File "/var/lib/python-support/python2.4/turbogears/controllers.py", line 334, in expose
    output = database.run_with_transaction(
  File "<string>", line 5, in run_with_transaction
  File "/var/lib/python-support/python2.4/turbogears/database.py", line 260, in so_rwt
    retval = func(*args, **kw)
  File "<string>", line 5, in _expose
  File "/var/lib/python-support/python2.4/turbogears/controllers.py", line 351, in <lambda>
    mapping, fragment, args, kw)))
  File "/var/lib/python-support/python2.4/turbogears/controllers.py", line 391, in _execute_func
    return _process_output(output, template, format, content_type, mapping, fragment)
  File "/var/lib/python-support/python2.4/turbogears/controllers.py", line 82, in _process_output
    fragment=fragment)
  File "/var/lib/python-support/python2.4/turbogears/view/base.py", line 131, in render
    return engine.render(**kw)
  File "/var/lib/python-support/python2.4/genshi/plugin.py", line 78, in render
    return self.transform(info, template).render(method=format)
  File "/var/lib/python-support/python2.4/genshi/core.py", line 141, in render
    output = u''.join(list(generator))
  File "/var/lib/python-support/python2.4/genshi/output.py", line 332, in __call__
    for kind, data, pos in stream:
  File "/var/lib/python-support/python2.4/genshi/output.py", line 499, in __call__
    text = escape(pop_text(), quotes=False)
  File "/var/lib/python-support/python2.4/genshi/core.py", line 420, in escape
    text = unicode(text).replace('&', '&amp;') \
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 8: ordinal not in range(128)

Change History (6)

comment:1 Changed 17 years ago by cmlenz

  • Description modified (diff)

comment:2 Changed 17 years ago by cmlenz

  • Component changed from General to Template processing
  • Keywords needinfo added

Need more information here. What does the template look like? What's in the data you're passing the template?

comment:3 Changed 17 years ago by cmlenz

Also: you're probably passing the template some non-ASCII string that is actually not a unicode object, but a bytestring using some encoding, which is unknown to Genshi. If you want to be dealing with non-ASCII strings, you absolutely need to be using true unicode objects everywhere.
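A minimal sketch of this failure mode, written for Python 3 rather than the Python 2.4 used in this ticket. In Python 2, unicode(text) on a bytestring implicitly decoded it with the ASCII codec, so that step is mimicked here with an explicit bytes.decode('ascii'). The sample bytes come from the a.content output shown in comment:4; the helper name decode_as_ascii is hypothetical.

```python
# Python 3 sketch: raw UTF-8 bytes, like the value of a.content in comment:4.
raw = b"<p>Despu\xc3\xa9s de varias semanas ..."

def decode_as_ascii(data: bytes) -> str:
    """Hypothetical helper mimicking Python 2's implicit unicode(data) (ASCII codec)."""
    return data.decode("ascii")

error = None
try:
    decode_as_ascii(raw)
except UnicodeDecodeError as exc:
    error = exc
    print(exc)  # 'ascii' codec can't decode byte 0xc3 in position 8: ordinal not in range(128)

# Decoding explicitly with the encoding the data was actually written in works.
text = raw.decode("utf-8")
print(text.replace("&", "&amp;"))
```

The failing byte (0xc3 at position 8) is the first byte of the UTF-8 encoding of "é", which is why the traceback in the description reports exactly that position.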

comment:4 in reply to: ↑ description Changed 17 years ago by anonymous

That was quick! Thanks :-)

Well, I'm just using a plain TG object (the backend is PostgreSQL/SQLAlchemy):

$ tg-admin shell
Python 2.4.4 (#2, Apr 5 2007, 20:11:18)
[GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(CustomShell)
>>> import sqlalchemy as sa
>>> from betta.model import session
>>> session.get(Foo, 2)
>>> unicode(a.content).replace('&', '&amp;')
Traceback (most recent call last):
  File "<console>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 8: ordinal not in range(128)
>>> a.content.replace('&', '&amp;')
'<p>Despu\xc3\xa9s de varias semanas ...'

The first replace() is what Genshi does, and the second one is what I would expect it to do. How should this really be handled? Should it be reported to TurboGears instead?

The database was populated from a Unicode text file that should be encoded in UTF-8.

Thanks for your help.

comment:5 follow-up: Changed 17 years ago by cmlenz

  • Milestone 0.4.2 deleted
  • Resolution set to wontfix
  • Status changed from new to closed

You'll need to make sure the database module and/or SQLAlchemy returns unicode objects for strings.

SQLAlchemy provides two ways to do this AFAICT:

  • the convert_unicode flag on the create_engine() function (see Database Engine Options). I'm not sure how you set that up with TurboGears.
  • using Unicode as the type for all string columns that should support non-ASCII values
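To illustrate the general idea (have the data layer hand back text, not bytes), here is a sketch that uses Python 3's standard sqlite3 module as a stand-in for SQLAlchemy and PostgreSQL. It is not the SQLAlchemy API: sqlite3's Connection.text_factory attribute only plays a role loosely analogous to convert_unicode, deciding whether TEXT values come back decoded or as raw bytes. The table foo and the helper fetch_content are hypothetical.

```python
import sqlite3

def fetch_content(conn: sqlite3.Connection, row_id: int):
    """Hypothetical helper: return the 'content' column of one row of table 'foo'."""
    row = conn.execute("SELECT content FROM foo WHERE id = ?", (row_id,)).fetchone()
    return row[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, content TEXT)")
conn.execute("INSERT INTO foo (id, content) VALUES (?, ?)",
             (2, "<p>Despu\u00e9s de varias semanas ..."))

# Default text_factory is str: TEXT values come back already decoded.
as_text = fetch_content(conn, 2)

# text_factory = bytes mimics a driver that returns raw UTF-8 bytestrings,
# which is the situation that makes Genshi's escaping step fail.
conn.text_factory = bytes
as_bytes = fetch_content(conn, 2)

print(type(as_text).__name__, repr(as_text))
print(type(as_bytes).__name__, repr(as_bytes))
```

In the bytes case, byte 8 of the value is 0xc3, the same byte reported in the traceback in the description; with the default setting, the template would receive a proper text object instead.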

I'm closing this ticket because I don't intend to change Genshi to allow bytestrings using non-ASCII encodings. Using unicode is the right thing to do anyway, so you can consider Genshi's strict behavior in that area as a hint/reminder :-)

comment:6 in reply to: ↑ 5 Changed 17 years ago by anonymous

Oh! I understand now; I was confused. It should be something more like this:

unicode(a.content.decode('utf-8')).replace('&', '&amp;')

The a.content.decode('utf-8') step is unrelated to Genshi and should be done up front. That makes more sense now.
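A sketch of that order of operations in Python 3: decode bytestrings once, at the boundary where data enters the application, and only ever escape text. The helpers to_text and escape_text are hypothetical; escape_text uses the standard library's html.escape(..., quote=False) as a rough stand-in for Genshi's escape(text, quotes=False) and is not Genshi's implementation.

```python
import html

def to_text(value, encoding="utf-8"):
    """Hypothetical helper: decode a bytestring once, at the data boundary."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

def escape_text(text: str) -> str:
    """Escape &, < and > for an XML text node; a stand-in, not Genshi's escape()."""
    if not isinstance(text, str):
        # Refuse to guess an encoding, in the spirit of Genshi's strictness.
        raise TypeError("expected str, got %s" % type(text).__name__)
    return html.escape(text, quote=False)

content = to_text(b"Despu\xc3\xa9s & <b>varias</b> semanas")
print(escape_text(content))  # Después &amp; &lt;b&gt;varias&lt;/b&gt; semanas
```

Raising TypeError for bytes treats a bytestring that reaches the escaping step as a bug upstream, rather than silently decoding it with a guessed encoding.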

Thanks for your help.
