Opened 17 years ago
Closed 15 years ago
#129 closed defect (fixed)
Sentences get broken into too small fragments
Reported by: | jruigrok | Owned by: | cmlenz |
---|---|---|---|
Priority: | major | Milestone: | 0.6 |
Component: | Internationalization | Version: | devel |
Keywords: | | Cc: | palgarvio, dfraser |
Description
Currently, sentences are cut up into fragments that are too small. This complicates matters, since grammar differs a lot from language to language and such fragments do not necessarily make sense in other languages.
Moved from Babel ticketing to Genshi.
Attachments (9)
Change History (38)
comment:1 Changed 17 years ago by cmlenz
- Component changed from Template processing to Internationalization
comment:2 Changed 17 years ago by cmlenz
- Priority changed from major to critical
- Status changed from new to assigned
Here's a proposal:
The idea is to introduce an additional namespace for any non-trivial I18n functionality, which is understood by the genshi.filters.Translator filter, both in extraction and translation.
This namespace would provide attributes/elements such as:
- i18n:message: handle all nested content as a single message for internationalization purposes
- i18n:tag: assign a symbolic name to a tag nested in a message
- i18n:param: mark the content of a tag as a message parameter (not sure about this one yet, see example below)
- i18n:singular / i18n:plural: can be nested inside i18n:message to define a pluralizable message
Examples:
- Compound messages including a tag:
```xml
<html xmlns:py="http://genshi.edgewall.org/"
      xmlns:i18n="http://genshi.edgewall.org/#i18n">
  <p i18n:message="">
    Please see <a href="help.html">Help</a> for details.
  </p>
</html>
```
```
msgid "Please see [Help] for details."
msgstr "Details finden Sie unter [Hilfe]."
```
- Compound messages including nested tags:
```xml
<html xmlns:py="http://genshi.edgewall.org/"
      xmlns:i18n="http://genshi.edgewall.org/#i18n">
  <p i18n:message="">
    Please see <a href="help.html">our <em>Help</em> page</a> for details.
  </p>
</html>
```
```
msgid "Please see [our [Help] page] for details."
msgstr "Details finden Sie auf [unserer [Hilfe]-Seite]."
```
- Compound messages including an empty tag:
```xml
<html xmlns:py="http://genshi.edgewall.org/"
      xmlns:i18n="http://genshi.edgewall.org/#i18n">
  <p i18n:message="">
    Show me <input type="text" name="num" value="10" /> entries per page.
  </p>
</html>
```
```
msgid "Show me [] entries per page."
msgstr "[] Einträge pro Seite anzeigen."
```
- Compound messages including multiple, not nested tags:
```xml
<html xmlns:py="http://genshi.edgewall.org/"
      xmlns:i18n="http://genshi.edgewall.org/#i18n">
  <p i18n:message="">
    Please see <a href="help.html" i18n:tag="help">Help</a> for <strong>details</strong>.
  </p>
</html>
```
```
msgid "Please see [help:Help] for [details]."
msgstr "[Details] finden Sie unter [help:Hilfe]."
```
- Compound messages including multiple empty tags:
```xml
<html xmlns:py="http://genshi.edgewall.org/"
      xmlns:i18n="http://genshi.edgewall.org/#i18n">
  <p i18n:message="">
    Show me <input type="text" name="num" value="10" i18n:tag="limit" /> entries per page,
    starting at page <input type="text" name="num" value="10" i18n:tag="offset"/>.
  </p>
</html>
```
```
msgid "Show me [limit:] entries per page, starting at page [offset:]."
msgstr "[limit:] Einträge pro Seite anzeigen, beginnend auf Seite [offset:]."
```
- Compound pluralizable messages including a tag:
```xml
<html xmlns:py="http://genshi.edgewall.org/"
      xmlns:i18n="http://genshi.edgewall.org/#i18n">
  <i18n:message switch="num">
    <p i18n:singular="">You have <strong i18n:param="num">$num</strong> unread message.</p>
    <p i18n:plural="">You have <strong i18n:param="num">$num</strong> unread messages.</p>
  </i18n:message>
</html>
```
```
msgid "You have [%(num)s] unread message."
msgid_plural "You have [%(num)s] unread messages."
msgstr[0] "Sie haben [%(num)s] ungelesene Nachricht."
msgstr[1] "Sie haben [%(num)s] ungelesene Nachrichten."
```
comment:3 follow-up: ↓ 4 Changed 17 years ago by james.harris@…
- Cc james.harris@… added
The singular/plural selection concept has always been appealing to me, as it's a problem which needs a solution even outside of I18N.
However, when applied in an I18N situation, it can quickly become quite involved, as many languages have complex rules for which noun form to use. From memory, Arabic has up to six different plural forms.
The first thought I had was to define a predicate that determines which plural form to use for each different language:
```python
def select_en_plural_form(num):
    if num == 1:
        return 0  # use the singular form
    return 1      # use English's only plural form
```
Then write your message switch so as to indicate which choice corresponds to each of the possible results returned from the function defined above:
```xml
<i18n:message switch="num">
  <p i18n:plural-form="0">You have <strong i18n:param="num">$num</strong> unread message.</p>
  <p i18n:plural-form="1">You have <strong i18n:param="num">$num</strong> unread messages.</p>
</i18n:message>
```
This could hopefully go some way towards providing a versatile solution.
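James's predicate idea is essentially what gettext's `Plural-Forms` catalog header already encodes: one function per language that maps a count to a form index. A minimal Python sketch of two such rules, English and Polish (expressions as given in the "Plural forms" section of the GNU gettext manual), plus a hypothetical `select()` helper that picks the message variant by index:

```python
def plural_en(n):
    """English: nplurals=2; plural=(n != 1);"""
    return 0 if n == 1 else 1


def plural_pl(n):
    """Polish: nplurals=3;
    plural=(n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2);
    """
    if n == 1:
        return 0
    if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
        return 1
    return 2


def select(forms, plural_rule, n):
    """Hypothetical helper: choose the variant for n, then interpolate it."""
    return forms[plural_rule(n)] % {"num": n}


english = ["You have %(num)s unread message.", "You have %(num)s unread messages."]
print(select(english, plural_en, 1))  # You have 1 unread message.
print(select(english, plural_en, 5))  # You have 5 unread messages.
```

Since the rule lives in each language's catalog rather than in the template, a template only ever has to supply the singular and plural source strings.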
comment:4 in reply to: ↑ 3 Changed 17 years ago by cmlenz
Replying to james.harris@icecave.com.au:
However, when applied in an I18N situation, it can quickly become quite involved, as many languages have complex rules for which noun form to use. From memory, Arabic has up to six different plural forms.
But luckily, that stuff is already addressed by gettext ;-)
http://www.gnu.org/software/gettext/manual/gettext.html#Plural-forms
The pluralization example above is simply a variant of ngettext() for messages that contain markup.
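A minimal sketch of that `ngettext()` analogy using Python's stdlib `gettext`. `NullTranslations` simply falls back to the English msgids and stands in for a real compiled catalog; `render_unread()` is a hypothetical name, and turning the `[1:...]` marker back into the original `<strong>` element is assumed to be the filter's job, not shown here.

```python
import gettext

# Stand-in for a real catalog loaded via gettext.translation(...);
# NullTranslations returns the msgids unchanged.
translations = gettext.NullTranslations()


def render_unread(num):
    """Pick the singular/plural msgid first, then interpolate the named parameter."""
    template = translations.ngettext(
        "You have [1:%(num)s] unread message.",
        "You have [1:%(num)s] unread messages.",
        num,
    )
    return template % {"num": num}


print(render_unread(1))  # You have [1:1] unread message.
print(render_unread(3))  # You have [1:3] unread messages.
```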
comment:5 Changed 17 years ago by cmlenz
There's been some feedback on the mailing list. One point was that i18n:tag should probably be used everywhere, but then we could just as well number nested tags automatically and drop i18n:tag completely. The following is the updated proposal taking that into account, which I think is a bit nicer.
- Compound messages including a tag:
```xml
<html xmlns:py="http://genshi.edgewall.org/"
      xmlns:i18n="http://genshi.edgewall.org/#i18n">
  <p i18n:message="">
    Please see <a href="help.html">Help</a> for details.
  </p>
</html>
```
```
msgid "Please see [1:Help] for details."
msgstr "Details finden Sie unter [1:Hilfe]."
```
- Compound messages including nested tags:
```xml
<html xmlns:py="http://genshi.edgewall.org/"
      xmlns:i18n="http://genshi.edgewall.org/#i18n">
  <p i18n:message="">
    Please see <a href="help.html">our <em>Help</em> page</a> for details.
  </p>
</html>
```
```
msgid "Please see [1:our [2:Help] page] for details."
msgstr "Details finden Sie auf [1:unserer [2:Hilfe]-Seite]."
```
- Compound messages including an empty tag:
```xml
<html xmlns:py="http://genshi.edgewall.org/"
      xmlns:i18n="http://genshi.edgewall.org/#i18n">
  <p i18n:message="">
    Show me <input type="text" name="num" value="10" /> entries per page.
  </p>
</html>
```
```
msgid "Show me [1:] entries per page."
msgstr "[1:] Einträge pro Seite anzeigen."
```
- Compound messages including multiple, not nested tags:
```xml
<html xmlns:py="http://genshi.edgewall.org/"
      xmlns:i18n="http://genshi.edgewall.org/#i18n">
  <p i18n:message="">
    Please see <a href="help.html">Help</a> for <strong>details</strong>.
  </p>
</html>
```
```
msgid "Please see [1:Help] for [2:details]."
msgstr "[2:Details] finden Sie unter [1:Hilfe]."
```
- Compound messages including multiple empty tags:
```xml
<html xmlns:py="http://genshi.edgewall.org/"
      xmlns:i18n="http://genshi.edgewall.org/#i18n">
  <p i18n:message="">
    Show me <input type="text" name="num" value="10" /> entries per page,
    starting at page <input type="text" name="num" value="10" />.
  </p>
</html>
```
```
msgid "Show me [1:] entries per page, starting at page [2:]."
msgstr "[1:] Einträge pro Seite anzeigen, beginnend auf Seite [2:]."
```
- Compound pluralizable messages including a tag:
```xml
<html xmlns:py="http://genshi.edgewall.org/"
      xmlns:i18n="http://genshi.edgewall.org/#i18n">
  <p i18n:message="num">
    <i18n:singular>
      You have <strong i18n:param="num">$num</strong> unread message.
    </i18n:singular>
    <i18n:plural>
      You have <strong i18n:param="num">$num</strong> unread messages.
    </i18n:plural>
  </p>
</html>
```
```
msgid "You have [1:%(num)s] unread message."
msgid_plural "You have [1:%(num)s] unread messages."
msgstr[0] "Sie haben [1:%(num)s] ungelesene Nachricht."
msgstr[1] "Sie haben [1:%(num)s] ungelesene Nachrichten."
```
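The automatic-numbering scheme above works in both directions: extraction flattens nested tags into `[n:...]` markers, and translation parses the msgstr and reattaches the original tags by number, which is what lets a translator put `[2:...]` before `[1:...]`. The sketch below illustrates both directions with `xml.etree.ElementTree`. It is not Genshi's implementation. It assumes tags are numbered in document (pre-)order and that whitespace is collapsed. `extract_msgid()`, `parse_msgstr()` and `rebuild()` are hypothetical names, and a literal `]` in translated text would need an escaping convention this sketch does not define.

```python
import re
import xml.etree.ElementTree as ET

MARKER = re.compile(r"\[(\d+):|\]")  # "[n:" opens a numbered tag, "]" closes it


def extract_msgid(elem):
    """Flatten an element's content into a msgid, numbering nested tags."""
    counter = 0

    def walk(node):
        nonlocal counter
        parts = [node.text or ""]
        for child in node:
            counter += 1
            index = counter  # taken before recursing: document (pre-)order
            parts.append("[%d:%s]" % (index, walk(child)))
            parts.append(child.tail or "")
        return "".join(parts)

    # Collapse whitespace so template indentation doesn't leak into the msgid.
    return " ".join(walk(elem).split())


def parse_msgstr(msgstr):
    """Parse a translation into strings and (number, children) tuples."""
    root = []
    stack = [root]
    pos = 0
    for match in MARKER.finditer(msgstr):
        if match.start() > pos:
            stack[-1].append(msgstr[pos:match.start()])
        if match.group(1) is not None:      # opening "[n:"
            children = []
            stack[-1].append((int(match.group(1)), children))
            stack.append(children)
        elif len(stack) > 1:                # closing "]"
            stack.pop()
        else:
            raise ValueError("unbalanced ']' in %r" % msgstr)
        pos = match.end()
    if len(stack) != 1:
        raise ValueError("unclosed marker in %r" % msgstr)
    if pos < len(msgstr):
        root.append(msgstr[pos:])
    return root


def rebuild(original, msgstr):
    """Re-create `original` with translated text, reusing its tags by number."""
    # iter() walks in document order and yields `original` itself first.
    tags = dict(enumerate(list(original.iter())[1:], start=1))
    result = ET.Element(original.tag, original.attrib)

    def fill(target, items):
        last = None
        for item in items:
            if isinstance(item, str):
                if last is None:
                    target.text = (target.text or "") + item
                else:
                    last.tail = (last.tail or "") + item
            else:
                number, children = item
                source = tags[number]
                last = ET.SubElement(target, source.tag, source.attrib)
                fill(last, children)

    fill(result, parse_msgstr(msgstr))
    return result


source = ET.fromstring(
    '<p>Please see <a href="help.html">Help</a> for <strong>details</strong>.</p>')
print(extract_msgid(source))  # Please see [1:Help] for [2:details].
translated = rebuild(source, "[2:Details] finden Sie unter [1:Hilfe].")
print(ET.tostring(translated, encoding="unicode"))
# <p><strong>Details</strong> finden Sie unter <a href="help.html">Hilfe</a>.</p>
```

Numbering by position keeps the msgid free of markup, at the cost that changing the tag structure in the template invalidates existing translations.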
comment:6 Changed 17 years ago by cmlenz
I've started implementing this in [671].
Note that the name i18n:message has changed to the shorter i18n:msg.
comment:7 follow-up: ↓ 9 Changed 17 years ago by palgarvio
- Cc palgarvio added
On the following template:
```xml
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:py="http://genshi.edgewall.org/"
      xmlns:xi="http://www.w3.org/2001/XInclude"
      xmlns:i18n="http://genshi.edgewall.org/i18n">
  <xi:include href="../layout.html"/>
  <head>
    <title>Welcome</title>
  </head>
  <body>
    <py:choose test="">
      <py:when test="c.user">
        <h2>Welcome $c.user.name!!!!</h2>
        <p i18n:msg="">Your last login was on ${h.format_datetime(c.user.lastlogin, format='full', tzinfo=c.timezone, locale=g.locale)}</p>
        <p i18n:msg="">You can update your accout details
          <a href="${h.url_for(controller='account', action='index', id=None)}">
            here</a>.</p>
      </py:when>
      <h1 py:otherwise="">Welcome!!!</h1>
    </py:choose>
    $c.messages
  </body>
</html>
```
The text still gets split apart in the .pot file :\
```
#: oil/templates/main/index.html:10 oil/templates/main/index.html:16
msgid "Welcome"
msgstr ""

#: oil/templates/main/index.html:17
msgid "Your last login was on"
msgstr ""

#: oil/templates/main/index.html:21
msgid "You can update your accout details"
msgstr ""

#: oil/templates/main/index.html:22
msgid "here"
msgstr ""

#: oil/templates/main/index.html:26
msgid "Welcome!!!"
msgstr ""
```
comment:8 Changed 17 years ago by anonymous
- Cc james.harris@… removed
Removing myself from CC, and trying to defeat spam filter.
comment:9 in reply to: ↑ 7 Changed 17 years ago by anonymous
Replying to palgarvio:
Consider the above comment invalid; I was using the stable branch, not trunk. Stupid...
I can confirm that it does indeed work, although it still needs some polish: all that whitespace ;) But that's not a major problem, since we're just "porting" the template's whitespace to the .pot file, so I should simply write better templates...
For the same template, here's the relevant .pot snippet:
```
#: oil/templates/main/index.html:10 oil/templates/main/index.html:16
msgid "Welcome"
msgstr ""

#: oil/templates/main/index.html:17
msgid "Your last login was on"
msgstr ""

#: oil/templates/main/index.html:21
msgid ""
"You can update your accout details\n"
" [1:\n"
" here]."
msgstr ""

#: oil/templates/main/index.html:26
msgid "Welcome!!!"
msgstr ""
```
Notice the first msgid:
msgid "Welcome"
Wouldn't it be better (translation-wise) to include the variable used:
msgid "Welcome $c.user.name"
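The whitespace noted above could also be normalized at extraction time instead of by rewriting templates. A minimal sketch, assuming HTML-style whitespace collapsing is acceptable for message text; `normalize_msgid()` is a hypothetical helper, not part of Genshi or Babel:

```python
import re


def normalize_msgid(text):
    """Collapse whitespace runs, including right inside [n:...] markers."""
    text = re.sub(r"\s+", " ", text).strip()
    text = re.sub(r"\[(\d+): ", r"[\1:", text)  # "[1: here" -> "[1:here"
    text = re.sub(r" \]", "]", text)            # "here ]"   -> "here]"
    return text


raw = "You can update your accout details\n [1:\n here]."
print(normalize_msgid(raw))  # You can update your accout details [1:here].
```

Trimming inside the markers moves the space before "here" out of the link; since browsers collapse that whitespace anyway, the rendered text should look the same apart from the link boundary.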
comment:10 Changed 17 years ago by palgarvio
The above comment was mine, sorry.
Found a bug: the order of some messages gets switched when using the Translator filter.
Template:
```xml
<body>
  <py:choose test="">
    <py:when test="c.user">
      <h2>Welcome $c.user.name!!!!</h2>
      <p i18n:msg="">Your last login was on ${h.format_datetime(c.user.lastlogin, format='full', tzinfo=c.timezone, locale=g.locale)}</p>
      <p i18n:msg="">You can update your accout details
        <a href="${h.url_for(controller='account', action='index', id=None)}">
          here</a>.</p>
    </py:when>
    <h1 py:otherwise="">Welcome!!!</h1>
  </py:choose>
  $c.messages
</body>
```
The generated HTML:
```html
<div id="content">
  <h2>Welcome Foo Bar!!!!</h2>
  <p>Sunday, August 26, 2007 0:40:57 AM Portugal (Lisbon) TimeYour last login was on</p>
  <p>You can update your accout details <a href="/account"> here</a>.</p>
</div>
```
The formatted date came first!?
comment:11 Changed 17 years ago by palgarvio
Update:
The problem occurs when using the i18n xml tag.
comment:12 Changed 17 years ago by cmlenz
- Milestone changed from 0.5 to 0.6
- Priority changed from critical to major
comment:13 Changed 16 years ago by cmlenz
[801] adds parameter support for i18n:msg elements. The value of the i18n:msg attribute is a comma-separated list of parameter names, in the order they appear in the original content inside the element. For example:
```xml
<div i18n:msg="name, time">
  Comment by <strong>${comment.username}</strong>
  at ${format.datetime(comment.time, 'medium')}
</div>
```
This is converted to the following msgid:
```
"Comment by [1:%(name)s]\n"
" at %(time)s"
```
It should then be possible for translators to rearrange the parameters in the translation as needed.
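A minimal sketch of how named parameters make that reordering possible: the names listed in the `i18n:msg` attribute are paired, in order of appearance, with the expression values found in the element, and `%`-style named interpolation then works on any msgstr regardless of where the placeholders sit. `bind_params()` is a hypothetical helper, and the German msgstr and the formatted time value are made up for illustration.

```python
def bind_params(attr_value, values):
    """Pair the names listed in an i18n:msg attribute with the expression
    values found in the element, in order of appearance (hypothetical helper)."""
    names = [name.strip() for name in attr_value.split(",")]
    if len(names) != len(values):
        raise ValueError("expected %d parameters, got %d" % (len(names), len(values)))
    return dict(zip(names, values))


params = bind_params("name, time", ["cmlenz", "01.03.2008 12:00"])

msgid = "Comment by [1:%(name)s]\n at %(time)s"
# Hypothetical German msgstr that puts the time before the name.
msgstr = "Am %(time)s kommentiert von [1:%(name)s]"

print(msgstr % params)  # Am 01.03.2008 12:00 kommentiert von [1:cmlenz]
```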
comment:14 Changed 16 years ago by palgarvio
OK, the i18n:choose patch won't work if there are expressions inside the translatable strings. Working on it...
comment:15 Changed 16 years ago by palgarvio
If you wish to try the patches, apply them in the order they were added to this ticket.
comment:16 Changed 16 years ago by palgarvio
Still missing: context and domain support.
comment:17 follow-up: ↓ 18 Changed 16 years ago by palgarvio
Updated all_patches_bundle.patch to include cmlenz's suggestions.
comment:18 in reply to: ↑ 17 Changed 16 years ago by palgarvio
Replying to palgarvio:
Updated all_patches_bundle.patch to include cmlenz's suggestions.
This also means that the previous staged patches are obsolete. I can re-add them if anyone asks.
comment:19 Changed 16 years ago by palgarvio
Added domain support, which may or may not be complete. I'll explain.
Currently I haven't found a way to pass the current domain to xi:include templates or into py:def directives, so one has to declare the domain in use both in the included template and on the def directive.
Any ideas cmlenz? asmodai?
Changed 16 years ago by palgarvio
comment:20 Changed 16 years ago by palgarvio
Updated i18n_domain.patch: it now passes the domain in use to xi:include, which also handles the py:def issue, since one normally keeps all py:defs in a "macros.html" that is then xi:included.
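One way to picture passing the domain in use to xi:include is a stack of active domains: each included template or py:def call pushes onto it and inherits the parent's domain when it declares none. The sketch below is hypothetical (`DomainStack` is not a Genshi class) and only uses the stdlib `gettext.dgettext()`, which falls back to the msgid when no catalog for the domain is installed.

```python
import gettext
from contextlib import contextmanager


class DomainStack:
    """Hypothetical: tracks the active i18n domain while rendering nested templates."""

    def __init__(self, default="messages"):
        self._stack = [default]

    @property
    def current(self):
        return self._stack[-1]

    @contextmanager
    def domain(self, name=None):
        # name=None models an included template or py:def with no i18n:domain
        # of its own: it inherits whatever domain is active at the call site.
        self._stack.append(name or self.current)
        try:
            yield self.current
        finally:
            self._stack.pop()

    def gettext(self, message):
        # Look the message up in the active domain's catalog, if one is installed.
        return gettext.dgettext(self.current, message)


domains = DomainStack()
with domains.domain("myplugin"):          # outer template declares i18n:domain
    with domains.domain() as inherited:   # xi:included template declares nothing
        print(inherited)                  # myplugin
print(domains.current)                    # messages
```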
comment:21 Changed 16 years ago by palgarvio
Ok, there are still problems regarding py:def directives.
Since the i18n filter is the first to run, variables defined with py:with or arguments passed to py:def are not yet known at that point, so i18n:choose might not have enough information to do its work.
Might there be a way to do this as a late evaluation/translation?
Oh, and once again, why can't the i18n filter be the last one instead of the first?
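A minimal sketch of what late evaluation could look like: at extraction time only the msgids and the name of the count variable are recorded, and both the plural choice and the interpolation are deferred until render time, when py:with variables and py:def arguments are bound. `LazyPlural` is a hypothetical class, and `NullTranslations` again stands in for a real catalog.

```python
import gettext


class LazyPlural:
    """Hypothetical deferred i18n:choose, resolved only once the context is known."""

    def __init__(self, singular, plural, count_var, translations=None):
        self.singular = singular
        self.plural = plural
        self.count_var = count_var
        self.translations = translations or gettext.NullTranslations()

    def render(self, context):
        # The count comes from the render-time context, not from extraction time.
        n = context[self.count_var]
        template = self.translations.ngettext(self.singular, self.plural, n)
        return template % context


msg = LazyPlural("You have %(num)s unread message.",
                 "You have %(num)s unread messages.", "num")
# Later, e.g. inside a py:def called with num=4:
print(msg.render({"num": 4}))  # You have 4 unread messages.
```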
Changed 16 years ago by palgarvio
comment:22 Changed 16 years ago by palgarvio
Updated i18n_domain.patch with a cleaner solution.
Regarding i18n:choose, I'm still at a dead end, since some of the template variables aren't known at that stage.
comment:23 Changed 16 years ago by palgarvio
The latest patch I added solves all the issues I mentioned regarding the other patches, but it is meant to be used on the custom-directives branch, which makes this much, much simpler.
Changed 16 years ago by palgarvio
Changed 16 years ago by palgarvio
comment:24 Changed 16 years ago by wichert@…
Can someone give an indication of the status of this work?
comment:25 Changed 16 years ago by palgarvio
Some of the issues and required features raised on this ticket are being worked on in this branch.
comment:26 Changed 16 years ago by wichert@…
Is the spec as documented in this ticket considered to be final?
The reason that I'm asking is that I am looking at adding this to chameleon.genshi, but I don't want to risk implementing a standard that will change.
comment:27 Changed 16 years ago by palgarvio
Regarding the spec: there's no documentation yet, and since it's not included in the main development trunk (i.e. it's in a branch), it could still change, though it's almost final.
Either way, for the current spec there's nothing like the source code I pointed you to, especially i18n.py.
comment:28 Changed 16 years ago by dfraser
- Cc dfraser added
comment:29 Changed 15 years ago by cboos
- Resolution set to fixed
- Status changed from assigned to closed
With [1072], the i18n branch has been merged into trunk, and I think the issue can be considered fixed.
Further problems or enhancement requests related to this topic should be opened as new tickets.
This refers to sentences that have embedded markup, such as emphasis or a link. This is a problem with many I18n systems, but the issue is compounded in Genshi because we do automatic escaping.