Edgewall Software

Ticket #129 (assigned defect)

Opened 16 months ago

Last modified 3 weeks ago

Sentences get broken into too small fragments

Reported by: jruigrok
Owned by: cmlenz
Priority: major
Milestone: 0.6
Component: Internationalization
Version: devel
Keywords:
Cc: palgarvio

Description

Currently, sentences are cut up into fragments that are too small. This complicates matters, since grammar differs a lot from language to language and such fragments do not necessarily make sense in other languages.

Moved from Babel ticketing to Genshi.

Attachments

i18n_comment.patch (3.7 kB) - added by palgarvio 2 months ago.
Support for comments extraction
i18n_choose.patch (10.1 kB) - added by palgarvio 2 months ago.
Extraction currently working now, I think! ;)
i18n_choose_translate.patch (4.1 kB) - added by palgarvio 2 months ago.
Translation also working now!!!!!! I think ;)
i18n_choose_translate.2.patch (6.6 kB) - added by palgarvio 2 months ago.
This one now works correctly
i18n_choose_translate.3.patch (7.5 kB) - added by palgarvio 2 months ago.
Now also working for SUB kinds
all_patches_bundle.patch (18.4 kB) - added by palgarvio 2 months ago.
i18n_domain.patch (6.7 kB) - added by palgarvio 2 months ago.
i18n_directives_full.patch (74.8 kB) - added by palgarvio 8 weeks ago.
i18n_directives_full_with_py_strip.patch (85.2 kB) - added by palgarvio 7 weeks ago.

Change History

  Changed 16 months ago by cmlenz

  • component changed from Template processing to Internationalization

This refers to sentences that have embedded markup, such as an emphasis or a link. This is a problem with many I18n systems, but the issue is compounded in Genshi because we do automatic escaping.

  Changed 16 months ago by cmlenz

  • priority changed from major to critical
  • status changed from new to assigned

Here's a proposal:

The idea is to introduce an additional namespace for any non-trivial I18n functionality, which is understood by the genshi.filters.Translator filter, both in extraction and translation.

This namespace would provide attributes/elements such as:

  • i18n:message: handle all nested content as a single message for internationalization purposes
  • i18n:tag: assign a symbolic name to a tag nested in a message
  • i18n:param: mark the content of a tag as a message parameter (not sure about this one yet, see example below)
  • i18n:singular / i18n:plural: can be nested inside i18n:message to define a pluralizable message

Examples:

1. Compound messages including a tag:

  <html xmlns:py="http://genshi.edgewall.org/"
        xmlns:i18n="http://genshi.edgewall.org/#i18n">
    <p i18n:message="">
      Please see <a href="help.html">Help</a> for details.
    </p>
  </html>
  msgid  "Please see [Help] for details."
  msgstr "Details finden Sie unter [Hilfe]."

2. Compound messages including nested tags:

  <html xmlns:py="http://genshi.edgewall.org/"
        xmlns:i18n="http://genshi.edgewall.org/#i18n">
    <p i18n:message="">
      Please see <a href="help.html">our <em>Help</em> page</a>
      for details.
    </p>
  </html>
  msgid  "Please see [our [Help] page] for details."
  msgstr "Details finden Sie auf [unserer [Hilfe]-Seite]."

3. Compound messages including an empty tag:

  <html xmlns:py="http://genshi.edgewall.org/"
        xmlns:i18n="http://genshi.edgewall.org/#i18n">
    <p i18n:message="">
      Show me <input type="text" name="num" value="10" /> entries per page.
    </p>
  </html>
  msgid  "Show me [] entries per page."
  msgstr "[] Einträge pro Seite anzeigen."

4. Compound messages including multiple, not nested tags:

  <html xmlns:py="http://genshi.edgewall.org/"
        xmlns:i18n="http://genshi.edgewall.org/#i18n">
    <p i18n:message="">
      Please see <a href="help.html" i18n:tag="help">Help</a>
      for <strong>details</strong>.
    </p>
  </html>
  msgid  "Please see [help:Help] for [details]."
  msgstr "[Details] finden Sie unter [help:Hilfe]."

5. Compound messages including multiple empty tags:

  <html xmlns:py="http://genshi.edgewall.org/"
        xmlns:i18n="http://genshi.edgewall.org/#i18n">
    <p i18n:message="">
      Show me <input type="text" name="num" value="10" i18n:tag="limit" />
      entries per page, starting at page
      <input type="text" name="num" value="10" i18n:tag="offset"/>.
    </p>
  </html>
  msgid  "Show me [limit:] entries per page, starting at page [offset:]."
  msgstr "[limit:] Einträge pro Seite anzeigen, beginnend auf Seite [offset:]."

6. Compound pluralizable messages including a tag:

  <html xmlns:py="http://genshi.edgewall.org/"
        xmlns:i18n="http://genshi.edgewall.org/#i18n">
    <i18n:message switch="num">
      <p i18n:singular="">You have <strong i18n:param="num">$num</strong> unread message.</p>
      <p i18n:plural="">You have <strong i18n:param="num">$num</strong> unread messages.</p>
    </i18n:message>
  </html>
  msgid        "You have [%(num)s] unread message."
  msgid_plural "You have [%(num)s] unread messages."
  msgstr[0]    "Sie haben [%(num)s] ungelesene Nachricht."
  msgstr[1]    "Sie haben [%(num)s] ungelesene Nachrichten."

follow-up: ↓ 4   Changed 16 months ago by james.harris@…

  • cc james.harris@… added

The singular/plural selection concept has always been appealing to me, as it's a problem which needs a solution even outside of I18N.

However, when applied in an I18N situation it can quickly grow to be quite in-depth, as many languages have complex rules for which noun form to use. From memory, Arabic has up to six different plural forms.

The first thought I had was to define a predicate that determines which plural form to use for each different language:

def select_en_plural_form(num):
    if num == 1:
        return 0  # use the singular form
    return 1  # use the only plural form English has

Then write your message switch so as to indicate which choice corresponds to each of the possible results returned from the function defined above:

<i18n:message switch="num">
  <p i18n:plural-form="0">You have <strong i18n:param="num">$num</strong> unread message.</p>
  <p i18n:plural-form="1">You have <strong i18n:param="num">$num</strong> unread messages.</p>
</i18n:message>

This could hopefully go some way towards providing a versatile solution.
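A minimal sketch of how such per-language predicates could be organized, assuming a simple registry keyed by locale code. The names `PLURAL_FORM_SELECTORS`, `select_pl_plural_form` and `choose_form` are hypothetical and not part of Genshi or Babel; the Polish rule is included only to show a language with more than two forms.

```python
# Hypothetical registry of plural-form predicates, keyed by locale code.
# None of these names exist in Genshi or Babel; they are illustrative only.

def select_en_plural_form(num):
    """English: form 0 for exactly one item, form 1 otherwise."""
    return 0 if num == 1 else 1


def select_pl_plural_form(num):
    """Polish has three forms (1 item; 2-4 except 12-14; everything else)."""
    if num == 1:
        return 0
    if 2 <= num % 10 <= 4 and not 12 <= num % 100 <= 14:
        return 1
    return 2


PLURAL_FORM_SELECTORS = {
    "en": select_en_plural_form,
    "pl": select_pl_plural_form,
}


def choose_form(locale, num, forms):
    """Return the message variant whose index the locale's predicate selects."""
    return forms[PLURAL_FORM_SELECTORS[locale](num)]
```

Each `i18n:plural-form` value in the template would then correspond to one index in `forms`.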

in reply to: ↑ 3   Changed 16 months ago by cmlenz

Replying to james.harris@icecave.com.au:

However, when applied in an I18N situation it can quickly grow to be quite in-depth, as many languages have complex rules for which noun form to use. From memory, Arabic has up to six different plural forms.

But luckily, that stuff is already addressed by gettext ;-)

http://www.gnu.org/software/gettext/manual/gettext.html#Plural-forms

The pluralization example above is simply a variant of ngettext() for messages that contain markup.
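For reference, this is roughly what the plain-text equivalent looks like with Python's standard `gettext` module. The sketch uses `NullTranslations`, the fallback catalog, so it only distinguishes `n == 1` from everything else; with a real catalog loaded, the catalog's Plural-Forms header decides which `msgstr[n]` is used. The helper name `unread_message` is made up for this example.

```python
import gettext

# NullTranslations is the stdlib fallback catalog: with no catalog loaded,
# ngettext() returns the singular msgid when n == 1 and the plural otherwise.
translations = gettext.NullTranslations()


def unread_message(num):
    """Pick the singular or plural template, then fill in the parameter."""
    template = translations.ngettext(
        "You have %(num)s unread message.",
        "You have %(num)s unread messages.",
        num,
    )
    return template % {"num": num}
```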

  Changed 16 months ago by cmlenz

There's been some feedback on the mailing list. One thing was that i18n:tag should probably be used everywhere, but then we could just as well go with automatic numbering of nested tags, and drop i18n:tag completely. The following is the updated proposal taking that into account, which I think is a bit nicer.

  1. Compound messages including a tag:
  <html xmlns:py="http://genshi.edgewall.org/"
        xmlns:i18n="http://genshi.edgewall.org/#i18n">
    <p i18n:message="">
      Please see <a href="help.html">Help</a> for details.
    </p>
  </html>
  msgid  "Please see [1:Help] for details."
  msgstr "Details finden Sie unter [1:Hilfe]."
  2. Compound messages including nested tags:
  <html xmlns:py="http://genshi.edgewall.org/"
        xmlns:i18n="http://genshi.edgewall.org/#i18n">
    <p i18n:message="">
      Please see <a href="help.html">our <em>Help</em> page</a>
      for details.
    </p>
  </html>
  msgid  "Please see [1:our [2:Help] page] for details."
  msgstr "Details finden Sie auf [1:unserer [2:Hilfe]-Seite]."
  3. Compound messages including an empty tag:
  <html xmlns:py="http://genshi.edgewall.org/"
        xmlns:i18n="http://genshi.edgewall.org/#i18n">
    <p i18n:message="">
      Show me <input type="text" name="num" value="10" /> entries per page.
    </p>
  </html>
  msgid  "Show me [1:] entries per page."
  msgstr "[1:] Einträge pro Seite anzeigen."
  4. Compound messages including multiple, not nested tags:
  <html xmlns:py="http://genshi.edgewall.org/"
        xmlns:i18n="http://genshi.edgewall.org/#i18n">
    <p i18n:message="">
      Please see <a href="help.html">Help</a>
      for <strong>details</strong>.
    </p>
  </html>
  msgid  "Please see [1:Help] for [2:details]."
  msgstr "[2:Details] finden Sie unter [1:Hilfe]."
  5. Compound messages including multiple empty tags:
  <html xmlns:py="http://genshi.edgewall.org/"
        xmlns:i18n="http://genshi.edgewall.org/#i18n">
    <p i18n:message="">
      Show me <input type="text" name="num" value="10" />
      entries per page, starting at page
      <input type="text" name="num" value="10" />.
    </p>
  </html>
  msgid  "Show me [1:] entries per page, starting at page [2:]."
  msgstr "[1:] Einträge pro Seite anzeigen, beginnend auf Seite [2:]."
  6. Compound pluralizable messages including a tag:
  <html xmlns:py="http://genshi.edgewall.org/"
        xmlns:i18n="http://genshi.edgewall.org/#i18n">
    <p i18n:message="num">
      <i18n:singular>
        You have <strong i18n:param="num">$num</strong> unread message.
      </i18n:singular>
      <i18n:plural>
        You have <strong i18n:param="num">$num</strong> unread messages.
      </i18n:plural>
    </p>
  </html>
  msgid        "You have [1:%(num)s] unread message."
  msgid_plural "You have [1:%(num)s] unread messages."
  msgstr[0]    "Sie haben [1:%(num)s] ungelesene Nachricht."
  msgstr[1]    "Sie haben [1:%(num)s] ungelesene Nachrichten."
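The automatic numbering can be sketched in a few lines of Python. This is only an illustration of the idea using the standard library's `xml.etree.ElementTree`, not how genshi.filters.Translator does it; the function name `build_msgid` is hypothetical, and collapsing whitespace is an assumption of this sketch rather than part of the proposal.

```python
import xml.etree.ElementTree as ET


def build_msgid(element):
    """Flatten the content of `element` into a msgid with numbered tags."""
    counter = 0

    def flatten(node):
        nonlocal counter
        parts = [node.text or ""]
        for child in node:
            # Tags are numbered in document order, so an outer tag gets a
            # lower number than anything nested inside it.
            counter += 1
            number = counter
            parts.append("[%d:%s]" % (number, flatten(child)))
            parts.append(child.tail or "")
        return "".join(parts)

    # Assumption of this sketch: collapse whitespace runs so that template
    # indentation does not end up in the msgid.
    return " ".join(flatten(element).split())


source = (
    '<p>Please see <a href="help.html">our <em>Help</em> page</a>\n'
    "  for details.</p>"
)
msgid = build_msgid(ET.fromstring(source))
```

Here `msgid` comes out as the msgid shown in example 2, and an empty element such as `<input />` yields an empty group like `[1:]`.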

  Changed 15 months ago by cmlenz

I've started implementing this in [671].

Note that the name i18n:message has changed to the shorter i18n:msg.

follow-up: ↓ 9   Changed 14 months ago by palgarvio

  • cc palgarvio added

On the following template:

<!DOCTYPE html
    PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:py="http://genshi.edgewall.org/"
      xmlns:xi="http://www.w3.org/2001/XInclude"
      xmlns:i18n="http://genshi.edgewall.org/i18n">
  <xi:include href="../layout.html"/>
  <head>
    <title>Welcome</title>
  </head>

  <body>
    <py:choose test="">
      <py:when test="c.user">
      <h2>Welcome $c.user.name!!!!</h2>
      <p i18n:msg="">Your last login was on ${h.format_datetime(c.user.lastlogin,
                                                    format='full',
                                                    tzinfo=c.timezone,
                                                    locale=g.locale)}</p>
      <p i18n:msg="">You can update your accout details
        <a href="${h.url_for(controller='account', action='index', id=None)}">
          here</a>.</p>

      </py:when>
      <h1 py:otherwise="">Welcome!!!</h1>
    </py:choose>
    $c.messages
  </body>
</html>

The text still gets split apart in the .pot file :\

#: oil/templates/main/index.html:10 oil/templates/main/index.html:16
msgid "Welcome"
msgstr ""

#: oil/templates/main/index.html:17
msgid "Your last login was on"
msgstr ""

#: oil/templates/main/index.html:21
msgid "You can update your accout details"
msgstr ""

#: oil/templates/main/index.html:22
msgid "here"
msgstr ""

#: oil/templates/main/index.html:26
msgid "Welcome!!!"
msgstr ""

  Changed 14 months ago by anonymous

  • cc james.harris@… removed

Removing myself from CC, and trying to defeat spam filter.

in reply to: ↑ 7   Changed 14 months ago by anonymous

Replying to palgarvio:

Consider the above comment invalid, I was using the stable branch, not trunk.... Stupid.....

I can confirm that it does work, although it still needs some polish: all that whitespace ;) That's not a major problem, though. We're simply "porting" the whitespace to the .pot file, so I should just write better templates...

For the same template above here's the .pot contents snippet:

#: oil/templates/main/index.html:10 oil/templates/main/index.html:16
msgid "Welcome"
msgstr ""

#: oil/templates/main/index.html:17
msgid "Your last login was on"
msgstr ""

#: oil/templates/main/index.html:21
msgid ""
"You can update your accout details\n"
"        [1:\n"
"          here]."
msgstr ""

#: oil/templates/main/index.html:26
msgid "Welcome!!!"
msgstr ""

Notice the first msgid:

msgid "Welcome"

Wouldn't it be better (translation-wise) to include the variable used:

msgid "Welcome $c.user.id"

  Changed 14 months ago by palgarvio

The above comment was mine, sorry.

Found a bug. The order of some messages gets switched when using the Translation filter.

Template:

  <body>
    <py:choose test="">
      <py:when test="c.user">
      <h2>Welcome $c.user.name!!!!</h2>
      <p i18n:msg="">Your last login was on ${h.format_datetime(c.user.lastlogin,
                                                    format='full',
                                                    tzinfo=c.timezone,
                                                    locale=g.locale)}</p>
      <p i18n:msg="">You can update your accout details
        <a href="${h.url_for(controller='account', action='index', id=None)}">
          here</a>.</p>

      </py:when>
      <h1 py:otherwise="">Welcome!!!</h1>
    </py:choose>
    $c.messages
  </body>

The generated HTML:

      <div id="content">
      <h2>Welcome Foo Bar!!!!</h2>
      <p>Sunday, August 26, 2007 0:40:57 AM Portugal (Lisbon) TimeYour last login was on</p>
      <p>You can update your accout details
        <a href="/account">
          here</a>.</p>

      </div>

The formatted date came first!?!?

  Changed 14 months ago by palgarvio

Update:

The problem occurs when using the i18n xml tag.

  Changed 11 months ago by cmlenz

  • priority changed from critical to major
  • milestone changed from 0.5 to 0.6

  Changed 3 months ago by cmlenz

[801] adds parameter support for i18n:msg elements. The value of the i18n:msg attribute is a comma-separated list of parameter names, in the order they appear in the original content inside the element. For example:

<div i18n:msg="name, time">
  Comment by <strong>${comment.username}</strong>
  at ${format.datetime(comment.time, 'medium')}
</div>

This is converted to the following msgid:

"Comment by [1:%(name)s]\n"
"  at %(time)s"

It should then be possible for translators to rearrange the parameters in the translation as needed.
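A rough sketch of why rearranging works: named `%(...)s` parameters are filled in by name, and numbered groups can be re-wrapped in their original tags wherever they end up. The `render` helper, the `tags` mapping and the reordered string below are hypothetical and only illustrate the mechanism; the sketch does not handle literal brackets in the text, unbalanced brackets, or escaping of parameter values.

```python
import re

# Tokens: an opening "[N:", a closing "]", or a run of plain text.
TOKEN = re.compile(r"\[(\d+):|\]|[^\[\]]+")


def render(msgstr, tags, params):
    """Fill in named parameters, then re-wrap numbered groups in markup.

    `tags` maps a tag number to an (opening, closing) pair of markup strings.
    """
    text = msgstr % params
    out, stack = [], []
    for match in TOKEN.finditer(text):
        token = match.group(0)
        if match.group(1) is not None:
            number = int(match.group(1))
            out.append(tags[number][0])
            stack.append(number)
        elif token == "]":
            out.append(tags[stack.pop()][1])
        else:
            out.append(token)
    return "".join(out)


tags = {1: ("<strong>", "</strong>")}
params = {"name": "alice", "time": "12:00"}
# Hypothetical translation that moves the time in front of the name.
reordered = "At %(time)s, [1:%(name)s] commented"
html = render(reordered, tags, params)
```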

Changed 2 months ago by palgarvio

Support for comments extraction

  Changed 2 months ago by palgarvio

Ok, the i18n:choose patch won't work if you have expressions inside the translatable strings. Working on it ....

Changed 2 months ago by palgarvio

Extraction currently working now, I think! ;)

Changed 2 months ago by palgarvio

Translation also working now!!!!!! I think ;)

  Changed 2 months ago by palgarvio

If you wish to try the patches, apply them in the order they were added to this ticket.

  Changed 2 months ago by palgarvio

Still missing: context and domain support.

Changed 2 months ago by palgarvio

This one now works correctly

Changed 2 months ago by palgarvio

Now also working for SUB kinds

follow-up: ↓ 18   Changed 2 months ago by palgarvio

Updated all_patches_bundle.patch to include cmlenz's suggestions.

in reply to: ↑ 17   Changed 2 months ago by palgarvio

Replying to palgarvio:

Updated all_patches_bundle.patch to include cmlenz's suggestions.

This also means that the previous staged patches are obsolete. I can re-add them if anyone asks.

  Changed 2 months ago by palgarvio

Added domain support, which may or may not be complete. I'll explain.

Currently I haven't found a way to pass the current domains to xi:include templates nor into py:def directives. One has to define the domain in use on both the include template and on the def directive.

Any ideas cmlenz? asmodai?

Changed 2 months ago by palgarvio

  Changed 2 months ago by palgarvio

Updated i18n_domain.patch. It now passes the domain in use to xi:include, and it also handles the py:def issue since, normally, one has a "macros.html" with all the py:defs, which is then xi:included.

  Changed 2 months ago by palgarvio

Ok, there are still problems regarding py:def directives.

Since the i18n filter is the first to run, variables defined with py:with or arguments passed to py:def are not known at this time, and i18n:choose might not know enough to do its work.

Might there be a way to do this as a late evaluation/translation?

Oh, and once again, why can't the i18n filter be the last one instead of the first?

Changed 2 months ago by palgarvio

  Changed 2 months ago by palgarvio

Updated i18n_domain.patch. Cleaner solution.

Regarding i18n:choose, I'm still at a dead end, since some of the template vars are not known at that stage.

  Changed 8 weeks ago by palgarvio

The latest patch I added solves all the issues I mentioned regarding the other patches, but it is meant to be used on the custom-directives branch, which makes this much, much simpler.

Changed 8 weeks ago by palgarvio

Changed 7 weeks ago by palgarvio
