Edgewall Software

Ticket #129 (closed defect: fixed)

Opened 7 years ago

Last modified 4 years ago

Sentences get broken into too small fragments

Reported by: jruigrok Owned by: cmlenz
Priority: major Milestone: 0.6
Component: Internationalization Version: devel
Keywords: Cc: palgarvio, dfraser

Description

Currently sentences are cut up in too small fragments. This complicates matters since grammar differs a lot from language to language and such cut ups do not necessarily make sense in other languages.

Moved from Babel ticketing to Genshi.

Attachments

i18n_comment.patch Download (3.7 KB) - added by palgarvio 6 years ago.
Support for comments extraction
i18n_choose.patch Download (10.1 KB) - added by palgarvio 6 years ago.
Extraction currently working now, I think! ;)
i18n_choose_translate.patch Download (4.1 KB) - added by palgarvio 6 years ago.
Translation also working now!!!!!! I think ;)
i18n_choose_translate.2.patch Download (6.6 KB) - added by palgarvio 6 years ago.
This one now works correctly
i18n_choose_translate.3.patch Download (7.5 KB) - added by palgarvio 6 years ago.
Now also working for SUB kinds
all_patches_bundle.patch Download (18.4 KB) - added by palgarvio 6 years ago.
i18n_domain.patch Download (6.7 KB) - added by palgarvio 6 years ago.
i18n_directives_full.patch Download (74.8 KB) - added by palgarvio 6 years ago.
i18n_directives_full_with_py_strip.patch Download (85.2 KB) - added by palgarvio 6 years ago.

Change History

  Changed 7 years ago by cmlenz

  • component changed from Template processing to Internationalization

This refers to sentences that have embedded markup, such as an emphasis or a link. This is a problem with many I18n systems, but the issue is compounded in Genshi because we do automatic escaping.

  Changed 7 years ago by cmlenz

  • status changed from new to assigned
  • priority changed from major to critical

Here's a proposal:

The idea is to introduce an additional namespace for any non-trivial I18n functionality, which is understood by the genshi.filters.Translator filter, both in extraction and translation.

This namespace would provide attributes/elements such as:

  • i18n:message: handle all nested content as a single message for internationalization purposes
  • i18n:tag: assign a symbolic name to a tag nested in a message
  • i18n:param: mark the content of a tag as a message parameter (not sure about this one yet, see example below)
  • i18n:singular / i18n:plural: can be nested inside i18n:message to define a pluralizable message

Examples:

1. Compound messages including a tag:

  <html xmlns:py="http://genshi.edgewall.org/"
        xmlns:i18n="http://genshi.edgewall.org/#i18n">
    <p i18n:message="">
      Please see <a href="help.html">Help</a> for details.
    </p>
  </html>
  msgid  "Please see [Help] for details."
  msgstr "Details finden Sie unter [Hilfe]."

2. Compound messages including nested tags:

  <html xmlns:py="http://genshi.edgewall.org/"
        xmlns:i18n="http://genshi.edgewall.org/#i18n">
    <p i18n:message="">
      Please see <a href="help.html">our <em>Help</em> page</a>
      for details.
    </p>
  </html>
  msgid  "Please see [our [Help] page] for details."
  msgstr "Details finden Sie auf [unserer [Hilfe]-Seite]."

3. Compound messages including an empty tag:

  <html xmlns:py="http://genshi.edgewall.org/"
        xmlns:i18n="http://genshi.edgewall.org/#i18n">
    <p i18n:message="">
      Show me <input type="text" name="num" value="10" /> entries per page.
    </p>
  </html>
  msgid  "Show me [] entries per page."
  msgstr "[] Einträge pro Seite anzeigen."

4. Compound messages including multiple, not nested tags:

  <html xmlns:py="http://genshi.edgewall.org/"
        xmlns:i18n="http://genshi.edgewall.org/#i18n">
    <p i18n:message="">
      Please see <a href="help.html" i18n:tag="help">Help</a>
      for <strong>details</strong>.
    </p>
  </html>
  msgid  "Please see [help:Help] for [details]."
  msgstr "[Details] finden Sie unter [help:Hilfe]."

5. Compound messages including multiple empty tags:

  <html xmlns:py="http://genshi.edgewall.org/"
        xmlns:i18n="http://genshi.edgewall.org/#i18n">
    <p i18n:message="">
      Show me <input type="text" name="num" value="10" i18n:tag="limit" />
      entries per page, starting at page
      <input type="text" name="num" value="10" i18n:tag="offset"/>.
    </p>
  </html>
  msgid  "Show me [limit:] entries per page, starting at page [offset:]."
  msgstr "[limit:] Einträge pro Seite anzeigen, beginnend auf Seite [offset:]."

6. Compound pluralizable messages including a tag:

  <html xmlns:py="http://genshi.edgewall.org/"
        xmlns:i18n="http://genshi.edgewall.org/#i18n">
    <i18n:message switch="num">
      <p i18n:singular="">You have <strong i18n:param="num">$num</strong> unread message.</p>
      <p i18n:plural="">You have <strong i18n:param="num">$num</strong> unread messages.</p>
    </i18n:message>
  </html>
  msgid        "You have [%(num)s] unread message."
  msgid_plural "You have [%(num)s] unread messages."
  msgstr[0]    "Sie haben [%(num)s] ungelesene Nachricht."
  msgstr[1]    "Sie haben [%(num)s] ungelesene Nachrichten."

follow-up: ↓ 4   Changed 7 years ago by james.harris@…

  • cc james.harris@… added

The singular/plural selection concept has always been appealing to me, as it's a problem which needs a solution even outside of I18N.

However, when applied in an I18N situation it can quickly grow to be quite in-depth, as many languages have complex rules for which noun form to use. From memory, Arabic has up to six different plural forms.

The first thought I had was to define a predicate that determines which plural form to use for each different language:

def select_en_plural_form(num):
    if num == 1:
        return 0  # use the singular form
    return 1  # use the English language's only plural form

Then write your message switch so as to indicate which choice corresponds to each of the possible results returned from the function defined above:

<i18n:message switch="num">
  <p i18n:plural-form="0">You have <strong i18n:param="num">$num</strong> unread message.</p>
  <p i18n:plural-form="1">You have <strong i18n:param="num">$num</strong> unread messages.</p>
</i18n:message>

This could hopefully go some way towards providing a versatile solution.
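
To illustrate how such per-language predicates might look beyond English, here is a minimal sketch. The registry `PLURAL_FORM_SELECTORS` and the helper `select_plural_form` are hypothetical names, and the French and Polish rules follow the plural formulas commonly used for those languages in gettext catalogs.

```python
def select_en_plural_form(num):
    """English: form 0 for exactly one item, form 1 otherwise."""
    return 0 if num == 1 else 1


def select_fr_plural_form(num):
    """French: 0 and 1 both take form 0, everything else form 1."""
    return 0 if num <= 1 else 1


def select_pl_plural_form(num):
    """Polish: three forms, chosen from the last digits of the number."""
    if num == 1:
        return 0
    if 2 <= num % 10 <= 4 and not 12 <= num % 100 <= 14:
        return 1
    return 2


# Hypothetical registry mapping a locale code to its predicate.
PLURAL_FORM_SELECTORS = {
    "en": select_en_plural_form,
    "fr": select_fr_plural_form,
    "pl": select_pl_plural_form,
}


def select_plural_form(locale, num):
    """Return the index of the plural form to use for `num` in `locale`."""
    return PLURAL_FORM_SELECTORS[locale](num)
```

The value returned by `select_plural_form` would then pick the matching `i18n:plural-form` branch in the switch above.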

in reply to: ↑ 3   Changed 7 years ago by cmlenz

Replying to james.harris@icecave.com.au:

However, when applied in an I18N situation it can quickly grow to be quite in-depth, as many languages have complex rules for which noun form to use. From memory, Arabic has up to six different plural forms.

But luckily, that stuff is already addressed by gettext ;-)

 http://www.gnu.org/software/gettext/manual/gettext.html#Plural-forms

The pluralization example above is simply a variant of ngettext() for messages that contain markup.
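
For reference, this is roughly how ngettext() behaves for a plain-text message in Python's standard gettext module. It is only a sketch: `unread_message` is a made-up helper, and `NullTranslations` stands in for a real catalog, whose Plural-Forms header would decide the form for other languages.

```python
import gettext

# NullTranslations has no catalog, so it falls back to the English rule:
# the singular msgid for n == 1, the plural msgid otherwise.
translations = gettext.NullTranslations()


def unread_message(num):
    """Pick the singular or plural template, then fill in the count."""
    template = translations.ngettext(
        "You have %(num)s unread message.",
        "You have %(num)s unread messages.",
        num,
    )
    return template % {"num": num}


print(unread_message(1))
print(unread_message(3))
```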

  Changed 7 years ago by cmlenz

There's been some feedback on the mailing list. One point was that i18n:tag should probably be used everywhere, but then we could just as well go with automatic numbering of nested tags and drop i18n:tag completely. The following is the updated proposal taking that into account, which I think is a bit nicer.

  1. Compound messages including a tag:
  <html xmlns:py="http://genshi.edgewall.org/"
        xmlns:i18n="http://genshi.edgewall.org/#i18n">
    <p i18n:message="">
      Please see <a href="help.html">Help</a> for details.
    </p>
  </html>
  msgid  "Please see [1:Help] for details."
  msgstr "Details finden Sie unter [1:Hilfe]."
  2. Compound messages including nested tags:
  <html xmlns:py="http://genshi.edgewall.org/"
        xmlns:i18n="http://genshi.edgewall.org/#i18n">
    <p i18n:message="">
      Please see <a href="help.html">our <em>Help</em> page</a>
      for details.
    </p>
  </html>
  msgid  "Please see [1:our [2:Help] page] for details."
  msgstr "Details finden Sie auf [1:unserer [2:Hilfe]-Seite]."
  3. Compound messages including an empty tag:
  <html xmlns:py="http://genshi.edgewall.org/"
        xmlns:i18n="http://genshi.edgewall.org/#i18n">
    <p i18n:message="">
      Show me <input type="text" name="num" value="10" /> entries per page.
    </p>
  </html>
  msgid  "Show me [1:] entries per page."
  msgstr "[1:] Einträge pro Seite anzeigen."
  4. Compound messages including multiple, not nested tags:
  <html xmlns:py="http://genshi.edgewall.org/"
        xmlns:i18n="http://genshi.edgewall.org/#i18n">
    <p i18n:message="">
      Please see <a href="help.html">Help</a>
      for <strong>details</strong>.
    </p>
  </html>
  msgid  "Please see [1:Help] for [2:details]."
  msgstr "[2:Details] finden Sie unter [1:Hilfe]."
  5. Compound messages including multiple empty tags:
  <html xmlns:py="http://genshi.edgewall.org/"
        xmlns:i18n="http://genshi.edgewall.org/#i18n">
    <p i18n:message="">
      Show me <input type="text" name="num" value="10" />
      entries per page, starting at page
      <input type="text" name="num" value="10" />.
    </p>
  </html>
  msgid  "Show me [1:] entries per page, starting at page [2:]."
  msgstr "[1:] Einträge pro Seite anzeigen, beginnend auf Seite [2:]."
  6. Compound pluralizable messages including a tag:
  <html xmlns:py="http://genshi.edgewall.org/"
        xmlns:i18n="http://genshi.edgewall.org/#i18n">
    <p i18n:message="num">
      <i18n:singular>
        You have <strong i18n:param="num">$num</strong> unread message.
      </i18n:singular>
      <i18n:plural>
        You have <strong i18n:param="num">$num</strong> unread messages.
      </i18n:plural>
    </p>
  </html>
  msgid        "You have [1:%(num)s] unread message."
  msgid_plural "You have [1:%(num)s] unread messages."
  msgstr[0]    "Sie haben [1:%(num)s] ungelesene Nachricht."
  msgstr[1]    "Sie haben [1:%(num)s] ungelesene Nachrichten."
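
The automatic numbering could be sketched roughly as follows. This is not the Genshi implementation: it uses the standard library's xml.etree.ElementTree instead of Genshi's event stream, `build_msgid` is a hypothetical helper, and collapsing whitespace is an assumption made to keep the output on one line.

```python
import xml.etree.ElementTree as ET


def build_msgid(element):
    """Flatten the content of `element` into a msgid with numbered tags."""
    counter = 0

    def walk(node):
        nonlocal counter
        parts = [node.text or ""]
        for child in node:
            counter += 1
            number = counter  # outer tags get their number before inner ones
            parts.append("[%d:%s]" % (number, walk(child)))
            parts.append(child.tail or "")
        return "".join(parts)

    # Collapse runs of whitespace (an assumption, see above).
    return " ".join(walk(element).split())


paragraph = ET.fromstring(
    '<p>Please see <a href="help.html">our <em>Help</em> page</a>\n'
    "  for details.</p>"
)
print(build_msgid(paragraph))  # Please see [1:our [2:Help] page] for details.
```

Numbering in document order means a translator can move `[2:...]` before `[1:...]` and each marker still identifies the same tag.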

  Changed 7 years ago by cmlenz

I've started implementing this in [671].

Note that the name i18n:message has changed to the shorter i18n:msg.

follow-up: ↓ 9   Changed 7 years ago by palgarvio

  • cc palgarvio added

On the following template:

<!DOCTYPE html
    PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:py="http://genshi.edgewall.org/"
      xmlns:xi="http://www.w3.org/2001/XInclude"
      xmlns:i18n="http://genshi.edgewall.org/i18n">
  <xi:include href="../layout.html"/>
  <head>
    <title>Welcome</title>
  </head>

  <body>
    <py:choose test="">
      <py:when test="c.user">
      <h2>Welcome $c.user.name!!!!</h2>
      <p i18n:msg="">Your last login was on ${h.format_datetime(c.user.lastlogin,
                                                    format='full',
                                                    tzinfo=c.timezone,
                                                    locale=g.locale)}</p>
      <p i18n:msg="">You can update your accout details
        <a href="${h.url_for(controller='account', action='index', id=None)}">
          here</a>.</p>

      </py:when>
      <h1 py:otherwise="">Welcome!!!</h1>
    </py:choose>
    $c.messages
  </body>
</html>

The text still gets split apart in the .pot file :\

#: oil/templates/main/index.html:10 oil/templates/main/index.html:16
msgid "Welcome"
msgstr ""

#: oil/templates/main/index.html:17
msgid "Your last login was on"
msgstr ""

#: oil/templates/main/index.html:21
msgid "You can update your accout details"
msgstr ""

#: oil/templates/main/index.html:22
msgid "here"
msgstr ""

#: oil/templates/main/index.html:26
msgid "Welcome!!!"
msgstr ""

  Changed 7 years ago by anonymous

  • cc james.harris@… removed

Removing myself from CC, and trying to defeat spam filter.

in reply to: ↑ 7   Changed 7 years ago by anonymous

Replying to palgarvio:

Consider the above comment invalid, I was using the stable branch, not trunk.... Stupid.....

I confirm that indeed it works, although it does need some work, all that white space ;) but that's not a major problem and we're "porting" the white space to the pot file, so I should just write better templates...

For the same template above here's the .pot contents snippet:

#: oil/templates/main/index.html:10 oil/templates/main/index.html:16
msgid "Welcome"
msgstr ""

#: oil/templates/main/index.html:17
msgid "Your last login was on"
msgstr ""

#: oil/templates/main/index.html:21
msgid ""
"You can update your accout details\n"
"        [1:\n"
"          here]."
msgstr ""

#: oil/templates/main/index.html:26
msgid "Welcome!!!"
msgstr ""

Notice the first msgid:

msgid "Welcome"

Wouldn't it be better (translation-wise) to include the variable that is used:

msgid "Welcome $c.user.id"

  Changed 7 years ago by palgarvio

The above comment was mine, sorry.

Found a bug. The order of some messages gets switched when using the Translation filter.

Template:

  <body>
    <py:choose test="">
      <py:when test="c.user">
      <h2>Welcome $c.user.name!!!!</h2>
      <p i18n:msg="">Your last login was on ${h.format_datetime(c.user.lastlogin,
                                                    format='full',
                                                    tzinfo=c.timezone,
                                                    locale=g.locale)}</p>
      <p i18n:msg="">You can update your accout details
        <a href="${h.url_for(controller='account', action='index', id=None)}">
          here</a>.</p>

      </py:when>
      <h1 py:otherwise="">Welcome!!!</h1>
    </py:choose>
    $c.messages
  </body>

The generated HTML:

      <div id="content">
      <h2>Welcome Foo Bar!!!!</h2>
      <p>Sunday, August 26, 2007 0:40:57 AM Portugal (Lisbon) TimeYour last login was on</p>
      <p>You can update your accout details
        <a href="/account">
          here</a>.</p>

      </div>

The formatted date came first!?!?

  Changed 7 years ago by palgarvio

Update:

The problem occurs when using the i18n XML tag.

  Changed 6 years ago by cmlenz

  • priority changed from critical to major
  • milestone changed from 0.5 to 0.6

  Changed 6 years ago by cmlenz

[801] adds parameter support for i18n:msg elements. The value of the i18n:msg attribute is a comma-separated list of parameter names, in the order they appear in the original content inside the element. For example:

<div i18n:msg="name, time">
  Comment by <strong>${comment.username}</strong>
  at ${format.datetime(comment.time, 'medium')}
</div>

This is converted to the following msgid:

"Comment by [1:%(name)s]\n"
"  at %(time)s"

It should then be possible for translators to rearrange the parameters in the translation as needed.
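
As a rough illustration of that rearrangement, the named %(...)s placeholders can be filled with Python string formatting regardless of their order in the translated string. The German msgstr and the sample values below are made up for the example, the whitespace of the msgid is simplified, and the [1:...] tag marker is left in place since restoring the markup is the Translator filter's job.

```python
msgid = "Comment by [1:%(name)s] at %(time)s"
# Hypothetical translation that moves the time before the name.
msgstr = "Um %(time)s von [1:%(name)s] kommentiert"

# Sample parameter values, in practice taken from the template expressions.
params = {"name": "jdoe", "time": "10:30"}

print(msgid % params)   # Comment by [1:jdoe] at 10:30
print(msgstr % params)  # Um 10:30 von [1:jdoe] kommentiert
```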

Changed 6 years ago by palgarvio

Support for comments extraction

  Changed 6 years ago by palgarvio

Ok, the i18n:choose patch won't work if you have expressions inside the translatable strings. Working on it ....

Changed 6 years ago by palgarvio

Extraction currently working now, I think! ;)

Changed 6 years ago by palgarvio

Translation also working now!!!!!! I think ;)

  Changed 6 years ago by palgarvio

If you wish to try the patches, apply them in the order they were added to this ticket.

  Changed 6 years ago by palgarvio

Still missing, context and domain support.

Changed 6 years ago by palgarvio

This one now works correctly

Changed 6 years ago by palgarvio

Now also working for SUB kinds

follow-up: ↓ 18   Changed 6 years ago by palgarvio

Updated all_patches_bundle.patch to include cmlenz's suggestions.

in reply to: ↑ 17   Changed 6 years ago by palgarvio

Replying to palgarvio:

Updated all_patches_bundle.patch to include cmlenz's suggestions.

This also means that the previous staged patches are obsolete. I can re-add them if anyone asks.

  Changed 6 years ago by palgarvio

Added domain support, which may or may not be complete. I'll explain.

Currently I haven't found a way to pass the current domains to xi:include templates nor into py:def directives. One has to define the domain in use on both the include template and on the def directive.

Any ideas cmlenz? asmodai?

Changed 6 years ago by palgarvio

  Changed 6 years ago by palgarvio

Updated i18n_domain.patch; it now passes the domain in use to xi:include, and it also handles the py:def issue since, normally, one has a "macros.html" with all py:defs which is then xi:included.

  Changed 6 years ago by palgarvio

Ok, there are still problems regarding py:def directives.

Since the i18n filter is the first to run, variables defined with py:with or arguments passed to py:def are not known at this time, and i18n:choose might not know enough to do its work.

Might there be a way to do this as a late evaluation/translation?

Oh, and once again, why can't the i18n filter be the last one instead of the first?

Changed 6 years ago by palgarvio

  Changed 6 years ago by palgarvio

Updated i18n_domain.patch. Cleaner solution.

Regarding i18n:choose, I'm still at a dead end, since some of the template vars are not known at that stage.

  Changed 6 years ago by palgarvio

The latest patch I added solves all the issues I mentioned regarding the other patches, but it is meant to be used on the custom-directives branch, which makes this much, much simpler.

Changed 6 years ago by palgarvio

Changed 6 years ago by palgarvio

  Changed 5 years ago by wichert@…

Can someone give an indication of the status of this work?

  Changed 5 years ago by palgarvio

Some of the issues and required features raised on this ticket are being worked on in this branch.

  Changed 5 years ago by wichert@…

Is the spec as documented in this ticket considered to be final?

The reason that I'm asking is that I am looking at adding this to chameleon.genshi, but I don't want to risk implementing a standard that will change.

  Changed 5 years ago by palgarvio

Regarding the spec, there's no documentation yet, and since it's not included in the main development trunk (i.e., it's in a branch), it's still possible that it changes, though it's almost definitive.

Either way, for the current spec, there's nothing like the source code I pointed you at, especially i18n.py.

  Changed 5 years ago by dfraser

  • cc dfraser added

  Changed 4 years ago by cboos

  • status changed from assigned to closed
  • resolution set to fixed

With [1072], the i18n branch has been merged in trunk and I think the issue can be considered to be fixed.

Further problems or enhancement requests related to this topic should be opened as new tickets.
