Edgewall Software

Ticket #129 (assigned defect)

Opened 13 months ago

Last modified 3 days ago

Sentences get broken into too small fragments

Reported by: jruigrok
Owned by: cmlenz
Priority: major
Milestone: 0.6
Component: Internationalization
Version: devel
Keywords:
Cc: palgarvio

Description

Currently, sentences are cut up into fragments that are too small. This complicates translation, since grammar differs a lot from language to language and such fragments do not necessarily make sense in other languages.

Moved from Babel ticketing to Genshi.

Change History

  Changed 13 months ago by cmlenz

  • component changed from Template processing to Internationalization

This refers to sentences that have embedded markup, such as an emphasis or a link. This is a problem with many I18n systems, but the issue is compounded in Genshi because we do automatic escaping.

  Changed 13 months ago by cmlenz

  • priority changed from major to critical
  • status changed from new to assigned

Here's a proposal:

The idea is to introduce an additional namespace for any non-trivial I18n functionality, which is understood by the genshi.filters.Translator filter, both in extraction and translation.

This namespace would provide attributes/elements such as:

  • i18n:message: handle all nested content as a single message for internationalization purposes
  • i18n:tag: assign a symbolic name to a tag nested in a message
  • i18n:param: mark the content of a tag as a message parameter (not sure about this one yet, see example below)
  • i18n:singular / i18n:plural: can be nested inside i18n:message to define a pluralizable message

Examples:

1. Compound messages including a tag:

  <html xmlns:py="http://genshi.edgewall.org/"
        xmlns:i18n="http://genshi.edgewall.org/#i18n">
    <p i18n:message="">
      Please see <a href="help.html">Help</a> for details.
    </p>
  </html>
  msgid  "Please see [Help] for details."
  msgstr "Details finden Sie unter [Hilfe]."

2. Compound messages including nested tags:

  <html xmlns:py="http://genshi.edgewall.org/"
        xmlns:i18n="http://genshi.edgewall.org/#i18n">
    <p i18n:message="">
      Please see <a href="help.html">our <em>Help</em> page</a>
      for details.
    </p>
  </html>
  msgid  "Please see [our [Help] page] for details."
  msgstr "Details finden Sie auf [unserer [Hilfe]-Seite]."

3. Compound messages including an empty tag:

  <html xmlns:py="http://genshi.edgewall.org/"
        xmlns:i18n="http://genshi.edgewall.org/#i18n">
    <p i18n:message="">
      Show me <input type="text" name="num" value="10" /> entries per page.
    </p>
  </html>
  msgid  "Show me [] entries per page."
  msgstr "[] Einträge pro Seite anzeigen."

4. Compound messages including multiple, not nested tags:

  <html xmlns:py="http://genshi.edgewall.org/"
        xmlns:i18n="http://genshi.edgewall.org/#i18n">
    <p i18n:message="">
      Please see <a href="help.html" i18n:tag="help">Help</a>
      for <strong>details</strong>.
    </p>
  </html>
  msgid  "Please see [help:Help] for [details]."
  msgstr "[Details] finden Sie unter [help:Hilfe]."

5. Compound messages including multiple empty tags:

  <html xmlns:py="http://genshi.edgewall.org/"
        xmlns:i18n="http://genshi.edgewall.org/#i18n">
    <p i18n:message="">
      Show me <input type="text" name="num" value="10" i18n:tag="limit" />
      entries per page, starting at page
      <input type="text" name="num" value="10" i18n:tag="offset"/>.
    </p>
  </html>
  msgid  "Show me [limit:] entries per page, starting at page [offset:]."
  msgstr "[limit:] Einträge pro Seite anzeigen, beginnend auf Seite [offset:]."

6. Compound pluralizable messages including a tag:

  <html xmlns:py="http://genshi.edgewall.org/"
        xmlns:i18n="http://genshi.edgewall.org/#i18n">
    <i18n:message switch="num">
      <p i18n:singular="">You have <strong i18n:param="num">$num</strong> unread message.</p>
      <p i18n:plural="">You have <strong i18n:param="num">$num</strong> unread messages.</p>
    </i18n:message>
  </html>
  msgid        "You have [%(num)s] unread message."
  msgid_plural "You have [%(num)s] unread messages."
  msgstr[0]    "Sie haben [%(num)s] ungelesene Nachricht."
  msgstr[1]    "Sie haben [%(num)s] ungelesene Nachrichten."

follow-up: ↓ 4   Changed 13 months ago by james.harris@…

  • cc james.harris@… added

The singular/plural selection concept has always been appealing to me, as it's a problem which needs a solution even outside of I18N.

However, when applied in an I18N situation it can quickly grow to be quite in-depth, as many languages have complex rules for which noun form to use. From memory, Arabic has up to six different plural forms.

The first thought I had was to define a predicate that determines which plural form to use for each different language:

def select_en_plural_form(num):
    """Return the index of the English plural form to use for num."""
    if num == 1:
        return 0  # use the singular form
    return 1  # use the plural form (English has only one)

Then write your message switch so as to indicate which choice corresponds to each of the possible results returned from the function defined above:

<i18n:message switch="num">
  <p i18n:plural-form="0">You have <strong i18n:param="num">$num</strong> unread message.</p>
  <p i18n:plural-form="1">You have <strong i18n:param="num">$num</strong> unread messages.</p>
</i18n:message>

This could hopefully go some way towards providing a versatile solution.

in reply to: ↑ 3   Changed 13 months ago by cmlenz

Replying to james.harris@icecave.com.au:

However, when applied in an I18N situation it can quickly grow to be quite in-depth, as many languages have complex rules for which noun form to use. From memory, Arabic has up to six different plural forms.

But luckily, that stuff is already addressed by gettext ;-)

http://www.gnu.org/software/gettext/manual/gettext.html#Plural-forms

The pluralization example above is simply a variant of ngettext() for messages that contain markup.
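
As a rough illustration of that point, here is a sketch using Python's standard gettext module (plain gettext, not Genshi code). ngettext() picks the form for a given count, and with a real catalog the Plural-Forms header decides how many forms exist and which one applies. NullTranslations is used here only so the snippet runs without a catalog; it falls back to the English-style "singular only for 1" rule. The unread_summary name is made up for this example.

```python
import gettext

# No catalog is loaded here; NullTranslations returns the msgid itself
# (singular when n == 1, plural otherwise). A real catalog would apply
# the rule from its Plural-Forms header instead.
translations = gettext.NullTranslations()


def unread_summary(num):
    """Return the unread-messages sentence for num (hypothetical helper)."""
    template = translations.ngettext(
        "You have %(num)s unread message.",
        "You have %(num)s unread messages.",
        num,
    )
    # Parameters are substituted after the plural form has been chosen.
    return template % {"num": num}


if __name__ == "__main__":
    for count in (0, 1, 3):
        print(unread_summary(count))
```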

  Changed 13 months ago by cmlenz

There's been some feedback on the mailing list. One thing was that i18n:tag should probably be used everywhere, but then we could just as well go with automatic numbering of nested tags, and drop i18n:tag completely. The following is the updated proposal taking that into account, which I think is a bit nicer.

  1. Compound messages including a tag:
  <html xmlns:py="http://genshi.edgewall.org/"
        xmlns:i18n="http://genshi.edgewall.org/#i18n">
    <p i18n:message="">
      Please see <a href="help.html">Help</a> for details.
    </p>
  </html>
  msgid  "Please see [1:Help] for details."
  msgstr "Details finden Sie unter [1:Hilfe]."
  2. Compound messages including nested tags:
  <html xmlns:py="http://genshi.edgewall.org/"
        xmlns:i18n="http://genshi.edgewall.org/#i18n">
    <p i18n:message="">
      Please see <a href="help.html">our <em>Help</em> page</a>
      for details.
    </p>
  </html>
  msgid  "Please see [1:our [2:Help] page] for details."
  msgstr "Details finden Sie auf [1:unserer [2:Hilfe]-Seite]."
  3. Compound messages including an empty tag:
  <html xmlns:py="http://genshi.edgewall.org/"
        xmlns:i18n="http://genshi.edgewall.org/#i18n">
    <p i18n:message="">
      Show me <input type="text" name="num" value="10" /> entries per page.
    </p>
  </html>
  msgid  "Show me [1:] entries per page."
  msgstr "[1:] Einträge pro Seite anzeigen."
  4. Compound messages including multiple, not nested tags:
  <html xmlns:py="http://genshi.edgewall.org/"
        xmlns:i18n="http://genshi.edgewall.org/#i18n">
    <p i18n:message="">
      Please see <a href="help.html">Help</a>
      for <strong>details</strong>.
    </p>
  </html>
  msgid  "Please see [1:Help] for [2:details]."
  msgstr "[2:Details] finden Sie unter [1:Hilfe]."
  5. Compound messages including multiple empty tags:
  <html xmlns:py="http://genshi.edgewall.org/"
        xmlns:i18n="http://genshi.edgewall.org/#i18n">
    <p i18n:message="">
      Show me <input type="text" name="num" value="10" />
      entries per page, starting at page
      <input type="text" name="num" value="10" />.
    </p>
  </html>
  msgid  "Show me [1:] entries per page, starting at page [2:]."
  msgstr "[1:] Einträge pro Seite anzeigen, beginnend auf Seite [2:]."
  6. Compound pluralizable messages including a tag:
  <html xmlns:py="http://genshi.edgewall.org/"
        xmlns:i18n="http://genshi.edgewall.org/#i18n">
    <p i18n:message="num">
      <i18n:singular>
        You have <strong i18n:param="num">$num</strong> unread message.
      </i18n:singular>
      <i18n:plural>
        You have <strong i18n:param="num">$num</strong> unread messages.
      </i18n:plural>
    </p>
  </html>
  msgid        "You have [1:%(num)s] unread message."
  msgid_plural "You have [1:%(num)s] unread messages."
  msgstr[0]    "Sie haben [1:%(num)s] ungelesene Nachricht."
  msgstr[1]    "Sie haben [1:%(num)s] ungelesene Nachrichten."
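
To make the numbered-bracket notation a bit more concrete, here is a minimal sketch of how a translated string might be split back into per-tag pieces, so that each piece can be re-attached to the element with the same number. This is only an illustration of the idea, not the code in the filter; the split_message name and the "0 means outside any tag" convention are assumptions made for this example.

```python
import re

# Matches either an opening marker such as "[1:" or a closing "]".
_MARKER = re.compile(r"\[(\d+):|\]")


def split_message(text):
    """Split a bracketed message into (tag_number, text) pairs.

    Tag number 0 stands for text that is not inside any numbered tag.
    Hypothetical helper: escaping of literal brackets is not handled,
    and a stray "]" outside any tag is silently dropped.
    """
    parts = []
    stack = [0]  # innermost open tag is stack[-1]
    pos = 0
    for match in _MARKER.finditer(text):
        chunk = text[pos:match.start()]
        # Keep empty chunks inside a tag so "[1:]" still yields an entry.
        if chunk or stack[-1]:
            parts.append((stack[-1], chunk))
        if match.group(1) is not None:
            stack.append(int(match.group(1)))
        elif len(stack) > 1:
            stack.pop()
        pos = match.end()
    if text[pos:]:
        parts.append((stack[-1], text[pos:]))
    return parts


if __name__ == "__main__":
    print(split_message("[2:Details] finden Sie unter [1:Hilfe]."))
```

Because every piece carries its tag number, a translation that swaps the order of the tags (as in example 4) can still be matched to the right elements.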

  Changed 13 months ago by cmlenz

I've started implementing this in [671].

Note that the name i18n:message has changed to the shorter i18n:msg.

follow-up: ↓ 9   Changed 11 months ago by palgarvio

  • cc palgarvio added

On the following template:

<!DOCTYPE html
    PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:py="http://genshi.edgewall.org/"
      xmlns:xi="http://www.w3.org/2001/XInclude"
      xmlns:i18n="http://genshi.edgewall.org/i18n">
  <xi:include href="../layout.html"/>
  <head>
    <title>Welcome</title>
  </head>

  <body>
    <py:choose test="">
      <py:when test="c.user">
      <h2>Welcome $c.user.name!!!!</h2>
      <p i18n:msg="">Your last login was on ${h.format_datetime(c.user.lastlogin,
                                                    format='full',
                                                    tzinfo=c.timezone,
                                                    locale=g.locale)}</p>
      <p i18n:msg="">You can update your accout details
        <a href="${h.url_for(controller='account', action='index', id=None)}">
          here</a>.</p>

      </py:when>
      <h1 py:otherwise="">Welcome!!!</h1>
    </py:choose>
    $c.messages
  </body>
</html>

the text still gets split apart in the .pot file :\

#: oil/templates/main/index.html:10 oil/templates/main/index.html:16
msgid "Welcome"
msgstr ""

#: oil/templates/main/index.html:17
msgid "Your last login was on"
msgstr ""

#: oil/templates/main/index.html:21
msgid "You can update your accout details"
msgstr ""

#: oil/templates/main/index.html:22
msgid "here"
msgstr ""

#: oil/templates/main/index.html:26
msgid "Welcome!!!"
msgstr ""

  Changed 11 months ago by anonymous

  • cc james.harris@… removed

Removing myself from CC, and trying to defeat spam filter.

in reply to: ↑ 7   Changed 11 months ago by anonymous

Replying to palgarvio:

Consider the above comment invalid; I was using the stable branch, not trunk. Stupid...

I can confirm that it does work. The extracted message still carries all the template's white space ;) but that's not a major problem: the white space is simply being "ported" to the .pot file, so I should just write better templates...

For the same template above here's the .pot contents snippet:

#: oil/templates/main/index.html:10 oil/templates/main/index.html:16
msgid "Welcome"
msgstr ""

#: oil/templates/main/index.html:17
msgid "Your last login was on"
msgstr ""

#: oil/templates/main/index.html:21
msgid ""
"You can update your accout details\n"
"        [1:\n"
"          here]."
msgstr ""

#: oil/templates/main/index.html:26
msgid "Welcome!!!"
msgstr ""

Notice the first msgid:

msgid "Welcome"

Wouldn't it be better (translation-wise) to include the variable that is used:

msgid "Welcome $c.user.id"
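
On the white space carried into the msgid above: one possible way to tidy it would be to collapse runs of white space before the message is written out. The snippet below is only a sketch of that idea in plain Python; collapse_whitespace is a made-up name, and whether extraction should do this at all is an open question here, since the same normalization would have to be applied at translation time for the lookup to match.

```python
import re


def collapse_whitespace(msgid):
    """Collapse every run of white space into one space (hypothetical helper)."""
    return re.sub(r"\s+", " ", msgid).strip()


if __name__ == "__main__":
    raw = "You can update your accout details\n        [1:\n          here]."
    print(collapse_whitespace(raw))
```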

  Changed 11 months ago by palgarvio

The above comment was mine, sorry.

Found a bug. The order of some messages gets switched when using the Translator filter.

Template:

  <body>
    <py:choose test="">
      <py:when test="c.user">
      <h2>Welcome $c.user.name!!!!</h2>
      <p i18n:msg="">Your last login was on ${h.format_datetime(c.user.lastlogin,
                                                    format='full',
                                                    tzinfo=c.timezone,
                                                    locale=g.locale)}</p>
      <p i18n:msg="">You can update your accout details
        <a href="${h.url_for(controller='account', action='index', id=None)}">
          here</a>.</p>

      </py:when>
      <h1 py:otherwise="">Welcome!!!</h1>
    </py:choose>
    $c.messages
  </body>

The generated HTML:

      <div id="content">
      <h2>Welcome Foo Bar!!!!</h2>
      <p>Sunday, August 26, 2007 0:40:57 AM Portugal (Lisbon) TimeYour last login was on</p>
      <p>You can update your accout details
        <a href="/account">
          here</a>.</p>

      </div>

The formatted date came first!?!?

  Changed 11 months ago by palgarvio

Update:

The problem occurs when using the i18n XML tag.

  Changed 8 months ago by cmlenz

  • priority changed from critical to major
  • milestone changed from 0.5 to 0.6

  Changed 2 weeks ago by cmlenz

[801] adds parameter support for i18n:msg elements. The value of the i18n:msg attribute is a comma-separated list of parameter names, in the order they appear in the original content inside the element. For example:

<div i18n:msg="name, time">
  Comment by <strong>${comment.username}</strong>
  at ${format.datetime(comment.time, 'medium')}
</div>

This is converted to the following msgid:

"Comment by [1:%(name)s]\n"
"  at %(time)s"

It should then be possible for translators to rearrange the parameters in the translation as needed.
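
To show why named parameters make that rearrangement possible, here is a small sketch using ordinary Python %-formatting on the msgid above. The German msgstr, the fill helper, and the sample values are all invented for illustration; the [1:...] markers are left in place because resolving them is the filter's job, not part of this snippet.

```python
MSGID = "Comment by [1:%(name)s]\n  at %(time)s"
# Hypothetical translation that moves the time in front of the name.
MSGSTR = "Um %(time)s kommentierte [1:%(name)s]"


def fill(template, params):
    """Substitute named parameters into a message (hypothetical helper)."""
    return template % params


if __name__ == "__main__":
    params = {"name": "alice", "time": "10:30"}  # sample values only
    print(fill(MSGID, params))
    print(fill(MSGSTR, params))
```

Since the placeholders are looked up by name rather than by position, the translated string can use them in any order.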
