Changes between Initial Version and Version 1 of GenshiRecipes/HtmlTransform


Timestamp:
Jul 28, 2006, 7:17:16 PM (18 years ago)
Author:
cmlenz
Comment:

New recipe for transforming HTML

  • GenshiRecipes/HtmlTransform

= [MarkupRecipes Markup Recipes]: Transforming HTML documents =

While [MarkupTemplates Markup templates] need to be valid XML files, that does not mean you can't use Markup to transform “old-school” HTML documents. Markup can parse HTML input and apply ''match templates'' to it, which lets you make any kind of modification, such as adding site-specific chrome.

Let's say you have the following HTML document (maybe produced by some application or component outside your control), and you'd like to integrate it into your site:

{{{
#!xml
<HTML>
 <HEAD>
  <TITLE>Aaarrgh</TITLE>
  <LINK REL=stylesheet href='badstyle.css'>
 </HEAD>

 <BODY>
  <H1>Aaargh</H1>
  <P>
    <B>Lorem <I>ipsum</I></B> dolor sit amet, consectetur<BR>
    adipisicing elit, sed do eiusmod tempor incididunt ut<BR>
    labore et dolore magna aliqua. Ut enim ad minim veniam,<BR>
    quis nostrud exercitation ullamco laboris nisi ut<BR>
    aliquip ex ea commodo consequat.
  </P>
  <P>
    Duis aute irure dolor in reprehenderit in voluptate velit<BR>
    esse cillum dolore eu fugiat nulla pariatur. Excepteur sint<BR>
    occaecat cupidatat non proident, sunt in culpa qui officia<BR>
    deserunt mollit anim <I>id est laborum</I>.
  </P>
 </BODY>

</HTML>
}}}

What you'd like to do is:
 * Make that valid XHTML, with a proper DOCTYPE to trigger standards rendering mode in browsers.
 * Use “semantic” tags such as `<em>` and `<strong>` instead of the more presentational `<i>` and `<b>` (whether or not that's really a good idea).
 * Add a new `<div id="header">` at the top of the page that contains your site logo.

To do that, start with the following template:

{{{
#!xml
<!DOCTYPE html
    PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns:py="http://markup.edgewall.org/" py:strip="">

  <!--! Add a header DIV on top of every page with a logo image -->
  <body py:match="body">
    <div id="header">
      <img src="logo.png" alt="Bad Style"/>
    </div>
    ${select('*')}
  </body>

  <!--! Use semantic instead of presentational tags for emphasis -->
  <strong py:match="B|b">${select('*|text()')}</strong>
  <em py:match="I|i">${select('*|text()')}</em>

  <!--! Include the actual HTML stream, which will be processed by the rules
        defined above -->
  ${input}

</html>
}}}
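
That template defines three match templates: one rebuilds `<body>` with the header `<div>` placed in front of its existing child elements, and the other two replace `<b>` and `<i>` (in either case) with `<strong>` and `<em>`. The `py:strip=""` attribute on the template's root element keeps that wrapper out of the output, so only the `<html>` element from the input remains. At the end, the template pulls in the actual HTML content using the `input` variable, which sends the parsed HTML stream through those rules.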
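
As an aside, the effect of those three rules can be tried out without Markup installed. The following is a minimal sketch that uses only Python's standard-library `html.parser`; it is not Markup's API, and the `Rewriter` class, the `RENAMES` and `VOID` tables, and the `rewrite` helper are names made up for this illustration. It assumes well-nested input and simply drops any DOCTYPE or comments.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical tables for this sketch (not part of Markup).
RENAMES = {"b": "strong", "i": "em"}   # presentational -> semantic
VOID = {"br", "img", "link", "hr", "meta", "input"}  # elements without end tags
HEADER = '<div id="header"><img src="logo.png" alt="Bad Style" /></div>'


class Rewriter(HTMLParser):
    """Re-emit parsed HTML as XHTML-style markup, applying the three rules."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # html.parser already lowercases tag and attribute names.
        tag = RENAMES.get(tag, tag)
        attr_text = "".join(
            ' %s="%s"' % (name, escape(name if value is None else value))
            for name, value in attrs
        )
        if tag in VOID:
            self.out.append("<%s%s />" % (tag, attr_text))
        else:
            self.out.append("<%s%s>" % (tag, attr_text))
        if tag == "body":
            # Rule 1: put the site header at the top of the body.
            self.out.append(HEADER)

    def handle_endtag(self, tag):
        tag = RENAMES.get(tag, tag)
        if tag not in VOID:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Text arrives unescaped, so escape it again for XML output.
        self.out.append(escape(data, quote=False))


def rewrite(html_text):
    """Return html_text with the header added and b/i renamed."""
    parser = Rewriter()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(rewrite("<BODY><P><B>Lorem <I>ipsum</I></B> dolor<BR>sit</P></BODY>"))
```

The template expresses the same rules declaratively, and it also keeps the DOCTYPE and handles serialization, which this sketch leaves out.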

Finally, the following script drives the transformation:

{{{
#!python
import sys
from markup.input import HTMLParser
from markup.template import Context, Template

def transform(html_filename, tmpl_filename):
    # Load and parse the template that holds the match rules.
    tmpl_fileobj = open(tmpl_filename)
    try:
        tmpl = Template(tmpl_fileobj, tmpl_filename)
    finally:
        tmpl_fileobj.close()

    # The HTML parser may read its input lazily, so keep the file open
    # until the output has been rendered.
    html_fileobj = open(html_filename)
    try:
        html = HTMLParser(html_fileobj, html_filename)
        # Expose the parsed HTML stream to the template as "input".
        print(tmpl.generate(Context(input=html)).render('xhtml'))
    finally:
        html_fileobj.close()

if __name__ == '__main__':
    # Arguments: the HTML file, then the template file.
    transform(sys.argv[1], sys.argv[2])
}}}

This produces the following output (ignoring some small whitespace differences):

{{{
#!xml
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
 <head>
  <title>Aaarrgh</title>
  <link rel="stylesheet" href="badstyle.css" />
 </head>
 <body>
  <div id="header">
   <img src="logo.png" alt="Bad Style" />
  </div>
  <h1>Aaargh</h1>
  <p>
    <strong>Lorem <em>ipsum</em></strong> dolor sit amet, consectetur<br />
    adipisicing elit, sed do eiusmod tempor incididunt ut<br />
    labore et dolore magna aliqua. Ut enim ad minim veniam,<br />
    quis nostrud exercitation ullamco laboris nisi ut<br />
    aliquip ex ea commodo consequat.
  </p><p>
    Duis aute irure dolor in reprehenderit in voluptate velit<br />
    esse cillum dolore eu fugiat nulla pariatur. Excepteur sint<br />
    occaecat cupidatat non proident, sunt in culpa qui officia<br />
    deserunt mollit anim <em>id est laborum</em>.
  </p>
 </body>
</html>
}}}
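
One way to sanity-check a result like this is to parse it as XML and look for the changes the template was meant to make. The sketch below uses the standard-library `xml.etree.ElementTree`; the `check_output` helper and the shortened `OUTPUT` sample are assumptions made for this illustration, and the checks cover well-formedness only, not validity against the XHTML DTD.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the transformed document, for illustration only.
OUTPUT = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
 <head>
  <title>Aaarrgh</title>
  <link rel="stylesheet" href="badstyle.css" />
 </head>
 <body>
  <div id="header">
   <img src="logo.png" alt="Bad Style" />
  </div>
  <h1>Aaargh</h1>
  <p>
    <strong>Lorem <em>ipsum</em></strong> dolor sit amet, consectetur<br />
    adipisicing elit.
  </p>
 </body>
</html>"""


def check_output(xhtml_text):
    """Return a list of problems; an empty list means every check passed."""
    # Raises ET.ParseError if the text is not well-formed XML.
    root = ET.fromstring(xhtml_text)
    problems = []

    # No presentational <b>/<i> elements should be left over.
    leftovers = [el.tag for el in root.iter() if el.tag.lower() in ("b", "i")]
    if leftovers:
        problems.append("presentational tags remain: %s" % leftovers)

    # The header div should be the first child of <body>.
    body = root.find("body")
    first = body[0] if body is not None and len(body) else None
    if first is None or first.get("id") != "header":
        problems.append("header div is not the first child of <body>")

    return problems


if __name__ == "__main__":
    print(check_output(OUTPUT))
```

Because the generated `<html>` element carries no namespace declaration, plain tag names such as `body` are enough for the lookups here.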

----
See also: MarkupRecipes, MarkupTemplates