= [MarkupRecipes Markup Recipes]: Transforming HTML documents =

Although [MarkupTemplates Markup templates] must be well-formed XML files, you can still use Markup to transform “old-school” HTML documents. Markup can parse HTML input and apply ''match templates'' to it, which lets you make arbitrary modifications such as adding site-specific chrome.

Let's say you have the following HTML document (maybe produced by some application or component out of your control), and you'd like to integrate it into your site:

{{{
#!xml
<HTML>
<HEAD>
<TITLE>Aaarrgh</TITLE>
<LINK REL=stylesheet href='badstyle.css'>
</HEAD>

<BODY>
<H1>Aaargh</H1>
<P>
<B>Lorem <I>ipsum</I></B> dolor sit amet, consectetur<BR>
adipisicing elit, sed do eiusmod tempor incididunt ut<BR>
labore et dolore magna aliqua. Ut enim ad minim veniam,<BR>
quis nostrud exercitation ullamco laboris nisi ut<BR>
aliquip ex ea commodo consequat.
</P>
<P>
Duis aute irure dolor in reprehenderit in voluptate velit<BR>
esse cillum dolore eu fugiat nulla pariatur. Excepteur sint<BR>
occaecat cupidatat non proident, sunt in culpa qui officia<BR>
deserunt mollit anim <I>id est laborum</I>.
</P>
</BODY>

</HTML>
}}}

What you'd like to do is:
 * Turn it into valid XHTML, with a proper DOCTYPE to trigger standards rendering mode in browsers.
 * Use “semantic” tags such as `<em>` and `<strong>` instead of the more presentational `<i>` and `<b>` (whether or not that's really a good idea).
 * Add a new `<div id="header">` at the top of the page that contains your site logo.

To do that, start with the following template:

{{{
#!xml
<!DOCTYPE html
    PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns:py="http://markup.edgewall.org/" py:strip="">

  <!--! Add a header DIV on top of every page with a logo image -->
  <body py:match="body">
    <div id="header">
      <img src="logo.png" alt="Bad Style"/>
    </div>
    ${select('*')}
  </body>

  <!--! Use semantic instead of presentational tags for emphasis -->
  <strong py:match="B|b">${select('*|text()')}</strong>
  <em py:match="I|i">${select('*|text()')}</em>

  <!--! Include the actual HTML stream, which will be processed by the rules
        defined above -->
  ${input}

</html>
}}}
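
That template defines three match templates that do what we need: one wraps the content of `<body>` and puts the header `<div>` in front of it, and the other two replace `<b>` with `<strong>` and `<i>` with `<em>`. At the end, it pulls in the actual HTML content using the `input` variable, which the driver script supplies through the template context.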
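
To make the effect of the two emphasis rules concrete, here is a minimal sketch that performs the same renaming with nothing but the Python 3 standard library. It does not use Markup and is not how Markup implements match templates; the names `RENAMES`, `VOID`, `Renamer` and `rename_tags` are made up for this illustration, and the list of empty elements is an assumption that covers only what the sample document needs.

{{{
#!python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# Hypothetical rename table mirroring the two emphasis match templates.
RENAMES = {"b": "strong", "i": "em"}
# Assumed set of empty elements, written out as <name /> in the result.
VOID = {"br", "hr", "img", "link", "meta"}

class Renamer(HTMLParser):
    """Re-emit HTML as lower-case, XHTML-style markup with renamed tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # html.parser already lower-cases tag and attribute names.
        name = RENAMES.get(tag, tag)
        rendered = "".join(
            " %s=%s" % (key, quoteattr(value if value is not None else key))
            for key, value in attrs
        )
        if name in VOID:
            self.out.append("<%s%s />" % (name, rendered))
        else:
            self.out.append("<%s%s>" % (name, rendered))

    def handle_endtag(self, tag):
        name = RENAMES.get(tag, tag)
        if name not in VOID:
            self.out.append("</%s>" % name)

    def handle_data(self, data):
        # Character references were decoded by the parser; escape them again.
        self.out.append(escape(data))

def rename_tags(html):
    parser = Renamer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)

print(rename_tags("<P><B>Lorem <I>ipsum</I></B> dolor<BR>sit</P>"))
# prints: <p><strong>Lorem <em>ipsum</em></strong> dolor<br />sit</p>
}}}

Unlike this sketch, the match templates are declarative: adding another rule to the template means adding one more element with a `py:match` attribute rather than editing parser callbacks.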

Finally, the following script drives the transformation. It takes the HTML file and the template file as command-line arguments and prints the result to standard output:

{{{
#!python
import sys

from markup.input import HTMLParser
from markup.template import Context, Template

def transform(html_filename, tmpl_filename):
    # Load the template; as in the original recipe, the file is closed
    # right after the Template object has been created.
    tmpl_fileobj = open(tmpl_filename)
    try:
        tmpl = Template(tmpl_fileobj, tmpl_filename)
    finally:
        tmpl_fileobj.close()

    # Keep the HTML file open until rendering is done, in case the parser
    # reads from it lazily while the output stream is being generated.
    html_fileobj = open(html_filename)
    try:
        html = HTMLParser(html_fileobj, html_filename)
        # Expose the parsed HTML stream to the template as "input".
        print tmpl.generate(Context(input=html)).render('xhtml')
    finally:
        html_fileobj.close()

if __name__ == '__main__':
    transform(sys.argv[1], sys.argv[2])
}}}

Run against the HTML document and template shown above, the script produces the following output (ignoring some small whitespace differences):

{{{
#!xml
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<title>Aaarrgh</title>
<link rel="stylesheet" href="badstyle.css" />
</head>
<body>
<div id="header">
<img src="logo.png" alt="Bad Style" />
</div>
<h1>Aaargh</h1>
<p>
<strong>Lorem <em>ipsum</em></strong> dolor sit amet, consectetur<br />
adipisicing elit, sed do eiusmod tempor incididunt ut<br />
labore et dolore magna aliqua. Ut enim ad minim veniam,<br />
quis nostrud exercitation ullamco laboris nisi ut<br />
aliquip ex ea commodo consequat.
</p><p>
Duis aute irure dolor in reprehenderit in voluptate velit<br />
esse cillum dolore eu fugiat nulla pariatur. Excepteur sint<br />
occaecat cupidatat non proident, sunt in culpa qui officia<br />
deserunt mollit anim <em>id est laborum</em>.
</p>
</body>
</html>
}}}
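
One way to sanity-check output like this is to confirm that it parses as XML and that the presentational tags are gone. The following is a small sketch using only the Python 3 standard library; `check_xhtml` and `sample` are hypothetical names chosen for this example, and the check covers well-formedness only, not validity against the XHTML DTD.

{{{
#!python
import xml.etree.ElementTree as ET

def check_xhtml(text):
    """Return the set of element names found in text.

    Raises xml.etree.ElementTree.ParseError if text is not well-formed XML.
    """
    root = ET.fromstring(text)
    return {element.tag for element in root.iter()}

# Abbreviated stand-in for the transformed document.
sample = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
    '<html><body><p><strong>Lorem <em>ipsum</em></strong><br /></p></body></html>'
)

tags = check_xhtml(sample)
assert "b" not in tags and "i" not in tags
assert {"strong", "em"} <= tags
}}}

Feeding the original HTML document to `check_xhtml` raises a `ParseError`, because unclosed elements such as `<BR>` are not well-formed XML.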

----
See also: MarkupRecipes, MarkupTemplates