Changes between Initial Version and Version 1 of GenshiRecipes/FilterHTMLUsingRegex

Timestamp:
09/14/10 16:24:21 (4 years ago)
Author:
anatoly techtonik <techtonik@…>
Comment:

how to filter HTML with regular expression

  • GenshiRecipes/FilterHTMLUsingRegex

Genshi's XPath support is very limited and cannot express selections such as an empty row in a table. This recipe shows how to select and remove HTML elements with regular expressions inside a Transformer.

{{{
#!python
import re

from genshi.input import HTML
from genshi.filters.transform import Transformer, StreamBuffer

html2 = HTML(u'''
<table>
 <tr><th></th><td></td><td></td></tr>
 <tr><th>not empty</th><td></td><td></td></tr>
</table>
''')

# Receives a copy of the selected events so they can be rendered to text.
buffer = StreamBuffer()

def rowfilter():  # attention: closure over `buffer`
    # Called while the stream is processed, after copy() has filled the buffer.
    # encoding=None makes render() return text rather than encoded bytes.
    text = buffer.render(encoding=None)
    # Drop every <tr> whose cells are all empty (<td/> or <td></td>).
    text = re.sub(r'(?s)<tr>(\s*<t[hd](/>|>\s*</t[hd]>))+\s*</tr>', '', text)
    #print(text)
    return HTML(text)

transtream = html2 | Transformer().select('.')\
          .copy(buffer).replace(rowfilter)
print(transtream)
}}}
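
The regular expression in `rowfilter` can be checked on its own, without Genshi. The following is a minimal sketch that applies the same pattern to a plain string; the names `EMPTY_ROW`, `strip_empty_rows` and `sample` are illustrative and not part of the recipe or of Genshi. The sample uses the self-closing form (`<th/>`) that the pattern is written to accept alongside `<th></th>`.

```python
import re

# Same pattern as in the recipe: a <tr> whose cells are all empty,
# written either as <th/> / <td/> or as <th></th> / <td></td>.
EMPTY_ROW = re.compile(r'<tr>(\s*<t[hd](/>|>\s*</t[hd]>))+\s*</tr>')

def strip_empty_rows(markup):
    """Return markup with every all-empty table row removed."""
    return EMPTY_ROW.sub('', markup)

sample = (
    '<table>\n'
    ' <tr><th/><td/><td/></tr>\n'
    ' <tr><th>not empty</th><td/><td/></tr>\n'
    '</table>'
)

cleaned = strip_empty_rows(sample)
print(cleaned)
```

Note that the pattern only matches a bare `<tr>` and bare `<th>`/`<td>` tags. A row or cell that carries attributes (for example `<tr class="odd">`) is left untouched, so the expression would need to be extended for such markup.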