Edgewall Software

Version 5 (modified by cmlenz, 8 years ago)

genshi.input

Support for constructing markup streams from files, strings, or other sources.

ET(element)

Convert a given ElementTree element to a markup stream.

param element: an ElementTree element
return: a markup stream
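
The conversion can be pictured as a depth-first walk over the element tree. The following is a minimal sketch using only the standard library, not Genshi's own code: the helper name element_to_events is hypothetical, plain strings and dicts stand in for Genshi's QName and Attrs objects, and the position component of each event is left out.

```python
import xml.etree.ElementTree as etree


def element_to_events(element):
    """Yield (kind, data) events for an element and its subtree (hypothetical helper)."""
    yield ("START", (element.tag, dict(element.attrib)))
    if element.text:
        yield ("TEXT", element.text)
    for child in element:
        # Recurse so that nested elements are emitted in document order.
        yield from element_to_events(child)
    yield ("END", element.tag)
    if element.tail:
        # Text that follows the closing tag belongs to the parent's content.
        yield ("TEXT", element.tail)


root = etree.fromstring('<root id="2"><child>Foo</child> tail</root>')
events = list(element_to_events(root))
for kind, data in events:
    print(kind, data)
```

Emitting the tail text after the END event keeps the events in document order, since ElementTree stores trailing text on the preceding element rather than on its parent.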

ParseError

Exception raised when fatal syntax errors are found in the input being parsed.

XMLParser

Generator-based XML parser, adapted from roughly equivalent code in Kid/ElementTree.

The parsing is initiated by iterating over the parser object:

>>> parser = XMLParser(StringIO('<root id="2"><child>Foo</child></root>'))
>>> for kind, data, pos in parser:
...     print('%s %s' % (kind, data))
START (QName('root'), Attrs([(QName('id'), u'2')]))
START (QName('child'), Attrs())
TEXT Foo
END child
END root

parse(self)

Generator that parses the XML source, yielding markup events.

return: a markup event stream
raises ParseError: if the XML text is not well-formed
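
To illustrate how a generator-based parser can produce (kind, data, pos) events and report malformed input, here is a rough sketch built on the standard library's expat bindings. It is an assumption-laden illustration rather than Genshi's implementation: iter_xml_events and the local ParseError class are hypothetical names, and element names and attributes are plain strings and dicts instead of QName and Attrs.

```python
from io import StringIO
from xml.parsers import expat


class ParseError(Exception):
    """Local stand-in for the ParseError described above."""


def iter_xml_events(source, bufsize=4096):
    """Yield (kind, data, pos) tuples while reading `source` in chunks."""
    parser = expat.ParserCreate()
    parser.buffer_text = True  # deliver adjacent character data as one TEXT event
    queue = []

    def pos():
        return (parser.CurrentLineNumber, parser.CurrentColumnNumber)

    def on_start(name, attrs):
        queue.append(("START", (name, attrs), pos()))

    def on_end(name):
        queue.append(("END", name, pos()))

    def on_text(text):
        queue.append(("TEXT", text, pos()))

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.CharacterDataHandler = on_text

    try:
        while True:
            chunk = source.read(bufsize)
            done = not chunk
            parser.Parse(chunk, done)
            # Hand over whatever the callbacks collected for this chunk.
            yield from queue
            del queue[:]
            if done:
                break
    except expat.ExpatError as exc:
        # Translate the low-level error into the parser's own exception type.
        raise ParseError(str(exc)) from exc


for kind, data, pos in iter_xml_events(StringIO('<root id="2"><child>Foo</child></root>')):
    print(kind, data)
```

Because the function is a generator, no input is read until iteration begins, and a syntax error surfaces as ParseError at the point in the iteration where the offending chunk is parsed.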

XML(text)

Parse the given XML source and return a markup stream.

Unlike with XMLParser, the returned stream is reusable, meaning it can be iterated over multiple times:

>>> xml = XML('<doc><elem>Foo</elem><elem>Bar</elem></doc>')
>>> print(xml)
<doc><elem>Foo</elem><elem>Bar</elem></doc>
>>> print(xml.select('elem'))
<elem>Foo</elem><elem>Bar</elem>
>>> print(xml.select('elem/text()'))
FooBar

param text: the XML source
return: the parsed XML event stream
raises ParseError: if the XML text is not well-formed
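
The difference between a one-shot parser and a reusable stream comes down to whether the events are buffered. The sketch below uses only plain Python and assumes nothing about Genshi's internals: sample_events and BufferedStream are hypothetical names standing in for a parser and a stream.

```python
def sample_events():
    """Stand-in for a one-shot parser: a generator of (kind, data) events."""
    yield ("START", "doc")
    yield ("TEXT", "Foo")
    yield ("END", "doc")


class BufferedStream:
    """Hypothetical reusable stream that stores its events in a list."""

    def __init__(self, events):
        self._events = list(events)  # consume the one-shot source exactly once

    def __iter__(self):
        # Each call returns a fresh iterator over the stored events.
        return iter(self._events)


one_shot = sample_events()
first_pass = list(one_shot)
second_pass = list(one_shot)  # the generator is already exhausted

buffered = BufferedStream(sample_events())
replay_one = list(buffered)
replay_two = list(buffered)
```

Buffering trades memory for convenience: the whole document is parsed up front, so any ParseError would be raised when the stream is created rather than partway through iteration.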

HTMLParser

Parser for HTML input based on the Python HTMLParser module.

This class provides the same interface for generating stream events as XMLParser, and attempts to automatically balance tags.

The parsing is initiated by iterating over the parser object:

>>> parser = HTMLParser(BytesIO(u'<UL compact><LI>Foo</UL>'.encode('utf-8')), encoding='utf-8')
>>> for kind, data, pos in parser:
...     print('%s %s' % (kind, data))
START (QName('ul'), Attrs([(QName('compact'), u'compact')]))
START (QName('li'), Attrs())
TEXT Foo
END li
END ul

parse(self)

Generator that parses the HTML source, yielding markup events.

return: a markup event stream
raises ParseError: if the HTML text is not well-formed
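
Tag balancing of the kind described above can be sketched with the standard library's html.parser module and a stack of open elements. This is an illustration under stated assumptions, not Genshi's code: BalancingParser, parse_html and the abbreviated VOID set are hypothetical, and events are plain tuples without position information.

```python
from html.parser import HTMLParser

# Abbreviated, illustrative list of elements that never have an end tag.
VOID = {"br", "hr", "img", "input", "link", "meta"}


class BalancingParser(HTMLParser):
    """Collect START/TEXT/END events and close tags the input left open."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        # A valueless attribute such as `compact` gets its own name as value.
        fixed = [(name, name if value is None else value) for name, value in attrs]
        self.events.append(("START", (tag, fixed)))
        if tag in VOID:
            self.events.append(("END", tag))
        else:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # ignore a stray end tag with no matching start
        # Close every element opened since the matching start tag.
        while self.open_tags:
            open_tag = self.open_tags.pop()
            self.events.append(("END", open_tag))
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.events.append(("TEXT", data))

    def close(self):
        super().close()
        # Close anything still open at the end of the input.
        while self.open_tags:
            self.events.append(("END", self.open_tags.pop()))


def parse_html(text):
    parser = BalancingParser()
    parser.feed(text)
    parser.close()
    return parser.events


for kind, data in parse_html("<UL compact><LI>Foo</UL>"):
    print(kind, data)
```

The stack makes the balancing rule explicit: an end tag closes its own element and any elements opened after it, which is why the unclosed LI in the sample input still receives an END event before the UL does.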

handle_starttag(self, tag, attrib)

(Not documented)

handle_endtag(self, tag)

(Not documented)

handle_data(self, text)

(Not documented)

handle_charref(self, name)

(Not documented)

handle_entityref(self, name)

(Not documented)

handle_pi(self, data)

(Not documented)

handle_comment(self, text)

(Not documented)

HTML(text, encoding=None)

Parse the given HTML source and return a markup stream.

Unlike with HTMLParser, the returned stream is reusable, meaning it can be iterated over multiple times:

>>> html = HTML('<body><h1>Foo</h1></body>', encoding='utf-8')
>>> print(html)
<body><h1>Foo</h1></body>
>>> print(html.select('h1'))
<h1>Foo</h1>
>>> print(html.select('h1/text()'))
Foo

param text: the HTML source
return: the parsed markup event stream
raises ParseError: if the HTML text is not well-formed and error recovery fails

