genshi.input
Support for constructing markup streams from files, strings, or other sources.
ET(element)
Convert a given ElementTree element to a markup stream.
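The conversion can be pictured as a depth-first walk over the element tree. The following is a minimal stdlib-only sketch written for Python 3, not Genshi's own code: the function name `element_to_events` and the plain `(kind, data)` tuples are assumptions made for illustration, whereas a real markup stream carries `QName` and `Attrs` objects plus position information.

```python
import xml.etree.ElementTree as ElementTree


def element_to_events(element):
    """Yield (kind, data) pairs for an element and all of its descendants.

    Hypothetical helper: it only mimics the START/TEXT/END shape of a
    markup stream using plain strings and dicts.
    """
    yield "START", (element.tag, dict(element.attrib))
    if element.text:
        yield "TEXT", element.text
    for child in element:
        # Recurse into the child, then emit the text that follows it.
        yield from element_to_events(child)
        if child.tail:
            yield "TEXT", child.tail
    yield "END", element.tag


root = ElementTree.fromstring('<root id="2"><child>Foo</child>bar</root>')
events = list(element_to_events(root))
```

The tail text of each child is emitted after that child's END event, so text that sits between sibling elements keeps its document order.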
param element: an ElementTree element
return: a markup stream
ParseError
Exception raised when fatal syntax errors are found in the input being parsed.
XMLParser
Generator-based XML parser based on roughly equivalent code in Kid/ElementTree.
The parsing is initiated by iterating over the parser object:
>>> parser = XMLParser(StringIO('<root id="2"><child>Foo</child></root>'))
>>> for kind, data, pos in parser:
...     print kind, data
START (QName(u'root'), Attrs([(QName(u'id'), u'2')]))
START (QName(u'child'), Attrs())
TEXT Foo
END child
END root
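To show what "generator-based" means here, the sketch below feeds a file-like source to the standard library's expat bindings in chunks and yields queued events as they become available. It is an illustration for Python 3 under stated assumptions, not Genshi's implementation: `iter_events`, the `bufsize` parameter, and the plain-tuple event shape are all hypothetical.

```python
from io import BytesIO
from xml.parsers import expat


def iter_events(source, bufsize=4096):
    """Yield (kind, data, pos) tuples while reading `source` chunk by chunk."""
    parser = expat.ParserCreate()
    parser.buffer_text = True  # deliver adjacent character data as one TEXT event
    queue = []

    def push(kind, data):
        # pos is the (line, column) expat reports when the callback fires.
        pos = (parser.CurrentLineNumber, parser.CurrentColumnNumber)
        queue.append((kind, data, pos))

    parser.StartElementHandler = lambda tag, attrs: push("START", (tag, attrs))
    parser.EndElementHandler = lambda tag: push("END", tag)
    parser.CharacterDataHandler = lambda text: push("TEXT", text)

    done = False
    while not done:
        chunk = source.read(bufsize)
        done = not chunk
        parser.Parse(chunk, done)  # the callbacks fill the queue
        for event in queue:
            yield event
        del queue[:]


source = BytesIO(b'<root id="2"><child>Foo</child></root>')
events = list(iter_events(source))
kinds_and_data = [(kind, data) for kind, data, pos in events]
```

Because the function is a generator, no parsing happens until the caller starts iterating, and the events can be consumed only once.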
parse(self)
Generator that parses the XML source, yielding markup events.
return: a markup event stream
raises ParseError: if the XML text is not well-formed
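One way to surface fatal syntax errors is to catch the low-level parser exception and re-raise it as a dedicated error type that records where the problem occurred. The sketch below does this with the standard library's expat module on Python 3. The names `MarkupSyntaxError` and `check_well_formed` are hypothetical and chosen for this illustration; the attributes of Genshi's own ParseError are not described in this section, so none are assumed here.

```python
from xml.parsers import expat


class MarkupSyntaxError(Exception):
    """Hypothetical error type carrying the location of a syntax error."""

    def __init__(self, message, lineno, offset):
        super().__init__("%s (line %d, column %d)" % (message, lineno, offset))
        self.lineno = lineno
        self.offset = offset


def check_well_formed(text):
    """Parse `text` and raise MarkupSyntaxError if it is not well-formed."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(text, True)
    except expat.ExpatError as exc:
        # Translate the expat error code into a readable message.
        message = expat.ErrorString(exc.code)
        raise MarkupSyntaxError(message, exc.lineno, exc.offset) from exc


try:
    check_well_formed("<root><child></root>")
    error = None
except MarkupSyntaxError as exc:
    error = exc
```

Callers then need to handle only one exception type, regardless of which underlying parser detected the problem.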
XML(text)
Parse the given XML source and return a markup stream.
Unlike with XMLParser, the returned stream is reusable, meaning it can be iterated over multiple times:
>>> xml = XML('<doc><elem>Foo</elem><elem>Bar</elem></doc>')
>>> print xml
<doc><elem>Foo</elem><elem>Bar</elem></doc>
>>> print xml.select('elem')
<elem>Foo</elem><elem>Bar</elem>
>>> print xml.select('elem/text()')
FooBar
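The difference between a one-shot parser and a reusable stream comes down to whether the events are buffered. The sketch below illustrates that idea in Python 3 with a generator and a small list-backed wrapper. `BufferedStream` and `sample_events` are hypothetical names for this illustration, and buffering in a list is an assumption about how such reuse can be achieved, not a statement about Genshi's internals.

```python
class BufferedStream:
    """Hypothetical wrapper that stores events so they can be replayed."""

    def __init__(self, events):
        self.events = list(events)  # consume the source exactly once

    def __iter__(self):
        # A fresh iterator per call makes repeated iteration possible.
        return iter(self.events)


def sample_events():
    yield ("START", "doc")
    yield ("TEXT", "Foo")
    yield ("END", "doc")


one_shot = sample_events()
first_pass = list(one_shot)
second_pass = list(one_shot)  # the generator is already exhausted

buffered = BufferedStream(sample_events())
replay_a = list(buffered)
replay_b = list(buffered)
```

The trade-off is memory: a buffered stream holds every event at once, while a generator can process input of any size incrementally.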
param text: the XML source
return: the parsed XML event stream
raises ParseError: if the XML text is not well-formed
HTMLParser
Parser for HTML input based on the Python HTMLParser module.
This class provides the same interface for generating stream events as XMLParser, and attempts to automatically balance tags.
The parsing is initiated by iterating over the parser object:
>>> parser = HTMLParser(StringIO('<UL compact><LI>Foo</UL>'))
>>> for kind, data, pos in parser:
...     print kind, data
START (QName(u'ul'), Attrs([(QName(u'compact'), u'compact')]))
START (QName(u'li'), Attrs())
TEXT Foo
END li
END ul
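Tag balancing of this kind can be sketched with a stack of open elements: when an end tag arrives, every element opened after the matching start tag is closed first. The example below uses Python 3's standard `html.parser` module. The class `BalancingParser` is a hypothetical illustration rather than Genshi's code, and it does not special-case void elements such as `br`.

```python
from html.parser import HTMLParser


class BalancingParser(HTMLParser):
    """Hypothetical parser that records events and closes unclosed tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        # A minimized attribute such as `compact` has no value; reuse its name.
        fixed = [(name, name if value is None else value) for name, value in attrs]
        self.events.append(("START", (tag, fixed)))
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # ignore a stray end tag with no matching start tag
        # Close everything opened since the matching start tag.
        while self.open_tags:
            open_tag = self.open_tags.pop()
            self.events.append(("END", open_tag))
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.events.append(("TEXT", data))

    def close(self):
        super().close()
        # Close any elements still open at the end of the input.
        while self.open_tags:
            self.events.append(("END", self.open_tags.pop()))


parser = BalancingParser()
parser.feed("<UL compact><LI>Foo</UL>")
parser.close()
events = parser.events
```

With this input, the missing `</LI>` is supplied just before the `ul` element is closed, which mirrors the event order in the doctest above.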
parse(self)
Generator that parses the HTML source, yielding markup events.
return: a markup event stream
raises ParseError: if the HTML text is not well formed
handle_starttag(self, tag, attrib)
(Not documented)
handle_endtag(self, tag)
(Not documented)
handle_data(self, text)
(Not documented)
handle_charref(self, name)
(Not documented)
handle_entityref(self, name)
(Not documented)
handle_pi(self, data)
(Not documented)
handle_comment(self, text)
(Not documented)
HTML(text, encoding='utf-8')
Parse the given HTML source and return a markup stream.
Unlike with HTMLParser, the returned stream is reusable, meaning it can be iterated over multiple times:
>>> html = HTML('<body><h1>Foo</h1></body>')
>>> print html
<body><h1>Foo</h1></body>
>>> print html.select('h1')
<h1>Foo</h1>
>>> print html.select('h1/text()')
Foo
param text: the HTML source
return: the parsed XML event stream
raises ParseError: if the HTML text is not well-formed, and error recovery fails