genshi.core
Core classes for markup processing.
StreamEventKind
A kind of event on an XML stream.
Stream
Represents a stream of markup events.
This class is basically an iterator over the events.
Also provided are ways to serialize the stream to text. The serialize() method will return an iterator over generated strings, while render() returns the complete generated text at once. Both accept various parameters that impact the way the stream is serialized.
Stream events are tuples of the form:
(kind, data, position)
where kind is the event kind (such as START, END, or TEXT), data depends on the kind of event, and position is a (filename, line, offset) tuple giving the location of the original element or text in the input. If the original location is unknown, position is (None, -1, -1).
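To make the event shape concrete, the following sketch builds a few events by hand and walks over them. The kind constants are plain stand-in strings and the element data is reduced to a bare tag name; both are assumptions made for this example and are not the library's actual objects.

```python
# Stand-in event kinds; using plain strings is an assumption of this sketch.
START, END, TEXT = "START", "END", "TEXT"

# Position used when the original location is unknown.
UNKNOWN_POS = (None, -1, -1)

# Each event is a (kind, data, position) tuple.
events = [
    (START, "p", ("doc.xml", 1, 0)),
    (TEXT, "Hello", ("doc.xml", 1, 3)),
    (END, "p", UNKNOWN_POS),
]


def collect_text(stream):
    """Join the data of every TEXT event in an iterable of events."""
    return "".join(data for kind, data, pos in stream if kind == TEXT)


text = collect_text(events)
```

Because every event is an ordinary tuple, code that consumes a stream can unpack it directly in a for loop, as collect_text does.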
filter(self, *filters)
Apply filters to the stream.
This method returns a new stream with the given filters applied. Each filter must be a callable that accepts the stream object as its only parameter and returns the filtered stream.
The call:
stream.filter(filter1, filter2)
is equivalent to:
stream | filter1 | filter2
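The equivalence between filter() and the | operator can be sketched with a small stand-in class. MiniStream, upper and exclaim are hypothetical names invented for this illustration; the sketch only mirrors the behavior described above and is not the library's implementation.

```python
class MiniStream:
    """Hypothetical stand-in for a stream; wraps any iterable."""

    def __init__(self, events):
        self.events = events

    def __iter__(self):
        return iter(self.events)

    def __or__(self, function):
        # "stream | f" produces a new stream wrapping f(stream).
        return MiniStream(function(self))

    def filter(self, *filters):
        # Apply the filters left to right, like a chain of "|".
        result = self
        for function in filters:
            result = result | function
        return result


def upper(stream):
    """Example filter: upper-case every item."""
    for item in stream:
        yield item.upper()


def exclaim(stream):
    """Example filter: append an exclamation mark to every item."""
    for item in stream:
        yield item + "!"


via_filter = list(MiniStream(["a", "b"]).filter(upper, exclaim))
via_pipe = list(MiniStream(["a", "b"]) | upper | exclaim)
```

Both spellings apply upper first and exclaim second, so the two lists are identical.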
render(self, method='xml', encoding='utf-8', **kwargs)
Return a string representation of the stream.
@param method: determines how the stream is serialized; can be either "xml", "xhtml", "html", "text", or a custom serializer class
@param encoding: how the output string should be encoded; if set to None, this method returns a unicode object
Any additional keyword arguments are passed to the serializer, and thus depend on the method parameter value.
select(self, path, namespaces=None, variables=None)
Return a new stream that contains the events matching the given XPath expression.
@param path: a string containing the XPath expression
serialize(self, method='xml', **kwargs)
Generate strings corresponding to a specific serialization of the stream.
Unlike the render() method, this method is a generator that returns the serialized output incrementally, as opposed to returning a single string.
@param method: determines how the stream is serialized; can be either "xml", "xhtml", "html", "text", or a custom serializer class
Any additional keyword arguments are passed to the serializer, and thus depend on the method parameter value.
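The relationship between serialize() and render() can be pictured as a generator plus a function that joins its output. The two functions below are hypothetical stand-ins written in Python 3, where the "unicode object" mentioned above corresponds to str and an encoded string to bytes; they ignore the method parameter entirely and are only a sketch of the incremental versus all-at-once distinction.

```python
def serialize_chunks(chunks):
    """Generator: yield the output pieces one at a time."""
    for chunk in chunks:
        yield chunk


def render_chunks(chunks, encoding="utf-8"):
    """Join every piece into one string; encode it unless encoding is None."""
    text = "".join(serialize_chunks(chunks))
    return text if encoding is None else text.encode(encoding)


pieces = ["<p>", "caf\u00e9", "</p>"]
incremental = list(serialize_chunks(pieces))   # consumed piece by piece
as_text = render_chunks(pieces, encoding=None)  # one str
as_bytes = render_chunks(pieces)                # one bytes object (UTF-8)
```

The generator form is useful when output should be written as it is produced, while the joined form is convenient when the whole document is needed at once.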
Attrs
Sequence type that stores the attributes of an element.
The order of the attributes is preserved, while accessing and manipulating attributes by name is also supported.
>>> attrs = Attrs([('href', '#'), ('title', 'Foo')])
>>> attrs
[(u'href', '#'), (u'title', 'Foo')]
>>> 'href' in attrs
True
>>> 'tabindex' in attrs
False
>>> attrs.get(u'title')
'Foo'
>>> attrs.set(u'title', 'Bar')
>>> attrs
[(u'href', '#'), (u'title', 'Bar')]
>>> attrs.remove(u'title')
>>> attrs
[(u'href', '#')]
New attributes added using the set() method are appended to the end of the list:
>>> attrs.set(u'accesskey', 'k')
>>> attrs
[(u'href', '#'), (u'accesskey', 'k')]
An Attrs instance can also be initialized with keyword arguments.
>>> attrs = Attrs(class_='bar', href='#', title='Foo')
>>> attrs.get('class')
'bar'
>>> attrs.get('href')
'#'
>>> attrs.get('title')
'Foo'
Reserved words can be used by appending a trailing underscore to the name, and any other underscore is replaced by a dash:
>>> attrs = Attrs(class_='bar', accept_charset='utf-8')
>>> attrs.get('class')
'bar'
>>> attrs.get('accept-charset')
'utf-8'
This shorthand therefore cannot be used for attribute names that contain actual underscore characters.
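The keyword-name rule can be sketched as a one-line transformation. keyword_to_attr_name is a hypothetical helper written for this illustration; it follows the rule as described above (drop the trailing underscore, turn the remaining underscores into dashes) and is not taken from the library.

```python
def keyword_to_attr_name(keyword):
    """Map a Python keyword-argument name to an attribute name.

    Hypothetical helper: strips trailing underscores, then replaces
    every remaining underscore with a dash.
    """
    return keyword.rstrip("_").replace("_", "-")


names = [keyword_to_attr_name(k) for k in ("class_", "accept_charset", "href")]

# An attribute that really contains an underscore cannot be expressed:
mangled = keyword_to_attr_name("data_id")
```

The last call shows the limitation: data_id comes out as data-id, so an attribute with a literal underscore has to be passed in the list-of-pairs form instead.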
get(self, name, default=None)
Return the value of the attribute with the specified name, or the value of the default parameter if no such attribute is found.
remove(self, name)
Remove the attribute with the specified name.
If no such attribute is found, this method does nothing.
set(self, name, value)
Set the specified attribute to the given value.
If an attribute with the specified name is already in the list, the value of the existing entry is updated. Otherwise, a new attribute is appended to the end of the list.
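The get(), set() and remove() semantics described above can be modeled with a list of (name, value) pairs. MiniAttrs is a hypothetical stand-in written for this illustration, not the library's class; it only reproduces the documented ordering and lookup behavior.

```python
class MiniAttrs(list):
    """Hypothetical stand-in: an ordered list of (name, value) pairs."""

    def get(self, name, default=None):
        """Return the value for name, or default if it is absent."""
        for attr, value in self:
            if attr == name:
                return value
        return default

    def set(self, name, value):
        """Update an existing entry in place, or append a new one."""
        for idx, (attr, _) in enumerate(self):
            if attr == name:
                self[idx] = (attr, value)  # position is preserved
                return
        self.append((name, value))  # new names go to the end

    def remove(self, name):
        """Delete the entry for name; do nothing if it is absent."""
        for idx, (attr, _) in enumerate(self):
            if attr == name:
                del self[idx]
                return


attrs = MiniAttrs([("href", "#"), ("title", "Foo")])
attrs.set("title", "Bar")     # existing entry: updated where it stands
attrs.set("accesskey", "k")   # new entry: appended
attrs.remove("tabindex")      # absent entry: silently ignored
fallback = attrs.get("lang", "n/a")
```

Keeping the pairs in a list rather than a dictionary is what preserves the original attribute order.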
totuple(self)
Return the attributes as a markup event.
The returned event is a TEXT event whose data is the values of all attributes joined together.
plaintext(text, keeplinebreaks=True)
Returns the text as a unicode string with all entities and tags removed.
stripentities(text, keepxmlentities=False)
Return a copy of the given text with any character or numeric entities replaced by the equivalent Unicode characters.
If the keepxmlentities parameter is provided and evaluates to True, the core XML entities (&amp;, &apos;, &gt;, &lt; and &quot;) are not stripped.
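A rough sketch of this behavior in Python 3, using only the standard library, is shown below. strip_entities and the regular expression are assumptions written for this illustration and make no claim to match the library's implementation in edge cases (for example, numeric references to the core XML characters are always replaced here).

```python
import re
from html.entities import name2codepoint

# HTML 4 entity names plus "apos", which XML predefines but HTML 4 does not.
CODEPOINTS = dict(name2codepoint, apos=39)
XML_ENTITIES = {"amp", "apos", "gt", "lt", "quot"}

# Matches &#233; / &#xE9; (group 1) or &name; (group 2).
ENTITY_RE = re.compile(r"&(?:#(\d+|[xX][0-9a-fA-F]+)|(\w+));")


def strip_entities(text, keepxmlentities=False):
    """Replace entity references in text with the characters they name."""

    def replace(match):
        number, name = match.groups()
        if number is not None:
            if number[0] in "xX":
                return chr(int(number[1:], 16))
            return chr(int(number))
        if keepxmlentities and name in XML_ENTITIES:
            return match.group(0)  # leave the core XML entity as written
        if name in CODEPOINTS:
            return chr(CODEPOINTS[name])
        return match.group(0)  # unknown entity: leave untouched

    return ENTITY_RE.sub(replace, text)


plain = strip_entities("1 &lt; 2 &hellip; caf&#233; &#xE9;")
kept = strip_entities("1 &lt; 2 &hellip;", keepxmlentities=True)
```

With keepxmlentities set, only the non-XML entity (&hellip;) is replaced, which keeps the result safe to embed in markup again.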
striptags(text)
Return a copy of the text with all XML/HTML tags removed.
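Tag removal of this kind is commonly done with a regular expression. The sketch below is a simplified assumption (it treats anything between < and > as a tag and does not handle comments or CDATA specially), and strip_tags is a hypothetical name used only here.

```python
import re

# Anything from "<" up to the next ">" is treated as a tag (simplification).
TAG_RE = re.compile(r"<[^>]*>")


def strip_tags(text):
    """Return text with every tag-like span removed."""
    return TAG_RE.sub("", text)


stripped = strip_tags('<a href="#">link</a> and <b>bold</b>')
```

Entities are left alone by this step, so it is typically combined with entity stripping when plain text is wanted.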
Markup
Marks a string as being safe for inclusion in HTML/XML output without needing to be escaped.
join(self, seq, escape_quotes=True)
(Not documented)
escape(cls, text, quotes=True)
Create a Markup instance from a string and escape special characters it may contain (<, >, & and ").
If the quotes parameter is set to False, the " character is left as is. Escaping quotes is generally only required for strings that are to be used in attribute values.
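The effect of the quotes parameter can be illustrated with a plain function. escape_text is a hypothetical helper, and the exact replacement strings (in particular &quot; for the double quote) are assumptions of this sketch rather than a statement about what the library emits.

```python
def escape_text(text, quotes=True):
    """Escape <, >, & and, optionally, the double quote character."""
    # "&" must be replaced first so the other replacements are not re-escaped.
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;").replace(">", "&gt;")
    if quotes:
        text = text.replace('"', "&quot;")
    return text


for_attribute = escape_text('say "x < y" & go')
for_body = escape_text('say "x < y" & go', quotes=False)
```

The first result is safe inside a double-quoted attribute value; the second is sufficient for element content, where a bare quote is harmless.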
unescape(self)
Reverse-escapes &, <, > and " and returns a unicode object.
stripentities(self, keepxmlentities=False)
Return a copy of the text with any character or numeric entities replaced by the equivalent Unicode characters.
If the keepxmlentities parameter is provided and evaluates to True, the core XML entities (&amp;, &apos;, &gt;, &lt; and &quot;) are not stripped.
striptags(self)
Return a copy of the text with all XML/HTML tags removed.
unescape(text)
Reverse-escapes &, <, > and " and returns a unicode object.
Namespace
Utility class for creating and testing elements with a namespace.
Internally, namespace URIs are encoded in the QName of any element or attribute, the namespace URI being enclosed in curly braces. This class helps create and test these strings.
A Namespace object is instantiated with the namespace URI.
>>> html = Namespace('http://www.w3.org/1999/xhtml')
>>> html
<Namespace "http://www.w3.org/1999/xhtml">
>>> html.uri
u'http://www.w3.org/1999/xhtml'
The Namespace object can then be used to generate QName objects with that namespace:
>>> html.body
u'{http://www.w3.org/1999/xhtml}body'
>>> html.body.localname
u'body'
>>> html.body.namespace
u'http://www.w3.org/1999/xhtml'
The same works using item access notation, which is useful for element or attribute names that are not valid Python identifiers:
>>> html['body']
u'{http://www.w3.org/1999/xhtml}body'
A Namespace object can also be used to test whether a specific QName belongs to that namespace using the in operator:
>>> qname = html.body
>>> qname in html
True
>>> qname in Namespace('http://www.w3.org/2002/06/xhtml2')
False
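The three access styles above (attribute access, item access and the in operator) map onto three special methods. MiniNamespace is a hypothetical stand-in that returns plain strings instead of QName objects; it is a sketch of the idea under that simplification, not the library's class.

```python
class MiniNamespace:
    """Hypothetical stand-in that builds '{uri}localname' strings."""

    def __init__(self, uri):
        self.uri = uri

    def __getitem__(self, name):
        # ns['accept-charset'] works for names that are not identifiers.
        return "{%s}%s" % (self.uri, name)

    def __getattr__(self, name):
        # Only called for attributes not found normally, e.g. ns.body.
        if name.startswith("__"):
            raise AttributeError(name)
        return self[name]

    def __contains__(self, qname):
        # A name belongs to the namespace if it carries the same URI.
        return qname.startswith("{%s}" % self.uri)


html = MiniNamespace("http://www.w3.org/1999/xhtml")
body = html.body
charset = html["accept-charset"]
in_html = body in html
in_xhtml2 = body in MiniNamespace("http://www.w3.org/2002/06/xhtml2")
```

Item access exists because a name such as accept-charset cannot be written as a Python attribute.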
QName
A qualified element or attribute name.
The unicode value of instances of this class contains the qualified name of the element or attribute, in the form {namespace}localname. The namespace URI can be obtained through the additional namespace attribute, while the local name can be accessed through the localname attribute.
>>> qname = QName('foo')
>>> qname
u'foo'
>>> qname.localname
u'foo'
>>> qname.namespace
>>> qname = QName('http://www.w3.org/1999/xhtml}body')
>>> qname
u'{http://www.w3.org/1999/xhtml}body'
>>> qname.localname
u'body'
>>> qname.namespace
u'http://www.w3.org/1999/xhtml'
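The idea of a string that also carries namespace and localname attributes can be sketched as a str subclass in Python 3. MiniQName is a hypothetical class written for this illustration; like the example above it tolerates a missing leading brace, but it is not the library's implementation.

```python
class MiniQName(str):
    """Hypothetical stand-in: a str with namespace and localname attributes."""

    def __new__(cls, qname):
        # Accept both '{uri}local' and 'uri}local'.
        parts = qname.lstrip("{").split("}", 1)
        if len(parts) == 2:
            namespace, localname = parts
            self = super().__new__(cls, "{%s}%s" % (namespace, localname))
        else:
            namespace, localname = None, qname
            self = super().__new__(cls, qname)
        self.namespace = namespace
        self.localname = localname
        return self


plain = MiniQName("foo")
qualified = MiniQName("{http://www.w3.org/1999/xhtml}body")
no_brace = MiniQName("http://www.w3.org/1999/xhtml}body")
```

Because the instance is still a string, it compares equal to the same text written as an ordinary string literal.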