Genshi Recipes: Recursive Include Scanner
Motivation
This recipe evolved after discussion of another recipe, Implicit dependencies with scons, on the IrcChannel. The solution, and the source code used in this recipe, was kindly provided by Christopher Lenz (aka cmlenz).
When working with a relatively large set of XML sources that make use of XIncludes, a common question comes in two forms:
- What files will be included by a particular source?
- What files were included by a particular source?
This recipe addresses the first form as far as possible. Implicit dependencies with scons could be used as a starting point for answering the second.
Code
scan-includes.py:
    """Recursive xincludes scanner for Genshi

    This solution was kindly provided by Christopher Lenz <cmlenz@gmx.de>

    Updated 2008-02-04 to work with changes to Genshi by
    Stephan Sokolow <http://www.ssokolow.com/ContactMe>
    """

    import os, sys
    from genshi.core import START
    from genshi.input import XMLParser

    def scan_xincludes(filepath):
        basedir, filename = os.path.split(filepath)
        includes = set([filename])
        notfound = set()
        visited = set()

        def collect(filename):
            try:
                # Binary mode; the original used 'U', which newer Python
                # versions reject.
                fileobj = open(os.path.join(basedir, filename), 'rb')
                try:
                    for kind, data, pos in XMLParser(fileobj, filename=filename):
                        if kind is START:
                            tag, attrib = data
                            if tag.namespace == 'http://www.w3.org/2001/XInclude' and tag.localname == 'include':
                                href = attrib.get('href')
                                # Skip includes without an href attribute.
                                if href is not None:
                                    includes.add(href)
                finally:
                    fileobj.close()
                visited.add(filename)
            except IOError:
                # The file could not be opened: report it as not found.
                includes.remove(filename)
                notfound.add(filename)

        # Keep scanning until every collected include has been visited.
        while len(includes) > len(visited):
            for filename in includes - visited:
                collect(filename)

        return includes, notfound

    if __name__ == '__main__':
        includes, notfound = scan_xincludes(sys.argv[1])
        for include in includes:
            print(include)
        if notfound:
            print("WARNING: the following include hrefs were not found:")
            for ref in notfound:
                print(ref)
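The traversal in the scanner above is a worklist loop: it keeps collecting until every known include has been visited, so circular includes do not cause endless scanning. The following is a minimal sketch of that bookkeeping only. It assumes an in-memory dictionary in place of real files and Genshi parsing; the `scan` function, the `GRAPH` mapping and the file names in it are hypothetical and exist only for this illustration.

```python
def scan(start, graph):
    """Collect everything reachable from start.

    graph maps a file name to the hrefs it includes. A name that is
    missing from graph models a file that cannot be opened.
    """
    includes = {start}
    visited = set()
    notfound = set()
    # Same stop condition as len(includes) > len(visited) in the recipe,
    # because visited is always a subset of includes.
    while includes - visited:
        for name in includes - visited:
            if name not in graph:
                includes.discard(name)
                notfound.add(name)
                continue
            includes.update(graph[name])
            visited.add(name)
    return includes, notfound


# Hypothetical include graph: header.xml includes site.xml again (a cycle)
# and refers to menu.xml, which does not exist.
GRAPH = {
    'site.xml': ['header.xml', 'footer.xml'],
    'header.xml': ['menu.xml', 'site.xml'],
    'footer.xml': [],
}

if __name__ == '__main__':
    found, missing = scan('site.xml', GRAPH)
    print(sorted(found))    # ['footer.xml', 'header.xml', 'site.xml']
    print(sorted(missing))  # ['menu.xml']
```

Each name is visited at most once, which is what guarantees termination even when two files include each other.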
Limitations
No consideration is given to conditional includes. All includes that refer to existing files are listed. If you make use of conditional includes, this scanner will yield false positives.
No attempt is made to handle includes that make use of dynamically generated file names. Any such references will end up in the 'notfound' set.
So this recipe can only reliably answer "What files may be included by a particular source?"
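Both limitations can be seen in a small sketch. To stay self-contained it uses the standard library's `xml.etree.ElementTree` rather than Genshi's `XMLParser`, so it is an illustration of the idea and not the recipe's code. The template text, the `static_hrefs` helper and the file names are invented for this example.

```python
import xml.etree.ElementTree as ET

XINCLUDE_TAG = '{http://www.w3.org/2001/XInclude}include'

# Hypothetical template, invented for this illustration.
TEMPLATE = """\
<html xmlns:xi="http://www.w3.org/2001/XInclude"
      xmlns:py="http://genshi.edgewall.org/">
  <xi:include href="header.xml"/>
  <xi:include py:if="debug" href="debug-panel.xml"/>
  <xi:include href="${theme}.xml"/>
</html>
"""


def static_hrefs(xml_text):
    """Return every xi:include href in document order, ignoring py:if."""
    root = ET.fromstring(xml_text)
    return [el.get('href') for el in root.iter(XINCLUDE_TAG)]


hrefs = static_hrefs(TEMPLATE)
for href in hrefs:
    print(href)
```

A static scan reports `debug-panel.xml` whether or not `debug` is true at render time, which is the false positive described above. The dynamic reference is reported as the literal text `${theme}.xml`, which will normally not exist on disk and so would land in the 'notfound' set.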
Discussion
The Genshi XML template language supports conditional includes and includes whose target file names are dynamic. The latter makes it impossible to know for certain, "before the show", which files will be included. Conditional includes that depend only on static state could in principle be resolved before the show, but doing so is far from trivial.
Integrating Genshi, or anything like it, into a build system is a typical scenario that prompts these questions. Typically you will want automatic dependency tracking and reliable but minimal rebuilds whenever any of your source files change.
For build system dependencies, the consequence of false positives is often acceptable: more sources are rebuilt than strictly necessary. And answering the second form of the question, "What files were included?", is usually sufficient to ensure that rebuilds are both minimal and correct.
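To make the build system argument concrete, here is a minimal sketch of a timestamp-based staleness check that could consume the scanner's output. It is not taken from any particular build tool. The `needs_rebuild` function is hypothetical, and skipping dependencies that do not exist is an assumption made for this sketch.

```python
import os


def needs_rebuild(target, dependencies):
    """Return True if target is missing or older than any existing dependency."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    for dep in dependencies:
        # Assumption: a dependency that does not exist (for example an
        # unresolved dynamic href) cannot make the target stale.
        if os.path.exists(dep) and os.path.getmtime(dep) > target_mtime:
            return True
    return False
```

With this check, a false positive costs at most an unnecessary rebuild when the falsely listed file changes. A missed dependency is the worse failure, because it leaves a stale target in place.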