Edgewall Software

Genshi Recipes: Recursive Include Scanner

Motivation

This recipe evolved after discussion of another recipe, Implicit dependencies with scons, on the IrcChannel. The solution, and the source code used in this recipe, were kindly provided by Christopher Lenz (aka cmlenz).

When working with a relatively large set of XML sources that make use of XIncludes, a common question comes up in two forms:

  • What files will be included by a particular source?
  • What files were included by a particular source?

This recipe seeks to address the first form as far as possible. Implicit dependencies with scons could be used as a starting point for answering the second.

Code

scan-includes.py:

"""Recursive xincludes scanner for Genshi

This solution was kindly provided by Christopher Lenz <cmlenz@gmx.de>
Updated 2008-02-04 to work with changes to Genshi by Stephan Sokolow <http://www.ssokolow.com/ContactMe>
"""

import os, sys
from genshi.core import START
from genshi.input import XMLParser

def scan_xincludes(filepath):
    basedir, filename = os.path.split(filepath)
    includes = set([filename])
    notfound = set()
    visited = set()

    def collect(filename):
        try:
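            # Binary mode lets the XML parser handle encoding and newlines
            # (the 'U' mode flag was removed in Python 3.11).
            fileobj = open(os.path.join(basedir, filename), 'rb')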
            try:
                for kind, data, pos in XMLParser(fileobj, filename=filename):
                    if kind is START:
                        tag, attrib = data
                        if tag.namespace == 'http://www.w3.org/2001/XInclude' and tag.localname == 'include':
                            href = attrib.get('href')
                            if href:  # skip xi:include elements that have no href
                                includes.add(href)
            finally:
                fileobj.close()
            visited.add(filename)
        except IOError:
            # Missing or unreadable file: report the href instead of listing it
            includes.remove(filename)
            notfound.add(filename)

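    # Repeat until every discovered include has been visited; each pass
    # of collect() may add new entries to 'includes'.
    while len(includes) > len(visited):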
        for filename in includes - visited:
            collect(filename)

    return includes, notfound
 
 
if __name__ == '__main__':
    includes, notfound = scan_xincludes(sys.argv[1])
    for include in includes:
        print(include)
    if notfound:
        print("WARNING: the following include hrefs were not found:")
        for ref in notfound:
            print(ref)
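
The scanner above joins every href with the directory of the top-level file. For source trees where included files live in subdirectories and refer to their siblings, a variant that resolves each href against the including file may be a closer fit. What follows is a minimal sketch under stated assumptions, not part of the original recipe: it uses the standard library's ElementTree in place of Genshi's XMLParser so that it runs stand-alone, and scan_relative is a hypothetical name.

```python
import os
import xml.etree.ElementTree as ET

XINCLUDE = '{http://www.w3.org/2001/XInclude}include'

def scan_relative(filepath):
    """Collect include paths, resolving each href against its including file.

    Hypothetical helper; returns (found, notfound) sets of normalized paths.
    """
    found, notfound = set(), set()
    pending = [os.path.normpath(filepath)]
    while pending:
        path = pending.pop()
        if path in found or path in notfound:
            continue  # already handled; this also guards against include cycles
        try:
            tree = ET.parse(path)
        except IOError:
            notfound.add(path)
            continue
        found.add(path)
        for elem in tree.iter(XINCLUDE):
            href = elem.get('href')
            if href:
                # Resolve relative to the directory of the including file
                pending.append(os.path.normpath(
                    os.path.join(os.path.dirname(path), href)))
    return found, notfound
```

Unlike the recipe, this variant reports full paths rather than bare hrefs, and malformed XML still raises a parse error.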

Limitations

No consideration is given to conditional includes. All includes that refer to existing files are listed, so if you make use of conditional includes, this scanner will yield false positives.
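
One way to make these false positives visible is to report includes that sit under a conditional directive separately from the rest. The sketch below is a rough heuristic, not part of the original recipe. It uses the standard library's ElementTree rather than Genshi's XMLParser, and it treats only py:if, py:when and py:otherwise as conditional, which is an assumption: other directives can also suppress an include. The names split_includes and is_conditional are hypothetical.

```python
import xml.etree.ElementTree as ET

XI = '{http://www.w3.org/2001/XInclude}'
PY = '{http://genshi.edgewall.org/}'
# Directives treated as conditional in this sketch (an assumption, not an
# exhaustive list of what can prevent an include from happening).
CONDITIONAL = ('if', 'when', 'otherwise')

def is_conditional(elem):
    """True if elem is a conditional directive element or carries one as an attribute."""
    if elem.tag in [PY + name for name in CONDITIONAL]:
        return True
    return any(PY + name in elem.attrib for name in CONDITIONAL)

def split_includes(xml_text):
    """Return (unconditional, conditional) sets of xi:include hrefs."""
    unconditional, conditional = set(), set()

    def walk(elem, guarded):
        # An include is guarded if it or any ancestor is conditional
        guarded = guarded or is_conditional(elem)
        if elem.tag == XI + 'include' and elem.get('href'):
            (conditional if guarded else unconditional).add(elem.get('href'))
        for child in elem:
            walk(child, guarded)

    walk(ET.fromstring(xml_text), False)
    # An href that is also included unconditionally is a definite dependency
    return unconditional, conditional - unconditional
```

The conditional set still has to be treated as a possible dependency; the split only shows which entries may be false positives.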

No attempt is made to handle includes that make use of dynamically generated file names. Any such references will end up in the 'notfound' set.
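
Since dynamic references and genuinely missing files end up in the same set, it can help to tell them apart when reporting. The following is a heuristic sketch, assuming dynamic names are written with Genshi-style interpolation (${expr} or $name, with $$ as an escaped dollar sign); is_dynamic_href and split_notfound are hypothetical names, not part of the recipe.

```python
import re

# Start of an interpolation: "${" or "$" followed by an identifier character.
_INTERPOLATION = re.compile(r'\$(\{|[A-Za-z_])')

def is_dynamic_href(href):
    """Guess whether an href contains a template expression."""
    # "$$" is an escaped literal dollar sign, so drop those pairs first.
    return bool(_INTERPOLATION.search(href.replace('$$', '')))

def split_notfound(notfound):
    """Split unresolved hrefs into (dynamic, missing) sets."""
    dynamic = set(h for h in notfound if is_dynamic_href(h))
    return dynamic, set(notfound) - dynamic
```

Hrefs in the missing set are more likely to be typos or deleted files, while those in the dynamic set cannot be resolved before rendering.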

So this recipe can only reliably answer "What files may be included by a particular source?"

Discussion

The Genshi XML template language supports conditional includes and includes whose target file names are dynamic. The latter makes it impossible to know for certain, "before the show", which files will be included. Conditional includes that depend on static state could in principle be resolved before the show, but doing so is far from trivial.

Integrating Genshi, or anything like it, into a build system is a typical scenario that prompts these questions. Typically you will want automatic dependencies and reliable but minimal rebuilds whenever any of your source files change.

For build system dependencies, the consequence of false positives is often acceptable: more sources are rebuilt than strictly necessary. Answering the second form of the question, "What files were included?", is usually sufficient for ensuring rebuilds are both minimal and correct.
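
To illustrate why a false positive only costs an extra rebuild, here is a minimal sketch of a timestamp-based staleness check that could consume the scanner's output. It is an assumption-laden illustration, not how scons or any particular build tool decides; needs_rebuild is a hypothetical name.

```python
import os

def needs_rebuild(target, dependencies):
    """Return True if target is missing or older than any existing dependency."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    for dep in dependencies:
        # Missing dependencies are skipped here; a real build would likely
        # report them instead (compare the 'notfound' set in the recipe).
        if os.path.exists(dep) and os.path.getmtime(dep) > target_mtime:
            return True
    return False
```

A dependency listed by mistake can only turn a False answer into True, so the result stays correct and the build merely does more work than needed.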


See also GenshiRecipes, GenshiRecipes/SconsXIncludeScanner

Last modified on Feb 4, 2008, 7:07:58 PM