
GenshiRecipes/RecursiveIncludeScanner

Version 2 (modified by cmlenz, 8 years ago)

Motivation

This recipe grew out of a discussion on #markup about another recipe, "Implicit dependencies with scons?". The solution and the source code used in this recipe were kindly provided by Christopher Lenz (aka cmlenz).

When working with a relatively large set of XML sources that make use of XIncludes, a common question comes up in two forms:

  • What files will be included by a particular source?
  • What files were included by a particular source?

This recipe addresses the first form as far as is possible. The "Implicit dependencies with scons?" recipe could be used as a starting point for answering the second.

Code

scan-includes.py:

"""Recursive xincludes scanner for Markup

This solution was kindly provided by Christopher Lenz <cmlenz@gmx.de>
"""

import os
import sys
from markup.core import START
from markup.filters import IncludeFilter
from markup.input import XMLParser
 
def scan_xincludes(filename):
    basedir, filename = os.path.split(filename)
    namespace = IncludeFilter.NAMESPACE
    includes = set([filename])
    notfound = set()
    visited = set()
 
    def collect(filename):
        try:
            fileobj = open(os.path.join(basedir, filename), 'U')
            try:
                for kind, data, pos in XMLParser(fileobj, filename=filename):
                    if kind is START:
                        tag, attrib = data
                        if tag in namespace and tag.localname == 'include':
                            href = attrib.get('href')
                            if href:  # skip includes that carry no href attribute
                                includes.add(href)
            finally:
                fileobj.close()
            visited.add(filename)
        except IOError:
            includes.remove(filename)
            notfound.add(filename)

 
    while len(includes) > len(visited):
        for filename in includes - visited:
            collect(filename)
 
    return includes,notfound
 
 
if __name__ == '__main__':
    includes,notfound = scan_xincludes(sys.argv[1])
    for include in includes:
        print include
    if notfound:
        print "WARNING: the following include hrefs were not found:"
        for ref in notfound:
            print ref
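
For readers who want to try the traversal without Markup installed, the same worklist idea can be sketched with the standard library alone. This is only an illustration under stated assumptions: it swaps Markup's XMLParser for xml.etree.ElementTree, it assumes the includes use the standard XInclude namespace (which IncludeFilter.NAMESPACE is assumed to match), and scan_xincludes_stdlib is a hypothetical name. Like the recipe above, it resolves every href against the directory of the top-level file.

```python
import os
import xml.etree.ElementTree as ET

# Standard XInclude namespace; assumed to match IncludeFilter.NAMESPACE.
XINCLUDE_TAG = '{http://www.w3.org/2001/XInclude}include'


def scan_xincludes_stdlib(path):
    """Return (found, notfound) sets of file names reachable from path."""
    basedir, filename = os.path.split(path)
    pending = [filename]          # worklist of names still to be read
    found, notfound = set(), set()
    while pending:
        name = pending.pop()
        if name in found or name in notfound:
            continue              # already handled; this also breaks cycles
        try:
            tree = ET.parse(os.path.join(basedir, name))
        except (IOError, OSError):
            notfound.add(name)    # unreadable or missing file
            continue
        found.add(name)
        for elem in tree.iter(XINCLUDE_TAG):
            href = elem.get('href')
            if href:              # ignore includes without an href
                pending.append(href)
    return found, notfound
```

Malformed XML is not handled here; ElementTree will raise a ParseError in that case.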

Limitations

No consideration is given to conditional includes: every include that refers to an existing file is listed, whether or not its condition would hold. If you make use of conditional includes, this scanner will yield false positives.

No attempt is made to handle includes that make use of dynamically generated file names. Any such references will end up in the 'notfound' set.

So this recipe can only reliably answer "What files may be included by a particular source?"
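
If the noise from dynamic names in the 'notfound' set is a problem, one option is to set such hrefs aside before trying to open them. The following is a minimal sketch, not part of Markup: split_hrefs is a hypothetical helper, and it assumes that a dynamic href can be recognised by the "$" that introduces an expression in a template attribute, with "$$" treated as an escaped literal dollar sign.

```python
def split_hrefs(hrefs):
    """Partition hrefs into (static, dynamic) sets."""
    static, dynamic = set(), set()
    for href in hrefs:
        # Assumption: "$$" is an escaped dollar sign, and any other "$"
        # starts a template expression.
        if '$' in href.replace('$$', ''):
            dynamic.add(href)
        else:
            static.add(href)
    return static, dynamic


static, dynamic = split_hrefs(
    ['layout.html', '${theme}/header.html', 'price$$.html'])
print(sorted(static))   # ['layout.html', 'price$$.html']
print(sorted(dynamic))  # ['${theme}/header.html']
```

The static set can then be scanned as before, while the dynamic set can be reported separately from files that are genuinely missing.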

Discussion

Markup syntax supports conditional includes and includes whose target file names are dynamic. The latter makes it impossible to know for certain, "before the show", which files will be included. Conditional includes that depend only on static state could in principle be resolved before the show, but doing so is far from trivial.

Integrating Markup, or anything like it, into a build system is a typical scenario that prompts these questions. You will usually want automatic dependencies and reliable, but minimal, rebuilds whenever any of your source files change.

For build system dependencies, the consequence of false positives is often acceptable: more sources are rebuilt than strictly necessary. Answering the second form of the question, "What files were included?", is usually sufficient to ensure that rebuilds are both minimal and correct.
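
To illustrate why false positives are tolerable, here is a minimal sketch of a timestamp-based rebuild check. It is not taken from Markup or scons: needs_rebuild is a hypothetical helper, and it assumes the dependency list comes from something like scan_xincludes above. An extra entry in the list can only trigger an additional rebuild; it can never hide a change.

```python
import os


def needs_rebuild(target, dependencies):
    """Return True if target is missing or older than any dependency."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    for dep in dependencies:
        # Assumption: a dependency that does not exist on disk cannot
        # have changed, so it is skipped rather than treated as an error.
        if os.path.exists(dep) and os.path.getmtime(dep) > target_mtime:
            return True
    return False
```

A real build tool would typically compare content signatures rather than modification times, but the trade-off is the same.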