
Opened 16 years ago

Closed 16 years ago

Last modified 14 years ago

#190 closed defect (fixed)

Memory leak when Python code in templates raises an exception

Reported by:  cboos                   Owned by:   cmlenz
Priority:     critical                Milestone:  0.5
Component:    Expression evaluation   Version:    0.4.4
Keywords:     memory                  Cc:

Description

This is a follow-up to some of the memory issues we've seen in Trac (#T6614). athomas was able to create a sample program that reproduces the issue:

import gc
from itertools import groupby
from genshi.template import MarkupTemplate
from genshi.template.eval import UndefinedError


m = MarkupTemplate("""
<html xmlns:py="http://genshi.edgewall.org/">
  <body>
    <py:for each="i, j in groupby(test, key=lambda i: len(i))">
      ${say.hello}
    </py:for>
  </body>
</html>
""", lookup='lenient')

while True:
    test = dict.fromkeys(map(str, range(1000, 100000)))
    try:
        print m.generate(test=test, groupby=groupby)
    except UndefinedError:
        pass
    print gc.collect()
    print len(gc.get_objects())

Running it shows something like:

$ python25 aat2.py
354
12926
0
12949
0
12972
0
12995
0
13018
0
13041
0
13064
0
...

And it won't stop growing. This is with the latest Genshi (r800).

Now, the interesting thing is that when we revert r770, the problem seems to disappear:

$ python25 aat2.py
354
12924
16
12924
16
12924
16
12924
16
12924
16
12924
16
12924
16
12924
16
...

and memory usage is constant as well.

To me, these results clearly indicate that the following eval holds a hidden reference to the data when an exception is raised.

    def evaluate(self, data):
        """Evaluate the expression against the given data dictionary.
        
        :param data: a mapping containing the data to evaluate against
        :return: the result of the evaluation
        """
        __traceback_hide__ = 'before_and_this'
        _globals = self._globals()
        _globals['__data__'] = data
        return eval(self.code, _globals, {'__data__': data})

(see source:trunk/genshi/template/eval.py@800#L135)

When removing the locals from that last line:

  • genshi/template/eval.py

             __traceback_hide__ = 'before_and_this'
             _globals = self._globals()
             _globals['__data__'] = data
    -        return eval(self.code, _globals, {'__data__': data})
    +        return eval(self.code, _globals)


     class Suite(Code):

the problem persists, so that reference seems to be related to the globals (and, by the way, this makes me think of the blog entry "How to Add Memory Leaks to Python", for which I never saw an explanation; maybe it's the same underlying bug).
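One mechanism that can produce exactly this symptom is the exception traceback keeping the eval frame, and hence the globals mapping containing __data__, alive. Here is a minimal Python 3 sketch of that effect (the ticket targets Python 2, where the details differ, e.g. sys.exc_info() retaining the last traceback; Payload and retained_via_traceback are invented names for illustration):

```python
import gc
import weakref


class Payload(object):
    """Stand-in for a large template data dictionary."""


def retained_via_traceback():
    payload = Payload()
    ref = weakref.ref(payload)
    saved_tb = None
    try:
        # The evaluated code raises; its frame's f_globals is the dict
        # we passed in, which references our payload.
        eval("no_such_name", {"__data__": payload})
    except NameError as exc:
        saved_tb = exc.__traceback__  # holding the traceback holds the frame
    del payload
    return ref, saved_tb


ref, tb = retained_via_traceback()
print(ref() is not None)  # True: payload survives only through the traceback
del tb
gc.collect()              # break the frame <-> traceback reference cycle
print(ref() is None)      # True: dropping the traceback frees the payload
```

Genshi's evaluate() passes the data through the globals mapping in just this way, so anything that keeps the traceback around (exception-reporting middleware, or Python 2's sys.exc_info() behavior) keeps the whole data dictionary alive with it.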

Attachments (11)

clear-data-r803.patch (4.5 KB) - added by cboos 16 years ago.
Workaround the bug by clearing the Context
exprleak.py (729 bytes) - added by cmlenz 16 years ago.
Stripped down test case
noastplay.diff (2.5 KB) - added by cmlenz 16 years ago.
Remove AST stuff
exprleak2.py (676 bytes) - added by cboos 16 years ago.
Any exception would do, so no need for expr2 and no need to push a new frame.
no__data__.diff (3.5 KB) - added by cmlenz 16 years ago.
Another workaround, cleaner but as of yet incomplete
match_unbuffered.diff (1.4 KB) - added by cmlenz 16 years ago.
Unbuffered match templates
match_unbuffered3.diff (18.7 KB) - added by cmlenz 16 years ago.
Unbuffered match templates, almost complete this time
match_unbuffered4.diff (18.2 KB) - added by cmlenz 16 years ago.
Improved patch
match_unbuffered5.diff (19.2 KB) - added by cmlenz 16 years ago.
And a couple more fixes
match_unbuffered6.diff (20.2 KB) - added by cmlenz 16 years ago.
More improvements
trac_buffer_false.diff (1.5 KB) - added by cmlenz 16 years ago.
Turn off match template buffering in Trac where possible


Change History (46)

Changed 16 years ago by cboos

Workaround the bug by clearing the Context

comment:1 Changed 16 years ago by cboos

The details of the changes are inside the patch, but as this Trac is not yet running Trac 0.11, I'll inline them here:


Workaround a memory leak happening when an exception is raised while rendering a template.

More specifically, when an exception is raised while evaluating Python code located in expressions or code blocks inside a template, there are some situations where the data dictionary placed in the globals() used for the evaluation will be kept alive. This will have particularly severe consequences if that data dictionary itself references lots of objects, which is not uncommon.

The workaround consists of clearing that leaked data (a genshi.template.base.Context object) so that at least the objects referenced by the Context won't be leaked (the Context itself is still leaked, but that's far less critical). To clear the Context, we need to clear all three of its frames, pop and push attributes. Note that the problem happens whether frames is a real deque or the list-based one, and the fix is the same in both cases.

Fixes #190.


I wrote "fixes #190" because I really think that's about all we can do to handle this issue in a reasonable time frame; it doesn't mean that the real cause of the leak has been fully elucidated at this point.
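The clearing idea can be sketched like so (a minimal stand-in, not the actual clear-data-r803.patch: FakeContext only mimics the frames/pop/push attributes the description above mentions):

```python
class FakeContext(object):
    """Minimal stand-in for genshi.template.base.Context: the data lives
    in a stack of frames, and pop/push are cached bound methods that
    also keep the frames alive."""

    def __init__(self, **data):
        self.frames = [data]
        self.pop = self.frames.pop
        self.push = self.frames.append


def clear_context(ctxt):
    # Clear all three attributes so a leaked Context no longer pins
    # the (potentially huge) data it references.
    del ctxt.frames[:]
    ctxt.pop = None
    ctxt.push = None


ctxt = FakeContext(big_object=list(range(1000)))
clear_context(ctxt)
print(ctxt.frames)  # []
```

Even if the Context object itself leaks, after clear_context() it no longer reaches the template data.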

Changed 16 years ago by cmlenz

Stripped down test case

comment:2 follow-up: Changed 16 years ago by cmlenz

  • Status changed from new to assigned

Thanks Christian, this would be a viable option if we can't figure out the actual cause, but I'd like to explore the problem more deeply this week.

I've created another test case exprleak.py that is even further stripped down. And some observations:

  • The AST transformation and bytecode generation does not seem to be at fault here. Replacing it with a simple compile(source, filename, 'eval') still results in the same leak.
  • When doing normal Python scoping (i.e. removing the Context from the example), the leak goes away.

I'll be doing more investigation tomorrow.

comment:3 Changed 16 years ago by cmlenz

Another observation: it doesn't make a difference whether run() is a generator or simply a normal function returning a list.

Changed 16 years ago by cmlenz

Remove AST stuff

comment:4 in reply to: ↑ 2 ; follow-up: Changed 16 years ago by cmlenz

Replying to cmlenz:

  • The AST transformation and bytecode generation does not seem to be at fault here. Replacing it with a simple compile(source, filename, 'eval') still results in the same leak.

Correction: it does make a difference if we stop stuffing __data__ into _globals, see noastplay.diff
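A small aside on globals-based evaluation (an illustration, not the leak itself): eval() mutates any globals mapping you pass that lacks __builtins__, so the dictionary that __data__ is stuffed into also picks up extra references as a side effect of the call:

```python
g = {"x": 1}
eval("x + 1", g)
# eval() injected the builtins into our mapping as a side effect
print(sorted(g))  # ['__builtins__', 'x']
```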

comment:5 in reply to: ↑ 4 Changed 16 years ago by cboos

Replying to cmlenz:

Correction: it does make a difference if we stop stuffing __data__ into _globals, see noastplay.diff

Yes, that's what the difference seen with or without r770 already hinted at.

I've managed to reduce the test case a bit further, see attachment:exprleak2.py

Changed 16 years ago by cboos

Any exception would do, so no need for expr2 and no need to push a new frame.

comment:6 follow-ups: Changed 16 years ago by aronacher

Looks like genshi is not to blame. itertools.groupby has a grouper with a reference to the groupby type but no traverse func. As soon as a circular reference ends up in the groupby (which happens thanks to the func_globals in our lambda) genshi leaks.

So we could argue that it's Python's fault but that makes nobody happy :(
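The cycle aronacher describes can be reconstructed roughly as follows (a hedged Python 3 sketch; Marker and the default-argument trick are invented for illustration. On interpreters where the grouper lacks a traverse function, gc cannot break this cycle; on current ones, which include the fix tracked later in this ticket, it can):

```python
import gc
import weakref
from itertools import groupby


class Marker(object):
    """Weakref-able object riding along in the cycle."""


def make_cycle():
    data = {"marker": Marker(), "words": ["a", "bb", "cc"]}
    # Cycle: data -> grouper -> parent groupby -> key function
    #        -> default argument d -> data
    g = groupby(data["words"], key=lambda w, d=data: len(w))
    _, grouper = next(g)
    data["grouper"] = grouper
    return weakref.ref(data["marker"])


ref = make_cycle()
gc.collect()
print(ref() is None)  # True on interpreters where the grouper is traversable
```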

comment:7 in reply to: ↑ 6 ; follow-up: Changed 16 years ago by aronacher

Oh. And exceptions have nothing to do with it. This is the minimal code which triggers the leak:

import gc
from itertools import groupby
from genshi.template.base import Context
from genshi.template.eval import Expression

expr1 = Expression("iter(groupby(test, key=lambda x: len(x)))")

for i in xrange(10):
    ctxt = Context(test=dict.fromkeys(map(str, range(10, 100))),
                   groupby=groupby)
    for i, j in expr1.evaluate(ctxt):
        ctxt['j'] = j
        break
    gc.collect()
    print len(gc.get_objects())

Changed 16 years ago by cmlenz

Another workaround, cleaner but as of yet incomplete

comment:8 follow-up: Changed 16 years ago by cmlenz

I've added a different workaround (no__data__.diff) that I would prefer here. It seems to help (by removing a circular ref?), though I'm not quite sure yet why. Also, still need to reimplement augmented assignment in a slightly different way.

comment:9 Changed 16 years ago by cmlenz

(Note that this last workaround will also require some kludging around for Py2.3 compatibility, something I'm hoping we'll be able to drop in 0.6 or 0.7)

comment:10 in reply to: ↑ 8 Changed 16 years ago by cmlenz

Replying to cmlenz:

I've added a different workaround (no__data__.diff) that I would prefer here. It seems to help (by removing a circular ref?), though I'm not quite sure yet why. Also, still need to reimplement augmented assignment in a slightly different way.

Um, not sure what I was testing, but that workaround doesn't actually help in any way (the circular ref is still there through the closure, so it didn't really make sense to me anyway).

comment:11 Changed 16 years ago by aronacher

For a more real-world fix, I propose cleaning up the context completely after rendering. That could break up that circular reference in some cases.

comment:12 in reply to: ↑ 7 Changed 16 years ago by cboos

Replying to aronacher:

More minimizing (no iter, no Context):

import gc
from itertools import groupby
from genshi.template.eval import Expression

expr1 = Expression("groupby(test, key=lambda x: 1)")

for i in xrange(10):
    data = dict(test=dict.fromkeys(map(str, range(10, 100))), groupby=groupby)
    for i, j in expr1.evaluate(data):
    # for i, j in groupby(data['test'], key=lambda x: 1):
        data['j'] = j
        break
    gc.collect()
    print len(gc.get_objects())

If this were only a problem with itertools.groupby and the grouper, the direct Python code (i.e. using the commented-out line in the above) would leak as well, no? And it doesn't.

comment:13 in reply to: ↑ 6 Changed 16 years ago by cboos

Replying to aronacher:

... As soon as a circular reference ends up in the groupby (which happens thanks to the func_globals in our lambda) genshi leaks.

Ah, I see ;-)

comment:14 Changed 16 years ago by aronacher

Here's a Python-only example of the leak: http://paste.pocoo.org/show/31350/

A more real-world example is left as an exercise for the reader.

comment:15 Changed 16 years ago by jruigrok

Raymond Hettinger confirmed it as a genuine itertools.groupby() bug. I have now logged an issue at http://bugs.python.org/issue2246, since both Armin and Christopher were unable to find a patch they felt comfortable with. So upward escalation seemed best.

comment:16 Changed 16 years ago by jruigrok

As Raymond lets me know:

"It's fixed and will be in the next release for 2.5, 2.6, and 3.0."

So how do we want to deal with this ticket? Close it because it is fixed upstream? Christian put a workaround into trac's compat.py for groupby().

comment:17 follow-up: Changed 16 years ago by cboos

Here's another one that might be relevant: http://bugs.python.org/issue2231

Would be interesting to see the memory usage without itertools.chain.

comment:18 in reply to: ↑ 17 Changed 16 years ago by cboos

Replying to cboos:

... would be interesting to see the memory usage without itertools.chain.

It didn't make a difference.

comment:19 Changed 16 years ago by cmlenz

Okay, working with Christian's test environment and running the excessive report {11}, I've been able to track down where the bulk of the memory usage comes from: the buffering of matched streams in the py:match directive.

I'm attaching a patch that adds a buffer="false" optimization hint to py:match. When this is added to the match directive on the <body> tag in the Trac layout template, memory usage on the report drops from ~140M to ~30M. So far so good. BUT…

There's a big problem with this right now, though: it fails on nested match templates. I could only test it by commenting out the inclusion of theme.html, which also defines a match template for <body>. I'm still looking into a proper fix, and it looks like that might be rather involved.
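For illustration, here's a sketch of what the hint might look like in a template (the surrounding markup is invented; only the buffer="false" attribute on py:match comes from the patch discussion):

```xml
<html xmlns:py="http://genshi.edgewall.org/">
  <py:match path="body" buffer="false">
    <body id="themed">
      <!-- with buffer="false", the matched stream is processed lazily
           instead of being materialized as one big list of events -->
      ${select('*|text()')}
    </body>
  </py:match>
  <body>actual page content</body>
</html>
```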

Changed 16 years ago by cmlenz

Unbuffered match templates

comment:20 follow-up: Changed 16 years ago by cboos

I tried the patch with very good results:

  1. with [trac 6692], buffer="true" (i.e. same as without the patch), the memory usage after rendering a TracBrowser page with 1000 entries is 25MB, with peak usage of 31MB. After a while (18 repeated requests), the memory usage reaches 29MB (+/- 1MB) and seems to stabilize there.
  2. with [trac 6692], buffer="false", the memory usage after rendering is 20.7MB, with peak usage of 25.3MB. It then stabilizes at 23.6 (+/- 0.5MB).
  3. for reference, with [trac 6696], which adds yet another py:match on <body>, the memory after rendering is 25.3MB as in 1., but peak memory usage is 36.3MB. It also stabilizes near 30MB, though.

So what's happening with content = list(self._include(chain(..., how come this list doesn't get garbage collected?

comment:21 in reply to: ↑ 20 ; follow-up: Changed 16 years ago by athomas

Replying to cboos:

So what's happening with content = list(self._include(chain(..., how come this list doesn't get garbage collected?

Because of this bug that we found :\.

comment:22 in reply to: ↑ 21 ; follow-up: Changed 16 years ago by cboos

Replying to athomas:

Replying to cboos:

So what's happening with content = list(self._include(chain(..., how come this list doesn't get garbage collected?

Because of this bug that we found :\.

I can't see how that bug could explain why attachment:match_unbuffered.diff makes a difference: when the list() of events is created (without the patch), that will not generate more unicode objects than when those events are not buffered (with the patch). The fact that this list and the events it refers to are staying around seems to be yet another issue.

comment:23 in reply to: ↑ 22 ; follow-up: Changed 16 years ago by cboos

Amending comment:22:

Replying to athomas:

Replying to cboos:

So what's happening with content = list(self._include(chain(..., how come this list doesn't get garbage collected?

Because of this bug that we found :\.

I can't see how that bug could explain why attachment:match_unbuffered.diff makes a difference: when the list() of events is created (without the patch), that will not generate more unicode objects than when those events are not buffered (with the patch).

Sorry, I can now see "how" this happens, as I've verified that even when I delete that content list immediately after its construction, it doesn't help...

It's indeed the fact that we're creating such a big list in the first place, with all those unicode strings present at the same time that explains the differences with and without the patch.

The fact that this list and the events it refers to are staying around seems to be yet another issue.

Ok, the list doesn't stay around; http://bugs.python.org/issue2321 simply makes it look like it stays around.

Nevertheless, even with issue2321 fixed (Python 2.5.3) we will see high peak memory usage, even more so with [T6696], as it looks like there's a second such big list that gets created in this case. I hope that not every IStreamFilter will need its own additional copy of the content, otherwise we're really in bad shape...

For reference, summarizing the peak memory usage I found in comment:20:

Trac    Genshi                        Peak Memory Usage
r6992*  r809 + match_unbuffered.diff  25MB
r6992   r809                          30MB
r6996   r809                          36MB

  • (*) Trac's layout.html modified to contain <py:match path="body" buffer="false" ...

comment:24 in reply to: ↑ 23 Changed 16 years ago by athomas

Replying to cboos:

Nevertheless, even with issue2321 fixed (Python 2.5.3) we will see high peak memory usage, even more so with [T6696], as it looks like there's a second such big list that gets created in this case. I hope that not every IStreamFilter will need its own additional copy of the content, otherwise we're really in bad shape...

Chris is working on fixing the problem in Genshi, but apparently it's a bit involved and may take a little while.

IStreamFilters (e.g. Transformer) generally strive to keep the amount of buffered data to a minimum, so this shouldn't be a particular problem.

Changed 16 years ago by cmlenz

Unbuffered match templates, almost complete this time

comment:25 follow-up: Changed 16 years ago by cmlenz

Okay, here's a new patch that implements the unbuffered match template processing.

This one's looking pretty good, same drastic memory use improvements as before, only slightly slower AFAICT. It works with nested templates, i.e. disabling theme.html is no longer needed to test.

There's still one unit test failing (and I haven't added any new ones yet). The failing test correctly points out that the change breaks using Python code blocks inside match templates, but that should be fixable with some more refactoring.

comment:26 Changed 16 years ago by cmlenz

Oh, and this patch also includes the patch to #186, just because I'm lazy ;)

comment:27 in reply to: ↑ 25 Changed 16 years ago by cboos

That's really great, I've got very good results testing memory hungry /report pages in Trac, with match_unbuffered3.diff.

However, when I tried to repeat the browser test of comment:20, I hit an internal error:

...
  File "C:\Workspace\src\trac\repos\trunk\trac\versioncontrol\templates\dir_entries.html", line 25, in <Suite u"chgset_context = context('changeset', change.rev)">
...
UndefinedError: "change" not defined

But change is defined in a surrounding <py:with>.

As I wasn't sure if by "Python code blocks" you meant <?python ... ?> PI or any Suite, I thought I should mention that issue.

Changed 16 years ago by cmlenz

Improved patch

comment:28 Changed 16 years ago by cmlenz

I've attached another patch that should fix the previous errors. I'll try to clean it up a little bit more and then check in.

Please let me know if you find any additional problems. Thanks!

Changed 16 years ago by cmlenz

And a couple more fixes

Changed 16 years ago by cmlenz

More improvements

comment:29 Changed 16 years ago by cmlenz

Okay, one more patch for testing. As far as I can tell, this is a good candidate for checkin.

Changed 16 years ago by cmlenz

Turn off match template buffering in Trac where possible

comment:30 follow-ups: Changed 16 years ago by cboos

Hm, I see I took too much time to test :-) Anyway, here are my intermediate results for match_unbuffered4.diff, and I'll complement them soon with results for the later patches.

The previous issue (comment:27) is indeed gone, so I've been able to test the browser view with 1000 files again:

Trac    Genshi                         Peak Memory Usage  Time (min - max)
r6992   r809 + match_unbuffered.diff   31MB               8s - 9s
r6992*  r809 + match_unbuffered.diff   25MB               8.2s
r6992   r815 + match_unbuffered4.diff  31.3MB             7.2s - 8.5s
r6992*  r815 + match_unbuffered4.diff  25.5MB             8.2s - 9s
r6996   r815 + match_unbuffered4.diff  36.3MB             9.3s - 10s
r6996*  r815 + match_unbuffered4.diff  26MB               9.3s - 11s

(*) with <py:match path="body" once="true" buffer="false">

Sorry, there's high variability in the timings, perhaps because the tests are done on Windows, so those numbers only give a rough idea.

I also tested with 10000 entries, and that scales pretty well, as memory peaks at only 64MB. OTOH, Trac seems to take forever rendering that 9.3MB page, although the rendering time itself was only 73s.

Finally, I've looked around at most of the other pages in Trac, and everything seems to be fine except for one last issue, when accessing the Preferences or Admin tab:

 File "c:\workspace\src\trac\dependencies\genshi\trunk\genshi\template\markup.py", line 284, in  _match  
  test(tail[0], namespaces, ctxt, updateonly=True)

(tail being [])

In one environment I have the transformer in the stack; in another I don't, but I get the same error with just the _strip_accesskeys filter. I think it's easy to reproduce, so I won't give more info here (just tell me if you need it nevertheless).

comment:31 in reply to: ↑ 30 Changed 16 years ago by cmlenz

Replying to cboos:

Finally, I've looked around at most of the other pages in Trac, and everything seems to be fine except for one last issue, when accessing the Preferences or Admin tab:

That should be fixed in the latest patch.

comment:32 in reply to: ↑ 30 ; follow-up: Changed 16 years ago by cboos

Trac    Genshi                         Peak Memory Usage  Time (min - max)
r6992   r815 + match_unbuffered6.diff  31MB               7.9s - 8.5s
r6992*  r815 + match_unbuffered6.diff  26MB               8.1s - 9.4s

(*) with <py:match path="body" once="true" buffer="false">

Well, the peak memory usage increased from 25.4MB to 26MB in about 10 requests, and seems to keep increasing very slowly. I didn't have that impression with the match_unbuffered4.diff patch. I'll do some more tests.

... last issue, when accessing the Preferences or Admin tab:

Gone :-)

comment:33 in reply to: ↑ 32 ; follow-up: Changed 16 years ago by cboos

... the peak memory usage increased from 25.4MB to 26MB in about 10 requests, and seems to keep increasing very slowly. I didn't have that impression with the match_unbuffered4.diff patch. I'll do some more tests.

Well, after watching again the behavior with match_unbuffered4.diff, I don't think there's a difference.

Now I also tested on Linux, and there, the test results are good as well.

  1. The /report/1 query still leaks no more than /report/1?format=csv
  2. A timeline of 90 daysback gave some quite surprising results:
    • without the patch, I had quite constant memory growth of 160KB per request (over 20 requests) and an average time of 5.77s
    • with match_unbuffered6.diff, I initially had constant memory usage and a timing of 5.67s (so, better!). But after the first 20 requests, memory usage started to grow as well (an average of 65KB/request)

So all in all, this is big progress, and I think it's unlikely that the remaining leaks I see in Trac are due to Genshi.

I also wonder: if the timing results I got on Linux can be reproduced and unbuffered matches prove to be a bit faster (or as fast), couldn't unbuffered mode be the default?

comment:34 in reply to: ↑ 33 Changed 16 years ago by cmlenz

Replying to cboos:

So all in all, this is a big progress, and I think it's unlikely that the remaining leaks I see in Trac are due to Genshi.

Thanks for the testing!

I also wonder: if the timing results I got on Linux can be reproduced and unbuffered matches prove to be a bit faster (or as fast), couldn't unbuffered mode be the default?

Well, at least in some cases it causes a slowdown. This is the case for the basic benchmark, for example. Also, you currently need the buffering when a match template contains more than one invocation of the select() function.

So for now the default will be buffered. Maybe in a future version the need for buffering could be inferred automatically from the template.

comment:35 Changed 16 years ago by cmlenz

  • Resolution set to fixed
  • Status changed from assigned to closed

Okay, latest patch applied in [816] (plus some docs).
