#190 closed defect (fixed)
Memory leak when Python code in templates raises an exception
Reported by: | cboos | Owned by: | cmlenz |
---|---|---|---|
Priority: | critical | Milestone: | 0.5 |
Component: | Expression evaluation | Version: | 0.4.4 |
Keywords: | memory | Cc: |
Description
This is a follow-up to the some of the memory issues we've seen in Trac (#T6614). athomas was able to create a sample program that reproduces the issue:
import gc from itertools import groupby from genshi.template import MarkupTemplate from genshi.template.eval import UndefinedError m = MarkupTemplate(""" <html xmlns:py="http://genshi.edgewall.org/"> <body> <py:for each="i, j in groupby(test, key=lambda i: len(i))"> ${say.hello} </py:for> </body> </html> """, lookup='lenient') while True: test = dict.fromkeys(map(str, range(1000, 100000))) try: print m.generate(test=test, groupby=groupby) except UndefinedError: pass print gc.collect() print len(gc.get_objects())
Running it shows something like:
$ python25 aat2.py 354 12926 0 12949 0 12972 0 12995 0 13018 0 13041 0 13064 0 ...
And it won't stop growing. This is with latest Genshi (r800).
Now, the interesting thing is that when we revert r770, the problem seems to disappear:
$ python25 aat2.py 354 12924 16 12924 16 12924 16 12924 16 12924 16 12924 16 12924 16 12924 16 ...
and memory usage is constant as well.
To me, these results seem to indicate clearly that the following eval seems to hold a hidden reference to the data when an exception is raised.
def evaluate(self, data): """Evaluate the expression against the given data dictionary. :param data: a mapping containing the data to evaluate against :return: the result of the evaluation """ __traceback_hide__ = 'before_and_this' _globals = self._globals() _globals['__data__'] = data return eval(self.code, _globals, {'__data__': data})
(see source:trunk/genshi/template/eval.py@800#L135)
When removing the locals from that last line:
-
genshi/template/eval.py
141 141 __traceback_hide__ = 'before_and_this' 142 142 _globals = self._globals() 143 143 _globals['__data__'] = data 144 return eval(self.code, _globals , {'__data__': data})144 return eval(self.code, _globals) 145 145 146 146 147 147 class Suite(Code):
the problem persists, so that reference seems to be related to the globals (and by the way, this makes me think of that blog entry "How to Add Memory Leaks to Python", for which I never saw an explanation, maybe it's the same underlying bug).
Attachments (11)
Change History (46)
Changed 17 years ago by cboos
comment:1 Changed 17 years ago by cboos
The details of the changes are inside the patch, but as this Trac is not yet running Trac 0.11, I'll inline them here:
Workaround a memory leak happening when an exception is raised while rendering a template.
More specifically, when an exception is raised while evaluating Python code located in expressions or code blocks inside a template, there are some situations where the data dictionary placed in the globals() used for the evaluation will be kept alive. This will have particularly severe consequences if that data dictionary itself references lots of objects, which is not uncommon.
The workaround consists to clear that leaked data (a genshi.template.base.Context object) so that at least the objects referenced by the Context won't be leaked (the Context itself will still be leaked, but it's far less critical). For clearing the Context, we need to clear all the three of frames, pop and push attributes. Note that the problem happens whether frames is a real deque or the list based one, and the fix is the same in both cases as well.
Fixes #190.
I wrote "fixes #190" because I really think that's about all we can do to handle this issue in a reasonable time frame, this doesn't mean that the real reason of the leak is completely elucidated at this point.
comment:2 follow-up: ↓ 4 Changed 17 years ago by cmlenz
- Status changed from new to assigned
Thanks Christian, this would be a viable option if we can't figure out the actual cause, but I'd like to explore the problem more deeply this week.
I've created another test case exprleak.py that is even further stripped down. And some observations:
- The AST transformation and bytecode generation does not seem to be at fault here. Replacing it with a simple compile(source, filename, 'eval') still results in the same leak.
- When doing normal Python scoping (i.e. removing the Context from the example), the leak goes away.
I'll be doing more investigation tomorrow.
comment:3 Changed 17 years ago by cmlenz
Another observation: it doesn't make a difference whether run() is a generator or simply a normal function returning a list.
comment:4 in reply to: ↑ 2 ; follow-up: ↓ 5 Changed 17 years ago by cmlenz
Replying to cmlenz:
- The AST transformation and bytecode generation does not seem to be at fault here. Replacing it with a simple compile(source, filename, 'eval') still results in the same leak.
Correction: it does make a difference if we stop stuffing __data__ into _globals, see noastplay.diff
comment:5 in reply to: ↑ 4 Changed 17 years ago by cboos
Replying to cmlenz:
Correction: it does make a difference if we stop stuffing __data__ into _globals, see noastplay.diff
Yes, that's what the difference seen with or without r770 already hinted at.
I've managed to reduce the test case a bit further, see attachment:exprleak2.py
Changed 17 years ago by cboos
Any exception would do, so no need for expr2 and no need to push a new frame.
comment:6 follow-ups: ↓ 7 ↓ 13 Changed 17 years ago by aronacher
Looks like genshi is not to blame. itertools.groupby has a grouper with a reference to the groupby type but no traverse func. As soon as a circular reference ends up in the groupby (which happens thanks to the func_globals in our lambda) genshi leaks.
So we could argue that it's Python's fault but that makes nobody happy :(
comment:7 in reply to: ↑ 6 ; follow-up: ↓ 12 Changed 17 years ago by aronacher
Oh. And exceptions have nothing to do with it. This is the minimal code which triggers the leak:
import gc from itertools import groupby from genshi.template.base import Context from genshi.template.eval import Expression expr1 = Expression("iter(groupby(test, key=lambda x: len(x)))") for i in xrange(10): ctxt = Context(test=dict.fromkeys(map(str, range(10, 100))), groupby=groupby) for i, j in expr1.evaluate(ctxt): ctxt['j'] = j break gc.collect() print len(gc.get_objects())
comment:8 follow-up: ↓ 10 Changed 17 years ago by cmlenz
I've added a different workaround (no__data__.diff) that I would prefer here. It seems to help (by removing a circular ref?), though I'm not quite sure yet why. Also, still need to reimplement augmented assignment in a slightly different way.
comment:9 Changed 17 years ago by cmlenz
(Note that this last workaround will also require some kludging around for Py2.3 compatibility, something I'm hoping we'll be able to drop in 0.6 or 0.7)
comment:10 in reply to: ↑ 8 Changed 17 years ago by cmlenz
Replying to cmlenz:
I've added a different workaround (no__data__.diff) that I would prefer here. It seems to help (by removing a circular ref?), though I'm not quite sure yet why. Also, still need to reimplement augmented assignment in a slightly different way.
Um, not sure what I was testing, but that workaround doesn't actually help in anyway (the circular ref is still there through the closure, so it didn't really make sense to me anyway).
comment:11 Changed 17 years ago by aronacher
For a more real world example I propose cleaning up the context completely after rendering. That could break up that circular reference in some cases.
comment:12 in reply to: ↑ 7 Changed 17 years ago by cboos
Replying to aronacher:
More minimizing (no iter, no Context):
import gc from itertools import groupby from genshi.template.eval import Expression expr1 = Expression("groupby(test, key=lambda x: 1)") for i in xrange(10): data = dict(test=dict.fromkeys(map(str, range(10, 100))), groupby=groupby) for i, j in expr1.evaluate(data): # for i, j in groupby(data['test'], key=lambda x: 1): data['j'] = j break gc.collect() print len(gc.get_objects()
If that would be only a problem with itertools.groupby and the grouper, the direct Python code (i.e. using the commented out line in the above) would leak as well, no? And it doesn't.
comment:13 in reply to: ↑ 6 Changed 17 years ago by cboos
Replying to aronacher:
... As soon as a circular reference ends up in the groupby (which happens thanks to the func_globals in our lambda) genshi leaks.
Ah, I see ;-)
comment:14 Changed 17 years ago by aronacher
Here a python only example of the leak: http://paste.pocoo.org/show/31350/
A more real-world example is left as an exercise for the reader.
comment:15 Changed 17 years ago by jruigrok
Raymond Hettinger confirmed it as a genuine itertools.groupby() bug. I have now logged an issue at http://bugs.python.org/issue2246 for this since Both Armin and Christopher were unable to find a patch that felt comfortable with. So upwards escalation seemed best.
comment:16 Changed 17 years ago by jruigrok
As Raymond lets me know:
"It's fixed and will be in the next release for 2.5, 2.6, and 3.0."
So how do we want to deal with this ticket? Close it because it is fixed upstream? Christian put a workaround into trac's compat.py for groupby().
comment:17 follow-up: ↓ 18 Changed 17 years ago by cboos
Here's another one that might be relevant: http://bugs.python.org/issue2231
Would be interesting to see the memory usage without itertools.chain.
comment:18 in reply to: ↑ 17 Changed 17 years ago by cboos
Replying to cboos:
... would be interesting to see the memory usage without itertools.chain.
It didn't make a difference.
comment:19 Changed 17 years ago by cmlenz
Okay, working with Christian's test environment and running the the excessive report {11}, I've been able to track down where the bulk of the memory usage come from: the buffering of matched streams in the py:match directive.
I'm attaching a patch that adds a buffer="false" optimization hint to py:match. When this is added to match directive on the <body> tag in the Trac layout template, memory usage on the report drops from ~140M to ~30M. So far so good. BUT…
There's a big problem with this right now though: it fails on nested match templates. I could only test this by commenting out the inclusion of theme.html, which also defines a match template for <body>. I'm still looking into a proper fix for this, and it looks like that might be rather involved.
comment:20 follow-up: ↓ 21 Changed 16 years ago by cboos
I tried the patch with very good results:
- with [trac 6692], buffer="true" (i.e. same as without the patch), the memory usage after rendering a TracBrowser page with 1000 entries is 25MB, with peak usage of 31MB. After a while (18 repeated requests), the memory usage reaches 29MB (+/- 1MB) and seems to stabilize there.
- with [trac 6692], buffer="false", the memory usage after rendering is 20.7MB, with peak usage of 25.3MB. It then stabilizes at 23.6 (+/- 0.5MB).
- for reference, with [trac 6696] which adds yet another py:match on <body>, the memory after rendering is 25.3MB as in 1., but the peak memory usage is 36.3MB. It stabilizes also near 30 MB, though.
So what's happening with content = list(self._include(chain(..., how come this list doesn't get garbage collected?
comment:21 in reply to: ↑ 20 ; follow-up: ↓ 22 Changed 16 years ago by athomas
comment:22 in reply to: ↑ 21 ; follow-up: ↓ 23 Changed 16 years ago by cboos
Replying to athomas:
Replying to cboos:
So what's happening with content = list(self._include(chain(..., how come this list doesn't get garbage collected?
Because of this bug that we found :\.
I can't see how that bug could explain why attachment:match_unbuffered.diff makes a difference: when the list() of events is created (without the patch), that will not generate more unicode objects than when those events are not buffered (with the patch). The fact that this list and the events it refers to are staying around seems to be yet another issue.
comment:23 in reply to: ↑ 22 ; follow-up: ↓ 24 Changed 16 years ago by cboos
Amending comment:22:
Replying to athomas:
Replying to cboos:
So what's happening with content = list(self._include(chain(..., how come this list doesn't get garbage collected?
Because of this bug that we found :\.
I can't see how that bug could explain why attachment:match_unbuffered.diff makes a difference: when the list() of events is created (without the patch), that will not generate more unicode objects than when those events are not buffered (with the patch).
Sorry, I can now see "how" this happens, as I've verified that even when I delete that content list immediately after its construction, it doesn't help...
It's indeed the fact that we're creating such a big list in the first place, with all those unicode strings present at the same time that explains the differences with and without the patch.
The fact that this list and the events it refers to are staying around seems to be yet another issue.
Ok, the list doesn't stay around, and the http://bugs.python.org/issue2321 simply makes it looks like it stays around.
Nevertheless, even with issue2321 fixed (Python 2.5.3) we will see high peak memory usage, even more so with [T6696], as it looks like there's a second such big list that gets created in this case. I hope that not every IStreamFilter will need its own additional copy of the content, otherwise we're really in bad shape...
For reference, summarizing the peak memory usage I found in comment:20:
Trac | Genshi | Peak Memory Usage |
r6992* | r809 + match_unbuffered.diff | 25MB |
r6992 | r809 | 30MB |
r6996 | r809 | 36MB |
- * Trac's layout.html modified to contain <py:match path="body" buffer="false" ...
comment:24 in reply to: ↑ 23 Changed 16 years ago by athomas
Replying to cboos:
Nevertheless, even with issue2321 fixed (Python 2.5.3) we will see high peak memory usage, even more so with [T6696], as it looks like there's a second such big list that gets created in this case. I hope that not every IStreamFilter will need its own additional copy of the content, otherwise we're really in bad shape...
Chris is working on fixing the problem in Genshi, but apparently it's a bit involved and may take a little while.
!IStreamFilter's (eg. Transformer) generally strive to keep the minimum amount of buffered data possible, so this shouldn't be a particular problem.
comment:25 follow-up: ↓ 27 Changed 16 years ago by cmlenz
Okay, here's a new patch that implements the unbuffered match template processing.
This one's looking pretty good, same drastic memory use improvements as before, only slightly slower AFAICT. It works with nested templates, i.e. disabling theme.html is no longer needed to test.
There's still one unit test failing (and I haven't added any new ones yet). The failing test correctly points out that the change breaks using Python code blocks inside match templates, but that should be fixable with some more refactoring.
comment:26 Changed 16 years ago by cmlenz
Oh, and this patch also includes the patch to #186, just because I'm lazy ;)
comment:27 in reply to: ↑ 25 Changed 16 years ago by cboos
That's really great, I've got very good results testing memory hungry /report pages in Trac, with match_unbuffered3.diff.
However, when I tried to repeat the browser test of comment:20, I hit an internal error:
... File "C:\Workspace\src\trac\repos\trunk\trac\versioncontrol\templates\dir_entries.html", line 25, in <Suite u"chgset_context = context('changeset', change.rev)"> ... UndefinedError: "change" not defined
But change is defined in a surrounding <py:with>.
As I wasn't sure if by "Python code blocks" you meant <?python ... ?> PI or any Suite, I thought I should mention that issue.
comment:28 Changed 16 years ago by cmlenz
I've attached another patch that should fix the previous errors. I'll try to clean it up a little bit more and then check in.
Please let me know if you find any additional problems. Thanks!
comment:29 Changed 16 years ago by cmlenz
Okay, one more patch for testing. As far as I can tell, this is a good candidate for checkin.
comment:30 follow-ups: ↓ 31 ↓ 32 Changed 16 years ago by cboos
Hm, I see I took too much time to test :-) Anyway, here are my intermediate results for match_unbuffered4.diff, and I'll complement them soon with results for the later patches.
The previous issue (comment:27) is indeed gone, so I've been able to test again the browser view with 1000 files:
Trac | Genshi | Peak Memory Usage | Time (min - max) |
r6992 | r809 + match_unbuffered.diff | 31MB | 8s - 9s |
r6992* | r809 + match_unbuffered.diff | 25MB | 8.2s |
r6992 | r815 + match_unbuffered4.diff | 31.3MB | 7.2s - 8.5s |
r6992* | r815 + match_unbuffered4.diff | 25.5MB | 8.2s - 9s |
r6996 | r815 + match_unbuffered4.diff | 36.3MB | 9.3s - 10s |
r6996* | r815 + match_unbuffered4.diff | 26MB | 9.3s - 11s |
(*) with <py:match path="body" once="true" buffer="false">
Sorry, I have a high variability in the timings, perhaps this is because tests are done on Windows, so those numbers just give an idea.
I also tested with 10000 entries, and that scales pretty well as the memory peaks at only 64MB. OTOH, Trac seems to take forever for rendering that 9.3MB page, although the rendering time in itself was only 73s.
Finally, I've looked around at most of the other pages in Trac, and everything seems to be fine except for one last issue, when accessing the Preferences or Admin tab:
File "c:\workspace\src\trac\dependencies\genshi\trunk\genshi\template\markup.py", line 284, in _match test(tail[0], namespaces, ctxt, updateonly=True)
(tail being [])
In one environment, I have the transformer in the stack, in another I don't have it but get the same error with the the just have the _strip_accesskeys. I think it's easy to reproduce so I don't give more info here (just tell me if you nevertheless need it).
comment:31 in reply to: ↑ 30 Changed 16 years ago by cmlenz
Replying to cboos:
Finally, I've looked around at most of the other pages in Trac, and everything seems to be fine except for one last issue, when accessing the Preferences or Admin tab:
That should be fixed in the latest patch.
comment:32 in reply to: ↑ 30 ; follow-up: ↓ 33 Changed 16 years ago by cboos
Trac | Genshi | Peak Memory Usage | Time (min - max) |
r6992 | r815 + match_unbuffered6.diff | 31MB | 7.9s - 8.5s |
r6992* | r815 + match_unbuffered6.diff | 26MB | 8.1s - 9.4s |
(*) with <py:match path="body" once="true" buffer="false">
Well, the peak memory usage increased from 25.4MB to 26MB in about 10 requests, and seems to keep increasing very slowly. I didn't have that impression with the match_unbuffered4.diff patch. I'll do some more tests.
... last issue, when accessing the Preferences or Admin tab:
Gone :-)
comment:33 in reply to: ↑ 32 ; follow-up: ↓ 34 Changed 16 years ago by cboos
... the peak memory usage increased from 25.4MB to 26MB in about 10 requests, and seems to keep increasing very slowly. I didn't have that impression with the match_unbuffered4.diff patch. I'll do some more tests.
Well, after watching again the behavior with match_unbuffered4.diff, I don't think there's a difference.
Now I also tested on Linux, and there, the test results are good as well.
- The /report/1 query still leaks no more than /report/1?format=csv
- A timeline of 90 daysback gave some quite surprising results:
- without the patch, I had a quite constant memory growth of 160Kb per requests (20 requests) and an average time of 5.77s
- with match_unbuffered6.diff, I initially had a constant memory usage and a timing of 5.67s (so better!). But after the 20 first requests, the memory usage started to grow as well (average of 65Kb/request)
So all in all, this is a big progress, and I think it's unlikely that the remaining leaks I see in Trac are due to Genshi.
I also wonder: if the timing results I got on Linux can be reproduced and the unbuffered matches proves to be a bit faster (or as fast), couldn't the unbuffered mode be the default?
comment:34 in reply to: ↑ 33 Changed 16 years ago by cmlenz
Replying to cboos:
So all in all, this is a big progress, and I think it's unlikely that the remaining leaks I see in Trac are due to Genshi.
Thanks for the testing!
I also wonder: if the timing results I got on Linux can be reproduced and the unbuffered matches proves to be a bit faster (or as fast), couldn't the unbuffered mode be the default?
Well, at least in some cases it causes a slowdown. This is the case for the basic benchmark, for example. Also, you currently need the buffering when a match templates contains more than one invocation of the select() function.
So for now the default will be buffered. Maybe in a future version the need for buffering could be inferred automatically from the template.
comment:35 Changed 16 years ago by cmlenz
- Resolution set to fixed
- Status changed from assigned to closed
Okay, latest patch applied in [816] (plus some docs).
Workaround the bug by clearing the Context