Adds get_new_ids and at_addrs functions#36
Conversation
mgedmin
left a comment
There was a problem hiding this comment.
Could you please revert the changes to generated images (and the one .dot file) in docs/? These clutter the diff and make it difficult to review the PR.
I also did not notice any tests for the new functions? Currently objgraph has 100% test coverage and I do not want to lose that.
mgedmin
left a comment
There was a problem hiding this comment.
Now that I finally understand what this is for, I quite like the idea. Thanks for sharing it!
The PR will need some massaging to be mergeable. I'll try to help.
objgraph.py
Outdated
| __version__ = '3.3.1.dev0' | ||
| __date__ = '2017-12-28' | ||
| __version__ = '3.3.2.dev0' | ||
| __date__ = '2018-01-08' |
There was a problem hiding this comment.
New function are new features, so it'll be 3.4.0.dev0, according to SemVer.
docs/objgraph.txt
Outdated
|
|
||
| .. autofunction:: show_growth([limit=10, peak_stats={}, shortnames=True, file=sys.stdout]) | ||
|
|
||
| .. autofunction:: get_ids([skip_update=False, limit=10, sortby='deltas']) |
There was a problem hiding this comment.
Oh my, I don't remember how my own documentation works! I totally forgot to include growth() and perhaps a few other new functions in here!
objgraph.py
Outdated
| weakref 3807 3864 +57 +57 | ||
| dict 6892 6947 +73 +55 | ||
| frame 34 70 +53 +36 | ||
| ... |
There was a problem hiding this comment.
Ah, you have some tests! Sorry I missed them on the first glance.
There was a problem hiding this comment.
API-wise I'm not very happy about functions that both print and return values.
Despite having read the description, I don't quite get how this one differs from growth()/show_growth().
Also, the name (get_ids) is too generic, and doesn't describe what the function does.
(I'm afraid my criticism is not very constructive at this point. I'll have to think about this more.)
There was a problem hiding this comment.
Ah, so this function also observes (or tries to observe) the object churn! Interesting.
Given how object IDs can be reused by Python, are the numbers accurate?
There was a problem hiding this comment.
Suggestion for the function name: get_new_ids(). It could also return just the NEW_IDs, for simplicity's sake.
I'm thinking the three parameters for tracking state are a bit unwieldy (e.g. you cannot do OLD_IDS = CURRENT_IDS; CURRENT_IDS = defaultdict(set)), so how about one _state={} parameter that you initialize like this:
def get_new_ids(..., _state={}):
...
_state['old'] = old = _state.get('current', defaultdict(set))
_state['current'] = current = defaultdict(set)
_state['new'] = new = defaultdict(set)
...
return newThere was a problem hiding this comment.
Given how object IDs can be reused by Python, are the numbers accurate?
I'm convinced now that given the purpose of this function that doesn't matter. Objects that were freed and reallocated at the same memory address (while maintaining the same type name) aren't contributing to memory leaks.
There was a problem hiding this comment.
If we change the code to:
_state['old'] = old = _state.get('current', defaultdict(set))
Then the id of _state['old'] will change to the id of _state['current'].
This would also require a new dict be created every time for _state['current'] with new sets under every class_name key.
I want to minimize the creation of new ids to store this info, so I prefer the old_dict[class_name].clear(), old_dict[class_name].update(current_dict[class_name])
current_dict[class_name].clear(), current_dict[class_name].add(new_id)
pattern that way we re-use dictionaries and sets that are already reserved in memory.
objgraph.py
Outdated
|
|
||
| if ``skip_update`` is True, the sets of [OLD_IDS, CURRENT_IDS, NEW_IDS] | ||
| will be returned from when the function was last run without examining the | ||
| objects currently iin memory. |
objgraph.py
Outdated
| (width, 'Type', 'Old_ids', 'Current_ids', 'New_ids', 'Count_Deltas')) | ||
| print('='*(width+13*4)) | ||
| for row in rows: | ||
| row_class, old, current, new, delta = row |
There was a problem hiding this comment.
You can do for row_class, old, current, new, delta in rows: directly.
objgraph.py
Outdated
| for k, v in CURRENT_IDS.items(): | ||
| OLD_IDS[k].update(v) | ||
| for k in CURRENT_IDS.keys(): | ||
| CURRENT_IDS[k].clear() |
There was a problem hiding this comment.
OLD_IDS.clear()
OLD_IDS.update(CURRENT_IDS)
CURRENT_IDS.clear()might be simpler than the three loops.
objgraph.py
Outdated
| for o in objects: | ||
| CURRENT_IDS[type(o).__name__].add(id(o)) | ||
| for k in NEW_IDS.keys(): | ||
| NEW_IDS[k].clear() |
objgraph.py
Outdated
| for k in NEW_IDS.keys(): | ||
| NEW_IDS[k].clear() | ||
| rows = [] | ||
| for class_name in CURRENT_IDS.keys(): |
There was a problem hiding this comment.
for class_name in CURRENT_IDS: would skip generating an unnecessary intermediate list object on Python 2.
objgraph.py
Outdated
| rows = [] | ||
| for class_name in CURRENT_IDS.keys(): | ||
| new_ids_set = CURRENT_IDS[class_name] - OLD_IDS[class_name] | ||
| NEW_IDS[class_name].update(new_ids_set) |
There was a problem hiding this comment.
You're clearing NEW_IDs before this loop, and class names will not repeat, so you can do NEW_IDS[...] = new_ids_set and skip the update.
objgraph.py
Outdated
| >>> [old, current, new_ids] = get_ids() | ||
| new_lists = at_addrs(new_ids['list']) | ||
| for new_list in new_lists: | ||
| call show_chain on each new_list object |
There was a problem hiding this comment.
Doctests need a >>> in front of every statement, and a ... in front of every continuation line of a compound statement such as a for loop.
Doctests are executed by the test runner, and so cannot contain pseudocode.
Any other updates? |
|
@mgedmin Any other updates? |
mgedmin
left a comment
There was a problem hiding this comment.
Sorry! I'm trying to find the time to look at the code again.
objgraph.py
Outdated
| # remove the key from our dicts if we don't have any old or | ||
| # curent class_name objects | ||
| del old_ids[class_name] | ||
| del current_ids[class_name] |
There was a problem hiding this comment.
You're modifying a dictionary while iterating over it. This can cause problems (RuntimeError: dictionary changed size during iteration).
| rows.sort(key=lambda row: row[index_by_sortby[sortby]], reverse=True) | ||
| if limit: | ||
| rows = rows[:limit] | ||
| width = max(len(row[0]) for row in rows) |
There was a problem hiding this comment.
This could raise ValueError: max() arg is an empty sequence if rows is empty (which is probably unlikely, but could happen if e.g. a user accidentally passes limit=0).
…eption if bad limit is passed in
|
mgedmin
left a comment
There was a problem hiding this comment.
Looks good to me. I've a couple of small suggestions, and I'll wait a couple of days before merging so you could implement them if you wish to do so.
Thank you for your patience!
objgraph.py
Outdated
| for class_name in current_ids: | ||
| current_ids[class_name].clear() | ||
| for o in objects: | ||
| class_name = type(o).__name__ |
There was a problem hiding this comment.
I still think this should support long names, but that can be added later.
There was a problem hiding this comment.
(TBH it doesn't handle old-style classes on Python 2 well either. _short_typename is there for a reason.)
objgraph.py
Outdated
| new_ids[class_name].update(new_ids_set) | ||
| num_new = len(new_ids_set) | ||
| num_delta = num_current - num_old | ||
| row = [class_name, num_old, num_current, num_new, num_delta] |
objgraph.py
Outdated
| del current_ids[key] | ||
| del new_ids[key] | ||
| index_by_sortby = {'old': 1, 'current': 2, 'new': 3, 'deltas': 4} | ||
| rows.sort(key=lambda row: row[index_by_sortby[sortby]], reverse=True) |
There was a problem hiding this comment.
You could simplify this to rows.sort(key=operator.itemgetter(index_by_sortby[sortby]), reverse=True).
| def at_addrs(address_set): | ||
| """Returns a list of objects for a given set of memory addresses. | ||
|
|
||
| The reverse of [id(obj1), id(obj2), ...] |
There was a problem hiding this comment.
Please mention that the objects are returned in an arbitrary order.
…and at_addrs docstring tweak
|
@mgedmin I made the updates that you suggested. Can you merge them in and post the new version on pypi? |
|
Thank you! |
|
objgraph 3.4.0 is out on PyPI. |
|
Thanks! |
In conducting a memory leak investigation, I found it helpful to look at my new objects created between calls to show_growth. So I made two new functions to help users out here.
get_new_ids is like show_growth, except that it stores dictionaries which hold sets of object address ids.
So now one can grab a set of all the new 'list' ids.
at_addrs: Once one has the new 'list' ids, one can call at_addrs to go from a set of ids to the objects at those addresses. One can then make graphs of the back references on those new objects.