Another Reason To Love Decorators: Pickled Functions
The more I work with Python, the cooler it becomes, especially when I let my mind wander to the way-out-there things you can do with the tools Python provides. In my work I frequently find myself writing code in this order:
1. Data parsing
2. Reformatting
3. Complex transformation A of the data
4. Complex transformation B of transformation A
5. Use transformation B to do another complex task
So my script starts empty; then my main method calls my parsing and reformatting code and saves out the results. Then I clear the main method and start working on steps 3-5, which generally make up the rest of the main method. This means that whenever I run my code, it re-runs steps 3-5 every time. That is okay if they are fast, but a pain if you make a change in step 4 and have to wait for step 3 to finish before you can move on to step 5. It is even worse if you are making changes to step 5 and have to keep waiting for steps 3 and 4 to rerun with every change.
Now you could say that I should be saving the results of steps 3 and 4 and just loading them from disk, which speeds things up. But then if I change something in step 4, I have to manually rerun it and resave the results. That gets annoying, and so does rewriting the same save-to-file code for every project I work on. So as a solution, enter the "pickled" decorator:
def main():
    data = parseAndLoadData()
    transformed_data = doComplexTransformation(data)
    # the second transformation works on the output of the first one
    transformed_data = doAnotherComplexTransformation(transformed_data)
    doCoolThing(transformed_data)

@pickled
def parseAndLoadData():
    # parse and load the data
    data = None  # placeholder for the real parsing code
    return data

@pickled
def doComplexTransformation(data):
    # do some nasty complex transformation that takes a long time
    transformation = data  # placeholder for the real work
    return transformation

@pickled(depends=doComplexTransformation)
def doAnotherComplexTransformation(data):
    # do some other nasty complex transformation that takes a long time
    transformation = data  # placeholder for the real work
    return transformation
The first time "parseAndLoadData" and "doComplexTransformation" are called, they run like normal and each result is saved out to a pickle file. When we run the script again after changing "doCoolThing", the results are loaded from the pickle files instead. Now here is the cool part: if you go back and change the code of "doComplexTransformation", the function will be called again and the old pickled result replaced with the new one! And the coolness doesn't stop there: since "doAnotherComplexTransformation" depends on "doComplexTransformation", any change to "doComplexTransformation" will cause "doAnotherComplexTransformation" to be rerun as well.
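The decorator gets this behavior by putting hashes of the function's code and of its dependencies' code into the pickle file name, so an edit to either one produces a name that no longer matches the stale file. Here is a minimal sketch of just that naming idea, written for Python 3; it is not the decorator's own code. It swaps the built-in hash() for hashlib, on the assumption that hash() of a code object can differ between interpreter runs when string hash randomization is enabled. The names code_fingerprint, cache_file_name, and the three sample functions are made up for illustration.

```python
import hashlib
import types


def _describe(code):
    """Build a description of a code object that contains no memory addresses."""
    consts = tuple(
        # nested code objects (lambdas, comprehensions) are described recursively
        _describe(c) if isinstance(c, types.CodeType) else repr(c)
        for c in code.co_consts
    )
    return (code.co_code, consts, code.co_names)


def code_fingerprint(func):
    """Return a short hex digest that changes when the compiled body changes."""
    payload = repr(_describe(func.__code__)).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()[:12]


def cache_file_name(func, depends=()):
    """Hypothetical file name: function name, own fingerprint, dependency fingerprints."""
    deps = "-".join(code_fingerprint(d) for d in depends)
    return "%s__hash=%s__deps=%s.pkl" % (func.__name__, code_fingerprint(func), deps)


# Two versions of the "same" step, standing in for an edit to the source.
def transform_v1(data):
    return [x + 1 for x in data]


def transform_v2(data):
    return [x + 2 for x in data]


def consumer(data):
    return sum(data)


# Editing the dependency changes the consumer's cache file name too.
name_before = cache_file_name(consumer, depends=(transform_v1,))
name_after = cache_file_name(consumer, depends=(transform_v2,))
```

Note that neither this sketch nor the decorator puts the call arguments into the file name, so a cached result is reused even if the function is later called with different arguments.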
Here is the magic:
# Note: this is Python 2 code (cPickle, print statement, func_code)
from decorator import decorator
import cPickle as pickle
import os
import types


def pickled(*args, **kw):
    """Allows the results of a function to be pickled and reloaded in order to
    save computation time. If the results of a function depend on other functions
    then this decorator can be applied using "@pickled(depends=function_list)",
    where function_list is a single function or a tuple or list of functions.
    Since this code depends on the decorator module, in order for dependencies
    to work any custom decorators must use the decorator module as well.
    """
    simple_decorator = False
    # get the functions this depends on and make sure it's a sequence
    depends = kw.get('depends', None)
    if depends is not None and not isinstance(depends, (tuple, list)):
        depends = (depends,)
    # since we may be working on decorated functions we need access to the
    # undecorated function in order to create the correct hash of the code
    # (skipped for a plain "@pickled", where there are no dependencies)
    if depends is not None:
        depends = [getattr(f, 'undecorated', f) for f in depends]
    if len(args) > 0 and type(args[0]) is types.FunctionType:
        simple_decorator = True

    def _pickled(func, *args, **kw):
        # convert the dependencies into a string of hashes
        depends_str = ''
        if depends is not None:
            depends_str = '*'.join(str(hash(d.func_code)) for d in depends)
        fstart = '%s_func=%s__' % (os.path.basename(__file__), func.__name__)
        fname = '%shash=%d__deps=%s.pkl' % (fstart, hash(func.func_code), depends_str)
        path = os.path.join('pickled_functions', fname)
        # create a place to save the pickled results
        if not os.path.exists('pickled_functions'):
            os.mkdir('pickled_functions')
        # look through the files in the pickle directory
        found = False
        for f in os.listdir('pickled_functions'):
            # if we find a file matching the pickle name then remember it
            # if just the start of the file's name matches it's an old version
            # so remove it to keep the directory clean
            if f == fname:
                found = True
            elif f.startswith(fstart) and f.endswith('.pkl'):
                print 'removing:', f
                os.remove(os.path.join('pickled_functions', f))
        # if we found a matching file load the pickle
        # otherwise call the function and save out the results
        if found:
            print 'loading:', fname
            with open(path, 'rb') as pkl_file:
                result = pickle.load(pkl_file)
        else:
            result = func(*args, **kw)
            print 'writing:', fname
            with open(path, 'wb') as pkl_file:
                pickle.dump(result, pkl_file, pickle.HIGHEST_PROTOCOL)
        return result

    if simple_decorator:
        # called as a plain decorator "@pickled"
        return decorator(_pickled, args[0])
    else:
        # called as a decorator with depends "@pickled(depends=someFunc)"
        return decorator(_pickled)
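The last few lines are what let one function serve as both "@pickled" and "@pickled(depends=someFunc)". If that trick is unfamiliar, here is a stripped-down sketch of the same dispatch in Python 3, using only functools from the standard library instead of the decorator module. It does no pickling at all; the name with_depends and the two sample functions are hypothetical, and the wrapper simply records its dependencies so the two call styles can be compared.

```python
import functools
import types


def with_depends(*args, **kw):
    """Hypothetical decorator usable as @with_depends or @with_depends(depends=...)."""
    depends = kw.get("depends", ())
    if not isinstance(depends, (tuple, list)):
        depends = (depends,)

    def decorate(func):
        @functools.wraps(func)
        def wrapper(*call_args, **call_kw):
            # a real version would check the cache here before calling func
            return func(*call_args, **call_kw)
        wrapper.depends = tuple(depends)
        return wrapper

    # Bare use: Python passes the decorated function as the only positional argument.
    if len(args) > 0 and isinstance(args[0], types.FunctionType):
        return decorate(args[0])
    # Parameterized use: hand back the real decorator for Python to apply next.
    return decorate


@with_depends
def load():
    return [1, 2, 3]


@with_depends(depends=load)
def transform(data):
    return [x * 2 for x in data]
```

The catch with this style is that the first positional argument is inspected to decide which form was used, so options have to be passed by keyword, just as "depends" is in the original.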