My Ramblings Code, music and travel

Remember kids, dict.copy() is a shallow copy

So imagine you need to generate a few thousand models for creating some default data. Now, the data you are generating is pretty much the same for all these models, so you decide to keep the common default data in a dictionary:

just_some_data = {
    'nothing': 'to',
    'see': 'here',
    'we': {
        'need': [],
        'to': go.deeper
    }
}

Simple enough, right? Well, you realise that you need to modify the list in the nested dictionary so that it holds a single default item unique to each model being generated. No big deal, you just make a copy of the dictionary and modify the required data structure.

just_some_data = {
    'nothing': 'to',
    'see': 'here',
    'we': {
        'need': [],
        'to': go.deeper
    }
}

# Loop to generate models
for _ in range(a_few_thousand):
    copied_data = just_some_data.copy() # This will totally work
    copied_data['we']['need'].append(random_cool_data)
    generate_model_with_default_data(copied_data)

You go through it once, thinking all is well, and run the script. You notice that each model takes longer to create than the last one. Wondering what’s wrong, you stop the script and have a look at the models being created.

To your horror you realise that for some inexplicable reason the copy() did not work! The list in the nested dictionary of the last model created has all the unique values of all the previous models created.

So off you go, frantically searching Google and StackOverflow for answers. And then you are hit by the fact that dict.copy() is a shallow copy. When copy() was called on the dictionary, you got a new top-level dictionary, but its values were the very same objects the original held. The nested structures were still references to the structures in the original dictionary. So each time you appended to the inner list, the item was being appended to the one list shared by the original and every copy.
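To see the sharing for yourself, here is a small sketch using a cut-down version of the dictionary above. The string 'deeper' stands in for go.deeper, and the result variable names are made up purely for illustration:

```python
# Cut-down stand-in for the default data; 'deeper' replaces go.deeper.
just_some_data = {
    'nothing': 'to',
    'we': {'need': [], 'to': 'deeper'},
}

copied_data = just_some_data.copy()

# The top-level dict really is a new object...
top_level_is_new = copied_data is not just_some_data

# ...but the nested dict is the very same object in both.
nested_is_shared = copied_data['we'] is just_some_data['we']

# Appending through the copy therefore shows up in the original too.
copied_data['we']['need'].append('unique item')
original_list = just_some_data['we']['need']

# Rebinding a top-level key, by contrast, only affects the copy.
copied_data['nothing'] = 'changed'
original_top_value = just_some_data['nothing']
```

Here top_level_is_new and nested_is_shared are both True, which is the whole problem in two lines: a new outer dictionary wrapped around the same old insides.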

After a bit of StackOverflow prospecting, you learn about the copy module in Python, which has the function deepcopy. As the name suggests, it copies the entire object, including nested structures. But keep in mind that you probably shouldn’t be deep copying highly nested structures in a tight loop, since every nested object has to be visited and copied, and that has a performance impact.
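A minimal sketch of the fixed loop might look like the following. It assumes a cut-down dictionary with 'deeper' standing in for go.deeper, runs three iterations instead of a few thousand, and uses the loop counter where random_cool_data would go. The generated list is only there so the results can be inspected:

```python
import copy

# Cut-down stand-in for the default data; 'deeper' replaces go.deeper.
just_some_data = {
    'nothing': 'to',
    'we': {'need': [], 'to': 'deeper'},
}

generated = []  # hypothetical collection, just to inspect the results
for i in range(3):  # three iterations instead of a few thousand
    # deepcopy recursively copies the nested dict and list as well
    copied_data = copy.deepcopy(just_some_data)
    copied_data['we']['need'].append(i)  # stand-in for random_cool_data
    generated.append(copied_data)

# Each model now has its own list with exactly one item in it.
lists_per_model = [d['we']['need'] for d in generated]

# The shared defaults are left alone.
original_untouched = just_some_data['we']['need'] == []
```

If the deep copy turns out to be too slow for your data, one alternative is a small function that builds a fresh dictionary on every call, so there is nothing shared to begin with.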

So after fixing the script and running it for real this time, you learn a valuable lesson about Python and its data structures.

Disclaimer: This happened to a friend of mine, definitely not me… ha ha.