Certain calls in a Python for loop sometimes execute twice per iteration

I have run into an extremely bizarre problem in Python 2.7.9, involving the following loop:

for key,value in master_lines.iteritems():
   value = parse_line(value)
   if conflict_dict.get(key):
      if len(conflict_dict[key]) > 0:
         conflictcounter += 1

Specifically, the two lines that modify value are being executed twice per loop iteration, so the final value written at the end of this stanza is malformed and contains a lot of extra information. Not every key/value pair is affected: some are processed with no problem at all, then two or three in a row are double-processed, and then the next ten are fine. Note that the writer.writerow(value) call does not appear to be affected, because the corruption in the output file is within individual lines; I never get multiple copies of the same line.

I should mention that this code is running against a massive dictionary (200,000+ entries), and that the problematic behavior doesn't appear to begin until I reach at least the 100,000th record. The behavior is completely consistent across runs: the same particular lines are affected every time.

I've tried all of the approaches I know of to iterate through the dictionary (.iteritems(), .iterkeys(), .items(), for key in dict:, etc.) and get the same weird results no matter which technique I use.

Any thoughts folks have would be greatly appreciated!

1 answer

  • answered 2017-01-11 14:21 grepe

    A dictionary in Python is an unordered collection, and you are modifying its contents while iterating over it. Try this:

    >>> a = {'b': [1, 2, 3], 'c': [4, 5, 6]}
    >>> for k, v in a.iteritems():
    ...     v.insert(0, k)
    ...
    >>> print a
    {'c': ['c', 4, 5, 6], 'b': ['b', 1, 2, 3]}

    Depending on the implementation, the order in which dictionary items are processed need not match the order in which you added them. And when you change the dictionary while iterating over it, nothing guarantees that the order stays the same, so an element you have just processed may also become the "next element to process".
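
    To make the distinction between changing a value and changing the dictionary itself concrete, here is a minimal sketch. The dictionary and the names visits and error_message are made up for illustration, and .items() is used instead of .iteritems() so the snippet runs on both Python 2 and 3. It counts how often each key is visited while the values are mutated in place, then tries to add a key during iteration, which CPython rejects with a RuntimeError:

```python
# Hypothetical data, shaped like the example above.
data = {'b': [1, 2, 3], 'c': [4, 5, 6]}

# Case 1: mutate each value in place; the set of keys is left untouched.
visits = {}
for key, value in data.items():
    value.insert(0, key)
    visits[key] = visits.get(key, 0) + 1

# Case 2: add a key while iterating; CPython notices the size change.
error_message = None
try:
    for key in data:
        data[key + '_copy'] = list(data[key])
except RuntimeError as exc:
    error_message = str(exc)

print(visits)
print(error_message)
```

    In this small case each key is visited once in the first loop, and the second loop stops with "dictionary changed size during iteration". This says nothing about what parse_line does to your values; it only shows which kind of change the interpreter itself objects to.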

    It's hard to guess what's going on without seeing your actual dictionary contents and going through the implementation of Python's dict, but the correct way to do this is to work on a copy of your data. To copy a list, use new_item = old_item[:], like this:

    >>> a = {'b': [1, 2, 3], 'c': [4, 5, 6]}
    >>> a2 = {}
    >>> for k, v in a.iteritems():
    ...     v2 = v[:]          # copy the list so the original stays untouched
    ...     v2.insert(0, k)
    ...     a2[k] = v2
    ...
    >>> print a
    {'c': [4, 5, 6], 'b': [1, 2, 3]}
    >>> print a2
    {'c': ['c', 4, 5, 6], 'b': ['b', 1, 2, 3]}

    Anyway, if you are processing dictionaries with over two hundred thousand items, you are probably doing something wrong.