Pretty common question when dealing with lists (arrays, vectors or whatever is used in you favorite programming language) is to find duplicates and list the as a hashmap.
Example (Python). From the list like this
lst = [5,5,4,4,4,3]
get a dict
dupls = {5 => [0,1], 4 => [2,3,4]}
The question was raised on the SO here and got some standard solutions which basically do the job with O(n) time.
I wanted to suggest another solution, which is slower — O(n*log(n)) but it allows generating an interesting structure.
The solution for original problem is this:
def dupl_rbespal(c): alreadyAdded = False dupl_c = dict() sorted_ind_c = sorted(range(len(c)), key=lambda x: c[x]) # sort incoming list but save the indexes of sorted items for i in xrange(len(c) - 1): # loop over indexes of sorted items if c[sorted_ind_c[i]] == c[sorted_ind_c[i+1]]: # if two consecutive indexes point to the same value, add it to the duplicates if not alreadyAdded: dupl_c[ c[sorted_ind_c[i]] ] = [sorted_ind_c[i], sorted_ind_c[i+1]] alreadyAdded = True else: dupl_c[ c[sorted_ind_c[i]] ].append( sorted_ind_c[i+1] ) else: alreadyAdded = False return dupl_c
Now what I want to get is a following structure:
From the list like this
lst = [5,5,4,4,4,3]
get a dict
dupls = {0 => 1, 2 => 3, 3 => 4 }
That is I want to get a dict where key points to the next index with the same value.
With a slight change of the above algorithm we get:
def nextDuplicates(c): dupl_c = dict() sorted_ind_c = sorted(range(len(c)), key=lambda x: c[x]) for i in xrange(len(c) - 1): if c[sorted_ind_c[i]] == c[sorted_ind_c[i+1]]: dupl_c[ sorted_ind_c[i] ] = sorted_ind_c[i+1] return dupl_c
So it’s even shorter than the original one.