# Find indexes of duplicate items in python

Pretty common question when dealing with lists (arrays, vectors or whatever is used in you favorite programming language) is to find duplicates and list the as a hashmap.
Example (Python). From the list like this

`lst = [5,5,4,4,4,3]`

get a dict

`dupls = {5 => [0,1], 4 => [2,3,4]} `

The question was raised on the SO here and got some standard solutions which basically do the job with O(n) time.

I wanted to suggest another solution, which is slower — O(n*log(n)) but it allows generating an interesting structure.
The solution for original problem is this:

```def dupl_rbespal(c):
dupl_c = dict()
sorted_ind_c = sorted(range(len(c)), key=lambda x: c[x]) # sort incoming list but save the indexes of sorted items
for i in xrange(len(c) - 1): # loop over indexes of sorted items
if c[sorted_ind_c[i]] == c[sorted_ind_c[i+1]]: # if two consecutive indexes point to the same value, add it to the duplicates
dupl_c[ c[sorted_ind_c[i]] ] = [sorted_ind_c[i], sorted_ind_c[i+1]]
else:
dupl_c[ c[sorted_ind_c[i]] ].append( sorted_ind_c[i+1] )
else:
return dupl_c```

Now what I want to get is a following structure:

From the list like this

`lst = [5,5,4,4,4,3]`

get a dict

`dupls = {0 => 1, 2 => 3, 3 => 4 } `

That is I want to get a dict where key points to the next index with the same value.
With a slight change of the above algorithm we get:

```def nextDuplicates(c):
dupl_c = dict()
sorted_ind_c = sorted(range(len(c)), key=lambda x: c[x])
for i in xrange(len(c) - 1):
if c[sorted_ind_c[i]] == c[sorted_ind_c[i+1]]:
dupl_c[ sorted_ind_c[i] ] = sorted_ind_c[i+1]
return dupl_c```

So it’s even shorter than the original one.

This site uses Akismet to reduce spam. Learn how your comment data is processed.