Saturday 23 July 2011

Removing Duplicates

OK the next task is to remove successive duplicates from a sequence.  There is a Clojure function distinct that nearly does this, but it removes all duplicates, not just successive ones:-

user=> (distinct "Mississippi")
(\M \i \s \p)


My first thought is that I can build this from other existing functions.  First, partition the sequence into pairs of elements, that is to say elements 1 and 2, then elements 2 and 3, then elements 3 and 4 etc.:

user=> (partition 2 1 "Mississippi")
((\M \i) (\i \s) (\s \s) (\s \i) (\i \s) (\s \s) (\s \i) (\i \p) (\p \p) (\p \i))


Now filter out the ones where the elements are the same - ie (first %) is not equal to (second %) - because these are the ones where there is a duplicated element:

user=> (filter #(not (= (first %) (second %))) *1)
((\M \i) (\i \s) (\s \i) (\i \s) (\s \i) (\i \p) (\p \i))


(In the Clojure REPL, *1 returns the result of the previous evaluation)

And now put the remaining pairs back into a sequence by getting the first element of each pair with a map:

user=> (map first *1)
(\M \i \s \i \s \i \p)


Nope - of course this misses the final element of the original list, which was never the first element of one of the pairs.

Incidentally we can turn this sequence back into its original string form as follows:

user=> (apply str *1)
"Misisip"


Anyway, perhaps this will require a purpose built function.  Here is the function that eliminates successive duplicates:

(defn deduplicate [sq]
  (if (empty? sq)
  '()
  (let [x (first sq)
        xs (rest sq)]
    (cons x (deduplicate (drop-while #(= x %) xs))))))


This is a basic pattern to transfer one sequence to another via a recursive function: the part that does the work is the drop-while which removes any elements from the rest of the list that match the element being processed.

To drop this into an existing line of code you would wrap the definition in a (do....) construction and let the value you've just defined be returned as the value of the expression.

However, the rules on 4Clojure say that you can't define any new functions.  Therefore it has to be an anonymous function.  How to write an anonymous recursive function? Hmm.  Here I'm going to use the loop/recur construction, so the function is iterative rather than recursive.  It contains a GOTO rather than a CALL to do the loop.  I pass two parameters, one the source sequence that the elements are coming from and the other a destination sequence which is where they will be accumulated.  Yes anyway it looks like this:

(fn [sq]
  (loop [source sq destination '()]
    (if (empty? source)
        (reverse destination)
        (let [x (first source]
              xs (rest source)]
          (recur
            (drop-while #(= x %) xs)
            (cons head destination))))))


Using a recur you build the destination list the wrong way round so I return the reverse thereof.

No comments:

Post a Comment