Expanding Reducers
James Uther
2013-07-31
When playing with a new bit of language, it can be helpful to restrict the problem space to an old, well-understood algorithm. For me at least, learning one thing at a time is easier! For this post, it'll be prime sieves, and I'll be exploring Clojure reducers.
A quick recap: the sieve of Eratosthenes is a not-maximally-non-optimal way of finding primes. It's usually expressed as follows:
To find primes below n: generate a list of the integers from 2 up to (but not including) n
while the list is not empty:
    take the head of the list:
        add it to the output
        remove all numbers evenly divisible by it from the list
In Clojure, something like:
(defn sieve
  ([n] (sieve [] (range 2 n)))
  ([primes xs]
   (if-let [prime (first xs)]
     (recur (conj primes prime)
            (remove #(zero? (mod % prime)) xs))
     primes)))
(sieve 10)
;= [2 3 5 7]
Which is fine, but I'd like it lazy so I only pay for what I use, and I can use as much as I'm willing to pay for. Let's look at lazy sequences. Luckily for us, there is an example of exactly this in the lazy-seq documentation, which we slightly modify like so:
(defn lazy-sieve [s]
  (cons (first s)
        (lazy-seq
          (lazy-sieve (remove #(zero? (mod % (first s))) (rest s))))))
(defn primes []
  (lazy-seq (lazy-sieve (iterate inc 2))))
(take 5 (primes))
;= (2 3 5 7 11)
So now we have a nice generic source of primes that grows only as we take more. But is there another way?
A few months ago Rich Hickey introduced reducers. By turning the concept of 'reducing' inside out, the new framework allows a parallel reduce (fold) in some circumstances, though that doesn't apply here. But let's see if we can build a different form of sieve using the new framework. First a quick overview (cribbing from the original blog post):
Collections are now _reducible_, in that they implement a `reduce` protocol. `filter`, `map`, etc. are implemented as functions that can be applied by a reducible to itself to return another reducible, but lazily, and possibly in parallel. So in the example below we have a reducible (a vector) that maps `inc` to itself to return a reducible, which is then wrapped with a filter on `even?` to return a further reducible, which `reduce` then collects with `+`.
(require '[clojure.core.reducers :as r])
; We'll be referring to r here and there – just remember it's the
; clojure.core.reducers namespace
(reduce + (r/filter even? (r/map inc [1 1 1 2])))
;= 6
These are composable, so we can build 'recipes'.
;;red is a reducer awaiting a collection
(def red (comp (r/filter even?) (r/map inc)))
(reduce + (red [1 1 1 2]))
;= 6
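One detail worth spelling out is the order in which a composed recipe applies its steps. In the sketch below (the names `map-then-filter` and `filter-then-map` are made up for illustration), each curried reducer function takes a collection, so `comp` hands the collection to the rightmost step first:

```clojure
(require '[clojure.core.reducers :as r])

;; comp applies right to left: the collection goes through r/map first,
;; and the result is then filtered.
(def map-then-filter (comp (r/filter even?) (r/map inc)))
(reduce + (map-then-filter [1 2 3]))
;= 6   ; inc gives 2 3 4, keeping the evens leaves 2 and 4

;; Swapping the arguments to comp swaps the order of the steps.
(def filter-then-map (comp (r/map inc) (r/filter even?)))
(reduce + (filter-then-map [1 2 3]))
;= 3   ; only 2 is even, and inc turns it into 3
```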
`into` uses reduce internally, so we can use it to build collections instead of reducing:
(into [] (r/filter even? (r/map inc [1 1 1 2])))
;= [2 2 2]
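For contrast, here is a rough sketch of the parallel fold mentioned earlier. It assumes the source is a vector, which is one of the collections `r/fold` knows how to split up; `xs` and `recipe` are names made up for this example.

```clojure
(require '[clojure.core.reducers :as r])

(def xs (vec (range 100000)))
(def recipe (comp (r/filter even?) (r/map inc)))

;; r/fold may split the vector into chunks, reduce them in parallel and
;; combine the partial sums with +; reduce walks the vector sequentially.
(= (r/fold + (recipe xs))
   (reduce + (recipe xs)))
;= true
```

Because `+` is associative and `(+)` returns 0, it can serve as both the reducing and the combining function. A lazy seq is not split up this way (`r/fold` falls back to a plain reduce), which is why fold is of no help for the sieves in this post.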
So here's the core of `reducer`, which is described as follows:
> Given a reducible collection, and a transformation function `xf`, returns a reducible collection, where any supplied reducing `fn` will be transformed by `xf`. `xf` is a function of reducing `fn` to reducing `fn`.
(defn reducer
  ([coll xf]
   (reify clojure.core.protocols/CollReduce
     (coll-reduce [_ f1 init]
       (clojure.core.protocols/coll-reduce coll (xf f1) init)))))
And we can then use that to implement mapping like so:
(defn mapping [f]
  (fn [f1]
    (fn [result input]
      (f1 result (f input)))))
(defn rmap [f coll] (reducer coll (mapping f)))
(reduce + 0 (rmap inc [1 2 3 4]))
;= 14
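Filtering can be sketched in the same style. The names `filtering` and `rfilter` are chosen here to mirror `mapping` and `rmap`, and the snippet uses the library's own `r/reducer` so that it runs by itself:

```clojure
(require '[clojure.core.reducers :as r])

;; Transforms a reducing fn f1 into one that only passes on inputs
;; satisfying pred; other inputs leave the accumulated result untouched.
(defn filtering [pred]
  (fn [f1]
    (fn [result input]
      (if (pred input)
        (f1 result input)
        result))))

(defn rfilter [pred coll] (r/reducer coll (filtering pred)))

(reduce + 0 (rfilter even? [1 2 3 4]))
;= 6
```

The key point is that a filter never touches the collection: it decides, per input, whether to call the downstream reducing function or to return the result unchanged.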
Fine. So what about sieves? One thought is we could build up a list of composed filters, built as new primes are found (see the `lazy-seq` example above). But there's no obvious place to do the building, as applying the reducing functions is left to the reducible implementation. Another possibility is to introduce a new type of reducing function, the 'progressive-filter', which keeps track of past finds and can filter against them.
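(defn prog-filter [f]
  (fn [f1]
    ;; flt holds the values accepted so far. Creating it here, inside the
    ;; transformer, gives every reduction its own fresh state, so the same
    ;; reducible can be reduced more than once.
    (let [flt (atom [])]
      (fn [result input]
        (if (not-any? #(f input %) @flt)
          (do (swap! flt conj input)
              (f1 result input))
          result)))))
(defn progressive-filter [f coll]
  (reducer coll (prog-filter f)))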
We then reduce with a filtering function of two arguments: the current candidate and one of the previously found primes (see the #(f input %) bit above).
(into [] (progressive-filter #(zero? (mod %1 %2)) (range 2 10)))
;= [2 3 5 7]
It's nicely lazy, so we can use iterate to generate integers and take only a few (`r/take`, as it's operating on a reducer):
(into [] (r/take 5 (progressive-filter #(zero? (mod %1 %2)) (iterate inc 2))))
;= [2 3 5 7 11]
Or even:
(def primes
  (progressive-filter #(zero? (mod %1 %2)) (iterate inc 2)))
(into [] (r/take 5 primes))
;= [2 3 5 7 11]
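What lets a reduction stop on an infinite source is early termination: a reducing function can wrap its result in `reduced` to signal that no more input is wanted. This appears to be the mechanism `r/take` relies on. A minimal sketch, assuming Clojure 1.5 or later (where `reduced` was introduced) and a made-up helper name `take-below`:

```clojure
;; Collects inputs until one reaches limit, then stops the reduction
;; by returning the accumulated vector wrapped in `reduced`.
(defn take-below [limit coll]
  (reduce (fn [acc x]
            (if (< x limit)
              (conj acc x)
              (reduced acc)))
          []
          coll))

(take-below 6 (iterate inc 2))
;= [2 3 4 5]
```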
You get the idea.