Clojure Programming Cookbook
上QQ阅读APP看书,第一时间看更新

Clojure collections and their basic functions

There are four collection types in Clojure:

  • Lists
  • Vectors
  • Maps
  • Sets

In this recipe, we will describe what these types are and some basic functions for them.

Getting ready

You only need REPL described in the recipe Repl up! in Chapter 1, Live Programming with Clojure, and no additional libraries. Start REPL so that you can review the sample code in this recipe.

How to do it...

We will learn collection types in Clojure including lists, vectors, maps, and sets. We will learn how to create them and use basic functions for them.

Lists

Lists are commonly used in Lisp. Clojure also supports the list data type. Lists are internally implemented as a linked list. To create a list, begin with quote (') and then enclose elements with (). If you want to create an empty list, use ' (), or (list):

'("A Study in Scarlet" 
"The Sign of the Four" 
"The Hound of the Baskervilles" 
"The Valley of Fear") 
;;=> ("A Study in Scarlet" "The Sign of the Four" "The Hound of the Baskervilles" "The Valley of Fear") 
'() 
;;=> () 
(list) 
;;=> () 

There are three basic functions, car, cdr, and cons, to manipulate lists in traditional Lisp. Instead of them, Clojure uses first, rest, and conj. Let's see how they work:

  • first takes the head element from a list:
          (first '(1 2 3 4 5)) 
          ;;=> 1 
          (first '()) 
          ;;=> nil 
          (first nil) 
          ;;=> nil 
    
  • The first of an empty list or nil returns nil.
  • rest returns a list after the first element:
          (rest '(1 2 3 4 5)) 
          ;;=> (2 3 4 5) 
          (rest '()) 
          ;;=> () 
          (rest nil) 
          ;;=> () 
    
  • The rest of an empty list or nil returns an empty list.
  • conj adds elements at the beginning:
          (conj '(2 3 4 5) 1) 
          ;;=> (1 2 3 4 5) 
    
  • conj is a variadic function and can take multiple elements as its arguments:
          ```clj 
          (conj '("d" "e" "f") "c" "b" "a") 
          ;;=> ("a" "b" "c" "d" "e" "f") 
    

Vectors

A vector also represents a sequence of elements, like list, but it supports efficient access to its elements by index.

To create a vector, enclose its elements with []. To create an empty vector, use [] or (vector):

["A Study in Scarlet" 
"The Sign of the Four" 
"The Hound of the Baskervilles" 
"The Valley of Fear"] 
;;=> ["A Study in Scarlet" "The Sign of the Four" "The Hound of the Baskervilles" "The Valley of Fear"] 
[] 
;;=> [] 
(vector) 
;;=> [] 

We can use first, rest, and conj with vectors. Let's begin with first and rest:

(first [1 2 3 4 5]) 
;;=> 1 
(first []) 
;;=> nil 
(rest [1 2 3 4 5]) 
;;=> (2 3 4 5) 

The rest of the vector looks like a returning list, but it is not list. It is ChunkedSeq:

(class (rest [1 2 3 4 5])) 
;;=> clojure.lang.PersistentVector$ChunkedSeq 

To add elements to a vector, use the conj function. conj appends elements to the vector:

(conj [1 2 3 4] 5) 
;;=> [1 2 3 4 5] 

Maps

Maps are associative lists, and they can be accessed efficiently with keys and return their values.

To create a map, enclose key-value pairs with {}:

{:name "Arthur Ignatius Conan Doyle" 
 :born "22-May-1859" 
 :died "7-July-1930" 
 :occupation ["novelist" "short story writer" "poet" "physician"] 
 :nationality "scotish" 
 :citizenship "United Kingdom" 
 :genre ["Detective fiction", "fantasy", "science fiction", "historical novels", "non-fiction"] 
 :notable-works ["Stories of Sherlock Holmes" "The Lost World"] 
 :spouse ["Louisa Hawkins" "Jean Leckie"] 
 :no-of-children 5 
 } 
;;=> {:genre ["Detective fiction" "fantasy" "science fiction" "historical novels" "non-fiction"], :occupation ["novelist" "short story writer" "poet" "physician"], :name "Arthur Ignatius Conan Doyle", :no-of-children 5, :nationality "scotish", :died "7-July-1930", :spouse ["Louisa Hawkins" "Jean Leckie"], :notable-works ["Stories of Sherlock Holmes" "The Lost World"], :citizenship "United Kingdom", :born "22-May-1859"} 

To create an empty map, use {} or (hash-map):

{} 
;;=> {} 
(hash-map) 
;;=> {} 

We can use first, rest, and conj for maps, just as for lists or vectors:

(first {:a 1, :b 2, :c 3 :d 4 :e 5}) 
;;=> [:a 1] 
(rest {:a 1, :b 2, :c 3 :d 4 :e 5}) 
;;=> ([:b 2] [:c 3] [:d 4] [:e 5]) 
(conj {:a 1, :b 2, :c 3 :d 4 } [:e 5]) 
;;=> {:a 1, :b 2, :c 3, :d 4, :e 5} 
(conj {:a 1, :b 2, :c 3 :d 4 } [:d 5] [:e 6]) 
;;=> {:a 1, :b 2, :c 3, :d 5, :e 6} 
(conj {:a 1, :b 2, :c 3 :d 4 } [:d 5]) 
;;=> {:a 1, :b 2, :c 3, :d 5} 

Maps are often used as dictionaries. get, dissoc, and assoc are more commonly used with maps.

get looks up a map entry by its key and returns its value. If get cannot find the key in a collection, it returns nil.

If you don't want to obtain nil in such a case, specify the value you want when the key does not exist:

(get {:a 1, :b 2, :c 3 :d 4 :e 5} :c) 
;;=> 3 
(get {:a 1, :b 2, :c 3 :d 4 :e 5} :f) 
;;=> nil 
(get {:a 1, :b 2, :c 3 :d 4 :e 5} :f :not-found) 
;;=> :not-found 

assoc adds key-value pairs to maps. If there is already a key-value pair in the map, it replaces the value:

(assoc {:a 1, :b 2, :c 3 :d 4} :e 5) 
;;=> {:a 1, :b 2, :c 3, :d 4, :e 5} 
(assoc {:a 1, :b 2, :c 3 :d 4} :c 5) 
;;=> {:a 1, :b 2, :c 5, :d 4} 

assoc can use multiple key-value pairs as its arguments:

(assoc {:a 1, :b 2, :c 3 :d 4} :e 5 :f 6) 
;;=> {:a 1, :b 2, :c 3, :d 4, :e 5, :f 6} 

The dissoc removes key-value pairs from a map:

(dissoc {:a 1, :b 2, :c 3 :d 4} :c) 
;;=> {:a 1, :b 2, :d 4} 
(dissoc {:a 1, :b 2, :c 3 :d 4} :c) 
;;=> {:a 1, :b 2, :d 4} 
(dissoc {:a 1, :b 2, :c 3 :d 4} :e) 
;;=> {:a 1, :b 2, :c 3, :d 4} 

Sets

Sets maintain the uniqueness of their elements. To create a set, being with a hash (#) and then enclose elements with {}. If you want to create an empty set, use #{} or (hash-set):

#{"apple" "orange" "banana" "peach" "strawberry"} 
;;=> #{"peach" "apple" "banana" "orange" "strawberry"} 
#{} 
;;=> #{} 
(hash-set) 

We can use first, rest, and conj for sets. Sets do not keep the order of elements as you input data, so first or rest may look different to what you expected:

(first #{:a :b :c :d :e}) 
;;=> :e 
(rest #{:a :b :c :d :e}) 
;;=> (:c :b :d :a) 
(conj #{:a :b :c :d} :e) 
;;=> #{:e :c :b :d :a} 
(conj #{:a :b :c :d :e} :e) 
;;=> #{:e :c :b :d :a} 

The contains? function examines whether the element exists in a set, whereas disj eliminates the element from the set:

(contains? #{:a :b :c :d :e} :c) 
;=> true 
(contains? #{:a :b :c :d :e} :f) 
;;=> false 
(disj #{:a :b :c :d :e} :c) 
;;=> #{:e :b :d :a} 
(disj #{:a :b :c :d :e} :c :d) 
;;=> #{:e :b :a} 

How it works...

All collection types in Clojure are persistent and immutable. A persistent data structure preserves the previous versions of itself when it is modified. Clojure has efficient implementation to create modified versions of collections so it doesn't simply copy the entire data structure, but shares the original data.

Clojure collections

All collection types implement the clojure.lang.Seqable interface and its seq method returns clojure.lang.ISeq. ISeq has the cons, first, more, and next functions:

public interface Seqable { 
    ISeq seq(); 
} 
public interface ISeq extends IPersistentCollection { 
    Object first(); 
    ISeq next(); 
    ISeq more(); 
    ISeq cons(Object o); 
} 

ISeq implements IPersistentCollection and it has count, cons, empty, and equiv:

public interface IPersistentCollection 
    extends Seqable { 
    int count(); 
    IPersistentCollection cons(Object o); 
    IPersistentCollection empty(); 
    boolean equiv(Object o); 
    } 

Differences between lists and vectors

Lists and sets represent sequences of elements, but there are some differences:

  • Lists are clojure.lang.PersistentList and vectors are clojure.lang.PersistentVector.
  • Lists are implemented as a linked list. Vectors are implemented using a tree structure, and each node has 32 elements.
  • Vectors are efficient for index access. Index access for lists takes O(n), whereas it takes O(log32(n)) for vectors.
  • That is why, with conj, the order of the results is different in lists and vectors. conj puts an element at the beginning of lists, but it puts an element at the end of vectors:
      (conj '(2 3 4 5) 1) 
      ;;=> (1 2 3 4 5) 
      (conj [1 2 3 4] 5) 
      ;;=> [1 2 3 4 5] 

The following code tests index access for a list and a vector. This test measures access to the last element in a collection consisting of 1 million elements. The test loops 100 times. Access to the list takes half a second, but access to the vector only takes 0.1 milliseconds:

(let [n 1000000 iter 100 
      l (into () (repeat n "a")) 
      v (into [] (repeat n "a")) 
      ] 
  (time (dotimes [_ iter](nth l (dec n) ))) 
  (time (dotimes [_ iter](nth v (dec n) ))) 
  ) 
"Elapsed time: 532.623841 msecs" 
"Elapsed time: 0.167739 msecs" 

Reversing the order of vectors is faster than reversing the order of lists:

(do 
  (time (reverse (into '() (range 100000)))) 
  (time (reverse (into [] (range 100000)))) 
  nil 
  ) 
"Elapsed time: 60.67435 msecs" 
"Elapsed time: 26.431954 msecs" 
nil 

The reason is that clojure.lang.PersistentVector implements clojure.lang.Reversible but clojure.lang.PersistentList does not.

Clojure is immutable

Since Clojure functions for collections are immutable, it does not modify the original collection. It is different from Python or Ruby. In the following Python code, insert modifies the original list:

l = [2, 3, 4, 5] 
l.insert(0, 1) 
l 
#[1, 2, 3, 4, 5] 

The next Ruby code also modifies the original list:

$ irb 
irb(main):001:0> letters = Array.new([2,3,4]) 
=> [2, 3, 4] 
irb(main):002:0> letters.insert(0, 1) 
=> [1, 2, 3, 4] 

But Clojure does not change the original data:

(def l '(2 3 4 5)) 
#'collection.core/l 
(conj '(2 3 4 5) 1) 
;;=> (1 2 3 4 5) 
l 
;;=> (2 3 4 5) 

Using immutable data ensures that arguments are unmodified after returning from other function calls. On the other hand, calling functions causes unexpected modification of arguments, which causes bugs that are difficult to find.

There's more...

Here, we will show you some techniques for manipulating collections.

Converting collections between different types

into converts types in collections. The following code converts between a vector and a list:

(into [] '(1 2 3 4 5)) 
;;=> [1 2 3 4 5] 
(into '() [1 2 3 4 5]) 
;;=> (5 4 3 2 1) 

In the following code, into converts between a map and a vector:

(into {} [[:a 1][:b 2] [:c 3][:d 4][:e 5]]) 
;;=> {:a 1, :b 2, :c 3, :d 4, :e 5} 
(into [] {:a 1 :b 2 :c 3 :d 4 :e 5}) 
;;=> [[:a 1] [:b 2] [:c 3] [:d 4] [:e 5]] 

Finally, this into converts between a set and a vector:

(into [] #{:a :b :c :d :e}) 
;;=> [:e :c :b :d :a] 
(into #{} [:a :b :c :d :e]) 
#{:e :c :b :d :a} 

Using distinct

distinct is similar to into with sets. It eliminates duplicated elements:

(distinct ["apple" "orange" "melon" "grape" "orange" "pinapple"]) 
;;=> ("apple" "orange" "melon" "grape" "pinapple") 
(into #{} ["apple" "orange" "melon" "grape" "orange" "pinapple"]) 
;;=> #{"apple" "pinapple" "orange" "grape" "melon"} 

Unlike set, using distinct does not change the data type.