Clojure — sequences (sequences)
Clojure is a dialect of Lisp, so it is not surprising that the list has a key place in the language. However, in contrast to the traditional dialects (CL or Scheme), instead of the classic dual-slot lists in Clojure uses the abstraction Sequence is a "logical list". In fact, this interface provides methods for working with immutable and lazy, possibly infinite sequences. This article describes the internal structure of these entities.
the
Let's start with a java interface. All the structures in Clojure implement clojure.lang.Seqable:
the
In the turn clojure.lang.ISeq defined as:
the
We can draw an analogy with the Iterable, respectively, ISeq an analogue of Iterator. The main difference is that the objects ISeq is immutable: its methods return a new instance (maybe even a different type) instead of changing the internal state of the object. All sequences in Clojure are both collections implements the legacy interface clojure.lang.IPersistentCollectionand through it Seqable:
the
Here we see a count — count the number of elements. For sequences, the complexity will be O(n). Method is empty returns an empty collection of the same type. For sequences returns (). Well, the method cons adds a new element.
To get the objects ISeq language feature seq. If the collection is empty, returns nil, if we have the object Seqable, then it is called method seq(), otherwise creates a special wrapper (adapter). Wrapper are supported for arrays, iterators, the java collections (including Map) and CharSequence (rows). Add your type in this list will not work — it "sewn" in java code (not even minutes). Of course, you shouldn't modify arrays (or java collections), if they created the sequence. So, the exception ConcurrentModificationException just it is forwarding up.
In Clojure for working with sequences, there are only three basic functions:
the
To add new elements instead of cons usually use the function conj. The difference between them is that the former always creates a new cell in a single list, but the result of the second specific for each type of collection item is added to the most appropriate place. For cons-cell this is the beginning of a list of vectors — the end, for sets, the order is not defined at all. It is important to note that if we applied to the collection seq, then we lose information about its type
the
Unfortunately, the names of methods in java interfaces and the function names do not always coincide: the conj corresponds to the cons and rest — to the more. On the other hand, know the names of the methods only need to write their collections, and this activity can hardly be attributed to common.
All standard functions are automatically called seq. For example, if we write (cons 1 [2 3]), created by cons-cell will actually refers not to the vector [1 2] and (seq [1 2]). The same trick occurs if the work of the other functions (map, filter, etc.) — they all work with sequences, though, and take as parameters the collection.
In fact, only lists and conses directly implement ISeq:
the
Therefore, when iteration through the list is not created any temporary objects. But when we run through the vector, each element creates one temporary instance of ISeq. In fact, with version 1.1, things are a little different, but more on that below.
There are more "obscure" interface, clojure.lang.IndexedSeqto implement the sequence for arrays and strings. It defines the method index() to get the number of the current element (i.e. at what point in the source collection is head).
the
In practice this interface is not used anywhere (an undocumented feature). However, to come up with a use for it really hard. By the way, the objects clojure.lang.APersistenVector$Seq also implement it. But the trouble here is that now seq for vectors returns instances of clojure.lang.PersistentVector$ChunkedSeq that do not already have such a dubious opportunity.
the
An important feature of sequences in Clojure is their potential laziness. This means that at the current time not the entire sequence is built, and its "tail" has not yet been created. In Clojure it's often used for in-line processing input / output.
Implemented lazy very simple. There is a class clojure.lang.LazySeq (which implements ISeq). Objects of this class have a reference to the function (instance of IFn), which at its call should return the actual new sequence (probably too lazy, though not necessarily). In any communication with the methods LazySeq is a synchronized call to the source function, the result is stored, and a reference to the function itself is reset (in order to allow the garbage collector to handle it). It is obvious that there is no way to get the element of the lazy sequence, not computing all the previous ones. If you need such behaviour you can use the macro delay.
the
To generate a lazy sequence in your code is ridiculously simple. Just put the code that calls the cons inside the macro lazy-seq. But, really, even to do that more often than not have most of the standard functions (with very few exceptions) return a collection that is lazy, doing the dirty work for us. Is there really and minor details.
So, by my calculation, a lazy sequence actually turns into a normal linked list, and if you keep a pointer to the first element, then there may be problems with memory. A classic example:
the
Here we are trying to find 20000000-th odd number. And it finds it, but that's just all the intermediate instances of ISeq remain in memory. Correct solution:
the
The rest never returns nil, instead it can return an empty sequence. If we want to nil you should use next.
the
This is done for more laziness. In fact, if the sequence is lazy, in the General case, we do not know whether it has items or not. To check this fact with next we need to calculate at least the first 2 elements: first, we discard, and the presence of the second will understand whether to return nil. Well, the calculation of 2 items — this is clearly not what we usually need.
If the lazy behavior is not required, it is possible to completely calculate the entire sequence using the function dorun and doall. The first returns the original sequence, the second is nil.
the
the
Lazy collections are a very powerful tool, but in practice we have often treated the entire collection as a whole, and lazy-seq introduces noticeable overhead. Therefore, in version 1.1 has been done important optimization chunked seqences. The point is simple — when processing lazy sequences we are not dealing with individual elements and their groups. With the eagerly calculates a number of elements that can "grow".
Guess what prints this code?
the
So, Clojure is defined in a special interface clojure.lang.IChunkedSeq. Describe the java part of this mechanism do not have much sense, so I let her go. The interface implements only a sequence of vectors. Well, as we have seen, the result of the function range.
The processing algorithm similar sequences:
the
For example, let's write our own implementation #(filter odd? %). Simple version:
the
Extended version (with support units):
the
Of course, the code was "a bit" more, and in the face of obvious duplication: we had to create two separate implementations. But such is the price of speed. Fortunately, most of the standard functions already support chunks in full. Note that the sequence may consist of blocks of different lengths (blocks do not stick together, but sometimes truncated, as in the example above), and does be a blend of standard elements and units.
What if we want the usual sequence to turn into block? It simply needs to be converted to a vector using the function vec. The reverse is a little harder:
the
It is expected that the function of reduce takes advantage of the block sequence. So, each block has a special method that performs a convolution in java a loop without creating temporary objects. Generally, reduce has other optimizations, all are interested can look at clojure.core.protocols.
the
Sequences in Clojure powerful and convenient. Their undoubted plus the fact that often you don't even have to think about such "trifles" as laziness or block processing — Clojure is trying to do it for us.
Article based on information from habrahabr.ru
the
sequences
Let's start with a java interface. All the structures in Clojure implement clojure.lang.Seqable:
the
public interface Seqable {
ISeq seq();
}
In the turn clojure.lang.ISeq defined as:
the
public interface ISeq extends IPersistentCollection {
Object first();
ISeq next();
ISeq more();
ISeq cons(Object o);
}
We can draw an analogy with the Iterable, respectively, ISeq an analogue of Iterator. The main difference is that the objects ISeq is immutable: its methods return a new instance (maybe even a different type) instead of changing the internal state of the object. All sequences in Clojure are both collections implements the legacy interface clojure.lang.IPersistentCollectionand through it Seqable:
the
public interface IPersistentCollection extends Seqable {
int count();
IPersistentCollection cons(Object o);
IPersistentCollection empty();
boolean equiv(Object o);
}
Here we see a count — count the number of elements. For sequences, the complexity will be O(n). Method is empty returns an empty collection of the same type. For sequences returns (). Well, the method cons adds a new element.
To get the objects ISeq language feature seq. If the collection is empty, returns nil, if we have the object Seqable, then it is called method seq(), otherwise creates a special wrapper (adapter). Wrapper are supported for arrays, iterators, the java collections (including Map) and CharSequence (rows). Add your type in this list will not work — it "sewn" in java code (not even minutes). Of course, you shouldn't modify arrays (or java collections), if they created the sequence. So, the exception ConcurrentModificationException just it is forwarding up.
In Clojure for working with sequences, there are only three basic functions:
the
first — take the first element of the sequence, or nil if it is empty;
rest to the "tail", i.e. the sequence without the first element.
cons — create a new immutable cons cells.
To add new elements instead of cons usually use the function conj. The difference between them is that the former always creates a new cell in a single list, but the result of the second specific for each type of collection item is added to the most appropriate place. For cons-cell this is the beginning of a list of vectors — the end, for sets, the order is not defined at all. It is important to note that if we applied to the collection seq, then we lose information about its type
the
(conj [1 2 3] 4) ;=> [1 2 3 4]
(conj (seq [1 2 3]) 4) ;=> '(4 1 2 3)
Unfortunately, the names of methods in java interfaces and the function names do not always coincide: the conj corresponds to the cons and rest — to the more. On the other hand, know the names of the methods only need to write their collections, and this activity can hardly be attributed to common.
All standard functions are automatically called seq. For example, if we write (cons 1 [2 3]), created by cons-cell will actually refers not to the vector [1 2] and (seq [1 2]). The same trick occurs if the work of the other functions (map, filter, etc.) — they all work with sequences, though, and take as parameters the collection.
In fact, only lists and conses directly implement ISeq:
the
(seq? '(1 2)) ;=> true
(seq? ()) ;=> true
(seq? (cons 1 nil)) ;=> true
(seq? (cons 1 [2 3])) ;=> true
(seq? [1 2]) ;=> false
(seq? {:a :b}) ;=> false
(seq? #{1 2}) ;=> false
(seq? nil) ;=> false
Therefore, when iteration through the list is not created any temporary objects. But when we run through the vector, each element creates one temporary instance of ISeq. In fact, with version 1.1, things are a little different, but more on that below.
There are more "obscure" interface, clojure.lang.IndexedSeqto implement the sequence for arrays and strings. It defines the method index() to get the number of the current element (i.e. at what point in the source collection is head).
the
(.index (rest (to-array [1 2 3 4]))) ;=> 1
(.index (rest (rest (to-array [1 2 3 4]))) ;=> 2
In practice this interface is not used anywhere (an undocumented feature). However, to come up with a use for it really hard. By the way, the objects clojure.lang.APersistenVector$Seq also implement it. But the trouble here is that now seq for vectors returns instances of clojure.lang.PersistentVector$ChunkedSeq that do not already have such a dubious opportunity.
the
Lazy-sequence (lazy sequences)
An important feature of sequences in Clojure is their potential laziness. This means that at the current time not the entire sequence is built, and its "tail" has not yet been created. In Clojure it's often used for in-line processing input / output.
Implemented lazy very simple. There is a class clojure.lang.LazySeq (which implements ISeq). Objects of this class have a reference to the function (instance of IFn), which at its call should return the actual new sequence (probably too lazy, though not necessarily). In any communication with the methods LazySeq is a synchronized call to the source function, the result is stored, and a reference to the function itself is reset (in order to allow the garbage collector to handle it). It is obvious that there is no way to get the element of the lazy sequence, not computing all the previous ones. If you need such behaviour you can use the macro delay.
the
(nth (map #(print % "") (iterate inc 0)) 5) ;=> 1 2 3 4 5
@(nth (map #(delay (print % "")) (iterate inc 0)) 5) ;=> 5
To generate a lazy sequence in your code is ridiculously simple. Just put the code that calls the cons inside the macro lazy-seq. But, really, even to do that more often than not have most of the standard functions (with very few exceptions) return a collection that is lazy, doing the dirty work for us. Is there really and minor details.
So, by my calculation, a lazy sequence actually turns into a normal linked list, and if you keep a pointer to the first element, then there may be problems with memory. A classic example:
the
(def all-odds (filter odd? (range)))
(println (nth all-odds 20000000))
Here we are trying to find 20000000-th odd number. And it finds it, but that's just all the intermediate instances of ISeq remain in memory. Correct solution:
the
(defn make-all-odds [] (filter odd? (range)))
(println (nth (make-all-odds) 2000000))
The rest never returns nil, instead it can return an empty sequence. If we want to nil you should use next.
the
(first '(1 2 3)) ;=> 1
(rest '(1 2 3)) ;=> '(2 3)
(rest '(1)) ;=> ()
(rest ()) ;=> ()
(next '(1)) ;=> nil
(next ()) ;=> nil
This is done for more laziness. In fact, if the sequence is lazy, in the General case, we do not know whether it has items or not. To check this fact with next we need to calculate at least the first 2 elements: first, we discard, and the presence of the second will understand whether to return nil. Well, the calculation of 2 items — this is clearly not what we usually need.
If the lazy behavior is not required, it is possible to completely calculate the entire sequence using the function dorun and doall. The first returns the original sequence, the second is nil.
the
(dorun
(map #(println % "") (range 5)));=> 1 2 3 4 5
the
Block sequence (chunked sequences)
Lazy collections are a very powerful tool, but in practice we have often treated the entire collection as a whole, and lazy-seq introduces noticeable overhead. Therefore, in version 1.1 has been done important optimization chunked seqences. The point is simple — when processing lazy sequences we are not dealing with individual elements and their groups. With the eagerly calculates a number of elements that can "grow".
Guess what prints this code?
the
(first (map #(print % "") (range 100)))
Response
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
So, Clojure is defined in a special interface clojure.lang.IChunkedSeq. Describe the java part of this mechanism do not have much sense, so I let her go. The interface implements only a sequence of vectors. Well, as we have seen, the result of the function range.
The processing algorithm similar sequences:
the
-
the
- check whether the source IChunkedSeq (chunked-seq?); the
- if not, then perform the usual version of the algorithm (piecemeal processing); the
- if Yes, then take a sequence of the first block (the chunk-first); the
- watch size (the chunk-size); the
- create a new block, going to put the generated elements (the chunk-buffer); the
- actually perform useful work, the elements are added to the block (the chunk-append); the
- wrap our new unit in ChunkedCons (the chunk-cons).
For example, let's write our own implementation #(filter odd? %). Simple version:
the
(defn filter-odd-1 [a]
(lazy-seq
(when-let [s (seq a)]
(let [f (first s)
r (rest s)]
(if (odd? f)
(cons f (filter-odd-1 r))
(filter-odd-1 r))))))
Extended version (with support units):
the
(defn filter-odd-2 [a]
(lazy-seq
(when-let [s (seq a)]
(if (chunked-seq? s)
(let [f (chunk-first s)
c (count f)
b (chunk-buffer c)]
(dotimes [i c]
(let [x (nth f i)]
(when (odd? x)
(chunk-append b x))))
(chunk-cons (chunk b) (filter-odd-2 (chunk-rest s))))
(let [f (first s)
r (rest s)]
(if (odd? f)
(cons f (filter-odd-2 r))
(filter-odd-2 r)))))))
Of course, the code was "a bit" more, and in the face of obvious duplication: we had to create two separate implementations. But such is the price of speed. Fortunately, most of the standard functions already support chunks in full. Note that the sequence may consist of blocks of different lengths (blocks do not stick together, but sometimes truncated, as in the example above), and does be a blend of standard elements and units.
What if we want the usual sequence to turn into block? It simply needs to be converted to a vector using the function vec. The reverse is a little harder:
the
(defn seq1 [s]
(lazy-seq
(when-let [x (seq s)]
(cons (first x) (seq1 (rest x))))))
(first (map println (seq1 (range 1000)))) ;=> 1
Alternative solution
(defn seq1 [^clojure.lang.ISeq s]
(reify clojure.lang.ISeq
(first [_] (.first s))
(more [_] (seq1 (.more s)))
(next [_] (let [sn (.next s)] (and sn (seq1 sn))))
(seq [_] (let [ss (.seq s)] (and ss (seq1 ss))))
(count [_] (.count s))
(cons [_ o] (.cons s o))
(empty [_] (.empty s))
(equiv [_ o] (.equiv s o))))
It is expected that the function of reduce takes advantage of the block sequence. So, each block has a special method that performs a convolution in java a loop without creating temporary objects. Generally, reduce has other optimizations, all are interested can look at clojure.core.protocols.
the
in conclusion
Sequences in Clojure powerful and convenient. Their undoubted plus the fact that often you don't even have to think about such "trifles" as laziness or block processing — Clojure is trying to do it for us.
Комментарии
Отправить комментарий