**************************************************************************** ITERATORS Iterator objects are such that a number of successive calls to its __next__ method will return values, and all subsequent calls will raise StopIteration exception. More exactly, an "iterator" is any object such that - it supports __next__() method called without parameters and possibly raises StopIteration exception, and - if a call to __next__() raises StopIteration than every subsequent call raises the same exception, - it supports __iter__() method called without parameters that returns the object itself. A "finite iterator" is an iterator that raises StopIteration after some number of calls to __next__(). An "infinite iterator" is an iterator that allows to call __next__() any number of times without raising StopIteration. Calling OBJECT.__next__() can be done as next(OBJECT) using a built-in function next(). Calling next(ITERATOR, [DEFAULT]) is similar to calling next(ITERATOR) but instead of raising StopIteration it will return DEFAULT. ii = iter([1,2,3]) # create an iterator, as explained later. for k in range(5): print(next(ii)) 1 2 3 StopIteration exception ii = iter([1,2,3]) for k in range(5): print(next(ii,0)) 1 2 3 0 0 When an iterator raises StopIteration, it means that it exhausted its values; in general, it is not possible to reset the iterator to its starting position. Iterators are similar to sequences in representing a succession of values. However: - a long sequence takes more space than the corresponding iterator; - unlike sequence items, iterator items cannot be accessed using an index; in general, you cannot go back to previous values in an iterator. *************************************************************************** DEFINING A CLASS OF ITERATOR OBJECTS Class ThreeSquares(object) Objects of this class are finite iterators. To construct its object invoke this class with a numeric parameter n. Consecutive calls to that object's method __next__() will return n**2, (n**2)**2 and ((n**2)**2)**2. class ThreeSquares(object): def __init__(self, number): self.number = number self.how_many_more = 3 def __iter__(self): return self def __next__(self): if self.how_many_more == 0: raise StopIteration else: self.number **= 2 self.how_many_more -= 1 return self.number ts = ThreeSquares(2) for k in range(10): print(next(ts)) 4 16 256 StopIteration ----------------------------------- Class RandomIterator Objects of this class are infinite iterators. To construct its object invoke this class with a sequence parameter. Consecutive calls to that object's method __next__() will produce random values from the sequence. import random class RandomIterator(object): def __init__(self, seq): self.data = seq def __iter__(self): return self def __next__(self): return random.choice(self.data) ri = RandomIterator([1,2,3]) for k in range(10): print(next(ri)) 2 2 3 1 3 2 1 1 3 2 *************************************************************************** ITERABLES An iterable is an object that can return values one at a time, typically, a container that can return all its members one at a time. More exactly, an iterable is any object that supports - either __iter__() method called without arguments that returns an iterator for the container, - or __getitem__() method called with an integer argument starting at 0 that returns an item from the container __getitem__() can/should raise IndexError or StopIteration. Notice that all sequences have __getitem__() that takes an integer parameter starting at 0; however instead of calling SEQUENCE.__getitem__(INDEX) you use an alternative syntax SEQUENCE[INDEX]. Notice also that, unlike sequences, iterables do not require a __len__() method. The following are iterables: - iterators, - range objects, - sequences (string, tuple, list, bytes, bytearray, ...), - dictionaries, - dictionary views, - sets and frozensets, - file objects. EXERCISE: use dir() to see which of such objects have __iter__() method, which have __getitem__() method, and which have both. ************************************************************************* FUNCTION iter() WITH ONE ARGUMENT iter(ITERABLE) returns an iterator for ITERABLE (if __iter__() or __getitem__() are implemented correctly). iter(OBJECT) raises TypeError iff OBJECT does not have __iter__() method or __getitem__() method. You seldom need to use iter() function explicitly; it is called automatically whenever you provide an iterable to a for-loop construct and in some other situations. Given an iterator (not just an iterable) as a parameter function iter() returns the same iterator: a = iter([1,2,3]) b = iter(a) a is b True If an iterable has both __iter__ and __getitem__, iter will create an iterator by using __iter__. This is because, it is assumed that __iter__ may be more efficient than repeated calls to __getitem__. ************************************************************************** DEFINING A CLASS OF ITERABLE OBJECTS Class MyIterable To construct its object invoke the class with a sequence parameter. Evaluating OBJECT[INDEX] will produce the corresponding item from that sequence. The object is not a sequence because it does not support __len__() method. class MyIterable(object): def __init__(self, seq): self.data = seq def __getitem__(self, index): return self.data[index] mi = MyIterable([0,1,2]) mi = iter(mi) for n in range(5): print(next(mi)) 0 1 2 StopIteration EXERCISE: define an object with dummy methods __iter__() and __getitem__() and experiment with iter() to determine which of the methods it will use. *************************************************************************** BUILT-IN CONSTRUCTS INVOLVING ITERABLES - list(ITERABLE) returns list - tuple(ITERABLE) returns tuple - frozenset(ITERABLE) returns frozenset - set(ITERABLE) returns set - LIST += ITERABLE - VARIABLES = ITERABLE # unpacking an iterable of a specific length - (VARIABLES) = ITERABLE # unpacking an iterable of a specific length - [VARIABLES] = ITERABLE # unpacking an iterable of a specific length The following are equivalent: x, y, z = range(3) (x, y, z) = range(3) [x, y, z] = range(3) x=0; y=1; z=2 - range([START],STOP,[STRIDE]) returns iterable list(range(5)) => [0,1,2,3,4] list(range(10,16)) => [10,11,12,13,14,15] list(range(10,16,2)) => [10, 12, 14] - for VARIABLE in ITERABLE: in loops, comprehensions and generator expressions - OBJECT in ITERABLE # evaluates to Boolean - OBJECT not in ITERABLE # evaluates to Boolean - all(ITERABLE) returns Boolean true if all are true, otherwise false. - any(ITERABLE) returns Boolean false if all are false, otherwise true. - iter(ITERABLE) returns iterator - iter(CALLABLE, SENTINEL) returns iterator # will be explained later. - filter(FUNCTION, ITERABLE) returns iterator list(filter(lambda n:n%2==0, [0,1,2,3,4,5])) => [0,2,4] - map(CALLABLE, ITERABLES) returns an iterator list(map(lambda n:n**2, [0,1,2,3,4,5])) => [0,1,4,9,16,25] - enumerate(ITERABLE, start=0) returns iterator list(enumerate('pqr')) => [(0,'p'), (1,'q'), (2,'r')] - zip(ITERABLES) returns an iterator list(zip('ABCDE',[1,2,3],'abcd')) => [('A',1,'a'), ('B',2,'b'), ('C',3,'c')] - max(ITERABLE, [key=KEY]) # returns max value from ITERABLE max(range(2,5)) => 4 max(range(2,5), key = lambda x:-x) => 2 max(2,3,4) => 4 - min(ITERABLE, [KEY]) # returns min value from ITERABLE - sum(ITERABLE) # returns the sum of the numbers from ITERABLE - DICTIONARY.keys() returns iterable - DICTIONARY.values() returns iterable - DICTIONARY.items() returns iterable list({"a":1, "b":2}.items()) => [('a', 1), ('b', 2)] - dict(ITERABLE) returns dictionary dict([('a', 1), ('b', 2)]) => {"a":1, "b":2} - open(FILENAME, ...) returns iterable the iterable returns lines from the text file as strings. - reversed(OBJECT) returns iterator if OBJECT is a sequence or supports __reversed__() method. - sorted(ITERABLE, [key=FUNCTION, [reverse=BOOLEAN]]) returns list sorted([-1,2,0,-2,1], key=lambda n:-n, reverse=True) => [-2, -1, 0, 1, 2] *************************************************************************** HOW DO FOR-LOOPS WORK? The following three snippets are equivalent. for k in ITERABLE: print(k) for k in iter(ITERABLE): print(k) temp_iterator = iter(ITERABLE) while True: try: print(next(temp_iterator)) except StopIteration: break The while-loop above shows how a for-loop is actually implemented (although the implementation is at a lower level). An iterable object (such as a string 'abc' below) produces a fresh new iterator each time you pass it to iter() function or use it in a for-loop: for m in range(2): for c in "abc": print(c) a b c a b c An iterator, on the other hand will become exhausted after the first pass: ci = iter("abc") for m in range(2): for c in ci: print(c) a b c *************************************************************************** A better demonstration of an iterator ii = iter([0,1,2]) for k in range(5): try: v = next(ii) except StopIteration: v = 'stop iteration' print(k, v) 0 0 1 1 2 2 3 stop iteration 4 stop iteration *************************************************************************** BUILT-IN FUNCTIONS THAT RETURN ITERATORS - iter - enumerate - filter - map - zip *************************************************************************** GENERATOR FUNCTIONS A "generator function" is a function containing "yield" expressions in the function definition. Such a function returns an iterator. The execution of the iterator's __next__() method causes that the execution of the code in the generator function body, which pauses at the yield expression and returns the object given in that expression, but which can be restarted by another call to __next__(). def gen(n=0): n+=1 yield n yield n+1 for i in range(n+2,n+4): yield i i+=1 gg = gen(2) for k in range(10): print(next(gg)) 3 4 5 6 StopIteration The iterator gets exhausted if control falls off the end of the body of the generator function (as in the example above), or if return is executed, or if StopIteration is raised. There cannot be any return VALUE statements in the code of a generator function. ************************************************************************ GENERATOR EXPRESSIONS A "generator expression" is an expression of any of the following forms; such an expression evaluates to an iterator. (EXPR for PARAMETER1 in ITERABLE1) (EXPR for PARAMETER1 in ITERABLE1 for PARAMETER2 in ITERABLE2) (EXPR for PARAMETER1 in ITERABLE1 if CONDITION1) (EXPR for PARAMETER1 in ITERABLE1 if CONDITION1 for PARAMETER2 in ITERABLE2 if CONDITION2) The number of for-clauses is arbitrary; the if-clauses are optional. ii = (n**2 for n in range(5) if n%2==0) next(ii) 0 next(ii) 4 next(ii) 16 next(ii) StopIteration ************************************************************************ NON-CANONICAL ITERATORS By a "callable" we understand any object which can be called. Functions, methods and classes (both built-in and user-defined) comprise callables. (What is termed "calling a class" in Python, is termed "calling a constructor/initializer" in other O.O. languages.) A "call" is made by providing actual parameters to a callable; More exactly, a call consists of a callable object and a list of other objects (which serve as actual parameters). A call may have many "occurrences" -- this means, the same call (with the same parameter list) may be evaluated many times (with the possibility that an evaluation changes the objects involved). A call is just a concept, and not a Python object. By a "non-canonical iterator" we understand any call. (This terminology does not come from Python documentation.) To see why a call is considered a kind of iterator, notice the similarity: both ITERATOR.__next__() and CALLABLE(PARAMETERS) can be evaluated repeatedly and each time return a value or raise an exception. There are differences, of course: - when ITERATOR.__next__() rises StopIteration, on every subsequent call it must raise StopIteration while CALLABLE(PARAMETERS) may raise an exception at one time and return a value or raise another exception at the next occurrence. - the parameter list in ITERATOR.__next__() is empty while it can be non-empty in CALLABLE(PARAMETERS), - iterators return values through calls to the __next__() method while CALLABLE does not need to involve __next__. Because of the important similarity we want to consider a call as a kind of iterator, but because of the differences we call it non-canonical. If ITERATOR is an iterator (an object), then the calls ITERATOR.__next__() and next(ITERATOR) are non-canonical iterators. A call whose every occurrence returns the same value is called a "trivial" non-canonical iterator. For instance a function which given a parameter n returns the nth prime number is a trivial non-canonical iterator; trivial -- because it returns the same value indefinitely. Non-trivial non-canonical iterators are capable of returning different values from different occurrences of the call. Such non-canonical iterators can be implemented as functions or methods which maintain a state between occurrences of the call. This is actually common with methods -- an object maintains a state, and a method invoked on the object can change this state. Functions can also maintain a state in a non-local variable -- see: closures, below. ************************************************************************ CLOSURES A "free variable" is a variable referred to in a function definition that is not a local variable and not a formal parameter. An entity in a programming language is called a "first-class" object/entity if it can be used in programs without restrictions: - have an intrinsic identity (independent of any given name), - can be expressed as an anonymous literal value, - can be stored in variables and data structures, - can be compared for equality with other entities, - can be passed as a parameter to functions and returned from functions, - can be constructed at run time, - can be written to files and read from files, sockets, etc., - can be printed. For example, in C and C++, functions are not first class entities because they cannot be constructed at run time. In Python functions are first class entities. By a "closure" we understand a first-class function with free variables; we say that the function "closes over its free variables". ----------------------------------------------------------------- A closure can be used to form a non-canonical iterator. def counter(start = 0, increment = 1): value = start def incr(): # incr is a closure because nonlocal value # it refers to value in the parent name-space. oldvalue = value value += increment # increment is also nonlocal but it is an rhs return oldvalue # so it does not need to be declared as nonlocal. return incr # return the closure. c = counter() # c is a closure. d = counter(10, 2) # d is a closure. c() 0 d() 10 c() 1 d() 12 c() 2 d() 14 Closures are callables. They are not iterators, because they do not have a method __next__(). They are not iterables, so they cannot be used directly to set up a for-loop). *************************************************************************** Another function which returns 1,2,3,... in consecutive calls. def c(): if "value" not in c.__dict__: c.value = 0 c.value += 1 return c.value c() 1 c() 2 c.value = 10 c() 11 c() 12 *************************************************************************** FUNCTION iter WITH TWO ARGUMENTS AND CALLS WITH EMPTY PARAMETER LISTS Given a callable (possibly a closure) which can be called with no parameters one can create an iterator by using iter(CALLABLE, SENTINEL). c = counter() # c is a closure; it requires no parameters. # consecutive calls return 0,1,2,3,... ii = iter(c, 5) for k in ii: print(k) 0 1 2 3 4 The iterator returned by iter(CALLABLE, SENTINEL) keeps calling CALLABLE but raises StopIteration when CALLABLE returns a value equal to SENTINEL or when CALLABLE raises StopIteration. If CALLABLE raises another exception, the iterator propagates it. Here is our own implementations of iter(CALLABLE, SENTINEL) faithful in the behavior of the __next__() method (but our iterator objects may have a different set of methods than objects returned by iter()). An implementation using a class. class MyIter(object): def __init__(self, callable, sentinel): self.callable = callable self.sentinel = sentinel self.exhausted = False def __iter__(self): return self def __next__(self): if self.exhausted: raise StopIteration else: try: v = self.callable() except StopIteration: self.exhausted = True raise # Propagate if v == self.sentinel: self.exhausted = True raise StopIteration else: return v c = counter() # c is non-canonical iterator (more exactly, a closure) ii = MyIter(c,5) # ii is the (canonical) iterator corresponding to c for k in ii: print(k) 0 1 2 3 4 StopIteration Another implementation, using a generator function. def myiter(callable, sentinel): exhausted = False while True: if exhausted: raise StopIteration else: try: v = callable() except StopIteration: exhausted = True raise # Propagate if v == sentinel: exhausted = True raise StopIteration else: yield v c = counter() # c is non-canonical iterator (more exactly, a closure) ii = myiter(c,5) # ii is the (canonical) iterator corresponding to c for k in ii: print(k) 0 1 2 3 4 StopIteration --------------------------------------------------------------- CALLS WITH NONEMPTY PARAMETER LISTS def sumkeeper(start = 0): value = start def add(increment): nonlocal value value += increment return value return add # return the closure. s = sumkeeper() s(2) 2 s(2) 4 s(2) 6 s(2) is a call with a nonempty parameter list. We will show how to turn such a call into an iterator. ii = iter(lambda : s(2), 20) or, suggesting a generalization: callable = s parameters = (2,) # a tuple containing all the parameters sentinel = 30 ii = iter(lambda : callable(*parameters), sentinel) Notice that lambda : s(2) or lambda : callable(*parameters) produce a callable which does not take parameters. --------------------------------------------------------------- GENERATE TILL EXERCISE 1 write a class definition for a class GenerateTill with the following behavior. If CALLABLE can be called with an empty parameter list GenerateTill(CALLABLE, [TEST]) returns an iterator whose __next__() makes a call to CALLABLE() and returns the value obtained in this way unless that TEST(value) is True in which case __next__() raises StopIteration (on this and any subsequent call); TEST defaults to lambda x: False. EXERCISE 2 Write a generator function generate_till() such that generate_till(CALLABLE, [TEST]) returns an iterator with the same behavior as that from the class in EXERCISE 1. Notice that with the default test we obtain an infinite iterator. With the test being lambda x: x==SENTINEL we obtain a behavior similar to iter(CALLABLE, SENTINEL). Notice however that, unlike iter(CALLABLE, SENTINEL), our implementations should not catch StopIteration raised by CALLABLE(). **************************************************************************** A PROBLEM WITH FUNCTION iter There is a lack of symmetry between what this fucntion produces when called with one argument and when it is called with two arguments. --------- iter() with one argument --------- mylist = [1,2,3] ii = iter(mylist) jj = iter(mylist) We will see that iterators ii work independently: next(ii) 1 next(jj) 1 next(ii) 2 next(jj) 2 next(ii) 3 next(jj) 3 --------- iter() with two arguments --------- def counter(): value = 1 def incr(): # incr is a closure because nonlocal value # it refers to value in the parent name-space. oldvalue = value value += 1 # increment is also nonlocal but it is an rhs return oldvalue # so it does not need to be declared as nonlocal. return incr # return the closure. mycallable=counter() mycallable() 1 mycallable() 2 mycallable() 3 mycallable=counter() ii = iter(mycallable, 5) jj = iter(mycallable, 5) We will see that iterators ii and jj do NOT work independently: next(ii) 1 next(jj) 2 next(ii) 3 next(jj) 4 **************************************************************************** DIFFERENT WAYS TO DEFINE AN ITERATOR Class hierarchy in Python (roughly) object callable function method class (i.e. constructor/initializer) iterable iterator sequence str tuple list ... dict dictView sets set frozenset file -------- Alternative ways to define an iterator 1. Define a class whose objects have methods __iter__() and __next__(). Any call to the class (i.e. constructor/initializer) returns an iterator. 2. Define a generator function. Any call to the generator function returns an iterator. 3. Evaluate a generator expression. This produces an iterator. 4. Execute a call to any of the following built-in functions: - enumerate - filter - map - zip They return iterators. 5. Execute iter(ITERABLE) this returns an iterator. 6. Given CALLABLE: a) if CALLABLE takes no parameters execute iter(CALLABLE, SENTINEL). This returns an iterator. b) if CALLABLE takes parameters put them in a tuple PARAMETERS and, execute iter(lambda : CALLABLE(*PARAMETERS), SENTINEL) This returns an iterator. 7. The following generalizes 5 as suggested in section GENERATE TILL above. Given CALLABLE: a) if CALLABLE takes no parameters execute generate_till(CALLABLE, TEST). This returns an iterator. b) if CALLABLE takes parameters put them in a tuple PARAMETERS and, execute generate_till(lambda : CALLABLE(*PARAMETERS), TEST) This returns an iterator. By points 5-7 any iterable and any call can produce an iterator. **************************************************************************** EXAMPLE OF A USER DEFINED OPERATION ON ITERABLES Class GiveMany To construct its object call this class with an iterable parameter. Then, any call OBJECT.give(n) will return a list containing n items generated by the iterable. class GiveMany(object): def __init__(self, iterable, raise_exception=False): self.raise_exception = raise_exception self.iter = iter(iterable) def give(self, howmany=1): retval = [] for eachItem in range(howmany): try: retval.append(self.iter.__next__()) except StopIteration: if self.raise_exception: raise else: break return retval cc=GiveMany([1,2,3]) for i in range(2,5): print(cc.give(i)) [1, 2] [3] [] # The same as using a generator function: # ... ******************************************************************* FURTHER READING Modules: - itertools - functools - operator Python documentation: - The Python Library Reference - Functional Programming HOWTO Python tests in /usr/local/lib/python3.0/test/ - test_generators.py find "Sieve of Eratosthenes" and "LazyList" ****************************************************************** MODULE itertools The function tee duplicates nonlocal (!) state (for instance that stored in a variable g below.) g=5 def genfun(): global g yield g g += 1 yield g ii = genfun() import itertools a, b = itertools.tee(ii) next(a) 5 next(b) 5 next(a) 6 next(b) 6 ****************************************************** gincr = 5 def counter(): value = 0 def incr(): nonlocal value oldvalue = value value += gincr return oldvalue return incr *************************** class SimplifyIterator(object): def __init__(self, iterator): self.next = iterator.__next__ def __iter__(self): return self def __next__(self): return self.next()