SUNY Plattsburgh Computer Science Department Technical Report No.: 2010-05-21-SUNY-PLA-CS-Plaza-1 SOME OF THE THINGS WHICH MAKE A PROGRAMMING LANGUAGE MATHEMATICALLY ELEGANT: ALGEBRAIC PROPERTIES OF BASIC OPERATIONS IN PYTHON. (c) 2010 Jan Plaza version of May 21, 2010, for Python 3.1.2 Comments welcome -- jan.plaza@plattsburgh.edu ABSTRACT This paper comments on basic properties of constructs ==, is, repr, eval, id, hash in Python 3 Their implementation in the core language satisfies many properties commonly used in mathamatics, however a programmer who is not careful can define objects which will violate them -- one goal of the paper is to warn about such possibilities. Another goal of this paper is to state explicitely that some natural properties are false and either to ask if it is feasible to uphold them or to look for a rationale for not upholding them. The quotes and examples in the paper concern Python 3.1.1+ or 3.1.2, the test were made under a 64bit Intel compatible machine under Linux). The properties listed in the paper are labeled as T TRUE, T? TRUE probably, F? FALSE probably, F FALSE. Definite statement "TRUE" is made if the official Python 3 documentation justifies it. Definite statement "FALSE" is made if we know a programming counter-example. The statements "TRUE (probably)", "FALSE (probably)" relate authors conviction but need to be verified. Although properties of arithmetical oparations and numeric comparisons make significant contribution to mathematical elegance of Python they are not in the scope of the current version of this paper. Also, the wide extent of orthogonality of programing contructs in Python contibutes to its elegance, however it is not in the scope of this writing. ############################################################################## We need to avoid confusing mathematical concepts and programming concepts. Mathematical concepts set -- a set as understood in mathematics; function -- a function as understood in mathematics; relation -- a relation as understood in mathematics; x -- Cartesian product. dom(f), range(f) -- domain and range of function f, as understood in mathematics. Python concepts Python object -- object as understood in Python, Python function -- * a callable in the sense of Python, or * any Python operator (+, -, *, **, /, //, %, <<, >>, &, |, ^, ~, <, >, <=, >=, ==, !=), or * is, is not, and, or, not Note 1: Python documetation does not call "is" an operator. This is probably because the user can extend arbitrarily the operators such as + to user defined classes (by defining special method __add__) while "is" is autamatically available for all objects and cannot be redefined in user classes. Note 2: The author is not aware of any Python constructs which take arguments and produce a value, other than those listed above (list comprehesnsions are intentionally excluded.) Note 3: For the purposes of this paper, it would be enough to consider just collables, because for any operator or construct such as is, it is possible to define a function with the same in-out behavior, e.g. define plus(x, y): return x + y false-equivalents -- Python objects represented None, False, 0, 0.0, 0+0j, "". true-equivalents -- Python objects other than false Sets of Python objects Consider the set of all objects existing in the memory at ONE particular moment assuming that None, True and False are among them. In this paper Integer, String, TIS, LTIS, Object will mean the following. N = None -- the set contianing object None. B = Boolean -- the set contianing two objects: True and False. I = Integer -- the set of all such objects of type . S = String -- the set of all such objects of type . D = Dictionary -- the set of all such objects of type . TIS -- the set of all such objects created using tuples, integers and strings. LTIS -- the set of all such objects created using lists, tuples, integers and strings. Object -- the set of all such objects DEFINITION Let f be a Python function which takes two arguments 1. By the RELATION DETERMINED BY f we understand the binary relation Ro = { in Object x Object : x o y evaluates to a true-equivalent}. 2. Let A be a set of Python objects. We say that o DETERMINES A BINARY RELATION ON A if A x A is a subset of { in Object x Object : x o y does not raise any exception}. DEFINITION Let f be a Python function (such as id, repr, eval, hash) which takes one argument. 1. By the PARTIAL FUNCTION DETERMINED BY f we understand the binary relation Rf = { in Object x Object : f(x) evaluates to y} 2. We say that f DETERMINES A FUNCTION FROM A TO B, and write Rf: A --> B, if Rf is a function and dom(f)=A and range(f) is a subset of B. Python functions with n arguments are treated in a similar way. CONVENTION. We will overload the notation writing o instead of Ro and f instead of Rf. This means == depending on the context will stand either for Python's == or for the binary relation determined by Python's ==. Similarly for Python's is. Also, depending on the context, repr will stand either for Python's repr or for the partial function determined by Python's repr. Similarly for Python's id, eval, hash. DEFINITION Let A be a subset of Object. Let f be a Python function. Let R be a binary relation on A. We say that f PRESERVES R FOR OBJECTS IN A if * f determines an n-argument function from A to Object. * x1 R y1 and ... and x2 R y2 imply f(x1,...,xn) R f(y1,...,yn) for any x1,...,xn,y1,...,yn in A. ############################################################################## ABOUT == QUOTES "== [...] compare the values of two objects. The objects need not have the same type. If both are numbers, they are converted to a common type. Otherwise, the == and != operators always consider objects of different types to be unequal." Python 3.1.1. Reference. "some types (for example, function objects) support only a degenerate notion of comparison where any two objects of that type are unequal." "Non-identical instances of a class normally compare as non-equal unless the class defines the __eq__() method." "x==y calls x.__eq__(y) There are no swapped-argument versions of these methods (to be used when the left argument does not support the operation but the right argument does.)" Python 3.1.1. Reference. =0 PROPERY. == is always evaluated at the runtime. TRUE (probably). COMMENT This property implies that for any two expressions E1 and E2 typing in the Python interprester E1 == E2 produces the same truth value or raises the same exception as typing x1 = E1 x2 = E2 x == y COMMENT It is conceivable that in some (poor) implementation of a programming language this proeprty is not satisfied satisfied, for instance floating point (exponential notation) literals 1e500 and 2e500 both are represented as the same floating point number object as 0.0. and we have: >>> 1e500 == 2e500 False x = 1e500 y = 2e500 x == y True =1 PROPERTY. == determines a binary relation on Object. TRUE (probably). PROPERTY restated. Comparable = Object. WARNING. A programmer who is not careful can define a class whose objects do not support ==. =2 PROPERTY. reflexivity: x == x for x in Comparable TRUE (probably). WARNING. A programmer who is not careful can define a class whose objects support == but such that this property is violated. =3 PROPERTY. symmetry: x == y implies y == x for x,y in Comparable. TRUE (probably). WARNING. A programmer who is not careful can define a class whose objects support == but such that this property is violated. =4 PROPERTY. transitivity: x == y, y == z imply x == z for x,y in Comparable. TRUE (probably) WARNING. A programmer who is not careful can define a class whose objects support == but such that this property is violated. =5 PROPERTY. Python functions preserve == for objects in Comparable. ** FALSE for Python functions whose definitions use constructs "type", "isinstance" which can distinguish between equal (==) number objects such as 1 and 1.0. ** FALSE for Python functions whose definitions use "id" to distinguish between equal (==) objects such as a container object and its copy. ** FALSE for functions which maintain a state in a non-local variable. TRUE (probably) in any of the following cases: - id, type, isinstance are excluded and non-local variables are not used; - id is excluded, number objects other than integers are excluded (leaving only one number type) and non-local variables are not used. NOTE. One could define in the implmenetaion of the language a new opertor === such that 1 === 1.0 evaluates to False Both == and === would be a applicable to arbitrary objects and they would compare recursively their components. For this new operator a countetrparts of properties =1 through =5 would be TRUE (probably) in cases when just id is excluded and non-local variables are not used. Notice that defining eq as define eq(x, y): return type(x) == type(y) and x == y is not correct: although eq(1, 1.0) returns False eq([1], [1.0]) returns True ############################################################################## ABOUT repr AND eval r0 PROPERTY. repr: Object --> String TRUE (probably). QUOTE. "object.__repr__(self)¶ Called by the repr() built-in function to compute the “official string representation of an object. If at all possible, this should look like a valid Python expression that could be used to recreate an object with the same value (given an appropriate environment). If this is not possible, a string of the form <...some useful description...> should be returned. The return value must be a string object. This is typically used for debugging, so it is important that the representation is information-rich and unambiguous." Python 3.1 docs. WARNING. A programmer who is not careful can define a class whose objects do not support repr. r1 PROPERTY. x == y imply repr(x) == repr(y), for x,y in Comparable. TRUE (probably). QUESTION: do equal dictionaries have equal representation (in the same order)? NOTE: =5 implies r1. WARNING. A programmer who is not careful can define a class whose objects do support repr but violate this property. r2 PROPERTY. repr(x) == repr(y) implies x == y, for x,y in Comparable. TRUE (probably). WARNING. A programmer who is not careful can define a class whose objects do support repr but violate this property. QUOTE. "eval(source [...]) -> value Evaluate the source in the context [...]. The source may be a string representing a Python expression or a code object as returned by compile()." From help(eval). ASSUMPTION. In this paper we consider calls to Python's eval with an argument of type str and without optional arguments, so that they determine a one- argument function on String. e1 PROPERTY. eval is a function to Object. TRUE e2 PROPERTY range(repr) subset dom(eval) ** FALSE, if x is a Python function, because repr(func) not in dom(eval). QUESTION. What is Python's rationale for handling functions this way? DEFINITION: Evaluable = dom(eval). NOTE. Evaluable is a subset of String. PROPERTY restated: range(repr) is a subset Evaluable. NOTE. range(repr) = Evaluable would be too strong a condition: we want for dictionaries eval("{'a':1, 'b':2}") = eval({'b':2, 'a':1}) but this condition would imply that only one of the strings "{'a':1, 'b':2}", "{'b':2, 'a':1}" is in Evaluable. DEFINITION Representable = {x: repr(x) in Evaluable} NOTE. LTIS is a subset of Evaluable. PROPERTY restated: Representable = Object. WARNING. A programmer who is not careful can define even a simple container class whose objects do support repr but violate this property. e3 r1 == r2 implies eval(r1) == eval(r2) for r1,r2 in Evaluable. TRUE (probably) NOTE: =5 implies e3. e4 PROPERTY: eval(r1) == eval(r2) implies r1 == r2 for r1,r2 in Evaluable. ** FALSE. eval("(1,2)") == eval("(1 , 2)") also eval("{'a':1,'b':2}") = eval("{'b':2,'a':1}") e5 PROPERTY: eval(repr(x)) == x for x in Representable. TRUE (probably). QUOTES "For most object types, eval(repr(object)) == object." (Python 3.1.1 help(repr)) "The numbers 0.1 and 0.10000000000000001 and 0.1000000000000000055511151231257827021181583404541015625 are all approximated by 3602879701896397 / 2 ** 55. Since all of these decimal values share the same approximation, any one of them could be displayed while still preserving the invariant eval(repr(x)) == x." (Python 3.1.1 Tutorial, ch. 14) WARNING. A programmer who is not careful can define a class whose objects do support repr but violate this property. FACT A weaker version of r2: repr(x) == repr(y) implies x == y, for x,y in Representable is implied by e3, e5. Proof: 1. Assume repr(x) == repr(y). 2. Assume x,y in Evaluable. 3. So, repr(x), repr(y) in Evaluable, by 2. 4. So, eval(repr(x)) == eval(repr(y)) by 1, 3, e3. 5. So, x == y by 4, e5. COROLLARY r2 is implied by e2, e3, e5. Proof: Use the fact above and notice that e2 can be stated as Representable = Object. NOTE I asked myself if it would be interesting to define a function srepr (strong-repr) with properties analogous to those enjoyed by repr but such that: eval(srepr(x)) is x. Such a srepr would be somewhat as id. The idea not practical. It is feasible to define a function "give" which given object's id returns the object. It would have the property: give(id(x)) is x However if you store id(x) and then x gets garbage collected, give will not recreate it. ############################################################################## ABOUT id AND is i1 PROPERTY. id : Object --> Integer TRUE in Python 3.1 id is implemented as object's memory address. i2 PROPERTY. x is y iff id(x) == id(y) TRUE QUOTE: "The ‘is‘ operator compares the identity of two objects; the id() function returns an integer representing its identity" "The behavior of the is and is not operators cannot be customized; also they can be applied to any two objects and never raise an exception." i3 PROPERTY is is an equivalence relation on Object: refl: x is x for x in Object; symm: x is y implies y is x for x in Object; tran: x is y and y is z imply x is z for x in Object. NOTE. i2, =1, i2, i3, i4 only that is an equivalence on Comparable. i4 PROPERTY. Python functions preserve is for objects in Object. ** FALSE for functions which maintain a state in a non-local variable. TRUE (probably) if non-local variables are not used; NOTE. the property is true for the Python's == x1 is y1 and x2 is y2 and x1 = x2 implies y1 = y2. i5 PROPERTY: x is y implies x == y for x,y in Comparable TRUE (probably). NOTE: but not vice versa: x = [1,2]; x == copy.copy(x); x is not copy.copy(x). NOTE if f is a function f == f gives True. ############################################################################## ABOUT hash h1 PROPERTY: hash is a function to Integer. TRUE DEFINITION. Hashable = dom(hash) NOTE. class Hashable in module "collections" is different. PROPERTY restated. hash : Hashable --> Integer NOTE. Intuition: x in Hashable iff x is unchangeable (immutable object with immutable components at all levels.) TIS objects are hashable, LTIS are not. WARNING. A programmer who is not careful can define a class whose objects are unchangeable but do not support hash. h2 PROPERTY. x == y implies hash(x) == hash(y) for x,y in Hashable intersection Comparable. TRUE NOTE: =5 implies h1. QUOTES "Two objects with the same value have the same hash value." (help(hash)) "Numeric values that compare equal have the same hash value (even if they are of different types, as is the case for 1 and 1.0). (Python 3.1.1 Std Library reference: Functions)" WARNING. A programmer who is not careful can define a class whose objects support hash but violate this property. h3 PROPERTY. hash(x) == hash(y) implies x == y for x,y in Hashable intersection Comparable. ** FALSE (probably). QUOTE. "not necessarily true, but likely." (Python 3.1 help(hash)) NOTE: I tried to produce an argument for falsity without knowing the actual hash algorithm but I failed: Hash seems to produce integers in a finite range; if so it cannot give distinct values to infinitely many objects; however at any time there are only finitely many object in the memory. ** QUESTION: could we have a function prehash with properties h1-h3 and then define hash in terms of pre-hash so that it has a finite range? ############################################################################## ############################################################################## REFERENCES Python v3.1.2 documentation (2010), * The Python Language Reference, 3. Data model, http://docs.python.org/py3k/reference/datamodel.html 5. Expressions, http://docs.python.org/py3k/reference/expressions.html 5.14 Summary (operators), http://docs.python.org/py3k/reference/expressions.html#operator-summary * The Python Standard Library, 2. Built-in Functions, http://docs.python.org/py3k/library/functions.html 3. Built-in Constants, http://docs.python.org/py3k/library/constants.html 8. Data Types 8.3. collections -- Container datatypes 8.3.1. ABCs - abstract base classes, http://docs.python.org/py3k/library/collections.html#abcs-abstract-base-classes