Contact: fumanchu@aminus.org

Log in as guest/geniusql to create tickets

root/trunk/geniusql/logic.py

Revision 280 (checked in by fumanchu, 7 months ago)

Buglet in logic.comparison for Python 2.5.

  • Property svn:eol-style set to native
Line 
1 """First-class Expression objects.
2
3 This work, including the source code, documentation
4 and related data, is placed into the public domain.
5
6 The original author is Robert Brewer.
7
8 THIS SOFTWARE IS PROVIDED AS-IS, WITHOUT WARRANTY
9 OF ANY KIND, NOT EVEN THE IMPLIED WARRANTY OF
10 MERCHANTABILITY. THE AUTHOR OF THIS SOFTWARE
11 ASSUMES _NO_ RESPONSIBILITY FOR ANY CONSEQUENCE
12 RESULTING FROM THE USE, MODIFICATION, OR
13 REDISTRIBUTION OF THIS SOFTWARE.
14
15 Python evaluates expressions like any other language; however,
16 the expression itself cannot be 'passed around' easily--that is,
17 the expression itself is a code block, not a callable. In most cases,
18 this is not an issue: if an evaluation step needs to be 'first-class',
19 it's usually wrapped up in a function (sometimes anonymous), and
20 that function is passed. This allows lazy evaluation, for example.
21
22 In some cases, however, we wish to manipulate the actual logic of the
23 expression:
24     1. Inspection. Code might form an expression from user input,
25            then take secondary actions depending upon the operands.
26     2. Modification. For example, correction of an expression
27            if it raises an Exception.
28     3. Translation. A common case is converting Python expressions
29            into SQL.
30
31 It is possible to provide these benefits through some combination
32 of the standard modules parser and compiler, and/or via the builtins
33 eval() and exec(). However, these approaches require placing the
34 expression in a string, which introduces problems of substituting
35 user data; for example, ("x.Name == r'%s'" % user_data) will break if
36 user_data contains quote-marks. This is by far not the only example of
37 the abuses of eval(). Solutions using parser and compiler also tend to
38 be quite slow in pure Python.
39
40 This module takes the approach that the Python developer should be
41 able to form first-class Expressions directly from Python code.
42
43     "This, even if the rest were true, which it isn't, is patently
44      impossible, say the doubters."
45        -- The Restaurant at the End of the Universe, Douglas Adams
46
47 But we can come close.
48
49
50 Expression formation:
51
52     >>> import logic
53     >>> e = logic.Expression(lambda x: not (x.a == 3 and (x.b > 1 or x.b < -10)))
54     >>> e
55     logic.Expression(lambda x: not ((x.a == 3) and ((x.b > 1) or (x.b < -10))))
56     
57     You'll notice, in this first example, some extra parentheses in the final
58     lambda. The lambda has already undergone an explicit parse/deparse
59     step. These differences don't affect the logic in any way, but it's
60     impossible to guess the exact original syntax when deparsing.
61     
62     However, be advised of this IMPORTANT point. When you form an Expression
63     from a lambda, that lambda goes through a transformer which EARLY BINDS
64     everything it can. If we had included global or free variables in our
65     lambda, those would have been replaced with constants when the Expression
66     was formed. See codewalk.EarlyBinder for more details.
67     
68     We *can*, however, use and define arbitrary comparison functions,
69     such as containedby and startswith.
70
71
72 Lazy Evaluation:
73     >>> e = logic.Expression(lambda x: (x.a == 3) and (x.b > 1 or x.b < -10))
74     >>> class DumbObject(object):
75     ...     a = 3
76     ...     b = 5
77     ...     
78     >>> pass # Do some other things...
79     >>> e(DumbObject())
80     True
81     
82     When calling an Expression, it accepts any object instance(s),
83     and returns the truth value of itself, getting any named attributes
84     from the passed-in object(s). Notice that the passed-in objects do not
85     need to be instantiated prior to the construction of the Expression.
86
87
88 Late binding of arguments (lazier yet!):
89     >>> e = logic.Expression(lambda x, **kw: x.a == kw['Size'])
90     >>> class DumbObject(object):
91     ...     a = 3
92     ...     
93     >>> pass # Do some other things...
94     >>> e(DumbObject(), Size=3)
95     True
96     >>> e.bind_args(Size=3)
97     >>> e(DumbObject())
98     True
99     
100     If the lambda possesses a **kwargs argument in its signature, that
101     dictionary may be used to pass in late-bound locals. They may either
102     be passed when calling the Expression, or may be bound to the
103     Expression using the 'bind_args' method. If both are provided,
104     the passed-in kwargs will overwrite any bound kwargs.
105
106
107 Deparsing and Translation:
108     'Deparsing' is the opposite of 'parsing'. The codewalk.LambdaDeparser
109     class walks a function or code object and produces equivalent Python
110     code in a string.
111     
112     >>> e = logic.Expression(lambda x: x.a == 3 and (x.b > 1 or x.b < -10))
113     >>> e.code()
114     'lambda x: (x.a == 3) and ((x.b > 1) or (x.b < -10))'
115     
116     However, we are not limited to Python statements of our Expression!
117     Another deparser might produce our Expression in another language;
118     this example produces a WHERE clause for SQL (a declarative language!):
119     
120     >>> e = logic.Expression(lambda x: x.Group == '3' and
121                              x.Date > datetime.date(2004, 2, 14) and
122                              x.Name.endswith('_'))
123     >>> SQLServerDeparser([(None, t)], e, sqlserver.SQLServerTypeSet()).code()
124     "([Group] = '3' and [Date] > #2/14/2004#) and [Name] Like '%\\_'"
125
126 Pickling:
127     The Expression object includes custom pickling code (__getstate__ and
128     __setstate__). You might notice that the function itself is *not*
129     pickled; instead, its code() method is called, which produces a
130     string representation of the function (deparsing). This makes
131     pickled Expressions much more stable across Python versions than,
132     say, storing the function's co_code. However, this presents a problem
133     when the Expression is unpickled: the function must be eval'ed and
134     run through an EarlyBinder again. When this occurs (in __setstate__),
135     some of the free variables which were present in func_globals at the
136     time of pickling may not be present when the Expression is unpickled.
137     For example, an Expression which is built in myapp.py may include
138     a Numarray object in its co_consts. When that Expression is
139     unpickled, its function is eval'ed within this module, not within
140     myapp.py; since this module does not import Numarray, it will not
141     be included in the func_globals of the reconstituted function, and
142     codewalk.EarlyBinder will fail on LOAD_GLOBAL.
143     
144     Therefore, code which uses this module must determine which objects
145     will be referenced as Expressions are unpickled. Any that are not
146     builtins need to be added to this module's "builtins" dict, so they
147     can be referenced in eval() when the Expression is unpickled.
148 """
149
150 from compiler.consts import *
151 import sys
152 from types import CodeType, FunctionType
153
154 builtins = {}
155
156 def _init():
157     """Add standard builtins to assist in unpickling.
158     
159     If they're not present (can't be imported), that's OK--someone might
160     want to build an app which doesn't use fixedpoints, for example.
161     """
162     try:
163         import datetime
164         builtins['datetime'] = datetime
165     except ImportError:
166         pass
167    
168     try:
169         import fixedpoint
170         from fixedpoint import FixedPoint
171         builtins['fixedpoint'] = fixedpoint
172         builtins['FixedPoint'] = FixedPoint
173     except ImportError:
174         pass
175    
176     try:
177         import decimal
178         from decimal import Decimal
179         builtins['decimal'] = decimal
180         builtins['Decimal'] = Decimal
181     except ImportError:
182         pass
183 _init()
184
185
186 from geniusql import codewalk, astwalk
187
188
189 class Aggregator(codewalk.Rewriter):
190     """Combine two code objects into one."""
191    
192     def __init__(self, obj):
193         codewalk.Rewriter.__init__(self, obj)
194         self.instr_index = [None] * len(self._bytecode)
195    
196     def combine(self, obj, conjunction):
197         obj = codewalk.Rewriter(obj)
198         bytecode = map(ord, obj.co_code)
199         newtarget = len(bytecode)
200        
201         self._bytecode.pop()      # RETURN_VALUE
202         self._bytecode.extend([conjunction, newtarget & 0xFF, newtarget >> 8])
203         self._bytecode.append(1)  # POP_TOP
204         self._bytecode.extend(bytecode)
205         self.instr_index[-1:] = [obj] * (newtarget + 4)
206        
207         # Expand self.co_argcount, co_nlocals if needed.
208         self.co_argcount = max(self.co_argcount, obj.co_argcount)
209         self.co_nlocals = max(self.co_nlocals, obj.co_nlocals)
210        
211         # Expand self.co_varnames list if needed.
212         for i, name in enumerate(obj.co_varnames):
213             if i >= len(self.co_varnames):
214                 self.co_varnames.append(name)
215        
216         # Add the **kwargs flag if present
217         if obj.co_flags & CO_VARKEYWORDS:
218             self.co_flags |= CO_VARKEYWORDS
219        
220         # Add the *args flag if present
221         if obj.co_flags & CO_VARARGS:
222             self.co_flags |= CO_VARARGS
223    
224     def and_combine(self, obj):
225         self.combine(obj, 111)
226    
227     def or_combine(self, obj):
228         self.combine(obj, 112)
229    
230     def visit_LOAD_ATTR(self, lo, hi):
231         src = self.instr_index[self.cursor]
232         if src:
233             value = src.co_names[lo + (hi << 8)]
234             newindex = self.name_index(value)
235             self.newcode[-2:] = [newindex & 0xFF, newindex >> 8]
236    
237     def visit_LOAD_GLOBAL(self, lo, hi):
238         src = self.instr_index[self.cursor]
239         if src:
240             value = src.co_names[lo + (hi << 8)]
241             newindex = self.name_index(value)
242             self.newcode[-2:] = [newindex & 0xFF, newindex >> 8]
243    
244     def visit_LOAD_CONST(self, lo, hi):
245         src = self.instr_index[self.cursor]
246         if src:
247             value = src.co_consts[lo + (hi << 8)]
248             newindex = self.const_index(value)
249             self.newcode[-2:] = [newindex & 0xFF, newindex >> 8]
250    
251     def visit_LOAD_DEREF(self, lo, hi):
252         src = self.instr_index[self.cursor]
253         if src and hasattr(src, '_func'):
254             # name = self.co_freevars[lo + (hi << 8)]
255             value = src._func.func_closure[lo + (hi << 8)]
256             value = codewalk.deref_cell(value)
257             newindex = self.const_index(value)
258             self.tail(3, 'LOAD_CONST', newindex & 0xFF, newindex >> 8)
259
260
261 class Expression(object):
262     """A filter for objects (an AST)."""
263    
264     def __init__(self, func=None, kwtypes=None, earlybind=True):
265         """Expression(func, [kwtypes={}]). func(obj, [**kw]) must return bool.
266         
267         func: a function, with positional args and optional keyword args,
268             which must return bool. If func is None, it is initialized to
269             "lambda *args, **kwargs: True". If func is a dict, logic.filter
270             will be used to create an Expression from it.
271         kwtypes: a dictionary of {keyword: type} pairs.
272         earlybind: if True (the default), the given function will be
273             rewritten, binding as many constants as possible into co_consts.
274             The only reason to ever set it to False is for performance,
275             and you must be *certain* there are no global or cell refs
276             in your function.
277         """
278         if func is None:
279             self.func = lambda *args, **kw: True
280             self.ast = astwalk.AST(astwalk.ast.Const(True))
281             self.ast.star_args = "args"
282             self.ast.dstar_args = "kwargs"
283         else:
284             if isinstance(func, dict):
285                 func = _filter_func(**func)
286                 # It might be tempting to set earlybind to False, since
287                 # we've hand-generated our func, but we need to change
288                 # e.g. lambda: True from a Global to a Const so we don't
289                 # end up with the ast: Name('True').
290             self.func = func
291             lp = astwalk.LambdaParser(func, env=builtins, reduce=earlybind)
292             lp.walk()
293             self.ast = lp.ast
294        
295         # Update func_globals so self.evaluate() works.
296         self.func.func_globals.update(builtins)
297        
298         if kwtypes is None:
299             kwtypes = {}
300         self.kwtypes = kwtypes
301         self.kwargs = {}
302    
303     def code(self):
304         """Return source code for self.func."""
305         if hasattr(self, 'func'):
306             dep = astwalk.LambdaDeparser(self.ast, env=builtins)
307             return dep.code()
308         else:
309             return 'function not yet loaded'
310    
311     def __repr__(self):
312         return 'logic.Expression(%s)' % self.code()
313    
314     def __and__(self, other):
315         """Logical-and this Expression with another."""
316         if not isinstance(other, Expression):
317             other = Expression(other)
318         ag = Aggregator(self.func)
319         ag.and_combine(other.func)
320         agfunc = ag.function()
321         newkwtypes = self.kwtypes.copy()
322         newkwtypes.update(other.kwtypes)
323         return Expression(agfunc, newkwtypes)
324     __add__ = __and__
325    
326     def __or__(self, other):
327         """Logical-or this Expression with another."""
328         if not isinstance(other, Expression):
329             other = Expression(other)
330         ag = Aggregator(self.func)
331         ag.or_combine(other.func)
332         agfunc = ag.function()
333         newkwtypes = self.kwtypes.copy()
334         newkwtypes.update(other.kwtypes)
335         return Expression(agfunc, newkwtypes)
336    
337     def bind_args(self, **kwargs):
338         """Set self.kwargs to a shallow copy of the given kwargs."""
339         self.kwargs = {}
340         self.kwargs.update(kwargs)
341    
342     def evaluate(self, *args, **kwargs):
343         """Return self.func(*args, **kwargs + self.kwargs)."""
344         kw = self.kwargs.copy()
345         kw.update(kwargs)
346         return self.func(*args, **kw)
347     __call__ = evaluate
348    
349     def __getstate__(self):
350         return (self.code(), self.kwtypes, self.kwargs)
351    
352     def __setstate__(self, state):
353         if len(state) == 2:
354             # Older versions of Expression had a 2-tuple.
355             func, self.kwtypes = state
356             self.kwargs = {}
357         else:
358             func, self.kwtypes, self.kwargs = state
359        
360         # The most difficult thing about Expressions is unpickling.
361         # Any func_globals at the time of pickling are lost, so any
362         # late-bound objects must be available at this point. Any
363         # such objects need to be injected into logic.builtins
364         # if you want them to be available here.
365         self.func = eval(func, builtins)
366         lp = astwalk.LambdaParser(self.func, env=builtins, reduce=True)
367         lp.walk()
368         self.ast = lp.ast
369    
370     def is_constant(self, value):
371         """Return True if self.func == (lambda: value), False otherwise."""
372         fc = self.func.func_code
373         if fc.co_code in ('d\x00\x00S', 'd\x01\x00S'):
374             # LOAD_CONST 0/1 0 RETURN_VALUE
375             if len(fc.co_consts) == 2 and fc.co_consts[1] == value:
376                 # i.e., fc.co_consts == (docstring or None, value)
377                 return True
378         elif fc.co_code == 't\x00\x00S' and len(fc.co_consts) == 1:
379             # LOAD_CONST 0 0 RETURN_VALUE
380             return True
381         return False
382
383
384 def _filter_func(**kwargs):
385     """Return a new function object from the given kwargs.
386     
387     The new function will be of the form:
388         lambda x: k1 == v1 and k2 == v2...
389     
390     unless kwargs is empty, in which case it will be:
391         lambda x: True
392     """
393     if not kwargs:
394         return lambda x: True
395    
396     co = []
397     kwlen = len(kwargs)
398     jump_target = (16 * kwlen) - 7
399     for i in xrange(1, kwlen + 1):
400         co += [124, 0, 0,
401                105, i, 0,
402                100, i, 0,
403                106, 2, 0,
404                # Point all jump (111) instructions at len(co) - 4
405                111, jump_target - (((i - 1) * 16) + 12), 0,
406                1,
407                ]
408     if kwargs:
409         # pop extraneous final JUMP and POP_TOP.
410         del co[-4:]
411     co.append(83)
412    
413     # Python 2.5 doesn't include arguments in co_names anymore,
414     # but for some odd reason we still have to include 'x' in names.
415     names = tuple(['x'] + kwargs.keys())
416     consts = tuple([None] + kwargs.values())
417    
418     # Form code object and function.
419     # code(argcount, nlocals, stacksize, flags, codestring,
420     #      constants, names, varnames,
421     #      filename, name, firstlineno, lnotab[, freevars[, cellvars]])
422     co = CodeType(1, 1, 2, 67, ''.join(map(chr, co)),
423                   consts, names, ('x', ), '', '<lambda>', 1, '')
424     return FunctionType(co, {})
425
426 def filter(**kwargs):
427     """Form an Expression from keyword arguments.
428     
429     Allows you to write:
430         e = logic.filter(a=3, b=1)
431     ...instead of:
432         e = logic.Expression(lambda x: x.a == 3 and x.b == 1)
433     """
434     return Expression(_filter_func(**kwargs), earlybind=False)
435
436
437 def comparison(attr, cmp_op, criteria):
438     """Form an Expression lambda x: x.attr cmp_op criteria.
439     
440     Allows you to write:
441         e = logic.comparison('Size', cmp_op_index, 4)
442     ...instead of:
443         e = logic.Expression(lambda x: x.Size <= 4)
444     
445     This allows one to pass dynamic, isolated arguments, without having
446     to construct a lambda out of them first.
447     """
448     # cmp_op (from opcode):
449     # ('<', '<=', '==', '!=', '>', '>=', 'in', 'not in', 'is',
450     #  'is not', 'exception match', 'BAD')
451     if cmp_op < 0 or cmp_op > 11:
452         raise ValueError("The cmp_op argument must be between 0 and 11")
453    
454     if not isinstance(attr, str):
455         attr = str(attr)
456    
457     co_flags = codewalk.CO_NOFREE | codewalk.CO_OPTIMIZED | codewalk.CO_NEWLOCALS
458    
459     if sys.version_info >= (2, 5):
460         # 2.5 doesn't include arguments in co_names anymore,
461         idx = 0
462         names = (attr,)
463         # ...and nested_scopes are now "always on".
464         co_flags |= codewalk.CO_NESTED
465     else:
466         idx = 1
467         names = ('x', attr)
468    
469     if sys.version >= (2, 5):
470         consts = (criteria,)
471     else:
472         consts = (None, criteria)
473    
474     co = [124, 0, 0,
475           105, idx, 0,
476           100, idx, 0,
477           106, cmp_op, 0,
478           83,
479           ]
480    
481     # Form code object and function.
482     # code(argcount, nlocals, stacksize, flags, codestring,
483     #      constants, names, varnames,
484     #      filename, name, firstlineno, lnotab[, freevars[, cellvars]])
485     co = CodeType(1, 1, 2, co_flags, ''.join(map(chr, co)),
486                   consts, names, ('x',), '', '<lambda>', 1, '')
487     func = FunctionType(co, {})
488     return Expression(func, earlybind=False)
489
490
491 def combine(expr, kwargs):
492     """Return a single Expression (or None) for the given expr and kwargs.
493     
494     The supplied expr may be a lambda, an Expression, or None.
495     The kwargs may be an empty dict.
496     """
497     if expr is not None and not isinstance(expr, Expression):
498         expr = Expression(expr)
499     if kwargs:
500         f = filter(**kwargs)
501         if expr is None:
502             expr = f
503         else:
504             expr += f
505     return expr
506
Note: See TracBrowser for help on using the browser.