Contact: fumanchu@aminus.org

Log in as guest/geniusql to create tickets

root/trunk/geniusql/logic.py

Revision 190 (checked in by fumanchu, 6 years ago)

Buglets in logic.

  • Property svn:eol-style set to native
Line 
1 """First-class Expression objects.
2
3 This work, including the source code, documentation
4 and related data, is placed into the public domain.
5
6 The original author is Robert Brewer.
7
8 THIS SOFTWARE IS PROVIDED AS-IS, WITHOUT WARRANTY
9 OF ANY KIND, NOT EVEN THE IMPLIED WARRANTY OF
10 MERCHANTABILITY. THE AUTHOR OF THIS SOFTWARE
11 ASSUMES _NO_ RESPONSIBILITY FOR ANY CONSEQUENCE
12 RESULTING FROM THE USE, MODIFICATION, OR
13 REDISTRIBUTION OF THIS SOFTWARE.
14
15 Python evaluates expressions like any other language; however,
16 the expression itself cannot be 'passed around' easily--that is,
17 the expression itself is a code block, not a callable. In most cases,
18 this is not an issue: if an evaluation step needs to be 'first-class',
19 it's usually wrapped up in a function (sometimes anonymous), and
20 that function is passed. This allows lazy evaluation, for example.
21
22 In some cases, however, we wish to manipulate the actual logic of the
23 expression:
24     1. Inspection. Code might form an expression from user input,
25            then take secondary actions depending upon the operands.
26     2. Modification. For example, correction of an expression
27            if it raises an Exception.
28     3. Translation. A common case is converting Python expressions
29            into SQL.
30
31 It is possible to provide these benefits through some combination
32 of the standard modules parser and compiler, and/or via the builtins
33 eval() and exec(). However, these approaches require placing the
34 expression in a string, which introduces problems of substituting
35 user data; for example, ("x.Name == r'%s'" % user_data) will break if
36 user_data contains quote-marks. This is by far not the only example of
37 the abuses of eval(). Solutions using parser and compiler also tend to
38 be quite slow in pure Python.
39
40 This module takes the approach that the Python developer should be
41 able to form first-class Expressions directly from Python code.
42
43     "This, even if the rest were true, which it isn't, is patently
44      impossible, say the doubters."
45        -- The Restaurant at the End of the Universe, Douglas Adams
46
47 But we can come close.
48
49
50 Expression formation:
51
52     >>> import logic
53     >>> e = logic.Expression(lambda x: not (x.a == 3 and (x.b > 1 or x.b < -10)))
54     >>> e
55     logic.Expression(lambda x: not ((x.a == 3) and ((x.b > 1) or (x.b < -10))))
56     
57     You'll notice, in this first example, some extra parentheses in the final
58     lambda. The lambda has already undergone an explicit compile/decompile
59     step. These differences don't affect the logic in any way, but it's
60     impossible to guess the exact original syntax when decompiling.
61     
62     However, be advised of this IMPORTANT point. When you form an Expression
63     from a lambda, that lambda goes through a transformer which EARLY BINDS
64     everything it can. If we had included global or free variables in our
65     lambda, those would have been replaced with constants when the Expression
66     was formed. See codewalk.EarlyBinder for more details.
67     
68     We *can*, however, use and define arbitrary comparison functions,
69     such as containedby and startswith.
70
71
72 Lazy Evaluation:
73     >>> e = logic.Expression(lambda x: (x.a == 3) and (x.b > 1 or x.b < -10))
74     >>> class DumbObject(object):
75     ...     a = 3
76     ...     b = 5
77     ...     
78     >>> pass # Do some other things...
79     >>> e(DumbObject())
80     True
81     
82     When calling an Expression, it accepts any object instance(s),
83     and returns the truth value of itself, getting any named attributes
84     from the passed-in object(s). Notice that the passed-in objects do not
85     need to be instantiated prior to the construction of the Expression.
86
87
88 Late binding of arguments (lazier yet!):
89     >>> e = logic.Expression(lambda x, **kw: x.a == kw['Size'])
90     >>> class DumbObject(object):
91     ...     a = 3
92     ...     
93     >>> pass # Do some other things...
94     >>> e(DumbObject(), Size=3)
95     True
96     >>> e.bind_args(Size=3)
97     >>> e(DumbObject())
98     True
99     
100     If the lambda possesses a **kwargs argument in its signature, that
101     dictionary may be used to pass in late-bound locals. They may either
102     be passed when calling the Expression, or may be bound to the
103     Expression using the 'bind_args' method. If both are provided,
104     the passed-in kwargs will overwrite any bound kwargs.
105
106
107 Derivation (Decompilation) and Translation:
108     'Deriving' is the opposite of 'parsing'. The codewalk.LambdaDecompiler
109     class walks a function or code object and produces equivalent Python
110     code in a string.
111     
112     >>> e = logic.Expression(lambda x: x.a == 3 and (x.b > 1 or x.b < -10))
113     >>> codewalk.LambdaDecompiler(e.func).code()
114     'lambda x: not ((x.a == 3) and ((x.b > 1) or (x.b < -10)))'
115     
116     However, we are not limited to Python statements of our Expression!
117     Another decompiler might produce our Expression in another language;
118     this example produces a WHERE clause for SQL (a declarative language!):
119     
120     >>> e = logic.Expression(lambda x: x.Group == '3' and
121                              x.Date > datetime.date(2004, 2, 14) and
122                              x.Name.endswith('_'))
123     >>> ADOSQLDecompiler(e).code()
124     "([Group] = '3' and [Date] > #2/14/2004#) and [Name] Like '%\\_'"
125
126 Pickling:
127     The Expression object includes custom pickling code (__getstate__ and
128     __setstate__). You might notice that the function itself is *not*
129     pickled; instead, its code() method is called, which produces a
130     string representation of the function (decompilation). This makes
131     pickled Expressions much more stable across Python versions than,
132     say, storing the function's co_code. However, this presents a problem
133     when the Expression is unpickled: the function must be eval'ed and
134     run through an EarlyBinder again. When this occurs (in __setstate__),
135     some of the free variables which were present in func_globals at the
136     time of pickling may not be present when the Expression is unpickled.
137     For example, an Expression which is built in myapp.py may include
138     a Numarray object in its co_consts. When that Expression is
139     unpickled, its function is eval'ed within this module, not within
140     myapp.py; since this module does not import Numarray, it will not
141     be included in the func_globals of the reconstituted function, and
142     codewalk.EarlyBinder will fail on LOAD_GLOBAL.
143     
144     Therefore, code which uses this module must determine which objects
145     will be referenced as Expressions are unpickled. Any that are not
146     builtins need to be added to this module's "builtins" dict, so they
147     can be referenced in eval() when the Expression is unpickled.
148 """
149
150 from compiler.consts import *
151 import sys
152 from types import CodeType, FunctionType
153
154 builtins = {}
155
156 def _init():
157     """Add standard builtins to assist in unpickling.
158     
159     If they're not present (can't be imported), that's OK--someone might
160     want to build an app which doesn't use fixedpoints, for example.
161     """
162     try:
163         import datetime
164         builtins['datetime'] = datetime
165     except ImportError:
166         pass
167    
168     try:
169         import fixedpoint
170         from fixedpoint import FixedPoint
171         builtins['fixedpoint'] = fixedpoint
172         builtins['FixedPoint'] = FixedPoint
173     except ImportError:
174         pass
175    
176     try:
177         import decimal
178         from decimal import Decimal
179         builtins['decimal'] = decimal
180         builtins['Decimal'] = Decimal
181     except ImportError:
182         pass
183 _init()
184
185
186 from geniusql import codewalk, astwalk
187
188
189 class Aggregator(codewalk.Rewriter):
190     """Combine two code objects into one."""
191    
192     def __init__(self, obj):
193         codewalk.Rewriter.__init__(self, obj)
194         self.instr_index = [None] * len(self._bytecode)
195    
196     def combine(self, obj, conjunction):
197         obj = codewalk.Rewriter(obj)
198         bytecode = map(ord, obj.co_code)
199         newtarget = len(bytecode)
200        
201         self._bytecode.pop()      # RETURN_VALUE
202         self._bytecode.extend([conjunction, newtarget & 0xFF, newtarget >> 8])
203         self._bytecode.append(1)  # POP_TOP
204         self._bytecode.extend(bytecode)
205         self.instr_index[-1:] = [obj] * (newtarget + 4)
206        
207         # Expand self.co_argcount, co_nlocals if needed.
208         self.co_argcount = max(self.co_argcount, obj.co_argcount)
209         self.co_nlocals = max(self.co_nlocals, obj.co_nlocals)
210        
211         # Expand self.co_varnames list if needed.
212         for i, name in enumerate(obj.co_varnames):
213             if i >= len(self.co_varnames):
214                 self.co_varnames.append(name)
215        
216         # Add the **kwargs flag if present
217         if obj.co_flags & CO_VARKEYWORDS:
218             self.co_flags |= CO_VARKEYWORDS
219        
220         # Add the *args flag if present
221         if obj.co_flags & CO_VARARGS:
222             self.co_flags |= CO_VARARGS
223    
224     def and_combine(self, obj):
225         self.combine(obj, 111)
226    
227     def or_combine(self, obj):
228         self.combine(obj, 112)
229    
230     def visit_LOAD_ATTR(self, lo, hi):
231         src = self.instr_index[self.cursor]
232         if src:
233             value = src.co_names[lo + (hi << 8)]
234             newindex = self.name_index(value)
235             self.newcode[-2:] = [newindex & 0xFF, newindex >> 8]
236    
237     def visit_LOAD_GLOBAL(self, lo, hi):
238         src = self.instr_index[self.cursor]
239         if src:
240             value = src.co_names[lo + (hi << 8)]
241             newindex = self.name_index(value)
242             self.newcode[-2:] = [newindex & 0xFF, newindex >> 8]
243    
244     def visit_LOAD_CONST(self, lo, hi):
245         src = self.instr_index[self.cursor]
246         if src:
247             value = src.co_consts[lo + (hi << 8)]
248             newindex = self.const_index(value)
249             self.newcode[-2:] = [newindex & 0xFF, newindex >> 8]
250    
251     def visit_LOAD_DEREF(self, lo, hi):
252         src = self.instr_index[self.cursor]
253         if src and hasattr(src, '_func'):
254             # name = self.co_freevars[lo + (hi << 8)]
255             value = src._func.func_closure[lo + (hi << 8)]
256             value = codewalk.deref_cell(value)
257             newindex = self.const_index(value)
258             self.tail(3, 'LOAD_CONST', newindex & 0xFF, newindex >> 8)
259
260
261 class Expression(object):
262     """A filter for objects (an AST)."""
263    
264     def __init__(self, func=None, kwtypes=None, earlybind=True):
265         """Expression(func, [kwtypes={}]). func(obj, [**kw]) must return bool.
266         
267         func: a function, with positional args and optional keyword args,
268             which must return bool. If func is None, it is initialized to
269             "lambda x: True".
270         kwtypes: a dictionary of {keyword: type} pairs.
271         earlybind: if True (the default), the given function will be
272             rewritten, binding as many constants as possible into co_consts.
273             The only reason to ever set it to False is for performance,
274             and you must be *certain* there are no global or cell refs
275             in your function.
276         """
277         if func is None:
278             self.func = lambda *args, **kw: True
279             self.ast = astwalk.AST(astwalk.ast.Const(True))
280             self.ast.star_args = "args"
281             self.ast.dstar_args = "kwargs"
282         else:
283             self.func = func
284             lp = astwalk.LambdaParser(func, env=builtins, reduce=earlybind)
285             lp.walk()
286             self.ast = lp.ast
287        
288         # Update func_globals so self.evaluate() works.
289         self.func.func_globals.update(builtins)
290        
291         if kwtypes is None:
292             kwtypes = {}
293         self.kwtypes = kwtypes
294         self.kwargs = {}
295    
296     def code(self):
297         """Return source code for self.func."""
298         if hasattr(self, 'func'):
299             decom = astwalk.LambdaDeparser(self.ast, env=builtins)
300             return decom.code()
301         else:
302             return 'function not yet loaded'
303    
304     def __repr__(self):
305         return 'logic.Expression(%s)' % self.code()
306    
307     def __and__(self, other):
308         """Logical-and this Expression with another."""
309         if not isinstance(other, Expression):
310             other = Expression(other)
311         ag = Aggregator(self.func)
312         ag.and_combine(other.func)
313         agfunc = ag.function()
314         newkwtypes = self.kwtypes.copy()
315         newkwtypes.update(other.kwtypes)
316         return Expression(agfunc, newkwtypes)
317     __add__ = __and__
318    
319     def __or__(self, other):
320         """Logical-or this Expression with another."""
321         if not isinstance(other, Expression):
322             other = Expression(other)
323         ag = Aggregator(self.func)
324         ag.or_combine(other.func)
325         agfunc = ag.function()
326         newkwtypes = self.kwtypes.copy()
327         newkwtypes.update(other.kwtypes)
328         return Expression(agfunc, newkwtypes)
329    
330     def bind_args(self, **kwargs):
331         """Set self.kwargs to a shallow copy of the given kwargs."""
332         self.kwargs = {}
333         self.kwargs.update(kwargs)
334    
335     def evaluate(self, *args, **kwargs):
336         """Return self.func(*args, **kwargs + self.kwargs)."""
337         kw = self.kwargs.copy()
338         kw.update(kwargs)
339         return self.func(*args, **kw)
340     __call__ = evaluate
341    
342     def __getstate__(self):
343         return (self.code(), self.kwtypes, self.kwargs)
344    
345     def __setstate__(self, state):
346         if len(state) == 2:
347             # Older versions of Expression had a 2-tuple.
348             func, self.kwtypes = state
349             self.kwargs = {}
350         else:
351             func, self.kwtypes, self.kwargs = state
352        
353         # The most difficult thing about Expressions is unpickling.
354         # Any func_globals at the time of pickling are lost, so any
355         # late-bound objects must be available at this point. Any
356         # such objects need to be injected into logic.builtins
357         # if you want them to be available here.
358         self.func = eval(func, builtins)
359         lp = astwalk.LambdaParser(self.func, env=builtins, reduce=True)
360         lp.walk()
361         self.ast = lp.ast
362    
363     def is_constant(self, value):
364         """Return True if self.func == (lambda: value), False otherwise."""
365         fc = self.func.func_code
366         return fc.co_code == 'd\x01\x00S' and fc.co_consts == (None, value)
367
368
369 def filter(**kwargs):
370     """Form an Expression from keyword arguments.
371     
372     Allows you to write:
373         e = logic.filter(a=3, b=1)
374     ...instead of:
375         e = logic.Expression(lambda x: x.a == 3 and x.b == 1)
376     """
377     items = kwargs.items()
378     co = []
379     kwlen = len(kwargs)
380     jump_target = (16 * kwlen) - 7
381     for i in xrange(1, kwlen + 1):
382         co += [124, 0, 0,
383                105, i, 0,
384                100, i, 0,
385                106, 2, 0,
386                # Point all jump (111) instructions at len(co) - 4
387                111, jump_target - (((i - 1) * 16) + 12), 0,
388                1,
389                ]
390     if kwargs:
391         # pop extraneous final JUMP and POP_TOP.
392         del co[-4:]
393     co.append(83)
394    
395     items = [('x', None)] + items
396     names, consts = zip(*items)
397    
398     # Form code object and function.
399     # code(argcount, nlocals, stacksize, flags, codestring,
400     #      constants, names, varnames,
401     #      filename, name, firstlineno, lnotab[, freevars[, cellvars]])
402     co = CodeType(1, 1, 2, 67, ''.join(map(chr, co)),
403                   consts, names, ('x', ), '', '<lambda>', 1, '')
404     func = FunctionType(co, {})
405     return Expression(func, earlybind=False)
406
407
408 def comparison(attr, cmp_op, criteria):
409     """Form an Expression lambda x: x.attr cmp_op criteria.
410     
411     Allows you to write:
412         e = logic.comparison('Size', cmp_op_index, 4)
413     ...instead of:
414         e = logic.Expression(lambda x: x.Size <= 4)
415     
416     This allows one to pass dynamic, isolated arguments, without having
417     to construct a lambda out of them first.
418     """
419     # cmp_op (from opcode):
420     # ('<', '<=', '==', '!=', '>', '>=', 'in', 'not in', 'is',
421     #  'is not', 'exception match', 'BAD')
422     if cmp_op < 0 or cmp_op > 11:
423         raise ValueError("The cmp_op argument must be between 0 and 11")
424    
425     if not isinstance(attr, str):
426         attr = str(attr)
427    
428     co_flags = codewalk.CO_NOFREE | codewalk.CO_OPTIMIZED | codewalk.CO_NEWLOCALS
429    
430     if sys.version_info >= (2, 5):
431         # 2.5 doesn't include arguments in co_names anymore,
432         idx = 0
433         names = (attr,)
434         # It also stopped prepending None to co_consts...
435         consts = (criteria,)
436         # ...and nested_scopes are now "always on".
437         co_flags |= codewalk.CO_NESTED
438     else:
439         idx = 1
440         names = ('x', attr)
441         consts = (None, criteria)
442    
443     co = [124, 0, 0,
444           105, idx, 0,
445           100, idx, 0,
446           106, cmp_op, 0,
447           83,
448           ]
449    
450     # Form code object and function.
451     # code(argcount, nlocals, stacksize, flags, codestring,
452     #      constants, names, varnames,
453     #      filename, name, firstlineno, lnotab[, freevars[, cellvars]])
454     co = CodeType(1, 1, 2, co_flags, ''.join(map(chr, co)),
455                   consts, names, ('x',), '', '<lambda>', 1, '')
456     func = FunctionType(co, {})
457     return Expression(func, earlybind=False)
458
459
460 def combine(expr, kwargs):
461     """Return a single Expression (or None) for the given expr and kwargs.
462     
463     The supplied expr may be a lambda, an Expression, or None.
464     The kwargs may be an empty dict.
465     """
466     if expr is not None and not isinstance(expr, Expression):
467         expr = Expression(expr)
468     if kwargs:
469         f = filter(**kwargs)
470         if expr is None:
471             expr = f
472         else:
473             expr += f
474     return expr
475
Note: See TracBrowser for help on using the browser.