Contact: fumanchu@aminus.org

Log in as guest/geniusql to create tickets

root/branches/ast/geniusql/logic.py

Revision 129 (checked in by fumanchu, 6 years ago)

Reinstated Expression "and" and "or" code.

  • Property svn:eol-style set to native
Line 
1 """First-class Expression objects.
2
3 This work, including the source code, documentation
4 and related data, is placed into the public domain.
5
6 The original author is Robert Brewer.
7
8 THIS SOFTWARE IS PROVIDED AS-IS, WITHOUT WARRANTY
9 OF ANY KIND, NOT EVEN THE IMPLIED WARRANTY OF
10 MERCHANTABILITY. THE AUTHOR OF THIS SOFTWARE
11 ASSUMES _NO_ RESPONSIBILITY FOR ANY CONSEQUENCE
12 RESULTING FROM THE USE, MODIFICATION, OR
13 REDISTRIBUTION OF THIS SOFTWARE.
14
15 Python evaluates expressions like any other language; however,
16 the expression itself cannot be 'passed around' easily--that is,
17 the expression itself is a code block, not a callable. In most cases,
18 this is not an issue: if an evaluation step needs to be 'first-class',
19 it's usually wrapped up in a function (sometimes anonymous), and
20 that function is passed. This allows lazy evaluation, for example.
21
22 In some cases, however, we wish to manipulate the actual logic of the
23 expression:
24     1. Inspection. Code might form an expression from user input,
25            then take secondary actions depending upon the operands.
26     2. Modification. For example, correction of an expression
27            if it raises an Exception.
28     3. Translation. A common case is converting Python expressions
29            into SQL.
30
31 It is possible to provide these benefits through some combination
32 of the standard modules parser and compiler, and/or via the builtins
33 eval() and exec(). However, these approaches require placing the
34 expression in a string, which introduces problems of substituting
35 user data; for example, ("x.Name == r'%s'" % user_data) will break if
36 user_data contains quote-marks. This is by far not the only example of
37 the abuses of eval(). Solutions using parser and compiler also tend to
38 be quite slow in pure Python.
39
40 This module takes the approach that the Python developer should be
41 able to form first-class Expressions directly from Python code.
42
43     "This, even if the rest were true, which it isn't, is patently
44      impossible, say the doubters."
45        -- The Restaurant at the End of the Universe, Douglas Adams
46
47 But we can come close.
48
49
50 Expression formation:
51
52     >>> import logic
53     >>> e = logic.Expression(lambda x: not (x.a == 3 and (x.b > 1 or x.b < -10)))
54     >>> e
55     logic.Expression(lambda x: not ((x.a == 3) and ((x.b > 1) or (x.b < -10))))
56     
57     You'll notice, in this first example, some extra parentheses in the final
58     lambda. The lambda has already undergone an explicit compile/decompile
59     step. These differences don't affect the logic in any way, but it's
60     impossible to guess the exact original syntax when decompiling.
61     
62     However, be advised of this IMPORTANT point. When you form an Expression
63     from a lambda, that lambda goes through a transformer which EARLY BINDS
64     everything it can. If we had included global or free variables in our
65     lambda, those would have been replaced with constants when the Expression
66     was formed. See codewalk.EarlyBinder for more details.
67     
68     We *can*, however, use and define arbitrary comparison functions,
69     such as containedby and startswith.
70
71
72 Lazy Evaluation:
73     >>> e = logic.Expression(lambda x: (x.a == 3) and (x.b > 1 or x.b < -10))
74     >>> class DumbObject(object):
75     ...     a = 3
76     ...     b = 5
77     ...     
78     >>> pass # Do some other things...
79     >>> e(DumbObject())
80     True
81     
82     When calling an Expression, it accepts any object instance(s),
83     and returns the truth value of itself, getting any named attributes
84     from the passed-in object(s). Notice that the passed-in objects do not
85     need to be instantiated prior to the construction of the Expression.
86
87
88 Late binding of arguments (lazier yet!):
89     >>> e = logic.Expression(lambda x, **kw: x.a == kw['Size'])
90     >>> class DumbObject(object):
91     ...     a = 3
92     ...     
93     >>> pass # Do some other things...
94     >>> e(DumbObject(), Size=3)
95     True
96     >>> e.bind_args(Size=3)
97     >>> e(DumbObject())
98     True
99     
100     If the lambda possesses a **kwargs argument in its signature, that
101     dictionary may be used to pass in late-bound locals. They may either
102     be passed when calling the Expression, or may be bound to the
103     Expression using the 'bind_args' method. If both are provided,
104     the passed-in kwargs will overwrite any bound kwargs.
105
106
107 Derivation (Decompilation) and Translation:
108     'Deriving' is the opposite of 'parsing'. The codewalk.LambdaDecompiler
109     class walks a function or code object and produces equivalent Python
110     code in a string.
111     
112     >>> e = logic.Expression(lambda x: x.a == 3 and (x.b > 1 or x.b < -10))
113     >>> codewalk.LambdaDecompiler(e.func).code()
114     'lambda x: not ((x.a == 3) and ((x.b > 1) or (x.b < -10)))'
115     
116     However, we are not limited to Python statements of our Expression!
117     Another decompiler might produce our Expression in another language;
118     this example produces a WHERE clause for SQL (a declarative language!):
119     
120     >>> e = logic.Expression(lambda x: x.Group == '3' and
121                              x.Date > datetime.date(2004, 2, 14) and
122                              x.Name.endswith('_'))
123     >>> ADOSQLDecompiler(e).code()
124     "([Group] = '3' and [Date] > #2/14/2004#) and [Name] Like '%\\_'"
125
126 Pickling:
127     The Expression object includes custom pickling code (__getstate__ and
128     __setstate__). You might notice that the function itself is *not*
129     pickled; instead, its code() method is called, which produces a
130     string representation of the function (decompilation). This makes
131     pickled Expressions much more stable across Python versions than,
132     say, storing the function's co_code. However, this presents a problem
133     when the Expression is unpickled: the function must be eval'ed and
134     run through an EarlyBinder again. When this occurs (in __setstate__),
135     some of the free variables which were present in func_globals at the
136     time of pickling may not be present when the Expression is unpickled.
137     For example, an Expression which is built in myapp.py may include
138     a Numarray object in its co_consts. When that Expression is
139     unpickled, its function is eval'ed within this module, not within
140     myapp.py; since this module does not import Numarray, it will not
141     be included in the func_globals of the reconstituted function, and
142     codewalk.EarlyBinder will fail on LOAD_GLOBAL.
143     
144     Therefore, code which uses this module must determine which objects
145     will be referenced as Expressions are unpickled. Any that are not
146     builtins need to be added to this module's "builtins" dict, so they
147     can be referenced in eval() when the Expression is unpickled.
148 """
149
150 from compiler.consts import *
151 import sys
152 from types import CodeType, FunctionType
153
154 builtins = {}
155
156 def _init():
157     """Add standard builtins to assist in unpickling.
158     
159     If they're not present (can't be imported), that's OK--someone might
160     want to build an app which doesn't use fixedpoints, for example.
161     """
162     try:
163         import datetime
164         builtins['datetime'] = datetime
165     except ImportError:
166         pass
167    
168     try:
169         import fixedpoint
170         from fixedpoint import FixedPoint
171         builtins['fixedpoint'] = fixedpoint
172         builtins['FixedPoint'] = FixedPoint
173     except ImportError:
174         pass
175    
176     try:
177         import decimal
178         from decimal import Decimal
179         builtins['decimal'] = decimal
180         builtins['Decimal'] = Decimal
181     except ImportError:
182         pass
183 _init()
184
185
186 from geniusql import codewalk, astwalk
187
188
189 class Aggregator(codewalk.Rewriter):
190     """Combine two code objects into one."""
191    
192     def __init__(self, obj):
193         codewalk.Rewriter.__init__(self, obj)
194         self.instr_index = [None] * len(self._bytecode)
195    
196     def combine(self, obj, conjunction):
197         obj = codewalk.Rewriter(obj)
198         bytecode = map(ord, obj.co_code)
199         newtarget = len(bytecode)
200        
201         self._bytecode.pop()      # RETURN_VALUE
202         self._bytecode.extend([conjunction, newtarget & 0xFF, newtarget >> 8])
203         self._bytecode.append(1)  # POP_TOP
204         self._bytecode.extend(bytecode)
205         self.instr_index[-1:] = [obj] * (newtarget + 4)
206        
207         # Expand self.co_argcount, co_nlocals if needed.
208         self.co_argcount = max(self.co_argcount, obj.co_argcount)
209         self.co_nlocals = max(self.co_nlocals, obj.co_nlocals)
210        
211         # Expand self.co_varnames list if needed.
212         for i, name in enumerate(obj.co_varnames):
213             if i >= len(self.co_varnames):
214                 self.co_varnames.append(name)
215        
216         # Add the **kwargs flag if present
217         if obj.co_flags & CO_VARKEYWORDS:
218             self.co_flags |= CO_VARKEYWORDS
219        
220         # Add the *args flag if present
221         if obj.co_flags & CO_VARARGS:
222             self.co_flags |= CO_VARARGS
223    
224     def and_combine(self, obj):
225         self.combine(obj, 111)
226    
227     def or_combine(self, obj):
228         self.combine(obj, 112)
229    
230     def visit_LOAD_ATTR(self, lo, hi):
231         src = self.instr_index[self.cursor]
232         if src:
233             value = src.co_names[lo + (hi << 8)]
234             newindex = self.name_index(value)
235             self.newcode[-2:] = [newindex & 0xFF, newindex >> 8]
236    
237     def visit_LOAD_CONST(self, lo, hi):
238         src = self.instr_index[self.cursor]
239         if src:
240             value = src.co_consts[lo + (hi << 8)]
241             newindex = self.const_index(value)
242             self.newcode[-2:] = [newindex & 0xFF, newindex >> 8]
243
244
245 class Expression(object):
246     """A filter for objects (an AST)."""
247    
248     def __init__(self, func=None, kwtypes=None, earlybind=True):
249         """Expression(func, [kwtypes={}]). func(obj, [**kw]) must return bool.
250         
251         func: a function, with positional args and optional keyword args,
252             which must return bool. If func is None, it is initialized to
253             "lambda x: True".
254         kwtypes: a dictionary of {keyword: type} pairs.
255         earlybind: if True (the default), the given function will be
256             rewritten, binding as many constants as possible into co_consts.
257             The only reason to ever set it to False is for performance,
258             and you must be *certain* there are no global or cell refs
259             in your function.
260         """
261         if func is None:
262             self.func = lambda *args, **kw: True
263             self.ast = astwalk.AST(ast.Const(True))
264             self.ast.star_args = "args"
265             self.ast.dstar_args = "kwargs"
266         else:
267             self.func = func
268             lp = astwalk.LambdaParser(func, env=builtins, reduce=earlybind)
269             lp.walk()
270             self.ast = lp.ast
271        
272         # Update func_globals so self.evaluate() works.
273         func.func_globals.update(builtins)
274        
275         if kwtypes is None:
276             kwtypes = {}
277         self.kwtypes = kwtypes
278         self.kwargs = {}
279    
280     def code(self):
281         """Return source code for self.func."""
282         if hasattr(self, 'func'):
283             decom = astwalk.LambdaDeparser(self.ast, env=builtins)
284             return decom.code()
285         else:
286             return 'function not yet loaded'
287    
288     def __repr__(self):
289         return 'logic.Expression(%s)' % self.code()
290    
291     def __and__(self, other):
292         """Logical-and this Expression with another."""
293         if not isinstance(other, Expression):
294             other = Expression(other)
295         ag = Aggregator(self.func)
296         ag.and_combine(other.func)
297         agfunc = ag.function()
298         newkwtypes = self.kwtypes.copy()
299         newkwtypes.update(other.kwtypes)
300         return Expression(agfunc, newkwtypes)
301     __add__ = __and__
302    
303     def __or__(self, other):
304         """Logical-or this Expression with another."""
305         if not isinstance(other, Expression):
306             other = Expression(other)
307         ag = Aggregator(self.func)
308         ag.or_combine(other.func)
309         agfunc = ag.function()
310         newkwtypes = self.kwtypes.copy()
311         newkwtypes.update(other.kwtypes)
312         return Expression(agfunc, newkwtypes)
313    
314     def bind_args(self, **kwargs):
315         """Set self.kwargs to a shallow copy of the given kwargs."""
316         self.kwargs = {}
317         self.kwargs.update(kwargs)
318    
319     def evaluate(self, *args, **kwargs):
320         """Return self.func(*args, **kwargs + self.kwargs)."""
321         kw = self.kwargs.copy()
322         kw.update(kwargs)
323         return self.func(*args, **kw)
324     __call__ = evaluate
325    
326     def __getstate__(self):
327         return (self.code(), self.kwtypes, self.kwargs)
328    
329     def __setstate__(self, state):
330         if len(state) == 2:
331             # Older versions of Expression had a 2-tuple.
332             func, self.kwtypes = state
333             self.kwargs = {}
334         else:
335             func, self.kwtypes, self.kwargs = state
336        
337         # The most difficult thing about Expressions is unpickling.
338         # Any func_globals at the time of pickling are lost, so any
339         # late-bound objects must be available at this point. Any
340         # such objects need to be injected into logic.builtins
341         # if you want them to be available here.
342         self.func = eval(func, builtins)
343         lp = astwalk.LambdaParser(self.func, env=builtins, reduce=True)
344         lp.walk()
345         self.ast = lp.ast
346    
347     def is_constant(self, value):
348         """Return True if self.func == (lambda: value), False otherwise."""
349         fc = self.func.func_code
350         return fc.co_code == 'd\x01\x00S' and fc.co_consts == (None, value)
351
352
353 def filter(**kwargs):
354     """Form an Expression from keyword arguments.
355     
356     Allows you to write:
357         e = logic.filter(a=3, b=1)
358     ...instead of:
359         e = logic.Expression(lambda x: x.a == 3 and x.b == 1)
360     """
361     items = kwargs.items()
362     co = []
363     kwlen = len(kwargs)
364     jump_target = (16 * kwlen) - 7
365     for i in xrange(1, kwlen + 1):
366         co += [124, 0, 0,
367                105, i, 0,
368                100, i, 0,
369                106, 2, 0,
370                # Point all jump (111) instructions at len(co) - 4
371                111, jump_target - (((i - 1) * 16) + 12), 0,
372                1,
373                ]
374     if kwargs:
375         # pop extraneous final JUMP and POP_TOP.
376         del co[-4:]
377     co.append(83)
378    
379     items = [('x', None)] + items
380     names, consts = zip(*items)
381    
382     # Form code object and function.
383     # code(argcount, nlocals, stacksize, flags, codestring,
384     #      constants, names, varnames,
385     #      filename, name, firstlineno, lnotab[, freevars[, cellvars]])
386     co = CodeType(1, 1, 2, 67, ''.join(map(chr, co)),
387                   consts, names, ('x', ), '', '<lambda>', 1, '')
388     func = FunctionType(co, {})
389     return Expression(func, earlybind=False)
390
391
392 def comparison(attr, cmp_op, criteria):
393     """Form an Expression lambda x: x.attr cmp_op criteria.
394     
395     Allows you to write:
396         e = logic.comparison('Size', cmp_op_index, 4)
397     ...instead of:
398         e = logic.Expression(lambda x: x.Size <= 4)
399     
400     This allows one to pass dynamic, isolated arguments, without having
401     to construct a lambda out of them first.
402     """
403     # cmp_op (from opcode):
404     # ('<', '<=', '==', '!=', '>', '>=', 'in', 'not in', 'is',
405     #  'is not', 'exception match', 'BAD')
406     if cmp_op < 0 or cmp_op > 11:
407         raise ValueError("The cmp_op argument must be between 0 and 11")
408    
409     if not isinstance(attr, str):
410         attr = str(attr)
411    
412     co_flags = codewalk.CO_NOFREE | codewalk.CO_OPTIMIZED | codewalk.CO_NEWLOCALS
413    
414     if sys.version_info >= (2, 5):
415         # 2.5 doesn't include arguments in co_names anymore,
416         idx = 0
417         names = (attr,)
418         # It also stopped prepending None to co_consts...
419         consts = (criteria,)
420         # ...and nested_scopes are now "always on".
421         co_flags |= codewalk.CO_NESTED
422     else:
423         idx = 1
424         names = ('x', attr)
425         consts = (None, criteria)
426    
427     co = [124, 0, 0,
428           105, idx, 0,
429           100, idx, 0,
430           106, cmp_op, 0,
431           83,
432           ]
433    
434     # Form code object and function.
435     # code(argcount, nlocals, stacksize, flags, codestring,
436     #      constants, names, varnames,
437     #      filename, name, firstlineno, lnotab[, freevars[, cellvars]])
438     co = CodeType(1, 1, 2, co_flags, ''.join(map(chr, co)),
439                   consts, names, ('x',), '', '<lambda>', 1, '')
440     func = FunctionType(co, {})
441     return Expression(func, earlybind=False)
442
443
444 def combine(expr, kwargs):
445     """Return a single Expression (or None) for the given expr and kwargs.
446     
447     The supplied expr may be a lambda, an Expression, or None.
448     The kwargs may be an empty dict.
449     """
450     if expr is not None and not isinstance(expr, Expression):
451         expr = Expression(expr)
452     if kwargs:
453         f = filter(**kwargs)
454         if expr is None:
455             expr = f
456         else:
457             expr += f
458     return expr
459
Note: See TracBrowser for help on using the browser.