root/trunk/doc/modeling.html

Revision 43 (checked in by fumanchu, 8 years ago)

1. Changed UnitProperty.hints['Size'] to 'bytes'. SMs should now assume infinite bytes unless told otherwise.
2. Abstracted adapters into db.py.
3. Adapter coerce methods now take a coltype arg.
4. Changed safe_name functions to SM.identifier methods.
5. Bugfix: COMPARE_OP now uses op indices not values.
6. Moved len() from CALL_FUNCTION to decompiler.functions.
7. Added db.ConstWrapper to help with LOAD_CONST corner cases.
8. Added MySQL SM and test suite.

1 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
2    "http://www.w3.org/TR/xhtml1/DTD/strict.dtd">
3 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
4
5 <head>
6     <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
7     <title>Dejavu: Modeling your Application</title>
8     <link href='dejavu.css' rel='stylesheet' type='text/css' />
9 </head>
10
11 <body>
12
13 <h2>Application Developers: Using Dejavu to Construct a Domain Model</h2>
14
15 <h3>Units</h3>
16 <p>When constructing a Domain Model for your application, you will want
17 to distinguish between objects that will be persisted and objects that
18 will not. By registering a subclass of <tt>dejavu.Unit</tt>, you allow
19 instances of that subclass to be persisted.</p>
20
21 <p>Before you can register your Unit class, you must create it:
22 <pre>import dejavu
23 class Printer(dejavu.Unit): pass</pre>
24 This is all you need for a fully-functioning Unit class. There are
25 no methods or attributes that you are required to override; simply
26 subclass from <tt>Unit</tt>. However, this is a fairly uninteresting
27 class. It doesn't provide any functionality other than what <tt>Unit</tt>
28 already provides. The first thing we will probably want to add to our
29 new class is persistent data.</p>
30
31 <h4>UnitProperty</h4>
32 <p>Once you have defined a persistent class (by subclassing <tt>Unit</tt>),
33 you need to make another decision. Rather than persist the entire object
34 <tt>dict</tt>, you specify a subset of persistent attributes by using
<tt>UnitProperty</tt>, a data descriptor. If you've used Python's built-in
<tt>property()</tt> construct, you've used descriptors before.</p>
37
38 <p>We might enhance our Printer example thusly:
39 <pre>from dejavu import Unit, UnitProperty
40 class Printer(Unit):
41     Manufacturer = UnitProperty(unicode)
42     ColorCopies = UnitProperty(bool)
43     PPM = UnitProperty(float)</pre>
44 This adds three persistent attributes to our <tt>Printer</tt> objects,
45 each with a different datatype. In addition, every subclass of <tt>Unit</tt>
46 inherits an 'ID' property, an int.</p>
47
48 <p>When you get and set <tt>UnitProperty</tt> attributes, they behave just
49 like any other attributes:
50 <pre>>>> p = Printer()
51 >>> p.PPM = 25
52 >>> p.PPM
53 25.0</pre>
54 However, you will notice right away that the int value we provided has been
55 coerced to a float behind the scenes. This is because we specified the PPM
56 attribute as a 'float' type when we created it. The value of a Unit
57 Property is restricted to the type which you specify. The only other valid
58 value for a Unit Property is None; any Property may be None at any time,
59 and in fact, all Properties are None until you assign values to them:
60 <pre>>>> p.ColorCopies is None
61 True</pre></p>
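<p>The coercion and <tt>None</tt> behaviour described above can be pictured with an ordinary Python data descriptor. The following is only a minimal sketch: <tt>CoercedProperty</tt> is a hypothetical name invented for illustration, and this is not how <tt>UnitProperty</tt> is actually implemented.</p>

```python
class CoercedProperty(object):
    """Hypothetical stand-in for a typed property; not Dejavu's actual code."""

    def __init__(self, type=str):
        self.type = type
        self.key = None

    def __set_name__(self, owner, name):
        # Python 3.6+ tells the descriptor which attribute name it is bound to.
        self.key = name

    def __get__(self, unit, owner=None):
        if unit is None:
            # Accessed on the class: hand back the descriptor itself.
            return self
        # Unset properties read as None.
        return unit.__dict__.get(self.key)

    def __set__(self, unit, value):
        # None is always allowed; anything else is coerced to the declared type.
        if value is not None and not isinstance(value, self.type):
            value = self.type(value)
        unit.__dict__[self.key] = value


class Printer(object):
    PPM = CoercedProperty(float)
    ColorCopies = CoercedProperty(bool)


p = Printer()
p.PPM = 25
print(repr(p.PPM))            # 25.0
print(p.ColorCopies is None)  # True
```

<p>Because the descriptor defines <tt>__set__</tt>, it takes precedence over the instance dictionary, which is what lets every assignment pass through the coercion step.</p>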
62
63 <h4>Unit ID's</h4>
64 <p>The <tt>Unit</tt> base class possesses a single Unit Property, an int
65 named 'ID'. If you wish to use ID's of a different type, simply override
66 the ID attribute in your subclass:
67 <pre>class Printer(Unit):
68     ID = UnitProperty(unicode)</pre>
69 Every Unit must possess an ID property. This ensures that each Unit within
70 the system is unique.</p>
71
72 <h4>Creating and Populating Properties</h4>
73 <p>In addition to defining Unit Properties within your class body,
74 you can define them after the class body has been executed via
75 the classmethod <tt>set_property()</tt>. For example, the following
76 two classes are equivalent:
77 <pre>class Publication(Unit):
78     Content = UnitProperty(unicode)
79
80 class Publication(Unit): pass
81 Publication.set_property('Content', unicode)</pre>
82
83 Declarations outside of the class body allow more dynamic setting of
84 Unit properties. You can define multiple properties at once via
85 the <tt>set_properties()</tt> classmethod:
86
87 <pre>class Publication(Unit): pass
88 Publication.set_properties({'Content': unicode,
89                             'Publisher': unicode,
90                             'Year': int,
91                             })</pre>
92 </p>
93
94 <p>You also have options when populating Unit Properties. The standard way
95 is simply to reference them as normal Python instance attributes. However,
96 you may also use the <tt>adjust()</tt> method to modify multiple properties
97 at once; pass in keyword arguments which match the properties you wish to
98 modify. Keyword arguments also work when instantiating the object. For
99 example, the following three code snippets are equivalent:
100
101 <pre>pub = Publication()
102 pub.Publisher = 'Walter J. Black'
103 pub.Year = 1928
104
105 pub = Publication()
106 pub.adjust(Publisher='Walter J. Black', Year=1928)
107
108 pub = Publication(Publisher='Walter J. Black', Year=1928)</pre>
109 </p>
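<p>One plausible way to get the three equivalent forms shown above is to have the constructor delegate to <tt>adjust()</tt>, which in turn uses plain attribute assignment. This sketch assumes exactly that; <tt>Record</tt> is a hypothetical class and not part of Dejavu.</p>

```python
class Record(object):
    """Hypothetical base class; only mimics the keyword-population pattern."""

    def __init__(self, **kwargs):
        self.adjust(**kwargs)

    def adjust(self, **kwargs):
        # Assign through normal attribute access, so any descriptors on the
        # class would still get a chance to coerce each value.
        for key, value in kwargs.items():
            setattr(self, key, value)


a = Record()
a.Publisher = 'Walter J. Black'
a.Year = 1928

b = Record()
b.adjust(Publisher='Walter J. Black', Year=1928)

c = Record(Publisher='Walter J. Black', Year=1928)

print(vars(a) == vars(b) == vars(c))  # True
```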
110
111 <h4>Unit Properties are First-Class Objects</h4>
112 <p>Like many descriptors, Unit Properties behave differently when you access
113 them from the class, rather than from an instance as above. When calling
114 them from the class, you receive the <tt>UnitProperty</tt> object itself,
115 rather than its value for a given instance. That is,
116 <pre>>>> c = Printer.ColorCopies
117 >>> c
118 &lt;dejavu.UnitProperty object at 0x01112970></pre>
119 This is significant, because it allows us to store metadata about the
120 property itself:
121 <pre>>>> c.key, c.index, c.type, c.hints
122 ('ColorCopies', False, &lt;type 'bool'>, {})</pre>
123 The <tt>key</tt> attribute is merely the property's canonical name. The
124 <tt>index</tt> value tells Storage Managers whether or not to index the
125 column. The <tt>type</tt> attribute limits property values to instances
126 of that type (or <tt>None</tt>). Finally, the <tt>hints</tt> dictionary
127 provides hints to Storage Managers to help optimize storage. A common use,
128 for example, is to inform Managers that would usually store unicode strings
129 as strings of length 255, that a particular value should be a larger object;
130 this is done with a 'bytes' mapping, such as <tt>hints = {u'bytes': 0}</tt>,
131 where 0 implies no limit.</p>
132
133 <p>When you define a UnitProperty instance, you can pass in these extra
134 attributes. The signature for UnitProperty is <tt>(type=unicode,
135 index=False, hints={}, key=None)</tt>. Supply any, all, or none of them
136 as needed.</p>
137
138 <h4>Triggers</h4>
139 <p>In addition, each UnitProperty has a <tt>pre</tt> and <tt>post</tt>
140 attribute, which default to None. If you override these with methods
141 in a subclass of <tt>UnitProperty</tt>, they will be called when setting
142 a new value for that property, either before (pre) or after (post) the
143 new value is set. For example:
<pre>import datetime
from dejavu import Unit, UnitProperty, associate

class DatedProperty(UnitProperty):
    def post(self, unit, value):
        # Stamp the unit, then copy the same timestamp to its parent Forum.
        unit.Date = datetime.datetime.now().replace(microsecond=0)
        parent = unit.first(Forum)
        if parent:
            parent.Date = unit.Date

class Topic(Unit):
    Date = UnitProperty(datetime.date)
    Content = DatedProperty()
    ForumID = UnitProperty(int)

class Forum(Unit):
    Date = UnitProperty(datetime.date)

associate(Topic, 'ForumID', Forum, 'ID')</pre>
160 In this example, whenever Topic().Content is set, the <tt>post</tt>
161 method will be called and the object's <tt>Date</tt> attribute will
162 be modified. Then, the Topic's parent Forum is looked up and <i>its</i>
163 <tt>Date</tt> is modified.</p>
164
165 <p>As with any trigger system, you need to be careful not to have triggers
166 called out of order. For example, if a user changes both the ForumID and
167 Content properties in a single operation (like a web page submit), the old
168 Forum will be incorrectly modified if the Content property is applied
169 first. I don't have any cool tools built into Dejavu to help you with
170 this, but I'm open to suggestions.</p>
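<p>The ordering hazard is easy to reproduce outside Dejavu. The sketch below uses a hypothetical <tt>TriggeredProperty</tt> descriptor with optional <tt>pre</tt>/<tt>post</tt> hooks (all names are invented for illustration) and records which forum ID the hook sees, depending on the order in which two properties are applied.</p>

```python
class TriggeredProperty(object):
    """Hypothetical property with optional pre/post hooks (illustration only)."""
    pre = None
    post = None

    def __set_name__(self, owner, name):
        self.key = name

    def __get__(self, unit, owner=None):
        if unit is None:
            return self
        return unit.__dict__.get(self.key)

    def __set__(self, unit, value):
        if self.pre is not None:
            self.pre(unit, value)
        unit.__dict__[self.key] = value
        if self.post is not None:
            self.post(unit, value)


class TouchParent(TriggeredProperty):
    def post(self, unit, value):
        # Record which forum was the parent at the moment Content changed.
        unit.touched.append(unit.ForumID)


class Topic(object):
    Content = TouchParent()
    ForumID = TriggeredProperty()

    def __init__(self, forum_id):
        self.touched = []
        self.ForumID = forum_id


# Content applied first: the hook fires against the old forum (1).
t1 = Topic(1)
t1.Content = 'edited'
t1.ForumID = 2

# ForumID applied first: the hook fires against the new forum (2).
t2 = Topic(1)
t2.ForumID = 2
t2.Content = 'edited'

print(t1.touched, t2.touched)  # [1] [2]
```

<p>A simple defence in application code is to apply "key" properties such as <tt>ForumID</tt> before any property whose trigger follows an association.</p>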
171
172 <h4>Registration of Unit Classes</h4>
173 <p>In addition to defining your Unit class, you must also register that
174 class with your application's <tt>Arena</tt> object. Each class which
175 you want Dejavu to manage must be passed to <tt>Arena.register(cls)</tt>.
176 If you create a module with multiple classes, you can register them all
177 at once with <tt>Arena.register_all(globals())</tt>. It will grab any
178 Unit subclasses out of your module's globals() (or any other mapping
179 you pass to <tt>register_all</tt>) and register them.</p>
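<p>To make the idea behind <tt>register_all</tt> concrete, here is a rough sketch of scanning a mapping for subclasses of a base class. The <tt>Registry</tt> class and the placeholder <tt>Unit</tt> are hypothetical, and skipping the base class itself is an assumption rather than documented Arena behaviour.</p>

```python
class Unit(object):
    """Placeholder base class standing in for dejavu.Unit."""


class Registry(object):
    """Hypothetical registry; the real Arena does considerably more."""

    def __init__(self):
        self.classes = {}

    def register(self, cls):
        self.classes[cls.__name__] = cls

    def register_all(self, mapping):
        for obj in mapping.values():
            # Keep only classes derived from Unit, skipping Unit itself
            # (an assumption made for this sketch).
            if isinstance(obj, type) and issubclass(obj, Unit) and obj is not Unit:
                self.register(obj)


class Printer(Unit):
    pass


class Publication(Unit):
    pass


namespace = {'Printer': Printer, 'Publication': Publication,
             'Unit': Unit, 'answer': 42}
registry = Registry()
registry.register_all(namespace)
print(sorted(registry.classes))  # ['Printer', 'Publication']
```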
180
181 <h3>Sandboxes</h3>
182 <p>During the life of a client connection, your application should create
183 and use a <tt>Sandbox</tt> to manage the set of "live" Units. A Sandbox
184 manages the in-memory lifecycle of Units: creation, identity, mutation, and
185 destruction. Sandboxes route persistence operations on Units to the correct
186 Storage Manager.</p>
187
188 <p>You can create Sandbox objects directly. They take a single argument, the
189 top-level <tt>Arena</tt> object. Arenas also provide a convenience function,
190 <tt>new_sandbox</tt>, which does this for you. The following lines are
191 equivalent:
192 <pre>box = Sandbox(myArena)
193
194 box = myArena.new_sandbox()</pre>
195 You might often choose the latter when you have a reference to the Arena
196 object, and would rather avoid importing dejavu yet again just to obtain
197 the Sandbox class.</p>
198
199 <h4>Memorizing Units</h4>
200 <p>When you create a Unit instance, it exists in isolation. There is no
201 connection between that Unit and storage; your Unit will not be persisted,
202 because Dejavu doesn't yet possess a reference to your Unit. To provide
203 that link, you <i>memorize</i> your Unit (or rather, you tell your Sandbox
204 to memorize it):
205 <pre>class Publisher(Unit):
206     City = UnitProperty(unicode)
207
208 p = Publisher(ID='Walter J. Black')
209 box.memorize(p)</pre></p>
210
211 <p>Memorization does several things. First, it places your new Unit into
212 your Arena. That Unit instance will now be persisted by the appropriate
213 Storage Manager. It can be recalled from storage when needed, using the
214 built-in Expression syntax. It may have been given an ID (see
215 <u>Sequencing</u>, below). Memorization also makes your Unit
216 <i>concrete</i>; that is, your Unit will now possess a <tt>sandbox</tt>
217 attribute. Units whose <tt>sandbox</tt> attribute is not set (is None)
218 have no relationships, and their Unit Property triggers (if any) will
219 not fire.</p>
220
221 <p>You may define special methods on your Units to provide start-of-life
222 behaviors. If a Unit possesses an <tt>on_memorize</tt> method, it will
be called after the Unit has been 'reserved' in storage, and after the
Unit has been placed in the Sandbox cache.</p>
225
226 <h4>Sequencing</h4>
227 <p>Every <tt>Unit</tt> has an <tt>ID</tt> property. The default ID property
228 is of type <tt>int</tt>; however, you can override that to whatever type
229 you like. As long as you provide your own IDs for Units, nothing will
230 break--you can memorize and recall Units without problems. However, if
231 you memorize a Unit with an ID of <tt>None</tt>, the Sandbox may attempt
232 to provide an ID for it.</p>
233
234 <p>The <tt>Unit</tt> base class possesses a <tt>sequencer</tt> attribute
235 to help Sandboxes generate new IDs. The default value is an instance of
236 <tt>UnitSequencerInteger</tt>, which examines all existing Units, finds
237 the maximum integer ID, adds 1, and uses that value for the new ID.</p>
238
239 <p>The other useful Sequencer is <tt>UnitSequencerNull</tt>, which simply
raises an error when asked to generate an ID. If your IDs are strings,
you'll probably want to make that class's <tt>.sequencer</tt> one of
242 these, and form ID values in your own code.</p>
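<p>The two sequencing strategies can be sketched in a few lines. The class names, the <tt>next_id</tt> method, and the choice to start at 1 for an empty store are all assumptions made for illustration; the actual sequencer interface may differ.</p>

```python
class IntegerSequencer(object):
    """Hypothetical sequencer: next ID is one more than the current maximum."""

    def next_id(self, existing_ids):
        ids = [i for i in existing_ids if i is not None]
        # Assumption: start at 1 when nothing has been stored yet.
        return max(ids) + 1 if ids else 1


class NullSequencer(object):
    """Hypothetical sequencer that refuses to invent IDs."""

    def next_id(self, existing_ids):
        raise ValueError("IDs must be supplied by the caller")


seq = IntegerSequencer()
print(seq.next_id([3, 7, 5]))  # 8
print(seq.next_id([]))         # 1
```

<p>Note that a "max plus one" scheme has to look at every existing ID, so it is only safe when no other writer can add a Unit between the scan and the save.</p>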
243
244 <h4>Recalling</h4>
245 <p>Once you have memorized a Unit or two, you will probably want to
246 recall them at some point. Sandboxes possess two member functions to
247 accomplish this.</p>
248
249 <h5>recall()</h5>
250 <p>First, the appropriately named <tt>recall(cls, expr)</tt> function.
251 This is the full-blown query method. As a first argument, you pass it the
252 class (<b>not</b> the name of the class, but the actual class) of which you
253 expect to retrieve instances. The second argument should be an instance
254 of <tt>dejavu.logic.Expression</tt>, an object which encapsulates your
255 specific query (see <u>Expressions</u>, next). An example recall operation:
256 <pre>>>> e = logic.Expression(lambda x: x.Year == 1928)
257 >>> units = box.recall(Publication, e)
258 >>> [x.Title for x in units]
259 [u'The Giant Horse of Oz', u'Kai Lung Unrolls His Mat',
260  u'Tarzan, The Lord of the Jungle']
261 </pre>
262 If you do not supply an Expression, all Units of the given Unit class
will be retrieved. Notice that the return value is <b>not</b> a list; it is a
264 generator (or other iterable). You must iterate over it to retrieve all
265 values. By returning an iterator, we allow some Storage Managers to load
266 Units in a more lazy fashion. If this is a huge burden for you, let me
267 know; I might be convinced to add a <tt>recall_list</tt> method.</p>
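<p>If the iterable you get back happens to be a generator, it can only be walked once. The sketch below uses a hypothetical, in-memory <tt>recall</tt> function (not the Sandbox method) to show the effect and the usual workaround of wrapping the result in <tt>list()</tt>. The third title is a made-up placeholder.</p>

```python
class Publication(object):
    """Plain stand-in object; not a Dejavu Unit."""

    def __init__(self, Title, Year):
        self.Title = Title
        self.Year = Year


def recall(units, predicate=None):
    """Hypothetical lazy recall: yield matching objects one at a time."""
    for unit in units:
        if predicate is None or predicate(unit):
            yield unit


shelf = [Publication('The Giant Horse of Oz', 1928),
         Publication('Kai Lung Unrolls His Mat', 1928),
         Publication('Placeholder Title', 1931)]

results = recall(shelf, lambda x: x.Year == 1928)
first_pass = [x.Title for x in results]
second_pass = [x.Title for x in results]  # the generator is already exhausted
print(first_pass)   # ['The Giant Horse of Oz', 'Kai Lung Unrolls His Mat']
print(second_pass)  # []

# Materialize once if the results are needed more than once.
kept = list(recall(shelf, lambda x: x.Year == 1928))
print(len(kept))    # 2
```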
268
269 <p>The <tt>recall</tt> method will take additional arguments in pairs of
270 <tt>cls</tt>, <tt>expr</tt>. This feature isn't fully developed yet.
271 It's designed to emulate JOINs, returning units which match each expr
272 and are related.</p>
273
274 <p>If your Unit class defines an <tt>on_recall()</tt> method, it will be
275 called when each Unit has been loaded from storage (at the end of the
276 recall process). Once the unit is loaded into a Sandbox, however,
277 <tt>on_recall</tt> will not be called; it's only called at the Sandbox/SM
278 boundary. If <tt>on_recall</tt> raises <tt>UnrecallableError</tt>, the
279 unit will not be yielded back to the caller, nor placed in the Sandbox
280 cache.</p>
281
282 <h5>unit()</h5>
283 <p>The <tt>recall</tt> method can be verbose. When you want a one-liner
284 and only expect a single Unit, use the <tt>unit(cls, **kw)</tt> method
285 of Sandboxes. Again, you pass the class of Units you wish to retrieve
286 as the first argument. Then, supply keyword arguments of the form
287 "property_name=value". The method will form an equivalent Expression
288 for you from the keyword args. For example:
<pre>>>> book = box.unit(Publication, ID=1)
>>> if book:
...     print book.Title
...
Ladies in Hades</pre>
293 If a Unit is not found that matches the criteria, None is returned.
294 If multiple Units match the criteria, only the first one is returned
295 (although the rest are probably loaded into memory).</p>
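<p>The "first match or <tt>None</tt>" contract can be illustrated with a small helper over plain objects. <tt>first_match</tt> is a hypothetical function written for this sketch; unlike the Sandbox method described above, it stops scanning as soon as it finds a match.</p>

```python
from types import SimpleNamespace


def first_match(units, **criteria):
    """Hypothetical helper: first object whose attributes equal all criteria."""
    matches = (u for u in units
               if all(getattr(u, key, None) == value
                      for key, value in criteria.items()))
    # next() with a default returns None instead of raising StopIteration.
    return next(matches, None)


books = [SimpleNamespace(ID=1, Title='Ladies in Hades'),
         SimpleNamespace(ID=2, Title='Kai Lung Unrolls His Mat')]

book = first_match(books, ID=1)
missing = first_match(books, ID=99)
print(book.Title)  # Ladies in Hades
print(missing)     # None
```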
296
297 <h4>Forgetting and Repressing</h4>
298 <p>To <i>forget</i> a Unit is to destroy it forever. You have two options
299 for forgetting Units: you can call <tt>Sandbox().forget(unit)</tt> or
300 the simpler version, <tt>Unit().forget()</tt>. Either of these will clear
the Unit from the Sandbox's cache, and the Sandbox will tell the appropriate
302 Storage Manager to destroy the stored Unit data. If a Unit has not yet
303 been memorized, you do not need to forget it.</p>
304
305 <p>In some circumstances, you may wish to only clear the Unit from the
306 Sandbox without destroying it. You can do this by calling either
307 <tt>Sandbox().repress(unit)</tt> or the simpler version,
308 <tt>Unit().repress()</tt>.</p>
309
310 <p>You may define special methods on your Units to provide end-of-life
311 behaviors. If a Unit possesses an <tt>on_forget</tt> method, it will
312 be called after the Unit has been destroyed. If a Unit possesses an
313 <tt>on_repress</tt> method, it will be called <i>before</i> the Unit
314 has been repressed. I'm sure there was a good reason for this
315 disparity, but I've forgotten (or perhaps repressed) it.</p>
316
317 <h4>Flushing Sandboxes</h4>
318 <p>When the client connection has closed, you should <i>flush</i> the
319 Sandbox caches. In general, a single call to <tt>Sandbox().flush_all()</tt>
320 will do the trick. Notice that flushing calls <tt>repress()</tt> for each
321 Unit in the Sandbox, and any <tt>on_repress()</tt> triggers will be
322 executed.</p>
323
324
325 <h4>Aggregate Functions</h4>
326 <p>Sandboxes also provide a <tt>distinct(cls, attrs, expr=None)</tt>
327 function. This returns values, rather than Units. Put simply, it returns
328 all distinct values for the given attribute(s) of the Unit class provided.
329 If only one attribute is specified, a list of values will be returned.
330 If more than one attribute is specified, a zipped list will be returned
331 of all distinct existing combinations. Providing an expr argument (an
332 <tt>Expression</tt> object, see below) will filter the set of Units before
333 obtaining distinct values.</p>
334
335 <p>The <tt>distinct</tt> function can also be used as a <tt>count</tt>
336 function by passing attrs = ['ID']. Sandboxes provide a
337 <tt>count(cls, expr)</tt> method which does just this.</p>
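<p>The shape of the return value (flat list for one attribute, list of tuples for several) and the counting trick can be sketched over in-memory objects. This <tt>distinct</tt> is a hypothetical stand-in, not the Sandbox method, and preserving first-seen order is an assumption of the sketch.</p>

```python
from collections import namedtuple

Book = namedtuple('Book', 'ID Publisher Year')


def distinct(units, attrs):
    """Hypothetical in-memory version: unique attribute combinations."""
    seen = []
    for unit in units:
        row = tuple(getattr(unit, name) for name in attrs)
        if row not in seen:
            seen.append(row)
    if len(attrs) == 1:
        # A single attribute yields a flat list of values.
        return [row[0] for row in seen]
    return seen


books = [Book(1, 'Walter J. Black', 1928),
         Book(2, 'Walter J. Black', 1928),
         Book(3, 'Walter J. Black', 1930)]

print(distinct(books, ['Year']))               # [1928, 1930]
print(distinct(books, ['Publisher', 'Year']))
print(len(distinct(books, ['ID'])))            # 3
```

<p>Counting via <tt>['ID']</tt> works because IDs are unique per Unit, so the number of distinct IDs equals the number of Units.</p>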
338
339 <h3>Querying</h3>
340 <p>When you retrieve Units, you often don't want to load the entire set for
341 a given class. In Dejavu, you filter the set according to the UnitProperty
342 attributes for each object. Naturally, there must be a way to express
343 the filter you intend. Dejavu actually provides three ways, all in the
344 <tt>dejavu.logic</tt> module: <tt>Expression</tt>,
345 <tt>filter</tt>, and <tt>comparison</tt>.</p>
346
347 <h4>The <tt>Expression</tt> class</h4>
348 <p>Regardless of which technique you use to express your filter, you're
349 going to end up with a <tt>logic.Expression</tt> object. You can build
350 an Expression directly, passing a single lambda as an argument:
351 <pre>>>> from dejavu import logic
352 >>> import datetime
353 >>> f = lambda x: x.Date >= datetime.date(2004, 3, 1)
354 >>> e = logic.Expression(f)
355 >>> e
356 logic.Expression(lambda x: x.Date >= datetime.date(2004, 3, 1))</pre>
357 Neat, eh? I worked hard on that __repr__. ;)</p>
358
359 <p>It may be obvious, but we'll be explicit, here. The lambda which you pass
360 into an Expression must possess a single positional argument, which will
361 always be bound to a Unit instance. In the example above, it's named 'x',
362 but you can use any name you like. Using lambdas as a base means that we
363 can simply call <tt>Expression.evaluate(unit)</tt>, and receive a boolean
364 value indicating whether our Unit "passes the test". Attribute lookups on
365 our 'x' object will apply to Unit Properties for that Unit object.
366 That is, <tt>x.Date</tt> becomes <tt>unit.Date</tt>.</p>
367
368 <h4>Early binding</h4>
369 <p>What is not obvious from the above code snippet is perhaps the <b>most
370 important aspect</b> of Expressions: any globals or cell references (from
371 closures) in the supplied lambda get <b>bound early</b>. Compare the
372 following disassemblies:
<pre>>>> import dis
>>> dis.dis(f)
  1           0 LOAD_FAST                0 (x)
              3 LOAD_ATTR                1 (Date)
              6 LOAD_GLOBAL              2 (datetime)
              9 LOAD_ATTR                3 (date)
             12 LOAD_CONST               1 (2004)
             15 LOAD_CONST               2 (3)
             18 LOAD_CONST               3 (1)
             21 CALL_FUNCTION            3
             24 COMPARE_OP               5 (>=)
             27 RETURN_VALUE
>>> dis.dis(e.func)
  1           0 LOAD_FAST                0 (x)
              3 LOAD_ATTR                1 (Date)
              6 LOAD_CONST               6 (datetime.date(2004, 3, 1))
              9 COMPARE_OP               5 (>=)
             12 RETURN_VALUE
</pre>
392 As you can see, the function itself references the global 'datetime' module.
393 Once we wrap it in the Expression, however, it becomes a constant! Thanks to
394 Raymond Hettinger for inspiring this solution <a href='#hettinger'>[1]</a>.
395 Early binding, however, implies two consequences:</p>
396
397 <p>First, any globals or cell references must be present in the lambda's
398 scope when it is passed into Expression(). This is the norm and shouldn't
399 require too much thought from you when you write Expressions. In the
400 example above, we simply imported <tt>datetime</tt> as you would expect.</p>
401
402 <p>Second, any globals or cell references must <b>also</b> be present in
403 the <tt>logic</tt> module's globals when the Expression is unpickled.
404 Pickling occurs when Expressions are sent over sockets, and also if
405 Expressions are themselves persisted to storage (for example, see
406 <u>Unit Engines</u>, below). This means your application should inject
407 globals into the <tt>logic</tt> module. Note that the <tt>logic</tt> module
408 already tries to import <tt>datetime</tt>, <tt>fixedpoint</tt> and
409 <tt>decimal</tt>.</p>
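<p>To see why early binding matters, compare how an ordinary Python lambda treats a global with a version whose value is frozen at definition time. The default-argument trick below is only a plain-Python way to get a similar freezing effect; it is not the bytecode rewriting that <tt>Expression</tt> performs, and a lambda with a second parameter would not be a valid Expression argument.</p>

```python
threshold = 3

# Plain lambda: `threshold` is looked up again on every call (late binding).
late = lambda x: x > threshold

# Default argument: the current value is captured when the lambda is defined.
early = lambda x, limit=threshold: x > limit

# Rebinding the name afterwards only affects the late-bound version.
threshold = 10

print(late(5))   # False: compares against the new value, 10
print(early(5))  # True: still compares against 3
```

<p>An early-bound test keeps meaning what it meant when it was written, which is the property you want for a query that may be pickled, stored, and evaluated much later.</p>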
410
411 <h4>External functions within Expressions</h4>
412 <p>Dejavu provides additional functions which can be used in Expressions.
413 For example, you can construct an Expression like:
<pre>logic.Expression(lambda x: x.Size &lt; 3 and x.Date > dejavu.today())</pre>
415 In this example, the <tt>today()</tt> function breaks convention and is
416 actually <b>bound late</b>. That is, if you construct this Expression now
417 and use it six months later, the value of <tt>today()</tt> will change.
418 Storage Managers "know about" these dejavu functions, and can use them
419 to build more appropriate queries. Here are the functions supplied by
420 the <tt>dejavu</tt> module:</p>
421
422 <table>
423 <tr><th>Function</th><th>Late bound?</th><th>Description</th></tr>
424 <tr>
425     <td><tt>icontains(a, b)</tt></td>
426     <td></td>
427     <td>Case-insensitive test b in a. Note the operand order.</td>
428 </tr>
429 <tr>
430     <td><tt>icontainedby(a, b)</tt></td>
431     <td></td>
432     <td>Case-insensitive test a in b. Note the operand order.</td>
433 </tr>
434 <tr>
435     <td><tt>istartswith(a, b)</tt></td>
436     <td></td>
437     <td>True if a starts with b (case-insensitive), False otherwise.</td>
438 </tr>
439 <tr>
440     <td><tt>iendswith(a, b)</tt></td>
441     <td></td>
442     <td>True if a ends with b (case-insensitive), False otherwise.</td>
443 </tr>
444 <tr>
445     <td><tt>ieq(a, b)</tt></td>
446     <td></td>
447     <td>True if a == b (case-insensitive), False otherwise.</td>
448 </tr>
449 <tr>
450     <td><tt>year(value)</tt></td>
451     <td></td>
452     <td>The year attribute of a date. If value is None, return None.</td>
453 </tr>
454 <tr>
455     <td><tt>now()</tt></td>
456     <td>Y</td>
457     <td>datetime.datetime.now()</td>
458 </tr>
459 <tr>
460     <td><tt>today()</tt></td>
461     <td>Y</td>
462     <td>datetime.date.today()</td>
463 </tr>
464 <tr>
465     <td><tt>iscurrentweek(value)</tt></td>
466     <td>Y</td>
467     <td>If value is in the current week, return True, else False.</td>
468 </tr>
469 </table>
470
471 <p>It is possible for you, the application developer, to define your
472 own external functions. However, because Storage Managers are unaware
473 of your new functions, they will not be able to optimize their use;
474 instead, they will simply retrieve a larger set of objects from storage,
475 evaluate each one against the function you provide, and return those
476 Units which match your function. This isn't necessarily a bad thing;
477 it provides the same functionality as if you wrote the test inline
478 within your own code. By making that test a logic function, you allow
479 it to be stored in Engine <i>rules</i> (see <u>Unit Engines</u>,
480 below).</p>
481
482 <h4>Combining Expressions</h4>
<p>Expressions can be combined. The <tt>&amp;</tt> operator joins two
Expressions with a logical "and". For example:
<pre>>>> a = logic.Expression(lambda x: x.Size > 3)
>>> b = logic.Expression(lambda x: x.Size &lt;= 15)
>>> c = a &amp; b
>>> c
logic.Expression(lambda x: (x.Size > 3) and (x.Size &lt;= 15))</pre>
The <tt>+</tt> operator works just like the <tt>&amp;</tt> operator. The
<tt>|</tt> operator combines the two Expressions with a logical 'or'.</p>
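<p>Operator-based combination of this kind is usually built on Python's <tt>__and__</tt>, <tt>__or__</tt> and <tt>__add__</tt> special methods. The sketch below shows the pattern with a hypothetical <tt>Predicate</tt> class that simply wraps callables; the real <tt>Expression</tt> merges the underlying lambdas instead, which this example does not attempt.</p>

```python
from types import SimpleNamespace


class Predicate(object):
    """Hypothetical wrapper around a one-argument test function."""

    def __init__(self, func):
        self.func = func

    def evaluate(self, obj):
        return bool(self.func(obj))

    def __and__(self, other):
        return Predicate(lambda x: self.func(x) and other.func(x))

    # Mirror the text: + behaves like &.
    __add__ = __and__

    def __or__(self, other):
        return Predicate(lambda x: self.func(x) or other.func(x))


a = Predicate(lambda x: x.Size > 3)
b = Predicate(lambda x: x.Size <= 15)

small, medium, large = (SimpleNamespace(Size=n) for n in (2, 10, 20))
print([(a & b).evaluate(u) for u in (small, medium, large)])  # [False, True, False]
print([(a + b).evaluate(u) for u in (small, medium, large)])  # [False, True, False]
print([(a | b).evaluate(u) for u in (small, medium, large)])  # [True, True, True]
```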
492
493 <h4>Using <tt>filter</tt> to form Expressions</h4>
494 <p>The <tt>logic</tt> module also provides convenient methods to
495 create common types of Expression objects via the <tt>filter</tt> and
496 <tt>comparison</tt> factory functions.</p>
497
498 <p>The <tt>filter(**kwargs)</tt> function produces an Expression by taking
499 the keyword arguments you supply, and rewriting them in lambda form. The
500 only operator allowed is therefore the equals '==' operator. For example:
501 <pre>>>> logic.filter(Type='Cat', Mutation='Atomic')
502 logic.Expression(lambda x: (x.Type == 'Cat') and (x.Mutation == 'Atomic'))</pre>
503 </p>
504
505 <h4>Using <tt>comparison</tt> to form Expressions</h4>
506 <p>The <tt>comparison(attr, cmp_op, criteria)</tt> function allows you to
507 form Expressions with dynamic operators. This can come in handy when you
508 are constructing Expressions on the fly from user input. For example, a
509 search page might prompt users for an attribute name, an operator, and an
510 operand (the criteria).</p>
511
512 <p>Borrowing from <tt>opcode.cmp_op</tt>, the allowed values for our cmp_op
513 argument are as follows:</p>
514 <table>
515 <tr><th>Numeric Value (cmp_op)</th><th>Operator</th></tr>
516 <tr><td>0</td><td>&lt;</td></tr>
517 <tr><td>1</td><td>&lt;=</td></tr>
518 <tr><td>2</td><td>==</td></tr>
519 <tr><td>3</td><td>!=</td></tr>
520 <tr><td>4</td><td>&gt;</td></tr>
521 <tr><td>5</td><td>&gt;=</td></tr>
522 <tr><td>6</td><td>in</td></tr>
523 <tr><td>7</td><td>not in</td></tr>
524 <tr><td>8</td><td>is</td></tr>
525 <tr><td>9</td><td>is not</td></tr>
526 </table>
527
528 <p>Here's an example of using <tt>comparison</tt>:
529 <pre>>>> logic.comparison('Name', 3, 'Mr. Kamikaze')
530 logic.Expression(lambda x: x.Name != 'Mr. Kamikaze')</pre>
531 Although the comparison function only allows a single comparison at a time,
the resulting Expressions can be combined with the <tt>&amp;</tt> and <tt>|</tt>
533 operators (described earlier) to produce more complex Expressions.</p>
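<p>The index-to-operator table above maps naturally onto Python's <tt>operator</tt> module. The following is a rough sketch of that mapping for plain objects; this <tt>comparison</tt> returns an ordinary function rather than an <tt>Expression</tt>, and the range check is an addition made here because the index may come from user input.</p>

```python
import operator
from types import SimpleNamespace

# Positions follow the table above: 0 is <, 1 is <=, ... 9 is "is not".
CMP_OPS = [
    operator.lt, operator.le, operator.eq, operator.ne,
    operator.gt, operator.ge,
    lambda a, b: a in b,
    lambda a, b: a not in b,
    operator.is_, operator.is_not,
]


def comparison(attr, cmp_op, criteria):
    """Hypothetical helper: build a one-argument test from an operator index."""
    # Reject out-of-range values explicitly; a negative index would otherwise
    # silently select an operator from the end of the list.
    if not 0 <= cmp_op < len(CMP_OPS):
        raise ValueError("unknown comparison operator: %r" % (cmp_op,))
    op = CMP_OPS[cmp_op]
    return lambda x: op(getattr(x, attr), criteria)


pilot = SimpleNamespace(Name='Mr. Kamikaze')
not_kamikaze = comparison('Name', 3, 'Mr. Kamikaze')
in_list = comparison('Name', 6, ['Mr. Kamikaze', 'Someone Else'])

print(not_kamikaze(pilot))  # False
print(in_list(pilot))       # True
```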
534
535 <h4>Exporting the <tt>logic</tt> module</h4>
536 <p>The <tt>logic</tt> module (and <tt>codewalk</tt>, on which it is built)
537 isn't limited to Dejavu. Feel free to use it in some other framework or
538 script! The only change you may have to make (if you relocate the module
539 outside of the <tt>dejavu</tt> package) would be to the single line:
540 <tt>from dejavu import codewalk</tt>, to point to the new location.</p>
541
542 <p>In particular, <tt>logic.Expression</tt> objects can operate on <i>any</i>
543 Python object, not just dejavu <tt>Unit</tt> instances. If you wish to
544 provide additional logic functions (as dejavu does), simply inject them
545 into <tt>logic</tt>'s globals.</p>
546
547 <p>You may also find the underlying <tt>codewalk</tt> module useful for
548 other purposes on its own. The <tt>Visitor</tt> base class can be very
549 convenient for building bytecode hacks.</p>
550
551 <p>To make a long story short, Dejavu depends on <tt>logic</tt> throughout,
552 but the reverse is not true.</p>
553
554
555 <h3>Associations between Unit Classes</h3>
556 <p>Once you've put together some Unit classes, chances are you're going to
557 want to associate them. Generally, this is accomplished by creating a
558 property in the Unit_B class which stores IDs of Unit_A objects (which
559 might be called <i>foreign keys</i> in a database context).
560 <pre>class Archaeologist(Unit):
561     Height = UnitProperty(float)
562
563 class Biography(Unit):
564     ArchID = UnitProperty(int)</pre>
565 In this example, each <tt>Biography</tt> object will have an <tt>ArchID</tt>
566 attribute, which will equal the <tt>ID</tt> of some <tt>Archaeologist</tt>.
567 In Dejavu terms, we say that there is a <i>near class</i> (with a <i>near
568 key</i>) and a <i>far class</i> (with a <i>far key</i>). Associations in
569 Dejavu are not one-way, so it doesn't matter which class you choose for the
570 "near" one and which for the "far" one.</p>
571
572 <p>You could stop at this point in your design, and simply remember what
573 these keys are and how they relate, and manipulate them accordingly. But
574 Dejavu allows you to explicitly declare these associations:
575 <pre>dejavu.associate(Archaeologist, 'ID', Biography, 'ArchID')</pre>
576 You pass in the near class, the near key, the far class, and the far key.
577 </p>
578
579 <p>What does an explicit association buy for you? First, Arenas discover them
580 and fill the <tt>Arena.associations</tt> registry, so that smart consumer
581 code (like Unit Engine Rules, below) can automatically follow association
582 paths for you. Second, each Unit class has a private <tt>_associations</tt>
attribute, a <tt>dict</tt>. Each Unit class involved in the association gains
584 an entry in that dict: the key is the far class itself (not the class name),
585 and the value is a tuple of (near key, far key).</p>
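<p>The per-class bookkeeping can be pictured as follows. This is only a sketch of the <tt>_associations</tt> entries described above, using plain classes; the real <tt>associate</tt> does more (for example, adding the related_units methods), and swapping the key order for the far side is an inference from the "near key, far key" wording.</p>

```python
class Archaeologist(object):
    _associations = {}


class Biography(object):
    _associations = {}


def associate(near_class, near_key, far_class, far_key):
    """Hypothetical bookkeeping only; no methods are added to the classes."""
    near_class._associations[far_class] = (near_key, far_key)
    # Seen from the other side, the roles of the two keys are swapped.
    far_class._associations[near_class] = (far_key, near_key)


associate(Archaeologist, 'ID', Biography, 'ArchID')
print(Archaeologist._associations[Biography])  # ('ID', 'ArchID')
print(Biography._associations[Archaeologist])  # ('ArchID', 'ID')
```

<p>Keying the dictionary on the class object itself, rather than on its name, avoids ambiguity when two modules define classes with the same name.</p>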
586
587 <h4><tt>related_units</tt> methods</h4>
588 <p>In addition, each of the two Unit classes will gain a new
589 <i>related_units</i> method which simplifies looking up related instances
590 of the other class. The new method for Unit_B will have the name of Unit_A,
591 and vice-versa. In our example:
592 <pre>>>> Archaeologist.Biography
593 &lt;unbound method Archaeologist.related_units>
594 >>> Eversley = Archaeologist(Height=(6.417))
595 >>> Eversley.Biography
596 &lt;bound method Archaeologist.related_units of &lt;__main__.Archaeologist
597 object at 0x011A1930>>
598 >>> bios = Eversley.Biography()
599 >>> bios
600 &lt;listiterator object at 0x012150D0>
601 >>> list(bios)
602 []
603 </pre>
604 We haven't created any Biographies, so there aren't any to be recalled,
605 which is why we get an empty iterator at this point. At the other extreme
606 (when you have hundreds of Biographies to filter), you can pass an optional
607 <tt>Expression</tt> object to the related_units method. When you do, the
608 list of associated Units will be filtered accordingly.</p>
609
610 <p>Because the related_units method names are formed automatically, you need
611 to take care not to use the names of Unit classes for your Unit properties.
612 In our example, we used "ArchID" for the name of our "foreign key".
613 If we had used "Archaeologist" instead, we would have had problems;
614 when we associated the classes, the <i>property</i> named "Archaeologist"
615 would have collided with the <i>related_units method</i> named
616 "Archaeologist". Be careful when naming your properties, and plan for the
617 future.</p>
618
<p>Unlike some other ORMs, Dejavu doesn't cache far Units within the near
620 Unit. Each time you call the related_units method, the data is recalled
621 from your Sandbox. It is quite probable that those far Units are still
622 sitting in memory in the Sandbox, but they're not going to persist in
623 the near Unit itself in any way.</p>
624
625 <p>Finally, some of you may want to override the default related_units
626 methods. Feel free; <tt>associate</tt> takes two optional arguments, which
627 should be callables that create and return the new method(s). See the source
628 code of <tt>dejavu</tt> and the method <tt>dejavu.relation_factory</tt>
629 for more information.</p>

<h4><tt>Unit.first()</tt></h4>
<p>Associations also enable the <tt>first</tt> method of Units. It's an
easy way to get a single related unit. Call it with a far Class and,
optionally, keyword arguments. The method will look up the related
properties and call <tt>sandbox.unit()</tt> for you, returning either the first
such far Unit or None if not found.</p>

<h3>Unit Engines</h3>
<p>Once you've created and associated your Unit classes, you can begin to
write "business logic" code (mostly inside those classes, we hope), and
"presentation logic" code (mostly outside those classes). In most cases,
you will construct Expressions within your own code manually to retrieve
Units. Sometimes, however, you need to persist query parameters from your
users; in other cases, you might store a list of Units which match a query
(regardless of who formed the necessary Expression). Finally, you might
wish to manipulate lists of Units as sets: differences, intersections,
and unions. The <tt>engines</tt> module addresses all of these needs.</p>

<h4>Collections: Lists of Units</h4>
<p>The <tt>UnitCollection</tt> class provides a means of storing a list
of Units, or rather, a list of Unit IDs. You use its <tt>Type</tt>
property to indicate the class of the indexed Units. That value should be
the <b>name</b> of the Unit Class, <b>not</b> the class object itself
(this is different from most other calls in Dejavu). If you need to
retrieve the actual Unit class, call <tt>UnitCollection().unit_class()</tt>.</p>

<p><tt>UnitCollection</tt> itself subclasses <tt>dejavu.Unit</tt>; you can
therefore persist Unit Collections via Dejavu Storage Managers. Handling
Unit Collections is recommended for SMs, but not required, so check
whether your SM does.</p>

<p>Each Collection has a thread lock (an RLock, actually) which you should
<tt>acquire()</tt> before you add an ID to the set, and <tt>release()</tt>
afterward. If you use the <tt>add(ID)</tt> method, this locking is done
for you.</p>
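<p>The locking pattern can be sketched in plain Python. This is not
Dejavu's code: <tt>CollectionSketch</tt> and its <tt>id_set</tt> attribute
are hypothetical stand-ins, and the only assumption carried over from the
text is that the lock is a reentrant <tt>threading.RLock</tt>.</p>

```python
import threading


class CollectionSketch:
    """Hypothetical stand-in for a Collection: a set of IDs guarded by an RLock."""

    def __init__(self):
        self.id_set = set()
        self.lock = threading.RLock()

    def acquire(self):
        self.lock.acquire()

    def release(self):
        self.lock.release()

    def add(self, unit_id):
        # Takes and releases the lock on the caller's behalf.
        with self.lock:
            self.id_set.add(unit_id)

    def ids(self):
        with self.lock:
            return sorted(self.id_set)


coll = CollectionSketch()

# Manual locking around direct manipulation of the ID set.
coll.acquire()
try:
    coll.id_set.add(1)
    # Calling add() while already holding the lock is safe, because an
    # RLock may be re-acquired by the thread that owns it.
    coll.add(2)
finally:
    coll.release()

# Several threads adding through add() never need to touch the lock.
threads = [threading.Thread(target=coll.add, args=(n,)) for n in range(3, 8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

<p>The reentrancy is what lets <tt>add</tt> be called from code that has
already acquired the lock without deadlocking.</p>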

<p>When you need to retrieve the actual Units which are indexed by the
Collection, call the <tt>units(quota=None)</tt> method, which will
look up the Units and return them in a list. Since the Collection only
stores IDs, it is possible that one of the indexed Units may have been
destroyed since the list was built. The <tt>units</tt> method simply
passes over these "phantom" Units. You can inspect the full list of IDs
in the Collection (whether they reference existing Units or not) with
the <tt>ids()</tt> method.</p>

<p>Collections also provide a convenience function for grouping Units
by attribute: <tt>xdict(attr)</tt>. This function will look up each Unit
in the Collection, inspect the attribute that you specify, and return
a dictionary of the form <tt>{attr_val1: [Unit, Unit, ...]}</tt>.
Each distinct attribute value will have its own key, with a list of
matching Units as the value.</p>
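<p>The shape of that dictionary is easy to see with ordinary objects.
The sketch below is not <tt>xdict</tt> itself; <tt>group_by_attr</tt> is a
hypothetical helper, and the <tt>Color</tt> attribute and sample objects
are invented for illustration.</p>

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_attr(units, attr):
    """Return {attribute value: [objects having that value]}."""
    groups = defaultdict(list)
    for unit in units:
        groups[getattr(unit, attr)].append(unit)
    return dict(groups)


# Plain objects standing in for Units.
units = [
    SimpleNamespace(ID=1, Color="red"),
    SimpleNamespace(ID=2, Color="blue"),
    SimpleNamespace(ID=3, Color="red"),
]

groups = group_by_attr(units, "Color")
red_ids = [u.ID for u in groups["red"]]
blue_ids = [u.ID for u in groups["blue"]]
```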

<h4>Engines</h4>
<p>You can form Collections by hand, but a more powerful technique is
the <tt>UnitEngine</tt>, a factory for Collections. Engines are very
simple: they possess a set of <i>rules</i> which are executed when
you want to take a <i>snapshot</i> of Units. The snapshot which is
produced is a <tt>UnitCollection</tt> object. Whenever you call
<tt>take_snapshot()</tt>, the Engine will maintain an association
to the resulting Collection. You can access past snapshots with the
<tt>snapshots()</tt> method.</p>

<p>Engines are themselves Units, and can be persisted via Storage Managers.
The only properties they possess are an <tt>ID</tt>, a <tt>Name</tt>,
an <tt>Owner</tt>, a <tt>FinalClassName</tt>, and <tt>Created</tt>,
the creation date of the Engine.</p>

<p>The <tt>Owner</tt> property should either be a user name, or one of the
reserved names: "Public" and "System". By default, the <tt>permit()</tt>
method allows a user read-access to the Engine if they are the Owner, or
the Owner is "Public" or "System". Write-access is permitted if the user
is the Owner, or the Owner is "Public". Feel free to override
<tt>permit()</tt> in a subclass to provide different behaviors.</p>
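<p>Written out as code, the default policy looks roughly like this. The
function below is a standalone sketch of the rules just described; its
name and signature are assumptions and may not match the real
<tt>permit()</tt> method.</p>

```python
def permit_sketch(owner, user, write=False):
    """Hypothetical restatement of the default Engine access rules."""
    if user == owner:
        # Owners may always read and write their own Engines.
        return True
    if write:
        # Only "Public" Engines are writable by everyone else.
        return owner == "Public"
    # "Public" and "System" Engines are readable by everyone.
    return owner in ("Public", "System")


owner_can_write = permit_sketch("alice", "alice", write=True)
public_write = permit_sketch("Public", "bob", write=True)
system_read = permit_sketch("System", "bob")
system_write = permit_sketch("System", "bob", write=True)
private_read = permit_sketch("alice", "bob")
```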

<p>The <tt>FinalClassName</tt> is set for you as you add Rules to the
Engine. You can use the value of this property, for example, to tell
your users, "Engine #23569 is an 'Armadillo' engine," when it produces
Collections of <tt>Armadillo</tt> Units. The only time you might want to
set this value is when you first create the Engine, before you have added
any Rules.</p>

<h4>Rules</h4>
<p>Just like Collections and Engines, <tt>UnitEngineRule</tt> is <i>also</i>
a subclass of <tt>Unit</tt>, and can be persisted via Storage Managers. All
three work together to provide a complete, dynamic, application-level query
generator.</p>

<p>Okay, so what are Rules? You might say they're a "little language",
with the following primitives, or "operations":</p>
<table>
<tr><th>Operation</th><th>Operand(s)</th><th>Description</th></tr>
<tr><th colspan='3'>Operations on a single set</th></tr>
<tr>
    <td>CREATE</td>
    <td>The classname of the new Type</td>
    <td>Creates a new Set of the specified Type. All Units of that Type
        are included in the new Set.</td>
</tr>
<tr>
    <td>FILTER</td>
    <td>A <tt>logic.Expression</tt></td>
    <td>Removes Units from the current Set which do not match the
        Expression.</td>
</tr>
<tr>
    <td>FUNCTION</td>
    <td>The name of a function in the <tt>Arena.engine_functions</tt>
        dict</td>
    <td>Calls the function, passing the current Set. The function
        should modify the Set.</td>
</tr>
<tr>
    <td>TRANSFORM</td>
    <td>The classname of the new Type</td>
    <td>Transforms the current Set into a Set of associated Units
        (of another Type). The association must be present in the
        <tt>Arena.associations</tt> graph.</td>
</tr>
<tr>
    <td>RETURN</td>
    <td></td>
    <td>Optional. If omitted, the last Set handled is returned as the
        snapshot. If supplied, the ID of the Set to return.</td>
</tr>
<tr><th colspan='3'>Operations on two sets</th></tr>
<tr>
    <td>COPY</td>
    <td>The Set ID of the new Set</td>
    <td>Copies the current Set to a new Set. The current Set is unchanged.</td>
</tr>
<tr>
    <td>DIFFERENCE</td>
    <td>The ID of the Set to mix in</td>
    <td>Removes IDs from the current Set which exist in the second Set.</td>
</tr>
<tr>
    <td>INTERSECTION</td>
    <td>The ID of the Set to mix in</td>
    <td>Removes IDs from the current Set which <i>do not</i> exist in the
        second Set.</td>
</tr>
<tr>
    <td>UNION</td>
    <td>The ID of the Set to mix in</td>
    <td>Adds any IDs to the current Set which exist in the second Set.</td>
</tr>
</table>

<p>Each Rule has an <tt>Operation</tt> property (a string, one of the above),
a <tt>SetID</tt>, and an <tt>Operand</tt>. Here's an example ruleset:</p>
<table>
<tr><th>Sequence</th><th>Operation</th><th>SetID</th><th>Operand</th></tr>
<tr><td>1</td><td>CREATE</td><td>1</td><td>Invoice</td></tr>
<tr><td>2</td><td>FILTER</td><td>1</td><td>(Expression)</td></tr>
<tr><td>3</td><td>CREATE</td><td>2</td><td>Inventory</td></tr>
<tr><td>4</td><td>FILTER</td><td>2</td><td>(Expression)</td></tr>
<tr><td>5</td><td>TRANSFORM</td><td>2</td><td>Invoice</td></tr>
<tr><td>6</td><td>DIFFERENCE</td><td>1</td><td>2</td></tr>
<tr><td>7</td><td>RETURN</td><td>1</td><td></td></tr>
</table>

<p>As you can see, every Rule operates on a <i>Set</i> of Units. The first
rule is always to CREATE a set, declaring it to contain a certain Type
of Units. In most cases, you will then FILTER that set. If you simply
created a set and then returned it, it would contain all Units of the
declared Type. When you filter a set, however, you remove Units from
the whole which do not match the filter's Expression.</p>

<p>In the example above, we CREATE a second Set so that we can eventually
obtain the DIFFERENCE between Set 1 and Set 2. The second Set contains
Units of a different Type than the first. Once we filter Set 2, we then
TRANSFORM it; for each Inventory Unit, we look up associated Invoice
Units. Then, we find the difference between the two Invoice sets and
RETURN it.</p>
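<p>To make the flow concrete, here is the same seven-step ruleset traced
with ordinary Python sets of IDs. Nothing here uses the Engine classes:
the sample data, the <tt>paid</tt> and <tt>backordered</tt> fields, and the
two filter conditions are invented for illustration, since the example
ruleset leaves its Expressions unspecified.</p>

```python
# Hypothetical data: invoice ID -> fields, inventory ID -> fields.
invoices = {
    1: {"paid": False},
    2: {"paid": False},
    3: {"paid": True},
    4: {"paid": False},
}
inventory = {
    10: {"backordered": True, "invoice_ids": [2]},
    11: {"backordered": False, "invoice_ids": [1]},
    12: {"backordered": True, "invoice_ids": [3, 4]},
}

# 1. CREATE Set 1 as all Invoice IDs.
set1 = set(invoices)
# 2. FILTER Set 1 (assumed condition: unpaid invoices).
set1 = {i for i in set1 if not invoices[i]["paid"]}
# 3. CREATE Set 2 as all Inventory IDs.
set2 = set(inventory)
# 4. FILTER Set 2 (assumed condition: backordered items).
set2 = {i for i in set2 if inventory[i]["backordered"]}
# 5. TRANSFORM Set 2 into the associated Invoice IDs.
set2 = {inv for i in set2 for inv in inventory[i]["invoice_ids"]}
# 6. DIFFERENCE: remove from Set 1 every ID that exists in Set 2.
set1 -= set2
# 7. RETURN Set 1 as the snapshot.
snapshot = set1
```

<p>With this data the snapshot holds the unpaid invoices that are not
tied to any backordered inventory item.</p>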

<p>Rules are executed in order according to their <tt>Sequence</tt>
attribute (lowest first). When you use the <tt>Engine.add_rule</tt> method,
the next <tt>Sequence</tt> value is retrieved for you. Note that each
Rule belongs to one and only one Engine; they are not shared between
Engines. Each Rule has its own <tt>EngineID</tt> attribute.</p>

<h4>Engine Functions</h4>
<p>The FUNCTION rule deserves special mention. The Operand of a FUNCTION
rule is a string, a key in the <tt>Arena.engine_functions</tt> dictionary.
When the rule is executed, that key is used to look up the function, which
is then called, passing <tt>(sandbox, set)</tt>. The function should
mutate the set directly. Use FUNCTION rules to mutate sets in ways which
are more complex than those provided by FILTER and TRANSFORM. For example,
you might provide a function which removes all but the first Unit in the
Set (according to some ordering algorithm).</p>
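<p>A function of that kind might look like the sketch below. It assumes
the set passed in behaves like a Python <tt>set</tt> of IDs and uses
"lowest ID" as the ordering; both are assumptions, as are the names
<tt>keep_first</tt> and the stand-in <tt>engine_functions</tt> dictionary.
The sandbox argument is accepted but unused here.</p>

```python
def keep_first(sandbox, unit_ids):
    """Remove all but the lowest ID from the set, mutating it in place."""
    if unit_ids:
        first = min(unit_ids)
        # intersection_update mutates the existing set object, so the
        # caller sees the change without needing a return value.
        unit_ids.intersection_update({first})


# Stand-in for the Arena.engine_functions dictionary: name -> callable.
engine_functions = {"keep_first": keep_first}

current_set = {42, 7, 19}
# A FUNCTION rule whose Operand is "keep_first" would amount to this
# lookup and call (None stands in for a real Sandbox).
engine_functions["keep_first"](None, current_set)
```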


<h3>Analysis Tools</h3>
<p>Dejavu includes various tools to help you manipulate groups of Units.</p>

<h4>Sorting Units</h4>
<p>When you recall Units, you receive a generator, and must iterate over
the values in some way. Often, this is accomplished with a list
comprehension:
<pre>f = logic.Expression(lambda x: 'Aa' in x.Name)
people = [x for x in sandbox.recall(Person, f)]
</pre>
However, the <tt>recall</tt> method doesn't do any sorting; you must sort
your list in your Python code. Dejavu provides a <tt>sort(attrs,
descending=False)</tt> function to assist you. It returns a function, which
you can then pass to the list's <tt>sort</tt> method. That method sorts the
list in place and returns None, so don't assign its result. Continuing our
example:
<pre>people.sort(dejavu.sort('Size', 'Name'))</pre>
The most important issue (and the reason we don't just use Python 2.4's
<tt>attrgetter</tt>) is that any Unit property must allow values of None,
which tend to raise errors when compared to values of other types. The
function which <tt>sort</tt> creates for you treats None as "less than" any
other value.</p>
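<p>The "None sorts first" idea can be shown on its own. The sketch below
is not Dejavu's <tt>sort</tt>; it is a Python 3 equivalent that builds a
<i>key</i> function, because current interpreters no longer accept a
comparison function as the positional argument to <tt>list.sort</tt>. The
helper name <tt>none_low_key</tt> and the sample people are hypothetical.</p>

```python
from types import SimpleNamespace


def none_low_key(*attrs):
    """Build a sort key that orders None before any other value."""
    def key(obj):
        # Each attribute becomes (is_not_none, value). False sorts before
        # True, so None comes first and is never compared with values of
        # another type.
        return tuple((getattr(obj, name) is not None, getattr(obj, name))
                     for name in attrs)
    return key


people = [
    SimpleNamespace(Name="Isaac", Size=2),
    SimpleNamespace(Name="Baal", Size=None),
    SimpleNamespace(Name="Aaron", Size=2),
    SimpleNamespace(Name="Canaan", Size=1),
]

# Sort by Size, then by Name; the list is sorted in place.
people.sort(key=none_low_key("Size", "Name"))
names = [p.Name for p in people]
```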

<h4>Cross-tabulation</h4>
<p>Cross-tabs (also called <i>aggregate tables</i> or <i>pivot tables</i>)
display aggregate information about objects by category. For example,
rather than show a list of Safari records, one row per trip, you might
wish to show a table where each row represents a Destination, and each
column shows the count of Safaris to that Destination for each distinct
Year. In this example, we say that the Safaris are "grouped by" their
Destination values, and that we "pivot" on the Year values.</p>

<p>Dejavu helps you form such a table via the <tt>CrossTab</tt> class.
You need to specify the group(s) you wish to use, and the pivot attribute.
Finally, you must specify the aggregate function. Here's a code example:
<pre>
>>> data = ["a", "b", "cc", "bddd", "a4", "b6"]
>>> group = lambda x: x.isalpha()
>>> pivot = lambda x: x[0]
>>> ctab = analysis.CrossTab(data, [group], pivot, dejavu.COUNT)
>>> data, columns = ctab.results()
>>> data
{(True,): {"a": 1, "b": 2, "c": 1},
 (False,): {"a": 1, "b": 1}}
>>> columns
["a", "b", "c"]</pre>
You may notice that we're not using Units in our example; the
<tt>CrossTab</tt> class is designed to work with any objects. Here's one
way to lay out that data:</p>
<table>
<tr><th>Is Alpha</th><th>a</th><th>b</th><th>c</th></tr>
<tr><td>Y</td><td>1</td><td>2</td><td>1</td></tr>
<tr><td>N</td><td>1</td><td>1</td><td>0</td></tr>
</table>

<p>The <tt>results</tt> method returns two values. First, the table
itself in the form of a dictionary; each key is a tuple of group values,
and the corresponding value is a sub-dictionary. Each sub-dict has keys
which are the pivot attribute, and values which equal the aggregates.
I know, that was confusing; look at the example. The second value to
be returned is a list of the pivot column values; you'll notice they're
sorted.</p>

<p>The groups and pivot arguments may be either strings or functions.
If strings, they must be the names of attributes of the source objects.
The final aggfunc argument defaults to COUNT, but may also be SUM.
More aggfuncs may arrive in the future.</p>
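<p>If the nested-dictionary result is still hard to picture, the counting
case can be reproduced in a few lines of plain Python. This is a sketch of
the idea only, not the <tt>CrossTab</tt> class: <tt>crosstab_count</tt> is
a hypothetical helper that accepts functions (not attribute names) and
supports only a COUNT-style aggregate.</p>

```python
def crosstab_count(objects, groups, pivot):
    """Count objects per (group values) row and per pivot column."""
    table = {}
    columns = set()
    for obj in objects:
        row_key = tuple(g(obj) for g in groups)  # one entry per group function
        col = pivot(obj)
        columns.add(col)
        row = table.setdefault(row_key, {})
        row[col] = row.get(col, 0) + 1
    return table, sorted(columns)


source = ["a", "b", "cc", "bddd", "a4", "b6"]
table, columns = crosstab_count(source, [str.isalpha], lambda x: x[0])

# Lay the result out as rows, filling missing cells with 0.
rows = [[table[key].get(col, 0) for col in columns]
        for key in ((True,), (False,))]
```

<p>Missing cells, such as the count of non-alphabetic strings starting
with "c", simply have no key in the sub-dictionary; the layout step is
where they become zeros.</p>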

<h3>The Arena Object</h3>
<p>The topmost class in Dejavu is the <tt>Arena</tt> class. When building
a Dejavu application, you must first create an instance of this class,
and must find a way to persist this object across client connections.
This can be achieved in multiple ways; web applications, for example,
will typically create a single process to serve all requests. Desktop
applications will probably create a single Arena object for each
running instance of the program.</p>

<h4>Loading Stores</h4>
<p>You <b>may</b> manually set up Storage Managers by calling
<tt>Arena().add_store(name, store, unitClasses)</tt>, but you
probably shouldn't. Instead, allow your deployers to decide for
themselves which storage solution(s) to use. You can do this by calling
<tt>load(filename)</tt>; pass it the filename of an INI-style file
which your deployers can tweak without screwing up your Python code.
The next chapter in this reference is completely devoted to educating
deployers; point them to it or copy/modify it in your own release docs.</p>

<h4>Registering Unit Classes</h4>
<p>The <tt>Arena</tt> object maintains a registry of Unit classes called a
<tt>roster</tt>. A roster is like a three-way map between Unit classes,
their names, and their assigned StorageManagers. You shouldn't manipulate
this structure on your own; instead, use the <tt>register</tt> or
<tt>register_all</tt> methods to register each Unit class.</p>

<p>The <tt>Arena</tt> object also manages the associations between Unit
classes in its <tt>associations</tt> attribute, which is a simple,
unweighted, undirected graph. Whenever you register a class, the Arena
will add its associations to this graph. The only other common operation
is to call <tt>.associations.shortest_path(start, end)</tt>, to retrieve
the chain of associations between two Unit classes.</p>
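<p>On an unweighted, undirected graph, a shortest path is typically found
with a breadth-first search. The sketch below shows that technique on a
plain adjacency dictionary; it is not the Arena's implementation, class
names are represented as strings, and the <tt>Publisher</tt> class is
hypothetical.</p>

```python
from collections import deque


def shortest_path(graph, start, end):
    """Breadth-first search; returns the list of nodes from start to end, or None."""
    if start == end:
        return [start]
    previous = {start: None}  # node -> node we reached it from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in previous:
                continue
            previous[neighbor] = node
            if neighbor == end:
                # Walk the back-links to rebuild the path.
                path = [end]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Undirected graph: every association is listed in both directions.
associations = {
    "Archaeologist": ["Biography"],
    "Biography": ["Archaeologist", "Publisher"],
    "Publisher": ["Biography"],
    "Printer": [],
}

chain = shortest_path(associations, "Archaeologist", "Publisher")
unreachable = shortest_path(associations, "Archaeologist", "Printer")
```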

<hr />

<p><a name='hettinger'>[1]</a> Python Cookbook,
<a href='http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/277940'>Binding
Constants at compile time</a><br />
</p>

</body>
</html>