Contact: fumanchu@aminus.org

Revision 29 (checked in by fumanchu, 9 years ago)

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
   "http://www.w3.org/TR/xhtml1/DTD/strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

<head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <title>Dejavu: Modeling your Application</title>
    <link href='dejavu.css' rel='stylesheet' type='text/css' />
</head>

<body>

<h2>Application Developers: Using Dejavu to Construct a Domain Model</h2>

<h3>Units</h3>
<p>When constructing a Domain Model for your application, you need
to distinguish between data that will be persisted and data that
will not. At the most general level, you might say that some entire
<i>objects</i> need to be persisted. By registering a subclass of
<tt>dejavu.Unit</tt>, you specify a set of objects (instances of your
class) which will be persisted.</p>

<p>Before you can register your Unit class, you must create it:
<pre>import dejavu
class Printer(dejavu.Unit): pass</pre>
This is all you need for a fully-functioning Unit class. There are
no methods or attributes that you are required to override; simply
subclass <tt>Unit</tt>. However, this is a fairly uninteresting
class. It doesn't provide any functionality beyond what <tt>Unit</tt>
already provides. The first thing we will probably want to add to our
new class is persistent data.</p>

<h4>UnitProperty</h4>
<p>Once you have defined a persistent class (by subclassing <tt>Unit</tt>),
you need to make another decision. Rather than persist the entire object
<tt>dict</tt>, you specify a subset of persistent attributes by using
<tt>UnitProperty</tt>, a data descriptor. If you've used Python's built-in
<tt>property()</tt> construct, you've used descriptors before.</p>

<p>We might enhance our Printer example thusly:
<pre>from dejavu import Unit, UnitProperty
class Printer(Unit):
    Manufacturer = UnitProperty(unicode)
    ColorCopies = UnitProperty(bool)
    PPM = UnitProperty(float)</pre>
This adds three persistent attributes to our <tt>Printer</tt> objects,
each with a different datatype. In addition, every subclass of <tt>Unit</tt>
inherits an 'ID' attribute, an int.</p>

<p>When you get and set <tt>UnitProperty</tt> attributes, they behave just
like any other attributes:
<pre>>>> p = Printer()
>>> p.PPM = 25
>>> p.PPM
25.0</pre>
However, you will notice right away that the int value we provided has been
coerced to a float behind the scenes. This is because we specified the PPM
attribute as a 'float' type when we created it. Unit Properties are
restricted to the types which you specify. The only other valid value
for a Unit Property is None; any Property may be None at any time, and
in fact, all Properties are None until you assign values to them:
<pre>>>> p.ColorCopies is None
True</pre></p>
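<p>The coercion rules above can be illustrated with a standalone Python 3
sketch of a data descriptor. <tt>TypedProperty</tt> and this
<tt>Printer</tt> are hypothetical stand-ins, not Dejavu's
<tt>UnitProperty</tt> code. The real class is Python 2 code built around
<tt>unicode</tt> and may coerce values differently.</p>
<pre>
class TypedProperty:
    """Hypothetical data descriptor: coerces values to one type, allows None."""

    def __init__(self, type_):
        self.type = type_
        self.key = None

    def __set_name__(self, owner, name):
        # Python 3.6+ calls this with the attribute name used in the class body.
        self.key = name

    def __get__(self, instance, owner):
        if instance is None:
            return self
        # Properties that were never assigned read as None.
        return instance.__dict__.get(self.key)

    def __set__(self, instance, value):
        if value is not None and not isinstance(value, self.type):
            value = self.type(value)  # e.g. the int 25 becomes the float 25.0
        instance.__dict__[self.key] = value


class Printer:
    Manufacturer = TypedProperty(str)
    ColorCopies = TypedProperty(bool)
    PPM = TypedProperty(float)


p = Printer()
p.PPM = 25
print(p.PPM)                  # 25.0
print(p.ColorCopies is None)  # True
</pre>
<p>Storing each value in the instance's own <tt>__dict__</tt> under the
descriptor's name works because a data descriptor (one that defines
<tt>__set__</tt>) takes precedence over instance attributes during lookup.</p>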

<h4>Creating and Populating Properties</h4>
<p>In addition to defining Unit Properties within your class body,
you can define them after the class body has been executed via
the classmethod <tt>set_property()</tt>. For example, the following
two classes are equivalent:
<pre>class Publication(Unit):
    Content = UnitProperty(unicode)

class Publication(Unit): pass
Publication.set_property('Content', unicode)</pre>

Declaring properties outside of the class body lets you add Unit
Properties dynamically. You can define multiple properties at once via
the <tt>set_properties()</tt> classmethod:

<pre>class Publication(Unit): pass
Publication.set_properties({'Content': unicode,
                            'Publisher': unicode,
                            'Year': int,
                            })</pre>
</p>

<p>You also have options when populating Unit Properties. The standard way
is simply to assign to them as normal Python instance attributes. However,
you may also use the <tt>adjust()</tt> method to modify multiple properties
at once; pass in keyword arguments which match the properties you wish to
modify. Keyword arguments also work when instantiating the object. For
example, the following three code snippets are equivalent:

<pre>pub = Publication()
pub.Publisher = 'Walter J. Black'
pub.Year = 1928

pub = Publication()
pub.adjust(Publisher='Walter J. Black', Year=1928)

pub = Publication(Publisher='Walter J. Black', Year=1928)</pre>
</p>
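<p>Here is a hedged Python 3 sketch of how <tt>set_property()</tt>,
<tt>set_properties()</tt>, <tt>adjust()</tt> and keyword construction can
fit together. <tt>Property</tt> and <tt>Record</tt> are hypothetical names.
Dejavu's own <tt>Unit</tt> may validate names or types differently. This
sketch, for instance, does not reject unknown keywords.</p>
<pre>
class Property:
    """Hypothetical minimal typed descriptor (not Dejavu's UnitProperty)."""

    def __init__(self, type_, key):
        self.type = type_
        self.key = key

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return instance.__dict__.get(self.key)

    def __set__(self, instance, value):
        if value is not None and not isinstance(value, self.type):
            value = self.type(value)
        instance.__dict__[self.key] = value


class Record:
    """Hypothetical base class with set_property(), set_properties(), adjust()."""

    def __init__(self, **kwargs):
        # Keyword arguments at construction time are routed through adjust().
        self.adjust(**kwargs)

    @classmethod
    def set_property(cls, key, type_):
        # Attach a descriptor after the class body has already executed.
        setattr(cls, key, Property(type_, key))

    @classmethod
    def set_properties(cls, mapping):
        for key, type_ in mapping.items():
            cls.set_property(key, type_)

    def adjust(self, **kwargs):
        # Each assignment goes through the property's descriptor.
        # Note: this sketch does not reject names that are not properties.
        for key, value in kwargs.items():
            setattr(self, key, value)


class Publication(Record):
    pass


Publication.set_properties({'Content': str, 'Publisher': str, 'Year': int})
pub = Publication(Publisher='Walter J. Black', Year=1928)
print(pub.Publisher, pub.Year, pub.Content)  # Walter J. Black 1928 None
</pre>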

<h4>Unit Properties are First-Class Objects</h4>
<p>Like many descriptors, Unit Properties behave differently when you access
them from the class, rather than from an instance as above. When you access
them from the class, you receive the <tt>UnitProperty</tt> object itself,
rather than its value for a given instance. That is,
<pre>>>> c = Printer.ColorCopies
>>> c
&lt;dejavu.UnitProperty object at 0x01112970></pre>
This is significant, because it allows us to store metadata about the
property itself:
<pre>>>> c.key, c.index, c.type, c.hints
('ColorCopies', False, &lt;type 'bool'>, {})</pre>
The <tt>key</tt> attribute is merely the property's canonical name. The
<tt>index</tt> value tells Storage Managers whether or not to index the
column. The <tt>type</tt> attribute limits property values to instances
of that type (or <tt>None</tt>). Finally, the <tt>hints</tt> dictionary
provides hints to Storage Managers to help optimize storage. A common use,
for example, is to tell a Manager which would normally store unicode strings
with a maximum length of 255 that a particular property needs more room
than that. This is done with a 'Size' mapping, such as
<tt>hints = {u'Size': 0}</tt>, where 0 implies no limit.</p>
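<p>The class-versus-instance behavior, and the metadata attributes, can be
sketched with a descriptor whose <tt>__get__</tt> returns the descriptor
itself when there is no instance. <tt>MetaProperty</tt> is hypothetical.
It fills in <tt>key</tt> through Python 3.6's <tt>__set_name__</tt> hook.
Dejavu is Python 2 code and cannot use that hook, so its real mechanism
differs.</p>
<pre>
class MetaProperty:
    """Hypothetical descriptor carrying UnitProperty-style metadata."""

    def __init__(self, type_, index=False, hints=None, key=None):
        self.type = type_
        self.index = index
        self.hints = {} if hints is None else hints
        self.key = key

    def __set_name__(self, owner, name):
        # Python 3.6+: default the key to the attribute name in the class body.
        if self.key is None:
            self.key = name

    def __get__(self, instance, owner):
        if instance is None:
            return self  # class access yields the descriptor and its metadata
        return instance.__dict__.get(self.key)

    def __set__(self, instance, value):
        if value is not None and not isinstance(value, self.type):
            value = self.type(value)
        instance.__dict__[self.key] = value


class Printer:
    ColorCopies = MetaProperty(bool)
    Notes = MetaProperty(str, hints={'Size': 0})  # 0: no size limit


c = Printer.ColorCopies
print((c.key, c.index, c.hints))  # ('ColorCopies', False, {})
print(c.type is bool)             # True
</pre>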

<h4>Triggers</h4>
<p>In addition, each UnitProperty has a <tt>pre</tt> and a <tt>post</tt>
attribute, which default to None. If you override these with methods
in a subclass of <tt>UnitProperty</tt>, they will be called when setting
a new value for that property, either before (pre) or after (post) the
new value is set. For example:
<pre>import datetime
from dejavu import Unit, UnitProperty

class DatedProperty(UnitProperty):
    def post(self, unit, value):
        unit.Date = datetime.datetime.now().replace(microsecond=0)

class Topic(Unit):
    Date = UnitProperty(datetime.datetime)
    Content = DatedProperty()</pre>
In this example, whenever the <tt>Content</tt> attribute of a
<tt>Topic</tt> is set, the <tt>post</tt> method will be called and the
object's <tt>Date</tt> attribute will be set to the current date and time.</p>
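<p>The ordering of the <tt>pre</tt> and <tt>post</tt> hooks can be shown
with a small Python 3 sketch. <tt>TriggerProperty</tt> is a hypothetical
descriptor, not Dejavu's code. Unlike Dejavu, it fires its hooks whether or
not the object belongs to a Sandbox, and it does no type coercion.</p>
<pre>
import datetime


class TriggerProperty:
    """Hypothetical descriptor with optional pre/post hooks."""

    pre = None   # override with a method(unit, value): runs before the set
    post = None  # override with a method(unit, value): runs after the set

    def __init__(self, key):
        self.key = key

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return instance.__dict__.get(self.key)

    def __set__(self, instance, value):
        if self.pre is not None:
            self.pre(instance, value)
        instance.__dict__[self.key] = value
        if self.post is not None:
            self.post(instance, value)


class DatedProperty(TriggerProperty):
    def post(self, unit, value):
        # Stamp the owning object each time this property is assigned.
        unit.Date = datetime.datetime.now().replace(microsecond=0)


class Topic:
    Date = None  # plain attribute in this sketch
    Content = DatedProperty('Content')


t = Topic()
t.Content = 'Hello'
print(t.Date is not None)  # True
</pre>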

<h4>Unit IDs</h4>
<p>The <tt>Unit</tt> base class possesses a single Unit Property, an int
named 'ID'. If you wish to use IDs of a different type, simply override
the ID attribute in your subclass:
<pre>class Printer(Unit):
    ID = UnitProperty(unicode)</pre>
Every Unit must possess an ID property. This ensures that each Unit within
the system is unique.</p>

<h4>Registration of Unit Classes</h4>
<p>In addition to defining your Unit class, you must also register that
class with your application's <tt>Arena</tt> object. Each class which
you want Dejavu to manage must be passed to <tt>Arena.register(cls)</tt>.
</p>

<h3>Sandboxes</h3>
<p>During the life of a client connection, your application should create
and use a <tt>Sandbox</tt> to manage the set of "live" Units. A Sandbox
manages the in-memory lifecycle of Units: creation, identity, mutation, and
destruction. Sandboxes route persistence operations on Units to the correct
Storage Manager.</p>

<p>You can create Sandbox objects directly. They take a single argument, the
top-level <tt>Arena</tt> object. Arenas also provide a convenience method,
<tt>new_sandbox</tt>, which does this for you. The following lines are
equivalent:
<pre>box = Sandbox(myArena)

box = myArena.new_sandbox()</pre>
You might often choose the latter when you have a reference to the Arena
object, and would rather avoid importing dejavu yet again just to obtain
the Sandbox class.</p>

<h4>Memorizing Units</h4>
<p>When you create a Unit instance, it exists in isolation. There is no
connection between that Unit and storage; your Unit will not be persisted,
because Dejavu doesn't yet possess a reference to your Unit. To provide
that link, you <i>memorize</i> your Unit (or rather, you tell your Sandbox
to memorize it):
<pre>class Publisher(Unit):
    ID = UnitProperty(unicode)
    City = UnitProperty(unicode)

p = Publisher(ID='Walter J. Black')
box.memorize(p)</pre></p>

<p>Memorization does several things. First, it places your new Unit into
your Arena. That Unit instance will now be persisted by the appropriate
Storage Manager, and it can be recalled from storage when needed, using the
built-in Expression syntax. It may also have been given an ID (see
<u>Sequencing</u>, below). Memorization also makes your Unit
<i>concrete</i>; that is, your Unit will now possess a <tt>sandbox</tt>
attribute. Units whose <tt>sandbox</tt> attribute is not set (is None)
have no relationships, and their Unit Property triggers (if any) will
not fire.</p>

<p>You may define special methods on your Units to provide start-of-life
behaviors. If a Unit possesses an <tt>on_memorize</tt> method, it will
be called after the Unit has been 'reserved' in storage, and after the
Unit has been placed in the Sandbox cache.</p>
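<p>A Sandbox's cache behaves essentially like an identity map. The Python 3
sketch below models only that part of memorization. It keys live objects by
class and ID, marks them concrete by setting <tt>sandbox</tt>, and calls
<tt>on_memorize</tt> once the object is cached. <tt>SketchSandbox</tt> and
its <tt>get</tt> method are hypothetical. There is no Storage Manager
behind the sketch, so nothing is reserved or persisted.</p>
<pre>
class SketchSandbox:
    """Hypothetical identity map standing in for a Dejavu Sandbox."""

    def __init__(self):
        self._cache = {}  # (class, ID) -> the one live instance

    def memorize(self, unit):
        # A real Sandbox would also reserve the unit in storage first.
        self._cache[(type(unit), unit.ID)] = unit
        unit.sandbox = self  # the unit is now "concrete"
        hook = getattr(unit, 'on_memorize', None)
        if hook is not None:
            hook()  # runs after the unit is in the cache

    def get(self, cls, ID):
        # The same (class, ID) always yields the same object, or None.
        return self._cache.get((cls, ID))


class Publisher:
    def __init__(self, ID, City=None):
        self.ID = ID
        self.City = City
        self.sandbox = None
        self.events = []

    def on_memorize(self):
        self.events.append('memorized')


box = SketchSandbox()
p = Publisher(ID='Walter J. Black')
box.memorize(p)
print(box.get(Publisher, 'Walter J. Black') is p)  # True
</pre>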

<h4>Sequencing</h4>
<p>Every <tt>Unit</tt> has an <tt>ID</tt> property. The default ID property
is of type <tt>int</tt>; however, you can override that to whatever type
you like. As long as you provide your own IDs for Units, nothing will
break; you can memorize and recall Units without problems. However, if
you memorize a Unit with an ID of <tt>None</tt>, the Sandbox may attempt
to provide an ID for it.</p>

<p>The <tt>Unit</tt> base class possesses a <tt>sequencer</tt> attribute
to help Sandboxes generate new IDs. The default value is an instance of
<tt>UnitSequencerInteger</tt>, which examines all existing Units, finds
the maximum integer ID, adds 1, and uses that value for the new ID.</p>

<p>The other useful Sequencer is <tt>UnitSequencerNull</tt>, which simply
raises an error when asked to generate an ID. If your IDs are strings,
you'll probably want to set that class's <tt>sequencer</tt> attribute to
an instance of <tt>UnitSequencerNull</tt>, and form ID values in your
own code.</p>
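<p>Both sequencing strategies fit in a few lines. This Python 3 sketch works
on a plain list of existing IDs rather than on stored Units.
<tt>IntegerSequencer</tt>, <tt>NullSequencer</tt> and <tt>next_id</tt> are
hypothetical names. Starting at 1 when no IDs exist is also an
assumption.</p>
<pre>
class IntegerSequencer:
    """Hypothetical analogue of UnitSequencerInteger."""

    def next_id(self, existing_ids):
        # Find the largest integer ID and add 1 (start at 1 if there are none).
        ints = [i for i in existing_ids if isinstance(i, int)]
        return max(ints) + 1 if ints else 1


class NullSequencer:
    """Hypothetical analogue of UnitSequencerNull: never invents an ID."""

    def next_id(self, existing_ids):
        raise ValueError('this class expects IDs to be supplied by the caller')


print(IntegerSequencer().next_id([3, 7, 5]))  # 8
</pre>
<p>Scanning for the maximum is simple, but it touches every existing ID.
Also, two concurrent callers could compute the same value, so a
multi-threaded or multi-process application needs locking or
storage-level support around this step.</p>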

<h4>Recalling</h4>
<p>Once you have memorized a Unit or two, you will probably want to
recall them at some point. Sandboxes possess two methods to
accomplish this.</p>

<h5>recall()</h5>
<p>First, the appropriately named <tt>recall(cls, expr)</tt> method.
This is the full-blown query method. As a first argument, you pass it the
class (<b>not</b> the name of the class, but the actual class) of which you
expect to retrieve instances. The second argument should be an instance
of <tt>dejavu.logic.Expression</tt>, an object which encapsulates your
specific query (see <u>Expressions</u>, next). An example recall operation:
<pre>>>> e = logic.Expression(lambda x: x.Year == 1928)
>>> units = box.recall(Publication, e)
>>> [x.Title for x in units]
[u'The Giant Horse of Oz', u'Kai Lung Unrolls His Mat',
 u'Tarzan, The Lord of the Jungle']
</pre>
If you do not supply an Expression, all Units of the given Unit class
will be retrieved. Notice that the return value is <b>not</b> a list; it is a
generator (or other iterable). You must iterate over it to retrieve all
values. By returning an iterator, we allow some Storage Managers to load
Units in a more lazy fashion. If this is a huge burden for you, let me
know; I might be convinced to add a <tt>recall_list</tt> method.</p>

<p>The <tt>recall</tt> method will take additional arguments in pairs of
<tt>cls</tt>, <tt>expr</tt>. This feature isn't fully developed yet.
It's designed to emulate JOINs, returning units which match each expr
and are related.</p>

<p>If your Unit class defines an <tt>on_recall()</tt> method, it will be
called when each Unit has been loaded from storage (at the end of the
recall process). Once the unit is in the Sandbox cache, however, later
recalls will not call <tt>on_recall</tt> again; it's only called at the
Sandbox/SM boundary. If <tt>on_recall</tt> raises
<tt>UnrecallableError</tt>, the unit will not be yielded back to the
caller, nor placed in the Sandbox cache.</p>
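<p>The lazy, filtered shape of <tt>recall</tt> and the effect of
<tt>UnrecallableError</tt> can be sketched with a Python 3 generator. Here a
plain list stands in for the Storage Manager and a plain callable stands in
for an Expression. <tt>recall</tt> and this <tt>UnrecallableError</tt>
class are hypothetical stand-ins. The sketch has no cache, so it does not
model the rule that <tt>on_recall</tt> runs only at the Sandbox/SM
boundary.</p>
<pre>
class UnrecallableError(Exception):
    """Stand-in for Dejavu's exception of the same name."""


def recall(stored, cls, expr=None):
    """Hypothetical: lazily yield objects of class cls that pass expr.

    stored -- any iterable of objects, standing in for a Storage Manager
    expr   -- optional callable(obj) returning True for matches
    """
    for unit in stored:
        if not isinstance(unit, cls):
            continue
        if expr is not None and not expr(unit):
            continue
        hook = getattr(unit, 'on_recall', None)
        if hook is not None:
            try:
                hook()
            except UnrecallableError:
                continue  # never yielded to the caller
        yield unit


class Publication:
    def __init__(self, Title, Year, hidden=False):
        self.Title, self.Year, self.hidden = Title, Year, hidden

    def on_recall(self):
        if self.hidden:
            raise UnrecallableError(self.Title)


stored = [Publication('The Giant Horse of Oz', 1928),
          Publication('Kai Lung Unrolls His Mat', 1928, hidden=True),
          Publication('Unrelated Title', 1931)]
units = recall(stored, Publication, lambda x: x.Year == 1928)
print([x.Title for x in units])  # ['The Giant Horse of Oz']
</pre>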

<h5>unit()</h5>
<p>The <tt>recall</tt> method can be verbose. When you want a one-liner
and only expect a single Unit, use the <tt>unit(cls, **kw)</tt> method
of Sandboxes. Again, you pass the class of Units you wish to retrieve
as the first argument. Then, supply keyword arguments of the form
"property_name=value". The method will form an equivalent Expression
for you from the keyword args. For example:
<pre>>>> book = box.unit(Publication, ID=1)
>>> if book:
...     print book.Title
...
Ladies in Hades</pre>
If no Unit matches the criteria, None is returned.
If multiple Units match the criteria, only the first one is returned
(although the rest are probably loaded into memory).</p>
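<p>The <tt>unit(cls, **kw)</tt> convenience reduces to building an equality
test from the keywords and returning the first match or None. The following
is a hypothetical Python 3 sketch over a plain list. It is not Dejavu's
implementation, which forms a real Expression and asks the Sandbox.</p>
<pre>
def unit(stored, cls, **kwargs):
    """Hypothetical: first object of class cls whose attributes equal kwargs."""
    def matches(obj):
        return all(getattr(obj, name, None) == value
                   for name, value in kwargs.items())

    candidates = (obj for obj in stored if isinstance(obj, cls) and matches(obj))
    return next(candidates, None)  # None when nothing matches


class Publication:
    def __init__(self, ID, Title):
        self.ID, self.Title = ID, Title


stored = [Publication(1, 'Ladies in Hades'), Publication(2, 'Unrelated Title')]
book = unit(stored, Publication, ID=1)
if book is not None:
    print(book.Title)  # Ladies in Hades
</pre>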

<h4>Forgetting and Repressing</h4>
<p>To <i>forget</i> a Unit is to destroy it forever. You have two options
for forgetting Units: you can call <tt>Sandbox().forget(unit)</tt> or
the simpler version, <tt>Unit().forget()</tt>. Either of these will clear
the Unit from the Sandbox's cache, and the Sandbox will tell the appropriate
Storage Manager to destroy the stored Unit data. If a Unit has not yet
been memorized, you do not need to forget it.</p>

<p>In some circumstances, you may wish only to clear the Unit from the
Sandbox without destroying it. You can do this by calling either
<tt>Sandbox().repress(unit)</tt> or the simpler version,
<tt>Unit().repress()</tt>.</p>

<p>You may define special methods on your Units to provide end-of-life
behaviors. If a Unit possesses an <tt>on_forget</tt> method, it will
be called after the Unit has been destroyed. If a Unit possesses an
<tt>on_repress</tt> method, it will be called <i>before</i> the Unit
has been repressed. I'm sure there was a good reason for this
disparity, but I've forgotten (or perhaps repressed) it.</p>

<h4>Flushing Sandboxes</h4>
<p>When the client connection has closed, you should <i>flush</i> the
Sandbox caches. In general, a single call to <tt>flush_all()</tt> will do
the trick. Note that flushing calls <tt>repress()</tt> for each Unit in
the Sandbox, so any <tt>on_repress()</tt> triggers will be executed.</p>
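<p>The hook ordering described above (<tt>on_forget</tt> after destruction,
<tt>on_repress</tt> before removal, and flushing as repeated repression) is
easiest to see in a sketch. <tt>LifecycleBox</tt> is a hypothetical Python 3
stand-in. It uses two dicts in place of the Sandbox cache and a Storage
Manager.</p>
<pre>
class LifecycleBox:
    """Hypothetical sketch of forget/repress/flush hook ordering."""

    def __init__(self):
        self.cache = {}    # ID -> live object
        self.storage = {}  # ID -> stored object (stand-in for an SM)

    def memorize(self, unit):
        self.cache[unit.ID] = unit
        self.storage[unit.ID] = unit

    def forget(self, unit):
        # Destroy: drop from the cache and from storage, then notify.
        self.cache.pop(unit.ID, None)
        self.storage.pop(unit.ID, None)
        hook = getattr(unit, 'on_forget', None)
        if hook is not None:
            hook()  # runs after destruction

    def repress(self, unit):
        # Notify first, then drop from the cache only; storage is untouched.
        hook = getattr(unit, 'on_repress', None)
        if hook is not None:
            hook()  # runs before the unit leaves the cache
        self.cache.pop(unit.ID, None)

    def flush_all(self):
        # Flushing represses every cached unit, so on_repress fires for each.
        for unit in list(self.cache.values()):
            self.repress(unit)


class Note:
    def __init__(self, ID, events):
        self.ID, self.events = ID, events

    def on_forget(self):
        self.events.append(('forget', self.ID))

    def on_repress(self):
        self.events.append(('repress', self.ID))


events = []
box = LifecycleBox()
a, b = Note(1, events), Note(2, events)
box.memorize(a)
box.memorize(b)
box.forget(a)
box.flush_all()
print(events)  # [('forget', 1), ('repress', 2)]
</pre>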

<h4>Aggregate Functions</h4>
<p>Sandboxes also provide a <tt>distinct(cls, attrs, expr=None)</tt>
method. This returns values, rather than Units. Put simply, it returns
all distinct values for the given attribute(s) of the Unit class provided.
If only one attribute is specified, a list of values will be returned.
If more than one attribute is specified, a list of tuples (as produced by
<tt>zip</tt>) will be returned, one for each distinct existing combination.
Providing an expr argument (an <tt>Expression</tt> object, see below) will
filter the set of Units before obtaining distinct values.</p>

<p>The <tt>distinct</tt> method can also be used to count Units: because
IDs are unique, the number of values returned for attrs = ['ID'] is the
number of matching Units. Sandboxes provide a <tt>count(cls, expr)</tt>
method which does just this.</p>
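<p>The return shapes of <tt>distinct</tt> and the count-by-IDs trick can be
sketched over plain Python objects. These <tt>distinct</tt> and
<tt>count</tt> functions are hypothetical. They take a list of objects
instead of a Unit class and accept any callable as <tt>expr</tt>. A real
Storage Manager may be able to do this work in storage instead.</p>
<pre>
def distinct(units, attrs, expr=None):
    """Hypothetical: distinct values of attrs over objects passing expr."""
    seen = []
    for unit in units:
        if expr is not None and not expr(unit):
            continue
        if len(attrs) == 1:
            value = getattr(unit, attrs[0])  # one attribute: bare values
        else:
            value = tuple(getattr(unit, a) for a in attrs)  # several: tuples
        if value not in seen:
            seen.append(value)
    return seen


def count(units, expr=None):
    # IDs are unique, so counting distinct IDs counts matching objects.
    return len(distinct(units, ['ID'], expr))


class Book:
    def __init__(self, ID, Publisher, Year):
        self.ID, self.Publisher, self.Year = ID, Publisher, Year


books = [Book(1, 'Black', 1928), Book(2, 'Black', 1928),
         Book(3, 'Other', 1928), Book(4, 'Black', 1931)]
print(distinct(books, ['Publisher']))          # ['Black', 'Other']
print(count(books, lambda x: x.Year == 1928))  # 3
</pre>
<p>Tracking seen values in a list preserves first-seen order and tolerates
unhashable values, but costs quadratic time. A set would be faster when all
values are hashable.</p>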

<h3>Querying</h3>
<p>When you retrieve Units, you often don't want to load the entire set for
a given class. In Dejavu, you filter the set according to the UnitProperty
attributes for each object. Naturally, there must be a way to express
the filter you intend. Dejavu actually provides three ways: Expressions,
<tt>filter</tt>, and <tt>comparison</tt>.</p>

<h4>The <tt>Expression</tt> class</h4>
<p>Regardless of which technique you use to express your filter, you're
going to end up with a <tt>logic.Expression</tt> object. You can build
an Expression directly, passing a single lambda as an argument:
<pre>>>> from dejavu import logic
>>> import datetime
>>> f = lambda x: x.Date >= datetime.date(2004, 3, 1)
>>> e = logic.Expression(f)
>>> e
logic.Expression(lambda x: x.Date >= datetime.date(2004, 3, 1))</pre>
Neat, eh? I worked hard on that <tt>__repr__</tt>. ;)</p>

<p>It may be obvious, but we'll be explicit here. The lambda which you pass
into an Expression must take a single positional argument, which will
always be bound to a Unit instance. In the example above, it's named 'x',
but you can use any name you like. Using lambdas as a base means that we
can simply call <tt>Expression.func(unit)</tt>, and receive a boolean value
indicating whether our Unit "passes the test". Attribute lookups on our
'x' object will apply to Unit Properties for that Unit object.
That is, <tt>x.Date</tt> evaluates to the <tt>Date</tt> property of the
Unit being tested.</p>

<h4>Early binding</h4>
<p>What is not obvious from the above code snippet is perhaps the <b>most
important aspect</b> of Expressions: any globals or cell references (from
closures) in the supplied lambda get <b>bound early</b>. Compare the
following disassemblies:
<pre>>>> import dis
>>> dis.dis(f)
  1           0 LOAD_FAST                0 (x)
              3 LOAD_ATTR                1 (Date)
              6 LOAD_GLOBAL              2 (datetime)
              9 LOAD_ATTR                3 (date)
             12 LOAD_CONST               1 (2004)
             15 LOAD_CONST               2 (3)
             18 LOAD_CONST               3 (1)
             21 CALL_FUNCTION            3
             24 COMPARE_OP               5 (>=)
             27 RETURN_VALUE
>>> dis.dis(e.func)
  1           0 LOAD_FAST                0 (x)
              3 LOAD_ATTR                1 (Date)
              6 LOAD_CONST               6 (datetime.date(2004, 3, 1))
              9 COMPARE_OP               5 (>=)
             12 RETURN_VALUE
</pre>
As you can see, the original function looks up the global 'datetime' module
and calls <tt>datetime.date</tt> every time it runs. Once we wrap it in the
Expression, however, that whole call has been evaluated once and replaced
by a constant! Thanks to Raymond Hettinger for inspiring this solution
<a href='#hettinger'>[1]</a>. Early binding, however, has two
consequences:</p>

<p>First, any globals or cell references must be present in the lambda's
scope when it is passed into Expression(). This is the norm and shouldn't
require too much thought from you when you write Expressions. In the
example above, we simply imported <tt>datetime</tt> as you would expect.</p>

<p>Second, any globals or cell references must <b>also</b> be present in
the <tt>logic</tt> module's globals when the Expression is unpickled.
Pickling occurs when Expressions are sent over sockets, and also if
Expressions are themselves persisted to storage (for example, see
<u>Unit Engines</u>, below). This means your application should inject
any such globals into the <tt>logic</tt> module before unpickling. Note
that the <tt>logic</tt> module already tries to import <tt>datetime</tt>,
<tt>fixedpoint</tt> and <tt>decimal</tt>.</p>
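<p>A different, much simpler way to get early binding in Python 3 is to
snapshot the values a function refers to at wrap time. The sketch below is
<b>not</b> Dejavu's technique. Dejavu rewrites bytecode and folds values into
constants, as the disassembly shows. <tt>bind_early</tt> is a hypothetical
helper that copies the referenced globals into a private namespace and
copies closure cells, so later rebinding of those names does not affect the
wrapped function. It requires Python 3.8 or later for
<tt>types.CellType</tt>.</p>
<pre>
import builtins
import types


def bind_early(func):
    """Return a copy of func whose globals and closure cells are frozen now.

    Hypothetical helper, not Dejavu's bytecode rewrite: it snapshots the
    values of referenced names, so a later rebinding is not seen, but
    mutating a referenced object in place still is.
    """
    code = func.__code__
    snapshot = {'__builtins__': builtins}
    for name in code.co_names:  # global names (plus attribute names)
        if name in func.__globals__:
            snapshot[name] = func.__globals__[name]
    closure = None
    if func.__closure__ is not None:
        # Fresh cells holding the current contents (types.CellType: 3.8+).
        closure = tuple(types.CellType(cell.cell_contents)
                        for cell in func.__closure__)
    return types.FunctionType(code, snapshot, func.__name__,
                              func.__defaults__, closure)


threshold = 3
late = lambda x: x > threshold
early = bind_early(late)
threshold = 10
print(late(5), early(5))  # False True
</pre>
<p>Copying values is cheaper to write than bytecode rewriting, but the
wrapped function still performs its lookups and calls at run time. Only
Dejavu's constant folding removes the <tt>datetime.date(...)</tt> call
from every evaluation.</p>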

<h4>External functions within Expressions</h4>
<p>Dejavu provides additional functions which can be used in Expressions.
For example, you can construct an Expression like:
<pre>logic.Expression(lambda x: x.Size &lt; 3 and x.Date > dejavu.today())</pre>
In this example, the <tt>today()</tt> function breaks convention and is
actually <b>bound late</b>. That is, if you construct this Expression now
and use it six months later, the value of <tt>today()</tt> will have changed.
Storage Managers "know about" these dejavu functions, and can use them
to build more appropriate queries. Here are the functions supplied by
the <tt>dejavu</tt> module:</p>

<table>
<tr><th>Function</th><th>Late bound?</th><th>Description</th></tr>
<tr>
    <td><tt>icontains(a, b)</tt></td>
    <td>N</td>
    <td>Case-insensitive test of b in a. Note the operand order.</td>
</tr>
<tr>
    <td><tt>icontainedby(a, b)</tt></td>
    <td>N</td>
    <td>Case-insensitive test of a in b. Note the operand order.</td>
</tr>
<tr>
    <td><tt>istartswith(a, b)</tt></td>
    <td>N</td>
    <td>True if a starts with b (case-insensitive), False otherwise.</td>
</tr>
<tr>
    <td><tt>iendswith(a, b)</tt></td>
    <td>N</td>
    <td>True if a ends with b (case-insensitive), False otherwise.</td>
</tr>
<tr>
    <td><tt>ieq(a, b)</tt></td>
    <td>N</td>
    <td>True if a == b (case-insensitive), False otherwise.</td>
</tr>
<tr>
    <td><tt>year(value)</tt></td>
    <td>N</td>
    <td>The year attribute of a date. If value is None, return None.</td>
</tr>
<tr>
    <td><tt>now()</tt></td>
    <td>Y</td>
    <td>datetime.datetime.now()</td>
</tr>
<tr>
    <td><tt>today()</tt></td>
    <td>Y</td>
    <td>datetime.date.today()</td>
</tr>
<tr>
    <td><tt>iscurrentweek(value)</tt></td>
    <td>Y</td>
    <td>If value is in the current week, return True, else False.</td>
</tr>
</table>
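<p>To pin down the operand order, here are plain-Python versions of several
functions from the table, written only from the descriptions above. They
are hypothetical re-implementations for illustration, not Dejavu's source.
They also assume string (or date) operands. Dejavu's own versions, and the
queries Storage Managers generate for them, may treat <tt>None</tt> or mixed
types differently. <tt>iscurrentweek</tt> is omitted because the table does
not say which day starts a week.</p>
<pre>
import datetime


def icontains(a, b):
    """Case-insensitive 'b in a' (note: the container comes first)."""
    return b.lower() in a.lower()


def icontainedby(a, b):
    """Case-insensitive 'a in b'."""
    return a.lower() in b.lower()


def istartswith(a, b):
    return a.lower().startswith(b.lower())


def iendswith(a, b):
    return a.lower().endswith(b.lower())


def ieq(a, b):
    return a.lower() == b.lower()


def year(value):
    return None if value is None else value.year


def today():
    # Late bound: the date is computed each time the test is evaluated.
    return datetime.date.today()


title = 'The Giant Horse of Oz'
print(icontains(title, 'HORSE'), icontainedby('oz', title))  # True True
</pre>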

<p>It is possible for you, the application developer, to define your
own external functions. However, because Storage Managers are unaware
of your new functions, they will not be able to optimize their use;
instead, they will simply retrieve a larger set of objects from storage,
evaluate each one against the function you provide, and return those
Units which match your function. This isn't necessarily a bad thing;
it provides the same functionality as if you wrote the test inline
within your own code. By making that test a logic function, you allow
it to be stored in Engine <i>rules</i> (see <u>Unit Engines</u>,
below).</p>

<h4>Combining Expressions</h4>
<p>Expressions can be combined: the <tt>&amp;</tt> operator joins two
Expressions with a logical "and". For example:
<pre>>>> a = logic.Expression(lambda x: x.Size > 3)
>>> b = logic.Expression(lambda x: x.Size &lt;= 15)
>>> c = a &amp; b
>>> c
logic.Expression(lambda x: (x.Size > 3) and (x.Size &lt;= 15))</pre>
The <tt>+</tt> operator works just like the <tt>&amp;</tt> operator. The
<tt>|</tt> operator combines the two Expressions with a logical 'or'.</p>
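<p>Operator-based combination is ordinary operator overloading. In the
Python 3 sketch below, <tt>Expr</tt> is a hypothetical class that only wraps
callables. It has none of <tt>logic.Expression</tt>'s bytecode rewriting,
early binding or readable <tt>__repr__</tt>. It defines <tt>__and__</tt>
(<tt>&amp;</tt>) and <tt>__or__</tt> (<tt>|</tt>), and makes
<tt>__add__</tt> an alias of <tt>__and__</tt>, mirroring the behavior
described above.</p>
<pre>
class Expr:
    """Hypothetical combinable predicate; wraps a one-argument callable."""

    def __init__(self, func):
        self.func = func

    def __call__(self, obj):
        return bool(self.func(obj))

    def __and__(self, other):
        return Expr(lambda x: self.func(x) and other.func(x))

    __add__ = __and__  # '+' means the same as the 'and' operator

    def __or__(self, other):
        return Expr(lambda x: self.func(x) or other.func(x))


class Box:
    def __init__(self, Size):
        self.Size = Size


a = Expr(lambda x: x.Size > 3)
b = Expr(lambda x: 15 >= x.Size)  # Size no greater than 15
both = a + b
either = a | Expr(lambda x: x.Size == 0)
print([n for n in (0, 2, 4, 20) if both(Box(n))])    # [4]
print([n for n in (0, 2, 4, 20) if either(Box(n))])  # [0, 4, 20]
</pre>
<p>Because <tt>__add__</tt> is an alias, <tt>a &amp; b</tt> builds the same
combined test as <tt>a + b</tt>.</p>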

<h4>Using <tt>filter</tt> to form Expressions</h4>
<p>The <tt>logic</tt> module also provides convenient ways to
create common types of Expression objects via the <tt>filter</tt> and
<tt>comparison</tt> factory functions.</p>

<p>The <tt>filter(**kwargs)</tt> function produces an Expression by taking
the keyword arguments you supply and rewriting them in lambda form. Since a
keyword argument carries only a name and a value, the only operator
available is equality ('=='), and the individual tests are joined with
'and'. For example:
<pre>>>> logic.filter(Type='Cat', Mutation='Atomic')
logic.Expression(lambda x: (x.Type == 'Cat') and (x.Mutation == 'Atomic'))</pre>
</p>
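<p>The keyword-to-equality rewrite can be sketched without any bytecode
work. <tt>keyword_filter</tt> is a hypothetical Python 3 helper, named so as
not to shadow the built-in <tt>filter</tt>. It returns a plain predicate
rather than an Expression.</p>
<pre>
def keyword_filter(**kwargs):
    """Hypothetical: one equality test per keyword, all joined with 'and'."""
    items = sorted(kwargs.items())

    def predicate(obj):
        return all(getattr(obj, name) == value for name, value in items)
    return predicate


class Pet:
    def __init__(self, Type, Mutation):
        self.Type, self.Mutation = Type, Mutation


atomic_cats = keyword_filter(Type='Cat', Mutation='Atomic')
print(atomic_cats(Pet('Cat', 'Atomic')))  # True
print(atomic_cats(Pet('Dog', 'Atomic')))  # False
</pre>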

<h4>Using <tt>comparison</tt> to form Expressions</h4>
<p>The <tt>comparison(attr, cmp_op, criteria)</tt> function allows you to
form Expressions with dynamic operators. This can come in handy when you
are constructing Expressions on the fly from user input. For example, a
search page might prompt users for an attribute name, an operator, and an
operand (the criteria).</p>

<p>Borrowing from <tt>opcode.cmp_op</tt>, the allowed values for our cmp_op
argument are as follows:</p>
<table>
<tr><th>Numeric Value (cmp_op)</th><th>Operator</th></tr>
<tr><td>0</td><td>&lt;</td></tr>
<tr><td>1</td><td>&lt;=</td></tr>
<tr><td>2</td><td>==</td></tr>
<tr><td>3</td><td>!=</td></tr>
<tr><td>4</td><td>&gt;</td></tr>
<tr><td>5</td><td>&gt;=</td></tr>
<tr><td>6</td><td>in</td></tr>
<tr><td>7</td><td>not in</td></tr>
<tr><td>8</td><td>is</td></tr>
<tr><td>9</td><td>is not</td></tr>
</table>

<p>Here's an example of using <tt>comparison</tt>:
<pre>>>> logic.comparison('Name', 3, 'Mr. Kamikaze')
logic.Expression(lambda x: x.Name != 'Mr. Kamikaze')</pre>
Although the comparison function only allows a single comparison at a time,
the resulting Expressions can be combined with the <tt>&amp;</tt> and <tt>|</tt>
operators (described earlier) to produce more complex Expressions.</p>
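<p>A dynamic-operator factory is essentially a lookup table of operator
functions. In the hypothetical Python 3 sketch below, the integer codes
follow the table above, which matches Python 2's <tt>opcode.cmp_op</tt>. In
Python 3.9 and later, <tt>opcode.cmp_op</tt> lists only the six rich
comparisons, so the sketch spells the table out with the <tt>operator</tt>
module instead of reading <tt>opcode</tt>. <tt>make_comparison</tt> and
<tt>CMP_OPS</tt> are illustrative names, and the sketch returns a plain
predicate rather than an Expression.</p>
<pre>
import operator

# Index to operator, in the order of the table above.
CMP_OPS = (
    operator.lt,              # 0
    operator.le,              # 1
    operator.eq,              # 2
    operator.ne,              # 3
    operator.gt,              # 4
    operator.ge,              # 5
    lambda a, b: a in b,      # 6: in
    lambda a, b: a not in b,  # 7: not in
    operator.is_,             # 8: is
    operator.is_not,          # 9: is not
)


def make_comparison(attr, cmp_op, criteria):
    """Hypothetical: predicate testing getattr(obj, attr) OP criteria."""
    if cmp_op not in range(len(CMP_OPS)):
        raise ValueError('unknown cmp_op: %r' % (cmp_op,))
    test = CMP_OPS[cmp_op]

    def predicate(obj):
        return test(getattr(obj, attr), criteria)
    return predicate


class Person:
    def __init__(self, Name):
        self.Name = Name


not_kamikaze = make_comparison('Name', 3, 'Mr. Kamikaze')
print(not_kamikaze(Person('Mr. Kamikaze')))  # False
</pre>
<p>Validating the index before building the predicate means bad user input
fails immediately, rather than later when the test is evaluated.</p>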

<h4>Exporting the <tt>logic</tt> module</h4>
<p>The <tt>logic</tt> module (and <tt>codewalk</tt>, on which it is built)
isn't limited to Dejavu. Feel free to use it in some other framework or
script! If you move the module outside of the <tt>dejavu</tt> package, the
only change you should need is to the single line
<tt>from dejavu import codewalk</tt>, so that it points to the new
location.</p>

<p>In particular, <tt>logic.Expression</tt> objects can operate on <i>any</i>
Python object, not just dejavu <tt>Unit</tt> instances. If you wish to
provide additional logic functions (as dejavu does), simply inject them
into <tt>logic</tt>'s globals.</p>

<p>You may also find the underlying <tt>codewalk</tt> module useful for
other purposes on its own. The <tt>Visitor</tt> base class can be very
convenient for building bytecode hacks.</p>

<p>To make a long story short, Dejavu depends on <tt>logic</tt> throughout,
but the reverse is not true.</p>

<h3>Associations between Unit Classes</h3>
<p>Once you've put together some Unit classes, chances are you're going to
want to associate them. Generally, this is accomplished by creating a
property in the Unit_B class which stores IDs of Unit_A objects (which
might be called <i>foreign keys</i> in a database context).
<pre>class Archaeologist(Unit):
    Height = UnitProperty(float)

class Biography(Unit):
    ArchID = UnitProperty(int)</pre>
In this example, each <tt>Biography</tt> object will have an <tt>ArchID</tt>
attribute, which will equal the <tt>ID</tt> of some <tt>Archaeologist</tt>.
In Dejavu terms, we say that there is a <i>near class</i> (with a <i>near
key</i>) and a <i>far class</i> (with a <i>far key</i>). Associations in
Dejavu are not one-way, so it doesn't matter which class you choose for the
"near" one and which for the "far" one.</p>

<p>You could stop at this point in your design, simply remember what
these keys are and how they relate, and manipulate them accordingly. But
Dejavu allows you to <i>register</i> these associations explicitly in your
<tt>Arena</tt>:
<pre>myArena.associate(Archaeologist, 'ID', Biography, 'ArchID')</pre>
You pass in the near class, the near key, the far class, and the far key.
</p>

<p>What does an explicit association buy you? First, the <tt>associate</tt>
call adds an entry in the <tt>Arena.associations</tt> registry, so that
smart consumer code (like Unit Engine Rules, below) can automatically
follow association paths for you. Second, each Unit class has a private
<tt>_associations</tt> attribute, a <tt>dict</tt>. Each Unit class involved
in the association gains an entry in that dict: the key is the far class
itself (not the class name), and the value is a tuple of (far key, near key).
Third, <tt>associate()</tt> can be used to register your Unit classes in
the Arena's <tt>roster</tt>; you don't have to call <tt>register</tt> for
either class if you call <tt>associate</tt> (see <u>The Arena Object</u>,
below).</p>

<p>In addition, each of the Unit classes will gain a new <i>synapse</i>
method which simplifies looking up related instances of the other class.
The new method for Unit_B will have the name of Unit_A, and vice-versa.
In our example:
<pre>>>> Archaeologist.Biography
&lt;unbound method Archaeologist.synapses>
>>> Eversley = Archaeologist(Height=6.417)
>>> Eversley.Biography
&lt;bound method Archaeologist.synapses of &lt;__main__.Archaeologist
object at 0x011A1930>>
>>> bios = Eversley.Biography()
>>> bios
&lt;listiterator object at 0x012150D0>
>>> list(bios)
[]
</pre>
We haven't created any Biographies, so there aren't any to be recalled,
which is why we get an empty iterator at this point. At the other extreme
(when you have hundreds of Biographies to filter), you can pass an optional
<tt>Expression</tt> object to the synapse method. When you do, the
associated Units will be filtered accordingly.</p>
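<p>Synapse methods are methods attached to each class at association time.
The Python 3 sketch below shows the idea with a hypothetical
<tt>associate</tt> function. It scans a plain list instead of a Sandbox,
takes an optional filter callable in place of an Expression, and returns an
iterator as the synapses above do. It does not maintain
<tt>_associations</tt> or any Arena registry. It also refuses to overwrite
an existing class attribute, as a guard against the naming collision
described below. This document does not say whether Dejavu itself checks
for that.</p>
<pre>
def associate(store, near_cls, near_key, far_cls, far_key):
    """Hypothetical: add a synapse method to each class, named after the other."""
    def make_synapse(own_key, other_cls, other_key):
        def synapse(self, expr=None):
            own_value = getattr(self, own_key)
            return iter([u for u in store
                         if isinstance(u, other_cls)
                         and getattr(u, other_key) == own_value
                         and (expr is None or expr(u))])
        synapse.__name__ = other_cls.__name__
        return synapse

    pairs = ((near_cls, near_key, far_cls, far_key),
             (far_cls, far_key, near_cls, near_key))
    for cls, key, other_cls, other_key in pairs:
        if other_cls.__name__ in vars(cls):
            # This sketch refuses to clobber an existing attribute.
            raise AttributeError('%s.%s already exists'
                                 % (cls.__name__, other_cls.__name__))
    for cls, key, other_cls, other_key in pairs:
        setattr(cls, other_cls.__name__,
                make_synapse(key, other_cls, other_key))


class Archaeologist:
    def __init__(self, ID, Height):
        self.ID, self.Height = ID, Height


class Biography:
    def __init__(self, ID, ArchID):
        self.ID, self.ArchID = ID, ArchID


store = []
associate(store, Archaeologist, 'ID', Biography, 'ArchID')
eversley = Archaeologist(ID=1, Height=6.417)
store.extend([eversley, Biography(ID=10, ArchID=1), Biography(ID=11, ArchID=2)])
print([b.ID for b in eversley.Biography()])  # [10]
</pre>
<p>Each call scans the store again and nothing is kept on the near object,
which matches the no-caching behavior described below.</p>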

<p>Because the synapse method names are formed automatically, you need
to take care not to use the names of Unit classes for your Unit properties.
In our example, we used "ArchID" for the name of our "foreign key".
If we had used "Archaeologist" instead, we would have had problems;
when we associated the classes, the <i>property</i> named "Archaeologist"
would have collided with the <i>synapse method</i> named "Archaeologist".
Be careful when naming your properties, and plan for the future.</p>

<p>Unlike some other ORMs, Dejavu doesn't cache far Units within
the near Unit. Each time you call the synapse method, the data is recalled
from your Sandbox. It is quite probable that those far Units are still
sitting in memory in the Sandbox, but they're not going to persist in
the near Unit itself in any way.</p>

<p>Finally, some of you may want to override the default synapse methods.
Feel free; <tt>Arena.associate</tt> takes two optional arguments, which
should be callables that return the new function(s). See the source code
of <tt>Arena</tt> and the private function <tt>dejavu._synapses_func</tt>
for more information.</p>

610 <h3>Unit Engines</h3>
611 <p>Once you've created and associated your Unit classes, you can begin to
612 write "business logic" code (mostly inside those classes, we hope), and
613 "presentation logic" code (mostly outside those classes). In most cases,
614 you will construct Expressions within your own code manually to retrieve
615 Units. Sometimes, however, you need to persist query parameters from your
616 users; in other cases, you might store a list of Units which match a query
617 (regardless of who formed the necessary Expression). Finally, you might
618 wish to manipulate lists of Units as sets: differences, intersections,
619 and unions. The <tt>engines</tt> module addresses all of these needs.</p>
620
<h4>Collections: Lists of Units</h4>
<p>The <tt>UnitCollection</tt> class provides a means of storing a list
of Units, or rather, a list of Unit IDs. You use its <tt>Type</tt>
property to indicate the class of the indexed Units. That value should be
the <b>name</b> of the Unit class, <b>not</b> the class object itself
(this is different from most other calls in Dejavu). If you need to
retrieve the actual Unit class, call the collection's <tt>unit_class()</tt>
method.</p>
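<p>A minimal sketch of the name-versus-class distinction, assuming a plain dictionary as the name-to-class registry. <tt>REGISTRY</tt> and <tt>SketchCollection</tt> are hypothetical; how Dejavu itself resolves the name is not shown here. One plausible benefit of storing a name is that a string is easy to persist through a Storage Manager.</p>

```python
class Printer:
    """Stand-in for a dejavu.Unit subclass."""


# Hypothetical name -> class registry used only for this sketch.
REGISTRY = {cls.__name__: cls for cls in (Printer,)}


class SketchCollection:
    def __init__(self, type_name, ids=()):
        self.Type = type_name    # the class *name* (a str), not the class object
        self._ids = list(ids)

    def unit_class(self):
        # Resolve the stored name back into the actual class object.
        return REGISTRY[self.Type]


printers = SketchCollection("Printer", ids=[3, 5, 8])
```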

<p><tt>UnitCollection</tt> itself subclasses <tt>dejavu.Unit</tt>; you can
therefore persist Unit Collections via Dejavu Storage Managers (most SMs,
anyway: SMs are encouraged, but not required, to handle Unit Collections,
so check whether yours does).</p>

<p>Each Collection has a thread lock (an RLock, actually) which you should
<tt>acquire()</tt> before you add an ID to the set, and <tt>release()</tt>
afterward. If you use the <tt>add(ID)</tt> method, this locking is done
for you.</p>
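<p>The locking convention can be sketched with the standard library's <tt>threading.RLock</tt>. The class below is a hypothetical illustration, not Dejavu's <tt>UnitCollection</tt>. Because an RLock is re-entrant, a thread that has already called <tt>acquire()</tt> can still call <tt>add()</tt> (which takes the lock again) without deadlocking.</p>

```python
import threading


class LockedIDList:
    """Hypothetical sketch of the acquire/add/release convention."""

    def __init__(self):
        self._lock = threading.RLock()
        self._ids = []

    def acquire(self):
        self._lock.acquire()

    def release(self):
        self._lock.release()

    def add(self, ID):
        # add() does its own locking, so callers need not acquire first.
        with self._lock:
            self._ids.append(ID)

    def ids(self):
        with self._lock:
            return list(self._ids)


collection = LockedIDList()
workers = [threading.Thread(target=collection.add, args=(n,)) for n in range(50)]
for w in workers:
    w.start()
for w in workers:
    w.join()

# Manual locking around several operations; re-entering via add() is safe.
collection.acquire()
try:
    collection.add(100)
    collection.add(101)
finally:
    collection.release()
```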

<p>When you need to retrieve the actual Units which are indexed by the
Collection, call the <tt>units(quota=None)</tt> method, which will
look up the Units and return them in a list. Since the Collection only
stores IDs, it is possible that one of the indexed Units has been
destroyed since the list was built. The <tt>units</tt> method simply
skips these "phantom" Units. You can inspect the full list of IDs
in the Collection (whether they reference existing Units or not) with
the <tt>ids()</tt> method.</p>
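<p>Here is a minimal sketch of how a <tt>units()</tt> call can pass over phantom IDs while <tt>ids()</tt> still reports them. It assumes a hypothetical <tt>lookup</tt> callable that returns <tt>None</tt> for a destroyed Unit, and it assumes <tt>quota</tt> caps the number of Units returned; both are illustrative assumptions rather than documented Dejavu behavior.</p>

```python
class PhantomAwareCollection:
    """Hypothetical sketch; not Dejavu's UnitCollection."""

    def __init__(self, ids, lookup):
        self._ids = list(ids)
        self._lookup = lookup    # assumption: ID -> Unit, or None if destroyed

    def ids(self):
        # Every stored ID, whether or not its Unit still exists.
        return list(self._ids)

    def units(self, quota=None):
        found = []
        for ID in self._ids:
            if quota is not None and len(found) >= quota:
                break
            unit = self._lookup(ID)
            if unit is not None:    # silently skip "phantom" IDs
                found.append(unit)
        return found


live = {1: "unit-1", 3: "unit-3", 4: "unit-4"}    # ID 2 has been destroyed
coll = PhantomAwareCollection([1, 2, 3, 4], live.get)
```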

<p>Collections also provide a convenience method for grouping Units
by attribute: <tt>xdict(attr)</tt>. This method looks up each Unit
in the Collection, inspects the attribute that you specify, and returns
a dictionary of the form <tt>{attr_val1: [Unit, Unit, ...]}</tt>.
Each distinct attribute value gets its own key, with a list of
matching Units as the value.</p>
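<p>The grouping that <tt>xdict</tt> performs can be sketched as a standalone function over any iterable of objects. This is a hypothetical re-implementation for illustration (Dejavu's method works on the Collection's own Units); it uses <tt>collections.defaultdict</tt> so each new attribute value starts with an empty list.</p>

```python
from collections import defaultdict
from types import SimpleNamespace


def xdict(units, attr):
    """Group objects by the value of one attribute: {value: [obj, ...]}."""
    groups = defaultdict(list)
    for unit in units:
        groups[getattr(unit, attr)].append(unit)
    return dict(groups)


printers = [
    SimpleNamespace(ID=1, Color="red"),
    SimpleNamespace(ID=2, Color="blue"),
    SimpleNamespace(ID=3, Color="red"),
]
by_color = xdict(printers, "Color")
```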

<h4>Engines</h4>
<p>You can form Collections by hand, but a more powerful technique is
the <tt>UnitEngine</tt>, a factory for Collections. Engines are very
simple: they possess a set of <i>rules</i> which are executed when
you want to take a <i>snapshot</i> of Units. The snapshot which is
produced is a <tt>UnitCollection</tt> object. Whenever you call
<tt>take_snapshot()</tt>, the Engine will maintain an association
to the resulting Collection. You can access past snapshots with the
<tt>snapshots()</tt> method.</p>
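<p>The snapshot bookkeeping can be pictured with a small sketch. <tt>SnapshotEngine</tt> is hypothetical and simply keeps each result in a list; a real Engine runs its Rules and associates the resulting <tt>UnitCollection</tt>s with itself. The property shown is that each snapshot is a copy of the IDs taken at that moment, so later changes to the underlying data do not alter earlier snapshots.</p>

```python
class SnapshotEngine:
    """Hypothetical sketch of take_snapshot()/snapshots() bookkeeping."""

    def __init__(self, run_rules):
        self._run_rules = run_rules    # assumption: callable returning matching IDs
        self._snapshots = []

    def take_snapshot(self):
        snapshot = list(self._run_rules())    # copy the IDs as of right now
        self._snapshots.append(snapshot)
        return snapshot

    def snapshots(self):
        return list(self._snapshots)


current_ids = [1, 2]
engine = SnapshotEngine(lambda: current_ids)
first = engine.take_snapshot()
current_ids.append(3)
second = engine.take_snapshot()
```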

<p>Engines are themselves Units, and can be persisted via Storage Managers.
They possess only five properties: an <tt>ID</tt>, a <tt>Name</tt>,
an <tt>Owner</tt>, a <tt>FinalClassName</tt>, and <tt>Created</tt>
(the creation date of the Engine).</p>

<p>The <tt>Owner</tt> property should either be a user name, or one of the
reserved names "Public" and "System". By default, the <tt>permit()</tt>
method allows a user read-access to the Engine if they are the Owner, or
the Owner is "Public" or "System". Write-access is permitted if the user
is the Owner, or the Owner is "Public". Feel free to override
<tt>permit()</tt> in a subclass to provide different behaviors.</p>
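<p>The default access rules described above reduce to two boolean checks. The standalone function below is a hypothetical sketch of that logic; the real <tt>permit()</tt> is a method on the Engine, and its exact signature is not given in this document.</p>

```python
PUBLIC, SYSTEM = "Public", "System"


def permit(owner, user, write=False):
    """Hypothetical sketch of the default Engine permission logic."""
    if write:
        # Write: the owner, or anyone if the Engine is Public.
        return user == owner or owner == PUBLIC
    # Read: the owner, or anyone if the Engine is Public or System.
    return user == owner or owner in (PUBLIC, SYSTEM)
```

<p>A subclass override would replace these two checks, for example to let an administrator write to "System" Engines.</p>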

<p>The <tt>FinalClassName</tt> is set for you as you add Rules to the
Engine. You can use the value of this property, for example, to tell
your users, "Engine #23569 is an 'Armadillo' engine," when it produces
Collections of <tt>Armadillo</tt> Units. The only time you might want to
set this value yourself is when you first create the Engine, before you
have added any Rules.</p>

<h4>Rules</h4>
<p>Just like Collections and Engines, <tt>UnitEngineRule</tt> is <i>also</i>
a subclass of <tt>Unit</tt>, and can be persisted via Storage Managers. All
three work together to provide a complete, dynamic, application-level query
generator.</p>

<p>Okay, so what are Rules? You might say they're a "little language",
with the following primitives, or "operations":</p>
<table>
<tr><th>Operation</th><th>Operand(s)</th><th>Description</th></tr>
<tr><th colspan='3'>Operations on a single set</th></tr>
<tr>
    <td>CREATE</td>
    <td>The classname of the new Type</td>
    <td>Creates a new Set of the specified Type. All Units of that Type
        are included in the new Set.</td>
</tr>
<tr>
    <td>FILTER</td>
    <td>A <tt>logic.Expression</tt></td>
    <td>Removes Units from the current Set which do not match the
        Expression.</td>
</tr>
<tr>
    <td>FUNCTION</td>
    <td>The name of a function in the <tt>Arena.engine_functions</tt>
        dict</td>
    <td>Calls the function, passing the Sandbox and the current Set. The
        function should modify the Set in place.</td>
</tr>
<tr>
    <td>TRANSFORM</td>
    <td>The classname of the new Type</td>
    <td>Transforms the current Set into a Set of associated Units
        (of another Type). The association must be present in the
        <tt>Arena.associations</tt> graph.</td>
</tr>
<tr>
    <td>RETURN</td>
    <td></td>
    <td>Optional. If omitted, the last Set handled is returned as the
        snapshot. If supplied, the Rule's SetID names the Set to return.</td>
</tr>
<tr><th colspan='3'>Operations on two sets</th></tr>
<tr>
    <td>COPY</td>
    <td>The Set ID of the new Set</td>
    <td>Copies the current Set to a new Set. The current Set is unchanged.</td>
</tr>
<tr>
    <td>DIFFERENCE</td>
    <td>The ID of the Set to mix in</td>
    <td>Removes IDs from the current Set which exist in the second Set.</td>
</tr>
<tr>
    <td>INTERSECTION</td>
    <td>The ID of the Set to mix in</td>
    <td>Removes IDs from the current Set which <i>do not</i> exist in the
        second Set.</td>
</tr>
<tr>
    <td>UNION</td>
    <td>The ID of the Set to mix in</td>
    <td>Adds any IDs to the current Set which exist in the second Set.</td>
</tr>
</table>

<p>Each Rule has an <tt>Operation</tt> property (a string, one of the above),
a <tt>SetID</tt>, and an <tt>Operand</tt>. Here's an example ruleset:</p>
<table>
<tr><th>Operation</th><th>SetID</th><th>Operand</th></tr>
<tr><td>CREATE</td><td>1</td><td>Invoice</td></tr>
<tr><td>FILTER</td><td>1</td><td>(Expression)</td></tr>
<tr><td>CREATE</td><td>2</td><td>Inventory</td></tr>
<tr><td>FILTER</td><td>2</td><td>(Expression)</td></tr>
<tr><td>TRANSFORM</td><td>2</td><td>Invoice</td></tr>
<tr><td>DIFFERENCE</td><td>1</td><td>2</td></tr>
<tr><td>RETURN</td><td>1</td><td></td></tr>
</table>

<p>As you can see, every Rule operates on a <i>Set</i> of Units. The first
rule is always to CREATE a set, declaring it to contain a certain Type
of Units. In most cases, you will then FILTER that set. If you simply
created a set and then returned it, it would contain all Units of the
declared Type. When you filter a set, however, you remove from it every
Unit which does not match the filter's Expression.</p>

<p>In the example above, we CREATE a second Set so that we can eventually
obtain the DIFFERENCE between Set 1 and Set 2. The second Set contains
Units of a different Type from the first. Once we filter Set 2, we then
TRANSFORM it; for each Inventory Unit, we look up associated Invoice
Units. Then, we find the difference between the two Invoice sets and
RETURN it.</p>
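<p>To show how the operations compose, here is a small interpreter for the rule language, written against plain Python data. Everything in it is a hypothetical stand-in: Sets are Python <tt>set</tt>s of IDs, <tt>units</tt> maps a type name to <tt>{ID: record}</tt>, FILTER takes a predicate in place of a <tt>logic.Expression</tt>, and <tt>transforms</tt> plays the role of the association graph. The usage at the bottom follows the example ruleset with invented sample data: unpaid Invoices, minus those reached from out-of-stock Inventory.</p>

```python
def run_rules(rules, units, transforms, functions=None):
    """Execute (Operation, SetID, Operand) rules; return (type_name, id_set)."""
    functions = functions or {}
    sets, types = {}, {}
    last = None
    for op, set_id, operand in rules:
        if op == "CREATE":
            sets[set_id] = set(units[operand])    # every ID of that type
            types[set_id] = operand
        elif op == "FILTER":
            table = units[types[set_id]]
            sets[set_id] = {i for i in sets[set_id] if operand(table[i])}
        elif op == "FUNCTION":
            # 'units' stands in for the sandbox; the function mutates the set.
            functions[operand](units, sets[set_id])
        elif op == "TRANSFORM":
            link = transforms[(types[set_id], operand)]
            sets[set_id] = {far for near in sets[set_id] for far in link(near)}
            types[set_id] = operand
        elif op == "COPY":
            sets[operand] = set(sets[set_id])
            types[operand] = types[set_id]
        elif op == "DIFFERENCE":
            sets[set_id].difference_update(sets[operand])
        elif op == "INTERSECTION":
            sets[set_id].intersection_update(sets[operand])
        elif op == "UNION":
            sets[set_id].update(sets[operand])
        elif op == "RETURN":
            return types[set_id], sets[set_id]
        else:
            raise ValueError("unknown operation: %r" % (op,))
        last = set_id
    if last is None:
        raise ValueError("no rules to run")
    return types[last], sets[last]    # no RETURN: the last Set handled


units = {
    "Invoice": {1: {"Paid": False}, 2: {"Paid": False}, 3: {"Paid": True}},
    "Inventory": {10: {"Qty": 0, "InvoiceID": 1}, 11: {"Qty": 5, "InvoiceID": 2}},
}
transforms = {
    ("Inventory", "Invoice"): lambda ID: [units["Inventory"][ID]["InvoiceID"]],
}
rules = [
    ("CREATE", 1, "Invoice"),
    ("FILTER", 1, lambda r: not r["Paid"]),
    ("CREATE", 2, "Inventory"),
    ("FILTER", 2, lambda r: r["Qty"] == 0),
    ("TRANSFORM", 2, "Invoice"),
    ("DIFFERENCE", 1, 2),
    ("RETURN", 1, None),
]
result_type, result_ids = run_rules(rules, units, transforms)
```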

<p>Rules are executed in order according to their <tt>Sequence</tt>
attribute (lowest first). When you use the <tt>Engine.add_rule</tt> method,
the next <tt>Sequence</tt> value is determined for you. Notice that each
Rule belongs to one and only one Engine; Rules are not shared between
Engines. Each Rule has its own <tt>EngineID</tt> attribute.</p>
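<p>A minimal sketch of Sequence handling, using hypothetical dataclasses in place of <tt>UnitEngineRule</tt> and <tt>UnitEngine</tt>. It assumes "the next Sequence value" means one more than the highest Sequence already used by that Engine's Rules; Dejavu's exact numbering scheme may differ.</p>

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    EngineID: int
    Sequence: int
    Operation: str
    SetID: int
    Operand: object = None


@dataclass
class SketchEngine:
    ID: int
    rules: list = field(default_factory=list)

    def add_rule(self, operation, set_id, operand=None):
        # Next Sequence = highest existing Sequence + 1 (an assumption).
        next_seq = max((r.Sequence for r in self.rules), default=0) + 1
        rule = Rule(self.ID, next_seq, operation, set_id, operand)
        self.rules.append(rule)
        return rule

    def ordered_rules(self):
        # Execution order is by Sequence, lowest first, not insertion order.
        return sorted(self.rules, key=lambda r: r.Sequence)


engine = SketchEngine(ID=23569)
engine.add_rule("CREATE", 1, "Invoice")
engine.add_rule("FILTER", 1, "(Expression)")
# A rule stored out of order (e.g. loaded from storage) still runs by Sequence.
engine.rules.insert(0, Rule(23569, 3, "RETURN", 1))
```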

<h4>Engine Functions</h4>
<p>The FUNCTION rule deserves special mention. The Operand of a FUNCTION
rule is a string, a key in the <tt>Arena.engine_functions</tt> dictionary.
When the rule is executed, that key is used to look up the function, which
is then called with the arguments <tt>(sandbox, set)</tt>. The function
should mutate the set directly. Use FUNCTION rules to mutate sets in ways
which are more complex than those provided by FILTER and TRANSFORM. For
example, you might provide a function which removes all but the first Unit
in the Set (according to some ordering algorithm).</p>
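<p>A hypothetical sketch of a FUNCTION rule's lookup-and-call step. The dictionary here merely stands in for <tt>Arena.engine_functions</tt>, the Set is modeled as a Python <tt>set</tt> of IDs, and "first" is taken to mean the lowest ID, chosen only to make the ordering algorithm concrete.</p>

```python
# Stand-in for Arena.engine_functions: operand string -> function.
engine_functions = {}


def keep_first(sandbox, unit_ids):
    """Remove all but the lowest ID, mutating the set in place."""
    if unit_ids:
        first = min(unit_ids)
        unit_ids.intersection_update({first})


engine_functions["keep_first"] = keep_first


def apply_function_rule(sandbox, operand, unit_ids):
    # The Operand is a key; the function is looked up and called with
    # (sandbox, set). Its return value is ignored: it must mutate the set.
    engine_functions[operand](sandbox, unit_ids)


ids = {42, 7, 19}
apply_function_rule(None, "keep_first", ids)
```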


<h3>Analysis Tools</h3>
<p>Dejavu includes various tools to help you manipulate groups of Units.</p>

<h4>Sorting Units</h4>
<p>When you recall Units, you receive a generator, and must iterate over
the values in some way. Often, this is accomplished with a list
comprehension:
<pre>f = logic.Expression(lambda x: 'Aa' in x.Name)
people = [x for x in sandbox.recall(Person, f)]
</pre>
However, the <tt>recall</tt> method doesn't do any sorting; you must sort
your list in your Python code. Dejavu provides a <tt>sort(attrs,
descending=False)</tt> function to assist you. It returns a function, which
you can then pass to the list's <tt>sort</tt> method. Continuing our example:
<pre># list.sort() sorts in place and returns None, so don't assign its result.
people.sort(dejavu.sort(['Size', 'Name']))</pre>
The most important issue (and the reason we don't just use Python 2.4's
<tt>operator.attrgetter</tt>) is that any Unit property may hold None,
which tends to raise errors when compared to values of other types. The
function which <tt>sort</tt> creates for you treats None as "less than"
any other value.</p>
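<p>In Python 3, <tt>list.sort</tt> no longer accepts a comparison function, only a <tt>key</tt>. The sketch below is a swapped-in, key-based technique (not Dejavu's <tt>sort</tt>) that gives the same None-first ordering: each value becomes a tuple whose first element is <tt>False</tt> for None, so None is never compared directly with a real value. <tt>none_first_key</tt> is a hypothetical name.</p>

```python
from types import SimpleNamespace


def none_first_key(*attrs):
    """Build a sort key over several attributes, placing None first."""
    def key(obj):
        parts = []
        for attr in attrs:
            value = getattr(obj, attr)
            # (False, 0) sorts before any (True, value), so None comes first
            # without ever being compared against a non-None value.
            parts.append((False, 0) if value is None else (True, value))
        return tuple(parts)
    return key


people = [
    SimpleNamespace(Name="Bo", Size=None),
    SimpleNamespace(Name="Cy", Size=3),
    SimpleNamespace(Name="Al", Size=3),
    SimpleNamespace(Name="Di", Size=1),
]
people.sort(key=none_first_key("Size", "Name"))
# For a descending sort, pass reverse=True (None then sorts last).
```

<p>If you already have an old-style comparison function, <tt>functools.cmp_to_key</tt> converts it into a key usable with <tt>list.sort</tt>.</p>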

<h4>Cross-tabulation</h4>
<p>Cross-tabs (also called <i>aggregate tables</i> or <i>pivot tables</i>)
display aggregate information about objects by category. For example,
rather than show a list of Safari records, one row per trip, you might
wish to show a table where each row represents a Destination, and each
column shows the count of Safaris to that Destination for each distinct
Year. In this example, we say that the Safaris are "grouped by" their
Destination values, and that we "pivot" on the Year values.</p>

<p>Dejavu helps you form such a table via the <tt>CrossTab</tt> class.
You need to specify the group(s) you wish to use, the pivot attribute,
and the aggregate function. Here's a code example:
<pre>
>>> data = ["a", "b", "cc", "bddd", "a4", "b6"]
>>> group = lambda x: x.isalpha()
>>> pivot = lambda x: x[0]
>>> ctab = analysis.CrossTab(data, [group], pivot, dejavu.COUNT)
>>> table, columns = ctab.results()
>>> table
{(True,): {"a": 1, "b": 2, "c": 1},
 (False,): {"a": 1, "b": 1}}
>>> columns
["a", "b", "c"]</pre>
You may notice that we're not using Units in our example; the
<tt>CrossTab</tt> class is designed to work with any objects. Here's one
way to lay out that data:</p>
<table>
<tr><th>Is Alpha</th><th>a</th><th>b</th><th>c</th></tr>
<tr><td>Y</td><td>1</td><td>2</td><td>1</td></tr>
<tr><td>N</td><td>1</td><td>1</td><td>0</td></tr>
</table>

<p>The <tt>results</tt> method returns two values. The first is the table
itself, in the form of a dictionary: each key is a tuple of group values,
and the corresponding value is a sub-dictionary. Each sub-dict maps pivot
values to their aggregates. (If that sounds confusing, look at the
example.) A pivot value with no matching objects in a group is simply
absent from that group's sub-dict, which is why the "N" row's "c" cell
shows 0 above. The second value returned is the sorted list of pivot
column values.</p>

<p>The groups and pivot arguments may be either strings or functions.
If strings, they must be the names of attributes of the source objects.
The final aggfunc argument defaults to COUNT, but may also be SUM.
More aggfuncs may arrive in the future.</p>
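<p>To make the shape of <tt>results()</tt> concrete, here is a hypothetical COUNT-only re-implementation in plain Python; it is not the <tt>analysis.CrossTab</tt> source. Like <tt>CrossTab</tt>, it accepts each group and the pivot as either an attribute name (a string) or a function, and it returns the table dictionary plus the sorted pivot columns. SUM is omitted because this document does not say which value SUM adds up.</p>

```python
from collections import defaultdict
from operator import attrgetter
from types import SimpleNamespace


def _as_func(spec):
    # Strings name attributes; anything else is assumed to be callable.
    return attrgetter(spec) if isinstance(spec, str) else spec


def crosstab_count(items, groups, pivot):
    """Return ({group_tuple: {pivot_value: count}}, sorted_pivot_values)."""
    group_funcs = [_as_func(g) for g in groups]
    pivot_func = _as_func(pivot)
    table = defaultdict(lambda: defaultdict(int))
    columns = set()
    for item in items:
        row = tuple(f(item) for f in group_funcs)
        col = pivot_func(item)
        table[row][col] += 1
        columns.add(col)
    return {row: dict(cells) for row, cells in table.items()}, sorted(columns)


# The string example from above, using functions for group and pivot.
table, columns = crosstab_count(
    ["a", "b", "cc", "bddd", "a4", "b6"], [str.isalpha], lambda x: x[0])

# The Safari example, using attribute names instead.
safaris = [
    SimpleNamespace(Destination="Kenya", Year=2004),
    SimpleNamespace(Destination="Kenya", Year=2005),
    SimpleNamespace(Destination="Chad", Year=2004),
]
by_destination, years = crosstab_count(safaris, ["Destination"], "Year")
```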

<h3>The Arena Object</h3>
<p>The topmost class in Dejavu is the <tt>Arena</tt> class. When building
a Dejavu application, you must first create an instance of this class,
and must keep that one instance alive across client connections.
How you do this depends on the kind of application: web applications,
for example, typically run a single process which serves all requests,
while desktop applications will probably create a single Arena object for
each running instance of the program.</p>

<h4>Loading Stores</h4>
<p>You <b>may</b> manually set up Storage Managers by calling
<tt>Arena().add_store(name, store, unitClasses)</tt>. But you
probably shouldn't. Instead, allow your deployers to decide for
themselves which storage solution(s) to use. You can do this by calling
<tt>load(filename)</tt>; pass it the filename of an INI-style file
which your deployers can tweak without screwing up your Python code.
The next chapter in this reference is devoted entirely to educating
deployers; point them to it or copy/modify it in your own release docs.</p>

<h4>Registering Unit Classes</h4>
<p>The <tt>Arena</tt> object maintains a registry of Unit classes called a
<tt>roster</tt>. A roster is like a three-way map between Unit classes,
their names, and their assigned StorageManagers. You shouldn't manipulate
this structure on your own; instead, use the <tt>register</tt> method to
register each Unit class.</p>
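<p>One reason to go through a single registration method is that a three-way map has several lookups that must stay in step. The sketch below is a hypothetical illustration of that idea (class, name, and store kept consistent by one method), not Dejavu's roster; the store is represented by a plain string.</p>

```python
class SketchRoster:
    """Hypothetical three-way map: class <-> name, class -> store."""

    def __init__(self):
        self._class_by_name = {}
        self._store_by_class = {}

    def register(self, cls, store=None):
        name = cls.__name__
        existing = self._class_by_name.get(name)
        if existing is not None and existing is not cls:
            raise ValueError("a different class is already registered as %r" % name)
        # Update both maps together so they never disagree.
        self._class_by_name[name] = cls
        self._store_by_class[cls] = store

    def class_for(self, name):
        return self._class_by_name[name]

    def store_for(self, cls):
        return self._store_by_class[cls]


class Printer:
    """Stand-in for a dejavu.Unit subclass."""


roster = SketchRoster()
roster.register(Printer, store="main-db")
```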

<p>The <tt>Arena</tt> object also manages the associations between Unit
classes in its <tt>associations</tt> attribute, which is a simple,
unweighted, undirected graph. In general, you should call
<tt>associate(cls, key, farClass, farKey)</tt> to add classes to this
graph. The only other common operation is to call
<tt>.associations.shortest_path(start, end)</tt> to retrieve the
chain of associations between two Unit classes.</p>
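<p>Because the association graph is unweighted, breadth-first search finds a shortest chain. The class below is a hypothetical sketch of such a graph and its <tt>shortest_path</tt>, keyed by class names; Dejavu's own graph class may be organized differently. Neighbors are visited in sorted order only so the result is deterministic when several shortest paths exist.</p>

```python
from collections import deque


class AssociationGraph:
    """Hypothetical unweighted, undirected graph of Unit class names."""

    def __init__(self):
        self._neighbors = {}

    def connect(self, a, b):
        # Undirected: record the edge in both directions.
        self._neighbors.setdefault(a, set()).add(b)
        self._neighbors.setdefault(b, set()).add(a)

    def shortest_path(self, start, end):
        """Return a list of nodes from start to end, or None if unreachable."""
        if start == end:
            return [start]
        previous = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in sorted(self._neighbors.get(node, ())):
                if nxt in previous:
                    continue
                previous[nxt] = node
                if nxt == end:
                    # Walk the predecessor links back to start.
                    path = [end]
                    while previous[path[-1]] is not None:
                        path.append(previous[path[-1]])
                    return path[::-1]
                queue.append(nxt)
        return None


graph = AssociationGraph()
graph.connect("Inventory", "Invoice")
graph.connect("Invoice", "Customer")
graph.connect("Customer", "Region")
graph.connect("Inventory", "Warehouse")
```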

<hr />

<p><a name='hettinger'>[1]</a> Python Cookbook,
<a href='http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/277940'>Binding
Constants at compile time</a><br />
</p>

</body>
</html>