Contact: fumanchu@aminus.org

I think I've seen this ORM somewhere before...

root/trunk/doc/modeling.html

Revision 19 (checked in by fumanchu, 9 years ago)

1. Changed containers.Balloon class to .warehouse function.
2. Doc updates.

1 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
2    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
3 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
4
5 <head>
6     <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
7     <title>Dejavu: Modeling your Application</title>
8     <link href='dejavu.css' rel='stylesheet' type='text/css' />
9 </head>
10
11 <body>
12
13 <h2>Application Developers: Using Dejavu to Construct a Domain Model</h2>
14
15 <h3>Units</h3>
16 <p>When constructing a Domain Model for your application, you need
17 to have a distinction between data that will be persisted and data that
18 will not. At the most general level, you might say that some entire
19 <i>objects</i> need to be persisted. By registering a subclass of
20 <tt>dejavu.Unit</tt>, you specify a set of objects (instances of your
21 class) which will be persisted.</p>
22
23 <p>Before you can register your Unit class, you must create it:
24 <pre>import dejavu
25 class Printer(dejavu.Unit): pass</pre>
26 This is all you need for a fully-functioning Unit class. There are
27 no methods or attributes that you are required to override; simply
28 subclass from <tt>Unit</tt>. However, this is a fairly uninteresting
29 class. It doesn't provide any functionality other than what <tt>Unit</tt>
30 already provides. The first thing we will probably want to add to our
31 new class is persistent data.</p>
32
33 <h4>UnitProperty</h4>
34 <p>Once you have defined a persistent class (by subclassing <tt>Unit</tt>),
35 you need to make another decision. Rather than persist the entire object
36 <tt>dict</tt>, you specify a subset of persistent attributes by using
37 <tt>UnitProperty</tt>, a data descriptor. If you've used Python's builtin
38 property() construct, you've used descriptors before.</p>
39
40 <p>We might enhance our Printer example thusly:
41 <pre>from dejavu import Unit, UnitProperty
42 class Printer(Unit):
43     Manufacturer = UnitProperty('Manufacturer', unicode)
44     ColorCopies = UnitProperty('ColorCopies', bool)
45     PPM = UnitProperty('PPM', float)</pre>
46 This adds three persistent attributes to our <tt>Printer</tt> objects,
47 each with a different datatype. In addition, every subclass of <tt>Unit</tt>
48 inherits an 'ID' attribute, an int.</p>
49
50 <p>When you get and set <tt>UnitProperty</tt> attributes, they behave just
51 like any other attributes:
52 <pre>>>> p = Printer()
53 >>> p.PPM = 25
54 >>> p.PPM
55 25.0</pre>
56 However, you will notice right away that the int value we provided has been
57 coerced to a float behind the scenes. This is because we specified the PPM
58 attribute as a 'float' type when we created it. Unit Properties are
59 restricted to the types which you specify. The only other valid value
60 for a Unit Property is None; any Property may be None at any time, and
61 in fact, all Properties are None until you assign values to them:
62 <pre>>>> p.ColorCopies is None
63 True</pre></p>
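<p>The coercion and class-level access described in this chapter can be
imitated with an ordinary Python data descriptor. The sketch below is not
Dejavu's implementation: <tt>TypedProperty</tt> and <tt>Gadget</tt> are
hypothetical names, and it assumes that coercion is nothing more than a call
to the declared type.</p>

```python
class TypedProperty:
    """Hypothetical stand-in for UnitProperty: coerces values to a declared type."""

    def __init__(self, key, type_=str):
        self.key = key
        self.type = type_

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed on the class: hand back the descriptor itself.
            return self
        # Properties that were never assigned read as None.
        return instance.__dict__.get(self.key)

    def __set__(self, instance, value):
        # None is always allowed; anything else is coerced to the declared type.
        if value is not None and not isinstance(value, self.type):
            value = self.type(value)
        instance.__dict__[self.key] = value


class Gadget:
    PPM = TypedProperty('PPM', float)
    ColorCopies = TypedProperty('ColorCopies', bool)


g = Gadget()
g.PPM = 25  # stored as the float 25.0
```

<p>Because the descriptor defines <tt>__set__</tt>, it takes precedence over
the instance dictionary, so storing the value under the same key is safe.</p>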
64
65 <h4>Creating and Populating Properties</h4>
66 <p>In addition to defining Unit Properties within your class body,
67 you can define them after the class body has been executed via
68 the classmethod <tt>set_property()</tt>. For example, the following
69 two classes are equivalent:
70 <pre>class Publication(Unit):
71     Content = UnitProperty('Content', unicode)
72
73 class Publication(Unit): pass
74 Publication.set_property('Content', unicode)</pre>
75 Declarations outside of the class body allow more dynamic setting of
76 Unit properties. You can define multiple properties at once via
77 the <tt>set_properties()</tt> classmethod:
78 <pre>class Publication(Unit): pass
79 Publication.set_properties({'Content': unicode,
80                             'Publisher': unicode,
81                             'Year': int,
82                             })</pre>
83 </p>
84
85 <p>You also have options when populating Unit Properties. The standard way
86 is simply to reference them as normal Python instance attributes. However,
87 you may also use the <tt>adjust()</tt> method to modify multiple properties
88 at once; pass in keyword arguments which match the properties you wish to
89 modify. Keyword arguments also work when instantiating the object. For
90 example, the following three code snippets are equivalent:
91 <pre>pub = Publication()
92 pub.Publisher = 'Walter J. Black'
93 pub.Year = 1928
94
95 pub = Publication()
96 pub.adjust(Publisher='Walter J. Black', Year=1928)
97
98 pub = Publication(Publisher='Walter J. Black', Year=1928)</pre>
99 </p>
100
101 <h4>Unit Properties are First-Class Objects</h4>
102 <p>Like many descriptors, Unit Properties behave differently when you access
103 them from the class, rather than from an instance as above. When calling
104 them from the class, you receive the <tt>UnitProperty</tt> object itself,
105 rather than its value for a given instance. That is,
106 <pre>>>> c = Printer.ColorCopies
107 >>> c
108 &lt;dejavu.UnitProperty object at 0x01112970></pre>
109 This is significant, because it allows us to store metadata about the
110 property itself:
111 <pre>>>> c.key, c.index, c.type, c.hints
112 ('ColorCopies', False, &lt;type 'bool'>, {})</pre>
113 The <tt>key</tt> attribute is merely the property's canonical name. The
114 <tt>index</tt> value tells Storage Managers whether or not to index the
115 column. The <tt>type</tt> attribute limits property values to instances
116 of that type (or <tt>None</tt>). Finally, the <tt>hints</tt> dictionary
117 provides hints to Storage Managers to help optimize storage. A common use,
118 for example, is to inform Managers that would usually store unicode strings
119 as strings of length 255, that a particular value should be a larger object;
120 this is done with a 'Size' mapping, such as <tt>hints = {u'Size': 0}</tt>,
121 where 0 implies no limit.</p>
122
123 <h4>Triggers</h4>
124 <p>In addition, each UnitProperty has a <tt>pre</tt> and <tt>post</tt>
125 attribute, which default to None. If you override these with methods
126 in a subclass of <tt>UnitProperty</tt>, they will be called when setting
127 a new value for that property, either before (pre) or after (post) the
128 new value is set. For example:
129 <pre>class DatedProperty(UnitProperty):
130     def post(self, unit, value):
131         unit.Date = datetime.datetime.now().replace(microsecond=0)
132
133 class Topic(Unit):
134     Date = UnitProperty(u'Date', datetime.date)
135     Content = DatedProperty(u'Content')</pre>
136 In this example, whenever Topic().Content is set, the <tt>post</tt>
137 method will be called and the object's <tt>Date</tt> attribute will
138 be modified.</p>
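<p>A rough, self-contained sketch of how such <tt>pre</tt> and <tt>post</tt>
hooks might be dispatched follows. <tt>TriggerProperty</tt> is a hypothetical
stand-in rather than Dejavu's code, it skips type coercion, and it fires the
hooks unconditionally.</p>

```python
import datetime


class TriggerProperty:
    """Hypothetical descriptor with optional pre/post hooks."""

    pre = None
    post = None

    def __init__(self, key):
        self.key = key

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__.get(self.key)

    def __set__(self, instance, value):
        if self.pre is not None:
            self.pre(instance, value)    # runs before the value is stored
        instance.__dict__[self.key] = value
        if self.post is not None:
            self.post(instance, value)   # runs after the value is stored


class DatedProperty(TriggerProperty):
    def post(self, unit, value):
        # Stamp the owning object whenever this property changes.
        unit.Date = datetime.datetime.now().replace(microsecond=0)


class Topic:
    Date = TriggerProperty('Date')
    Content = DatedProperty('Content')


t = Topic()
t.Content = 'First draft'  # also sets t.Date through the post hook
```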
139
140 <h4>Unit ID's</h4>
141 <p>The <tt>Unit</tt> base class possesses a single Unit Property, an int
142 named 'ID'. If you wish to use ID's of a different type, simply override
143 the ID attribute in your subclass:
144 <pre>class Printer(Unit):
145     ID = UnitProperty('ID', unicode)</pre>
146 Every Unit must possess an ID property. This ensures that each Unit within
147 the system is unique.</p>
148
149 <h4>Registration of Unit Classes</h4>
150 <p>In addition to defining your Unit class, you must also register that
151 class with your application's <tt>Arena</tt> object. Each class which
152 you want Dejavu to manage must be passed to <tt>Arena.register(cls)</tt>.
153 </p>
154
155 <h3>Sandboxes</h3>
156 <p>During the life of a client connection, your application should create
157 and use a <tt>Sandbox</tt> to manage the set of "live" Units. A Sandbox
158 manages the in-memory lifecycle of Units: creation, identity, mutation, and
159 destruction. Sandboxes route persistence operations on Units to the correct
160 Storage Manager.</p>
161
162 <p>You can create Sandbox objects directly. They take a single argument, the
163 top-level <tt>Arena</tt> object. Arenas also provide a convenience function,
164 <tt>new_sandbox</tt>, which does this for you. The following lines are
165 equivalent:
166 <pre>box = Sandbox(myArena)
167
168 box = myArena.new_sandbox()</pre>
169 You might often choose the latter when you have a reference to the Arena
170 object, and would rather avoid importing dejavu yet again just to obtain
171 the Sandbox class.</p>
172
173 <h4>Memorizing Units</h4>
174 <p>When you create a Unit instance, it exists in isolation. There is no
175 connection between that Unit and storage; your Unit will not be persisted,
176 because Dejavu doesn't yet possess a reference to your Unit. To provide
177 that link, you <i>memorize</i> your Unit (or rather, you tell your Sandbox
178 to memorize it):
179 <pre>class Publisher(Unit):
180     City = UnitProperty('City', unicode)
181
182 p = Publisher(ID='Walter J. Black')
183 box.memorize(p)</pre></p>
184
185 <p>Memorization does several things. First, it places your new Unit into
186 your Arena. That Unit instance will now be persisted by the appropriate
187 Storage Manager. It can be recalled from storage when needed, using the
188 built-in Expression syntax. It may have been given an ID (see
189 <u>Sequencing</u>, below). Memorization also makes your Unit
190 <i>concrete</i>; that is, your Unit will now possess a <tt>sandbox</tt>
191 attribute. Units whose <tt>sandbox</tt> attribute is not set (is None)
192 have no relationships, and their Unit Property triggers (if any) will
193 not fire.</p>
194
195 <p>You may define special methods on your Units to provide start-of-life
196 behaviors. If a Unit possesses an <tt>on_memorize</tt> method, it will
197 be called after the Unit has been 'reserved' in storage, and after the
198 Unit has been placed in the Sandbox cache.</p>
199
200 <h4>Sequencing</h4>
201 <p>Every <tt>Unit</tt> has an <tt>ID</tt> property. The default ID property
202 is of type <tt>int</tt>; however, you can override that to whatever type
203 you like. As long as you provide your own IDs for Units, nothing will
204 break--you can memorize and recall Units without problems. However, if
205 you memorize a Unit with an ID of <tt>None</tt>, the Sandbox may attempt
206 to provide an ID for it.</p>
207
208 <p>The <tt>Unit</tt> base class possesses a <tt>sequencer</tt> attribute
209 to help Sandboxes generate new IDs. The default value is an instance of
210 <tt>UnitSequencerInteger</tt>, which examines all existing Units, finds
211 the maximum integer ID, adds 1, and uses that value for the new ID.</p>
212
213 <p>The other useful Sequencer is <tt>UnitSequencerNull</tt>, which simply
214 raises an error when asked to generate an ID. If your IDs are strings,
215 you'll probably want to make that class's <tt>.sequencer</tt> one of
216 these, and form ID values in your own code.</p>
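<p>The two strategies can be sketched as plain functions. The names
<tt>next_integer_id</tt> and <tt>null_sequencer</tt>, the exception type, and
the starting value of 1 for an empty store are all assumptions made for this
illustration; they are not taken from Dejavu.</p>

```python
def next_integer_id(existing_ids):
    """Return max(existing_ids) + 1, or 1 when there are no IDs yet.

    Starting at 1 is an assumption made for this sketch.
    """
    ids = [i for i in existing_ids if i is not None]
    return max(ids) + 1 if ids else 1


def null_sequencer(existing_ids):
    """Refuse to generate an ID, so callers must supply their own."""
    raise ValueError("No ID supplied and this sequencer cannot create one.")


new_id = next_integer_id([3, 7, 5])
```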
217
218 <h4>Recalling</h4>
219 <p>Once you have memorized a Unit or two, you will probably want to
220 recall them at some point. Sandboxes possess two member functions to
221 accomplish this.</p>
222
223 <p>First, the appropriately named <tt>recall(cls, expr)</tt> function.
224 This is the full-blown query method. As a first argument, you pass it the
225 class (<b>not</b> the name of the class, but the actual class) of which you
226 expect to retrieve instances. The second argument should be an instance
227 of <tt>dejavu.logic.Expression</tt>, an object which encapsulates your
228 specific query (see <u>Expressions</u>, next). An example recall operation:
229 <pre>>>> e = logic.Expression(lambda x: x.Year == 1928)
230 >>> units = box.recall(Publication, e)
231 >>> [x.Title for x in units]
232 [u'The Giant Horse of Oz', u'Kai Lung Unrolls His Mat',
233  u'Tarzan, The Lord of the Jungle']
234 </pre>
235 If you do not supply an Expression, all Units of the given Unit class
236 will be retrieved. Notice that the return value is <b>not</b> a list; it is a
237 generator (or other iterable). You must iterate over it to retrieve all
238 values. By returning an iterator, we allow some Storage Managers to load
239 Units in a more lazy fashion. If this is a huge burden for you, let me
240 know; I might be convinced to add a <tt>recall_list</tt> method.</p>
241
242 <p>The <tt>recall</tt> method will take additional arguments in pairs of
243 <tt>cls</tt>, <tt>expr</tt>. This feature isn't fully developed yet.
244 It's designed to emulate JOINs, returning units which match each expr
245 and are related.</p>
246
247 <p>The <tt>recall</tt> method can be verbose. When you want a one-liner
248 and only expect a single Unit, use the <tt>unit(cls, **kw)</tt> method
249 of Sandboxes. Again, you pass the class of Units you wish to retrieve
250 as the first argument. Then, supply keyword arguments of the form
251 "property_name=value". The method will form an equivalent Expression
252 for you from the keyword args. For example:
253 <pre>>>> book = box.unit(Publication, ID=1)
254 >>> if book:
255 ...     print book.Title
256 Ladies in Hades</pre>
257 If a Unit is not found that matches the criteria, None is returned.
258 If multiple Units match the criteria, only the first one is returned
259 (although the rest are probably loaded into memory).</p>
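<p>The contract of the two methods (a lazy iterable from <tt>recall</tt>, a
single object or None from <tt>unit</tt>) can be sketched with a small
in-memory stand-in. <tt>MiniBox</tt> is hypothetical, this
<tt>Publication</tt> is a plain class rather than a Unit, the third title is
placeholder data, and plain callables take the place of Expression
objects.</p>

```python
class MiniBox:
    """Hypothetical in-memory stand-in for a Sandbox (illustration only)."""

    def __init__(self, units):
        self._units = list(units)

    def recall(self, cls, expr=None):
        # Yield lazily, so callers must iterate to obtain the results.
        for u in self._units:
            if isinstance(u, cls) and (expr is None or expr(u)):
                yield u

    def unit(self, cls, **kw):
        # Build an equality test from the keyword arguments.
        def matches(u):
            return all(getattr(u, k, None) == v for k, v in kw.items())
        # Return the first match, or None when nothing matches.
        return next(self.recall(cls, matches), None)


class Publication:
    def __init__(self, ID, Title, Year):
        self.ID, self.Title, self.Year = ID, Title, Year


box = MiniBox([
    Publication(1, 'The Giant Horse of Oz', 1928),
    Publication(2, 'Kai Lung Unrolls His Mat', 1928),
    Publication(3, 'Placeholder Title', 1931),  # invented sample row
])

titles_1928 = [u.Title for u in box.recall(Publication, lambda x: x.Year == 1928)]
missing = box.unit(Publication, ID=99)
```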
260
261 <h4>Forgetting and Repressing</h4>
262 <p>To <i>forget</i> a Unit is to destroy it forever. You have two options
263 for forgetting Units: you can call <tt>Sandbox().forget(unit)</tt> or
264 the simpler version, <tt>Unit().forget()</tt>. Either of these will clear
265 the Unit from the Sandbox's cache, and the Sandbox will tell the appropriate
266 Storage Manager to destroy the stored Unit data. If a Unit has not yet
267 been memorized, you do not need to forget it.</p>
268
269 <p>In some circumstances, you may wish to only clear the Unit from the
270 Sandbox without destroying it. You can do this by calling either
271 <tt>Sandbox().repress(unit)</tt> or the simpler version,
272 <tt>Unit().repress()</tt>.</p>
273
274 <p>You may define special methods on your Units to provide end-of-life
275 behaviors. If a Unit possesses an <tt>on_forget</tt> method, it will
276 be called after the Unit has been destroyed. If a Unit possesses an
277 <tt>on_repress</tt> method, it will be called <i>before</i> the Unit
278 has been repressed. I'm sure there was a good reason for this
279 disparity, but I've forgotten (or perhaps repressed) it.</p>
280
281 <h4>Flushing Sandboxes</h4>
282 <p>When the client connection has closed, you should <i>flush</i> the
283 Sandbox caches. In general, a single call to <tt>flush_all()</tt> will do
284 the trick. Notice that flushing calls <tt>repress()</tt> for each Unit in
285 the Sandbox, and any <tt>on_repress()</tt> triggers will be executed.</p>
286
287 <p>Some Units may have their <tt>temporary</tt> flag set. When you use
288 a <tt>CachingProxy</tt> as a Storage Manager, temporary Units will be
289 destroyed rather than persisted. This happens independently of forgetting
290 and sandbox flushing.</p>
291
292
293 <h4>Aggregate Functions</h4>
294 <p>Sandboxes also provide a <tt>distinct(cls, attrs, expr=None)</tt>
295 function. This returns values, rather than Units. Put simply, it returns
296 all distinct values for the given attribute(s) of the Unit class provided.
297 If only one attribute is specified, a list of values will be returned.
298 If more than one attribute is specified, a zipped list will be returned
299 of all distinct existing combinations. Providing an expr argument (an
300 <tt>Expression</tt> object, see below) will filter the set of Units before
301 obtaining distinct values.</p>
302
303 <p>The <tt>distinct</tt> function can also be used as a <tt>count</tt>
304 function by passing attrs = ['ID']. Sandboxes provide a
305 <tt>count(cls, expr)</tt> method which does just this.</p>
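<p>To make the return shapes concrete, here is a minimal sketch of the same
idea over plain Python objects. This standalone <tt>distinct</tt> function
and the sample data are assumptions for illustration only; a real Storage
Manager may well compute the result differently.</p>

```python
from types import SimpleNamespace


def distinct(units, attrs, expr=None):
    """Distinct attribute values (or tuples of values) among the given objects."""
    seen = []
    for u in units:
        if expr is not None and not expr(u):
            continue  # filtered out before values are collected
        if len(attrs) == 1:
            value = getattr(u, attrs[0])
        else:
            value = tuple(getattr(u, a) for a in attrs)
        if value not in seen:
            seen.append(value)
    return seen


books = [
    SimpleNamespace(ID=1, Publisher='A', Year=1928),
    SimpleNamespace(ID=2, Publisher='A', Year=1928),
    SimpleNamespace(ID=3, Publisher='B', Year=1931),
]

years = distinct(books, ['Year'])
pairs = distinct(books, ['Publisher', 'Year'])
count = len(distinct(books, ['ID']))  # counting via distinct IDs
```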
306
307 <h3>Querying</h3>
308 <p>When you retrieve Units, you often don't want to load the entire set for
309 a given class. In Dejavu, you filter the set according to the UnitProperty
310 attributes for each object. Naturally, there must be a way to express
311 the filter you intend. Dejavu actually provides three ways: Expressions,
312 <tt>filter</tt>, and <tt>comparison</tt>.</p>
313
314 <h4>The <tt>Expression</tt> class</h4>
315 <p>Regardless of which technique you use to express your filter, you're
316 going to end up with a <tt>logic.Expression</tt> object. You can build
317 an Expression directly, passing a single lambda as an argument:
318 <pre>>>> from dejavu import logic
319 >>> import datetime
320 >>> f = lambda x: x.Date >= datetime.date(2004, 3, 1)
321 >>> e = logic.Expression(f)
322 >>> e
323 logic.Expression(lambda x: x.Date >= datetime.date(2004, 3, 1))</pre>
324 Neat, eh? I worked hard on that __repr__. ;)</p>
325
326 <p>It may be obvious, but we'll be explicit, here. The lambda which you pass
327 into an Expression must possess a single positional argument, which will
328 always be bound to a Unit instance. In the example above, it's named 'x',
329 but you can use any name you like. Using lambdas as a base means that we
330 can simply call Expression.func(Unit), and receive a boolean value
331 indicating whether our Unit "passes the test". Attribute lookups on our
332 'x' object will apply to Unit Properties for that Unit object.
333 That is, <tt>x.Date</tt> becomes <tt>Unit.Date</tt>.</p>
334
335 <h4>Early binding</h4>
336 <p>What is not obvious from the above code snippet is perhaps the <b>most
337 important aspect</b> of Expressions: any globals or cell references (from
338 closures) in the supplied lambda get <b>bound early</b>. Compare the
339 following disassemblies:
340 <pre>>>> import dis
341 >>> dis.dis(f)
342   1           0 LOAD_FAST                0 (x)
343               3 LOAD_ATTR                1 (Date)
344               6 LOAD_GLOBAL              2 (datetime)
345               9 LOAD_ATTR                3 (date)
346              12 LOAD_CONST               1 (2004)
347              15 LOAD_CONST               2 (3)
348              18 LOAD_CONST               3 (1)
349              21 CALL_FUNCTION            3
350              24 COMPARE_OP               5 (>=)
351              27 RETURN_VALUE       
352 >>> dis.dis(e.func)
353   1           0 LOAD_FAST                0 (x)
354               3 LOAD_ATTR                1 (Date)
355               6 LOAD_CONST               6 (datetime.date(2004, 3, 1))
356               9 COMPARE_OP               5 (>=)
357              12 RETURN_VALUE       
358 </pre>
359 As you can see, the function itself references the global 'datetime' module.
360 Once we wrap it in the Expression, however, it becomes a constant! Thanks to
361 Raymond Hettinger for inspiring this solution <a href='#hettinger'>[1]</a>.
362 Early binding, however, implies two consequences:</p>
363
364 <p>First, any globals or cell references must be present in the lambda's
365 scope when it is passed into Expression(). This is the norm and shouldn't
366 require too much thought from you when you write Expressions. In the
367 example above, we simply imported <tt>datetime</tt> as you would expect.</p>
368
369 <p>Second, any globals or cell references must <b>also</b> be present in
370 the <tt>logic</tt> module's globals when the Expression is unpickled.
371 Pickling occurs when Expressions are sent over sockets, and also if
372 Expressions are themselves persisted to storage (for example, see
373 <u>Unit Engines</u>, below). This means your application should inject
374 globals into the <tt>logic</tt> module. Note that the <tt>logic</tt> module
375 already tries to import <tt>datetime</tt>, <tt>fixedpoint</tt> and
376 <tt>decimal</tt>.</p>
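<p>The practical difference between late and early binding can be seen in
plain Python without any bytecode rewriting. The sketch below uses a default
argument to capture the value, which is a different technique from the one
Expression uses, but it shows the same effect: a later change to the global
no longer alters the test.</p>

```python
import datetime

cutoff = datetime.date(2004, 3, 1)

# Late binding: 'cutoff' is looked up every time the lambda is called.
late = lambda x: x >= cutoff

# Early binding: the current value is captured as a default argument.
early = lambda x, cutoff=cutoff: x >= cutoff

# Rebind the global after both functions have been created.
cutoff = datetime.date(2030, 1, 1)

sample = datetime.date(2010, 6, 15)
late_result = late(sample)    # compares against the 2030 cutoff
early_result = early(sample)  # still compares against the 2004 cutoff
```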
377
378 <h4>External functions within Expressions</h4>
379 <p>Dejavu provides additional functions which can be used in Expressions.
380 For example, you can construct an Expression like:
381 <pre>logic.Expression(lambda x: x.Size &lt; 3 and x.Date > dejavu.today())</pre>
382 In this example, the <tt>today()</tt> function breaks convention and is
383 actually <b>bound late</b>. That is, if you construct this Expression now
384 and use it six months later, the value of <tt>today()</tt> will change.
385 Storage Managers "know about" these dejavu functions, and can use them
386 to build more appropriate queries. Here are the functions supplied by
387 the <tt>dejavu</tt> module:</p>
388
389 <table>
390 <tr><th>Function</th><th>Late bound?</th><th>Description</th></tr>
391 <tr>
392     <td><tt>icontains(a, b)</tt></td>
393     <td></td>
394     <td>Case-insensitive test b in a. Note the operand order.</td>
395 </tr>
396 <tr>
397     <td><tt>icontainedby(a, b)</tt></td>
398     <td></td>
399     <td>Case-insensitive test a in b. Note the operand order.</td>
400 </tr>
401 <tr>
402     <td><tt>istartswith(a, b)</tt></td>
403     <td></td>
404     <td>True if a starts with b (case-insensitive), False otherwise.</td>
405 </tr>
406 <tr>
407     <td><tt>iendswith(a, b)</tt></td>
408     <td></td>
409     <td>True if a ends with b (case-insensitive), False otherwise.</td>
410 </tr>
411 <tr>
412     <td><tt>ieq(a, b)</tt></td>
413     <td></td>
414     <td>True if a == b (case-insensitive), False otherwise.</td>
415 </tr>
416 <tr>
417     <td><tt>year(value)</tt></td>
418     <td></td>
419     <td>The year attribute of a date. If value is None, return None.</td>
420 </tr>
421 <tr>
422     <td><tt>now()</tt></td>
423     <td>Y</td>
424     <td>datetime.datetime.now()</td>
425 </tr>
426 <tr>
427     <td><tt>today()</tt></td>
428     <td>Y</td>
429     <td>datetime.date.today()</td>
430 </tr>
431 <tr>
432     <td><tt>iscurrentweek(value)</tt></td>
433     <td>Y</td>
434     <td>If value is in the current week, return True, else False.</td>
435 </tr>
436 </table>
437
438 <p>It is possible for you, the application developer, to define your
439 own external functions. However, because Storage Managers are unaware
440 of your new functions, they will not be able to optimize their use;
441 instead, they will simply retrieve a larger set of objects from storage,
442 evaluate each one against the function you provide, and return those
443 Units which match your function. This isn't necessarily a bad thing;
444 it provides the same functionality as if you wrote the test inline
445 within your own code. By making that test a logic function, you allow
446 it to be stored in Engine <i>rules</i> (see <u>Unit Engines</u>,
447 below).</p>
448
449 <h4>Combining Expressions</h4>
450 <p>Expressions can be combined. The <tt>&amp;</tt> operator joins two
451 Expressions with a logical "and". For example:
452 <pre>>>> a = logic.Expression(lambda x: x.Size > 3)
453 >>> b = logic.Expression(lambda x: x.Size &lt;= 15)
454 >>> c = a &amp; b
455 >>> c
456 logic.Expression(lambda x: (x.Size > 3) and (x.Size &lt;= 15))</pre>
457 The <tt>+</tt> operator works just like the <tt>&amp;</tt> operator. The
458 <tt>|</tt> operator combines the two Expressions with a logical "or".</p>
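<p>Operator-based combination of this kind is straightforward to reproduce
with Python's special methods. The <tt>Predicate</tt> class below is a
hypothetical sketch: it simply chains callables, whereas Expression is
described above as producing a single merged lambda.</p>

```python
from types import SimpleNamespace


class Predicate:
    """Hypothetical wrapper showing how &, + and | can combine tests."""

    def __init__(self, func):
        self.func = func

    def __call__(self, obj):
        return bool(self.func(obj))

    def __and__(self, other):
        return Predicate(lambda x: self(x) and other(x))

    __add__ = __and__  # '+' behaves like '&'

    def __or__(self, other):
        return Predicate(lambda x: self(x) or other(x))


a = Predicate(lambda x: x.Size > 3)
b = Predicate(lambda x: x.Size <= 15)

both = a & b
either = a | b

small = SimpleNamespace(Size=2)
medium = SimpleNamespace(Size=10)
large = SimpleNamespace(Size=40)
```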
459
460 <h4>Using <tt>filter</tt> to form Expressions</h4>
461 <p>The <tt>logic</tt> module also provides convenient methods to
462 create common types of Expression objects via the <tt>filter</tt> and
463 <tt>comparison</tt> factory functions.</p>
464
465 <p>The <tt>filter(**kwargs)</tt> function produces an Expression by taking
466 the keyword arguments you supply, and rewriting them in lambda form. The
467 only operator allowed is therefore the equals '==' operator. For example:
468 <pre>>>> logic.filter(Type='Cat', Mutation='Atomic')
469 logic.Expression(lambda x: (x.Type == 'Cat') and (x.Mutation == 'Atomic'))</pre>
470 </p>
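<p>The keyword-to-equality rewrite amounts to the following, shown here as a
plain function. <tt>make_filter</tt> is a hypothetical name, and it returns
an ordinary callable rather than an Expression.</p>

```python
from types import SimpleNamespace


def make_filter(**kwargs):
    """Return a test that is true when every named attribute equals its value."""
    def test(obj):
        return all(getattr(obj, name) == value for name, value in kwargs.items())
    return test


is_atomic_cat = make_filter(Type='Cat', Mutation='Atomic')

hit = SimpleNamespace(Type='Cat', Mutation='Atomic')
miss = SimpleNamespace(Type='Cat', Mutation='None')
```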
471
472 <h4>Using <tt>comparison</tt> to form Expressions</h4>
473 <p>The <tt>comparison(attr, cmp_op, criteria)</tt> function allows you to
474 form Expressions with dynamic operators. This can come in handy when you
475 are constructing Expressions on the fly from user input. For example, a
476 search page might prompt users for an attribute name, an operator, and an
477 operand (the criteria).</p>
478
479 <p>Borrowing from <tt>opcode.cmp_op</tt>, the allowed values for our cmp_op
480 argument are as follows:</p>
481 <table>
482 <tr><th>Numeric Value (cmp_op)</th><th>Operator</th></tr>
483 <tr><td>0</td><td>&lt;</td></tr>
484 <tr><td>1</td><td>&lt;=</td></tr>
485 <tr><td>2</td><td>==</td></tr>
486 <tr><td>3</td><td>!=</td></tr>
487 <tr><td>4</td><td>&gt;</td></tr>
488 <tr><td>5</td><td>&gt;=</td></tr>
489 <tr><td>6</td><td>in</td></tr>
490 <tr><td>7</td><td>not in</td></tr>
491 <tr><td>8</td><td>is</td></tr>
492 <tr><td>9</td><td>is not</td></tr>
493 </table>
494
495 <p>Here's an example of using <tt>comparison</tt>:
496 <pre>>>> logic.comparison('Name', 3, 'Mr. Kamikaze')
497 logic.Expression(lambda x: x.Name != 'Mr. Kamikaze')</pre>
498 Although the comparison function only allows a single comparison at a time,
499 the resulting Expressions can be combined with the <tt>&amp;</tt> and <tt>|</tt>
500 operators (described earlier) to produce more complex Expressions.</p>
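<p>A search page could map the numeric codes in the table above onto
functions from Python's <tt>operator</tt> module. In this sketch the lookup
table is written out by hand so that it matches the ten entries listed above;
<tt>make_comparison</tt> and <tt>CMP_FUNCS</tt> are hypothetical names, and
the result is a plain callable rather than an Expression.</p>

```python
import operator
from types import SimpleNamespace

# Index-to-function table in the same order as the table above.
CMP_FUNCS = (
    operator.lt,              # 0: <
    operator.le,              # 1: <=
    operator.eq,              # 2: ==
    operator.ne,              # 3: !=
    operator.gt,              # 4: >
    operator.ge,              # 5: >=
    lambda a, b: a in b,      # 6: in
    lambda a, b: a not in b,  # 7: not in
    operator.is_,             # 8: is
    operator.is_not,          # 9: is not
)


def make_comparison(attr, cmp_op, criteria):
    """Return a test comparing obj.<attr> with criteria using the coded operator."""
    func = CMP_FUNCS[cmp_op]
    return lambda obj: func(getattr(obj, attr), criteria)


not_kamikaze = make_comparison('Name', 3, 'Mr. Kamikaze')
in_list = make_comparison('Name', 6, ['Ann', 'Bob'])

ann = SimpleNamespace(Name='Ann')
```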
501
502 <h4>Exporting the <tt>logic</tt> module</h4>
503 <p>The <tt>logic</tt> module (and <tt>codewalk</tt>, on which it is built)
504 isn't limited to Dejavu. Feel free to use it in some other framework or
505 script! The only change you may have to make (if you relocate the module
506 outside of the <tt>dejavu</tt> package) would be to the single line:
507 <tt>from dejavu import codewalk</tt>, to point to the new location.</p>
508
509 <p>In particular, <tt>logic.Expression</tt> objects can operate on <i>any</i>
510 Python object, not just dejavu <tt>Unit</tt> instances. If you wish to
511 provide additional logic functions (as dejavu does), simply inject them
512 into <tt>logic</tt>'s globals.</p>
513
514 <p>You may also find the underlying <tt>codewalk</tt> module useful for
515 other purposes on its own. The <tt>Visitor</tt> base class can be very
516 convenient for building bytecode hacks.</p>
517
518 <p>To make a long story short, Dejavu depends on <tt>logic</tt> throughout,
519 but the reverse is not true.</p>
520
521
522 <h3>Associations between Unit Classes</h3>
523 <p>Once you've put together some Unit classes, chances are you're going to
524 want to associate them. Generally, this is accomplished by creating a
525 property in the Unit_B class which stores IDs of Unit_A objects (which
526 might be called <i>foreign keys</i> in a database context).
527 <pre>class Archaeologist(Unit):
528     Height = UnitProperty('Height', float)
529
530 class Biography(Unit):
531     ArchID = UnitProperty('ArchID', int)</pre>
532 In this example, each <tt>Biography</tt> object will have an <tt>ArchID</tt>
533 attribute, which will equal the <tt>ID</tt> of some <tt>Archaeologist</tt>.
534 In Dejavu terms, we say that there is a <i>near class</i> (with a <i>near
535 key</i>) and a <i>far class</i> (with a <i>far key</i>). Associations in
536 Dejavu are not one-way, so it doesn't matter which class you choose for the
537 "near" one and which for the "far" one.</p>
538
539 <p>You could stop at this point in your design, and simply remember what
540 these keys are and how they relate, and manipulate them accordingly. But
541 Dejavu allows you to <i>register</i> these associations explicitly in your
542 <tt>Arena</tt>:
543 <pre>myArena.associate(Archaeologist, 'ID', Biography, 'ArchID')</pre>
544 You pass in the near class, the near key, the far class, and the far key.
545 </p>
546
547 <p>What does an explicit association buy for you? First, the <tt>associate</tt>
548 call adds an entry in the <tt>Arena.associations</tt> registry, so that
549 smart consumer code (like Unit Engine Rules, below) can automatically
550 follow association paths for you. Second, each Unit class has a private
551 <tt>_associations</tt> attribute, a <tt>dict</tt>. Each Unit involved
552 in the association gains an entry in that dict: the key is the far class
553 itself (not the class name), and the value is a tuple of (far key, near key).
554 Third, <tt>associate()</tt> can be used to register your Unit classes in
555 the Arena's <tt>roster</tt>; you don't have to call <tt>register</tt> for
556 either class if you call <tt>associate</tt> (see <u>The Arena Object</u>,
557 below).</p>
558
559 <p>In addition, each of the Unit classes will gain a new <i>synapse</i>
560 method which simplifies looking up related instances of the other class.
561 The new method for Unit_B will have the name of Unit_A, and vice-versa.
562 In our example:
563 <pre>>>> Archaeologist.Biography
564 &lt;unbound method Archaeologist.synapses>
565 >>> Eversley = Archaeologist(Height=6.417)
566 >>> Eversley.Biography
567 &lt;bound method Archaeologist.synapses of &lt;__main__.Archaeologist
568 object at 0x011A1930>>
569 >>> bios = Eversley.Biography()
570 >>> bios
571 &lt;listiterator object at 0x012150D0>
572 >>> list(bios)
573 []
574 </pre>
575 We haven't created any Biographies, so there aren't any to be recalled,
576 which is why we get an empty iterator at this point. At the other extreme
577 (when you have hundreds of Biographies to filter), you can pass an optional
578 <tt>Expression</tt> object to the synapse method. When you do, the list of
579 associated Units will be filtered accordingly.</p>
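<p>The general mechanism, attaching to each class a lookup method named after
the other class, can be sketched as follows. Everything here is hypothetical:
the module-level <tt>store</tt> list stands in for the Sandbox, the classes
are plain Python classes, and this standalone <tt>associate</tt> function
only mirrors the argument order of the call shown earlier.</p>

```python
class Archaeologist:
    def __init__(self, ID, Height):
        self.ID, self.Height = ID, Height


class Biography:
    def __init__(self, ID, ArchID):
        self.ID, self.ArchID = ID, ArchID


store = []  # hypothetical stand-in for the Sandbox's contents


def associate(near_cls, near_key, far_cls, far_key):
    """Give each class a lookup method named after the other class."""
    def make_lookup(target_cls, own_key, target_key):
        def lookup(self, expr=None):
            wanted = getattr(self, own_key)
            # Search afresh on every call; nothing is cached on the instance.
            for obj in store:
                if (isinstance(obj, target_cls)
                        and getattr(obj, target_key) == wanted
                        and (expr is None or expr(obj))):
                    yield obj
        return lookup

    setattr(near_cls, far_cls.__name__, make_lookup(far_cls, near_key, far_key))
    setattr(far_cls, near_cls.__name__, make_lookup(near_cls, far_key, near_key))


associate(Archaeologist, 'ID', Biography, 'ArchID')

eversley = Archaeologist(ID=1, Height=6.417)
bio = Biography(ID=10, ArchID=1)
store.extend([eversley, bio])

related = list(eversley.Biography())
```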
580
581 <p>Because the synapse method names are formed automatically, you need
582 to take care not to use the names of Unit classes for your Unit properties.
583 In our example, we used "ArchID" for the name of our "foreign key".
584 If we had used "Archaeologist" instead, we would have had problems;
585 when we associated the classes, the <i>property</i> named "Archaeologist"
586 would have collided with the <i>synapse method</i> named "Archaeologist".
587 Be careful when naming your properties, and plan for the future.</p>
588
589 <p>Unlike some other ORM's, Dejavu doesn't cache far Units within
590 the near Unit. Each time you call the synapse method, the data is recalled
591 from your Sandbox. It is quite probable that those far Units are still
592 sitting in memory in the Sandbox, but they're not going to persist in
593 the near Unit itself in any way.</p>
594
595 <p>Finally, some of you may want to override the default synapse methods.
596 Feel free; <tt>Arena.associate</tt> takes two optional arguments, which
597 should be callables that return the new function(s). See the source code
598 of <tt>Arena</tt> and the private method <tt>dejavu._synapses_func</tt>
599 for more information.</p>
600
601
602 <h3>Unit Engines</h3>
603 <p>Once you've created and associated your Unit classes, you can begin to
604 write "business logic" code (mostly inside those classes, we hope), and
605 "presentation logic" code (mostly outside those classes). In most cases,
606 you will construct Expressions within your own code manually to retrieve
607 Units. Sometimes, however, you need to persist query parameters from your
608 users; in other cases, you might store a list of Units which match a query
609 (regardless of who formed the necessary Expression). Finally, you might
610 wish to manipulate lists of Units as sets: differences, intersections,
611 and unions. The <tt>engines</tt> module addresses all of these needs.</p>
612
613 <h4>Collections: Lists of Units</h4>
614 <p>The <tt>UnitCollection</tt> class provides a means of storing a list
615 of Units, or rather, a list of Unit ID's. You use its <tt>Type</tt>
616 property to indicate the class of the indexed Units. That value should be
617 the <b>name</b> of the Unit Class, <b>not</b> the class object itself
618 (this is different than most other calls in Dejavu). If you need to
619 retrieve the actual Unit class, call <tt>UnitCollection().unit_class()</tt>.</p>
620
621 <p><tt>UnitCollection</tt> itself subclasses <tt>dejavu.Unit</tt>; you can
622 therefore persist Unit Collections via Dejavu Storage Managers. Storage
623 Managers are encouraged, but not required, to handle Unit Collections,
624 so check whether yours does.</p>
625
626 <p>Each Collection has a thread lock (an RLock, actually) which you should
627 <tt>acquire()</tt> before you add an ID to the set, and <tt>release()</tt>
628 afterward. If you use the <tt>add(ID)</tt> method, this locking is done
629 for you.</p>
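<p>The locking pattern described above can be sketched in plain Python.
This is a minimal illustration, not Dejavu's implementation: the
<tt>IDCollection</tt> class and its <tt>lock</tt> and <tt>_ids</tt>
attributes are hypothetical stand-ins. It shows why a re-entrant lock is
convenient: a thread that already holds the lock can still call
<tt>add</tt>.</p>

```python
import threading


class IDCollection:
    """Hypothetical stand-in for a Collection: a set of IDs guarded by an RLock."""

    def __init__(self):
        self.lock = threading.RLock()
        self._ids = set()

    def add(self, unit_id):
        # Convenience method: acquires and releases the lock for the caller.
        with self.lock:
            self._ids.add(unit_id)

    def ids(self):
        with self.lock:
            return sorted(self._ids)


coll = IDCollection()

# Manual locking, for a multi-step update that should appear atomic.
coll.lock.acquire()
try:
    coll._ids.add(1)
    coll._ids.add(2)
    coll.add(3)  # re-entrant: safe while this thread already holds the lock
finally:
    coll.lock.release()
```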
630
631 <p>When you need to retrieve the actual Units which are indexed by the
632 Collection, call the <tt>units(quota=None)</tt> method, which will
633 look up the Units and return them in a list. Since the Collection only
634 stores IDs, it is possible that one of the indexed Units may have been
635 destroyed since the list was built. The <tt>units</tt> method simply
636 passes over these "phantom" Units. You can inspect the full list of IDs
637 in the Collection (whether they reference existing Units or not) with
638 the <tt>ids()</tt> method.</p>
639
640 <p>Collections also provide a convenience function for grouping Units
641 by attribute: <tt>xdict(attr)</tt>. This function will look up each Unit
642 in the Collection, inspect the attribute that you specify, and return
643 a dictionary of the form <tt>{attr_val1: [Unit, Unit, ...]}</tt>.
644 Each distinct attribute value will have its own key, with a list of
645 matching Units as the value.</p>
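<p>To make the shape of that dictionary concrete, here is a small
standalone sketch of grouping objects by attribute. It does not use
Dejavu; <tt>group_by_attr</tt> and the toy <tt>Printer</tt> class are
assumptions made for illustration only.</p>

```python
from collections import defaultdict


def group_by_attr(units, attr):
    """Return {attr_value: [unit, unit, ...]} for the given attribute name."""
    groups = defaultdict(list)
    for unit in units:
        groups[getattr(unit, attr)].append(unit)
    return dict(groups)


class Printer:
    """Toy object standing in for a Unit."""

    def __init__(self, name, color):
        self.Name = name
        self.Color = color


printers = [Printer("A", "red"), Printer("B", "blue"), Printer("C", "red")]
by_color = group_by_attr(printers, "Color")
```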
646
647 <h4>Engines</h4>
648 <p>You can form Collections by hand, but a more powerful technique is
649 the <tt>UnitEngine</tt>, a factory for Collections. Engines are very
650 simple: they possess a set of <i>rules</i> which are executed when
651 you want to take a <i>snapshot</i> of Units. The snapshot which is
652 produced is a <tt>UnitCollection</tt> object. Whenever you call
653 <tt>take_snapshot()</tt>, the Engine will maintain an association
654 to the resulting Collection. You can access past snapshots with the
655 <tt>snapshots()</tt> method.</p>
656
657 <p>Engines are themselves Units, and can be persisted via Storage Managers.
658 The only properties they possess are: an <tt>ID</tt>, a <tt>Name</tt>,
659 an <tt>Owner</tt>, a <tt>FinalClassName</tt>, and <tt>Created</tt>,
660 the creation date of the Engine.</p>
661
662 <p>The <tt>Owner</tt> property should either be a user name, or one of the
663 reserved names: "Public" and "System". By default, the <tt>permit()</tt>
664 method allows a user read-access to the Engine if they are the Owner, or
665 the Owner is "Public" or "System". Write-access is permitted if the user
666 is the Owner, or the Owner is "Public". Feel free to override
667 <tt>permit()</tt> in a subclass to provide different behaviors.</p>
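<p>The default access rules above amount to a small decision function.
The sketch below restates them in plain Python; the function signature
(<tt>owner</tt>, <tt>user</tt>, <tt>write</tt>) is an assumption and is
not taken from the Dejavu source.</p>

```python
def permit(owner, user, write=False):
    """Sketch of the default policy: owner is a user name, "Public", or "System"."""
    if user == owner:
        return True  # the Owner may always read and write
    if write:
        return owner == "Public"  # only Public engines are writable by others
    return owner in ("Public", "System")  # Public and System engines are readable


can_read_system = permit("System", "alice")
can_write_system = permit("System", "alice", write=True)
```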
668
669 <p>The <tt>FinalClassName</tt> is set for you as you add Rules to the
670 Engine. You can use the value of this property, for example, to tell
671 your users, "Engine #23569 is an 'Armadillo' engine," when it produces
672 Collections of <tt>Armadillo</tt> Units. The only time you might want to
673 set this value is when you first create the Engine, before you have added
674 any Rules.</p>
675
676 <h4>Rules</h4>
677 <p>Just like Collections and Engines, <tt>UnitEngineRule</tt> is <i>also</i>
678 a subclass of <tt>Unit</tt>, and can be persisted via Storage Managers. All
679 three work together to provide a complete, dynamic, application-level query
680 generator.</p>
681
682 <p>Okay, so what are Rules? You might say they're a "little language",
683 with the following primitives, or "operations":</p>
684 <table>
685 <tr><th>Operation</th><th>Operand(s)</th><th>Description</th></tr>
686 <tr><th colspan='3'>Operations on a single set</th></tr>
687 <tr>
688     <td>CREATE</td>
689     <td>The classname of the new Type</td>
690     <td>Creates a new Set of the specified Type. All Units of that Type
691         are included in the new Set.</td>
692 </tr>
693 <tr>
694     <td>FILTER</td>
695     <td>A <tt>logic.Expression</tt></td>
696     <td>Removes Units from the current Set which do not match the
697         Expression.</td>
698 </tr>
699 <tr>
700     <td>FUNCTION</td>
701     <td>The name of a function in the <tt>Arena.engine_functions</tt>
702         dict</td>
703     <td>Calls the function, passing the current Set. The function
704         should modify the Set.</td>
705 </tr>
706 <tr>
707     <td>TRANSFORM</td>
708     <td>The classname of the new Type</td>
709     <td>Transform the current Set into a Set of associated Units
710         (of another Type). The association must be present in the
711         <tt>Arena.associations</tt> graph.</td>
712 </tr>
713 <tr>
714     <td>RETURN</td>
715     <td></td>
716     <td>Optional. If omitted, the last Set handled is returned as the
717         snapshot. If supplied, the ID of the Set to return.</td>
718 </tr>
719 <tr><th colspan='3'>Operations on two sets</th></tr>
720 <tr>
721     <td>COPY</td>
722     <td>The Set ID of the new Set</td>
723     <td>Copies the current Set to a new Set. The current Set is unchanged.</td>
724 </tr>
725 <tr>
726     <td>DIFFERENCE</td>
727     <td>The ID of the Set to mix in</td>
728     <td>Removes IDs from the current Set which exist in the second Set.</td>
729 </tr>
730 <tr>
731     <td>INTERSECTION</td>
732     <td>The ID of the Set to mix in</td>
733     <td>Removes IDs from the current Set which <i>do not</i> exist in the
734         second Set.</td>
735 </tr>
736 <tr>
737     <td>UNION</td>
738     <td>The ID of the Set to mix in</td>
739     <td>Adds any IDs to the current Set which exist in the second Set.</td>
740 </tr>
741 </table>
742
743 <p>Each Rule has an <tt>Operation</tt> property (a string, one of the above),
744 a <tt>SetID</tt>, and an <tt>Operand</tt>. Here's an example ruleset:</p>
745 <table>
746 <tr><th>Operation</th><th>SetID</th><th>Operand</th></tr>
747 <tr><td>CREATE</td><td>1</td><td>Invoice</td></tr>
748 <tr><td>FILTER</td><td>1</td><td>(Expression)</td></tr>
749 <tr><td>CREATE</td><td>2</td><td>Inventory</td></tr>
750 <tr><td>FILTER</td><td>2</td><td>(Expression)</td></tr>
751 <tr><td>TRANSFORM</td><td>2</td><td>Invoice</td></tr>
752 <tr><td>DIFFERENCE</td><td>1</td><td>2</td></tr>
753 <tr><td>RETURN</td><td>1</td><td></td></tr>
754 </table>
755
756 <p>As you can see, every Rule operates on a <i>Set</i> of Units. The first
757 rule is always to CREATE a set, declaring it to contain a certain Type
758 of Units. In most cases, you will then FILTER that set. If you simply
759 created a set and then returned it, it would contain all Units of the
760 declared Type. When you filter a set, however, you remove Units from
761 the whole which do not match the filter's Expression.</p>
762
763 <p>In the example above, we CREATE a second Set so that we can eventually
764 obtain the DIFFERENCE between Set 1 and Set 2. The second Set contains
765 Units of a different Type than the first. Once we filter Set 2, we then
766 TRANSFORM it; for each Inventory Unit, we look up associated Invoice
767 Units. Then, we find the difference between the two Invoice sets and
768 RETURN it.</p>
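<p>The example ruleset can be traced with a toy interpreter. The sketch
below is not the <tt>UnitEngine</tt> implementation: it works on plain
dictionaries of records, represents FILTER operands as Python callables
rather than <tt>logic.Expression</tt> objects, omits FUNCTION, and uses
hypothetical names (<tt>run_rules</tt>, <tt>universe</tt>,
<tt>links</tt>) and made-up data throughout.</p>

```python
def run_rules(rules, universe, links):
    """Execute (operation, set_id, operand) tuples in order.

    universe: {type_name: {unit_id: record}}
    links:    {(from_type, to_type): {from_id: set_of_to_ids}}
    """
    sets, types, last = {}, {}, None
    for op, set_id, operand in rules:
        if op == "CREATE":
            types[set_id] = operand
            sets[set_id] = set(universe[operand])  # all Units of that Type
        elif op == "FILTER":
            records = universe[types[set_id]]
            sets[set_id] = {i for i in sets[set_id] if operand(records[i])}
        elif op == "TRANSFORM":
            mapping = links[(types[set_id], operand)]
            sets[set_id] = set().union(
                *(mapping.get(i, set()) for i in sets[set_id]))
            types[set_id] = operand
        elif op == "COPY":
            sets[operand] = set(sets[set_id])
            types[operand] = types[set_id]
        elif op == "DIFFERENCE":
            sets[set_id] -= sets[operand]
        elif op == "INTERSECTION":
            sets[set_id] &= sets[operand]
        elif op == "UNION":
            sets[set_id] |= sets[operand]
        elif op == "RETURN":
            return sets[set_id]
        last = set_id
    return sets[last]  # no RETURN rule: the last Set handled


universe = {
    "Invoice": {1: {"Paid": False}, 2: {"Paid": False}, 3: {"Paid": True}},
    "Inventory": {10: {"InStock": False}, 11: {"InStock": True}},
}
links = {("Inventory", "Invoice"): {10: {2}, 11: {1}}}

rules = [
    ("CREATE", 1, "Invoice"),
    ("FILTER", 1, lambda r: not r["Paid"]),
    ("CREATE", 2, "Inventory"),
    ("FILTER", 2, lambda r: not r["InStock"]),
    ("TRANSFORM", 2, "Invoice"),
    ("DIFFERENCE", 1, 2),
    ("RETURN", 1, None),
]
result = run_rules(rules, universe, links)
```

<p>With this data, Set 1 holds the unpaid invoices, Set 2 ends up holding
the invoices tied to out-of-stock inventory, and the DIFFERENCE leaves
the unpaid invoices that are not tied to out-of-stock inventory.</p>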
769
770 <p>Rules are executed in order according to their <tt>Sequence</tt>
771 attribute (lowest first). When you use the <tt>Engine.add_rule</tt> method,
772 the next <tt>Sequence</tt> value is retrieved for you. Notice that each
773 Rule belongs to one and only one Engine; they are not shared between
774 Engines. Each Rule has its own <tt>EngineID</tt> attribute.</p>
775
776 <h4>Engine Functions</h4>
777 <p>The FUNCTION rule deserves special mention. The Operand of a FUNCTION
778 rule is a string, a key in the <tt>Arena.engine_functions</tt> dictionary.
779 When the rule is executed, that key is used to look up the function, which
780 is then called, passing <tt>(sandbox, set)</tt>. The function should
781 mutate the set directly. Use FUNCTION rules to mutate sets in ways which
782 are more complex than those provided by FILTER and TRANSFORM. For example,
783 you might provide a function which removes all but the first Unit in the
784 Set (according to some ordering algorithm).</p>
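<p>As an illustration of the "remove all but the first" case, here is a
standalone sketch of such a function. The registry below is an ordinary
dict standing in for <tt>Arena.engine_functions</tt>, the set is assumed
to be a plain Python set of IDs, and "lowest ID first" is an arbitrary
ordering chosen for the example.</p>

```python
def keep_first(sandbox, unit_ids):
    """Mutate unit_ids in place, keeping only the lowest ID."""
    if unit_ids:
        first = min(unit_ids)
        unit_ids.intersection_update({first})


# Stand-in for the Arena.engine_functions dictionary.
engine_functions = {"keep_first": keep_first}

ids = {7, 3, 9}
# The sandbox argument is unused in this sketch, so None is passed.
engine_functions["keep_first"](None, ids)
```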
785
786
787 <h3>Analysis Tools</h3>
788 <p>Dejavu includes various tools to help you manipulate groups of Units.</p>
789
790 <h4>Sorting Units</h4>
791 <p>When you recall Units, you receive a generator, and must iterate over
792 the values in some way. Often, this is accomplished with a list
793 comprehension:
794 <pre>f = logic.Expression(lambda x: 'Aa' in x.Name)
795 people = [x for x in sandbox.recall(Person, f)]
796 </pre>
797 However, the <tt>recall</tt> method doesn't do any sorting; you must sort
798 your list in your Python code. Dejavu provides a <tt>sort(attrs,
799 descending=False)</tt> function to assist you. It returns a function, which
800 you can then use in Python's sort function. Continuing our example:
801 <pre>people.sort(dejavu.sort('Size', 'Name'))  # sorts in place; list.sort() returns None</pre>
802 The most important issue (and the reason we don't just use Python 2.4's
803 attrgetter) is that any Unit property must allow values of None, which tends to raise
804 errors when compared to values of other types. The function which
805 <tt>sort</tt> creates for you treats None as "less than" any other value.</p>
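<p>The "None is less than anything" rule can be reproduced without
Dejavu. The sketch below is written for Python 3, where sorting takes a
key function rather than a comparison function, so it is an analogue of
the idea and not the <tt>dejavu.sort</tt> implementation.
<tt>none_first_key</tt> and the toy <tt>Person</tt> class are
hypothetical.</p>

```python
def none_first_key(*attrs):
    """Build a sort key that orders by the named attributes, None first."""
    def key(obj):
        # (False, None) sorts before (True, value), so None never has to be
        # compared against a value of another type.
        return tuple(
            (getattr(obj, a) is not None, getattr(obj, a)) for a in attrs)
    return key


class Person:
    def __init__(self, size, name):
        self.Size = size
        self.Name = name


people = [Person(3, "Bo"), Person(None, "Al"), Person(1, "Cy"), Person(1, None)]
people.sort(key=none_first_key("Size", "Name"))
names = [p.Name for p in people]
```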
806
807 <h4>Cross-tabulation</h4>
808 <p>Cross-tabs (also called <i>aggregate tables</i> or <i>pivot tables</i>)
809 display aggregate information about objects by category. For example,
810 rather than show a list of Safari records, one row per trip, you might
811 wish to show a table where each row represents a Destination, and each
812 column shows the count of Safaris to that Destination for each distinct
813 Year. In this example, we say that the Safaris are "grouped by" their
814 Destination values, and that we "pivot" on the Year values.</p>
815
816 <p>Dejavu helps you form such a table via the <tt>CrossTab</tt> class.
817 You need to specify the group(s) you wish to use, and the pivot attribute.
818 Finally, you must specify the aggregate function. Here's a code example:
819 <pre>
820 >>> data = ["a", "b", "cc", "bddd", "a4", "b6"]
821 >>> group = lambda x: x.isalpha()
822 >>> pivot = lambda x: x[0]
823 >>> ctab = analysis.CrossTab(data, [group], pivot, dejavu.COUNT)
824 >>> data, columns = ctab.results()
825 >>> data
826 {(True,): {"a": 1, "b": 2, "c": 1},
827  (False,): {"a": 1, "b": 1}}
828 >>> columns
829 ["a", "b", "c"]</pre>
830 You may notice that we're not using Units in our example; the
831 <tt>CrossTab</tt> class is designed to work with any objects. Here's one
832 way to lay out that data:</p>
833 <table>
834 <tr><th>Is Alpha</th><th>a</th><th>b</th><th>c</th></tr>
835 <tr><td>Y</td><td>1</td><td>2</td><td>1</td></tr>
836 <tr><td>N</td><td>1</td><td>1</td><td>0</td></tr>
837 </table>
838
839 <p>The <tt>results</tt> method returns two values. First, the table
840 itself in the form of a dictionary; each key is a tuple of group values,
841 and the corresponding value is a sub-dictionary. Each sub-dict has keys
842 which are the pivot attribute, and values which equal the aggregates.
843 I know, that was confusing; look at the example. The second value to
844 be returned is a list of the pivot column values; you'll notice they're
845 sorted.</p>
846
847 <p>The groups and pivot arguments may be either strings or functions.
848 If strings, they must be the names of attributes of the source objects.
849 The final aggfunc argument defaults to COUNT, but may also be SUM.
850 More aggfuncs may arrive in the future.</p>
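<p>For readers who want to see how the result structure is built, the
following plain-Python sketch reproduces the COUNT example above. It is
not the <tt>CrossTab</tt> class; <tt>crosstab_count</tt> is a
hypothetical helper that accepts only functions (not attribute names)
for the groups and pivot.</p>

```python
def crosstab_count(data, groups, pivot):
    """Return (table, columns): table maps group tuples to {pivot_value: count}."""
    table = {}
    columns = set()
    for item in data:
        row_key = tuple(g(item) for g in groups)  # one entry per group function
        col = pivot(item)
        columns.add(col)
        row = table.setdefault(row_key, {})
        row[col] = row.get(col, 0) + 1
    return table, sorted(columns)


data = ["a", "b", "cc", "bddd", "a4", "b6"]
table, columns = crosstab_count(data, [str.isalpha], lambda x: x[0])
```

<p>Note that a missing cell (here, row <tt>(False,)</tt>, column
<tt>"c"</tt>) is simply absent from the sub-dictionary; the layout code
decides to show it as 0.</p>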
851
852 <h3>The Arena Object</h3>
853 <p>The topmost class in Dejavu is the <tt>Arena</tt> class. When building
854 a Dejavu application, you must first create an instance of this class,
855 and must find a way to persist this object across client connections.
856 This can be achieved in multiple ways; web applications, for example,
857 will typically create a single process to serve all requests. Desktop
858 applications will probably create a single Arena object for each
859 running instance of the program.</p>
860
861 <h4>Loading Stores</h4>
862 <p>You <b>may</b> manually set up Storage Managers by calling
863 <tt>Arena().add_store(name, store, unitClasses)</tt>. But, you
864 probably shouldn't. Instead, allow your deployers to decide for
865 themselves which storage solution(s) to use. You can do this by calling
866 <tt>load(filename)</tt>; pass it the filename of an INI-style file
867 which your deployers can tweak without screwing up your Python code.
868 The next chapter in this reference is completely devoted to educating
869 deployers; point them to it or copy/modify it in your own release docs.</p>
870
871 <h4>Registering Unit Classes</h4>
872 <p>The <tt>Arena</tt> object maintains a registry of Unit classes called a
873 <tt>roster</tt>. A roster is like a three-way map between Unit classes,
874 their names, and their assigned StorageManagers. You shouldn't manipulate
875 this structure on your own; instead, use the <tt>register</tt> method to
876 register each Unit class.</p>
877
878 <p>The <tt>Arena</tt> object also manages the associations between Unit
879 classes in its <tt>associations</tt> attribute, which is a simple,
880 unweighted, undirected graph. In general, you should call
881 <tt>associate(cls, key, farClass, farKey)</tt> to add classes to this
882 graph. The only other common operation is to call
883 <tt>.associations.shortest_path(start, end)</tt>, to retrieve the
884 chain of associations between two Unit classes.</p>
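<p>On an unweighted, undirected graph, a shortest path can be found with
a breadth-first search. The sketch below shows the idea on a plain
adjacency dict; it is not the code behind
<tt>associations.shortest_path</tt>, and the class names in the sample
graph are invented.</p>

```python
from collections import deque


def shortest_path(graph, start, end):
    """Breadth-first search; returns a list of nodes, or None if unreachable."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == end:
            return path
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


# Undirected graph: every edge is listed in both directions.
graph = {
    "Customer": {"Invoice"},
    "Invoice": {"Customer", "Inventory"},
    "Inventory": {"Invoice", "Warehouse"},
    "Warehouse": {"Inventory"},
}
path = shortest_path(graph, "Customer", "Warehouse")
```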
885
886 <hr />
887
888 <p><a name='hettinger'>[1]</a> Python Cookbook,
889 <a href='http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/277940'>Binding
890 Constants at compile time</a><br />
891 </p>
892
893 </body>
894 </html>