Contact: fumanchu@aminus.org

Log in as guest/dejavu to create tickets

I think I've seen this ORM somewhere before...

root/trunk/dejavu/doc/advanced.html

Revision 484 (checked in by fumanchu, 6 years ago)

Doc updates.

Line 
1 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
2    "http://www.w3.org/TR/xhtml1/DTD/strict.dtd">
3 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
4
5 <head>
6     <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
7     <title>Dejavu: Advanced Topics</title>
8     <link href='dejavu.css' rel='stylesheet' type='text/css' />
9 </head>
10
11 <body>
12
13 <h2>Advanced Topics</h2>
14 <p>As with all frameworks, Dejavu can't cover every need out-of-the-box.
15 However, Dejavu has been specifically designed to be hackable. In particular,
16 the creation of new Storage Managers is a well-defined process. Read on if
17 there's a feature you need that you might consider building yourself.</p>
18
19 <h3>Subclassing Sandbox</h3>
20 <p>Okay, I lied. There's not much I can think of that you'd want to do with
21 Sandboxes. Most things I <i>can</i> think of would be better implemented
22 as Storage Manager middleware. But if you think of any, let me know.</p>
23
24 <h3>Store Hacking</h3>
25 <p>The most common modification to a <tt>StorageManager</tt> object is to
26 use it as a dumping-ground for other application data. Since the store
27 should persist for the lifetime of the application's process, it can serve
28 as a decent top-level Application namespace. Since the only mandatory
29 argument to <tt>Sandbox.__init__</tt> is a store, you can pass Sandboxes
30 around in your code and always have access to the root store. If you use
31 another application framework for your front-end, just stick a reference
32 to it in your store and vice-versa. Python dynamic attributes to the rescue
33 again!</p>
34
35 <h3>Passing through SQL</h3>
36 <p>If you <i>want</i> to keep writing SQL, there's nothing stopping you
37 from doing so. If nothing else, it can be a handy way to prototype or
38 migrate an application, and then replace the SQL with Dejavu API calls
39 later on. You'll need to make your deployers aware that you're using SQL
40 directly (and which DB's your SQL runs on), so they don't try to deploy
41 your application with an unsupported store.</p>
42
43 <p>To avoid stale data, you should probably flush any sandboxes before
44 running your query, especially if it updates data. You should also run
45 the following snippet to flush CachingProxy or BurnedProxy SM's:</p>
46
47 <pre>store.sweep_all()</pre>
48
49 <p>Then, use the extension methods built into StorageManager classes' "db"
50 attribute to get data:</p>
51
52 <pre>>>> rows, cols = s.db.fetch("SELECT djvFields.Value, Count(djvCity.ID) AS NumCities "
53                             "FROM djvFields LEFT JOIN djvCity ON djvFields.ID = "
54                             "djvCity.Field GROUP BY djvFields.Value")
55 >>> cols     # [(name, type), (name, type), ...]
56 [(u'Value', 202), (u'NumCities', 3)]
57 >>> rows
58 [(u'Baja California', 3), (u'Ciudad Juarez', 1), (u'Puerto Penasco', 0), (u'Yucatan Peninsula', 1)]
59 </pre>
60
61 <p>...or update:</p>
62
63 <pre>>>> s.db.execute("UPDATE djvFields SET ShortCode = Left(Value, 1) WHERE ShortCode Is Null;")</pre>
64
65 <p>There are a <b>lot</b> of other things you can do with the builtin
66 Database, Schema, Table, Column, and SQLDecompiler classes. Feel free to open
67 them up in an interactive session and explore. All of the RDBMS Storage Managers
68 are built on top of an independent SQL layer (called
69 <a href='http://projects.amor.org/geniusql'>Geniusql</a>) that knows nothing
70 about units or Storage Managers (but it does understand Expressions).</p>
71
72 <h3>Custom Storage Managers</h3>
73 <p>In most cases, you will add new functionality to Dejavu itself by
74 creating a custom Storage Manager, whether for a new backend, or a custom
75 middleware component. Storage Managers must conform to a simple interface
76 for creating, destroying, and recalling Units. They are free to implement
77 that functionality however they like.</p>
78
79 <p>As you can see in the code, the <tt>storage.StorageManager</tt> base
80 class requires you to override most of its methods.</p>
81 <ul>
82     <li><tt>__init__(self, allOptions={}):</tt> Optional.
83         Place any startup code here, as each SM should only be instantiated
84         once (at app startup). Any additional arguments should be passed
85         in the allOptions dictionary (rather than modifying the signature).
86         You should expect the keys and values of allOptions to be strings.
87         </li>
88     <li><tt>reserve(self, unit):</tt> Required. Take the supplied Unit
89         instance and "make a space" for it in storage. The Unit does
90         not need to be fully populated. If the Unit has an ID when
91         passed to <tt>reserve</tt>, use it. If not, supply it using
92         the class' UnitSequencer. If your database provides equivalent
93         sequencing to dejavu Sequencers, feel free to use it. If not,
94         grab all existing distinct ID's (which you are storing),
95         and pass them to
96         <tt>unit.sequencer.assign(unit, ids)</tt>, which should assign the
97         next ID in the sequence to the Unit. Remember that when we say "ID"
98         we mean a tuple of identifier (UnitProperty attribute) values.
99         You should probably lock this whole method in a
100         <tt>threading.Lock</tt>.</li>
101     <li><tt>save(self, unit)</tt>: If <tt>not unit.dirty()</tt>, you can exit. Otherwise,
102         iterate through the Unit's properties and persist each value,
103         using an Adapter to coerce each value to the type expected by your
104         database. If you are able to persist all values,
105         call <tt>unit.cleanse()</tt> to mark the Unit as no longer dirty.
106         You are not required to persist any Unit attributes other than
107         UnitProperties.</li>
108     <li><tt>destroy(self, unit):</tt> Required. Remove the Unit's data
109         from storage permanently. For databases, this means
110         <tt>"DELETE FROM %s WHERE %s;" %
111         (table.qname, table.id_clause(**unit.properties))</tt>.</li>
112     <li><tt>recall(self, unitClass, expr=None):</tt> Required.
113         This method must return an iterable which yields fully-populated
114         Unit objects. The Units must be of the supplied unitClass, and
115         must match the Expression, if supplied. If an Expression is not
116         supplied, all stored Units of the specified class must be returned.
117         </li>
118     <li><tt>create_storage(self, unitClass):</tt> Optional. If you do not
119         override this method, it simply passes. If your Storage Manager
120         needs to set up tables or other structures per unitClass (and
121         almost all do at install time), use this method to do so.</li>
122     <li><tt>shutdown(self):</tt> Optional. If you do not override this
123         method, it simply passes. If your Storage Manager needs to be
124         explicitly closed when the application shuts down, add code to
125         do that here.</li>
126     <li><tt>distinct(self, cls, fields, expr=None):</tt> Recommended.
127         This method must return an iterable of (tuples of) distinct
128         UnitProperty values for the given field(s). The Units from which
129         the values are drawn must be of the supplied class (cls), and must
130         match the Expression (expr), if supplied. If an Expression is not
131         supplied, all stored Units of the specified class must be examined.
132         </li>
133     <li><tt>multirecall(self, classes, expr):</tt> Recommended.
134         The 'classes' argument will be a UnitJoin and its children.
135         This method must return an iterable of lists; each item in each list
136         will be a Unit. The Units must be of the supplied classes, in order
137         (see the UnitJoin.classes method), and must all match
138         expr(*resultset) together.
139         </li>
140 </ul>
141
142 <h4>Generic Database Wrappers</h4>
143 <p>Writing a Storage Manager for a database is relatively straightforward,
144 mostly because Dejavu doesn't have complicated storage interfaces or
145 demands. If you find your application depends heavily upon using advanced
146 features of a particular database, or upon hand-crafted SQL, then Dejavu
147 is not for you or your application. A Dejavu SM module for a database
148 usually includes:</p>
149 <ol>
150     <li>Adapters, which coerce values from Python types to database types
151         and back again. Base classes for DB Adapters can be found in
152         <tt>dejavu.storage.db</tt>.</li>
153     <li>An SQLDecompiler, which converts dejavu <tt>Expression</tt> objects
154         (essentially, Python lambdas) into SQL.</li>
155     <li>A subclass of <tt>geniusql.Database</tt>, which handles requests to
156         SELECT data using the above two components, as well as ALTER TABLE,
157         etc.</li>
158 </ol>
159
160 <h5>Adapters</h5>
161 <p>Generally, you will end up with three kinds of Adapters (subclasses of
162 <tt>storage.Adapter</tt>): one for converting Dejavu types to your database
163 types, another for the reverse (<tt>storage.db.AdapterFromDB</tt>), and
164 probably a third to insert Dejavu values (with proper quoting, etc.) into
165 SQL statements for your database (<tt>storage.db.AdapterToSQL</tt>).
166 The Adapter class provides a single public method, <tt>coerce(self, value,
167 dbtype, pytype)</tt>, which takes any value and attempts to return a new
168 value.</p>
169
170 <p><tt>adapter.coerce()</tt> handles a request by calling a sibling method (that
171 is, a method of the same subclass). Therefore, you need to add methods to
172 your Adapter for each Python type you wish to support. For example, if you
173 wish to coerce Python ints to INTEGER, you need to add the following method
174 to your Adapter subclass:
175 <pre>    def coerce_int_to_any(self, value):
176         return str(value)</pre>
177 Methods are named <tt>coerce_type1_to_type2</tt>, where 'type1' and 'type2'
178 are type names, one of them a Python type and the other a database type.
179 If your type name has dots in it, they will be converted to underscores.
180 If either of the type names is 'any', that method will be used if no
181 more-specific coercion method exists. Again, you can most likely use
182 methods in the base Adapter classes provided.</p>
183
184 <p>Your coercion method should receive a single value and return that value,
185 coerced to a type. An outbound adapter coerces from Python types to database
186 types. You supply a Dejavu UnitProperty value to <tt>coerce</tt>, and the
187 appropriate coercion method will be selected based upon the <tt>type()</tt>
188 of that value. An inbound adapter, on the other hand, coerces from DB types
189 to Python types. Call <tt>coerce</tt> with your database value <i>and</i>
190 the <tt>valuetype</tt> argument, which is then used to call the appropriate
191 coercion method. That method returns the value, coerced to
192 <tt>type(valuetype)</tt>, which the UnitProperty expects.
193
194 <p>If <tt>coerce</tt> cannot find a method for the appropriate Python type,
195 it errors, and rightly so. Don't let these errors pass silently! An earlier
196 version of Dejavu had a "default" coercion method, which was a Bad Idea.
197 Don't replicate it.</p>
198
199 <h5>Decompiler</h5>
200 <p>The SQLDecompiler is the tricky bit of any Storage Manager. You must
201 receive a Unit class and an Expression, and produce valid SQL for your
202 database from both. For example, given:
203 <pre>unitClass = Things
204 expr = logic.Expression(lambda x: x.Group == 3)</pre>
205 ...your decompiler should produce something like:
206 <pre>"SELECT * FROM [djvThings] WHERE [djvThings].[Group] = 3"</pre></p>
207
208 <p>The above example may seem trivial to you, but add in proper quoting,
209 diverse datatypes (like dates and decimals), complex operators (like 'in',
210 'Like', and 'ieq'), logic functions (like today() and iscurrentweek()),
211 null queries, and just-in-time keyword args, and it becomes complex very
212 quickly. You are, in effect, writing a mini-ORM.</p>
213
214 <p><b>But</b>, don't despair. Dejavu provides you with tools to make this
215 task easier:</p>
216 <ol>
217     <li>The most important tool is <tt>geniusql.select.SQLDecompiler</tt>, a complete
218         base class. You should be able to tweak it for most databases
219         with a couple of SQL syntax changes.</li>
220     <li><tt>SQLDecompiler</tt> is built on a simple Visitor-style base
221         class, <tt>codewalk.LambdaDecompiler</tt>. More complicated
222         extensions are easily added to this base class; each bytecode in
223         the Expression (Python lambda) gets its own method call.</li>
224     <li>You don't have to handle globals or cell references within the
225         lambda--when the lambda gets wrapped in an Expression, all free
226         variables are converted to constants.</li>
227     <li>You aren't <i>forced</i> to handle every possible operator, function,
228         or term in SQL. The base SQLDecompiler doesn't; when it encounters a
229         function it can't handle, for example, it punts by flagging the SQL
230         as <i>imperfect</i>. This signals the Storage Manager to run each
231         Unit through the lambda (in pure Python) before yielding it back to
232         the caller. In fact, you can start writing your Storage Manager
233         without a decompiler at all! Just return all stored Units of the
234         given class and use the Expression to filter whole Units. Then,
235         when your SM works, add a decompiler.</li>
236 </ol>
237
238
239 <h5>Database/Table/IndexSet</h5>
240 <p>You'll need a subclass of <tt>geniusql.Database</tt>. Override the
241 container methods (like <tt>__setitem__</tt> and <tt>__delitem__</tt>).
242 For most popular databases, these are pretty straightforward. Some notes:</p>
243 <ul>
244     <li><b>select/where</b>: If no Expression is supplied, return all Units.
245         Otherwise, use a decompiler to produce SQL which you can then use
246         to grab Unit data from storage. Use each row to populate a Unit
247         (use an Adapter for type coercion), and yield each Unit back to the
248         caller. In general, it's faster to slurp all the data in at once
249         than to make a separate call for each row.</li>
250     <li><b>_get_tables/_columns/_indices</b>: use your database's
251         schema-inspection tools to tell Dejavu the names, datatypes,
252         and other metadata that actually exists in each deployed
253         database.</li>
254 </ul>
255
256 <p>Database SM's should also define the methods <tt>create_database()</tt>
257 and <tt>drop_database()</tt>, if possible.</p>
258
259 <p>Use <tt>dejavu/test/zoo_fixture.py</tt> to test your new Storage
260 Manager. Copy one of the (<i>very</i> short) test_store* modules for the
261 other SM's, and make the necessary changes for your SM. All of the heavy
262 lifting of the tests is done in zoo_fixture.</p>
263
264 <h4>Legacy Database Wrappers</h4>
265 <p>Sometimes you do not have complete control over the database you want to
266 reference. In that case, you should probably still write a custom Storage
267 Manager, Adapters, and a Decompiler. Often, you can get away with providing
268 a simple column-to-Unit mapping to use as you decompile. I've built one, for
269 example, to wrap <a href='http://www.blackbaud.com/solutions/raisersedge.asp'>The
270 Raiser's Edge</a> (third-party fundraising software). My Dejavu model manages
271 directory records and income without regard for the underlying database;
272 a custom Storage Manager maps between that ideal model and the Raiser's
273 Edge API. This allows me to integrate data from RE with our custom
274 inventory, invoice, and scheduling software.</p>
275
276 <p>One of the more important parts of wrapping existing tables is getting
277 your pretty Python names mapped to ugly database names. Do this by making
278 a custom Database: override the <tt>_column_name</tt> and
279 <tt>table_name</tt> methods to do the mapping.</p>
280
281 <h4>Other Serialization Mechanisms</h4>
282 <h5>sockets</h5>
283 <p>There's a <tt>sockets</tt> module in the <tt>storage</tt> package.
284 It does simple serialization of Units across a socket, so you can run
285 Dejavu in its own process, separate from your front end. I had to do this
286 with a third-party database, which couldn't handle web-traffic threading
287 models. Here's a snippet of how to use it (from that app):
288 <pre>def query(self, cmd, unitType='', data=None):
289     if isinstance(data, dejavu.Unit):
290         data = stream(data)
291     elif data is None:
292         data = ''
293     else:
294         data = pickle.dumps(data)
295     response = self.socket.query(":".join((cmd, unitType, data)))
296     return response
297 </pre>
298 </p>
299
300 </body>
301 </html>
Note: See TracBrowser for help on using the browser.