Contact: fumanchu@aminus.org

Log in as guest/dejavu to create tickets

I think I've seen this ORM somewhere before...

root/tags/1.4.0/doc/advanced.html

Revision 123 (checked in by fumanchu, 6 years ago)

Fix for #36 (column names not correctly escaped). Changes to StorageManagerDB (and its subclasses):

  1. identifier(*atoms) method changed to sql_name(name, quoted=True).
  2. New column_name(classname, name, full=False, quoted=True) method.
  3. SQLDecompiler now calls the StorageManager's column_name method (so decompilers now take the SM as a constructor arg).
  4. identifier_length is now sql_name_max_length.
  5. identifier_caseless is now sql_name_caseless.
Line 
1 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
2    "http://www.w3.org/TR/xhtml1/DTD/strict.dtd">
3 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
4
5 <head>
6     <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
7     <title>Dejavu: Advanced Topics</title>
8     <link href='dejavu.css' rel='stylesheet' type='text/css' />
9 </head>
10
11 <body>
12
13 <h2>Advanced Topics</h2>
14 <p>As with all frameworks, Dejavu can't cover every need out-of-the-box.
15 However, Dejavu has been specifically designed to be hackable. In particular,
16 the creation of new Storage Managers is a well-defined process. Read on if
17 there's a feature you need that you might consider building yourself.</p>
18
19 <h3>Subclassing Sandbox</h3>
20 <p>Okay, I lied. There's not much I can think of that you'd want to do with
21 Sandboxes. Most things I <i>can</i> think of would be better implemented
22 as Storage Manager middleware. But if you think of any, let me know.</p>
23
24 <h3>Arena Hacking</h3>
25 <p>The most common modification to an <tt>Arena</tt> object is to use it
26 as a dumping-ground for other application data. Since the <tt>Arena</tt>
27 should persist for the lifetime of the application's process, it can serve
28 as a decent top-level Application namespace. Since the only mandatory
29 argument to <tt>Sandbox.__init__</tt> is an Arena, you can pass Sandboxes
30 around in your code and always have access to the global Arena. If you use
31 another application framework for your front-end, just stick a reference
32 to it in your Arena and vice-versa. Python dynamic attributes to the rescue
33 again!</p>
34
35 <h3>Passing through SQL</h3>
36 <p>If you <i>want</i> to keep writing SQL, there's nothing stopping you
37 from doing so. If nothing else, it can be a handy way to prototype or
38 migrate an application, and then replace the SQL with Dejavu API calls
39 later on. You'll need to make your deployers aware that you're using SQL
40 directly (and which DB's your SQL runs on), so they don't try to deploy
41 your application with an unsupported store.</p>
42
43 <p>To avoid stale data, you should probably flush any sandboxes before
44 running your query, especially if it updates data. You should also run
45 the following snippet to flush CachingProxy or BurnedProxy SM's:</p>
46
47 <pre>for store in arena.stores.itervalues():
48     if hasattr(store, "sweep_all"):
49         store.sweep_all()</pre>
50
51 <p>Once you've got a clean slate, obtain a reference to the StorageManager.
52 It should always be a subclass of <tt>dejavu.storage.db.StorageManagerDB</tt>:</p>
53
54 <pre>>>> s = arena.stores['MControl']    # or...
55 >>> s = arena.stores.itervalues[0]  # or...
56 >>> s = arena.storage(cls)
57 >>> s.__class__.__mro__
58 (&lt;class 'dejavu.storage.storeado.StorageManagerADO_MSAccess'>,
59  &lt;class 'dejavu.storage.storeado.StorageManagerADO'>,
60  &lt;class 'dejavu.storage.db.StorageManagerDB'>,
61  &lt;class 'dejavu.storage.StorageManager'>,
62  &lt;type 'object'>)
63 </pre>
64
65 <p>Then, use the extension methods (built into db StorageManager classes)
66 to get data:</p>
67
68 <pre>>>> rows, cols = s.fetch("SELECT djvFields.Value, Count(djvCity.ID) AS NumCities "
69                          "FROM djvFields LEFT JOIN djvCity ON djvFields.ID = "
70                          "djvCity.Field GROUP BY djvFields.Value")
71 >>> coldefs     # [(name, type), (name, type), ...]
72 [(u'Value', 202), (u'NumCities', 3)]
73 >>> rowlist
74 [(u'Baja California', 3), (u'Ciudad Juarez', 1), (u'Puerto Penasco', 0), (u'Yucatan Peninsula', 1)]
75 </pre>
76
77 <p>...or update:</p>
78
79 <pre>>>> s.execute("UPDATE djvFields SET ShortCode = Left(Value, 1) WHERE ShortCode Is Null;")</pre>
80
81 <p>There are a <b>lot</b> of other things you can do with the builtin
82 StorageManagerDB and SQLDecompiler classes. Feel free to open them up
83 in an interactive session and explore.</p>
84
85 <h3>Custom Storage Managers</h3>
86 <p>In most cases, you will add new functionality to Dejavu itself by
87 creating a custom Storage Manager, whether for a new backend, or a custom
88 middleware component. Storage Managers must conform to a simple interface
89 for creating, destroying, and recalling Units. They are free to implement
90 that functionality however they like.</p>
91
92 <p>As you can see in the code, the <tt>storage.StorageManager</tt> base
93 class requires you to override most of its methods.</p>
94 <ul>
95     <li><tt>__init__(self, name, arena, allOptions={}):</tt> Optional.
96         Place any startup code here, as each SM should only be instantiated
97         once (at Arena startup). Any additional arguments should be passed
98         in the allOptions dictionary (rather than modifying the signature).
99         You should expect the keys and values of allOptions to be strings.
100         </li>
101     <li><tt>reserve(self, unit):</tt> Required. Take the supplied Unit
102         instance and "make a space" for it in storage. The Unit does
103         not need to be fully populated. If the Unit has an ID when
104         passed to <tt>reserve</tt>, use it. If not, supply it using
105         the class' UnitSequencer. If your database provides equivalent
106         sequencing to dejavu Sequencers, feel free to use it.</li>
107     <li><tt>save(self, unit, forceSave=False):</tt> Required. Take the
108         supplied Unit instance and persist its properties to storage.
109         You are not required to persist any Unit attributes other than
110         UnitProperties.</li>
111     <li><tt>destroy(self, unit):</tt> Required. Remove the Unit's data
112         from storage permanently.</li>
113     <li><tt>recall(self, unitClass, expr=None):</tt> Required.
114         This method must return an iterable which yields fully-populated
115         Unit objects. The Units must be of the supplied unitClass, and
116         must match the Expression, if supplied. If an Expression is not
117         supplied, all stored Units of the specified class must be returned.
118         </li>
119     <li><tt>create_storage(self, unitClass):</tt> Optional. If you do not
120         override this method, it simply passes. If your Storage Manager
121         needs to set up tables or other structures per unitClass (and
122         almost all do at install time), use this method to do so.</li>
123     <li><tt>shutdown(self):</tt> Optional. If you do not override this
124         method, it simply passes. If your Storage Manager needs to be
125         explicitly closed when the application shuts down, add code to
126         do that here.</li>
127     <li><tt>distinct(self, cls, fields, expr=None):</tt> Recommended.
128         This method must return an iterable of (tuples of) distinct
129         UnitProperty values for the given field(s). The Units from which
130         the values are drawn must be of the supplied class (cls), and must
131         match the Expression (expr), if supplied. If an Expression is not
132         supplied, all stored Units of the specified class must be examined.
133         </li>
134     <li><tt>multirecall(self, classes, expr):</tt> Recommended.
135         The 'classes' argument will be a UnitJoin and its children.
136         This method must return an iterable of lists; each item in each list
137         will be a Unit. The Units must be of the supplied classes, in order
138         (see the UnitJoin.classes method), and must all match
139         expr(*resultset) together.
140         </li>
141 </ul>
142
143 <h4>Generic Database Wrappers</h4>
144 <p>Writing a Storage Manager for a database is relatively straightforward,
145 mostly because Dejavu doesn't have complicated storage interfaces or
146 demands. If you find your application depends heavily upon using advanced
147 features of a particular database, or upon hand-crafted SQL, then Dejavu
148 is not for you or your application. A Dejavu SM module for a database
149 usually consists of:</p>
150 <ol>
151     <li>Adapters, which coerce values from Python types to database types
152         and back again. Base classes for DB Adapters can be found in
153         <tt>dejavu.storage.db</tt>.</li>
154     <li>An SQLDecompiler, which converts dejavu <tt>Expression</tt> objects
155         (essentially, Python lambdas) into SQL.</li>
156     <li>A subclass of <tt>StorageManager</tt>, which handles requests to
157         create, find, save, and destroy Unit data using the above two
158         components.</li>
159 </ol>
160
161 <h5>Adapters</h5>
162 <p>Generally, you will end up with three kinds of Adapters (subclasses of
163 <tt>storage.Adapter</tt>): one for converting Dejavu types to your database
164 types, another for the reverse (<tt>storage.db.AdapterFromDB</tt>), and
165 probably a third to insert Dejavu values (with proper quoting, etc.) into
166 SQL statements for your database (<tt>storage.db.AdapterToSQL</tt>).
167 The Adapter class provides a single public method, <tt>coerce(self, value,
168 valuetype=None)</tt>, which takes any value and attempts to return a new
169 value.</p>
170
171 <p><tt>.coerce()</tt> handles a request by calling a sibling method (that
172 is, a method of the same subclass). Therefore, you need to add methods to
173 your Adapter for each Python type you wish to support. For example, if you
174 wish to coerce Python ints to strings, you need to add the following method
175 to your Adapter subclass:
176 <pre>    def coerce_int(self, value):
177         return str(value)</pre>
178 Methods are named <tt>coerce_</tt>, plus the Python type name. Again, you
179 can most likely use methods in the base Adapter classes provided.</p>
180
181 <p>Your coercion method should receive a single value and return that value,
182 coerced to a type. An outbound adapter coerces from Dejavu types to database
183 types. You supply a Dejavu UnitProperty value to <tt>coerce</tt>, and the
184 appropriate coercion method will be selected based upon the <tt>type()</tt>
185 of that value. An inbound adapter, on the other hand, coerces from DB types
186 to Dejavu types. Call <tt>coerce</tt> with your database value <i>and</i>
187 the <tt>valuetype</tt> argument, which is then used to call the appropriate
188 coercion method. That method returns the value, coerced to
189 <tt>type(valuetype)</tt>, which the UnitProperty expects. In both cases,
190 the name of each coercion method takes the Dejavu/Python type name, not the
191 database type name.</p>
192
193 <p>If <tt>coerce</tt> cannot find a method for the appropriate Python type,
194 it errors, and rightly so. Don't let these errors pass silently! An earlier
195 version of Dejavu had a "default" coercion method, which was a Bad Idea.
196 Don't replicate it.</p>
197
198 <h5>Decompiler</h5>
199 <p>The SQLDecompiler is the tricky bit of any Storage Manager. You must
200 receive a Unit class and an Expression, and produce valid SQL for your
201 database from both. For example, given:
202 <pre>unitClass = Things
203 expr = logic.Expression(lambda x: x.Group == 3)</pre>
204 ...your decompiler should produce something like:
205 <pre>"SELECT * FROM [djvThings] WHERE [djvThings].[Group] = 3"</pre></p>
206
207 <p>The above example may seem trivial to you, but add in proper quoting,
208 diverse datatypes (like dates and decimals), complex operators (like 'in',
209 'Like', and 'ieq'), logic functions (like today() and iscurrentweek()),
210 null queries, and just-in-time keyword args, and it becomes complex very
211 quickly. You are, in effect, writing a mini-ORM.</p>
212
213 <p><b>But</b>, don't despair. Dejavu provides you with tools to make this
214 task easier:</p>
215 <ol>
216     <li>The most important tool is <tt>db.SQLDecompiler</tt>, a complete
217         base class. You should be able to tweak it for most databases
218         with a couple of SQL syntax changes.</li>
219     <li><tt>SQLDecompiler</tt> is built on a simple Visitor-style base
220         class, <tt>codewalk.LambdaDecompiler</tt>. More complicated
221         extensions are easily added to this base class; each bytecode in
222         the Expression (Python lambda) gets its own method call.</li>
223     <li>You don't have to handle globals or cell references within the
224         lambda--when the lambda gets wrapped in an Expression, all free
225         variables are converted to constants.</li>
226     <li>You aren't <i>forced</i> to handle every possible operator, function,
227         or term in SQL. The base SQLDecompiler doesn't; when it encounters a
228         function it can't handle, for example, it punts by flagging the SQL
229         as <i>imperfect</i>. This signals the Storage Manager to run each
230         Unit through the lambda (in pure Python) before yielding it back to
231         the caller. In fact, you can start writing your Storage Manager
232         without a decompiler at all! Just return all stored Units of the
233         given class and use the Expression to filter whole Units. Then,
234         when your SM works, add a decompiler.</li>
235 </ol>
236
237
238 <h5>StorageManager</h5>
239 <p>You'll need a subclass of <tt>storage.StorageManager</tt>. Override the
240 methods described at the top of this section (__init__,  reserve, save,
241 destroy, recall, create_storage, and shutdown). For databases, these are
242 <i>very</i> straightforward. Some notes:</p>
243 <ul>
244     <li><b>__init__</b>: You should call the superclass at some point, via
245         <tt>storage.StorageManager.__init__(self, name, arena,
246         allOptions)</tt>. Set up threading, connection pools, or mutexes
247         here. You are free to specify your own items in the allOptions
248         dictionary; you are responsible to communicate custom options to
249         people who deploy using your SM.</li>
250     <li><b>reserve</b>: If the supplied Unit doesn't have an ID, you need
251         to create one (unless this is a pass-through SM). Grab all existing
252         distinct ID's (which you are storing), and pass them to
253         <tt>unit.sequencer.assign(unit, ids)</tt>, which should assign the
254         next ID in the sequence to the Unit. Remember that when we say "ID"
255         we mean a tuple of identifier (UnitProperty attribute) values.
256         You should probably lock this whole method in a
257         <tt>threading.Lock</tt>.</li>
258     <li><b>save</b>: If <tt>not unit.dirty()</tt>, you can exit. Otherwise,
259         iterate through the Unit's properties() and persist each value,
260         using an Adapter to coerce each value to the type expected by your
261         database (or not, it's up to you; storeado special-cases
262         UnitCollection, for example). If you are able to persist all values,
263         call <tt>unit.cleanse()</tt> to mark the Unit as no longer dirty.
264         </li>
265     <li><b>destroy</b>: <tt>"DELETE FROM [%s] WHERE %s;" %
266         (self.tablename(unit), self.id_clause(unit))</tt>. That's all.</li>
267     <li><b>recall</b>: If no Expression is supplied, return all Units.
268         Otherwise, use a decompiler to produce SQL which you can then use
269         to grab Unit data from storage. Use each row to populate a Unit
270         (use an Adapter for type coercion), and yield each Unit back to the
271         caller. In general, it's faster to slurp all the data in at once
272         than to make a separate call for each row.</li>
273     <li><b>create_storage</b>: <tt>CREATE TABLE</tt> using
274         <tt>Unit.properties()</tt>, especially the <tt>UnitProperty.type</tt>
275         attribute. If <tt>UnitProperty.index == True</tt>, make a
276         <tt>CREATE INDEX</tt> call.</li>
277     <li><b>shutdown</b>: Close any connections.</li>
278 </ul>
279
280 <p>Database SM's should also define the methods <tt>create_database()</tt>
281 and <tt>drop_database()</tt>, if possible.</p>
282
283 <p>Use <tt>dejavu/test/zoo_fixture.py</tt> to test your new Storage
284 Manager. Copy one of the (<i>very</i> short) test_store* modules for the
285 other SM's, and make the necessary changes for your SM. All of the heavy
286 lifting of the tests is done in zoo_fixture.</p>
287
288 <h4>Legacy Database Wrappers</h4>
289 <p>Sometimes you do not have complete control over the database you want to
290 reference. In that case, you should probably still write a custom Storage
291 Manager, Adapters, and a Decompiler. Often, you can get away with providing
292 a simple column-to-Unit mapping to use as you decompile. I've built one, for
293 example, to wrap <a href='http://www.blackbaud.com/solutions/raisersedge.asp'>The
294 Raiser's Edge</a> (third-party fundraising software). My Dejavu model manages
295 directory records and income without regard for the underlying database;
296 a custom Storage Manager maps between that ideal model and the Raiser's
297 Edge API. This allows me to integrate data from RE with our custom
298 inventory, invoice, and scheduling software.</p>
299
300 <p>One of the more important parts of wrapping existing tables is getting
301 your pretty Python names mapped to ugly database names. Do this by making
302 a custom StorageManager: override the <tt>column_name</tt> and
303 <tt>table_name</tt> methods to do the mapping.</p>
304
305 <h4>Other Serialization Mechanisms</h4>
306 <h5>sockets</h5>
307 <p>There's a <tt>sockets</tt> module in the <tt>storage</tt> package.
308 It does simple serialization of Units across a socket, so you can run
309 Dejavu in its own process, separate from your front end. I had to do this
310 with a third-party database, which couldn't handle web-traffic threading
311 models. Here's a snippet of how to use it (from that app):
312 <pre>def query(self, cmd, unitType='', data=None):
313     if isinstance(data, dejavu.Unit):
314         data = stream(data)
315     elif data is None:
316         data = ''
317     else:
318         data = pickle.dumps(data)
319     response = self.socket.query(":".join((cmd, unitType, data)))
320     return response
321 </pre>
322 </p>
323
324 </body>
325 </html>
Note: See TracBrowser for help on using the browser.