| 1 |
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" |
|---|
| 2 |
"http://www.w3.org/TR/xhtml1/DTD/strict.dtd"> |
|---|
| 3 |
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> |
|---|
| 4 |
|
|---|
| 5 |
<head> |
|---|
| 6 |
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> |
|---|
| 7 |
<title>Dejavu: Advanced Topics</title> |
|---|
| 8 |
<link href='dejavu.css' rel='stylesheet' type='text/css' /> |
|---|
| 9 |
</head> |
|---|
| 10 |
|
|---|
| 11 |
<body> |
|---|
| 12 |
|
|---|
| 13 |
<h2>Advanced Topics</h2> |
|---|
| 14 |
<p>As with all frameworks, Dejavu can't cover every need out-of-the-box. |
|---|
| 15 |
However, Dejavu has been specifically designed to be hackable. In particular, |
|---|
| 16 |
the creation of new Storage Managers is a well-defined process. Read on if |
|---|
| 17 |
there's a feature you need that you might consider building yourself.</p> |
|---|
| 18 |
|
|---|
| 19 |
<h3>Subclassing Sandbox</h3> |
|---|
| 20 |
<p>Okay, I lied. There's not much I can think of that you'd want to do with |
|---|
| 21 |
Sandboxes. Most things I <i>can</i> think of would be better implemented |
|---|
| 22 |
as Storage Manager middleware. But if you think of any, let me know.</p> |
|---|
| 23 |
|
|---|
| 24 |
<h3>Store Hacking</h3> |
|---|
| 25 |
<p>The most common modification to a <tt>StorageManager</tt> object is to |
|---|
| 26 |
use it as a dumping-ground for other application data. Since the store |
|---|
| 27 |
should persist for the lifetime of the application's process, it can serve |
|---|
| 28 |
as a decent top-level Application namespace. Since the only mandatory |
|---|
| 29 |
argument to <tt>Sandbox.__init__</tt> is a store, you can pass Sandboxes |
|---|
| 30 |
around in your code and always have access to the root store. If you use |
|---|
| 31 |
another application framework for your front-end, just stick a reference |
|---|
| 32 |
to it in your store and vice-versa. Python dynamic attributes to the rescue |
|---|
| 33 |
again!</p> |
|---|
| 34 |
|
|---|
| 35 |
<h3>Passing through SQL</h3> |
|---|
| 36 |
<p>If you <i>want</i> to keep writing SQL, there's nothing stopping you |
|---|
| 37 |
from doing so. If nothing else, it can be a handy way to prototype or |
|---|
| 38 |
migrate an application, and then replace the SQL with Dejavu API calls |
|---|
| 39 |
later on. You'll need to make your deployers aware that you're using SQL |
|---|
| 40 |
directly (and which DB's your SQL runs on), so they don't try to deploy |
|---|
| 41 |
your application with an unsupported store.</p> |
|---|
| 42 |
|
|---|
| 43 |
<p>To avoid stale data, you should probably flush any sandboxes before |
|---|
| 44 |
running your query, especially if it updates data. You should also run |
|---|
| 45 |
the following snippet to flush CachingProxy or BurnedProxy SM's:</p> |
|---|
| 46 |
|
|---|
| 47 |
<pre>store.sweep_all()</pre> |
|---|
| 48 |
|
|---|
| 49 |
<p>Then, use the extension methods built into StorageManager classes' "db" |
|---|
| 50 |
attribute to get data:</p> |
|---|
| 51 |
|
|---|
| 52 |
<pre>>>> rows, cols = s.db.fetch("SELECT djvFields.Value, Count(djvCity.ID) AS NumCities " |
|---|
| 53 |
"FROM djvFields LEFT JOIN djvCity ON djvFields.ID = " |
|---|
| 54 |
"djvCity.Field GROUP BY djvFields.Value") |
|---|
| 55 |
>>> cols # [(name, type), (name, type), ...] |
|---|
| 56 |
[(u'Value', 202), (u'NumCities', 3)] |
|---|
| 57 |
>>> rows |
|---|
| 58 |
[(u'Baja California', 3), (u'Ciudad Juarez', 1), (u'Puerto Penasco', 0), (u'Yucatan Peninsula', 1)] |
|---|
| 59 |
</pre> |
|---|
| 60 |
|
|---|
| 61 |
<p>...or update:</p> |
|---|
| 62 |
|
|---|
| 63 |
<pre>>>> s.db.execute("UPDATE djvFields SET ShortCode = Left(Value, 1) WHERE ShortCode Is Null;")</pre> |
|---|
| 64 |
|
|---|
| 65 |
<p>There are a <b>lot</b> of other things you can do with the builtin |
|---|
| 66 |
Database, Schema, Table, Column, and SQLDecompiler classes. Feel free to open |
|---|
| 67 |
them up in an interactive session and explore. All of the RDBMS Storage Managers |
|---|
| 68 |
are built on top of an independent SQL layer (called |
|---|
| 69 |
<a href='http://projects.amor.org/geniusql'>Geniusql</a>) that knows nothing |
|---|
| 70 |
about units or Storage Managers (but it does understand Expressions).</p> |
|---|
| 71 |
|
|---|
| 72 |
<h3>Custom Storage Managers</h3> |
|---|
| 73 |
<p>In most cases, you will add new functionality to Dejavu itself by |
|---|
| 74 |
creating a custom Storage Manager, whether for a new backend, or a custom |
|---|
| 75 |
middleware component. Storage Managers must conform to a simple interface |
|---|
| 76 |
for creating, destroying, and recalling Units. They are free to implement |
|---|
| 77 |
that functionality however they like.</p> |
|---|
| 78 |
|
|---|
| 79 |
<p>As you can see in the code, the <tt>storage.StorageManager</tt> base |
|---|
| 80 |
class requires you to override most of its methods.</p> |
|---|
| 81 |
<ul> |
|---|
| 82 |
<li><tt>__init__(self, allOptions={}):</tt> Optional. |
|---|
| 83 |
Place any startup code here, as each SM should only be instantiated |
|---|
| 84 |
once (at app startup). Any additional arguments should be passed |
|---|
| 85 |
in the allOptions dictionary (rather than modifying the signature). |
|---|
| 86 |
You should expect the keys and values of allOptions to be strings. |
|---|
| 87 |
</li> |
|---|
| 88 |
<li><tt>reserve(self, unit):</tt> Required. Take the supplied Unit |
|---|
| 89 |
instance and "make a space" for it in storage. The Unit does |
|---|
| 90 |
not need to be fully populated. If the Unit has an ID when |
|---|
| 91 |
passed to <tt>reserve</tt>, use it. If not, supply it using |
|---|
| 92 |
the class' UnitSequencer. If your database provides equivalent |
|---|
| 93 |
sequencing to dejavu Sequencers, feel free to use it. If not, |
|---|
| 94 |
grab all existing distinct ID's (which you are storing), |
|---|
| 95 |
and pass them to |
|---|
| 96 |
<tt>unit.sequencer.assign(unit, ids)</tt>, which should assign the |
|---|
| 97 |
next ID in the sequence to the Unit. Remember that when we say "ID" |
|---|
| 98 |
we mean a tuple of identifier (UnitProperty attribute) values. |
|---|
| 99 |
You should probably lock this whole method in a |
|---|
| 100 |
<tt>threading.Lock</tt>.</li> |
|---|
| 101 |
<li><tt>save(self, unit)</tt>: If <tt>not unit.dirty()</tt>, you can exit. Otherwise, |
|---|
| 102 |
iterate through the Unit's properties and persist each value, |
|---|
| 103 |
using an Adapter to coerce each value to the type expected by your |
|---|
| 104 |
database. If you are able to persist all values, |
|---|
| 105 |
call <tt>unit.cleanse()</tt> to mark the Unit as no longer dirty. |
|---|
| 106 |
You are not required to persist any Unit attributes other than |
|---|
| 107 |
UnitProperties.</li> |
|---|
| 108 |
<li><tt>destroy(self, unit):</tt> Required. Remove the Unit's data |
|---|
| 109 |
from storage permanently. For databases, this means |
|---|
| 110 |
<tt>"DELETE FROM %s WHERE %s;" % |
|---|
| 111 |
(table.qname, table.id_clause(**unit.properties))</tt>.</li> |
|---|
| 112 |
<li><tt>recall(self, unitClass, expr=None):</tt> Required. |
|---|
| 113 |
This method must return an iterable which yields fully-populated |
|---|
| 114 |
Unit objects. The Units must be of the supplied unitClass, and |
|---|
| 115 |
must match the Expression, if supplied. If an Expression is not |
|---|
| 116 |
supplied, all stored Units of the specified class must be returned. |
|---|
| 117 |
</li> |
|---|
| 118 |
<li><tt>create_storage(self, unitClass):</tt> Optional. If you do not |
|---|
| 119 |
override this method, it simply passes. If your Storage Manager |
|---|
| 120 |
needs to set up tables or other structures per unitClass (and |
|---|
| 121 |
almost all do at install time), use this method to do so.</li> |
|---|
| 122 |
<li><tt>shutdown(self):</tt> Optional. If you do not override this |
|---|
| 123 |
method, it simply passes. If your Storage Manager needs to be |
|---|
| 124 |
explicitly closed when the application shuts down, add code to |
|---|
| 125 |
do that here.</li> |
|---|
| 126 |
<li><tt>distinct(self, cls, fields, expr=None):</tt> Recommended. |
|---|
| 127 |
This method must return an iterable of (tuples of) distinct |
|---|
| 128 |
UnitProperty values for the given field(s). The Units from which |
|---|
| 129 |
the values are drawn must be of the supplied class (cls), and must |
|---|
| 130 |
match the Expression (expr), if supplied. If an Expression is not |
|---|
| 131 |
supplied, all stored Units of the specified class must be examined. |
|---|
| 132 |
</li> |
|---|
| 133 |
<li><tt>multirecall(self, classes, expr):</tt> Recommended. |
|---|
| 134 |
The 'classes' argument will be a UnitJoin and its children. |
|---|
| 135 |
This method must return an iterable of lists; each item in each list |
|---|
| 136 |
will be a Unit. The Units must be of the supplied classes, in order |
|---|
| 137 |
(see the UnitJoin.classes method), and must all match |
|---|
| 138 |
expr(*resultset) together. |
|---|
| 139 |
</li> |
|---|
| 140 |
</ul> |
|---|
| 141 |
|
|---|
| 142 |
<h4>Generic Database Wrappers</h4> |
|---|
| 143 |
<p>Writing a Storage Manager for a database is relatively straightforward, |
|---|
| 144 |
mostly because Dejavu doesn't have complicated storage interfaces or |
|---|
| 145 |
demands. If you find your application depends heavily upon using advanced |
|---|
| 146 |
features of a particular database, or upon hand-crafted SQL, then Dejavu |
|---|
| 147 |
is not for you or your application. A Dejavu SM module for a database |
|---|
| 148 |
usually includes:</p> |
|---|
| 149 |
<ol> |
|---|
| 150 |
<li>Adapters, which coerce values from Python types to database types |
|---|
| 151 |
and back again. Base classes for DB Adapters can be found in |
|---|
| 152 |
<tt>dejavu.storage.db</tt>.</li> |
|---|
| 153 |
<li>An SQLDecompiler, which converts dejavu <tt>Expression</tt> objects |
|---|
| 154 |
(essentially, Python lambdas) into SQL.</li> |
|---|
| 155 |
<li>A subclass of <tt>geniusql.Database</tt>, which handles requests to |
|---|
| 156 |
SELECT data using the above two components, as well as ALTER TABLE, |
|---|
| 157 |
etc.</li> |
|---|
| 158 |
</ol> |
|---|
| 159 |
|
|---|
| 160 |
<h5>Adapters</h5> |
|---|
| 161 |
<p>Generally, you will end up with three kinds of Adapters (subclasses of |
|---|
| 162 |
<tt>storage.Adapter</tt>): one for converting Dejavu types to your database |
|---|
| 163 |
types, another for the reverse (<tt>storage.db.AdapterFromDB</tt>), and |
|---|
| 164 |
probably a third to insert Dejavu values (with proper quoting, etc.) into |
|---|
| 165 |
SQL statements for your database (<tt>storage.db.AdapterToSQL</tt>). |
|---|
| 166 |
The Adapter class provides a single public method, <tt>coerce(self, value, |
|---|
| 167 |
dbtype, pytype)</tt>, which takes any value and attempts to return a new |
|---|
| 168 |
value.</p> |
|---|
| 169 |
|
|---|
| 170 |
<p><tt>adapter.coerce()</tt> handles a request by calling a sibling method (that |
|---|
| 171 |
is, a method of the same subclass). Therefore, you need to add methods to |
|---|
| 172 |
your Adapter for each Python type you wish to support. For example, if you |
|---|
| 173 |
wish to coerce Python ints to INTEGER, you need to add the following method |
|---|
| 174 |
to your Adapter subclass: |
|---|
| 175 |
<pre> def coerce_int_to_any(self, value): |
|---|
| 176 |
return str(value)</pre> |
|---|
| 177 |
Methods are named <tt>coerce_type1_to_type2</tt>, where 'type1' and 'type2' |
|---|
| 178 |
are type names, one of them a Python type and the other a database type. |
|---|
| 179 |
If your type name has dots in it, they will be converted to underscores. |
|---|
| 180 |
If either of the type names is 'any', that method will be used if no |
|---|
| 181 |
more-specific coercion method exists. Again, you can most likely use |
|---|
| 182 |
methods in the base Adapter classes provided.</p> |
|---|
| 183 |
|
|---|
| 184 |
<p>Your coercion method should receive a single value and return that value, |
|---|
| 185 |
coerced to a type. An outbound adapter coerces from Python types to database |
|---|
| 186 |
types. You supply a Dejavu UnitProperty value to <tt>coerce</tt>, and the |
|---|
| 187 |
appropriate coercion method will be selected based upon the <tt>type()</tt> |
|---|
| 188 |
of that value. An inbound adapter, on the other hand, coerces from DB types |
|---|
| 189 |
to Python types. Call <tt>coerce</tt> with your database value <i>and</i> |
|---|
| 190 |
the <tt>valuetype</tt> argument, which is then used to call the appropriate |
|---|
| 191 |
coercion method. That method returns the value, coerced to |
|---|
| 192 |
<tt>type(valuetype)</tt>, which the UnitProperty expects. |
|---|
| 193 |
|
|---|
| 194 |
<p>If <tt>coerce</tt> cannot find a method for the appropriate Python type, |
|---|
| 195 |
it errors, and rightly so. Don't let these errors pass silently! An earlier |
|---|
| 196 |
version of Dejavu had a "default" coercion method, which was a Bad Idea. |
|---|
| 197 |
Don't replicate it.</p> |
|---|
| 198 |
|
|---|
| 199 |
<h5>Decompiler</h5> |
|---|
| 200 |
<p>The SQLDecompiler is the tricky bit of any Storage Manager. You must |
|---|
| 201 |
receive a Unit class and an Expression, and produce valid SQL for your |
|---|
| 202 |
database from both. For example, given: |
|---|
| 203 |
<pre>unitClass = Things |
|---|
| 204 |
expr = logic.Expression(lambda x: x.Group == 3)</pre> |
|---|
| 205 |
...your decompiler should produce something like: |
|---|
| 206 |
<pre>"SELECT * FROM [djvThings] WHERE [djvThings].[Group] = 3"</pre></p> |
|---|
| 207 |
|
|---|
| 208 |
<p>The above example may seem trivial to you, but add in proper quoting, |
|---|
| 209 |
diverse datatypes (like dates and decimals), complex operators (like 'in', |
|---|
| 210 |
'Like', and 'ieq'), logic functions (like today() and iscurrentweek()), |
|---|
| 211 |
null queries, and just-in-time keyword args, and it becomes complex very |
|---|
| 212 |
quickly. You are, in effect, writing a mini-ORM.</p> |
|---|
| 213 |
|
|---|
| 214 |
<p><b>But</b>, don't despair. Dejavu provides you with tools to make this |
|---|
| 215 |
task easier:</p> |
|---|
| 216 |
<ol> |
|---|
| 217 |
<li>The most important tool is <tt>geniusql.select.SQLDecompiler</tt>, a complete |
|---|
| 218 |
base class. You should be able to tweak it for most databases |
|---|
| 219 |
with a couple of SQL syntax changes.</li> |
|---|
| 220 |
<li><tt>SQLDecompiler</tt> is built on a simple Visitor-style base |
|---|
| 221 |
class, <tt>codewalk.LambdaDecompiler</tt>. More complicated |
|---|
| 222 |
extensions are easily added to this base class; each bytecode in |
|---|
| 223 |
the Expression (Python lambda) gets its own method call.</li> |
|---|
| 224 |
<li>You don't have to handle globals or cell references within the |
|---|
| 225 |
lambda--when the lambda gets wrapped in an Expression, all free |
|---|
| 226 |
variables are converted to constants.</li> |
|---|
| 227 |
<li>You aren't <i>forced</i> to handle every possible operator, function, |
|---|
| 228 |
or term in SQL. The base SQLDecompiler doesn't; when it encounters a |
|---|
| 229 |
function it can't handle, for example, it punts by flagging the SQL |
|---|
| 230 |
as <i>imperfect</i>. This signals the Storage Manager to run each |
|---|
| 231 |
Unit through the lambda (in pure Python) before yielding it back to |
|---|
| 232 |
the caller. In fact, you can start writing your Storage Manager |
|---|
| 233 |
without a decompiler at all! Just return all stored Units of the |
|---|
| 234 |
given class and use the Expression to filter whole Units. Then, |
|---|
| 235 |
when your SM works, add a decompiler.</li> |
|---|
| 236 |
</ol> |
|---|
| 237 |
|
|---|
| 238 |
|
|---|
| 239 |
<h5>Database/Table/IndexSet</h5> |
|---|
| 240 |
<p>You'll need a subclass of <tt>geniusql.Database</tt>. Override the |
|---|
| 241 |
container methods (like <tt>__setitem__</tt> and <tt>__delitem__</tt>). |
|---|
| 242 |
For most popular databases, these are pretty straightforward. Some notes:</p> |
|---|
| 243 |
<ul> |
|---|
| 244 |
<li><b>select/where</b>: If no Expression is supplied, return all Units. |
|---|
| 245 |
Otherwise, use a decompiler to produce SQL which you can then use |
|---|
| 246 |
to grab Unit data from storage. Use each row to populate a Unit |
|---|
| 247 |
(use an Adapter for type coercion), and yield each Unit back to the |
|---|
| 248 |
caller. In general, it's faster to slurp all the data in at once |
|---|
| 249 |
than to make a separate call for each row.</li> |
|---|
| 250 |
<li><b>_get_tables/_columns/_indices</b>: use your database's |
|---|
| 251 |
schema-inspection tools to tell Dejavu the names, datatypes, |
|---|
| 252 |
and other metadata that actually exists in each deployed |
|---|
| 253 |
database.</li> |
|---|
| 254 |
</ul> |
|---|
| 255 |
|
|---|
| 256 |
<p>Database SM's should also define the methods <tt>create_database()</tt> |
|---|
| 257 |
and <tt>drop_database()</tt>, if possible.</p> |
|---|
| 258 |
|
|---|
| 259 |
<p>Use <tt>dejavu/test/zoo_fixture.py</tt> to test your new Storage |
|---|
| 260 |
Manager. Copy one of the (<i>very</i> short) test_store* modules for the |
|---|
| 261 |
other SM's, and make the necessary changes for your SM. All of the heavy |
|---|
| 262 |
lifting of the tests is done in zoo_fixture.</p> |
|---|
| 263 |
|
|---|
| 264 |
<h4>Legacy Database Wrappers</h4> |
|---|
| 265 |
<p>Sometimes you do not have complete control over the database you want to |
|---|
| 266 |
reference. In that case, you should probably still write a custom Storage |
|---|
| 267 |
Manager, Adapters, and a Decompiler. Often, you can get away with providing |
|---|
| 268 |
a simple column-to-Unit mapping to use as you decompile. I've built one, for |
|---|
| 269 |
example, to wrap <a href='http://www.blackbaud.com/solutions/raisersedge.asp'>The |
|---|
| 270 |
Raiser's Edge</a> (third-party fundraising software). My Dejavu model manages |
|---|
| 271 |
directory records and income without regard for the underlying database; |
|---|
| 272 |
a custom Storage Manager maps between that ideal model and the Raiser's |
|---|
| 273 |
Edge API. This allows me to integrate data from RE with our custom |
|---|
| 274 |
inventory, invoice, and scheduling software.</p> |
|---|
| 275 |
|
|---|
| 276 |
<p>One of the more important parts of wrapping existing tables is getting |
|---|
| 277 |
your pretty Python names mapped to ugly database names. Do this by making |
|---|
| 278 |
a custom Database: override the <tt>_column_name</tt> and |
|---|
| 279 |
<tt>table_name</tt> methods to do the mapping.</p> |
|---|
| 280 |
|
|---|
| 281 |
<h4>Other Serialization Mechanisms</h4> |
|---|
| 282 |
<h5>sockets</h5> |
|---|
| 283 |
<p>There's a <tt>sockets</tt> module in the <tt>storage</tt> package. |
|---|
| 284 |
It does simple serialization of Units across a socket, so you can run |
|---|
| 285 |
Dejavu in its own process, separate from your front end. I had to do this |
|---|
| 286 |
with a third-party database, which couldn't handle web-traffic threading |
|---|
| 287 |
models. Here's a snippet of how to use it (from that app): |
|---|
| 288 |
<pre>def query(self, cmd, unitType='', data=None): |
|---|
| 289 |
if isinstance(data, dejavu.Unit): |
|---|
| 290 |
data = stream(data) |
|---|
| 291 |
elif data is None: |
|---|
| 292 |
data = '' |
|---|
| 293 |
else: |
|---|
| 294 |
data = pickle.dumps(data) |
|---|
| 295 |
response = self.socket.query(":".join((cmd, unitType, data))) |
|---|
| 296 |
return response |
|---|
| 297 |
</pre> |
|---|
| 298 |
</p> |
|---|
| 299 |
|
|---|
| 300 |
</body> |
|---|
| 301 |
</html> |
|---|