Introduction

Dejavu is a thread-safe Object-Relational Mapper for Python applications. It is designed to provide the "Model" third of an MVC application. When you build an application using Dejavu, you must supply the Controller(s) and View(s) yourself. Dejavu does not provide these, and does its best to not limit your choices regarding them.

If you're familiar with Martin Fowler's work [1], you can think of Dejavu as providing a Data Source layer, plus the tools to write your own Domain layer. For the Presentation layer, you're on your own. ;) It primarily uses a generic Data Mapper architecture (as opposed to the more tightly-coupled Active Record architecture).

Basic Structure

Developers build their Model by creating classes which subclass dejavu.Unit; in RDBMS terminology, each Unit subclass corresponds to a table; instances of the class correspond to the rows. Each subclass possesses a set of attributes known as "properties", which you can think of as columns in your database table. These attributes are generally formed from a UnitProperty descriptor. Any Unit data which needs to be persisted ought to be contained in a Unit Property. However, Unit classes can also possess arbitrary methods and attributes which aid their use within the application.

Unit classes can be associated to other Unit classes. This means that one of the properties of UnitA maps to one of the properties of UnitB. Related objects may then be looked up more easily.

Units are managed in memory by Sandbox objects, which function as "Identity Maps" [1]: in-memory caches of Units which keep commit conflicts to a minimum. Unit objects can be "memorized" and "recalled" from a Sandbox, using pure Python lambda expressions [2] as a query language. The lambda is wrapped in an Expression object to make it portable.

Sandboxes persist Unit data by StorageManager objects. Each persistence mechanism has its own subclass of the StorageManager class; for example, persisting Unit data to a Microsoft SQL Server database requires a StorageManagerADO_SQLServer object. When recalling data, Storage Managers receive Expression objects; database SM's, for example, will typically examine these Expressions and produce SQL statements from them, which they then use to retrieve data. Storage Managers also handle the creation of new Units, and their destruction.

Finally, Dejavu provides a core Arena class which you should be able to leverage for any sort of application you are building. The Arena object functions as a top-level "Application" object, collecting the global settings for an application into one place. It doles out Sandboxes, maintains a registry of Units and their associations, and manages startup and shutdown operations.

Simple Example

Since a block of code is often worth a thousand words, here's a minimal example of a Dejavu application:

zookeeper.py
import dejavu

class Zoo(dejavu.Unit):
    Name = dejavu.UnitProperty()
    Size = dejavu.UnitProperty(int)
    
    def total_legs(self):
        return sum([x.Legs for x in self.Animal()])

class Animal(dejavu.Unit):
    Legs = dejavu.UnitProperty(int, default=4)

Animal.set_properties({"Name": unicode,
                       "ZooID": int,
                       })
Animal.many_to_one('ZooID', Zoo, 'ID')

# Set up a global Arena object.
arena = dejavu.Arena()
conf = {u'Connect': r"PROVIDER=MICROSOFT.JET.OLEDB.4.0;DATA SOURCE=C:\zookeeper.mdb;"}
arena.add_store("main", "access", conf)
arena.register_all(globals())

The above creates the model for the zookeeper application. There are three basic things happening:

  1. The Zoo and Animal classes, which subclass dejavu.Unit. These will correspond to the Zoo and Animal tables within the database. Notice the two different methods of declaring Unit properties. Each class also inherits an 'ID' property (an int) from dejavu.Unit.
  2. The association between the Animal class and the Zoo class (many-to-one).
  3. The setup of a dejavu Arena object, including a Storage Manager which uses a Microsoft Access (Jet) database.

Here's a simple interactive session which uses the above (assume that tables have been created and populated elsewhere):

>>> import zookeeper
>>> box = zookeeper.arena.new_sandbox()
>>> box.recall(zookeeper.Animal)
[<zookeeper.Animal object at 0x013281F0>, <zookeeper.Animal object at 0x01328150>,
 <zookeeper.Animal object at 0x01328130>, <zookeeper.Animal object at 0x01328230>]
>>> box.recall(zookeeper.Zoo)
[]
>>> zoo = zookeeper.Zoo(Name='San Diego Zoo', Size='38')
>>> box.memorize(zoo)
>>> zoo.ID
1
>>> box.unit(zookeeper.Zoo, ID=1) is zoo
True
>>> for creature in box.recall(zookeeper.Animal):
        zoo.add(creature)
>>> len(zoo.Animal())
4

Design Goals

Dejavu is designed to function in environments with complex integration needs, and tends to separate concerns as much as possible. In particular, Dejavu tries to avoid making decisions in the framework which are better left to developers. Some of those decisions are:

In the same way, Dejavu tries to avoid having developers make decisions which are better left to deployers. Some of those decisions are:

Unlike most generic storage wrappers, Dejavu does not require you to have complete control of your back end. For example, consider Mission Control, the first application built on Dejavu. Mission Control required an ORM which transparently supported two very different backends. Half of the data was to be stored in an MS Access database, over which the application developers had full control. But half of the data was stored in a third-party application, "The Raiser's Edge" (RE) from Blackbaud. RE provides read-only database access; all writes must go through their object-oriented API. Further, reading via that API was found to be too slow. Therefore, a custom Storage Manager (about 2500 lines of code) was developed, which searches for and loads objects via SQL, but writes Unit data via the REAPI. Dejavu allows the application logic to be completely ignorant of this complex mass of storage details. If Blackbaud closed its doors tomorrow, the solution could be quickly migrated to another data store; business downtime is reduced in the face of inevitable change.

Obtaining and Installing

You can obtain Dejavu from its Subversion repository at http://projects.amor.org/dejavu/svn/trunk. Dejavu is designed to be installed in site-packages/dejavu or some other root python path.

Dejavu was built using Python 2.3.2. You should probably use at least 2.3; Dejavu depends upon the datetime module. Although Dejavu supports additional modules like fixedpoint and decimal, it does not require them.

Dejavu uses bytecode hacks, and therefore requires CPython [2].

Compared To Other Database Wrappers

SQLObject

No matter what project I start on, odds are I'll discover that Ian Bicking has already done the same thing, usually better.
See http://blog.ianbicking.org/another-less-sleepy-alternative-to-hibernate.html
Which was a reply to Ruby's ActiveRecord: http://www.loudthinking.com/arc/000297.html
Which was a reply to Java's Hibernate: http://informit.com/guides/content.asp?g=java&seqNum=127&f1=rss

Using dejavu, the application developer supplies the following code to define the Units and their relationships:

from dejavu import *
import fixedpoint   # or decimal, for Python 2.4+
import datetime

class Book(Unit):
    # The ID field is already set to 'int' for all Unit subclasses.
    title = UnitProperty(str)
    price = UnitProperty(fixedpoint.Fixedpoint)
    publishDate = UnitProperty(datetime.datetime)
    publisher = UnitProperty(int)
    
    def addAuthor(self, author):
        a = Authorship(authorID=author.ID, bookID=self.ID)
        self.sandbox.memorize(a)
    
    def author_names(self):
        names = []
        for authorship in self.Authorship():
            author = authorship.Author()
            if author:
                names.append(author.name)
        return u', '.join(names)

class Publisher(Unit):
    name = UnitProperty(str)

class Author(Unit):
    name = UnitProperty(str)

class Authorship(Unit):
    authorID = UnitProperty(int)
    bookID = UnitProperty(int)

Book.many_to_one('publisher', Publisher, 'ID')
Authorship.many_to_one('bookID', Book, 'ID')
Authorship.many_to_one('authorID', Author, 'ID')

arena = Arena()
arena.register_all(globals())

The deployer would write in a .conf file:

[Books]
Class: dejavu.storage.storepypgsql.StorageManagerPgSQL
Connect: host=localhost dbname=bookstore user=postgres password=****

To create the tables:

for cls in (Author, Publisher, Book):
    arena.create_storage(cls)

The app developer's runtime code reads as follows:

box = arena.new_sandbox()
ppython = Book(title='Programming Python', price=20,
               publishDate=datetime.datetime(2001, 3, 1))
# This next line is redundant; all properties default to None.
# But explicitness is rarely a bad thing.
ppython.publisher = None
box.memorize(ppython)

print ppython.title # output: 'Programming Python'

mlutz = Author(name = 'Mark Lutz')
box.memorize(mlutz) # give mlutz an ID
ppython.addAuthor(mlutz)

print len(ppython.Authorship()) # output: 1
print ppython.author_names() # output: 'Mark Lutz'

oreilly = Publisher(name="O'Reilly")
box.memorize(oreilly) # give oreilly an ID

ppython.publisher = oreilly.ID
print ppython.Publisher().name # output: "O'Reilly"

print len(oreilly.Book()) # output: 1

print 'Hi,', oreilly.Book().author_names() # output: "Hi, Mark Lutz"


[1] Fowler, Patterns of Enterprise Application Architecture.
[2] Dejavu relies upon bytecode hacking to achieve its clean lambda syntax for data queries. Therefore, it is CPython-specific. In addition, the bytecode of Python may change from one version of Python to another; if you find your version of Python does not work with Dejavu's codewalk and logic modules, please let me know.