The Dejavu Object-Relational Mapper
Abstract
--------
Dejavu is a framework for developing and deploying applications with
persistent storage.
CONTENTS
========
1. Introduction
    1. Purpose
    2. Requirements
    3. Terminology
    4. Overview
2. Models
    1. Units
    2. Unit Properties
    3. Unit Associations
    4. Unit Joins
3. Storage
    1. Data Definition
    2. Registration
    3. Data Manipulation
    4. Transactions
    5. Environment
    6. Conflicts
4. Sandboxes
5. Partitioning
1. Introduction
Dejavu is an Object-Relational Mapper system for Python applications.
1.1 Purpose
This specification defines the composition and interaction of Dejavu
components.
1.2 Requirements
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119.
See http://www.ietf.org/rfc/rfc2119.txt.
An implementation is not compliant if it fails to satisfy one or more of
the MUST or REQUIRED level requirements for the protocols it implements.
An implementation that satisfies all the MUST or REQUIRED level and all the
SHOULD level requirements for its protocols is said to be "unconditionally
compliant"; one that satisfies all the MUST level requirements but not all
the SHOULD level requirements for its protocols is said to be "conditionally
compliant."
1.3 Terminology
persistent
The quality of retaining state across an environmental boundary
(scope). Such scopes include, but are not limited to, process,
thread, request, and session scopes.
unit
A Python object instance with data (properties) that should be
persisted across scope boundaries.
property
An attribute of a unit which should be persisted.
store
A persistent repository for unit data.
StorageManager
An adapter which provides a Dejavu interface for a store.
expression
A series of stateless operations which returns a value. In Dejavu's
logic implementation, expressions always return True or False.
Expressions are used to query stores and filter the results.
1.4 Overview
Dejavu works by hiding the vagaries of various databases (and their even
more capricious Python interfaces) behind a single interface that talks
in terms of "storage" instead of "databases". Databases are simply one
of many means of persisting data between requests, threads, or processes.
By using a generic storage vocabulary and pure Python syntax, developers
can create, query, and manage persistable objects as easily as they do
any others.
2. Models
2.1 Units
Units are defined by classes. Data attributes which are to be persisted
MUST be attributes of the class, and MUST be Unit Properties (section 2.2).
A Unit class MUST contain the following attributes:
2.1.1 identifiers
Unit classes MUST possess an "identifiers" attribute, a tuple of strings
where each string is the name of an existing UnitProperty for that Unit
class.
2.1.2 identity
All Unit classes MUST possess an "identity" method which takes no arguments.
It MUST return a tuple containing the unique identity of the Unit instance.
In addition, Unit classes MAY possess other attributes and methods which
consumers can use to obtain and set calculated and related data.
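As an illustration only, the two required attributes might look like the following minimal sketch. The UnitProperty descriptor shown here is a hypothetical stand-in (section 2.2 does not yet define it), and the Person class and its property names are invented for the example.

```python
class UnitProperty:
    """Hypothetical stand-in for a Unit Property (see section 2.2)."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, unit, owner=None):
        if unit is None:
            return self
        return unit.__dict__.get(self.name)

    def __set__(self, unit, value):
        unit.__dict__[self.name] = value


class Person:
    """Example Unit class; the class and property names are illustrative."""

    # Persisted data is declared on the class as Unit Properties.
    ID = UnitProperty()
    Name = UnitProperty()

    # 2.1.1: a tuple of strings, each naming an existing UnitProperty.
    identifiers = ("ID",)

    def identity(self):
        # 2.1.2: takes no arguments and returns a tuple of identifier values.
        return tuple(getattr(self, name) for name in self.identifiers)


p = Person()
p.ID = 42
p.Name = "Ada"
print(p.identity())  # (42,)
```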
2.2 Unit Properties
2.3 Unit Associations
3. Storage
Dejavu depends on objects which conform to the StorageManager API. Although
every StorageManager MUST provide the methods below, it is understood that
different implementations may be forced to do so inefficiently, due either
to impedance mismatch between the Dejavu interface and the underlying storage
interface, or to limitations of the underlying storage. However, such
limitations do not absolve the conforming StorageManager from implementing
any of the requirements. In the case where a given implementation is known
to perform some operations slowly, it SHOULD emit a StorageWarning
to that effect.
3.1 Data Definition
All StorageManagers MUST possess the following methods which manipulate
the underlying data structures, bringing them into line with a Dejavu
model.
3.1.1 create_database(conflicts='error')
StorageManagers MUST possess a "create_database" method which prepares a
persistent container for all the StorageManager's contained Unit classes
(see section 3.2 Registration). For some implementations, this MAY be a
no-op (that is, pass and do nothing); however, if the deployer is required
to perform manual steps to prepare the back end (because such steps are not
scriptable), this method MUST raise NotImplementedError when called, and SHOULD
provide a short message describing how to manually prepare the container.
If the given backend has already been prepared, this method MUST resolve
the conflict according to the 'conflicts' argument (see section 3.6).
This method SHOULD NOT perform the same work as the create_storage method
(see section 3.1.3). That is, this method SHOULD prepare the container as
a whole, and SHOULD NOT attempt to iterate over the items to be placed in
that container.
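The manual-preparation case might be sketched as follows. This is a hypothetical fragment, not a Dejavu class: the class name and message text are invented, and the sketch uses Python's built-in NotImplementedError exception on the assumption that this is the exception the text intends (the NotImplemented constant itself cannot be raised).

```python
class ManualSetupStore:
    """Hypothetical StorageManager whose back end cannot be scripted."""

    def create_database(self, conflicts="error"):
        # The message tells the deployer how to prepare the container by hand.
        raise NotImplementedError(
            "Create the database manually using the vendor's "
            "administration tool."
        )
```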
3.1.2 drop_database(conflicts='error')
StorageManagers MUST possess a "drop_database" method which destroys any
persistent storage for the contained Unit classes. If any contained data
needs to be dropped before the database itself can be dropped, this method
MUST do so instead of erroring. If the deployer is required to perform
manual steps to drop the back end data or structures (because such steps
are not scriptable), this method MUST raise NotImplementedError when called,
and SHOULD provide a short message describing how to manually drop the
database. If the given database has already been dropped or does not
otherwise exist, this method MUST resolve the conflict according to the
'conflicts' argument (see section 3.6).
3.1.3 create_storage(cls, conflicts='error')
StorageManagers MUST possess a "create_storage" method which prepares a
persistent container for a single Unit class, given as the only positional
argument. For some implementations, this MAY be a no-op (that is, pass and
do nothing); however, if the deployer is required to perform manual steps
to prepare the back end (because such steps are not scriptable), this
method MUST raise NotImplementedError when called, and SHOULD provide a short
message describing how to manually prepare the container.
If the store has already been prepared, this method MUST resolve the
conflict according to the 'conflicts' argument (see section 3.6).
3.1.4 has_storage(cls)
StorageManagers MUST possess a "has_storage" method which, when called with
a Unit class as the only positional argument, returns True if a persistent
container already exists for that class, False otherwise.
3.1.5 drop_storage(cls, conflicts='error')
StorageManagers MUST possess a "drop_storage" method which destroys a
persistent container for a single Unit class, given as the only positional
argument. If any contained data needs to be dropped before the container
itself can be dropped, this method MUST do so instead of erroring. If the
deployer is required to perform manual steps to drop the back end data or
structures (because such steps are not scriptable), this method MUST raise
NotImplementedError when called, and SHOULD provide a short message describing
how to manually drop the storage. If the given container has already been
dropped or does not otherwise exist, this method MUST resolve the conflict
according to the 'conflicts' argument (see section 3.6).
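The per-class container methods of sections 3.1.3 through 3.1.5 can be illustrated with a minimal in-memory sketch. MemoryStore and its internal dictionary are hypothetical, MappingError is a local stand-in for the exception named in section 3.6, and only the 'error' mode is distinguished; every other mode is treated like 'ignore' for brevity. The registration check of section 3.2.1 is also omitted.

```python
class MappingError(Exception):
    """Stand-in for the MappingError named in section 3.6."""


class MemoryStore:
    """Hypothetical in-memory StorageManager fragment."""

    def __init__(self):
        # Maps each Unit class to its container (identity tuple -> data).
        self._containers = {}

    def has_storage(self, cls):
        return cls in self._containers

    def create_storage(self, cls, conflicts="error"):
        if self.has_storage(cls):
            # Simplification: anything other than 'error' is ignored.
            if conflicts == "error":
                raise MappingError("storage already exists for %r" % (cls,))
            return
        self._containers[cls] = {}

    def drop_storage(self, cls, conflicts="error"):
        if not self.has_storage(cls):
            if conflicts == "error":
                raise MappingError("no storage exists for %r" % (cls,))
            return
        # Contained data is dropped along with the container.
        del self._containers[cls]
```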
3.1.6 add_property(cls, name, conflicts='error')
StorageManagers MUST possess an "add_property" method which modifies any
existing persistent containers for the given Unit class, adding a new
property whose name is given as the second positional argument. The name
MUST reference an existing and available UnitProperty instance of the
given class (so that types and hints can be drawn from that UnitProperty).
For some implementations, this MAY be a no-op (that is, pass and do
nothing); however, if the deployer is required to perform manual steps
to prepare the back end (because such steps are not scriptable), this
method MUST raise NotImplementedError when called, and SHOULD provide a short
message describing how to manually add the property.
If the given store already possesses the given property, this method MUST
resolve the conflict according to the 'conflicts' argument (see section 3.6).
3.1.7 has_property(cls, name)
StorageManagers MUST possess a "has_property" method which, when called
with a Unit class and UnitProperty name as positional arguments,
returns True if the back end already possesses data structures for
persisting UnitProperties for that class and name, False otherwise.
3.1.8 drop_property(cls, name, conflicts='error')
StorageManagers MUST possess a "drop_property" method which modifies
any persistent containers for a single Unit class, removing the
named property and destroying any data referenced by that property.
If any contained data needs to be dropped before the property itself
can be dropped, this method MUST do so instead of erroring. If the
deployer is required to perform manual steps to drop the back end data
or structures (because such steps are not scriptable), this method MUST
raise NotImplementedError when called, and SHOULD provide a short message
describing how to manually drop the property. If the given property
has already been dropped or does not otherwise exist, this method
MUST resolve the conflict according to the 'conflicts' argument
(see section 3.6).
3.1.9 rename_property(cls, oldname, newname, conflicts='error')
StorageManagers MUST possess a "rename_property" method which modifies
any persistent containers for a single Unit class, renaming the
property named by 'oldname' to 'newname'. This method MUST NOT
delete any data existing under the old name; instead, that data
MUST be available using the new name.
If the store already possesses structures under the new name, or if
the data cannot be reliably moved from the old name to the new name,
this method MUST resolve the conflict according to the 'conflicts'
argument (see section 3.6).
3.1.10 add_index(cls, name, conflicts='error')
StorageManagers MUST possess an "add_index" method which modifies any
existing persistent containers for the given Unit class, adding a new
index if possible on the property whose name is given as the second
positional argument. The name MUST reference an existing and available
UnitProperty instance of the given class (so that types and hints can
be drawn from that UnitProperty).
Indexes, however, are an optimization, and not all StorageManagers are
required to use them. Those which do not MAY choose to do nothing when
this method is called. However, they MUST record the request to add an
index in order that they will reply correctly to the has_index method
below.
If the deployer is required to perform manual steps to prepare the back
end (because such steps are not scriptable), this method MUST raise
NotImplementedError when called, and SHOULD provide a short message describing
how to manually add the index.
If the given index already exists, this method MUST resolve the conflict
according to the 'conflicts' argument (see section 3.6).
3.1.11 has_index(cls, name)
StorageManagers MUST possess a "has_index" method which, when called with a
Unit class and UnitProperty name as positional arguments, returns True if
the back end already possesses an index for UnitProperties for that class
and name, False otherwise.
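For a back end without native indexes, the "record the request" rule of section 3.1.10 might be satisfied as in this sketch. NoIndexStore is a hypothetical name, and conflict handling and registration checks are left out to keep the example short.

```python
class NoIndexStore:
    """Hypothetical StorageManager whose back end has no native indexes."""

    def __init__(self):
        # Remember which (class, property name) pairs were requested.
        self._indexes = set()

    def add_index(self, cls, name, conflicts="error"):
        # No physical index is built; only the request is recorded.
        self._indexes.add((cls, name))

    def has_index(self, cls, name):
        return (cls, name) in self._indexes
```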
3.1.12 drop_index(cls, name, conflicts='error')
StorageManagers MUST possess a "drop_index" method which modifies any
persistent containers for a single Unit class, removing any single index
on the named property; indexes on multiple properties are not affected by
this method. If the given index has been dropped or does not otherwise
exist, or its parent property has been dropped or does not otherwise exist,
this method MUST resolve the conflict according to the 'conflicts' argument
(see section 3.6).
3.2 Registration
3.2.1 classes
StorageManagers MUST possess a "classes" attribute, a set of zero or more
Unit classes which the StorageManager is capable of storing. When any method
is attempted on a StorageManager which takes one or more Unit classes as
an argument, that method MUST raise KeyError if any of the supplied classes
are not registered with the StorageManager.
3.2.2 register(cls)
StorageManagers MUST possess a "register" method. When called with a single
Unit class as the only positional argument, this method MUST add the given
class to its "classes" set.
3.2.3 register_all(globals)
StorageManagers MUST possess a "register_all" method. When called with a
dictionary as the only positional argument, this method MUST add each
subclass of Unit which is contained in the dict's values() to its
"classes" set. The method SHOULD return a list of the Unit classes
which were registered during the call.
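A minimal sketch of register and register_all follows. The Unit base class here is an empty stand-in, and Registry, Zoo, and Animal are invented names. Whether the Unit base class itself should be registered when it appears in the dictionary is not specified; this sketch registers anything for which issubclass() is true.

```python
class Unit:
    """Stand-in base class; the real Unit base class is not defined here."""


class Registry:
    """Hypothetical fragment of a StorageManager's registration API."""

    def __init__(self):
        self.classes = set()

    def register(self, cls):
        self.classes.add(cls)

    def register_all(self, globals_):
        registered = []
        for obj in globals_.values():
            # Keep only classes derived from Unit; skip everything else.
            if isinstance(obj, type) and issubclass(obj, Unit):
                self.register(obj)
                registered.append(obj)
        return registered


class Zoo(Unit):
    pass


class Animal(Unit):
    pass


sm = Registry()
found = sm.register_all({"Zoo": Zoo, "Animal": Animal, "answer": 42})
```

In practice a caller would typically pass a module's globals() dictionary, which is why non-class values must be tolerated.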
3.2.4 class_by_name(classname)
StorageManagers MUST possess a "class_by_name" method. When called with a
single Unit class name as the only positional argument, this method MUST
return the Unit class for the given name (unless no Unit class of that
name has been registered with the StorageManager, in which case this
method MUST raise KeyError).
3.2.5 map(classes, conflicts='error')
StorageManagers MUST possess a "map" method. When called with a sequence of
Unit classes as the first positional argument, this method SHOULD attempt
to find or create pathways between the given classes and any necessary data
structures the implementation requires for other StorageManager calls to
succeed. For example, a database StorageManager might attempt to find
database tables, columns, and indexes which match each given Unit class.
If discrepancies exist between the given Unit classes and the underlying
data structures, this method MUST resolve the conflict according to the
'conflicts' argument (see section 3.6).
3.2.6 map_all(conflicts='error')
StorageManagers MUST possess a "map_all" method. When called, this method
SHOULD attempt to find or create pathways between all registered classes
and any necessary data structures the implementation requires for other
StorageManager calls to succeed.
If discrepancies exist between the registered Unit classes and the underlying
data structures, this method MUST resolve the conflict according to the
'conflicts' argument (see section 3.6).
3.3 Data Manipulation
3.3.1 reserve(unit)
StorageManagers MUST possess a "reserve" method, which reserves space in
storage for the supplied Unit. If any of the given Unit's identifiers (see
2.1.1 identifiers) are None, the StorageManager MAY attempt to provide
identifiers for the Unit instance during this call. StorageManagers
designed as "leaf nodes" (those which do not call other StorageManagers)
SHOULD do so.
3.3.2 save(unit, forceSave=False)
StorageManagers MUST possess a "save" method, which stores the unit's
property values in a persistent fashion. As with all Dejavu stores,
the degree of "persistence" is up to the implementation. This method
MUST examine the Unit's "dirty" attribute; if True, the method MUST
copy the value of each UnitProperty to storage. In order to avoid
synchronization problems, this operation MUST NOT store references
to each datum; instead, it MUST store copies (or serialized versions)
of each datum.
If unit.dirty is False, this method SHOULD NOT waste time attempting
to save data which has not been modified, unless the optional "forceSave"
argument is True, in which case this method MUST update storage for all
UnitProperties of the given Unit.
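The interplay of "dirty", "forceSave", and the copy requirement can be sketched as below. MemorySaver and Note are hypothetical, the "properties" tuple listing property names is an assumption made for the example, and resetting "dirty" to False after a save is a design choice of this sketch rather than something the text requires.

```python
import copy


class MemorySaver:
    """Hypothetical in-memory store illustrating save() semantics."""

    def __init__(self):
        self._rows = {}  # (class, identity tuple) -> dict of copied values

    def save(self, unit, forceSave=False):
        if not (unit.dirty or forceSave):
            return  # Nothing has changed; skip the write.
        key = (type(unit), unit.identity())
        # Deep-copy each value so that later mutation of the unit's data
        # cannot silently alter what is held in storage.
        self._rows[key] = {
            name: copy.deepcopy(getattr(unit, name))
            for name in unit.properties
        }
        unit.dirty = False  # Assumption: a saved unit is no longer dirty.


class Note:
    """Illustrative unit; 'properties' is a made-up helper attribute."""

    properties = ("ID", "Tags")
    identifiers = ("ID",)

    def __init__(self, ID, Tags):
        self.ID, self.Tags, self.dirty = ID, Tags, True

    def identity(self):
        return tuple(getattr(self, name) for name in self.identifiers)
```

Storing a reference instead of a copy would let a later "note.Tags.append(...)" change the stored value without any call to save, which is the synchronization problem the requirement guards against.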
3.3.3 destroy(unit)
StorageManagers MUST possess a "destroy" method which deletes all stored
values for the unit. This method SHOULD also free any resources used by
the unit, such as disk space and RAM, although that may be done lazily.
3.3.4 xrecall(classes, expr=None, order=None, limit=None, offset=None)
StorageManagers MUST possess an "xrecall" method, which returns an iterable
of Units of the given class(es) which match the given expression.
If the 'classes' argument is a UnitJoin (itself possibly containing nested
UnitJoin instances (see 2.4 Unit Joins)), this method MUST return an iterable
of lists, where each yielded list contains a set of Unit instances which
together match the expression. The order of Unit instances in each yielded
list MUST be the same as the order of classes in the UnitJoin; that is,
the same order as [cls for cls in UnitJoin].
The expression MUST be an instance of logic.Expression, a lambda, a dict, or
None. If None, all Units will be returned. If the underlying storage has no
native means of perfectly applying the expression, but is able to return a
superset of matching Units (perhaps ALL of them), it MUST notify the xrecall
method that this is the case, and the xrecall method MUST test expr(unit) before
yielding any retrieved Unit instances to the caller. If the superset is large,
the StorageManager SHOULD emit a StorageWarning. The order of arguments in
a lambda MUST be the same as the order of classes in the UnitJoin; that is,
the same order as list(UnitJoin).
The returned iterable is required to be accessible lazily; however,
whether the data is retrieved from storage lazily or not is undefined
and implementation-specific.
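When the back end can only return a superset, the required post-filtering can be done lazily with a generator. The following is a minimal sketch for a single class and a lambda expression; the function name, the User class, and its Age property are all invented for illustration.

```python
def xrecall_from_superset(candidates, expr=None):
    """Lazily yield only the units that satisfy expr.

    'candidates' is any iterable of units that the back end could not
    filter precisely (a superset of the true matches).
    """
    for unit in candidates:
        if expr is None or expr(unit):
            yield unit


class User:
    def __init__(self, Age):
        self.Age = Age


everyone = [User(17), User(30), User(65)]
adults = xrecall_from_superset(everyone, lambda u: u.Age >= 18)
print([u.Age for u in adults])  # [30, 65]
```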
The "order" argument MUST be one of the following:
* None, in which case the sort order is undefined,
* A sequence of property names for a single Unit class, or a sequence of
such sequences if the 'classes' argument is a UnitJoin; each name MAY
include a " ASC" or " DESC" direction indicator suffix. The order of
name-sequences MUST be the same as the order of classes in the UnitJoin.
* An Expression or lambda which returns a tuple or list of UnitProperties;
for example, "lambda user, address: [reversed(user.Age), address.City]".
The "reversed" function is used to indicate a descending sort order in
this case. The order of arguments in the function signature MUST be the
same as the order of classes in the UnitJoin.
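For the name-sequence form of "order" on a single Unit class, an in-memory StorageManager might sort as in this sketch. The helper name and the User class are hypothetical, and the sketch assumes that no property value is None and that values of a given property are mutually comparable.

```python
def sort_units(units, order):
    """Sort units by property names such as ["City ASC", "Age DESC"]."""
    result = list(units)
    # Apply keys from last to first; Python's sort is stable, so the
    # first name in 'order' ends up as the primary sort key.
    for spec in reversed(order):
        name, _, direction = spec.partition(" ")
        descending = direction.strip().upper() == "DESC"
        result.sort(key=lambda u, n=name: getattr(u, n), reverse=descending)
    return result


class User:
    def __init__(self, City, Age):
        self.City, self.Age = City, Age


users = [User("Oslo", 30), User("Bern", 25), User("Oslo", 41)]
ordered = sort_units(users, ["City ASC", "Age DESC"])
print([(u.City, u.Age) for u in ordered])
# [('Bern', 25), ('Oslo', 41), ('Oslo', 30)]
```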
The "limit" argument MUST be None or a positive integer. None indicates no
limit; when "limit" is a positive integer, the StorageManager MUST return
a number of units equal to that number, or fewer if the set is exhausted.
The "offset" argument MUST be None or a positive integer. None indicates no
offset; when "offset" is a positive integer, the StorageManager MUST ignore
a number of units equal to that number before beginning to yield any units.
If no "order" argument is supplied, the output of successive calls with
limit or offset arguments is undefined.
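Where the back end cannot apply "limit" and "offset" natively, both can be applied lazily to any iterable of units. This is a minimal sketch using itertools.islice from the standard library; the helper name apply_window is invented.

```python
from itertools import islice


def apply_window(units, limit=None, offset=None):
    """Skip 'offset' units, then yield at most 'limit' units, lazily."""
    start = offset or 0
    stop = None if limit is None else start + limit
    return islice(units, start, stop)


print(list(apply_window(range(10), limit=3, offset=4)))  # [4, 5, 6]
```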
3.3.5 recall(classes, expr=None, order=None, limit=None, offset=None)
StorageManagers MUST possess a "recall" method. It MUST behave equivalently
to "list(self.xrecall(...))" with the same arguments.
3.3.6 xview(query, order=None, limit=None, offset=None, distinct=False)
StorageManagers MUST possess an "xview" method, which returns an
iterable of lists; each list containing a subset of data for Units
of the given classes which match the given expression. The "query" argument
MUST be either an instance of dejavu.Query, or a tuple of the three arguments
(relation, attributes, restriction) to construct such a Query object.
The "relation" MUST be specified as a Unit class or as an instance of UnitJoin.
The order of Unit instances in each yielded list MUST be the
same as the order of classes in the UnitJoin; that is, the same order
as [cls for cls in UnitJoin].
The "attributes" argument MUST be one of the following:
* If the relation is a single Unit, this value may be a sequence of
property names for that Unit.
* If the relation is a UnitJoin, this may be a sequence of sequences
of property names. The order of sequences MUST match the order of Unit
classes given in the relation.
* A lambda (or Expression) which returns the attributes as a sequence;
e.g. "lambda t1, t2: (t1.a, t1.b - now(), t1.c + t2.a)". This form
allows access to binary operations and builtin functions. The order of
arguments in the lambda signature MUST be the same as the order of
classes in the relation.
The "restriction" MUST be an instance of logic.Expression, a lambda, or None.
If None, all Units will be returned. If the underlying storage has no native
means of perfectly applying the restriction, but is able to return a superset
of matching Units (perhaps ALL of them), it MUST notify the xview method that
this is the case, and the xview method MUST test restriction(unit) before yielding
any retrieved Unit instances to the caller. Naturally, this requires that any
store returning a superset of Units MUST pass complete Units to the xview
method, in order to satisfy arbitrary restriction expressions. The order
of arguments in a lambda MUST be the same as the order of classes in the
UnitJoin.
The returned iterable is required to be accessible lazily; however,
whether the data is retrieved from storage lazily or not is undefined
and implementation-specific.
The "order" argument MUST be one of the following:
* None, in which case the sort order is undefined,
* A sequence of property names for a single Unit class, or a sequence of
such sequences if the relation is a UnitJoin; each name MAY
include a " ASC" or " DESC" direction indicator suffix. The order of
name-sequences MUST be the same as the order of classes in the UnitJoin.
* An Expression or lambda which returns a tuple or list of UnitProperties;
for example, "lambda user, address: [reversed(user.Age), address.City]".
The "reversed" function is used to indicate a descending sort order in
this case. The order of arguments in the function signature MUST be the
same as the order of classes in the UnitJoin.
The "limit" argument MUST be None or a positive integer. None indicates no
limit; when "limit" is a positive integer, the StorageManager MUST return
a number of units equal to that number, or fewer if the set is exhausted.
The "offset" argument MUST be None or a positive integer. None indicates no
offset; when "offset" is a positive integer, the StorageManager MUST ignore
a number of units equal to that number before beginning to yield any units.
If no "order" argument is supplied, the output of successive calls with
limit or offset arguments is undefined.
The "distinct" argument, if provided, MUST be True, False, or None. If True,
the returned Unit data MUST NOT contain duplicate sequences. If False or None,
no test for duplicates is performed.
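If the back end has no native equivalent of "distinct", duplicates can be removed lazily while preserving the order of first appearance. The sketch below is an assumption about one way to do this; it requires every value in a row to be hashable, and the helper name is invented.

```python
def distinct_rows(rows):
    """Yield each row once, preserving first-seen order."""
    seen = set()
    for row in rows:
        key = tuple(row)  # Lists are unhashable, so compare as tuples.
        if key not in seen:
            seen.add(key)
            yield row


rows = [[1, "a"], [2, "b"], [1, "a"]]
print(list(distinct_rows(rows)))  # [[1, 'a'], [2, 'b']]
```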
3.3.7 view(query, order=None, limit=None, offset=None, distinct=False)
StorageManagers MUST possess a "view" method. It MUST behave equivalently
to "list(self.xview(...))" with the same arguments.
3.3.8 unit(cls, **kwargs)
StorageManagers MUST possess a "unit" method, which returns a single
Unit of the given class which matches the given keyword arguments, or
None if no matching Unit can be found. Each keyword argument MUST be
applied as the expression predicate "...and getattr(unit, key) == value".
Proxy StorageManagers SHOULD propagate this call to the "unit" method
of their proxied store(s) wherever possible. This will allow key-value
stores (e.g. memcached) to optimize the call.
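The keyword-argument predicate can be sketched over an in-memory iterable as follows. The function name find_unit and the Pet class are hypothetical; a real StorageManager would normally push the comparison down to its back end instead of scanning.

```python
def find_unit(units, **kwargs):
    """Return the first unit whose attributes equal every keyword value."""
    for unit in units:
        if all(getattr(unit, key) == value for key, value in kwargs.items()):
            return unit
    return None


class Pet:
    def __init__(self, Name, Legs):
        self.Name, self.Legs = Name, Legs


pets = [Pet("Rex", 4), Pet("Tweety", 2)]
match = find_unit(pets, Legs=2)
print(match.Name)  # Tweety
```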
3.3.9 count(cls, expr=None)
StorageManagers MUST possess a "count" method, which returns the number of
Units of the given class which match the given expression. StorageManagers
SHOULD use an index, their own "view" method, or some native means of
acquiring this value without loading all of the data for all involved Units.
3.3.10 range(cls, attr, expr=None, **kwargs)
StorageManagers MUST possess a "range" method, which returns a sequence of
values for the named attribute of the given class. The sequence MUST consist
of distinct values and MUST NOT include None. It SHOULD be ordered and
continuous if the property type allows it.
If the given attribute is of a known discrete, ordered type (like int, long,
or datetime.date), this method MUST return the closed interval:
[min(values), ..., max(values)]
That is, all possible values will be output between min and max, even if
they do not appear in the dataset.
If the given attribute is not reasonably discrete (e.g., str, unicode, or
float) then all distinct, non-None values MUST be returned; the values SHOULD
be sorted, if possible.
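The two cases above might be sketched for in-memory values as follows. The helper name value_range is invented, only int and datetime.date are treated as discrete here (Python 3 has no separate long type), and the values are assumed to be of a single, mutually comparable type.

```python
import datetime


def value_range(values):
    """Return the range of a property's values as described above."""
    present = {v for v in values if v is not None}
    if not present:
        return []
    low, high = min(present), max(present)
    if all(type(v) is int for v in present):
        # Discrete integers: every value in the closed interval.
        return list(range(low, high + 1))
    if all(type(v) is datetime.date for v in present):
        # Discrete dates: one entry per day in the closed interval.
        days = (high - low).days
        return [low + datetime.timedelta(days=n) for n in range(days + 1)]
    # Not reasonably discrete: distinct values, sorted.
    return sorted(present)


print(value_range([3, None, 1, 3]))  # [1, 2, 3]
print(value_range(["b", "a", "b"]))  # ['a', 'b']
```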
3.4 Transactions
3.4.1 start
3.4.2 rollback
3.4.3 commit
3.5 Environment
3.5.1 shutdown(conflicts='error')
StorageManagers MUST possess a "shutdown" method which closes all connections
to underlying storage in preparation for closing the application process.
Some StorageManagers (such as in-memory databases) MAY interpret this event
as the boundary of their scope of persistence, and destroy all data.
If connections cannot be closed (e.g. they are locked by another caller),
this method MUST resolve the conflict according to the 'conflicts' argument
(see section 3.6).
3.5.2 version()
StorageManagers MUST possess a "version" method which returns a string
containing version information for either the StorageManager itself,
its underlying storage libraries and engines, or both.
3.5.3 log(message)
StorageManagers MUST possess a "log" method which takes a single "message"
argument. It is expected that deployers will provide their own log methods
in practice.
3.6 Conflicts
Each StorageManager method which takes a 'conflicts' argument SHOULD use
the value of the argument to resolve conflicts between the model and the
store as follows:
* If 'error', raise MappingError upon the first conflict.
* If 'warn', emit a StorageWarning instead of raising an error for each
issue, but attempt to proceed without changing the store or model.
* If 'repair', attempt to change the store to match the model. Not all
calls support this mode for all errors; any which do not support this
mode MUST raise MappingError instead.
* If 'ignore', silently ignore any conflicts.
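The four modes might be dispatched by a small helper such as the one sketched here. MappingError and StorageWarning are local stand-ins for the names used above, the "repair" callable is a hypothetical hook, issuing the warning through Python's standard warnings module is an assumption, and raising ValueError for an unrecognized mode is a choice of this sketch that the text does not address.

```python
import warnings


class MappingError(Exception):
    """Stand-in for the MappingError named above."""


class StorageWarning(UserWarning):
    """Stand-in for the StorageWarning named above."""


def resolve_conflict(message, conflicts="error", repair=None):
    """Handle one model/store conflict according to 'conflicts'.

    'repair' is an optional zero-argument callable that changes the
    store to match the model (hypothetical hook for this sketch).
    """
    if conflicts == "error":
        raise MappingError(message)
    elif conflicts == "warn":
        warnings.warn(message, StorageWarning)
    elif conflicts == "repair":
        if repair is None:
            # This call does not support repair for this conflict.
            raise MappingError(message)
        repair()
    elif conflicts == "ignore":
        pass
    else:
        raise ValueError("unknown conflicts mode: %r" % (conflicts,))
```

A method such as create_storage could then call resolve_conflict once per detected discrepancy, passing its own 'conflicts' argument through unchanged.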
4. Sandboxes
5. Partitioning