What's New in Dejavu 1.5 (RC 1)
This document is complete to rev [400].
Sandbox improvements
Using kwargs to recall
The sandbox methods unit, recall, xrecall, view, distinct, and count now all have signatures that end with (..., expr=None, **kwargs). If you supply expr, it may be an Expression object (as always) or, new in 1.5, a bare Python lambda, which will be wrapped in an Expression object for you. If you pass **kwargs, they will be wrapped up in an Expression via logic.filter(**kwargs). If you pass both, they will be combined as expr + filter (a logical "and").
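The combination rule can be modelled in plain Python. The sketch below is not Dejavu's implementation: combine_filters and Animal are hypothetical stand-ins, and it assumes that keyword arguments act as attribute equality tests. It only mirrors the described behaviour, where expr and the keyword filter are joined with a logical "and".

```python
# Plain-Python model of how expr and **kwargs combine; not Dejavu code.
# combine_filters and Animal are hypothetical names used for illustration.

class Animal(object):
    def __init__(self, Name, Legs):
        self.Name = Name
        self.Legs = Legs


def combine_filters(expr=None, **kwargs):
    """Return a predicate equal to expr "and" the keyword equality tests."""
    def predicate(unit):
        if expr is not None and not expr(unit):
            return False
        # Assumption: each keyword argument is an attribute equality test.
        return all(getattr(unit, k) == v for k, v in kwargs.items())
    return predicate


animals = [Animal('Emu', 2), Animal('Lion', 4), Animal('Lemur', 4)]

# Both a lambda and keyword arguments: a unit must satisfy both.
match = combine_filters(lambda x: x.Name.startswith('L'), Legs=4)
names = [a.Name for a in animals if match(a)]
```

With real Dejavu objects the equivalent call would presumably look like box.recall(Animal, lambda x: ..., Legs=4), following the signature described above.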
New methods
- New sandbox.range(cls, attr, expr, **kw) function. Returns the closed interval [min(attr), ..., max(attr)].
- New sandbox.sum(cls, attr, expr, **kw) function. Returns the sum of all non-None values for the given cls.attr.
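The aggregate semantics can be illustrated without a store. This is a plain-Python model, not Dejavu code: attr_sum, attr_bounds, and Exhibit are hypothetical names. How sandbox.range represents the interval it returns is not specified here, so the sketch only computes the two endpoints and assumes None values are skipped there as well.

```python
# Plain-Python model of the described aggregate semantics; not Dejavu code.
# attr_sum, attr_bounds and Exhibit are hypothetical names.

def attr_sum(units, attr):
    """Sum attr over units, skipping values that are None."""
    return sum(getattr(u, attr) for u in units
               if getattr(u, attr) is not None)


def attr_bounds(units, attr):
    """Return the (min, max) endpoints of the non-None values of attr."""
    values = [getattr(u, attr) for u in units
              if getattr(u, attr) is not None]
    return (min(values), max(values))


class Exhibit(object):
    def __init__(self, Acreage):
        self.Acreage = Acreage


exhibits = [Exhibit(3), Exhibit(None), Exhibit(7), Exhibit(5)]
total = attr_sum(exhibits, 'Acreage')
low, high = attr_bounds(exhibits, 'Acreage')
```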
Multiple and custom Associations
Relationships between Unit classes are declared inside each Unit class in an internal _associations dict. By default, the far class name is used as the dict key. For some time now, you could make multiple, custom associations by hand-entering other UnitAssociations into this dict under different keys; however, the storage layer did not have a way for you to declare you wanted to use these "custom association paths" when using multirecall. Now it does: when forming your UnitJoins for a multirecall, set the path attribute on each one to use an association other than the default. For example, a Country may have many Citizens which are represented by a default relationship (probably Citizen.CountryID = Country.ID); however, it may have a special relationship defined for its President (perhaps Country.PresidentID = Citizen.ID). The current president can be retrieved by explicitly creating the 'President' association once and then referring to it by name whenever needed:
    Country._associations['President'] = p = ToOne('PresidentID', Citizen, 'ID')
    p.nearClass = Country
    ...
    join = Country << Citizen
    join.path = 'President'
    pres = sandbox.multirecall(join)
Performance
Much attention was given to the performance of recall and friends. Unnecessary codewalks were eliminated, log and debug flags were externalized, adapter methods were memoized, and many loops were optimized.
In particular, the ADO code was heavily optimized, removing much of the COM dynamic overhead, and resulting in roughly a 700% speedup in fetch!
The logic.Expression object also got some love. If you are certain your lambda has no global or cell references, you can call logic.Expression(..., earlybind=False), which will skip a codewalk. This might be the case, for example, when reconstituting cached Expressions, or when creating Expressions in very controlled conditions.
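One way to check whether a lambda is free of global and cell references is to inspect its code object. The sketch below uses the standard dis module in modern Python 3; it is not Dejavu's codewalk (which targeted Python 2.x bytecode), and has_free_references is a hypothetical helper name.

```python
import dis


def has_free_references(func):
    """Return True if func reads any global or closure (cell) variable."""
    code = func.__code__
    # Closure variables are listed in co_freevars.
    if code.co_freevars:
        return True
    # Global (and builtin) lookups compile to LOAD_GLOBAL or LOAD_NAME.
    return any(ins.opname in ('LOAD_GLOBAL', 'LOAD_NAME')
               for ins in dis.get_instructions(code))


LIMIT = 10


def make_closure(limit):
    return lambda x: x.Size > limit


self_contained = lambda x: x.Size > 3      # only arguments and constants
uses_global = lambda x: x.Size > LIMIT     # reads the global LIMIT
uses_cell = make_closure(5)                # reads the cell variable limit
```

Only a lambda like self_contained would be a candidate for skipping early binding; note that calls to builtins such as len also count as global lookups.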
Encoding and coercion improvements
Ticket #45 showed that property values such as (u'äbc', ) were not being correctly stored. In a no-holds-barred smackdown, the encoding of stored values was analyzed, tested, fixed, escaped, and extended. The test suite now tests more mixed-type values against more native-store encodings. In particular, MySQL is now tested with the 'latin1' and 'utf8' encodings, and PostgreSQL with 'SQL_ASCII' and 'UNICODE' encodings.
All values retrieved from storage also now pass through Unit.coerce, which includes the quantization of decimal objects (according to hints['scale']). In addition, all UnitProperty defaults are now coerced when set.
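Quantizing a decimal to a given scale can be done with the standard decimal module. This is a minimal sketch of the idea, not the body of Unit.coerce; quantize_to_scale is a hypothetical helper, and the rounding mode is whatever the current decimal context specifies.

```python
from decimal import Decimal


def quantize_to_scale(value, scale):
    """Round value to `scale` digits after the decimal point."""
    # scale=2 gives an exponent equal to Decimal('0.01').
    exponent = Decimal(1).scaleb(-scale)
    return value.quantize(exponent)


price = quantize_to_scale(Decimal('3.14159'), 2)
padded = quantize_to_scale(Decimal('2'), 2)
```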
logic and codewalk improvements
- New bytecode support for dict creation: BUILD_MAP, DUP_TOP, ROT_THREE, ROT_TWO, and STORE_SUBSCR. See [171] and [285].
- Expression "and" and "or" now merge kwtypes in addition to bytecode. See [218].
- Support for Python 2.5 (which changed co_names, co_consts and co_flags; see [287]).
- LambdaDecompiler and other Visitors now accept methods as well as functions.
Improvements to recur
The recur module had several corner-case bugs fixed, mostly having to do with handling datetimes when date objects were expected, and vice-versa.
The Worker class was split into separate Worker and Scheduler classes. This allows a single Scheduler to cycle over several Workers with a single thread (instead of one thread per Worker).
There's also a new eachweekday function which can generate recurrence values for tasks that run once a week.
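A weekly recurrence can be generated with the standard datetime module. The signature of eachweekday is not shown here, so the generator below is a hypothetical stand-in (each_week_on) rather than the recur API. It also normalises a datetime to a date, the kind of mismatch the corner-case fixes above deal with.

```python
import datetime


def each_week_on(weekday, start):
    """Yield successive dates falling on `weekday` (Monday=0), from start on."""
    # Accept a datetime where a date is expected by normalising first.
    if isinstance(start, datetime.datetime):
        start = start.date()
    # Move forward to the first matching weekday (possibly start itself).
    offset = (weekday - start.weekday()) % 7
    current = start + datetime.timedelta(days=offset)
    while True:
        yield current
        current += datetime.timedelta(days=7)


# Wednesdays on or after Monday, 1 January 2007.
gen = each_week_on(2, datetime.datetime(2007, 1, 1, 9, 30))
first_two = [next(gen), next(gen)]
```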
New json module
Christian Wyglendowski contributed a new json.py module for encoding and decoding Dejavu Units in JSON format.
Schema improvements
- New Schema.assert_version method, which makes sure the deployed version is equal to the latest version.
Storage improvements
RAM Storage Manager
There's a new RAMStorage class inside storage/storeram.py. This works much like shelve, only without the filesystem I/O overhead. For temporary datasets that only need to persist for the life of the process, this can be a huge performance-enhancer.
Short SM class names
You can now use short names in your config to declare which back end you're using. For example, instead of writing Class: dejavu.storage.storeado.StorageManagerADO_MSAccess you can now write Class: access. The full dotted-class name will still work if you need custom classes. The short names (and their referents) can be found in dejavu.storage.managers, a dict.
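The lookup can be pictured as a dict with a pass-through fallback. The sketch below is a model only: the dict holds just the one entry quoted above rather than the real contents of dejavu.storage.managers, and resolve_class_name and the custom class path are hypothetical.

```python
# Model of short-name resolution; not the real dejavu.storage.managers dict.
managers = {
    'access': 'dejavu.storage.storeado.StorageManagerADO_MSAccess',
}


def resolve_class_name(name):
    """Map a short name to its dotted class path; pass other names through."""
    return managers.get(name, name)


short = resolve_class_name('access')
# A full dotted name (hypothetical custom class) is returned unchanged.
custom = resolve_class_name('myapp.storage.CustomManager')
```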
Database improvements
- drop_property now supports dropping indices as needed.
- Native sequencing: Beginning in [175], StorageManagers now detect ID properties that use UnitSequencerInteger and auto-generate IDs for them. This includes MS Access, MS SQL Server, Postgres, MySQL, and SQLite 3.1 or greater.
- Expressions now support "is None" and "is not None".
- New awareness and testing of numeric precision, scale, and byte limits.
- Native support for dejavu.year, .month, and .day logic functions for all databases.
geniusql
In 1.4, all of the database adaptation was done in storage/db.py, mostly in StorageManagerDB and subclasses of it. In Dejavu 1.5, the database layer has been abstracted and is now modeled independently in storage/geniusql.py. In particular, the new module allows SQL decompilation and inbound coercion to know the database types of each column, and adjust to them more accurately. The Connection pooling strategies, the SQLDecompiler base class, and the three Adapter classes (in, out, type) have all moved into geniusql, as well.
Geniusql does all the database stuff in an isolated way. It knows nothing about Units; all it does is database stuff. It models the Database, plus each Table, Column, and Index. The Database is a dict of (name, Table) pairs. Each Table is a set of (name, Column) pairs, and each Table.indices attribute is an IndexSet of (name, Index) pairs.
The inbound and outbound adapters, therefore, now allow M x N adaptation in both directions. Their method names changed from "coerce_str" to "coerce_str_to_TEXT", for example. If you want a single method to handle any type, use "any" on one side of the name; for example, "coerce_NUMERIC_to_any".
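Name-based M x N dispatch with an "any" fallback might look like the following. This is a sketch under assumptions, not geniusql's adapter code: the OutboundAdapter class, its coerce entry point, and the lookup order (exact match first, then the two "any" forms) are all hypothetical; only the method naming pattern comes from the text above.

```python
class OutboundAdapter(object):
    """Hypothetical adapter that dispatches on coerce_<pytype>_to_<dbtype>."""

    def coerce(self, value, dbtype):
        pytype = type(value).__name__
        # Assumed lookup order: exact pair, then "any" on either side.
        candidates = (
            'coerce_%s_to_%s' % (pytype, dbtype),
            'coerce_%s_to_any' % pytype,
            'coerce_any_to_%s' % dbtype,
        )
        for name in candidates:
            method = getattr(self, name, None)
            if method is not None:
                return method(value)
        raise TypeError('no adapter from %s to %s' % (pytype, dbtype))

    def coerce_str_to_TEXT(self, value):
        # Quote the string and double any embedded single quotes.
        return "'" + value.replace("'", "''") + "'"

    def coerce_any_to_INTEGER(self, value):
        # One method handles every Python type bound for INTEGER columns.
        return str(int(value))


adapter = OutboundAdapter()
quoted = adapter.coerce("O'Hara", 'TEXT')
number = adapter.coerce(7.0, 'INTEGER')
```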
All of this means that you have increased access to low-level database operations when you need them. There's also the possibility to build layers on top of geniusql that operate differently from Dejavu's Unit philosophy.
Database introspection
On first access of a table, the geniusql Columns are automatically synced with Unit properties. The database types are inspected, and appropriate adapters can then be inferred. The sync process also pulls index, default value, primary key, precision and scale, and other metadata from the database.
In addition, the StorageManagerDB has a new Modeler class with make_class and make_source methods, which generate a Unit class (or the textual source for a Unit class) from a given database table. It also has corresponding all_classes and all_source methods.
Transactions
With the arrival of geniusql, Dejavu now also has support for transactions: start, rollback, and commit. Every StorageManager that supports transactions has these three methods; those that do not support transactions have start=None (and likewise for the other two) instead. The geniusql.Database object also has these methods.
The StorageManager option 'implicit_trans' is False by default, and in this case you must explicitly call start, rollback, and commit. If you set 'implicit_trans' to True, then you do not need to call start()--it will be called for you when you first use a connection. Also, if you are using a Sandbox, its flush_all method will automatically call commit for you.
However, you still must call rollback manually; frameworks which use Dejavu should probably provide a way to call these automatically, but the decision of when and how to do that is highly application-dependent and therefore Dejavu does not guess for you. There is one exception to the "no guess" rule, and that is available starting with Python 2.5's "with" statement. Each Sandbox instance is its own context manager, so you can have boxes automatically flush themselves when you're done, and automatically roll back on error. Example:
    # __future__ only needed for Python 2.5, not 2.6+
    from __future__ import with_statement

    with arena.new_sandbox() as box:
        WAP = box.unit(Zoo, Name='Wild Animal Park')
        WAP.Opens = now
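The flush-on-success, rollback-on-error behaviour follows the usual context manager protocol. The class below is a hypothetical stand-in (ModelSandbox) that only records which call would be made; it is not Dejavu's Sandbox, and it assumes the exception is re-raised after the rollback.

```python
class ModelSandbox(object):
    """Hypothetical stand-in that records what a context-managed box does."""

    def __init__(self):
        self.calls = []

    def flush_all(self):
        self.calls.append('flush_all')

    def rollback(self):
        self.calls.append('rollback')

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is None:
            self.flush_all()      # normal exit: flush
        else:
            self.rollback()       # error inside the block: roll back
        return False              # assumption: the exception propagates


ok_box = ModelSandbox()
with ok_box:
    pass

bad_box = ModelSandbox()
try:
    with bad_box:
        raise ValueError('simulated failure')
except ValueError:
    pass
```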
Transactions are isolated by thread ID by default. Replace the Database.transaction_key method with some other function if you need different isolation boundaries.
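The effect of swapping the key function can be shown with a small model. ModelDatabase below is hypothetical and only tracks open transactions in a dict keyed by transaction_key(); it is not geniusql.Database, and the 'request-42' key is an invented example of a coarser isolation boundary.

```python
import threading


class ModelDatabase(object):
    """Hypothetical model: one open transaction per transaction key."""

    def __init__(self):
        self.transactions = {}

    def transaction_key(self):
        # Default isolation boundary: the current thread.
        return threading.get_ident()

    def start(self):
        self.transactions[self.transaction_key()] = 'open'


def start_in_other_thread(db):
    worker = threading.Thread(target=db.start)
    worker.start()
    worker.join()


# Default key: the main thread and the worker get separate transactions.
db = ModelDatabase()
db.start()
start_in_other_thread(db)
per_thread = len(db.transactions)

# Replaced key: both threads share one transaction.
shared = ModelDatabase()
shared.transaction_key = lambda: 'request-42'
shared.start()
start_in_other_thread(shared)
per_key = len(shared.transactions)
```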
SQLite
- New support for using the sqlite3 module built into Python 2.5.
- Support for autoincrement if 3.1 or greater.
- Much new schema support, including (slow) equivalents for ALTER TABLE that were not available until SQLite 3.1 and 3.2.
- Much better handling of "database is locked" issues.
- Support for typed and typeless SQLite.
- Support for :memory: databases.
PostgreSQL
- New support for psycopg2.
- New "quote_all" option for legacy databases that don't use quoted identifiers.
- Preserves microseconds in datetimes. See #41.
- Bulletproof round-trip strings via octal escapes.
- Support for CREATE DATABASE WITH ENCODING. Tested with SQL_ASCII and UNICODE.
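The octal-escape idea for round-tripping strings can be sketched as follows. This is an illustration, not Dejavu's PostgreSQL adapter: octal_escape and octal_unescape are hypothetical helpers, every byte is escaped for simplicity (a real adapter would likely escape only the problematic ones), and the additional quoting needed inside an SQL literal is not shown.

```python
import re


def octal_escape(data):
    """Encode bytes as text, writing each byte as a backslash + 3 octal digits."""
    return ''.join('\\%03o' % byte for byte in data)


def octal_unescape(text):
    """Reverse octal_escape."""
    return bytes(int(digits, 8)
                 for digits in re.findall(r'\\([0-7]{3})', text))


# 'a' is octal 141, NUL is 000, and a backslash is 134.
escaped = octal_escape(b'a\x00\\')

# Every possible byte value survives the round trip.
raw = bytes(range(256))
round_tripped = octal_unescape(octal_escape(raw))
```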
MySQL
- Support for CREATE DATABASE CHARACTER SET. Tested with latin1 and utf8.
- Support for CREATE TABLE CHARACTER SET.
- Fixed a MySQLdb bug where lock conflicts were silenced (and an empty result set returned).
ADO
- New support for the CURRENCY datatype.
- Improved string comparisons using Convert and StrComp.
- MUCH faster DB access due to streamlining of COM calls.
- Improved escaping of wildcard characters in LIKE expressions.
- New shutdowntimeout attribute for finicky connections.
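Escaping wildcards for LIKE can be illustrated with bracket-style escaping, where a literal wildcard is wrapped in square brackets. Whether the ADO StorageManager uses exactly this scheme is not stated here, so treat the sketch as an assumption; escape_like is a hypothetical helper.

```python
def escape_like(text):
    """Wrap LIKE wildcards in brackets so they match literally."""
    out = []
    for ch in text:
        if ch in '%_[':
            # Assumed bracket-style escaping: % becomes [%], and so on.
            out.append('[' + ch + ']')
        else:
            out.append(ch)
    return ''.join(out)


pattern = escape_like('50%_[x]')
```

The escaped text can then be embedded between real wildcards, for example '%' + escape_like(term) + '%', to search for the term literally.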
Firebird
There's a new Firebird StorageManager in storage/storefirebird.py. Tested against Firebird 1.5 (Windows).
Filesystem
There's a new filesystem StorageManager in storage/storefs.py. It creates a subfolder for each Unit (whose name is the Unit.ID), and a separate file for each UnitProperty. In this way, binary data can be persisted in native formats (for example, you can have an "image" property that gets saved to "image.jpg" for each Unit).
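The folder-per-Unit, file-per-property layout can be sketched with the standard library. This is a model of the layout described above, not the code in storage/storefs.py: save_unit and load_unit are hypothetical helpers, and property values are assumed to be raw bytes already.

```python
import os
import tempfile


def save_unit(root, unit_id, properties):
    """Write each property to its own file inside a folder named by the ID."""
    folder = os.path.join(root, str(unit_id))
    os.makedirs(folder, exist_ok=True)
    for filename, data in properties.items():
        with open(os.path.join(folder, filename), 'wb') as f:
            f.write(data)
    return folder


def load_unit(root, unit_id):
    """Read every property file back into a {filename: bytes} dict."""
    folder = os.path.join(root, str(unit_id))
    properties = {}
    for filename in os.listdir(folder):
        with open(os.path.join(folder, filename), 'rb') as f:
            properties[filename] = f.read()
    return properties


root = tempfile.mkdtemp()
original = {'image.jpg': b'\xff\xd8\xff', 'Name.txt': b'Wild Animal Park'}
save_unit(root, 17, original)
loaded = load_unit(root, 17)
```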
