This discussion applies specifically to Dejavu 2, which is still in development. However, most of the concepts are true for 1.5.
Dejavu has support for sharding that is similar to Hibernate. The difference is that Dejavu was designed from day one to support sharding; Hibernate needs sharding-aware subclasses of its major interfaces.
Dejavu mediates multiple databases by allowing a layer of Storage Manager objects (in an arbitrary graph) between the business object layer (Units) and the final storage managers. For example, at Amor Ministries we had the following arrangement:
Dejavu provides the colored bits: caches are green, RAM stores are yellow, the Arena class is red, and the blue "leaf nodes" talk to actual databases (using Geniusql). The "raisersedge" Storage Manager was written to hide the fact that we had no direct write access to the Raisers' Edge tables.
There are two different components in the above graph that "shard" the data. First is the Arena object, which Dejavu provides. It exists to mediate multiple stores into a single interface when the partitioning is based entirely on table boundaries; that is, some tables are in one database, and some tables are in another. You can manually register which class goes where, or you can use a config file which the Arena knows how to parse.
The second object which shards data (above) is the "Transaction Mixer" (see StorageMixer for its code); this object separates data based on values in the data row: if the "Expense" attribute is True, it's read from and written to the Access database. Otherwise, it uses the raisersedge module to be persisted in the SQL Server database.
Limitations
Cross-Shard Joins
Like Hibernate, Dejavu doesn't yet support recalling joins across shards. You can certainly define an association across them, and use it, for example, to load multiple objects in store B that are related to a single object in store A (since this only needs to retrieve data from store B). In keeping with Dejavu's design philosophy, it really should fall back to returning correct answers even if not optimized, but merging joins manually can be catastrophically slow if you end up loading the wrong billion-row table entirely into memory. The optimization of joins is provably NP-hard, and functional decomposition of restriction expressions (WHERE clauses) isn't far behind. We expect to use a divide-and-conquer approach to provide optimized solutions for common use cases in this area, probably in Dejavu 2.1 or later.
The "correct" answer to this (even in the ideal future) is "don't join across shards". Design your shard boundaries so that joins occur within a single store.
Distributed Transactions
This should be available in Dejavu 2. It's not done yet in trunk. But you should do yourself a favor and try another approach; distributed transactions are rarely worth the effort.
Attachments
- smgraph.gif (9.7 kB) -
Graph of Amor SMs
, added by fumanchu on 07/21/07 18:31:41.
