Application Designers: Using Geniusql to Construct a Domain Model

Geniusql Component Diagram

The Database Object

The topmost class in Geniusql is the Database class. When building a Geniusql application, you must first create an instance of it. In general, you should ask geniusql to create a Database object for you by using the top-level db(provider, **opts) function. This allows you to pick a backend by name from providers.registry (a map from short names to class names). For example, you can open an SQLite database via:

import geniusql
db = geniusql.db("sqlite", name=':memory:')
db.create()

In addition to create(), all Databases also have a drop() method.

Schemas

Closely tied to the Database object are one or more corresponding Schema objects. Usually, you ask the Database for a compatible Schema object via its schema(name=None) method. For example:

schema = db.schema()
schema.create()

Some database systems, like Firebird and SQLite, use a separate file on disk for each Schema, and require you to refer to the database you want by passing the name of the file. Consequently, they need a separate Database object for each Schema. For such systems, their Database object possesses a multischema attribute which is False. Many other providers allow you to use a single Database with multiple schemas; these have their multischema attribute set to True.
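If you write code meant to run against more than one backend, you can branch on this attribute. A minimal sketch (assuming the "sqlite" provider name from providers.registry; the "reports" schema name is hypothetical):

```python
import geniusql

db = geniusql.db("sqlite", name=":memory:")
db.create()

if db.multischema:
    # Backends like this can hold several named schemas in one Database.
    schema = db.schema("reports")
else:
    # File-per-schema backends (SQLite, Firebird) get one schema per Database.
    schema = db.schema()
schema.create()
```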

Automatic Table Classes

When you create your first Geniusql model, you might be forming it to match some existing database schema. If so, Geniusql has methods to help you. The schema.discover(tablename, conn=None) method will create a Table class for the named database table, or you can discover all Tables at once via discover_all(conn=None). These methods automatically attach the tables they find in the database to the schema. [This is fine for development purposes, but when you go into production, it may result in slow startup times. It's usually faster to write complete models yourself using Table and Column declarations.]

If you want to use a different key for the discovered object(s), use the alias(oldname, newname) method of any Schema, Table, or IndexSet container:

table = schema.discover(tablename)
schema.alias(table.name, newname)

Tables

Unlike more implicit ORMs, Geniusql models actual database objects very explicitly. You begin persisting data, therefore, by creating instances of geniusql.Table, one for each table in your database:

Printer = schema.table('Printer')

This is all you need for a fully-functioning table. There is no base class that you are required to override; simply create a Table object. [We actually ask the Schema to create one for us that's specific to the underlying DB.] However, this is a fairly uninteresting table, since it has no columns! The first thing we will probably want to add to our new table is a set of those (see below).

Creating Tables in the Database

A schema is a dict-like set of Table objects. When you add or delete items from a Schema object, appropriate CREATE TABLE/DROP TABLE commands are executed. Therefore, a new Table object should have all of its columns populated before adding it to the Schema. Once we've done that (see below), we only need to add the Table to the Schema:

schema['Printer'] = Printer

The key that you use to refer to the table in the schema dict does not need to be the same as the Table.name (or Table.qname, the "quoted name" for use by the DB). In particular, this allows you to use separate capitalization rules for various layers.
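For example, the schema key and the database-side name can differ; a short sketch (the PRINTER_TBL name is hypothetical):

```python
Printer = schema.table('PRINTER_TBL')   # Table.name as it exists in the database
schema['Printer'] = Printer             # friendlier key for use in Python code
```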

Columns

Once you have defined a table (but before attaching it to the Schema), define columns for it by asking the Schema for backend-specific Column objects.

We might enhance our Printer example as follows:

Printer = schema.table('Printer')
Printer['ID'] = schema.column(int, autoincrement=True, key=True)
Printer['Manufacturer'] = schema.column(unicode)
Printer['ColorCopies'] = schema.column(bool)
Printer['PPM'] = schema.column(float)

This adds four columns to our Printer table, each with a different datatype.

You manipulate rows by using insert, save, and delete methods of the Table. The insert method returns a dict of the inserted data:

>>> p = Printer.insert(PPM=25)
which you can then inspect:
>>> p['PPM']
25

However, rest assured that the int value we provided was coerced to a float when it was inserted into the database, because we specified float as the PPM column's type when we created it. The value of a Column is restricted to the type you specify. The only other valid value for a Column (of any type) is None; any Column may be None at any time, and in fact, all Column values are None until you assign values to them:
>>> p['ColorCopies'] is None
True

The datetime.datetime type

If you use datetime.datetime for the type of a Column, most providers will throw away the microseconds. This is an unfortunate oversight that may be corrected sometime in the future.
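In practice, that means you should not rely on sub-second resolution. A sketch (the Event table here is hypothetical):

```python
import datetime

Event = schema.table('Event')
Event['ID'] = schema.column(int, autoincrement=True, key=True)
Event['When'] = schema.column(datetime.datetime)
schema['Event'] = Event

Event.insert(When=datetime.datetime(2007, 5, 1, 12, 30, 15, 250000))
# On most providers, the stored value comes back with microsecond == 0.
```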

ID Columns

All Columns possess a key attribute; if True, the Column is used to define the uniqueness of a row. Typically, you will use a single 'ID' column, so the primary key will consist of just one column. If you wish to use a different number of key columns, or keys with different types or names, simply set the 'key' attribute to True for each identifying Column:

Printer = schema.table('printer')
Printer['Model'] = schema.column(unicode, key=True)
Printer['UnitNumber'] = schema.column(int, key=True)

In rare cases, you may even desire a Table that has no keys. For example, an activity log that only needs inserts and no updates may not need unique identifiers for each row; leaving them out can make inserts faster, since there are no constraints to satisfy and no indices to update.
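Such a keyless table is declared like any other; just never set key=True on any of its columns. A sketch (the ActivityLog table is hypothetical):

```python
import datetime

log = schema.table('ActivityLog')
log['When'] = schema.column(datetime.datetime)
log['Message'] = schema.column(unicode)
schema['ActivityLog'] = log   # no column has key=True

log.insert(When=datetime.datetime.now(), Message=u'printer came online')
```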

Column Metadata

Columns are custom Python objects. This is significant, because it allows us to store metadata about the columns themselves:

>>> c = Printer['UnitNumber']
>>> c.pytype, c.dbtype, c.default, c.key, c.name, c.qname
(<type 'int'>, geniusql.providers.mysql.MEDIUMINT(bytes=4, max_bytes=4,
signed=True), 0, True, 'unitnumber', '`unitnumber`')

When you define a Column instance, you can pass in these extra attributes. You can do it by hand, but in general, you should ask your Schema object to make a Column object for you using its column(pytype=unicode, dbtype=None, default=None, key=False, autoincrement=False, hints=None) method. This ensures that all of the cooperating objects (like indexes, datatypes, and type adapters) use the correct specialized subclasses for your database of choice.

Supply any, all, or none of the attributes as needed. The name attribute is set for you when you bind the column to a Table (i.e. Table['name'] = schema.column(...)). The pytype attribute limits column values to instances of that type (or None). The key attribute declares whether the column is part of the table's primary key or not. Finally, the hints dictionary provides hints to backend providers to help optimize storage. If you write a custom provider, you may define and use your own hints. Here are the ones that most builtin providers understand:

bytes (b >= 0): Informs providers that a particular column will never exceed b bytes. This applies to long and int columns, as well as str and unicode. A value of 0 implies no limit. If not specified, a default maximum will be used; many backends default to 255. Check your provider.

scale (s >= 0): Scale is the number of digits to the right of the decimal point in a NUMERIC (fixedpoint or decimal) field. This hint informs providers that would usually store such data at a default scale (usually 2) that the column should use a different scale.

precision (p >= 0): Precision is the total number of decimal digits in a NUMERIC (fixedpoint or decimal) field, or the total number of binary digits in a float field. This hint informs providers that the column will never exceed p digits. If missing, the provider will supply a default maximum precision; for example, PostgreSQL can handle 1000 decimal digits. If explicitly set to 0 (zero), the provider will allow unlimited precision, if possible. Note that the fixedpoint module uses the word "precision" where we use the word "scale"; it actually has unlimited precision (as we use the word). The decimal module, in contrast, uses limited precision but no scale.

signed (True/False): If True, the column will use a signed numeric type (if available).
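To make the bytes and signed hints concrete, here is a small, runnable illustration of how a provider might translate a Python type plus hints into a SQL type declaration. The ddl_for function is purely hypothetical; real providers are considerably more elaborate:

```python
def ddl_for(pytype, hints=None):
    """Sketch: choose a SQL type declaration from a Python type and hints."""
    hints = hints or {}
    if pytype is str:
        limit = hints.get('bytes', 255)      # many backends default to 255
        return "VARCHAR(%d)" % limit if limit else "TEXT"
    if pytype is int:
        # An unsigned type would be used only if the backend offers one.
        return "INTEGER" if hints.get('signed', True) else "INTEGER UNSIGNED"
    if pytype is float:
        precision = hints.get('precision')
        return "FLOAT(%d)" % precision if precision else "DOUBLE PRECISION"
    raise TypeError("no mapping for %r" % pytype)

print(ddl_for(str, {'bytes': 64}))       # VARCHAR(64)
print(ddl_for(str, {'bytes': 0}))        # TEXT
print(ddl_for(int, {'signed': False}))   # INTEGER UNSIGNED
```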

Sequencing

Most Tables have one or more keys (identifiers). Typically, there is only one, of type int; however, you can use as many keys, of whatever types, as you need. As long as you provide your own identifier values, nothing will break--you can save and select data without problems. However, if you save a row without providing identifier values, the database may need to generate them for you.

Column objects possess three attributes to define sequencing:

Rows

In Geniusql, rows are always manipulated as normal Python dictionaries or lists. There's just data, no code; nothing to override, no magic behavior, and no hidden effects. If you want a fancy object per row, you can certainly implement that on top of Geniusql (as Dejavu does), but if you don't need it, why bother? Geniusql is much more relational than other ORMs, and therefore more powerful: how many other ORMs can generate backend-specific SQL like INSERT INTO newtable SELECT columns FROM join for you?

Single Rows

The Table methods insert(**kwargs), save(**kwargs), and delete(**kwargs) all take a (keyword-arg) dict of data. Typically, you will use these methods to manipulate a single row:

brian = Person.insert(Name='Brian Norris')
brian['Age'] = 39
Person.save(**brian)
...
Person.delete(**brian)
Geniusql knows which columns in the Person table are primary keys (unique identifiers), so you don't have to. Just pass the whole dict for a given row to each method. We could also have written the insert call above like this:

brian = {'Name': 'Brian Norris'}
brian = Person.insert(**brian)

It's important to understand that insert returns a copy of the data dict you pass to it, often with new autoincrement values. So we make sure to re-bind 'brian' above to the returned dict since we're going to keep manipulating it.

Multiple Rows

There are additional Table methods for manipulating multiple rows. The save_all(data, restriction=None, **kwargs) method allows you to update rows based on any criteria, not just the identifier columns Geniusql knows about. So, for example, you could manage the zip code change for Bethesda, MD via:

Address.save_all(data={'Zip': 20814}, Zip=20014)

Likewise, you can delete multiple rows with delete_all(restriction=None, **kwargs):

Address.delete_all(Zip=20014)

If you need to update or delete rows based on more complicated comparisons, use the restriction arg to pass a lambda (or logic.Expression):

Address.delete_all(lambda x: x.Zip >= 20014 and x.Zip <= 20018)
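The restriction argument works for save_all as well; a sketch (reusing the Address table and Zip column from the example above):

```python
# Set the same replacement zip code for a contiguous range of old ones.
Address.save_all(data={'Zip': 20814},
                 restriction=lambda x: x.Zip >= 20014 and x.Zip <= 20018)
```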

References between Tables

Once you've put together some Tables, chances are you're going to want to join them. Generally, this is accomplished by creating a column in table B (foreign key) which stores IDs from table A (primary key).

arch = schema.table('Archaeologist')
arch['ID'] = schema.column(int, autoincrement=True, key=True)
arch['Height'] = schema.column(float)

bio = schema.table('Biography')
bio['ID'] = schema.column(int, autoincrement=True, key=True)
bio['ArchID'] = schema.column(int)
bio['PubDate'] = schema.column(datetime.date)

In this example, each Biography row will have an ArchID value, which will equal, and refer to, the ID value of some Archaeologist.

You could stop at this point in your design, and simply remember what these keys are and how they relate, and manipulate them accordingly. But Geniusql allows you to explicitly declare these references:

bio.references['Archaeologist'] = ('ArchID', 'Archaeologist', 'ID')

You provide the near key, the far table name, and the far key.

What does an explicit reference buy you? First, you can associate Tables without having to remember which keys are related. Second, joins discover references and automatically connect known paths for you.

Note, however, that a single Table A might have multiple relationships with Table B. For example, a Biography might be both written about one Archaeologist and written by another Archaeologist. Therefore, the key you use to add a reference tuple to the references dict doesn't have to be the name of the far table:

bio.references['Author'] = ('AuthorID', 'Archaeologist', 'ID')
bio.references['Subject'] = ('SubjectID', 'Archaeologist', 'ID')

In this way, we can both use multiple references and write tools that are aware of them. By default, the join machinery will use the reference which matches the name of the far table (if present), but there are ways to specify that you want to join using an alternate reference.

Indices

All Table instances, in addition to containing Column objects, also possess an indices attribute. This is an instance of an IndexSet, a dict-like container for Index objects. You can add indices to a Table via its add_index(columnkey) method; pass the key for the column you wish to index.
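For example, to index the Manufacturer column of the Printer table from earlier (a sketch; index the columns your own queries filter on):

```python
Printer.add_index('Manufacturer')
```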