SQL Injection Prevention in Geniusql
SQL injection can take many forms. The primary mechanisms, however, are type mismatches and incorrect escaping. Geniusql completely avoids both vectors due to its strong typing and complete syntax parsing. Typing and parsing protection is managed in several different places, offering more defense-in-depth than any other ORM design.
Complete Parsing
When a developer writes a Geniusql query, they do so using native Python syntax. When the query is to be executed, Geniusql parses the query function into an Abstract Sytntax Tree, and then de-parses that tree into concrete SQL. For example, given the following query restriction:
lambda a: a.Lifespan > 40
...Geniusql will construct an intermediate tree to represent the expression:
[>]
/ \
[getattr] [40.0]
/ \
[a] [Lifespan]
Each [] in the tree is a separate object, an instance of the SQLExpression class. Each SQLExpression knows both its own Python type (e.g. <float>) and its database type (e.g. FLOAT8). As the tree is constructed, the SQL for each branch is generated and combined. The example branch above produces the SQL "Animal"."Lifespan" > 40 (the 'a' node is replaced with a Table name during deparsing).
Geniusql first takes advantage of Python's built-in parser and compiler. Queries which are not valid Python cannot even be compiled, much less deparsed to SQL or executed. This alone denies a large swath of possible injection attacks (both malicious and accidental). For example, broken queries like lambda a: a.Species == 'Lion won't even compile; Python will complain right away with SyntaxError: EOL while scanning single-quoted string.
Escaping
Consider the following code. The developer has expected the user to supply a string, perhaps in a web form:
data = db.select((Animal, ['ID'], lambda a: a.Species == species))
A malicious user might suspect that the SQL will delimit strings by single quotes (and in fact it does), so he or she might attempt to broaden the results by injecting the malicious string "Ape' or 1=1 or '". If the SQL writer failed to properly escape the embedded single quotes, the malicious user might succeed in running the SQL:
SELECT "ID" FROM "Animal" WHERE ("Animal"."Species" = 'Ape' or 1=1 or '')
Geniusql, however, performs proper escaping of all string constants. In this example, the executed Postgres SQL is:
SELECT "ID" FROM "Animal" WHERE ("Animal"."Species" = 'Ape'' or 1=1 or ''')
If you examine it closely, you will see that all embedded single quotes are properly escaped, and Postgres sees the single string constant: Ape'' or 1=1 or ''.
This is an important difference between Geniusql and ORMs or DALs which construct SQL without type information. As Steve Friedl notes*, it is important not only that the embedded single-quote fails to cause any damage; it is equally important that the statement not error, because doing so communicates valuable information to SQL injection attackers about the SQL construction algorithm.
The same is true for all database types, not just strings. Geniusql will either faithfully render the developer's intent from Python to SQL, or it will fail. All builtin Python and standard SQL types are fully supported. If a database offers an extension type which Geniusql does not support directly, you need only to write a small DatabaseType class, plus one or more Adapter classes to translate from Python to SQL and back (and enforce operator validity).
The ability for Geniusql to support (and restrict) written values in the SQL is not limited to compatible types. For example, if your database designer decided to use VARCHAR(8) columns for date fields (in YYYYMMDD format, for example), it's a snap to write an Adapter between the Python date type and the VARCHAR type your database provides. That adapter will enforce the proper representation of a Python date as an escaped, 8-character string.
Strong typing
But the representation of values from one type system (Python) in another (SQL) is not the only safeguard Geniusql provides. Type information is stored and checked for each SQLExpression node, both singly (values must be of the correct type when producing their SQL representations) and when combined by operators. For example, if a developer makes the mistake of trying to add a float to a date: lambda z: [z.Founded + 13.0], Geniusql will error: CannotRepresent: No binary function '+' between geniusql.deparse.SQLExpression('"public"."Zoo"."Founded"', dbtype=DATE) and geniusql.deparse.SQLExpression('13.0', dbtype=FLOAT8). If the developer wrote an even more broken query lambda z: z.Founded + 13.0 (forgot the list brackets), Genuisql errors with: ValueError: Attribute AST roots must be Tuple or List, not Add.
Deferred semantic errors like lambda a: a.Species == Tiger (the developer has forgotten to quote the string) will also fail early. Python would raise a NameError when the function is called, but Geniusql will fail even faster; when the lambda is converted to an Expression instance (as they all must be), Geniusql will raise KeyError: "'Tiger' is not present in supplied globals."
Comparison to other solutions
Placeholders and prepared statements
Many database drivers allow the use of placeholders to separate dynamic values from their static SQL string (in fact, it's an SQL standard). For example, you might decide you can mitigate injection attacks by writing all your database code like this:
db.fetch_one('SELECT "ID" FROM "Animal" WHERE ("Animal"."Species" = ?)', species)
This technique works fine. However, Geniusql gives you exactly the same capability by allowing free variables in its query functions, and is shorter and more readable to boot:
db.select((Animal, ['ID'], lambda a: a.Species == species))
Stored procedures
Stored procedures help stop SQL injection primarily by enforcing type constraints. It might seem at first that it does so at a larger grain than Geniusql, since stored procedures may contain multiple statements. However, if you compare the compilation of a stored procedure and the parsing steps of a set of Geniusql queries, they essentially perform the same type checking: they compare types involved in comparison and binary operations to assert that they are of compatible types.
Although stored procedures might seem to have an edge in that they check the types of user input as early as possible, in practice this is not a very strong deterrent. It is still possible to write stored procedures which are vulnerable to injection attacks. For example, you might make the mistake of constructing an SQL statement using a user-supplied "ORDER BY" clause via:
query_value := 'SELECT id FROM transactions ORDER BY ' || order_by;
Without rigorous filtering of the order_by input (whether in the stored procedure or in calling code), this is just as open to SQL injection attacks as any naive code.
