Wednesday, April 14, 2010

Finding Our Way to LATERAL

One of the few types of SQL query that we don't yet implement in PostgreSQL is LATERAL, and we seem to get fairly regular requests for it. The typical use case for this feature is where you have a table called, say, foo, and a set-returning function called, say, func, and you'd like to do this:

SELECT foo.x, bar.x FROM foo, func(foo.x) AS bar;

But that doesn't work:

ERROR: function expression in FROM cannot refer to other relations of same query level

In fact, the SQL standard apparently specifies that this is not legal syntax. If you want to reference a variable at the same query level, you need to wrap it using LATERAL():

SELECT foo.x, bar.x FROM foo, LATERAL(func(foo.x)) AS bar;

Currently, PostgreSQL does not support LATERAL(), so this query fails with a complaint that there's no function called LATERAL available. But wouldn't it be neat if it worked? We get requests for this feature pretty regularly. Right now, there's a pretty limited number of ways to call SRFs, and it's hard to get the equivalent of the above query without major gymnastics.

I wanted to see how much work it would be to implement LATERAL() in PostgreSQL, so, just for fun, I beat the parser with a large enough sledgehammer to make it accept the first of the two queries listed above. This effort was doomed to failure, however. When the planner goes to join foo to func(foo.x), it doesn't understand the implicit dependency between those two items. By jiggering the costs, you can get the planner to decide on a nonsensical plan like this: for each row in the output of func(foo.x), scan all the rows in foo. Of course, to us human beings, it's obvious that we have to do those two steps in the opposite order, but the planner doesn't know that. Its only knowledge about the connection between different tables comes from the join clauses, and there are no join clauses in this query.

As it turns out, fixing this turns out to be closely related to solving a problem with nested-loop-with-inner-indexscan plans, which could more generally be called "partial index scan" plans. Currently, we find these plans using some special case crocks that cover most, but not quite all, of the interesting cases. Andrew Gierth supplied this example:
SELECT ... FROM t1 LEFT JOIN (t2 JOIN t3 ON (t2.a=t3.a)) on (t1.a=t2.a);
If t1 is small and t2 and t3 are large, we might like to get a plan that looks something like this:

Nested Loop Left Join
Join Filter: (t1.a = t2.a)
-> Seq Scan on t1
-> Nested Loop
Join Filter: (t2.a = t3.a)
-> Index Scan using t2_a on t2
Index Qual: (t2.a = t1.a)
-> Index Scan using t3_a on t3
Index Qual: (t3.a = t1.a)

But we won't. Instead, we'll spend a lot of energy joining all of t2 to all of t3, and then throw away those results when we do the join to t1. This happens because planner will only consider a partial index scan on t2 or t3 if the index scan is the inner side of a nested loop and the table providing the value to look for is on the outer side of the same nested loop. Here, the index quals are referencing t1, which isn't part of the join: it's one level up. In many cases, the planner can work around the inability to form these types of plans by reordering the joins, but swapping the order of the LEFT and INNER joins in this query would change it's meaning, so that doesn't work in this case.

Tom Lane had an idea about how to generalize the inner-indexscan machinery to handle this. Instead of considering partial index scans using special-case logic at the time we form a nested loop, Tom proposed, essentially, promoting them to first-class citizens, to be considered along with all the other paths we examine when forming a join. To make that work, a little bit of tracking data must be added: if we choose a partial index scan path for t2, we don't need to join it directly to t1, but we do need to *eventually* put whichever portion of the plan contains it on the inner side of a nested loop that has t1 on the outer side, so that t1 can supply the values for the index lookup. As it happens, set-returning functions called via LATERAL need exactly the same thing: a way to make sure that the portion of the query tree where they end up being called is somewhere under the inner side of a nested loop whose outer side supplies the necessary variables.

There's one more fly in the ointment: this is not just a planning problem. We also need to make sure that whatever plan the planner creates can be executed by the executor. It turns out that the executor has an almost identical limitation. Nested loops pass down an "outer tuple" to their inner plan, and if that plan happens to be an index scan then the outer tuple can be used to constrain the index lookup. Unfortunately, again, this only works if the nested loop immediate parent of the index scan. Tom Lane has an idea for fixing this, too: replace the existing mechanism with executor parameters, which are more general and can pass values between far-distant parts of the query plan. Again, we'll use this same design for LATERAL, when a set-returning function has input parameters that must be supplied by a table scan elsewhere in the query.

So, when might we see actual LATERAL support in PostgreSQL? Tom Lane has indicated that he plans to fix the executor limitation described above for 9.1; I'm not sure whether he's planning to fix the planner limitation as well. Once those are fixed, I think we'll be within striking distance of the core feature. So, I'm hopeful that we'll get LATERAL in either 9.1 or 9.2. Stay tuned!

2 comments:

  1. Genial dispatch and this post helped me alot in my college assignement. Say thank you you for your information.

    ReplyDelete
  2. LATERAL is finally implemented in PostgreSQL 9.3 (which is currently in beta1 status)

    ReplyDelete