Joins of a sets of records is objectively the most expensive operation in DBMSs. In my previous post on factorization, I said that in the field of databases, once in a while you run into a very simple idea that deviates from the norm that gets you very excited. Today, I will discuss another such idea, worst-case optimal join (wcoj) algorithms. Wcoj algorithms and the theory around it in one sentence says this:
- Queries involving complex "cyclic joins" over many-to-many relationships should be evaluated column at a time instead of table at a time, which is the norm.
Wcoj algorithms find their best applications when finding cyclic patterns on graphs, such as cliques or cycles, which is common in the workloads of fraud detection and recommendation applications. As such, they should be integrated into every graph DBMS (and possibly to RDBMSs) and I am convinced that they eventually will.