Skip to main content

2 posts tagged with "internals"

View All Tags

· 21 min read
Semih Salihoğlu

Joins of a sets of records is objectively the most expensive operation in DBMSs. In my previous post on factorization, I said that in the field of databases, once in a while you run into a very simple idea that deviates from the norm that gets you very excited. Today, I will discuss another such idea, worst-case optimal join (wcoj) algorithms. Wcoj algorithms and the theory around it in one sentence says this:

  • Queries involving complex "cyclic joins" over many-to-many relationships should be evaluated column at a time instead of table at a time, which is the norm.

Wcoj algorithms find their best applications when finding cyclic patterns on graphs, such as cliques or cycles, which is common in the workloads of fraud detection and recommendation applications. As such, they should be integrated into every graph DBMS (and possibly to RDBMSs) and I am convinced that they eventually will.

· 23 min read
Semih Salihoğlu

Many of the core principles of how to develop DBMSs are well understood. For example, a very good query compilation paradigm is to map high-level queries to a logical plan of relational operators, then optimize this plan, and then further map it to an executable code often in the form of a physical query plan. Similarly, if you want updates to a DBMS to be atomic and durable, a good paradigm is to use a write-ahead log that serves as a source of truth and can be used to undo or redo operations. Many systems adopt such common wisdom paradigms. As core DBMS researcher, once in a while however, you run into a very simple idea that deviates from the norm that gets you very excited. Today, I want to write about one such idea called factorization.