What is a Cost-Based Optimizer in CockroachDB?
When it comes to CockroachDB, scalability and performance have always been top priorities. That’s why version 2.1 of CockroachDB includes a cost-based optimizer. This integral new part of CockroachDB determines the lowest “cost” for a given query, which results in big performance boosts for applications that utilize the database. If you’re not familiar with cost-based optimization, don’t be intimidated by the term– it may be a powerful concept, but it’s also a simple one. In this article, we’ll take a close look at the cost-based optimizer in CockroachDB, explaining how it works and why it’s so valuable.
What is a Cost-Based Optimizer in CockroachDB?
Before we talk about the benefits of the cost-based optimizer in CockroachDB, let’s take a step back and explain what it is. When you think of “cost”, your mind probably goes to dollars and cents. However, in the world of databases, cost is associated with the amount of time it takes to get the result of a query. The goal is to reduce that cost as much as possible. How can that be accomplished?
The answer is simple: Find the quickest possible path to the data you need. For a typical SQL query, there can be thousands or even millions of “query plans” available when there are multiple nodes that contain the same data. A query plan is a set of steps needed to access data in a relational database. Cost-based optimization works by evaluating each of these query plans, estimating the cost of each one and selecting the plan with the lowest cost. This functionality is built into CockroachDB to ensure that queries are always executed in the fastest way possible. As page load speed and responsiveness become increasingly critical, having the fastest, most efficient query plan is vital.
The cost-based optimizer performs an exhaustive analysis on the query, asking many questions that vary depending on the structure of the query. The optimizer might need to decide among a hash join, a lookup join and a merge join. It may have to determine whether to evaluate a particular field either before a join or after a join. Even when these individual decisions are made, the analysis isn’t complete. The optimal query plan often relies on finding the ideal combination of these answers.
The Role of Indexed Data
Cost-based optimization also takes into account indexed data. When data is queried frequently, it’s usually indexed so that the database does not have to search row by row to find what it’s looking for. Cost-based optimization does take into account indexed data as a factor in the computation, but even when indexed data exists, there may still be decisions for the optimizer to make. Imagine a query with multiple clauses, one of which filters by name. If there happens to be a secondary index on the “name” field, the optimizer would need to decide whether to use that index to locate matching names or to rely on the primary index to locate matching ids.
The Heuristic Planner
The cost-based optimizer in CockroachDB has, for the most part, taken the place of something called the heuristic planner. A heuristic planner is a different type of optimizer where decisions are made based on a predefined set of heuristics, or rules. For example, there might be a rule that when a query contains an “equals” condition, it’s always best to use a hash join instead of another structure like a nested loop join. Rules like these usually result in improved query plans, so they’re considered effective heuristics.
However, relying on these static rules isn’t ideal. Although they generally work well in determining a “pretty good” query plan for a given query, they don’t always find the best query plan. No matter how carefully you tune your set of heuristics, there will always be exceptions to the rule or cases where one rule might conflict with another. Of course, you might be wondering: “Can’t a cost-based optimizer be wrong, too?” The answer to that is yes– a cost-based optimizer’s decisions are dependent on the accuracy of the costs it assigns. The cost-based optimizer in CockroachDB has access to statistical data about tables and their contents; these database statistics help the optimizer make more informed, accurate decisions when selecting the optimal query plan.
While most queries will now use the cost-based optimizer in CockroachDB, there are still a few situations where the heuristic planner will still be used. The cost-based optimizer supports most SQL statements; in the rare case that an unsupported query is run, the heuristic planner will take over instead. The heuristic optimizer will also be used if the cost-based optimizer is explicitly turned off.
The introduction of a cost-based optimizer is an exciting step forward for CockroachDB. This first release, included in CockroachDB version 2.1, represents both our commitment to continuous improvement and our focus on scalability and performance. If you’re thinking about using CockroachDB, don’t hesitate to reach out to an expert at Object Rocket. We’d be happy to discuss your specific use case and find the solution that’s right for you.