Direct-to-shard queries
PgDog has a powerful parser that can extract sharding hints directly from SQL queries. Queries that refer to a column in one of the sharded tables are sent directly to the corresponding database in the configuration.
Direct-to-shard queries are foundational to horizontal database scaling. The more queries can be routed to just one database, the more requests can be served by the entire cluster.
How it works
PgDog is using the pg_query library, which provides direct access to the native PostgreSQL parser. This allows PgDog to read and understand 100% of valid SQL queries and commands.

PgDog is deployed as a proxy between Postgres shards and the application and takes care of routing queries between them. Each SQL command is different and is handled differently by our query router, as documented below.
SELECT
To route SELECT
queries, the query router looks for a sharding key in the WHERE
clause. For example, if your database is sharded by the user_id
column, all queries that filter rows by that column, either directly or through a foreign key, can be sent to a single shard:
SELECT * FROM payments
INNER JOIN orders
ON orders.id = payments.order_id
WHERE
payments.user_id = $1; -- Sharding key.
Both regular queries and prepared statements are supported. So if your database driver is using placeholders instead of actual values, PgDog will extract the sharding key value from the extended protocol messages.
Supported syntax
The SELECT
query can express complex filtering logic and not all of it is currently supported. The following filters in the WHERE
will work:
Filter | Example |
---|---|
Column equals to a value | payments.user_id = $1 |
Column matches against a list | payments.user_id IN ($1, $2, $3) |
All other variations will be ignored and the query will be sent to all shards.
Query router improvements
This is an area of constant improvement. Check back here for updates or create an issue to request support for a particular filter you're using.
If the query has multiple sharding key filters, all of them will be extracted and converged to a set of unique shard numbers.
For example, when filtering by a list of values, e.g., WHERE user_id IN ($1, $2, $3)
, if all of them map to a single shard, the query will be sent to that shard only. If they map to two or more shards, it will be sent to all corresponding shards concurrently.
INSERT
Insert queries are routed using the values in the VALUES
clause, for example:
If the query is inserting a row into a sharded table, the query router will extract the sharding key, and route the query to the corresponding shard.
Just like for SELECT
queries, both prepared statements and regular queries are supported.
Supported syntax
To correctly identify the sharding key in the VALUES
clause, the INSERT
statement must explicitly name the columns in the tuple. Additionally, statements must create one row at a time. Multi-tuple INSERT
s are not currently supported.
For example:
UPDATE and DELETE
Both UPDATE
and DELETE
queries work identically to SELECT
queries. The query router looks inside the WHERE
clause for sharding keys, and routes the query to the corresponding shard.
If no WHERE
clause is present, or it's filtering on a column not used for sharding, the query is sent to all shards concurrently, for example:
Foreign keys
While it's best to choose a sharding column present in all tables, it is sometimes not desirable or possible to do so. For example, it's redundant to store a foreign key in a table that has a transitive relationship to another table:

In this example, the order_items
table has a foreign key to orders
, which in turn refers to users
. This makes order_items
related to users
as well, but it doesn't need a foreign key to that table. However, this also means that table doesn't have a sharding key.
To make querying the order_items
table in a sharded database possible, the following workarounds are available:
Workaround | Description |
---|---|
Add sharding key column | Add the sharding key column to the table and backfill it with corresponding values. |
Manual routing | Provide sharding hints to the query router via SQL comments or SET commands. |
Use joins | For SELECT queries only, refer to the table as part of a join to a table that has the sharding key column. All other queries would need to use manual routing. |
Adding the sharding key column is often best, because it makes writing queries a lot easier. The sharding key is usually a compact data type, like a BIGINT
or a UUID
, so it doesn't take up much space, and can be backfilled relatively quickly. If backfilling, make sure to do so in small batches, so as to reduce impact on database performance.
Sharding configuration
If most or all of your tables have the sharding key and the column name is the same, you can add it to pgdog.toml without specifying a table name, for example:
This will match all queries referring to all tables with the user_id
column and route them to a shard accordingly.
For the table storing the actual data referred to by the foreign keys, you can make another entry in the config, this time with the table name explicitly stated:
The latter will match queries referring to the users.id
column only. Together with the user_id
entry, all tables that contain the sharding key will be supported by the query router for direct-to-shard queries.