Skip to main content
 首页 » 数据库

PostgreSQL 分区表教程

2022年07月19日136lautakyan007

由于数据库中存储了大量数据,性能和可伸缩性是受到影响的两个主要因素。随着加载表数据的增加,需要更多的数据扫描、页交换到内存,以及其他表操作成本也会增加。分区可能是一个很好的解决方案,因为它可以将一个大表划分为更小的表,从而减少表扫描和内存交换问题,从而提高数据库性能。

分区有助于扩展PostgreSQL,它将大型逻辑表拆分为较小的物理表,这些物理表可以根据使用情况存储在不同的存储介质上。通过使用显示分区及利用postgres_fdw实现的外部表,用户可以更好地利用伸缩性。

自从 PostgreSQL10 支持分区表,并在后续版本中逐步支持list,range,hash以及它们的组合分区方法。

1. 概述

分区表的优势就是单个表不能提供的功能。因此,当有大量数据需要写到单个表时可以使用分区表。除了数据因素,其他情况也要考虑,如频繁更新的数据,在一段时间内使用的数据,一个范围数据可以划分多小等。通过合理规划并考虑多种因素,表分区可以极大地提高性能,使得PostgreSQL可以扩展支持更大数据集。

PostgreSQL支持用户可以很灵活定义分区。可以基于需求创建任何级别分区,可以在单个分区或全部分区上使用约束、触发器、索引。相比于单个大表性能显著提升。分区连接和分区聚合特性可以提高复杂查询计算的性能。批量载入或删除会更快,因为可以在单个分区执行这些操作。每个分区能容纳的数据可以根据数据的使用频率来确定,我们可以把不常用的数据存储在便宜的、较慢的存储媒介上。

2. 分区示例

下面通过示例展示如何使用这些不同类型的分区表。

2.1 列表分区

列表分区根据预定义的值创建分区表,缺省分区容纳没有明确分区的值(11版本才支持)。

postgres=# CREATE TABLE customers (id INTEGER, status TEXT, arr NUMERIC) PARTITION BY LIST(status); 
 
CREATE TABLE 
 
postgres=# CREATE TABLE cust_active PARTITION OF customers FOR VALUES IN ('ACTIVE'); 
 
CREATE TABLE 
 
postgres=# CREATE TABLE cust_archived PARTITION OF customers FOR VALUES IN ('EXPIRED'); 
 
CREATE TABLE 
 
postgres=# CREATE TABLE cust_others PARTITION OF customers DEFAULT; 
 
CREATE TABLE 
 
 
 
postgres=# \d+ customers 
 
                           Partitioned table "public.customers" 
 
 Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description  
 
--------+---------+-----------+----------+---------+----------+--------------+------------- 
 
 id     | integer |           |          |         | plain    |              |  
 
 status | text    |           |          |         | extended |              |  
 
 arr    | numeric |           |          |         | main     |              |  
 
Partition key: LIST (status) 
 
Partitions: cust_active FOR VALUES IN ('ACTIVE'), 
 
            cust_archived FOR VALUES IN ('EXPIRED'), 
 
            cust_others DEFAULT 
 
 
 
postgres=# \d+ cust_active  
 
                                Table "public.cust_active" 
 
 Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description  
 
--------+---------+-----------+----------+---------+----------+--------------+------------- 
 
 id     | integer |           |          |         | plain    |              |  
 
 status | text    |           |          |         | extended |              |  
 
 arr    | numeric |           |          |         | main     |              |  
 
Partition of: customers FOR VALUES IN ('ACTIVE') 
 
Partition constraint: ((status IS NOT NULL) AND (status = 'ACTIVE'::text)) 
 
Access method: heap 
 
 
 
postgres=# INSERT INTO customers VALUES (1,'ACTIVE',100), (2,'RECURRING',20), (3,'EXPIRED',38), (4,'REACTIVATED',144); 
 
INSERT 0 4 
 
postgres=# SELECT tableoid::regclass,* FROM customers; 
 
   tableoid    | id |   status    | arr  
 
---------------+----+-------------+----- 
 
 cust_active   |  1 | ACTIVE      | 100 
 
 cust_archived |  3 | EXPIRED     |  38 
 
 cust_others   |  2 | RECURRING   |  20 
 
 cust_others   |  4 | REACTIVATED | 144 
 
(4 rows) 

2.2 范围分区

范围分区根据分区键对应值的范围进行分区。需要指定分区的最大值和最小值,包括最小值被,最大值不被包括。

postgres=# CREATE TABLE customers (id INTEGER, status TEXT, arr NUMERIC) PARTITION BY RANGE(arr); 
 
CREATE TABLE 
 
postgres=# CREATE TABLE cust_arr_small PARTITION OF customers FOR VALUES FROM (MINVALUE) TO (25); 
 
CREATE TABLE 
 
postgres=# CREATE TABLE cust_arr_medium PARTITION OF customers FOR VALUES FROM (25) TO (75); 
 
CREATE TABLE 
 
postgres=# CREATE TABLE cust_arr_large PARTITION OF customers FOR VALUES FROM (75) TO (MAXVALUE); 
 
CREATE TABLE 
 
postgres=#  
 
postgres=# \d+ customers 
 
                           Partitioned table "public.customers" 
 
 Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description  
 
--------+---------+-----------+----------+---------+----------+--------------+------------- 
 
 id     | integer |           |          |         | plain    |              |  
 
 status | text    |           |          |         | extended |              |  
 
 arr    | numeric |           |          |         | main     |              |  
 
Partition key: RANGE (arr) 
 
Partitions: cust_arr_large FOR VALUES FROM ('75') TO (MAXVALUE), 
 
            cust_arr_medium FOR VALUES FROM ('25') TO ('75'), 
 
            cust_arr_small FOR VALUES FROM (MINVALUE) TO ('25') 
 
 
 
postgres=# \d+ cust_arr_small  
 
                               Table "public.cust_arr_small" 
 
 Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description  
 
--------+---------+-----------+----------+---------+----------+--------------+------------- 
 
 id     | integer |           |          |         | plain    |              |  
 
 status | text    |           |          |         | extended |              |  
 
 arr    | numeric |           |          |         | main     |              |  
 
Partition of: customers FOR VALUES FROM (MINVALUE) TO ('25') 
 
Partition constraint: ((arr IS NOT NULL) AND (arr < '25'::numeric)) 
 
Access method: heap 
 
 
 
postgres=#  
 
postgres=# INSERT INTO customers VALUES (1,'ACTIVE',100), (2,'RECURRING',20), (3,'EXPIRED',38), (4,'REACTIVATED',144); 
 
INSERT 0 4 
 
postgres=# SELECT tableoid::regclass,* FROM customers; 
 
    tableoid     | id |   status    | arr  
 
-----------------+----+-------------+----- 
 
 cust_arr_small  |  2 | RECURRING   |  20 
 
 cust_arr_medium |  3 | EXPIRED     |  38 
 
 cust_arr_large  |  1 | ACTIVE      | 100 
 
 cust_arr_large  |  4 | REACTIVATED | 144 
 
(4 rows) 

2.3 Hash分区

自11版本才支持,hash分区使用对值取模的余数进行分区,当插入数据时生成hash值,对hash值取模,根据其余数确定分区。

postgres=# CREATE TABLE customers (id INTEGER, status TEXT, arr NUMERIC) PARTITION BY HASH(id); 
 
CREATE TABLE 
 
postgres=# CREATE TABLE cust_part1 PARTITION OF customers FOR VALUES WITH (modulus 3, remainder 0); 
 
CREATE TABLE 
 
postgres=# CREATE TABLE cust_part2 PARTITION OF customers FOR VALUES WITH (modulus 3, remainder 1); 
 
CREATE TABLE 
 
postgres=# CREATE TABLE cust_part3 PARTITION OF customers FOR VALUES WITH (modulus 3, remainder 2); 
 
CREATE TABLE 
 
postgres=#  
 
postgres=# \d+ customers 
 
                           Partitioned table "public.customers" 
 
 Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description  
 
--------+---------+-----------+----------+---------+----------+--------------+------------- 
 
 id     | integer |           |          |         | plain    |              |  
 
 status | text    |           |          |         | extended |              |  
 
 arr    | numeric |           |          |         | main     |              |  
 
Partition key: HASH (id) 
 
Partitions: cust_part1 FOR VALUES WITH (modulus 3, remainder 0), 
 
            cust_part2 FOR VALUES WITH (modulus 3, remainder 1), 
 
            cust_part3 FOR VALUES WITH (modulus 3, remainder 2) 
 
 
 
postgres=# \d+ cust_part1  
 
                                 Table "public.cust_part1" 
 
 Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description  
 
--------+---------+-----------+----------+---------+----------+--------------+------------- 
 
 id     | integer |           |          |         | plain    |              |  
 
 status | text    |           |          |         | extended |              |  
 
 arr    | numeric |           |          |         | main     |              |  
 
Partition of: customers FOR VALUES WITH (modulus 3, remainder 0) 
 
Partition constraint: satisfies_hash_partition('16475'::oid, 3, 0, id) 
 
Access method: heap 
 
 
 
postgres=#  
 
postgres=# INSERT INTO customers VALUES (1,'ACTIVE',100), (2,'RECURRING',20), (3,'EXPIRED',38), (4,'REACTIVATED',144); 
 
INSERT 0 4 
 
postgres=# SELECT tableoid::regclass,* FROM customers; 
 
  tableoid  | id |   status    | arr  
 
------------+----+-------------+----- 
 
 cust_part1 |  2 | RECURRING   |  20 
 
 cust_part1 |  4 | REACTIVATED | 144 
 
 cust_part2 |  3 | EXPIRED     |  38 
 
 cust_part3 |  1 | ACTIVE      | 100 
 
(4 rows) 

2.4 多级分区

PostgreSQL多级分区支持N级。分区方式有LIST-LIST, LIST-RANGE, LIST-HASH, RANGE-RANGE, RANGE-LIST, RANGE-HASH, HASH-HASH, HASH-LIST, HASH-RANGE 。

postgres=# CREATE TABLE customers (id INTEGER, status TEXT, arr NUMERIC) PARTITION BY LIST(status); 
 
CREATE TABLE 
 
postgres=# CREATE TABLE cust_active PARTITION OF customers FOR VALUES IN ('ACTIVE','RECURRING','REACTIVATED') PARTITION BY RANGE(arr); 
 
CREATE TABLE 
 
postgres=# CREATE TABLE cust_arr_small PARTITION OF cust_active FOR VALUES FROM (MINVALUE) TO (101) PARTITION BY HASH(id); 
 
CREATE TABLE 
 
postgres=# CREATE TABLE cust_part11 PARTITION OF cust_arr_small FOR VALUES WITH (modulus 2, remainder 0); 
 
CREATE TABLE 
 
postgres=# CREATE TABLE cust_part12 PARTITION OF cust_arr_small FOR VALUES WITH (modulus 2, remainder 1); 
 
CREATE TABLE 
 
postgres=# CREATE TABLE cust_other PARTITION OF customers DEFAULT PARTITION BY RANGE(arr); 
 
CREATE TABLE 
 
postgres=# CREATE TABLE cust_arr_large PARTITION OF cust_other FOR VALUES FROM (101) TO (MAXVALUE) PARTITION BY HASH(id); 
 
CREATE TABLE 
 
postgres=# CREATE TABLE cust_part21 PARTITION OF cust_arr_large FOR VALUES WITH (modulus 2, remainder 0); 
 
CREATE TABLE 
 
postgres=# CREATE TABLE cust_part22 PARTITION OF cust_arr_large FOR VALUES WITH (modulus 2, remainder 1); 
 
CREATE TABLE 
 
postgres=#  
 
postgres=# \d+ customers 
 
                           Partitioned table "public.customers" 
 
 Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description  
 
--------+---------+-----------+----------+---------+----------+--------------+------------- 
 
 id     | integer |           |          |         | plain    |              |  
 
 status | text    |           |          |         | extended |              |  
 
 arr    | numeric |           |          |         | main     |              |  
 
Partition key: LIST (status) 
 
Partitions: cust_active FOR VALUES IN ('ACTIVE', 'RECURRING', 'REACTIVATED'), PARTITIONED, 
 
            cust_other DEFAULT, PARTITIONED 
 
 
 
postgres=# \d+ cust_active  
 
                          Partitioned table "public.cust_active" 
 
 Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description  
 
--------+---------+-----------+----------+---------+----------+--------------+------------- 
 
 id     | integer |           |          |         | plain    |              |  
 
 status | text    |           |          |         | extended |              |  
 
 arr    | numeric |           |          |         | main     |              |  
 
Partition of: customers FOR VALUES IN ('ACTIVE', 'RECURRING', 'REACTIVATED') 
 
Partition constraint: ((status IS NOT NULL) AND (status = ANY (ARRAY['ACTIVE'::text, 'RECURRING'::text, 'REACTIVATED'::text]))) 
 
Partition key: RANGE (arr) 
 
Partitions: cust_arr_small FOR VALUES FROM (MINVALUE) TO ('101'), PARTITIONED 
 
 
 
postgres=# \d+ cust_arr_small  
 
                         Partitioned table "public.cust_arr_small" 
 
 Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description  
 
--------+---------+-----------+----------+---------+----------+--------------+------------- 
 
 id     | integer |           |          |         | plain    |              |  
 
 status | text    |           |          |         | extended |              |  
 
 arr    | numeric |           |          |         | main     |              |  
 
Partition of: cust_active FOR VALUES FROM (MINVALUE) TO ('101') 
 
Partition constraint: ((status IS NOT NULL) AND (status = ANY (ARRAY['ACTIVE'::text, 'RECURRING'::text, 'REACTIVATED'::text])) AND (arr IS NOT NULL) AND (arr < '101'::numeric)) 
 
Partition key: HASH (id) 
 
Partitions: cust_part11 FOR VALUES WITH (modulus 2, remainder 0), 
 
            cust_part12 FOR VALUES WITH (modulus 2, remainder 1) 
 
 
 
postgres=# \d+ cust_part11  
 
                                Table "public.cust_part11" 
 
 Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description  
 
--------+---------+-----------+----------+---------+----------+--------------+------------- 
 
 id     | integer |           |          |         | plain    |              |  
 
 status | text    |           |          |         | extended |              |  
 
 arr    | numeric |           |          |         | main     |              |  
 
Partition of: cust_arr_small FOR VALUES WITH (modulus 2, remainder 0) 
 
Partition constraint: ((status IS NOT NULL) AND (status = ANY (ARRAY['ACTIVE'::text, 'RECURRING'::text, 'REACTIVATED'::text])) AND (arr IS NOT NULL) AND (arr < '101'::numeric) AND satisfies_hash_partition('16621'::oid, 2, 0, id)) 
 
Access method: heap 
 
 
 
postgres=#  
 
postgres=# INSERT INTO customers VALUES (1,'ACTIVE',100), (2,'RECURRING',20), (3,'REACTIVATED',38), (4,'EXPIRED',144); 
 
INSERT 0 4 
 
postgres=# SELECT tableoid::regclass,* FROM customers; 
 
  tableoid   | id |   status    | arr  
 
-------------+----+-------------+----- 
 
 cust_part11 |  1 | ACTIVE      | 100 
 
 cust_part11 |  2 | RECURRING   |  20 
 
 cust_part12 |  3 | REACTIVATED |  38 
 
 cust_part22 |  4 | EXPIRED     | 144 
 
(4 rows) 

2.5 分区限制

自 PostgreSQL 10开始引入分区,后续不断提升功能并趋于稳定,但仍然有一些限制:

  1. 在分区表上的唯一约束必须包括所有分区键。较好方法是在每个分区上创建唯一约束,而不是在分区表上创建约束。
  2. 分区表不支持BEFORE ROW 触发器,需要须在各个分区上定义,而不是分区表。
  3. 范围分区不允许有null值。
  4. 当没有显式创建子分区时PostgreSQL不会自动创建子分区。因此如果有子分区,至少要有一个分区保存值。

3. 总结

本文介绍了PostgreSQL 分区表的优势,如何使用分区表,包括列表分区、范围分区、哈希分区以及三者的组合分区。


本文参考链接:https://blog.csdn.net/neweastsun/article/details/112444364
阅读延展