由于数据库中存储了大量数据,性能和可伸缩性是受到影响的两个主要因素。随着加载表数据的增加,需要更多的数据扫描、页交换到内存,以及其他表操作成本也会增加。分区可能是一个很好的解决方案,因为它可以将一个大表划分为更小的表,从而减少表扫描和内存交换问题,从而提高数据库性能。
分区有助于扩展PostgreSQL,它将大型逻辑表拆分为较小的物理表,这些物理表可以根据使用情况存储在不同的存储介质上。通过使用显示分区及利用postgres_fdw实现的外部表,用户可以更好地利用伸缩性。
自从 PostgreSQL10 支持分区表,并在后续版本中逐步支持list,range,hash以及它们的组合分区方法。
1. 概述
分区表的优势就是单个表不能提供的功能。因此,当有大量数据需要写到单个表时可以使用分区表。除了数据因素,其他情况也要考虑,如频繁更新的数据,在一段时间内使用的数据,一个范围数据可以划分多小等。通过合理规划并考虑多种因素,表分区可以极大地提高性能,使得PostgreSQL可以扩展支持更大数据集。
PostgreSQL支持用户可以很灵活定义分区。可以基于需求创建任何级别分区,可以在单个分区或全部分区上使用约束、触发器、索引。相比于单个大表性能显著提升。分区连接和分区聚合特性可以提高复杂查询计算的性能。批量载入或删除会更快,因为可以在单个分区执行这些操作。每个分区能容纳的数据可以根据数据的使用频率来确定,我们可以把不常用的数据存储在便宜的、较慢的存储媒介上。
2. 分区示例
下面通过示例展示如何使用这些不同类型的分区表。
2.1 列表分区
列表分区根据预定义的值创建分区表,缺省分区容纳没有明确分区的值(11版本才支持)。
postgres=# CREATE TABLE customers (id INTEGER, status TEXT, arr NUMERIC) PARTITION BY LIST(status);
CREATE TABLE
postgres=# CREATE TABLE cust_active PARTITION OF customers FOR VALUES IN ('ACTIVE');
CREATE TABLE
postgres=# CREATE TABLE cust_archived PARTITION OF customers FOR VALUES IN ('EXPIRED');
CREATE TABLE
postgres=# CREATE TABLE cust_others PARTITION OF customers DEFAULT;
CREATE TABLE
postgres=# \d+ customers
Partitioned table "public.customers"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+----------+--------------+-------------
id | integer | | | | plain | |
status | text | | | | extended | |
arr | numeric | | | | main | |
Partition key: LIST (status)
Partitions: cust_active FOR VALUES IN ('ACTIVE'),
cust_archived FOR VALUES IN ('EXPIRED'),
cust_others DEFAULT
postgres=# \d+ cust_active
Table "public.cust_active"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+----------+--------------+-------------
id | integer | | | | plain | |
status | text | | | | extended | |
arr | numeric | | | | main | |
Partition of: customers FOR VALUES IN ('ACTIVE')
Partition constraint: ((status IS NOT NULL) AND (status = 'ACTIVE'::text))
Access method: heap
postgres=# INSERT INTO customers VALUES (1,'ACTIVE',100), (2,'RECURRING',20), (3,'EXPIRED',38), (4,'REACTIVATED',144);
INSERT 0 4
postgres=# SELECT tableoid::regclass,* FROM customers;
tableoid | id | status | arr
---------------+----+-------------+-----
cust_active | 1 | ACTIVE | 100
cust_archived | 3 | EXPIRED | 38
cust_others | 2 | RECURRING | 20
cust_others | 4 | REACTIVATED | 144
(4 rows)
2.2 范围分区
范围分区根据分区键对应值的范围进行分区。需要指定分区的最大值和最小值,包括最小值被,最大值不被包括。
postgres=# CREATE TABLE customers (id INTEGER, status TEXT, arr NUMERIC) PARTITION BY RANGE(arr);
CREATE TABLE
postgres=# CREATE TABLE cust_arr_small PARTITION OF customers FOR VALUES FROM (MINVALUE) TO (25);
CREATE TABLE
postgres=# CREATE TABLE cust_arr_medium PARTITION OF customers FOR VALUES FROM (25) TO (75);
CREATE TABLE
postgres=# CREATE TABLE cust_arr_large PARTITION OF customers FOR VALUES FROM (75) TO (MAXVALUE);
CREATE TABLE
postgres=#
postgres=# \d+ customers
Partitioned table "public.customers"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+----------+--------------+-------------
id | integer | | | | plain | |
status | text | | | | extended | |
arr | numeric | | | | main | |
Partition key: RANGE (arr)
Partitions: cust_arr_large FOR VALUES FROM ('75') TO (MAXVALUE),
cust_arr_medium FOR VALUES FROM ('25') TO ('75'),
cust_arr_small FOR VALUES FROM (MINVALUE) TO ('25')
postgres=# \d+ cust_arr_small
Table "public.cust_arr_small"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+----------+--------------+-------------
id | integer | | | | plain | |
status | text | | | | extended | |
arr | numeric | | | | main | |
Partition of: customers FOR VALUES FROM (MINVALUE) TO ('25')
Partition constraint: ((arr IS NOT NULL) AND (arr < '25'::numeric))
Access method: heap
postgres=#
postgres=# INSERT INTO customers VALUES (1,'ACTIVE',100), (2,'RECURRING',20), (3,'EXPIRED',38), (4,'REACTIVATED',144);
INSERT 0 4
postgres=# SELECT tableoid::regclass,* FROM customers;
tableoid | id | status | arr
-----------------+----+-------------+-----
cust_arr_small | 2 | RECURRING | 20
cust_arr_medium | 3 | EXPIRED | 38
cust_arr_large | 1 | ACTIVE | 100
cust_arr_large | 4 | REACTIVATED | 144
(4 rows)
2.3 Hash分区
自11版本才支持,hash分区使用对值取模的余数进行分区,当插入数据时生成hash值,对hash值取模,根据其余数确定分区。
postgres=# CREATE TABLE customers (id INTEGER, status TEXT, arr NUMERIC) PARTITION BY HASH(id);
CREATE TABLE
postgres=# CREATE TABLE cust_part1 PARTITION OF customers FOR VALUES WITH (modulus 3, remainder 0);
CREATE TABLE
postgres=# CREATE TABLE cust_part2 PARTITION OF customers FOR VALUES WITH (modulus 3, remainder 1);
CREATE TABLE
postgres=# CREATE TABLE cust_part3 PARTITION OF customers FOR VALUES WITH (modulus 3, remainder 2);
CREATE TABLE
postgres=#
postgres=# \d+ customers
Partitioned table "public.customers"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+----------+--------------+-------------
id | integer | | | | plain | |
status | text | | | | extended | |
arr | numeric | | | | main | |
Partition key: HASH (id)
Partitions: cust_part1 FOR VALUES WITH (modulus 3, remainder 0),
cust_part2 FOR VALUES WITH (modulus 3, remainder 1),
cust_part3 FOR VALUES WITH (modulus 3, remainder 2)
postgres=# \d+ cust_part1
Table "public.cust_part1"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+----------+--------------+-------------
id | integer | | | | plain | |
status | text | | | | extended | |
arr | numeric | | | | main | |
Partition of: customers FOR VALUES WITH (modulus 3, remainder 0)
Partition constraint: satisfies_hash_partition('16475'::oid, 3, 0, id)
Access method: heap
postgres=#
postgres=# INSERT INTO customers VALUES (1,'ACTIVE',100), (2,'RECURRING',20), (3,'EXPIRED',38), (4,'REACTIVATED',144);
INSERT 0 4
postgres=# SELECT tableoid::regclass,* FROM customers;
tableoid | id | status | arr
------------+----+-------------+-----
cust_part1 | 2 | RECURRING | 20
cust_part1 | 4 | REACTIVATED | 144
cust_part2 | 3 | EXPIRED | 38
cust_part3 | 1 | ACTIVE | 100
(4 rows)
2.4 多级分区
PostgreSQL多级分区支持N级。分区方式有LIST-LIST, LIST-RANGE, LIST-HASH, RANGE-RANGE, RANGE-LIST, RANGE-HASH, HASH-HASH, HASH-LIST, HASH-RANGE 。
postgres=# CREATE TABLE customers (id INTEGER, status TEXT, arr NUMERIC) PARTITION BY LIST(status);
CREATE TABLE
postgres=# CREATE TABLE cust_active PARTITION OF customers FOR VALUES IN ('ACTIVE','RECURRING','REACTIVATED') PARTITION BY RANGE(arr);
CREATE TABLE
postgres=# CREATE TABLE cust_arr_small PARTITION OF cust_active FOR VALUES FROM (MINVALUE) TO (101) PARTITION BY HASH(id);
CREATE TABLE
postgres=# CREATE TABLE cust_part11 PARTITION OF cust_arr_small FOR VALUES WITH (modulus 2, remainder 0);
CREATE TABLE
postgres=# CREATE TABLE cust_part12 PARTITION OF cust_arr_small FOR VALUES WITH (modulus 2, remainder 1);
CREATE TABLE
postgres=# CREATE TABLE cust_other PARTITION OF customers DEFAULT PARTITION BY RANGE(arr);
CREATE TABLE
postgres=# CREATE TABLE cust_arr_large PARTITION OF cust_other FOR VALUES FROM (101) TO (MAXVALUE) PARTITION BY HASH(id);
CREATE TABLE
postgres=# CREATE TABLE cust_part21 PARTITION OF cust_arr_large FOR VALUES WITH (modulus 2, remainder 0);
CREATE TABLE
postgres=# CREATE TABLE cust_part22 PARTITION OF cust_arr_large FOR VALUES WITH (modulus 2, remainder 1);
CREATE TABLE
postgres=#
postgres=# \d+ customers
Partitioned table "public.customers"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+----------+--------------+-------------
id | integer | | | | plain | |
status | text | | | | extended | |
arr | numeric | | | | main | |
Partition key: LIST (status)
Partitions: cust_active FOR VALUES IN ('ACTIVE', 'RECURRING', 'REACTIVATED'), PARTITIONED,
cust_other DEFAULT, PARTITIONED
postgres=# \d+ cust_active
Partitioned table "public.cust_active"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+----------+--------------+-------------
id | integer | | | | plain | |
status | text | | | | extended | |
arr | numeric | | | | main | |
Partition of: customers FOR VALUES IN ('ACTIVE', 'RECURRING', 'REACTIVATED')
Partition constraint: ((status IS NOT NULL) AND (status = ANY (ARRAY['ACTIVE'::text, 'RECURRING'::text, 'REACTIVATED'::text])))
Partition key: RANGE (arr)
Partitions: cust_arr_small FOR VALUES FROM (MINVALUE) TO ('101'), PARTITIONED
postgres=# \d+ cust_arr_small
Partitioned table "public.cust_arr_small"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+----------+--------------+-------------
id | integer | | | | plain | |
status | text | | | | extended | |
arr | numeric | | | | main | |
Partition of: cust_active FOR VALUES FROM (MINVALUE) TO ('101')
Partition constraint: ((status IS NOT NULL) AND (status = ANY (ARRAY['ACTIVE'::text, 'RECURRING'::text, 'REACTIVATED'::text])) AND (arr IS NOT NULL) AND (arr < '101'::numeric))
Partition key: HASH (id)
Partitions: cust_part11 FOR VALUES WITH (modulus 2, remainder 0),
cust_part12 FOR VALUES WITH (modulus 2, remainder 1)
postgres=# \d+ cust_part11
Table "public.cust_part11"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+----------+--------------+-------------
id | integer | | | | plain | |
status | text | | | | extended | |
arr | numeric | | | | main | |
Partition of: cust_arr_small FOR VALUES WITH (modulus 2, remainder 0)
Partition constraint: ((status IS NOT NULL) AND (status = ANY (ARRAY['ACTIVE'::text, 'RECURRING'::text, 'REACTIVATED'::text])) AND (arr IS NOT NULL) AND (arr < '101'::numeric) AND satisfies_hash_partition('16621'::oid, 2, 0, id))
Access method: heap
postgres=#
postgres=# INSERT INTO customers VALUES (1,'ACTIVE',100), (2,'RECURRING',20), (3,'REACTIVATED',38), (4,'EXPIRED',144);
INSERT 0 4
postgres=# SELECT tableoid::regclass,* FROM customers;
tableoid | id | status | arr
-------------+----+-------------+-----
cust_part11 | 1 | ACTIVE | 100
cust_part11 | 2 | RECURRING | 20
cust_part12 | 3 | REACTIVATED | 38
cust_part22 | 4 | EXPIRED | 144
(4 rows)
2.5 分区限制
自 PostgreSQL 10开始引入分区,后续不断提升功能并趋于稳定,但仍然有一些限制:
- 在分区表上的唯一约束必须包括所有分区键。较好方法是在每个分区上创建唯一约束,而不是在分区表上创建约束。
- 分区表不支持BEFORE ROW 触发器,需要须在各个分区上定义,而不是分区表。
- 范围分区不允许有null值。
- 当没有显式创建子分区时PostgreSQL不会自动创建子分区。因此如果有子分区,至少要有一个分区保存值。
3. 总结
本文介绍了PostgreSQL 分区表的优势,如何使用分区表,包括列表分区、范围分区、哈希分区以及三者的组合分区。
本文参考链接:https://blog.csdn.net/neweastsun/article/details/112444364