PostgreSQL: Techniques in production

You can read a lot of books on databases and write a bunch of applications, whether for outsourcing or for yourself, but it is still impossible to avoid stepping on the same rake when working with really large databases/tables, especially when downtime on a big project needs to be minimized or, better yet, avoided completely. Here even the simplest operations, such as changing the table structure, can become a challenge. The most interesting cases, problems, rakes and solutions from personal experience that we had to face on the Pushwoosh project are described below. There are no pretty pictures in the article, but there is a lot of dry text.



Add a new column to an existing table


It seems like a common operation: you need to build a new feature or extend an existing table. But what if the table in production holds 50 GB of data and serves several thousand requests per second from live clients? The change has to be invisible to them.

Task: add a new column to an existing table in production.
Problem: adding a new column locks the table.
Solution: adding a new column does not lock the table if the DEFAULT clause is omitted or DEFAULT NULL is used.

For those who are wondering, you can read the full documentation.
excerpt from the documentation
When a column is added with ADD COLUMN, all existing rows in the table are initialized with the column's default value (NULL if no DEFAULT clause is specified). If there is no DEFAULT clause, this is merely a metadata change and does not require any immediate update of the table's data; the added NULL values are supplied on readout, instead.

Adding a column with a DEFAULT clause or changing the type of an existing column will require the entire table and its indexes to be rewritten. As an exception when changing the type of an existing column, if the USING clause does not change the column contents and the old type is either binary coercible to the new type or an unconstrained domain over the new type, a table rewrite is not needed; but any indexes on the affected columns must still be rebuilt. Adding or removing a system oid column also requires rewriting the entire table. Table and/or index rebuilds may take a significant amount of time for a large table; and will temporarily require as much as double the disk space.


But what if you want to initialize the new column with a default value? In that case you can write a script that walks through the table and updates the field in small batches of records, as in the sketch below.
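A minimal sketch of such a backfill, assuming a hypothetical table big_table with an integer primary key id and a new boolean column new_flag (the names are illustrative, not from the project):

-- no DEFAULT clause, so the ALTER is a metadata-only change and does not rewrite the table
ALTER TABLE big_table ADD COLUMN new_flag boolean;

-- backfill the value in small batches so each UPDATE holds locks only briefly;
-- run it repeatedly (from a script or cron) until it reports "UPDATE 0"
UPDATE big_table
SET new_flag = false
WHERE id IN (
    SELECT id
    FROM big_table
    WHERE new_flag IS NULL
    LIMIT 10000
);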

Adding a new index to an existing table


Now the analytics department comes to us and wants to receive their interesting data every hour; for that we need a new SQL query, but the query is heavy and we have no suitable index for it.

Task: add an index to an existing table in production.
Problem: adding an index blocks writes (insert/update/delete) to the table. Reading from the table is still possible.
Solution: use CREATE INDEX CONCURRENTLY, which does not lock the table but takes roughly twice as long and requires more system resources (an example follows the documentation excerpt below).

For those who are interested, you can read the full documentation.

excerpt from the documentation
Creating an index can interfere with regular operation of a database. Normally PostgreSQL locks the table to be indexed against writes and performs the entire index build with a single scan of the table. Other transactions can still read the table, but if they try to insert, update, or delete rows in the table they will block until the index build is finished. This could have a severe effect if the system is a live production database. Very large tables can take many hours to be indexed, and even for smaller tables, an index build can lock out writers for periods that are unacceptably long for a production system.

PostgreSQL supports building indexes without locking out writes. This method is invoked by specifying the CONCURRENTLY option of CREATE INDEX. When this option is used, PostgreSQL must perform two scans of the table, and in addition it must wait for all existing transactions that could potentially use the index to terminate. Thus this method requires more total work than a standard index build and takes significantly longer to complete. However, since it allows normal operations to continue while the index is built, this method is useful for adding new indexes in a production environment. Of course, the extra CPU and I/O load imposed by the index creation might slow other operations.
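A minimal example of such a concurrent build, assuming we want to index the hwid column of the application_devices table that appears later in this article (the choice of table and column here is only illustrative):

-- does not block writes, but takes longer and cannot be run inside a transaction block
CREATE INDEX CONCURRENTLY application_devices_hwid_idx ON application_devices (hwid);
-- if the concurrent build fails, it leaves an INVALID index behind, which has to be dropped and recreated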


Monitoring current queries to the database


Current queries can be seen in the pg_stat_activity system view. The columns that interest us are:

pid (integer): Process ID of this backend
query_start (timestamp with time zone): Time when the currently active query was started, or if state is not active, when the last query was started
query (text): Text of this backend's most recent query. If state is active, this field shows the currently executing query. In all other states, it shows the last query that was executed.

In the end, since we have the query start time, query_start, we can easily calculate how long each query has been running and sort the queries by execution time.

pushwoosh=# select now() - query_start, pid, waiting, query from pg_stat_activity where state != 'idle' order by 1 desc;
?column? | pid | waiting | query
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
23:42:13.468115 | 6877 | f | DELETE FROM application_goals WHERE applicationid = '9254'
00:30:51.943691 | 24106 | f | SELECT applicationid, pushtoken FROM application_devices WHERE hwid in ('1abd5e5fd79318cd','1abd5f3eda7acbca','1abd601f2bafabf8','1abd62a0a092ac17','1abd6303b83accf9','1abd64726a98fb63','1abd676c087c3617','1abd67ebecb6f3ce','1abd68a3b78fb730','1abd697af6bc8552','1abd70ebb654aeb2','1abd7114a8576a67','1abd729a385caff8','1abd731ff62c4521','1abd738bd2d457eb','1abd7760f7210155','1abd79dbc085c2c0','1abd7ab46dc24304','1abd7d48bd5e04ab','1abd7e7aee3c0e58','1abd7e8129a53ab3','1abd827c8c21630','1abd82cd204c69a9','1abd843ee3dedb1','1abd88d346c74d67','1abd88e8bd01c168','1abd8ceac00808cc','1abd8d3b2cb72de3','1abd8e139f267260','1abd8e74a288204c','1abd8f00bb4a0433','1abd8fd7e8f4f125','1abd91c193455ada','1abd92448396a9bf','1abd946ac4cf0e22','1abd9594ed1bd791','1abd96cc0df2202b','1abd975a98849a0b','1abd997c96d3c9b1','1abd9b3cfb66852c','1abd9bead472be5','1abd9f5bed3cbbd8','1abd9f73b8122bf1','1abda233b9a00633','1abda2ee3db5bccb','1abda486901c3a14','1abdac09e0e3267b','1abdae8235cf19dd','1abdaf9e3a143041','1abdb54fe96'
00:04:49.592503 | 18899 | f | autovacuum: ANALYZE public.device_tags_values
00:00:00.040265 | 11748 | f | INSERT INTO device_tags_values (hwid,valueid) VALUES ('27976b81cc72c7ac','8470317') RETURNING uid


For PostgreSQL 9.1, the query can be rewritten as follows:
select now() - query_start, procpid, waiting, current_query from pg_stat_activity where current_query != '<IDLE>' order by 1 desc;


From this output we can see that the process with PID 6877 has been running its query for almost a day; that query is most likely not optimal and needs more detailed profiling. We can also see that the second query has been running for half an hour and is probably not optimal either, but we cannot see it completely because the text is truncated, and we want the complete statement.
Task: see which queries are currently being executed in the database.
Problem: pg_stat_activity does not show long queries completely (the text is truncated).
Solution: gdb.

Perhaps this problem has a simpler solution, but we never found one. Take the PID from the query above and attach to the process:
gdb [path_to_postgres] [pid]

and once attached to the process, run
printf "%s\n", debug_query_string


In our case
bash$ gdb postgres 24106
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-64.el7
Copyright 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<www.gnu.org/software/gdb/bugs>...
Reading symbols from /usr/bin/postgres...Reading symbols from /usr/bin/postgres...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Attaching to program: /bin/postgres, process 24106
Reading symbols from /lib64/libxml2.so.2...Reading symbols from /lib64/libxml2.so.2...(no debugging symbols found)...done.
(no debugging symbols found)...done.
...
(gdb) printf "%s\n", debug_query_string


Slow log


The database always needs to be monitored to see which queries run slowly. You could solve this in code, in the ORM or somewhere deeper, by measuring the execution time of each request and logging it whenever it exceeds a threshold. But it is always better not to reinvent the wheel.
Task: monitor slow queries.
Problem: we want to do this at the database level.
Solution: log_min_duration_statement

The log_min_duration_statement setting is specified in milliseconds and causes every query that ran longer than the specified value to be logged.

Let's edit the PostgreSQL config, vim /var/lib/pgsql/9.4/data/postgresql.conf, and set a threshold of 3 seconds:
log_min_duration_statement = 3000 # -1 is disabled, 0 logs all statements
# and their durations, > 0 logs only
# statements running at least this number
# of milliseconds


For the changes to take effect there is no need to restart the database; simply run the following command from psql, pgAdmin or any other database interface:
SELECT pg_reload_conf();


or run from the command line
su - postgres
/usr/bin/pg_ctl reload


You need to remember that some of the parameters in the configuration file will take effect only after you restart the database.

After that, in the PostgreSQL log located at /var/lib/pgsql/9.4/data/pg_log/postgresql-2015-07-07.log, you can see a query that took almost 6 seconds to execute:
2015-07-07 09:39:30 UTC 192.168.100.82(45276) LOG: duration: 5944.540 ms statement: SELECT * FROM application_devices WHERE applicationid='1234' AND hwid='95ea842e368f6a64' LIMIT 1


As an option, you can later monitor the log file with a logstash + elasticsearch + kibana stack and immediately send notifications about slow queries via zabbix, if this is critical for the project.

Seeing which queries a production process is making at the moment


If you have a lot of daemons that frequently talk to the database, and one day one of them starts running slowly or it is unclear what it is doing, strace comes to the rescue: it shows the queries and their execution times without stopping the process, adding logging to the program, or waiting for the next occurrence of the problem. No matter whether you write in PHP, Python, Ruby, etc., strace works for all of them.
Task: find out what a process is doing (for example, which queries it sends to the database).
Problem: the process cannot be interrupted or stopped.
Solution: strace

It is enough to take the PID of the process, set the printed string length with the -s option, and add the -T option to show the time spent in each call. The output of strace will look roughly like this:

strace -p 27345 -s 1024 -T 2> out
gettimeofday({1437846841, 447186}, NULL) = 0 < 0.000004 > 
sendto(8, "Q\0\0\0005SELECT * FROM accounts WHERE uid='25143' LIMIT 1\0", 54, MSG_NOSIGNAL, NULL, 0) = 54 <0.000013>
poll([{fd=8, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=8, revents=POLLIN}]) <0.000890>


NULL and unique indexes


This example has nothing to do with the production environment. Let's start with the simple fact that NULL is not equal to NULL.
Documentation
Do not write expression = NULL because NULL is not equal to NULL. (The null value represents an unknown value, and it is not known whether two unknown values are equal.) This behavior conforms to the SQL standard.

Suppose we want to create a composite unique index in which one of the fields can be either NULL or a number. In this case the unique index does not treat rows containing NULL as duplicates, but we really want the index to catch such records as well.
Documentation
When an index is declared unique, multiple table rows with equal indexed values are not allowed. Null values are not considered equal. A multicolumn unique index will only reject cases where all indexed columns are equal in multiple rows.


Consider the example

psql=# create table test (
psql(# a varchar NOT NULL,
psql(# b varchar default null
psql(# );
CREATE TABLE
psql=# create unique index on test (a, b);
CREATE INDEX
psql=# insert into test values (1, null);
INSERT 0 1
psql=# insert into test values (1, null);
INSERT 0 1
psql=# select * from test;
a | b
---+---
1 |
1 |
(2 rows)

Despite the fact that we created a unique index, the record (1, null) was inserted twice.

Task: create a composite unique index in which one of the fields can be NULL.
Problem: a unique index does not treat NULL values as equal.
Solution: split the index into two partial indexes.

To avoid this behavior, you can split the index into two indexes:
sql=# create table test (
sql(# a varchar NOT NULL,
sql(# b varchar default null
sql(# );
CREATE TABLE
sql=# create unique index on test (a, b) where b is not null;
CREATE INDEX
sql=# create unique index on test (a) where b is null;
CREATE INDEX
sql=# insert into test values (1, null);
INSERT 0 1
sql=# insert into test values (1, null);
ERROR: duplicate key value violates unique constraint "test_a_idx"
DETAIL: Key (a)=(1) already exists.


Handling duplicates


This example also has nothing to do with the production environment.
Task: be able to handle duplicates that a unique index does not allow to be inserted.
Problem: you have to catch exceptions and parse error codes.
Solution: keep calm, catch the exception, and wait for PostgreSQL 9.5, which will have UPSERT.
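For reference, a minimal sketch of both approaches on a hypothetical items table with a unique column sku (the table and columns are illustrative):

CREATE TABLE items (sku varchar PRIMARY KEY, qty integer NOT NULL DEFAULT 0);

-- before 9.5: catch the unique_violation exception (SQLSTATE 23505), here inside a DO block
DO $$
BEGIN
    INSERT INTO items (sku, qty) VALUES ('abc-1', 10);
EXCEPTION WHEN unique_violation THEN
    RAISE NOTICE 'duplicate key, skipping';
END;
$$;

-- PostgreSQL 9.5+: UPSERT with the ON CONFLICT clause, no exception handling needed
INSERT INTO items (sku, qty) VALUES ('abc-1', 10)
ON CONFLICT (sku) DO UPDATE SET qty = items.qty + EXCLUDED.qty;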
Article based on information from habrahabr.ru
