FreeBSD + PostgreSQL: tuning the database server

Hello, Charcoaldust!

Probably, my article will be of no interest hardened sysadmin and copy-paste seem. But I'm addressing it to those who, like me, being the only developer for the first time faced with the need also to administrate the server, thus solving the problem of highly loaded database. And to Google you are not cursed, try to gather in one place the basic techniques for acceleration of the database server, which I have successfully managed to implement.

The input data my problem is the following: dual processor (Intel Xeon) machine, 8 hard drives at 500GB and 12Gb of RAM. And complete, including physical, access to this good. Task: to arrange a fast database server based on FreeBSD and PostgreSQL.

1. RAID


The correct partitioning of the available hard drives on raids we need to be PostgreSQL, as tablespacing (more on this below). Your 8 hard I broke into pairs and organized in this way: two pairs unite in RAID1 and two pairs in RAID0 (actually, for our purposes, need a minimum of 6 hardy — two pairs unite in RAID1, the other 2 leave as is). In the presence of a larger number of hard you can come up with something more reliable, like RAID5, RAID10, etc., but it is likely that the work is a bit slower. I will not go into details how to organize raids, because in the gland I am not strong, I'll just say that no controllers were not touched, because on the server after the BIOS loading utility that allows you to do it programmatically.

2. OS installation, DB server and use your core


First, just put prahu on the first RAID1. I put the FreeBSD 8.2 Release AMD64 with all files. 64-bit version is necessary to make the system "see" all of the RAM.

Now the most interesting: why do we compile kernel and what settings to change? This is necessary to allow the PostgreSQL server to use as many resources as you need for high loading. So, what are the database settings we are interested in. the book, Alexey Vasiliev "Working with Postgresql. Setup, zoom" recommended the following options for heavy database (the postgresql.conf):
the
    the
  • shared_buffers = 1/8 of RAM or more (but no more 1/4);
  • the
  • swork_mem in 1/20 RAM;
  • the
  • smaintenance_work_mem to 1/4 of RAM;
  • the
  • smax_fsm_relations in the planned number of tables in the databases * 1.5;
  • the
  • in max_fsm_pages max_fsm_relations * 2000;
  • the
  • fsync = true;
  • the
  • wal_sync_method = fdatasync;
  • the
  • commit_delay = 10 to 100;
  • the
  • commit_siblings = 5 to 10;
  • the
  • effective_cache_size = 0.9 from value cached, which shows free;
  • the
  • random_page_cost = 2 for a fast cpu, 4 for slow;
  • the
  • cpu_tuple_cost = 0.001 for fast cpu, 0.01 for the slow;
  • the
  • cpu_index_tuple_cost = 0.0005 for fast cpu, 0.005 for the slow;
  • the
  • autovacuum = on;
  • the
  • autovacuum_vacuum_threshold = 1800;
  • the
  • autovacuum_analyze_threshold = 900;

These options really suit us, but two:

1) the Maximum number of connections

Depends on the situation. I have a running script in the crown (to connect to the database and enters the data), I figured that should be enough 256:
the
    the
  • max_connection = 256;

But setting determines the default FreeBSD configuration does not provide a value for the number of connections. If you set this value and tries to run the demon Postgres, it won't work. Need to increase the corresponding system settings. To do this, and build your core. Take setting determines the default config of the GENERIC kernel, make a copy with the name KERNEL_MAX_PERF, edit KERNEL_MAX_PERF as follows: change the number of semaphores, adding to the default options line:
options SEMMNI=512
options SEMMNS=1024
options SEMUME=64
options SEMMNU=512

(this is the value for max_connection = 256).

2) the Maximum amount of RAM, which can take a PostgreSQL (this is important when bulk requests). Is the shared_buffers parameter in postgresql.conf. About the meaning for this value there are different recommendations. I came to the conclusion that if it is a dedicated database server, it is possible for one process to give almost the entire amount of RAM minus what the system needs for its own needs. I have allocated 8Gb of the 12. In order that the system allowed us to set the desired value for shared_buffers, at the core, you must change the option SHMMAXPGS, the value of which is calculated by the formula:
SHMMAXPGS = shared_buffers / PAGE_SIZE

in my case, shared_buffers = 8Gb, PAGE_SIZE = 4K for all i386, so
SHMMAXPGS = 8 * 1024 * 1024 / 4 = 2097152); can now record parameter SHMMAX (calculated dynamically in the kernel). So, I write in the config of the kernel:
options SHMMAXPGS = 2097152
options SHMMAX = "(SHMMAXPGS*PAGE_SIZE + 1)"


It remains to compile the kernel with config KERNEL_MAX_PERF. The procedure of compiling the kernel is simple, here I'll refer you to the official mana.

Bootable OS with its kernel, installed the latest version of PostgreSQL (I have had this version 9.0.4), the first to test start setting determines the default PostgreSQL config. If everything is OK, change the settings in postgresql.conf as stated above, do a restart PostgreSQL. Start — go further.

Note: if for any reason failed to compile kernel with set parameters, it is possible to register them in sysctl.conf:
kern.ipc.shmall=2097152
kern.ipc.shmmax=8589938688
kern.ipc.semmap=256

and run prahu setting determines the default with the GENERIC kernel.


3. Tablespacing


Tablespacing — is the ability of PostgreSQL to determine the file system location where to store the files representing database objects. Simply put, if we shall place tables, indexes and logs on different disks, the read/write data will be faster than if it was all on one disc.

Here we need our raids. Let me remind you that we have four sections: two RAID1 and two RAID0. On the first RAID1 we have installed OSes and Postgres. On the second RAID1 will store our database. Suppose that it is mounted as /disk1. On the first RAID0 will store indexes. Let him be mounted to the file system as /disk2. The logs leave the second RAID0, we assume that it is mounted as /disk3.

You must do the following steps:
    the
  1. to create folder under the tables, indexes and log:
    #mkdir -p /disk1/postgresql/tables
    #mkdir -p /disk2/postgresql/ind
    #mkdir -p /disk3/postgresql/log
  2. the
  3. to make owner postgres for these folders, and the rest to take away all rights (remember that postgres is the user that starts when you install PostgreSQL, if an installation to produce a standard way according to the official Manu):
    #chown-R postgres /disk1/postgresql/tables /disk2/postgresql/ind /disk3/postgresql/log
    #chmod-R go-rwx /disk1/postgresql/tables /disk2/postgresql/ind /disk3/postgresql/log

  4. the
  5. go to client psql under postgres and create two tablespace:
    CREATE TABLESPACE space_table LOCATION '/disk1/postgresql/tables';
    CREATE TABLESPACE space_index LOCATION '/disk2/postgresql/ind';

  6. the
  7. if owner your DB is not postgres, but, for example, myuser, you need to give the user myuser rights on the created tablespace (can also be performed in the client):
    GRANT CREATE ON TABLESPACE space_table TO myuser;
    GRANT CREATE ON TABLESPACE space_index TO myuser;

  8. the
  9. now under myuser 'om, you can change tablespace for tables and indexes:
    ALTER TABLE mytable SET TABLESPACE space_table;
    ALTER INDEX mytable SET TABLESPACE space_index;

  10. the
  11. to stop the demon Postgres, move the folder with the log and make a symbolic link to it:
    #/usr/local/bin/rc.d/postgres.sh stop
    #mv /usr/local/pgsql/data/pg_xlog /disk3/postgresql/log
    #cd /usr/local/pgsql/data
    #ln-s /disk3/postgresql/log/pg_xlog

    Run Postgres:
    #/usr/local/bin/rc.d/postgres.sh start

    If done correctly, the daemon should start.


4. Partitioning


Protezirovanie is a logical splitting a large table into small physical pieces. This can significantly speed up the query execution time if the table is really big.

I have a fairly typical situation: in a crone the script is running, collecting statistics on a certain dimension. On the web interface the user can view this statistic. A week in inserteda table about 10 million rows. If all write to the same table, it will curse you. This will all work terribly slow.
Let's try to split this table into pieces, taking the criterion of splitting time. In this case, when the user wants to see the statistics, but to see it only for a certain time period, the database server when prompted to have the wool not the whole big table and a few small fall in the selected time period.

Unfortunately, in PostgreSQL protezirovanie not implemented at the DB level, so will have to make the handles using property inheritance tables.

So, we have a table measure_data_master where we write their dimensions. For example, as the time period suits one week. Let's go:
    the
  1. to a master table measure_data_master NOT to do any of the integrity constraints check and do NOT create indexes
  2. the
  3. in the config of postgresql.conf edit option:
    constraint_exclusion = on

  4. the
  5. to create a table-descendants of the form:
    CREATE TABLE measure_data_y2011m06d06 (CHECK(measure_time = DATE '2011-06-06' AND measure_time DATE '2011-06-13')
    ) INHERITS (measure_data_master);

  6. the
  7. to create indexes for tables-descendants:
    CREATE INDEX measure_data_y2011m06d06_key ON measure_data_y2011m06d06(measure_time);

  8. the
  9. it is necessary that when you insert a new row, it was recorded in the appropriate table-descendant. For this we will create the trigger function:
    CREATE OR REPLACE FUNCTION measure_insert_trigger()
    RETURNS TRIGGER AS $$
    BEGIN
    IF(NEW.measure_time >= DATE '2011-06-06' AND
    NEW.measure_time < DATE '2011-06-13') THEN
    INSERT INTO measure_data_y2011m06d06 VALUES(NEW.*);
    ELSIF(NEW.measure_time >= DATE '2011-06-13' AND
    NEW.measure_time < DATE '2011-06-20') THEN
    INSERT INTO measure_data_y2011m06d13 VALUES(NEW.*);
    .....................................
    ELSIF(NEW.measure_time >= DATE '2011-12-19' AND
    NEW.measure_time < DATE '2011-12-26') THEN
    INSERT INTO measure_data_y2011m12d19 VALUES NEW.*);
    ELSE
    RAISE EXCEPTION 'Date out of range.Fix the measure_insert_trigger() function!';
    END IF;
    RETURN NULL;
    END;
    $$
    LANGUAGE plpgsql;

  10. the
  11. and the trigger that calls the function:
    CREATE TRIGGER insert_measure_trigger
    BEFORE INSERT ON measure_data_master
    FOR EACH ROW EXECUTE PROCEDURE measure_insert_trigger();


Of course, to write such large queries uncomfortable. I wrote a php script that creates the tables and everything that they need for the year ahead.

Here, perhaps, and all I wanted to tell. If you share your experience from this area, I would be very grateful.
Article based on information from habrahabr.ru

Комментарии

Популярные сообщения из этого блога

Why I left Google Zurich

2000 3000 icons ready — become a sponsor! (the table of orders)