Sunday, February 9, 2014

Introducing Trite: A tool for automating import of InnoDB tablespaces

Mysqldump is a fantastic tool for backing up and restoring small and medium sized MySQL tables and databases quickly. However, when databases surge into the multi-terabyte range restoring from logical backups is inefficient. It can take a significant amount of time to insert a hundred million plus rows to a single table, even with very fast I/O. Programs like MySQL Enterprise Backup and Percona XtraBackup allow non-blocking binary copies of your InnoDB tables to be taken while it is online and processing requests. XtraBackup also has an export feature that allows InnoDB file per tablespaces to be detached from the shared table space and imported to a completely different MySQL instance.

The necessary steps to export and import InnoDB tables are in the XtraBackup documentation here. Although fairly simple and straight forward it requires a bit too much typing if you want to move more than a couple of tables. To automate the process I created a new open source project named Trite.

Trite stands for TRansport Innodb Tables Efficiently and is also a nod to the repetitive manual steps required to move tablespaces. Trite is a client/server written in 100% Go that leverages XtraBackups's import/export feature and aids a DBA in creating, recovering or refreshing databases. The initial problem that led me to create Trite is a common development database environment where I work. Multiple applications share the same database servers which requires each applications data be refreshed at different intervals. Before Trite we used mysqldump and it took the majority of a weekend for the data refresh to complete, even with good hardware and my.cnf's tuned for performance over crash resistance. With Trite we get the same results in a fraction of the time!

Trite has three modes of operation: dump, server and client. Dump mode writes out create statements for schemas, tables, stored procedures, functions, triggers and views. Server mode makes the XtraBackup and dump create files accessible via HTTP. Client mode downloads the create statements and XtraBackup files from a Trite server to recreate database objects.

Twelve gigabytes of IMDB data will make a suitably sized database for an example. The hardware used is a woefully under powered laptop; 4 gigs of RAM, 32bit CPU, slow SATA hard drive. Before each data load the imdb schema was removed and MySQL restarted to wipe caches. It took just over two hours to restore from the dump file. This is with innodb_doublewrite turned off to try and speed up all the inserts.

jprunier@beefeater:~/Downloads$ ls -alh imdb.sql
-rw-rw-r-- 1 jprunier jprunier 4.8G Jan 24 18:41 imdb.sql

jprunier@beefeater:~/Downloads$ time mysql -uroot -p < imdb.sql
Enter password: 

real 130m55.167s
user 2m16.052s
sys 0m9.020s

jprunier@beefeater:~/Downloads$ sudo du -sh /var/lib/mysql/imdb
[sudo] password for jprunier: 
12G /var/lib/mysql/imdb

A binary backup of the database took 8 minutes with XtraBackup.

root@beefeater:~# innobackupex --user=root --password= /root

... output omitted ...

innobackupex: Backup created in directory '/root/2014-01-25_14-22-06'
140125 14:30:29  innobackupex: Connection to database server closed
140125 14:30:29  innobackupex: completed OK!

real 8m23.118s
user 0m42.076s
sys 0m31.524s

And 16 seconds to prepare the tables for export. This will take longer depending on the amount of changes the database receives during the backup. It can sped up by giving the process more memory.

root@beefeater:~# time innobackupex --apply-log --use-memory=1G --export /root/2014-01-25_14-22-06

... output omitted ...

140125 14:33:03  InnoDB: Starting shutdown...
140125 14:33:07  InnoDB: Shutdown completed; log sequence number 70808870412
140125 14:33:07  innobackupex: completed OK!

real 0m16.310s
user 0m0.900s
sys 0m0.620s

A dump of the database objects should be done as close to the backup as possible. If the Trite create statement from a dump differs significantly from the copy of table taken with XtraBackup the table may fail to import.

root@beefeater:~# ./trite -dump -user=root -password= -host=localhost
Dumping to: /root/localhost_dump20140125173337

Enter password: 

imdb: 21 tables, 0 procedures, 0 functions, 0 triggers, 0 views

22 total objects dumped

Total runtime = 2.19331264s

In this example the Trite server is being run on the same machine as the database being restored. If your database is very large the XtraBackup, create dumps and Trite server will probably be on a different host.

root@beefeater:~# screen -S trite_server
root@beefeater:~# ./trite -server -dump_path=/root/localhost_dump20140125173337 -backup_path=/root/2014-01-25_14-22-06

Starting server listening on port 12000

The Trite client was able to restore the imdb database in 13 minutes. Much faster than doing all those inserts over. You can run the client with multiple worker threads, the default is single threaded. The optimal number of workers depends on the size of your tables, how fast your disks are and how fast your network is if the server is remote. In the case of the laptop, multiple workers actually makes the restore time slower.

root@beefeater:~# ./trite -client -user=root -password= -socket=/var/lib/mysql/mysql.sock -server_host=localhost
Enter password: 
imdb.link_type has been restored
imdb.comp_cast_type has been restored
imdb.movie_info has been restored
imdb.aka_name has been restored
imdb.complete_cast has been restored
imdb.movie_link has been restored
imdb.company_name has been restored
imdb.char_name has been restored has been restored
imdb.person_info has been restored
imdb.keyword has been restored
imdb.cast_info has been restored
imdb.role_type has been restored
imdb.aka_title has been restored
imdb.kind_type has been restored
imdb.movie_info_idx has been restored
imdb.movie_keyword has been restored
imdb.company_type has been restored
imdb.info_type has been restored
imdb.title has been restored
imdb.movie_companies has been restored
Applying triggers for imdb
Applying views for imdb
Applying procedures for imdb
Applying functions for imdb

Total runtime = 13m3.881954844s

Some of the features I am planning on adding next are compression for transfer across slow networks and replication support so only objects conforming to a slaves replication filters will be applied.

Trite does have some limitations. The full list can be found on the GitHub page but the top ones are:
  • InnoDB file per table is required.
  • The database you are restoring to must be running Percona Server 5.1 or 5.5.
  • Windows is not supported since Percona Server and XtraBackup are required.
If you think Trite might benefit you and your organization give it a try and send me any feedback you might have. I have tried hard to make Trite efficient and bug free but there might be issues when used with certain configurations. Be sure to test Trite on non-critical databases before use in production. If you encounter any bugs please open an issue via GitHub.

Sunday, June 2, 2013

DDL statements in MySQL 5.x with row-based replication

In the replication topology I manage there are many layers of replication filters that prune data at the database and in a few places table level. The way MySQL replicates Data Definition Language (create, alter, drop) statements differs from how Data Manipulation Language (insert, update, delete) statements are handled with row-based replication. I often need to fix broken replication due to a lack of understanding of these subtle differences.

With row-based replication DML statements focus directly on the table being modified. DDL on the other hand always uses statement-based replication and is tied to what is known in MySQL as the "default database". The default database is the schema/database currently in use when a DDL statement is executed. The only information I was able to find regarding this in the MySQL documentation is the second to last paragraph here. If do-db or ignore-db filters are configured on slaves, replication can break when DDL statements are used in certain scenarios. Table level or wildcard replication filters can also add to the potential breakage but I will save that for a separate blog post.

A few examples should make things clear. The slave in the following examples is configured with "replicate-ignore-db = test". That means it will replicate all objects from its master except for anything in the "test" schema/database. The test environment is reset between examples. I have changed the prompt via the my.cnf to show the default database and statement counter: prompt = '\h(\d):\c> '

Example #1

On the master we use the test schema and create a table named ddl_test.

The ddl_test table does not replicate to the test schema which is the expected behavior.

Example #2

This time on the master we use the world schema, create the ddl_test table in the test schema by way of a fully qualified table name and insert a row to it.

In this case the table creation is replicated because create is a DDL statement and the world schema was the default database when the statement was issued. Even though the default database was world, an insert does not replicate to the slave because it is a DML statement and the slave is configured to ignore events on objects in the test schema.

Example #3

Now we create the ddl_test table while test is the default database, then alter the table to add a column while world is the default database.

Ouch! The result is broken replication on the slave. The reason is the original table create did not replicate but the additional alter statement does.

The error message shows exactly what went wrong. Table does not exist is pretty clear. The default database is world, the query references the test.ddl_test table and shows the alter table command that was attempted. Show slave status output also shows us the Replicate_Ignore_DB configuration has been set on the slave to ignore all DDL and DML statements against the test schema.

The next course of action is to get replication going again. We have deduced the alter table statement should not have replicated so we simply skip the event using the sql_slave_skip_counter variable.

Not directly related to the theme of this post but worth mentioning is the unique schema in MySQL known as none or null. This schema can not be used directly since it is the absence of having a default database. The only way I know of acquiring null schema is to connect to mysql without the -D option or by dropping the schema that is currently in use. How DDL and DML statements are handled remains the same with or without a default database.

Hopefully the examples above make the rules that govern replication of DDL and DML statements clearer. The potential impact on a dba really depends on the size and configuration of their replication topology. The more databases and filters in place to limit and/or direct what is replicated, the greater the chance replication can break. Number of users manipulating database objects, their knowledge of MySQL and the replication topology they are working within is also a big factor.

Monday, May 13, 2013

Delayed row-based replication with large tables lacking a primary key

I configure all our master databases to use row-based binary logging where I work. In my opinion it is a much safer option than statement-based replication. The advantages and disadvantages of both types of MySQL replication are detailed in the online documentation here. You can't view the events a slave is applying directly with 'show processlist' but by issuing 'show open tables where in use' you can detect what table is receiving the attention of the SQL thread. If you need more information the mysqlbinlog command must be used to decode the slaves relay logs or masters binary logs.

Our developers often change a lot of rows with a single update statement. This usually results in some reasonable replication lag on downstream slaves. Occasionally the lag continues to grow and eventually nagios complains. Investigating the lag I sometimes discover the root of the problem is due to millions of rows updated on a table with no primary key. Putting a primary key constraint on a table is just good practice, especially on InnoDB tables. This is really necessary for any large table that will be replicated.

To show what replication is actually doing I have cloned the table, omitting the indexes, inserted all its rows and finally updated the description column for all 1,000 rows in the new table.

I have edited the output of mysqlbinlog to only show the entries related to the first row created by the insert and update statements above. The @ symbol followed by a number is a mapping to the columns in the table. So @1=film_id, @2=title, @3=description and so on. Note that the update statement records a before and after picture of the row. This is can be used in a pinch to fix mistaken updates if the damage is small instead of having to restore from backups.

So row-based replication is performing as named and creating a binary log event for each row affected. My single insert and update statement on the master then became 1,000 separate events on the slave.

Digging in to the MySQL source code I was unable to confirm exactly how the SQL thread applies relay log events on a slave. I assume it is similar to what happens when a normal update statement is executed on a table with no indexes. The server must perform a full table scan to locate a single row. For a table with a million plus rows a million full table scans is expensive! A primary key or suitable unique index will prevent this type of problem.