Database Performance Optimization - Denormalization using MySQL Triggers

Mike Peters, 10-08-2007
Just about any site these days is powered by a MySQL (or other relational) database, storing all the information about your hits, customers, transactions, keywords, etc.

Fetching information from your database is one of the most expensive operations you perform, and it is the #1 reason websites slow down under high traffic.

In a typical configuration you are limited to a maximum of 50 database connections. This means if you have more than 50 concurrent users all trying to load pages (which fetch information from your database), your users will start to experience stalling as your database races to complete all requests.
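
That connection ceiling is usually governed by the max_connections setting in my.cnf. A minimal excerpt, using the 50-connection figure above (the right number depends on your hosting plan and hardware):

# my.cnf (hypothetical excerpt) - the maximum number of simultaneous client connections
[mysqld]
max_connections = 50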

When it comes to optimizing your website database performance, there are two areas you need to attack -

Front end database optimization:

This includes anything you can do to cache frequently accessed data so that there's no need to access the database at all:

* Get a dedicated web server (Your database will then automatically cache a higher number of tables and query results in memory)
* Install memCached and configure it to save frequently accessed data in memory (a minimal sketch follows this list)
* Reduce the number of real-time database WRITE operations (INSERT, UPDATE and DELETE). Database READs (SELECT) are always cheaper.
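
To make the memcached point concrete, here is a minimal cache-aside sketch using the old PHP Memcache extension. The key name, columns and 5-minute TTL are illustrative, not taken from any particular setup:

$memcache = new Memcache;
$memcache->connect('localhost', 11211);

$key = 'customer_' . $customer_id;
$customer = $memcache->get($key);

if ($customer === false) {
    // Cache miss - fall back to the database, then keep the row in memory for 5 minutes
    $Result = mysql_query("SELECT name, phone FROM customers WHERE customer_id=$customer_id");
    $customer = mysql_fetch_assoc($Result);
    $memcache->set($key, $customer, 0, 300);
}

Every request after the first is then served straight from memory and never touches MySQL until the item expires or is invalidated.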

While you're at it, you should probably also install eAccelerator to speed up your PHP code.

Back end database optimization:

If you've done a good job with your front-end database optimization, 90% of the database activity is going to take place offline, in the background, and not while the user is waiting for the page to load.

A properly designed website will run offline scripts to generate pages or text files storing processed data, later to be quickly retrieved by the user interface layer.
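
As a purely hypothetical example, a crontab entry along these lines could regenerate the processed data every 15 minutes, so the user-facing pages only ever read pre-built output (the script name and path are made up for illustration):

# Hypothetical crontab entry - rebuild processed data files in the background
*/15 * * * * /usr/bin/php /var/www/scripts/build_report_cache.php > /dev/null 2>&1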

Depending on the type of service your website provides, you are eventually going to have to retrieve some data from the database.

Back end database optimization is the process of optimizing your database schema and queries so that retrieving data is super fast and doesn't lock up entire database tables.

As part of this post I would like to give an example of how we love to denormalize databases using MySQL triggers.

Database Denormalization - What is it?

Rookie database administrators are taught in school to always keep their databases normalized. A normalized database is one where you minimize duplication of data by keeping groups of data in separate tables and linking between the tables with keys (backed by indexes).

For example, as part of the SoftwareProjects CRM application, we store customer contact information in a 'customers' table, and financial transactions in a 'transactions' table. Both tables are keyed to a 'customer_id', allowing easy lookup of all transactions belonging to an individual customer.
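
For reference, a stripped-down sketch of what such a normalized pair of tables might look like (column names and types here are illustrative, not the actual SoftwareProjects schema):

CREATE TABLE customers (
    customer_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name        VARCHAR(255),
    email       VARCHAR(255),
    phone       VARCHAR(32)
);

CREATE TABLE transactions (
    transaction_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    customer_id    INT UNSIGNED NOT NULL,
    type           VARCHAR(32),
    amount         DECIMAL(12,2),
    INDEX idx_customer (customer_id)
);

Every fact lives in exactly one place; anything about a customer's money has to be computed from 'transactions'.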

Whenever we take on new clients, we typically find clean, normalized tables with multiple indexes connecting tables together. At first glance the concept of database normalization makes perfect sense - it's clean, easy to understand and makes it easy to keep data in sync.

In the real world of high traffic websites and users demanding sub-second response times, database normalization is a fatal mistake.

A de-normalized database is one where you keep redundant, frequently-accessed data duplicated across several tables, making it possible to fulfill a user request without having to look at more than a single table.

Remember: The key to high performance database access is sticking to single-table SELECT queries with short indexes.
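
"Short" here means the index itself stays small. MySQL can, for example, index just a prefix of a long column; a hypothetical illustration:

ALTER TABLE customers ADD INDEX idx_name (name(10));

-- Single-table lookup satisfied by the compact prefix index
SELECT customer_id, name, phone FROM customers WHERE name LIKE 'estri%';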

--

To further explain how denormalized databases work, I'll use a real-life example from the SoftwareProjects Customer Relationship Management service:

Database Denormalization - Example

The SoftwareProjects CRM application allows our customers to view, search and manipulate segments of their customer base. The system was tested to run properly on a database of 100 million customers.

The main screen displays a list of customers, with some basic information about each customer:

[Screenshot: the CRM customer list, filtered by the search string "estri"]

In the example above, I am searching for all customers whose name or email-address contains the string "estri".

The system prints a mood icon to indicate the customer's state of mind, along with the customer's name, contact information, LifeTime Value and current Balance.

Earlier I explained that we keep customers' contact information in a 'customers' table and financial transactions in a 'transactions' table, both keyed by 'customer_id'.

In a normalized database, in order to display the lifetime value for each customer, we will have to perform a query such as this:


$Result = mysql_query("SELECT SUM(amount) FROM transactions WHERE customer_id=$customer_id AND type LIKE '%deposit%'");
$Row = @mysql_fetch_row($Result);
$ltv = $Row[0];

In plain English - calculate the sum of all transactions for customer X, where the transaction type is a deposit.

We are linking between the customers and transactions tables using the customer_id.

What's the big deal?

Imagine what it would take to display LifeTimeValue and Balance for 100 customers. Or what happens when our end-user wishes to sort all customers based on the LifeTimeValue, top to bottom?

Because our database is normalized, such an operation would require A LOT of database resources, and the query is likely to need a temporary table and a filesort, resulting in a 10+ second response time on an average-sized customer database.
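
To illustrate, sorting the customer list by LifeTimeValue in the normalized schema means something along these lines (a sketch, using the example tables described above):

SELECT c.customer_id, c.name,
       SUM((t.type LIKE '%deposit%') * t.amount) AS ltv
FROM customers c
LEFT JOIN transactions t ON t.customer_id = c.customer_id
GROUP BY c.customer_id
ORDER BY ltv DESC
LIMIT 100;

MySQL has to touch every matching transaction row, build a temporary result and sort it before it can return even the first page of customers.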

--

The "database ninja" way to handle this scenario, is to duplicate these two pieces of frequently accessed data (LifeTimeValue and Balance) from the 'transactions' table into the 'customers' table.

We're going to store the current LifeTimeValue and Balance for each customer under the 'customers' table, so that this information can be retrieved using a single-table select without requiring any additional computations.
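
Schema-wise that is just two extra columns on the 'customers' table (a sketch, with the column names used in the query below):

ALTER TABLE customers
    ADD COLUMN ltv     DECIMAL(12,2) NOT NULL DEFAULT 0,
    ADD COLUMN balance DECIMAL(12,2) NOT NULL DEFAULT 0;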


mysql_query("SELECT name, phone, ltv, balance FROM customers");

How do we easily maintain database integrity, so that the LifeTimeValue and Balance listed under the 'customers' table are always accurate, without having to remember to update these two fields every time we add a new financial transaction? Meet MySQL Triggers.

MySQL Triggers

MySQL Triggers (available starting with MySQL 5.0) allow us to define an action the database performs automatically whenever rows in a certain table are inserted, updated or deleted.

These two rules take care of updating the LifeTimeValue and Balance under the 'customers' table whenever transaction information changes:


mysql_query("CREATE TRIGGER balance_insert AFTER INSERT ON transactions FOR EACH ROW
    UPDATE customers SET customers.balance = customers.balance + NEW.amount,
        customers.ltv = customers.ltv + ((NEW.type LIKE '%deposit%') * NEW.amount)
    WHERE customers.customer_id = NEW.customer_id");

mysql_query("CREATE TRIGGER balance_delete AFTER DELETE ON transactions FOR EACH ROW
    UPDATE customers SET customers.balance = customers.balance - OLD.amount,
        customers.ltv = customers.ltv - ((OLD.type LIKE '%deposit%') * OLD.amount)
    WHERE customers.customer_id = OLD.customer_id");

That's it. Now the database will automatically update the values of LifeTimeValue (ltv) and Balance for us, and we can allow end-users to manipulate this information while maintaining sub-second response times.

I hope this brief introduction to database denormalization opened your eyes to the different things you can do to improve database performance.

If you have any specific question about optimizing your database, ask us! We're here to help.

Or use the comments section below to share what YOU are doing to improve your database performance.

Kris Cutler, 10-08-2007
Thank you for a super detailed explanation. You just got another RSS subscriber :-) and I'm going to have our DBA take a look at this

Tim, 10-09-2007
Great post. Lots of useful information... thanks

Lars Pohlmann, 10-09-2007
Normalization is fundamental to keeping the integrity of your data. Just imagine one trigger not working properly and your duplicated data getting out of sync. Which table do you trust in that case?

What you are doing there to speed up read requests (having tables with aggregated data, populated by a trigger) is a good thing. You could also build temporary tables mimicking "materialized views" (known e.g. from Oracle), but you should *always* have a normalized data structure as the leading system in the background. That way you have *one* source of data you can trust, and you can run synchronizations (if needed) to keep your data in sync.

Mike Peters, 10-09-2007
Kris & Tim - Thank you!

Lars - You're absolutely right about always knowing which fields to trust.

Whenever we denormalize we do so into "helper fields" such as the LifeTimeValue and Balance in the example above.

Denormalized values are only used for speeding up the user interface (when generating an invoice, for example, the code still looks at the source data). As for disaster recovery - it's very easy to write a script that rebuilds all denormalized values from the source.
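
Something along these lines does the job (sketched against the example schema from the post):

UPDATE customers c
SET c.balance = (SELECT COALESCE(SUM(t.amount), 0)
                 FROM transactions t WHERE t.customer_id = c.customer_id),
    c.ltv     = (SELECT COALESCE(SUM((t.type LIKE '%deposit%') * t.amount), 0)
                 FROM transactions t WHERE t.customer_id = c.customer_id);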

Oracle materialized views and MySQL views (as of MySQL 5.0) are never as fast as storing all processed data in a single table (single-table selects) and just about every high-performance system out there is using this approach to speed up database performance.

Dragon Yurgen, 10-09-2007
Subscribed to your RSS feed! It's cool that I can get new posts delivered by email, I didn't know you could do that with rss.

Chase Saunders, 10-10-2007
This is called caching - it requires extra work and complicates the model in order to gain efficiency. Fine in theory... though in practice most folks who try to out-optimize RDBMSs end up paying for it (e.g. you are still doing all the same work, just sliced across one transaction per item, which would be less efficient, not more, in a write-heavy environment).

You are NOT talking about denormalization. I have seen so many people "declare" that normalization is bad for efficiency reasons and then get shot down that I don't even want to argue the point here. I'd encourage anyone who reads this to look at the real academic literature on relational databases. Short version: denormalization is terrible for efficiency in the general case, and normalization makes RDBMSs more efficient, not less.

Dawn Rossi, 10-10-2007
Hey Chase - Are you having a bad day today?

According to Wikipedia - "Denormalization is the process of attempting to optimize the performance of a database by adding redundant data. A normalized design will often store different but related pieces of information in separate logical tables (called relations). If these relations are stored physically as separate disk files, completing a database query that draws information from several relations (a join operation) can be slow. If many relations are joined, it may be prohibitively slow. "

The described solution would work great in any "many reads / few writes" environment. Oracle is slightly better than MySQL when it comes to separating logical data (normalized) from physical and supports a higher degree of tuning, but I've seen this approach implemented across the board.

Chase Saunders, 10-11-2007
First off, sorry if my original post came off as rude. The type of denormalization described in Wikipedia is a more general case... one in which we are attempting to avoid a JOIN to boost speed. The article is correct to use the word "attempting". Do a test for yourself... you'll see that it's very difficult to save any more than microseconds even on very large operations. And in doing so, you pay a major price in terms of memory usage. If you were actually working with large data sets you would need a lot of deep skilz to ensure that you weren't jumping out of the frying pan into the fire... especially given all the application logic that must be updated to accommodate the denormalized design.

As far as the optimization described, it's simply caching a value. IMO this cannot properly be described as denormalization. I'm sure it was warranted in the situation the author described; however, I think the author was remiss to advise readers to denormalize without qualifying the extreme risks one takes in doing this, and the deep level of understanding needed to do it properly in the general case.

Mpume, 10-13-2007
Good article. I have been thinking of doing the same thing. The question I have, and have not found any information on, is: "What are the performance implications, if any, of having triggers?"

Mike Peters, 10-15-2007
Mpume,

Good question!

Triggers are executed BEFORE or (directly) AFTER an INSERT/UPDATE/DELETE operation. This means that if a plain INSERT costs one write, and you have a MySQL trigger on table Y that performs another INSERT into table X whenever you write to table Y, the cost of an insert into table Y effectively doubles.

In most "few writes / many reads" environments this is unnoticeable and the benefits of de-normalization for speeding up "read"s far outweigh the cost of the additional op.

If, on the other hand, you operate in a "many writes / few reads" environment, stay away from MySQL triggers. Look into offline processing, creating standalone cache tables on a fixed schedule (as opposed to at the time of insert), and finally, if nothing helps, consider switching to a different storage platform altogether.
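
A rough sketch of that cache-table option, rebuilt from a scheduled job instead of per-insert (table and column names follow the example from the post):

CREATE TABLE IF NOT EXISTS customer_totals (
    customer_id INT UNSIGNED NOT NULL PRIMARY KEY,
    ltv     DECIMAL(12,2) NOT NULL,
    balance DECIMAL(12,2) NOT NULL
);

REPLACE INTO customer_totals (customer_id, ltv, balance)
SELECT customer_id,
       SUM((type LIKE '%deposit%') * amount),
       SUM(amount)
FROM transactions
GROUP BY customer_id;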

Chris Pizzi, 08-25-2008
I know this is an old thread but I feel compelled to comment. As an Oracle developer I take exception to, "Oracle materialized views... are never as fast as storing all processed data in a single table." Oracle materialized views are in fact stored as single tables. I use them for precisely the reasons advocated in this article.

puukuula, 11-12-2008
Now the question is whether to do this in application logic (create the de-normalized table for fast reads) or let MySQL triggers do it.

Mahmoud, 01-27-2012
The best article I have ever read about denormalization - thanks for your time :)