Advertisment

Optimize Your MySQL Database

author-image
CIOL Bureau
Updated On
New Update

NEW DELHI - INDIA : MySQL is a pretty fast database and serves most requirements for a decent sized Web application. But the dragon starts raising its head, when the number of records grow. In this series we will look into various ways of optimizing the MySQL database so as to tame the dragon from time to time.

Advertisment

The simplest optimization in MySQL is to choose correct data types, depending on the record set stored in the table. Many MySQL database designers fail to do this, resulting in database driven websites failing miserably. In this article, we will cover the most common data types used in MySQL and how to use built-in MySQL functionalities to figure out the most optimal data type for a column.

Size does matter

A standard practice among database designers is to define primary key as an auto-increment INTEGER. By default MySQL will set this as Unsigned INT. This means you can store 4,294,967,295 records in the given table. Sounds good from a scalability point of view but if you are designing the table for a small website ask yourself a question, is your website going to get 4,294,967,295 users ? Of course we wish your website gets these many users but most websites do not have so many users.

Thus choosing Unsigned INT as userID might not be a good choice. For small to medium sized websites choosing MEDIUIMINT as userID will be the bast choice.

Advertisment

When compared to INT, MEDIUMINT consumes approximately 25% less space. A column with INT type consumes 4 bytes whereas a column with MEDIUMINT type will consume 3 bytes.

Direct Hit!
Applies To: Database designers

USP: Learn various ways of optimizing MySQL database

Primary Link: None

Keywords: Optimizing MySQL Database

MySQL has several specialized data types which require lesser memory and space. Smaller the data type size, faster is the record look up and faster the query execution. Specially the primary index of a table should be as small as possible to make row identification faster.

Advertisment

Of course when more users start joining your website you can always alter the table to go to next higher data type. MySQL manual is a good source to know size of different data types and identify the most suitable, smallest data type for a column. Storage requirements for each data type can be found here http://dev.mysql.com/doc/refman/5.0/en/storage-requirements.html .

Rule to remember is to start small and grow the field type as record set increase in the table.

NULL is not NULL

NULL columns require additional one bit per column. Setting column as NOT NULL can make everything faster. Specially when your query involves subqueries, MySQL will perform an additional search for rows that contain NULL. This search can be avoided when there are no NULLs in the table. You should definitely use NULL if your application requires them but do not set columns as NULLable by default. Using a constant value which falls outside domain range for the column instead of NULL can speed up the look ups.

Date and time

Setting record creation date and record update date to DATETIME is a favorite among database designer. Most people are not aware that DATETIME is among the costliest data types in terms of space. DATETIME consumes 8 bytes of space. Date and time types should be avoided unless you need to compare dates in your quries.

Advertisment

If you have to store dates for display purpose only, storing Unix time stamp as a MEDIUMINT can surve the purpose. All programing languages provide mechanism for creating current Unix time stamp and converting Unix time stamp back to date format. So, 1223890532 will result in faster look up compared to 2008-10-13 05:35:32.

CHAR or VARCHAR ?

CHAR is fixed width and VARCHAR is variable length. But VARCHAR requires extra byte to store the record. If length of VARCHAR is 255 or less it requires one byte but if the length of VARCHAR is greater than 255 bytes it requires two bytes. A VARCHAR is store as one-byte or two-byte length prefix plus data.

The length prefix contains the number of bytes in the record. A field declared as VARCHAR(5) may take 6 bytes instead of 5 bytes as expected.

Advertisment

Finding the correct data type

MySQL comes with a built-in procedure to analyze a query and suggests the optimal data types for each column. This procedure is called "Procedure Analyse".

Procedure analyse will analyze the existing data in a table and try to guess the optimal data type based on the distinct values per column present in the table.

The syntax for this procedure is following;

Advertisment

SELECT col1, col2 FROM table1 PROCEDURE ANALYSE(>);

Procedure Analyze expects two optional parameters max_elements and max_memory. max_memory is maximum memory that shall be allocated on per column basis while trying to find all distinct values for a column.

max_elements defaults to a value of 256 and is used to identify the maximum number of distinct values analyse will use per column. Distinct values are used to identify if ENUM is a

suitable data type for a given column.

Advertisment

To test procedure analyse we created a table called user_master here is the description of the table

mysql> desc user_master ;

+-----------------+---------------------+---

| Field | Type | Null | Key | Default | Extra |

+-----------------+---------------------+---

| user_id | int(11) | NO | PRI | | auto_increment |

| first_name | varchar(20) | YES | | NULL | |

| last_name | varchar(20) | YES | | NULL | |

+-----------------+----------------------+--

3 rows in set (0.00 sec)

We loaded the table with 16 users but ensured that first_name of the users is "Pcquest" or "Preview" or "Test". We also ensured every user has unique last_name. Following command was used to analyze the table and identify the optimal data type.

mysql> select user_id,first_name,last_name from user_master procedure analyse(10,50)\G

********* Row 1*****************

Field_name: mydb.user_master.user_id

Min_value: 1

Max_value: 16

Min_length: 1

Max_length: 2

Empties_or_zeros: 0

Nulls: 0

Avg_value_or_avg_length: 8.5000

Std: 4.6098

Optimal_fieldtype: TINYINT(2) UNSIGNED NOT NULL

**************Row 2********** ***

Field_name: mydb.user_master.first_name

Min_value: Pcquest

Max_value: Test

Min_length: 4

Max_length: 7

Empties_or_zeros: 0

Nulls: 0

Avg_value_or_avg_length: 4.1875

Std: NULL

Optimal_fieldtype: ENUM('Pcquest','Preview','Test') NOT NULL

*************Row 3************ Field_name: mydb.user_master.last_name

Min_value: Usr1

Max_value: User101

Min_length: 4

Max_length: 7

Empties_or_zeros: 0

Nulls: 0

Avg_value_or_avg_length: 4.1875

Std: NULL

Optimal_fieldtype: VARCHAR(7) NOT NULL

As you can notice the optimal data type suggested for user_id is TINYINT, as for only 16 users INT type is an overkill. TINYINT will consume 1 byte where as INT will consume 4 bytes, a saving of 75% of space and memory.

Procedure Analyse at times reports optimal data type as ENUM where you least expect. In the example above first_name is suggested to be ENUM as procedure analyse will only see 3 distinct values in 16 records. As procedure analyse works on existing records in the table and distinct records in the current data set, ENUM will the most optimal data type for first_name.

By trying different values to procedure analyse, more suggestions about optimal data type can be obtained. You may have to play with settings till procedure analyse does not return ENUM data type when it is not appropriate.

On a fairly large record set, results of procedure analyse are very useful.

tech-news