100 MySQL Tips You MUST Know
Posted on 28 June 2007 by admin
Here are 100 Tips, Tricks and Optimizations that you absolutely MUST know and learn if you are going to run MySQL.
1) Use EXPLAIN to profile the query execution plan
2) Use the slow query log (always have it on!)
3) Don’t use DISTINCT when you have or could use GROUP BY
4) Insert performance
i. Batch INSERT and REPLACE
ii. Use LOAD DATA instead of INSERT
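Batching means sending many rows in one statement instead of one INSERT per row. Here is a minimal sketch of the idea in Python. It assumes a MySQL DB-API driver that uses `%s` placeholders, such as MySQLdb or PyMySQL. The helper names, the `events` table and the default chunk size of 500 are hypothetical.

```python
from typing import Iterable, Iterator, List, Sequence, Tuple


def _build_insert(table: str, columns: Sequence[str],
                  chunk: List[Sequence[object]]) -> Tuple[str, List[object]]:
    # One "(%s, %s, ...)" group per row, all rows in a single VALUES list.
    row_placeholder = "(" + ", ".join(["%s"] * len(columns)) + ")"
    values = ", ".join([row_placeholder] * len(chunk))
    params = [value for row in chunk for value in row]
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES {values}", params


def batched_insert_sql(table: str, columns: Sequence[str],
                       rows: Iterable[Sequence[object]],
                       chunk_size: int = 500) -> Iterator[Tuple[str, List[object]]]:
    """Yield (sql, params) pairs that insert up to chunk_size rows per statement.

    Assumes %s placeholders (MySQL DB-API drivers); table and column names
    must be trusted identifiers, never user input.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    chunk: List[Sequence[object]] = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == chunk_size:
            yield _build_insert(table, columns, chunk)
            chunk = []
    if chunk:  # leftover rows that did not fill a whole chunk
        yield _build_insert(table, columns, chunk)
```

Pass each pair to `cursor.execute(sql, params)`. Each chunk then costs one round-trip and one statement parse, not one per row. Keep chunks small enough to stay under the server's `max_allowed_packet`.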
5) LIMIT m,n may not be as fast as it sounds: MySQL still has to read and throw away the first m rows, so deep pages get slower as m grows
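A common alternative is keyset ("seek") pagination. You remember the last key you returned and ask for the rows after it, so the index lets the server jump straight there instead of counting past m rows. The sketch below uses Python's built-in sqlite3 module as a stand-in for a MySQL connection; the queries have the same shape in MySQL. The `articles` table and the function names are hypothetical.

```python
import sqlite3


def page_by_offset(conn, page, page_size):
    # LIMIT m,n style: the server steps over page * page_size rows first.
    return conn.execute(
        "SELECT id, title FROM articles ORDER BY id LIMIT ?, ?",
        (page * page_size, page_size),
    ).fetchall()


def page_by_key(conn, after_id, page_size):
    # Keyset style: the primary-key index seeks directly past after_id.
    return conn.execute(
        "SELECT id, title FROM articles WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, page_size),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.executemany("INSERT INTO articles (id, title) VALUES (?, ?)",
                 [(i, f"title {i}") for i in range(1, 101)])

# Walk every page with the keyset method, carrying the last id forward.
last_id = 0
pages = []
while True:
    rows = page_by_key(conn, last_id, 10)
    if not rows:
        break
    pages.append(rows)
    last_id = rows[-1][0]
```

Keyset paging needs a unique, indexed sort key. It also cannot jump to an arbitrary page number, so it suits "next page" navigation and batch scans best.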
6) Don’t use ORDER BY RAND() if you have more than ~2K records: it assigns a random value to every row and sorts the whole set just to return a few rows
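One widely used replacement picks a random number between MIN(id) and MAX(id), then fetches the first row at or above it through the primary-key index. This sketch uses sqlite3 as a stand-in for MySQL. The `items` table is hypothetical. Rows that follow large gaps in the id sequence are picked more often, so the choice is only approximately uniform.

```python
import random
import sqlite3


def random_row(conn, rng):
    lo, hi = conn.execute("SELECT MIN(id), MAX(id) FROM items").fetchone()
    if lo is None:
        return None  # empty table
    target = rng.randint(lo, hi)
    # Seek via the primary key to the first id at or after target;
    # a row always exists because target <= MAX(id).
    return conn.execute(
        "SELECT id, name FROM items WHERE id >= ? ORDER BY id LIMIT 1",
        (target,),
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
# Ids with gaps (7, 14, ..., 7000) to mimic a table with deleted rows.
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(i, f"item {i}") for i in range(7, 7001, 7)])
rng = random.Random(42)
```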
7) Use SQL_NO_CACHE when you are SELECTing frequently updated data or large sets of data
8) Avoid wildcards at the start of LIKE queries: LIKE '%foo' cannot use an index on the column, while LIKE 'foo%' can
9) Avoid correlated subqueries in the SELECT and WHERE clauses (and try to avoid IN (subquery))
10) No calculated comparisons: isolate indexed columns (write WHERE col = 9, not WHERE col + 1 = 10, so the index on col can be used)
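The effect can be seen by asking the optimizer for its plan. In MySQL you would use EXPLAIN. This sketch instead uses SQLite's `EXPLAIN QUERY PLAN` through Python's sqlite3 module, so the wording of the plan differs from MySQL's. The `orders` table and `idx_qty` index are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, qty INTEGER, note TEXT)")
conn.execute("CREATE INDEX idx_qty ON orders (qty)")


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# qty is wrapped in an expression, so idx_qty cannot be used.
calculated = plan("SELECT note FROM orders WHERE qty + 1 = 10")
# qty stands alone on one side, so the optimizer can search idx_qty.
isolated = plan("SELECT note FROM orders WHERE qty = 9")
print(calculated)
print(isolated)
```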
11) ORDER BY and LIMIT work best with equalities and covered indexes
12) Separate text/blobs from metadata, and don’t put text/blobs in results if you don’t need them
13) Derived tables (subqueries in the FROM clause) can be useful for retrieving BLOBs without sorting them. (A self-join can speed up a query if the first part finds the IDs and then uses them to fetch the rest)
14) ALTER TABLE … ORDER BY can take data sorted chronologically and re-order it by a different field, which can make queries on that field run faster (this applies to MyISAM; InnoDB always stores rows in primary-key order)
15) Know when to split a complex query and join smaller ones
16) Delete small amounts at a time if you can (for example, DELETE ... LIMIT 1000 in a loop), so each statement holds its locks only briefly
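Here is a minimal sketch of the delete loop. It uses sqlite3 as a stand-in for MySQL and a hypothetical `log` table. In MySQL each batch could simply be `DELETE FROM log WHERE created < %s LIMIT 1000`. SQLite's default build does not accept DELETE ... LIMIT, so here the ids come from a subquery instead.

```python
import sqlite3


def delete_in_batches(conn, cutoff, batch_size=1000):
    """Delete rows older than cutoff, batch_size at a time.

    Returns the number of batches that deleted at least one row.
    """
    batches = 0
    while True:
        cur = conn.execute(
            "DELETE FROM log WHERE id IN "
            "(SELECT id FROM log WHERE created < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        deleted = cur.rowcount
        conn.commit()  # commit per batch so locks are released quickly
        if deleted > 0:
            batches += 1
        if deleted < batch_size:  # last (partial or empty) batch
            break
    return batches


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, created INTEGER NOT NULL)")
# 2500 "old" rows (created = 1) and 100 "new" rows (created = 100).
conn.executemany("INSERT INTO log VALUES (?, ?)",
                 [(i, 1 if i <= 2500 else 100) for i in range(1, 2601)])
conn.commit()
batches = delete_in_batches(conn, cutoff=50)
```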
17) Make similar queries consistent so the query cache is used (it only matches queries that are byte-for-byte identical)
18) Have good SQL query standards
19) Don’t use deprecated features
20) Before 5.0, turning an OR across multiple indexed columns into a UNION may speed things up (with LIMIT); from 5.0 on, the index_merge optimization should handle this.
21) Don’t run COUNT(*) on InnoDB tables for every search, because InnoDB keeps no exact row count and has to scan. Run it only occasionally and/or keep summary tables, or, if you need the total number of rows for a LIMITed query, use SQL_CALC_FOUND_ROWS and SELECT FOUND_ROWS()
22) Use INSERT … ON DUPLICATE KEY UPDATE (or INSERT IGNORE) to avoid having to SELECT first
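Here is a minimal sketch of a hit counter written as a single upsert instead of a SELECT followed by an INSERT or UPDATE. The MySQL statement is shown as a string for reference only. The runnable part uses SQLite's equivalent `ON CONFLICT ... DO UPDATE` syntax, which needs SQLite 3.24 or later, through Python's sqlite3 module. The `page_hits` table is hypothetical.

```python
import sqlite3

# MySQL form (for reference only; not executed here).
MYSQL_UPSERT = (
    "INSERT INTO page_hits (page, hits) VALUES (%s, 1) "
    "ON DUPLICATE KEY UPDATE hits = hits + 1"
)

# SQLite's spelling of the same idea.
SQLITE_UPSERT = (
    "INSERT INTO page_hits (page, hits) VALUES (?, 1) "
    "ON CONFLICT(page) DO UPDATE SET hits = hits + 1"
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_hits (page TEXT PRIMARY KEY, hits INTEGER NOT NULL)")


def record_hit(page):
    # One statement: insert a new row, or bump the counter if the key exists.
    conn.execute(SQLITE_UPSERT, (page,))


for p in ["/home", "/about", "/home", "/home"]:
    record_hit(p)

hits = dict(conn.execute("SELECT page, hits FROM page_hits"))
```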
23) Use a groupwise-maximum join instead of a correlated subquery to find the row with the highest value in each group
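This sketch compares the two approaches on sqlite3 as a stand-in for MySQL. It uses a hypothetical `shop` table with sample rows. The correlated version re-runs its subquery for each outer row. The groupwise version computes every group's maximum once in a derived table and joins back to it. If two rows tie for a group's maximum, both versions return both rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shop (article INTEGER, dealer TEXT, price REAL)")
conn.executemany("INSERT INTO shop VALUES (?, ?, ?)", [
    (1, "A", 3.45), (1, "B", 3.99), (2, "A", 10.99),
    (3, "B", 1.45), (3, "C", 1.69), (3, "D", 1.25), (4, "D", 19.95),
])

# Correlated subquery: the inner MAX() depends on the outer row.
CORRELATED = """
SELECT article, dealer, price FROM shop s1
WHERE price = (SELECT MAX(s2.price) FROM shop s2 WHERE s2.article = s1.article)
ORDER BY article
"""

# Groupwise maximum: per-group maxima computed once, then joined back.
GROUPWISE_JOIN = """
SELECT s.article, s.dealer, s.price
FROM shop s
JOIN (SELECT article, MAX(price) AS price FROM shop GROUP BY article) m
  ON s.article = m.article AND s.price = m.price
ORDER BY s.article
"""

correlated_rows = conn.execute(CORRELATED).fetchall()
groupwise_rows = conn.execute(GROUPWISE_JOIN).fetchall()
```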
24) Don’t rely on the order of conditions in the WHERE clause: MySQL’s optimizer picks the index and join order regardless of how you arrange the limiters, so check the plan with EXPLAIN instead.
25) Don’t put things that change very rarely in the database; instead, put them in a global array in an include file
26) Use indexes on the columns in the WHERE clause and on the columns you want to ORDER BY
27) If you only want one row as a result from the database, you should always use LIMIT 1. This way MySQL stops searching when it finds the first matching row instead of continuing through the whole table, only to find that no more rows matched the query
28) If you use $line = mysql_fetch_array($result), you get two ways of accessing the columns: $line[0] and $line['columnname']. If you only use $line['columnname'], use $line = mysql_fetch_assoc($result) instead; then the numerically indexed entries ($line[0] and so on) are not created
29) Sometimes mysql_free_result() ends up wasting more memory than it saves. Check the difference with memory_get_usage().
30) Use data types that fit your data, not ones that are too large. For example, INT UNSIGNED can hold values up to 4294967295, which is often unnecessarily big. Use MEDIUMINT or SMALLINT where applicable
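Each MySQL integer type has a fixed size: TINYINT is 1 byte, SMALLINT 2, MEDIUMINT 3, INT 4 and BIGINT 8. From the range a column actually needs, you can work out the smallest type that fits. This is a small helper for that; its name is hypothetical.

```python
from typing import List, Tuple

# (type name, storage size in bytes) for MySQL's integer types.
MYSQL_INT_TYPES: List[Tuple[str, int]] = [
    ("TINYINT", 1), ("SMALLINT", 2), ("MEDIUMINT", 3), ("INT", 4), ("BIGINT", 8),
]


def smallest_int_type(min_value: int, max_value: int) -> str:
    """Return the smallest MySQL integer type that holds [min_value, max_value]."""
    if min_value > max_value:
        raise ValueError("min_value must not exceed max_value")
    for name, size in MYSQL_INT_TYPES:
        bits = 8 * size
        # Unsigned range: 0 .. 2^bits - 1
        if min_value >= 0 and max_value <= 2 ** bits - 1:
            return f"{name} UNSIGNED"
        # Signed range: -2^(bits-1) .. 2^(bits-1) - 1
        if -(2 ** (bits - 1)) <= min_value and max_value <= 2 ** (bits - 1) - 1:
            return name
    raise ValueError("range does not fit in BIGINT")
```

For example, a counter that never exceeds 70000 fits in `MEDIUMINT UNSIGNED` (3 bytes per row) rather than `INT` (4 bytes). Smaller rows also mean smaller indexes.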
31) Declare columns NOT NULL wherever you can; it speeds up execution and saves one bit per column
32) Try to avoid complex SELECT queries on MyISAM tables that are updated frequently, to avoid problems with table locking that occur due to contention between readers and writers
33) Use INSERT DELAYED when you do not need to know when your data is written. This reduces the overall insertion impact because many rows can be written with a single disk write
34) Use LOAD DATA INFILE to load large amounts of data. This is faster than using INSERT statements
Scaling Performance Tips:
35) Use benchmarking
36) Isolate workloads: don’t let administrative work (e.g. backups) interfere with customer performance.
37) Debugging sucks, testing rocks!
38) As your data grows, indexing may need to change (cardinality and selectivity change), and so may the structure of your schema. Make your schema as modular as your code. Make your code able to scale. Plan and embrace change, and get developers to do the same.
39) Don’t ask the database for the same stuff over and over again; save the result
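Here is a minimal sketch of "save the result": a small in-process cache with a time-to-live, wrapped around whatever function actually runs the query. The `QueryCache` class is hypothetical. In a real application you would also have to decide when writes should invalidate entries.

```python
import time
from typing import Callable, Dict, Hashable, Tuple


class QueryCache:
    """Tiny in-process TTL cache for query results (illustrative only)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        self._entries: Dict[Hashable, Tuple[float, object]] = {}

    def get(self, key: Hashable, fetch: Callable[[], object]) -> object:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # still fresh: skip the database
        value = fetch()      # miss or expired: ask the database once
        self._entries[key] = (now, value)
        return value
```

Usage: `cache.get(("countries",), lambda: cursor.execute(...) or cursor.fetchall())`. Here the key is anything hashable that identifies the query and its parameters.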
Network Performance Tips:
40) Minimize traffic by fetching only what you need.
i. Use paging/chunked data retrieval to limit how much is transferred
ii. Don’t use SELECT *
iii. Be wary of lots of small quick queries if a longer query can be more efficient
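This is a minimal sketch of chunked retrieval with the DB-API `fetchmany()` call. It names only the columns it needs and lets the caller stop early. It uses sqlite3 as a stand-in, and the `docs` table is hypothetical. With MySQL drivers, the default cursor buffers the whole result on the client. To actually stream rows you would need an unbuffered cursor, such as `SSCursor` in MySQLdb or PyMySQL.

```python
import itertools
import sqlite3
from typing import Iterator, Sequence


def iter_rows(cursor, sql: str, params: Sequence = (),
              chunk_size: int = 500) -> Iterator[tuple]:
    """Yield rows chunk_size at a time instead of fetching them all at once."""
    cursor.execute(sql, params)
    while True:
        chunk = cursor.fetchmany(chunk_size)
        if not chunk:
            return
        yield from chunk


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, name TEXT, body TEXT)")
conn.executemany("INSERT INTO docs VALUES (?, ?, ?)",
                 [(i, f"name {i}", "x" * 1000) for i in range(1, 1201)])

# Explicit column list (no SELECT *): the large body column is never fetched.
all_rows = list(iter_rows(conn.cursor(), "SELECT id, name FROM docs ORDER BY id"))
# Stop after the first five rows; later chunks are never requested.
first_five = list(itertools.islice(
    iter_rows(conn.cursor(), "SELECT id, name FROM docs ORDER BY id"), 5))
```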
41) Use multi_query if appropriate to reduce round-trips
OS Performance Tips:
42) Use proper data partitions
i. For Cluster: start thinking about Cluster *before* you need it
43) Keep the database host as clean as possible. Do you really need a windowing system on that server?
44) Utilize the strengths of the OS
45) pare down cron scripts
46) create a test environment
47) source control schema and config files
48) For LVM snapshot backups of InnoDB, restore to a different instance of MySQL so InnoDB can roll forward (run its crash recovery)
49) partition appropriately
50) partition your database when you have real data — do not assume you know your dataset until you have real data
MySQL Server Overall Tips:
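51) innodb_flush_log_at_trx_commit=0 can help slave lag (at the cost of possibly losing about the last second of transactions if the server crashes)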
52) Optimize for data types, use consistent data types. Use PROCEDURE ANALYSE() to help determine the smallest data type for your needs.
53) Use optimistic locking, not pessimistic locking.
54) Try to use shared locks, not exclusive locks: LOCK IN SHARE MODE vs. FOR UPDATE
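One common way to do optimistic locking is a version column. A writer only succeeds if the version it read is still current, and the write bumps the version. This sketch runs on sqlite3 as a stand-in for MySQL, with a hypothetical `account` table. In MySQL the "affected rows" count works the same way here, because the version always changes on a successful update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, "
             "balance INTEGER NOT NULL, version INTEGER NOT NULL)")
conn.execute("INSERT INTO account VALUES (1, 100, 0)")
conn.commit()


def read(conn, account_id):
    return conn.execute(
        "SELECT balance, version FROM account WHERE id = ?", (account_id,)
    ).fetchone()


def try_update(conn, account_id, new_balance, seen_version):
    """Write only if nobody changed the row since we read it; True on success."""
    cur = conn.execute(
        "UPDATE account SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_balance, account_id, seen_version),
    )
    conn.commit()
    return cur.rowcount == 1


# Two clients read the same version without taking any lock...
balance_a, version_a = read(conn, 1)
balance_b, version_b = read(conn, 1)
ok_a = try_update(conn, 1, balance_a - 30, version_a)  # first writer wins
ok_b = try_update(conn, 1, balance_b + 50, version_b)  # stale version: rejected
```

A rejected writer re-reads the row and retries. This works well when conflicts are rare, which is the case the tip has in mind.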
55) if you can, compress text/blobs
56) compress static data
57) don’t back up static data as often
58) enable and increase the query and buffer caches if appropriate
59) config params — http://docs.cellblue.nl/2007/03/17/easy-mysql-performance-tweaks/ is a good reference
60) Config variables & tips:
i. use one of the supplied config files
ii. key_buffer, unix cache (leave some RAM free), per-connection variables, innodb memory variables
iii. be aware of global vs. per-connection variables
iv. check SHOW STATUS and SHOW VARIABLES (GLOBAL|SESSION in 5.0 and up)
v. Be aware of swapping, especially on Linux ("swappiness"). Bypass the OS file cache for InnoDB data files with innodb_flush_method=O_DIRECT if possible (this is also OS-specific)
vi. defragment tables, rebuild indexes, do table maintenance
vii. If you use innodb_flush_log_at_trx_commit=1, use a hardware disk controller with a battery-backed write cache
viii. More RAM is good, and so is faster disk
ix. use 64-bit architectures
61) --skip-name-resolve (no DNS lookups when clients connect)
62) increase myisam_sort_buffer_size to optimize large inserts (this is a per-connection variable)
63) Look up the memory tuning parameter for on-insert caching (bulk_insert_buffer_size for MyISAM)
64) Increase the temp table size (tmp_table_size) in a data warehousing environment (default is 32MB) so temporary tables don’t get written to disk
65) Remember that this is also constrained by max_heap_table_size (default 16MB)
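For the MEMORY-based internal temporary tables used by the MySQL versions this post targets, the effective in-memory cap is the smaller of the two settings. Once a temporary table outgrows that cap, MySQL converts it to an on-disk table, so raising only one variable changes nothing. This is a small illustration of that rule. The helper names are hypothetical, and the generated statements set session values only.

```python
from typing import List

MB = 1024 * 1024


def in_memory_tmp_table_limit(tmp_table_size: int, max_heap_table_size: int) -> int:
    """Effective cap (bytes) before an internal temp table spills to disk."""
    return min(tmp_table_size, max_heap_table_size)


def settings_for(limit_bytes: int) -> List[str]:
    # Raise both together; use GLOBAL or my.cnf to affect every connection.
    return [
        f"SET SESSION tmp_table_size = {limit_bytes}",
        f"SET SESSION max_heap_table_size = {limit_bytes}",
    ]
```

With the defaults quoted above, `in_memory_tmp_table_limit(32 * MB, 16 * MB)` is 16MB. The 16MB max_heap_table_size, not the 32MB tmp_table_size, is the real limit.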
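66) Run with a strict SQL mode (STRICT_TRANS_TABLES or STRICT_ALL_TABLES) to help identify warnings: invalid values are rejected instead of being silently adjusted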
67) /tmp dir on battery-backed write cache
68) consider battery-backed RAM for innodb logfiles
69) Use --safe-updates for the mysql client
70) Redundant data is redundant
Storage Engine Performance Tips:
71) InnoDB ALWAYS keeps the primary key as part of each index, so do not make the primary key very large
72) Utilize different storage engines on master and slave, e.g. if you need full-text indexing on a table.
73) BLACKHOLE engine and replication is much faster than FEDERATED tables for things like logs.
74) Know your storage engines and what performs best for your needs, know that different ones exist.
i. e.g., use MERGE tables or ARCHIVE tables for logs
ii. Archive old data: don’t be a pack-rat! Two common engines for this are ARCHIVE and MERGE
75) use row-level instead of table-level locking for OLTP workloads
76) try out a few schemas and storage engines in your test environment before picking one.
Database Design Performance Tips:
77) Design sane query schemas. Don’t be afraid of table joins; often they are faster than denormalization
78) Don’t use boolean flags
79) Use Indexes
80) Don’t Index Everything
81) Do not duplicate indexes
82) Do not use large columns in indexes if the ratio of SELECTs:INSERTs is low.
i. Be careful of redundant columns in an index or across indexes
83) Use a clever key and ORDER BY instead of MAX
84) Normalize first, and denormalize where appropriate.
85) Databases are not spreadsheets, even though Access really really looks like one. Then again, Access isn’t a real database
86) Use INET_ATON and INET_NTOA for IP addresses (stored in an INT UNSIGNED column), not CHAR or VARCHAR
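An IPv4 address stored as an INT UNSIGNED takes 4 bytes, against up to 15 characters as text. A subnet then becomes a simple numeric range that an index can serve. These are Python counterparts built on the stdlib `ipaddress` module, useful for preparing values in application code. They are a sketch: unlike MySQL's INET_ATON(), `ipaddress` rejects short forms such as '127.1'. The helper names are hypothetical.

```python
import ipaddress
from typing import Tuple


def inet_aton(dotted: str) -> int:
    """Counterpart of MySQL INET_ATON() for a full dotted-quad IPv4 address."""
    return int(ipaddress.IPv4Address(dotted))


def inet_ntoa(number: int) -> str:
    """Counterpart of MySQL INET_NTOA()."""
    return str(ipaddress.IPv4Address(number))


def network_bounds(cidr: str) -> Tuple[int, int]:
    """Integer bounds of a subnet, for WHERE ip BETWEEN %s AND %s."""
    net = ipaddress.IPv4Network(cidr)
    return int(net.network_address), int(net.broadcast_address)
```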
87) make it a habit to REVERSE() email addresses, so you can easily search domains (this will help avoid wildcards at the start of LIKE queries if you want to find everyone whose e-mail is in a certain domain)
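Here is a minimal sketch of the reversed-address trick. If you store REVERSE(email) in an indexed column, a domain search becomes a prefix search, with the wildcard at the end where the index can still be used. The helpers are hypothetical and assume ASCII addresses. They also escape LIKE's `%` and `_` with MySQL's default escape character, the backslash.

```python
def reversed_email(email: str) -> str:
    """What MySQL REVERSE() would store for an ASCII address."""
    return email[::-1]


def domain_prefix_pattern(domain: str) -> str:
    """LIKE pattern matching reversed addresses at exactly this domain."""
    prefix = ("@" + domain)[::-1]  # e.g. "moc.elpmaxe@"
    # Escape the backslash first, then LIKE's own wildcards.
    prefix = prefix.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return prefix + "%"
```

The query would then look like `SELECT ... WHERE email_reversed LIKE %s`, with `domain_prefix_pattern("example.com")` as the parameter. The column name `email_reversed` is hypothetical.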
88) A nullable column can take more room to store than a NOT NULL one
89) Choose appropriate character sets and collations: UTF-16 stores every character in at least 2 bytes, whether it needs them or not, and latin1 is faster than UTF-8.
90) Use Triggers wisely
91) use min_rows and max_rows to specify approximate data size so space can be pre-allocated and reference points can be calculated.
92) Use HASH indexing for indexing across columns with similar data prefixes
93) Use PACK_KEYS=1 on MyISAM tables so int keys are packed too
94) be able to change your schema without ruining functionality of your code
95) segregate tables/databases that benefit from different configuration variables
Other:
96) Hire a MySQL Certified DBA
97) Know that there are many consulting companies out there that can help, as well as MySQL’s Professional Services.
98) Read and post to MySQL Planet at http://www.mysqlplanet.org
99) Attend the yearly MySQL Conference and Expo or other conferences with MySQL tracks
100) Support your local User Group
101) Read the Articles at http://www.goitexpert.com
102) Learn how to use Paypal and donate to sites that support GNU, GPL and Open Source
