- Don't Index Everything
- Use benchmarking
- Minimize traffic by fetching only what you need.
- Paging/chunked data retrieval to limit
- Don't use SELECT *
- Be wary of lots of small quick queries if a longer query can be more efficient
- Use EXPLAIN to profile the query execution plan
- Use Slow Query Log (always have it on!)
- Don't use DISTINCT when you have or could use GROUP BY
- Use proper data partitions
- For Cluster: start thinking about Cluster *before* you need it
- Insert performance
- Batch INSERT and REPLACE
- Use LOAD DATA instead of INSERT
- LIMIT m,n may not be as fast as it sounds
- Don't use ORDER BY RAND() if you have > ~2K records
- Use SQL_NO_CACHE when you are SELECTing frequently updated data or large sets of data
- avoid wildcards at the start of LIKE queries
- avoid correlated subqueries in the SELECT and WHERE clauses (try to avoid IN)
- config params --
- no calculated comparisons -- isolate indexed columns
- innodb_flush_log_at_trx_commit=0 can help slave lag
- ORDER BY and LIMIT work best with equalities and covered indexes
- isolate workloads: don't let administrative work (e.g., backups) interfere with customer performance
- use optimistic locking, not pessimistic locking. Try to use a shared lock, not an exclusive lock (LOCK IN SHARE MODE vs. FOR UPDATE)
- use row-level instead of table-level locking for OLTP workloads
- Know your storage engines and what performs best for your needs, know that different ones exist.
- use MERGE tables or ARCHIVE tables for logs
- Optimize for data types, use consistent data types. Use PROCEDURE ANALYSE() to help determine whether a smaller type would do
- separate text/blobs from metadata, don't put text/blobs in results if you don't need them
- if you can, compress text/blobs
- compress static data
- don't back up static data as often
- derived tables (subqueries in the FROM clause) can be useful for retrieving BLOBs without sorting them (a self-join can speed up a query if the first part finds the IDs and the second part uses them to fetch the rest)
- enable and increase the query and buffer caches if appropriate
- ALTER TABLE ... ORDER BY can take chronological data and re-order it by a different field
- InnoDB ALWAYS keeps the primary key as part of each index, so do not make the primary key very large, be careful of redundant columns in an index, and this can make the query faster
- Do not duplicate indexes
- Utilize different storage engines on master/slave, e.g., if you need fulltext indexing on a table.
- BLACKHOLE engine and replication is much faster than FEDERATED tables for things like logs.
- Design sane query schemas. Don't be afraid of table joins; often they are faster than denormalization
- Don't use boolean flags
- Use a clever key and ORDER BY instead of MAX
- Keep the database host as clean as possible. Do you really need a windowing system on that server?
- Utilize the strengths of the OS
- Hire a MySQL(tm) Certified DBA
- Know that there are many consulting companies out there that can help, as well as MySQL's Professional Services.
- Config variables & tips:
- use one of the supplied config files
- key_buffer, unix cache (leave some RAM free), per-connection variables, innodb memory variables
- be aware of global vs. per-connection variables
- check SHOW STATUS and SHOW VARIABLES (GLOBAL|SESSION in 5.0 and up)
- be aware of swapping esp. with Linux, "swappiness" (bypass OS filecache for innodb data files, innodb_flush_method=O_DIRECT if possible (this is also OS specific))
- defragment tables, rebuild indexes, do table maintenance
- If you use innodb_flush_log_at_trx_commit=1, use a battery-backed hardware write-cache controller
- more RAM is good, and so is faster disk speed
- use 64-bit architectures
- Know when to split a complex query and join smaller ones
- Debugging sucks, testing rocks!
- Delete small amounts at a time if you can
- Archive old data -- don't be a pack-rat! 2 common engines for this are ARCHIVE tables and MERGE tables
- use INET_ATON and INET_NTOA for IP addresses, not char or varchar
- make it a habit to REVERSE() email addresses, so you can easily search domains
- --skip-name-resolve
- increase myisam_sort_buffer_size to optimize large inserts (this is a per-connection variable)
- look up memory tuning parameter for on-insert caching
- increase temp table size in a data warehousing environment (default is 32MB) so it doesn't write to disk (also constrained by max_heap_table_size, default 16MB)
- Normalize first, and denormalize where appropriate.
- Databases are not spreadsheets, even though Access really really looks like one. Then again, Access isn't a real database
- In 5.1 BOOL/BIT NOT NULL type is 1 bit, in previous versions it's 1 byte.
- A column that allows NULL can take more room to store than one declared NOT NULL
- Choose appropriate character sets & collations -- UTF16 will store each character in at least 2 bytes, whether it needs it or not; latin1 is faster than UTF8.
- make similar queries consistent so cache is used
- Have good SQL query standards
- Don't use deprecated features
- Use Triggers wisely
- Run in SQL_MODE=STRICT to help identify warnings
- Turning OR on multiple index fields (<5.0)>
- /tmp dir on battery-backed write cache
- consider battery-backed RAM for innodb logfiles
- use min_rows and max_rows to specify approximate data size so space can be pre-allocated and reference points can be calculated.
- as your data grows, indexing may change (cardinality and selectivity change). Structuring may want to change. Make your schema as modular as your code. Make your code able to scale. Plan and embrace change, and get developers to do the same.
- pare down cron scripts
- create a test environment
- try out a few schemas and storage engines in your test environment before picking one.
- Use HASH indexing for indexing across columns with similar data prefixes
- Use myisam_pack_keys for int data
- Don't use COUNT(*) on InnoDB tables for every search; do it a few times and/or use summary tables, or if you need it for the total # of rows, use SQL_CALC_FOUND_ROWS and SELECT FOUND_ROWS()
- use --safe-updates for client
- Redundant data is redundant
- Use INSERT ... ON DUPLICATE KEY UPDATE (or INSERT IGNORE) to avoid having to SELECT first
- use groupwise maximum instead of subqueries
- be able to change your schema without ruining functionality of your code
- source control schema and config files
- for LVM innodb backups, restore to a different instance of MySQL so Innodb can roll forward
- use multi_query if appropriate to reduce round-trips
- partition appropriately
- partition your database when you have real data
- segregate tables/databases that benefit from
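
Three of the tips above (batch your INSERTs, fetch only the columns you need, and be careful with LIMIT m,n) can be sketched together. This is only an illustration, not a MySQL benchmark: it uses Python's built-in sqlite3 module as a stand-in for a MySQL connection, and the `events` table, its columns, and the function names are all made up for the example. The paging function seeks past the last id it saw, which is one common alternative to a growing LIMIT offset; whether it is faster for your data is something to confirm with EXPLAIN and benchmarking.

```python
import sqlite3


def make_db(n_rows=1000):
    """Create a hypothetical in-memory table and fill it with one batched insert."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (id INTEGER PRIMARY KEY, name TEXT, payload TEXT)"
    )
    rows = [(i, f"event-{i}", "x" * 100) for i in range(1, n_rows + 1)]
    # One executemany call in one transaction, instead of n_rows separate
    # INSERT + commit round-trips. (In MySQL the analogue would be a
    # multi-row INSERT or LOAD DATA.)
    with conn:
        conn.executemany(
            "INSERT INTO events (id, name, payload) VALUES (?, ?, ?)", rows
        )
    return conn


def fetch_page(conn, last_id, page_size):
    """Return the next page of (id, name) rows after last_id."""
    # Name only the columns needed (no SELECT *), so the large payload
    # column is never transferred. Seeking on the primary key avoids
    # scanning and discarding the first m rows of a LIMIT m,n.
    cur = conn.execute(
        "SELECT id, name FROM events WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, page_size),
    )
    return cur.fetchall()


def iter_all(conn, page_size=100):
    """Walk the whole table in chunks of page_size rows."""
    last_id = 0
    while True:
        page = fetch_page(conn, last_id, page_size)
        if not page:
            break
        yield from page
        last_id = page[-1][0]  # remember where this chunk ended


if __name__ == "__main__":
    db = make_db(250)
    print(fetch_page(db, 0, 3))
    print(sum(1 for _ in iter_all(db)))
```

The caller only has to remember the last id it received, so each page costs about the same no matter how deep into the table it is.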
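
The INET_ATON/INET_NTOA and REVERSE() tips above both come down to storing a value in the form you will search it by. A rough sketch of the idea in Python follows. The helper names mirror the MySQL functions but are plain Python written for this example, sqlite3 stands in for a MySQL connection, and the `users` table and `rev_email` column are hypothetical. The helpers only accept full dotted-quad IPv4 addresses.

```python
import ipaddress
import sqlite3


def inet_aton(dotted):
    """Dotted-quad IPv4 string -> unsigned 32-bit integer (fits in INT UNSIGNED)."""
    return int(ipaddress.IPv4Address(dotted))


def inet_ntoa(number):
    """Unsigned 32-bit integer -> dotted-quad IPv4 string."""
    return str(ipaddress.IPv4Address(number))


def reverse_email(email):
    """Reverse the address so the domain comes first in the stored string."""
    return email[::-1]


def make_users_db(emails):
    """Hypothetical table that stores each address reversed, with an index on it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (rev_email TEXT NOT NULL)")
    conn.execute("CREATE INDEX idx_rev_email ON users (rev_email)")
    with conn:
        conn.executemany(
            "INSERT INTO users (rev_email) VALUES (?)",
            [(reverse_email(e),) for e in emails],
        )
    return conn


def find_by_domain(conn, domain):
    """Find all addresses at a domain using a LIKE with a trailing wildcard."""
    # Reversed, '@example.com' becomes the prefix 'moc.elpmaxe@', so the
    # wildcard sits at the END of the pattern rather than the start.
    pattern = reverse_email("@" + domain) + "%"
    cur = conn.execute(
        "SELECT rev_email FROM users WHERE rev_email LIKE ? ORDER BY rev_email",
        (pattern,),
    )
    return [reverse_email(row[0]) for row in cur]


if __name__ == "__main__":
    print(inet_aton("192.168.1.10"))
    db = make_users_db(["alice@example.com", "bob@example.org"])
    print(find_by_domain(db, "example.com"))
```

Storing the integer form keeps the column fixed-width and makes range checks on addresses simple comparisons; storing the reversed address turns a domain search into a prefix match, which is the kind of LIKE an index can help with.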