MySQL Performance: Identifying Long Queries

MySQL performance

An optimized database server is key to ensuring that apps which rely on MySQL run efficiently. Sometimes all that’s needed to see website and application performance improvements are a few small adjustments, so in this article, we look at some of the tweaks that can help boost performance the most.

What Does it Cover?

Our advice here applies to the majority of MySQL servers running Linux and covers (among others) both dedicated and cloud VPS servers that run on various Linux distros and these system types:

  • Core-managed CentOS 6x/7x
  • Core-managed Ubuntu 14.04/16.04
  • Fully-managed CentOS 6/7 cPanel
  • Fully-managed CentOS 7 Plesk Onyx/Obsidian
  • Self-managed Linux servers

If you’ve chosen not to take advantage of direct support, you can still use these techniques, but we’re assuming that you’re familiar with these basic system admin principles:

  • SSH connections and how to navigate the standard Linux command-line shell environment.
  • Opening, Editing and Saving files in Vim or your preferred system editor.
  • MySQL interactive mode and general MySQL query syntax.

What is MySQL Optimization?

MySQL Optimization doesn’t have an exact meaning, but the general idea is pretty self-evident. It’s just about making your database function as efficiently and effectively as possible, and what that means in practical terms will obviously vary depending on the needs of your organization. Here, we’re taking it to mean a server which has been set up to avoid the kind of typical bottlenecks that can arise.

What’s a bottleneck?

A wine bottle narrows at the neck, so there’s a tighter space for your tipple to squeeze through, so it serves as the perfect metaphor for a road that narrows from two lanes to one and slows traffic or a computer system that hinders data transit for some technical reason.

E.g. the server could cope with 10 simultaneous connections, but as it is, it can only manage 5. If no more than 5 connections are established then the system will hum along happily, but as soon as it hits 6 then the limits will become all too clear.

When Should a MySQL database be optimized?

The best time to tune your database is when doing so will have the lowest impact on whatever services rely on it. As for frequency, you should audit it every week or month to make sure that poor performance isn’t slowing down apps. It’s easy to see when this is happening because you’ll notice a few tell-tale signs:

  • queries back up and never complete in the MySQL process table.
  • websites or websites that rely on it become sluggish.
  • more connection timeout errors, particularly during peak hours.

It’s fine to have a number of concurrently running queries at the same time on a system, but that becomes a burden when they regularly take more time than usual to complete. Even though the threshold for what constitutes a burden will vary from system to system and from app to app, average query times beyond several seconds will result in slower performance for websites and apps that depend on them. You may not notice that anything is amiss until a spike in traffic reveals that there is indeed a bottleneck. That’s why it pays to be proactive and optimize before this happens.

Tracking Down Performance Issues

To diagnose a particular bottleneck, you’ll need to know how to do an examination of the MySQL process table. You can view it in a few different ways, although this will vary according to your setup. To keep it short, we’ll just look at the most frequently used Secure Shell (SSH) access methods:

The MySQL Process Table: Method 1

The ‘mysqladmin’ command-line tool can be used with the flag ‘processlist’ or ‘proc’ for short. (You  can add the flag ‘statistics’ or ‘stat’ for short to show running statistics for queries since MySQL’s most recent restart.)

Command:

mysqladmin proc stat

Output:

+-------+------+-----------+-----------+---------+------+-------+

| Id | User | Host | db | Command | Time | State | Info | Progress |

+-------+------+-----------+-----------+---------+------+-------+--------------------+----------+

| 77255 | root | localhost | employees | Query | 150 | | call While_Loop2() | 0.000 |

| 77285 | root | localhost | | Query | 0 | init | show processlist | 0.000 |

+-------+------+-----------+-----------+---------+------+-------+--------------------+----------+

Uptime: 861755 Threads: 2 Questions: 20961045 Slow queries: 0 Opens: 2976 Flush tables: 1 Open tables: 1011 Queries per second avg: 24.323

For: can be used on the shell interface, so feeding output to other scripts and tools is no problem.

Against: The info column of the process table is always truncated so you won’t be able to view the full query if it’s longer.

The MySQL Process Table: Method 2

Run the query ‘show processlist;’ from the MySQL interactive mode prompt. (Add the ‘full’ modifier to the command to stop the Info column being truncated. You’ll be glad of it when long queries would otherwise get cut off.)

Command:

show processlist;

Output:

MariaDB [(none)]> show full processlist;

+-------+------+-----------+-----------+---------+------+-------+-----------------------+----------+

| Id | User | Host | db | Command | Time | State | Info | Progress |

+-------+------+-----------+-----------+---------+------+-------+-----------------------+----------+

| 77006 | root | localhost | employees | Query | 151 | NULL | call While_Loop2() | 0.000 |

| 77021 | root | localhost | NULL | Query | 0 | init | show full processlist | 0.000 |

+-------+------+-----------+-----------+---------+------+-------+-----------------------+----------+

 

For: Using the full modifier allows for seeing the full query on longer queries.

Against: MySQL Interactive mode can’t access the scripts and tools of the shell interface.

Using the slow query log

Another useful tool in MySQL is the included slow query logging feature, which is the preferred way of regularly finding long-running queries. You can adjust this feature using several directives, but the most commonly needed settings are:

slow_query_log – enable/disable the slow query log

slow_query_log_file – name and path of the slow query log file

long_query_time – time in seconds/microseconds defining a slow query

You can set these from the [mysqld] section of the MySQL configuration file that you’ll find at /etc/my.cnf. You’ll need to do a MySQL service restart to make them work. Take a look at this example for formatting:

Caution

The slow query log file requires a lot of disk space, and it needs continual attention while it’s running. Remember that the lower your long_query_time directive the quicker the slow query log will fill a disk partition.

[mysqld]

log-error=/var/lib/mysql/mysql.err

innodb_file_per_table=1

default-storage-engine=innodb

innodb_buffer_pool_size=128M

innodb_log_file_size=128M

max_connections=300

key_buffer_size = 8M

slow_query_log=1

slow_query_log_file=/var/lib/mysql/slowquery.log

long_query_time=5

After the slow query log has been enabled you will need to follow up from time to time to evaluate troublesome queries that require your attention. To review the contents of the slow query log file, you can parse it directly. This example shows statistics for a sample query that ran over the allotted 5 seconds:

Caution

If you enable the slow query log feature, then be advised that performance will take a hit. That’s because extra routines will be analyzing every query and writing them to the log file which will increase the input/output burden. This additional overhead explains why it’s considered to be best practice to disable the slow query log on production systems. In fact, you should only enable it for a fixed period: when you’re actively hunting for queries that might be slowing down a website or app.

# Time: 180717 0:23:28

# [email protected]: root[root] @ localhost [] # Thread_id: 32 Schema: employees QC_hit: No

# Query_time: 627.163085 Lock_time: 0.000021 Rows_sent: 0 Rows_examined: 0

# Rows_affected: 0

use employees;

SET timestamp=1531801408;

call While_Loop2();

 

Alternatively, there’s also the mysqldumpslow command line tool, which parses the slow query log file and puts like queries together but excludes the values of number and string data.

NoSQL vs SQL: Examining the Differences and Deciding Which to Choose

NoSQL vs SQL

At 74, Larry Ellison, co-founder and CTO of Oracle has amassed a fortune of $66.1 billion. He got going in 1966 and in the seventies took an idea from IBM’s Edgar F. Cobb for a SQL relational database. This became the Oracle Database rdbms (relational database management system). With no free software competitors, Oracle totally dominated the market. Everything else, like DB2 was running on IBM mainframes, and even it couldn’t oust Oracle from its top position. Mainframes remained popular until the 1990s, when PCs started to be used as servers, as they still are today. Oracle is still in the top spot for the majority of transactional business applications used by the richest companies. It bought the commonest opensource, MySQL, along with opensource Java, but both are still free to use. The big choice for all companies is still SQL vs NoSQL – between relational (SQL) or non-relational (NoSQL) data structure. Both are great in their own way, and both come with pros and cons of course, which we’ve listed for you here.

What is SQL?

SQL (Structured Query Language) organizes information in relational databases. It’s used in Oracle, Sybase, Microsoft SQL Server, Access, and Ingres. SQL uses relations (usually referred to as tables) to store and match data using shared features within the dataset.

It was Cobb’s notion that you could have a database that could be queried using a structured query language. He used SQL to create data in objects known as tables, along with the schema for that data, which describes fields in columns. One SQL record is known as a row.

What is NoSQL?

A NoSQL database describes itself, so it doesn’t need a schema. It also doesn’t mandate relations between tables in all scenarios. It only uses JSON documents, which are self-contained and easy to understand. NoSQL means high-performance, non-relational databases that use many different data models. They are known to be easy to use, have scalable performance, are resilient, and also widely available. NoSQL database examples include MongoDB, MarkLogic, Couchbase, CloudDB, and Amazon’s Dynamo DB.

NoSQL vs SQL: Major Differences

When choosing a data management system for your organization, you need to take into account the many and varied differences between SQL and NoSQL. There are differences in:

  • Language
  • Scalability
  • Community
  • Structure

Language

Use of a Structured Query Language makes any SQL-based database very versatile and helps to explain why it is used so widely. On the downside though, this also restricts it. You have to use predefined schemas to set out the structure of your data before you can even get started. Your data has to use the same structure too, structure as well, you may have to invest considerable time into pairing your data to make it ready.

A NoSQL database has a dynamic schema for unstructured data which can be stored in a lot of different ways, including graph-based, document-oriented, column-oriented, or organized as a KeyValue store. Being highly flexible like this means you won’t be burdened with the same amount of preparation. You’re free to add fields as you go and vary the syntax from database to database. Every document can have its own individual structure, so have a great deal of latitude.

Scalability

Another significant difference between SQL and NoSQL is how scalable they are. With the majority of SQL databases, can scale them vertically, meaning individual servers can be boosted through the addition of more RAM, SSD, or faster CPU. But NoSQL databases scale horizontally, meaning that they can handle increased traffic simply by adding more servers to the database. NoSQL databases have the ability to become larger and much more powerful, so they are great for handling large or constantly evolving data sets.

Community

SQL has been around for long enough now that its community is large and well developed. If you need a query answered or want to pick up new skills, there are seemingly endless forums full of experienced users who will be glad to help you out. NoSQL can’t match this level of peer support yet because it’s the new kid on the block, so unfortunately, you’ll have to come back in a few years.

Structure

SQL databases use a tables approach which makes them better suited to handling apps that ask for multi-row transactions. Accounting systems or legacy systems that were originally created for a relational structure are examples of these. NoSQL databases can be key-value pairs, wide-column stores, graph databases, or document-based.

SQL or NoSQL: Which One is Going to Fit Your Business?

The best way to determine which database is right for your business is to look at what you need it to do. If you need a predetermined structure, multi-row transactions and set schemas then SQL is the one to go for. It’s also highly consistent, which makes it an ideal choice for accounting systems.

If your company is growing rapidly and doesn’t need clear schema definitions, then NoSQL is what you want. A relational database won’t offer as much flexibility as NoSQL, which is great for companies that need to churn through large amounts of data that comes in varying structures.

Examples

We can see that the first field is teacher and the second field is subject.

{ teacher:  "James Smith", subject:  "literature" }

With SQL, you create this schema before adding it to the database:

CREATE TABLE teacherSubjects (
teacher varchar,
subject varchar
);

Varchar is variable character length. To add data to that table, you would:

INSERT INTO teacherSubjects (teacher, subject)
VALUES ("James Smith", "literature");

With a NoSQL database, in this example using MongoDB, you would use the database API to insert data like this:

db.teacherSubjects.insert( { name: "James Smith", subject: "literature" } )

Now you can create the union (all elements from two or more sets) and intersection (common elements of two or more sets) of sets using SQL.

This was such a big deal because all this could be programmed using simple SQL syntax. Then Oracle added indexing fields and caching records to improve performance and make sure that the database could complete referrals with integrity. (Referential integrity is about the completeness of transactions, so you aren’t left with orphaned records. For instance, a sales record with no product to go with it. This is what enforcing the relationship between tables refers to in Oracle.)

Note that in the above MongoDB example, Oracle programmers would call the teacherSubjectes table an intersection. It tells you what subjects a teacher has and also which teachers are in which subject. So you could also add things like subject room number and teacher email address to both records.

The Oracle database is known as a row-oriented database because that’s how it’s organized. There’s no need to turn our attention to column-oriented databases like Cassandra here, because they have different architecture. So, they are not so fundamentally different as SQL vs NoSQL. In particular, the Cassandra NoSQL database columns with similar data near to each other for the fastest possible retrieval. Cassandra and NoSQL databases do away with the concept of database normalization, which is fundamental to Oracle, as we outline below. And they don’t keep empty column values, so the rose can be different lengths.

Normalization and Efficiency

Something which Oracle emphasized was the relationship between objects, insisting that all data should be normalized, and nothing should be stored twice. In practical terms, instead of repeating the school address in every teacher record, it would be more efficient to keep a school table and put the address there. This constraint is largely absent in NoSQL databases, so it wins out here in the SQL vs No SQL debate.

Storage space and memory were costlier in the 1970s, so normalization was necessary. These days though, assembling a record that is split between different tables takes more of both, not to mention the fact that you also need to maintain index files, which can slow everything down.

Fans of NoSQL databases say memory and storage are so cheap and processing power so exponentially faster now that none of that really matters. The computer can handle it and it’s easier for programmers to code.

NoSQL vs SQL

SAP is Oracle’s biggest business competitor and has its own database, Hana. Oracle keeps all its records in memory (flushing them to disk as necessary) for the speed advantage it brings, but apart from that, they work in pretty much the same way.

NoSQL has been around for so long that it’s hard to argue a business case for changing to a newer one. When firms already understand rdbms, why switch? Oracle has solved management issues like data replication, which might leave someone using, ElasticSearch, for instance, unsupported with a compromised system on their hands. To avoid this, some businesses support opensource databases, like ElasticSearch, in-house, so you can buy in the help you need from them.

There’s been a big shift towards transactional systems. The addition of a sale to a sales database is an easy-to-understand concept. Once it’s done, Oracle calculates on-hand inventory using a saved SQL operation called a view. For MongoDB, a program would have to go through inventory items and takeaway the sales to work out the new on-hand inventory.

NoSQL Databases in Action

Looking at MySQL vs NoSQL, it’s interesting to note that NoSQL databases tend to be used in niche rather than enterprise systems. Uber is a good example as it uses Cassandra to keep tabs on its drivers, but it has unique needs, like writing millions of records a second across many data centers. The company wrote its own version of Cassandra in order to have it run on Mesos. Mesos is an orchestration system that resembles containers.

Amazon markets is a DynamoDB database which has “millisecond latency.”

DynamoDB, like MongoDB, has a JavaScript interface, which makes it simple to use. To add a record, for instance you open the database, then add a JSON item like this:

var docClient = AWS.DynamoDB.DocumentClient()
docClient.put("{JSON … }"}

One implementation detail is that you can use Node.js to run these operations in MongoDB and DynamoDB. Which means JavaScript running in the middle tier, so you don’t have to create JAR files or middleware servers like Oracle Weblogic.

So, which of the two is best for you? You could still run your accounting system on a RDBMS system. But don’t necessarily need to pay licensing fees to Oracle. You could use MySQL instead. But will it use MongoDB? That is unlikely in the short term, as huge numbers of programmers across the globe use Java and Oracle, which project managers and users understand. Use ElasticSearch for logs and Spark for analytics. With the others, look at them individually to see if they will fit in with your resources, skills, tolerance for suffering lost transactions, etc.

Conclusion

Whatever your field, selecting the correct database for your firm is a crucial decision. NoSQL databases are rapidly establishing themselves as a significant force in the database landscape. They bring many benefits: they are cheaper, are open-source, and easily scalable, which makes NoSQL more appealing for anyone who needs Big Data integration. It’s a new technology though, which can bring its own problems.

SQL databases, in contrast have had more than four decades to establish their well-defined. A mature community offers almost limitless possibilities for collaboration and support.

In the end, the choice of SQL vs NoSQL for business will come down to the individual needs of the companies concerned. Only through extensive research comparing their abilities to your needs will you find the one that is the best fit.