Linux has a plethora of command line tools to use when troubleshooting performance or application issues. These utilities include top, df, ps, free, iostat, vmstat, lnstat, and others. Each tool is valuable and has its place in your troubleshooting and monitoring toolbox.
Out of all of these, there’s no tool combination I’ve found more valuable for troubleshooting performance issues than vmstat and top. Many Linux users run top frequently, even if they don’t understand everything it’s telling them. I’ll touch on top only lightly here, but you may be surprised how much better you understand it after reading this article.
Virtual Memory: The History
Vmstat’s man page describes it as “Report virtual memory statistics”, which is helpful, if you know what that means. To understand this fully, and to understand what vmstat is telling you, we need to dive back into the 1960s and ’70s, when virtual memory concepts were being developed.
Back in those days, memory was at a premium. It was extremely costly, so machines rarely had anywhere near enough primary memory storage. By this, I mean RAM, something most of us today measure in gigabytes, but which back then was measured in kilobytes. Because you couldn’t always do what you wanted in that small amount of memory, operating systems (such as UNIX) had the ability to swap out all or parts of programs to secondary storage (disks), when they didn’t need to be in primary (RAM) memory.
Linux today still provides this capability, even though memory is now roughly a hundred thousand times cheaper (in 1994, my company once bought 128MB of HP server memory at $1000/MB, yes, mega; today, memory costs about $10/GB). Because memory has become so cheap, swapping data to disk is rarely necessary on most physical servers. However, given the move to smaller virtualized instances, where memory is the limiting factor in the scalability of your virtual environment, memory is once again at a premium.
Swapping and Secondary Memory
Today, we call it “swap space”, your “swap partition”, or your “swap file”, but whatever the term, it refers to either a raw disk partition or a file on your filesystem that the Linux kernel can shuffle rarely used data out to when memory gets tight. Swap is also called “virtual memory” because the operating system (in concert with a CPU that supports it, which nearly all of them do today) makes it look to your application as if there is more memory than there really is.
All that’s well and good: we seemingly get more memory for free. But, as with most things, there’s a catch: disk is nowhere near as fast as main memory. Solid state disks (SSDs) have helped with this quite a bit, but even the fastest SSD is still many times slower than RAM. And because processors are so fast these days, the difference between running at full speed out of main memory and having to swap data to disk is more obvious than ever. So you get cheap “virtual” memory, but at a huge cost in performance.
Swapping versus Paging: What’s the difference?
You may have seen the terms “swapping” and “paging” and wondered what the difference is. Swapping historically referred to moving entire processes out of primary memory to a swap file, to make space available for other processes. Paging means taking only the least recently used blocks of memory held by a process and writing those parts out to secondary storage.
Linux, and most if not all modern OSes, actually do paging, and never swap out an entire process as a unit. They may eventually swap out all the data that makes up a process, but they won’t look at a process and decide to evict the whole thing at once: that’s very expensive in disk I/O, and overkill for most situations.
“Swapping” has become generally synonymous with “paging” because the distinction is moot in nearly all cases today. From here on, I’ll use the term “swapping”, because that’s what almost everyone says; “paging” is more accurate, but less used.
The Performance Impacts of Swapping
Generally, you want your applications to never use swap space, except when data is rarely needed, so you don’t see an impact on your performance. But, you may have experienced your Linux server/instance slowing down to a crawl, and had trouble figuring out why. Maybe you just reboot it at that point, but sometimes that results in rebooting several times a day, which isn’t good for uptime. The answer almost always lies in this one question: is my Linux server swapping (paging) on a consistent basis?
Remember, it generally doesn’t hurt to swap if the data being swapped out is rarely being used. So, it’s safe to use swap space to get your application up and running and your server stabilized. If your server doesn’t stabilize, however (memory keeps growing, or too much is allocated and released on a frequent basis), your application will slow to a horrendous crawl.
How can you determine whether swapping is going on? That’s where vmstat comes in: it shows you what’s happening with virtual memory on a system-wide basis. Where vmstat gives you the system-wide view, top helps you zero-in on a particular process or processes.
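Before diving into vmstat’s output, here’s one hedged way to see which processes currently have data in swap: a sketch that reads the VmSwap field from each process’s /proc/&lt;pid&gt;/status file (present on reasonably modern Linux kernels).

```shell
# List processes with non-zero swap usage, largest first.
# VmSwap is reported in KB; some /proc entries belong to short-lived or
# protected processes, so read errors are discarded.
for status in /proc/[0-9]*/status; do
    awk -v f="$status" '/^VmSwap:/ && $2 > 0 { print f, $2, $3 }' "$status" 2>/dev/null
done | sort -k2 -rn | head
```

On a healthy system this list is often empty; when entries do show up, they tell you which specific processes to scrutinize in top.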
Vmstat’s output is pretty cryptic:
```
[devbox opt]$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0   4036  94372 159648 339188    0    0     2     6   43   33  1  0 99  0  0
 0  0   4036  94364 159648 339188    0    0     0     0   32   39  0  0 100 0  0
 0  0   4036  94364 159648 339188    0    0     0     0   41   69  0  1 99  0  0
```
(Note: the “1” after vmstat tells it to print a new sample every second and keep repeating. That’s handy to write out to a log file, so you can see how things change over time.)
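For example, here’s a hedged sketch of capturing timestamped samples to a log file (the -n flag, which prints the header only once, is in the procps version of vmstat):

```shell
# Take one sample per second for five seconds, prefix each line with a
# timestamp, and append to a log for later review.
vmstat -n 1 5 | while read -r line; do
    printf '%s %s\n' "$(date '+%F %T')" "$line"
done >> vmstat.log
tail -n 3 vmstat.log
```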
But, let’s break it down with this simple table:
| Heading | Sub-heading | Meaning | What it really means |
|---|---|---|---|
| procs | r | Number of processes waiting to run. | If this is greater than zero on average, you’re running enough on your server that your CPU has become a bottleneck. You might need a faster CPU, or to run less on that server. When this number is consistently greater than zero, your system load average will increase. |
| procs | b | Number of processes in uninterruptible sleep. | This is the number of processes blocked on I/O; that is, processes waiting for some I/O event to happen (a disk read to complete, data written to the network to be fully written, data to come back from swap space, etc.). If this is greater than zero, your application is spending a lot of time waiting on data. Maybe you need an index on a database table, or maybe your app is reading or writing too often. |
| memory | swpd | Total amount of virtual memory in use. | This indicates (in KB) how much of your swap space is in use. This number being greater than zero doesn’t necessarily mean your server is in trouble, at least not by itself. However, if it gets close to your total swap space (see below to learn how to check that), your server is at imminent risk of crashing outright. If this number grows over time, it almost certainly means you’re running out of memory, as your application requests more and more. |
| memory | free | Total amount of memory that’s free and unused. | Oddly, this is the least valuable metric vmstat reports. It tells you very little, because it’s more valuable to have your memory in use for buffers and cache (see below) than sitting idle. Low free values may indicate that you’re getting close to swapping, but the kernel will reclaim some (though not all) memory from buffers and cache before that happens. |
| memory | buff | The amount of memory used to buffer I/O. | This is the amount of memory set aside to let a process perform I/O and immediately move on to its next operation without waiting for the physical I/O to complete. This reduces the number of processes blocked waiting for I/O to complete (see procs/b, above). |
| memory | cache | The amount of memory used as file system cache. | File system cache (not to be confused with file system buffers) is memory set aside to provide faster access to frequently used disk data, reducing file system I/O and thus speeding up the system. |
| swap | si | Amount of data swapped in from disk since the last poll. | This indicates how much data has been read back from swap since the last polling interval (for vmstat 1, the interval is one second). If this number is consistently greater than zero, it can indicate serious problems: data is constantly being read from your swap storage, slowing your system. Put another way, process data was swapped to disk and is now being pulled back in. If that happens frequently, you can bet your performance is suffering. |
| swap | so | Amount of data swapped out to disk since the last poll. | This is the amount of data written to swap since the last polling interval, and it’s one of the most important numbers vmstat outputs: it shows the disk I/O being performed to write to your swap area. It will be greater than zero when processes are requesting more memory than is available, forcing other memory pages out to disk. |
| io | bi | Blocks read from a block device. | This is the number of blocks read from any block device (generally a disk) during the last polling interval. If this is non-zero but swap/si is zero, it can indicate that ordinary disk reads are your system’s bottleneck. Copying or scanning large files, or running large database queries, are common reasons for this value to rise. |
| io | bo | Blocks written to a block device. | This is the number of blocks written to any block device during the last polling interval. Again, because these are standard disk writes, they may or may not be related to swapping. If swap/so is zero and this is consistently non-zero, your process may simply be slowed by the amount of disk writing it is doing. |
| system | in | Number of interrupts per second, including the system clock. | An interrupt is a signal that tells the processor to stop what it’s doing and handle some urgent business; a common example is a network card announcing that a packet has arrived. This number is insidious: a high interrupt rate generally occurs only in very specialized cases, and usually implicates a bad device driver or a special I/O circumstance (like a 10Gbps Ethernet interface receiving traffic at full speed). The reason it’s insidious is that your system can seem to slow down for no good reason at all: memory’s fine, disk I/O may be minimal, and no specific process is taking up much CPU. So, if you can’t find any other good reason for a slowdown, look here: it generally means the kernel is doing work that is hidden from the user and not attributed to any process. |
| system | cs | Context switches per second. | Every bit as insidious as interrupts, because its impact isn’t obvious to the user. This indicates how many times per second the CPU switches from one thread of execution to another, which in practice often means bouncing between user space (normal process code) and kernel space (code within the kernel). It can indicate an incorrectly compiled kernel, or a user process that relies so heavily on kernel services that the CPU spends relatively little time executing the user’s code and a lot of time executing code within the kernel. A classic example is using a user-space process to filter line-rate network traffic at high speeds (> 1Gbps): so many packets arrive that the device driver (in the kernel) hands a packet to the listening user-space process, the processing work is very short, and execution returns to the kernel for the next packet. |
| cpu | us | Time spent executing user code. | This makes the distinction between kernel and user processing: it’s the percentage of time the CPU (across all cores) spends executing user-process code. It will generally be high on most busy systems. |
| cpu | sy | Time spent executing kernel code. | The percentage of time the CPU (again, all cores) spends executing system (kernel) code; in other words, how much your application is relying on operating system services. It will be high when a process does a lot of I/O, because user processes must go through the kernel to access I/O devices. |
| cpu | id | Time spent idle. | The percentage of time, across all cores, that the CPU is idle. Just what it says: the CPU is sitting around doing nothing. |
| cpu | wa | Time spent in I/O wait. | The percentage of time the CPU spent waiting on I/O to complete. When this number is large, it generally means the system is bottlenecked waiting for I/O. |
| cpu | st | Time stolen from a virtual machine. | A fairly recent addition, this indicates the time during which your virtual CPU was ready to run something but the hypervisor ran something else instead. It matters only when the server is a virtual instance, and shows when your virtual machine is losing time to other VMs or to hypervisor activity. You can use it to tell when you’re sharing resources with too many other VMs and that contention is the cause of your slowdowns; this can be very useful in public clouds. |
There are other options to vmstat, but these are the key ones that have been in use for years, and survived the test of time.
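As a quick sketch of how you might watch the swap columns specifically, the following samples vmstat for a few seconds and flags any interval where si or so is non-zero (the awk field positions, $7 and $8, assume the default procps vmstat layout shown above):

```shell
# Print any interval with non-zero swap-in (si) or swap-out (so).
# NR > 2 skips the two header lines; note the first data line is an
# average since boot rather than a live sample.
vmstat 1 5 | awk 'NR > 2 && ($7 + $8 > 0) { print "swap activity:", $0 }'
```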
Ok, Genius, I know what it all means, now how do I use it?
From the table above, you can use vmstat to identify the following conditions as they occur. Here’s what to look for:
| Condition | How to spot it | What to do about it |
|---|---|---|
| App is slow because the server is swapping (paging). | Look for non-zero swap/si and swap/so values, and generally for increasing or high memory/swpd (and potentially low memory/free, though that’s not a great indicator, as noted above). | Limit the amount of memory the app uses (for example, on a web server, limit the number of processes available to serve requests) until the condition abates, or add more memory to the server. |
| App is slow because the server is doing lots of I/O. | Look for high io/bi, io/bo, and cpu/wa values. These show when lots of block I/O is going on, and when processes can’t do work because they’re waiting for I/O to happen. This is a very frequent cause of slowdowns on database servers, and may indicate that you need to optimize indexes on your tables. | See if you can reduce the overall amount of I/O. For a database server, you may need to create an index on some table(s). For other applications, you may want to see if the data you’re processing can be compressed, which moves load onto the CPU and off the I/O device. |
| App is slow because of interrupts or context switches. | Look for high system/in and system/cs values; here, “high” means tens to hundreds of thousands of events per second, or more. | This is a tricky one to solve, and may require new device drivers, kernel parameter tweaks, or completely rearchitecting your application. For a network application, you may want to see if a load balancer could spread the load across multiple servers. |
| App is slow because my hypervisor (or my public cloud server) is overloaded. | Look for high cpu/st values. | Complain to your public cloud provider, move to a different cloud server (maybe a different region or data center), or move to a different cloud provider. |
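If the first condition, constant swapping, is your problem, one kernel knob worth knowing about is swappiness: a lower value biases the kernel toward dropping cache rather than swapping out application pages. A minimal sketch:

```shell
# Show the current swappiness (typically 0-100, default usually 60).
cat /proc/sys/vm/swappiness
# To lower it (requires root); a tuning aid, not a cure for genuinely
# running out of memory:
#   sudo sysctl vm.swappiness=10
```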
Addendum: How much swap space do I have?
It’s easy in Linux to see how much swap space is configured (and easy to add more, actually):
```
[devbox opt]$ free
             total       used       free     shared    buffers     cached
Mem:       1020532     924176      96356          0     161868     327668
-/+ buffers/cache:     434640     585892
Swap:      2064376       4292    2060084
```
This says that this server has 2064376 KB (2GB) of swap space configured (under Swap: total).
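If you need more, the usual sequence for adding a swap file looks like the sketch below. The path and size are illustrative, and the steps require root, so they’re shown commented out; only the final check actually runs.

```shell
# Create and enable a 1 GB swap file (illustrative; requires root):
#   sudo fallocate -l 1G /swapfile
#   sudo chmod 600 /swapfile      # swap must not be world-readable
#   sudo mkswap /swapfile
#   sudo swapon /swapfile
# Then confirm the new total:
free | awk '/^Swap:/ { print "swap total (KB):", $2 }'
```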