
CPU load becomes more relevant with virtualisation

December 22, 2011

On Linux, the load average you can see with tools like top (three decimal values covering the last 1, 5 and 15 minutes) can generally be thought of as a queue indicator. If you have a single CPU[1], a load below 1 means nothing is having to wait for CPU time, while a load above 1 means processes are having to wait. This manifests itself as slower overall response times, or even requests backing up and timing out as the queue gets longer. (Linux also counts processes blocked in uninterruptible sleep, usually waiting on disk I/O, so a high load is not always purely a CPU queue.)
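
As a rough illustration of reading load per CPU, the Python sketch below divides each load average by the CPU count. It assumes a Unix-like system where os.getloadavg() is available. The helper name load_per_cpu and the 1.0 cut-off are illustrative only, not a standard tool.

```python
import os


def load_per_cpu(load: float, cpus: int) -> float:
    """Divide a load average by the CPU count (hypothetical helper).

    Roughly: <= 1.0 means runnable tasks are not queuing; > 1.0 means some wait.
    """
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return load / cpus


if __name__ == "__main__":
    cpus = os.cpu_count() or 1  # cpu_count() can return None
    # getloadavg() returns the 1, 5 and 15 minute averages that top displays.
    for label, load in zip(("1m", "5m", "15m"), os.getloadavg()):
        ratio = load_per_cpu(load, cpus)
        state = "queuing" if ratio > 1.0 else "no queue"
        print(f"{label}: load {load:.2f} on {cpus} CPU(s) = {ratio:.2f} per CPU ({state})")
```

Inside a VM, os.cpu_count() reports the virtual CPUs the guest can see, which is the number the load should be compared against.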

When we were using physical hardware, CPU load was generally ignored. This was probably a combination of fairly low load on the servers in those days and the high-spec, multi-socket, multi-core CPUs we had installed. On physical hardware, those CPUs were 100% dedicated to serving our requests.

The same principle still applies with virtualisation, but another factor comes into play: the host workload. As a VM, you rely on the host virtualisation layer to share out the physical CPU resources amongst all the guests, including yourself. If you don’t have control over the host, as with a VPS or a cloud provider like Amazon EC2, you may be affected by the other guests’ usage in unpredictable ways.

This is nothing new, and is one of the caveats of public virtualised environments, but it means CPU load becomes relevant again. You might see low % utilisation but high CPU load because your requests for CPU time are being queued up.

Linux also has a metric called CPU steal (the st value in top’s CPU summary line). It indicates the percentage of time your virtual CPU was ready to run but had to wait because the hypervisor was servicing other guests. It’s generally associated with the Xen hypervisor (which Amazon EC2 uses) but is also valid on VMware platforms, where the equivalent metric is CPU ready time. You can therefore see low usage (e.g. user %) alongside a high steal %, which can result in a high CPU load value.
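
top derives its percentages from cumulative counters in /proc/stat, so a minimal way to measure steal yourself is to sample the aggregate cpu line twice and compare the deltas. This sketch assumes a Linux guest and the column order documented in proc(5). parse_cpu_line and cpu_percentages are hypothetical helper names. The guest and guest_nice columns are ignored because the kernel already includes them in user and nice.

```python
import os
import time

# Column order of the aggregate "cpu" line in /proc/stat (see proc(5)).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line: str) -> dict[str, int]:
    """Parse the first eight counters (in clock ticks) of the aggregate 'cpu' line."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Older kernels may report fewer columns; treat missing ones as zero.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def cpu_percentages(before: dict[str, int], after: dict[str, int]) -> dict[str, float]:
    """Percentage of elapsed CPU time spent in each state between two samples."""
    deltas = {k: after[k] - before[k] for k in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return {k: 0.0 for k in FIELDS}
    return {k: 100.0 * v / total for k, v in deltas.items()}


def read_cpu_line(path: str = "/proc/stat") -> str:
    with open(path) as f:
        return f.readline()  # first line aggregates all CPUs


if __name__ == "__main__":
    if os.path.exists("/proc/stat"):
        first = parse_cpu_line(read_cpu_line())
        time.sleep(1)  # sample over a one-second interval
        second = parse_cpu_line(read_cpu_line())
        pct = cpu_percentages(first, second)
        print(f"user {pct['user']:.1f}%  steal {pct['steal']:.1f}%")
    else:
        print("/proc/stat not found: this sketch only works on Linux")
```

A low user figure next to a high steal figure here corresponds to the situation described above, where your own processes are not busy but the hypervisor is withholding CPU time.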

top output showing CPU steal %

The combination of these two metrics lets you see whether VM performance problems are related to your host, which may require you to upgrade your instance type or move to dedicated hardware.
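
One way to combine load and steal is a simple triage rule. The sketch below is hypothetical: the function name diagnose and the 10% steal and 50% user thresholds are assumptions chosen for illustration, not established limits, and would need tuning for a real workload.

```python
def diagnose(load_per_cpu: float, user_pct: float, steal_pct: float,
             steal_threshold: float = 10.0, busy_threshold: float = 50.0) -> str:
    """Hypothetical triage of a slow VM.

    load_per_cpu is the load average divided by the number of CPUs;
    user_pct and steal_pct are percentages such as top's us and st values.
    The default thresholds are illustrative assumptions.
    """
    if load_per_cpu <= 1.0:
        # Runnable tasks are not queuing for CPU time.
        return "no CPU queue"
    if steal_pct >= steal_threshold:
        # The hypervisor is giving CPU time to other guests: suspect the host.
        return "host contention"
    if user_pct >= busy_threshold:
        # The VM's own processes are consuming the CPU time.
        return "own workload"
    # Neither explains the queue; Linux load also counts tasks blocked on I/O.
    return "unclear"


if __name__ == "__main__":
    print(diagnose(load_per_cpu=2.5, user_pct=12.0, steal_pct=40.0))  # host contention
```

Checking steal before user time reflects the point above: a high steal % means part of the queue is outside your control, whatever your own usage looks like.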

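[1] As a very simple explanation, the threshold scales with the number of CPUs: on a machine with N CPUs, a load of N means every CPU is busy, and only a load above N means processes are queuing. See the Wikipedia article on load (computing) for more details.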
