Automatic timezone conversion with JavaScript
Timezones are a pain to deal with as a programmer, but our server monitoring service, Server Density, has handled them for some time. It requires you to select your timezone in your profile, and we then adjust the times appropriately. That still requires an action on the user’s part, so over the last few weeks we’ve been running a test on the dashboard that uses JavaScript to automatically convert the timestamps we store in our database in GMT/UTC to the local time of the system the browser is running on.
This means that if you change the timezone of your system and reload the dashboard (you may need to restart your browser), the timezone will automatically be detected and the right date/time displayed for you. This is particularly useful if you travel a lot, or have different computers in different timezones that you move between. It also removes an unnecessary setting.
The code behind the conversion
The conversion has to be done from a UTC/GMT timestamp, so if you view source on the dashboard you’ll see that we output a full RFC 2822 formatted date. This includes the timezone and can be anything that can be parsed by the JavaScript Date constructor. It is contained inside an HTML span tag so we can extract it in the JS later:
<span class="convertTimestamp">Fri, 03 Sep 2010 09:36:45 +0000</span>
This is then picked up on page load by a jQuery $(document).ready() call to our conversion method. Our code includes some formatting fluff to make the output useful, but the raw code that does the conversion is extremely simple.
You first read the timestamp into a Date object:
var utcDate = new Date("Fri, 03 Sep 2010 09:36:45 +0000");
And then you just call the output methods to display the parts you want. E.g.
document.write(utcDate.getDate());
would output 3 because we passed in the date 3rd Sept. But depending on your timezone this would be converted to your local date when it’s output. And that’s it! We’ve tested in the latest versions of Chrome, Firefox, Safari and Mobile Safari on the iPad and iPhone, and they all convert to the local timezone.
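To make the difference between the stored UTC value and the local output concrete, here is a minimal sketch. The describeTimestamp helper is a hypothetical name used only for illustration and is not part of our dashboard code; it relies only on the standard Date getters.

```javascript
// Sketch: compare the UTC getters with the local getters for one timestamp.
// describeTimestamp is a hypothetical helper, not part of the dashboard code.
function describeTimestamp(rfc2822)
{
    var date = new Date(rfc2822);

    return {
        // The getUTC* methods always report the stored UTC values
        utcDay: date.getUTCDate(),
        utcHours: date.getUTCHours(),
        // The plain getters report the same instant in the system's local timezone
        localDay: date.getDate(),
        localHours: date.getHours(),
        // Minutes to add to local time to get UTC (negative east of Greenwich)
        offsetMinutes: date.getTimezoneOffset()
    };
}

var parts = describeTimestamp("Fri, 03 Sep 2010 09:36:45 +0000");
console.log(parts.utcDay, parts.utcHours);     // 3 9, whatever the system timezone
console.log(parts.localDay, parts.localHours); // depends on the system timezone
```

The UTC values never change, while the local values follow whatever timezone the system is set to, which is exactly what the dashboard relies on.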
Our full conversion code
The Date methods aren’t 100% useful for outputting a nicely formatted date, so we do a small amount of formatting ourselves. This code is contained within a JS file and gets included on pages we want to do the conversion on. It looks for timestamps inside a span with the class convertTimestamp and it requires jQuery.
We’re releasing this under the FreeBSD license like our monitoring agents, so you can do what you like with it.
var DateFormatting = {
    init: function()
    {
        var months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'];

        // Add a leading zero to single-digit hours, minutes and seconds
        var pad = function(value)
        {
            return (value < 10 ? '0' : '') + value;
        };

        // Loop over each timestamp and write in the date formatted using the local timezone
        $('span.convertTimestamp').each(function()
        {
            // Parse the UTC timestamp; the getters below return local time
            var utcDate = new Date($(this).html());

            $(this).html(utcDate.getDate() + ' ' + months[utcDate.getMonth()] + ' ' + utcDate.getFullYear() + ' ' + pad(utcDate.getHours()) + ':' + pad(utcDate.getMinutes()) + ':' + pad(utcDate.getSeconds()));
        });
    }
};

$(document).ready(DateFormatting.init);
6 Simple Selling Tips For Software Entrepreneurs
Last week I wrote a guest post for the popular startup blog, OnStartups.com, covering some of the sales activities we do to increase conversions and get trial users to upgrade:
“Sales” is sometimes perceived with an aura of mystery. As a developer, it’s something I always assumed was reserved for a “real salesperson” and even though I was selling our products, I thought it was because the customer had come to us directly. Whilst there is an aspect of “sales” that involves cold calling and making visits to prospects, I actually recently realised that many of the things I’m doing on a daily basis are all part of what might be called “sales”.
What sales activities consist of will differ depending on what you’re selling, who you’re selling to (consumers, smaller businesses or enterprise) and how large your organisation is, but there are a few things that can be used universally and to great effect. Here are a few that I use with my own SaaS product.
FreeBSD monitoring + i/o statistics

We have just updated the Server Density monitoring agent which now includes full FreeBSD support and new i/o metrics. Just update your agent and the i/o metrics will start to be graphed within 5 minutes – you don’t need to do anything. And you can of course add them to your dashboard.
If you used our OS packages the agent will update automatically when your package manager next runs its update check. If you have manually installed the agent then you’ll need to run the python agent.py update command.
Instructions for installing the agent onto FreeBSD can be found in the normal agent instructions. These appear as the next page after you add a server and can be accessed from the “Agent” link on the Servers tab.
UPS delivers at 11.22MB/s
We recently had one of our MongoDB database slaves go out of sync which requires a full resync from scratch. Because we have our slaves in a different data centre, it takes quite a lot of time for the data to transfer, indexes to be built and the slave to get back up to date again; 6 days in fact.
Due to the time it took, our oplog on the master was too small – it has to hold all activity on the master from when the sync starts to when it finishes so that when the initial sync is completed, the slave can run through all the operations and get up to date again. We could have increased the size but it was already at 75GB and would require double that size. This is sufficient for the slave to be taken offline even for several days (for maintenance, for example) and then be able to catch back up again, but is not sufficient for a full resync.
150GB is too large, so we decided to have our hosting provider, Rackspace, plug a USB drive into the master, run a slave off the disk so it would be copying the data locally and then physically ship the disk to the secondary data centre. This was all set up and the USB drive was unplugged at 16:09 UTC on 12th Aug 2010 after spending a few days syncing the slave. Rackspace passed the drive to UPS who then took the drive from our primary data centre in Virginia to the secondary data centre in Texas.
The USB drive was plugged back into our server on 13th Aug 2010 at 18:07 UTC with a total elapsed time of 1 day, 1 hour, 58 minutes, or 93,480 seconds. After we’d copied all the data to the internal disks, the USB drive was shipped to me in the UK.
Using the inter-data-centre transfer time, we can figure out the data transfer rate:
500GB / 93480 = 0.005348737697903 GB/s or
5.477107402652672 MB/s (5.48MB/s rounded)
If I use scp to copy a large file across between the two servers, then I get a slightly faster speed:
david@asriel ~: scp david@internal.stelmaria.boxedice.net:~/local/local.11 .
local.11 100% 2047MB 6.8MB/s 05:01
However, whilst these figures are correct for the amount of data we transferred, the USB drive was actually 1TB and so the real data transfer rate is different:
1024GB / 93480 = 0.010954214805306GB/s or
11.217115960633343 MB/s (11.22MB/s rounded)
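The arithmetic above can be reproduced with a few lines of JavaScript. This is only a sketch: the function and variable names are made up for illustration, and it uses the same 1GB = 1024MB conversion as the figures in this post.

```javascript
// Times taken from the post: unplugged 12 Aug 2010 16:09 UTC, plugged back in 13 Aug 2010 18:07 UTC.
// Months are zero-based in JavaScript, so 7 is August.
var unplugged = Date.UTC(2010, 7, 12, 16, 9, 0);
var pluggedIn = Date.UTC(2010, 7, 13, 18, 7, 0);
var elapsedSeconds = (pluggedIn - unplugged) / 1000;

// Convert a size in GB and a duration in seconds into MB/s (1GB = 1024MB)
function rateInMBps(sizeGB, seconds)
{
    return (sizeGB * 1024) / seconds;
}

console.log(elapsedSeconds);                               // 93480
console.log(rateInMBps(500, elapsedSeconds).toFixed(2));   // 5.48 (data actually transferred)
console.log(rateInMBps(1024, elapsedSeconds).toFixed(2));  // 11.22 (full capacity of the 1TB drive)
```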
It’s interesting to see these real world figures, although not entirely surprising. Pingdom wrote a short post about how FedEx is still faster than the internet back in 2007, particularly for large data sets. High speed connectivity is expensive and of course there’s always the speed of light limit. Hopefully that’ll be solved by quantum networks sometime in the future!
The exciting adventures of an alert notification
A year ago, the alert and notification systems for our server monitoring service, Server Density, were very simple. They were based on batch cron jobs which processed all the items in a database table every minute.
Since then, we have grown significantly and this would no longer work. We now have a very robust alert and notification backend which can easily be scaled just by adding new servers. It’s quite interesting from a technical standpoint, so this is the exciting story of the adventure an alert notification takes through our systems to your inbox.
1: Agent sends a postback
Our monitoring agent reports back every 60 seconds. The stats payload is sent over HTTP (or HTTPS) as a JSON object and is immediately inserted into the database to display the latest data on the dashboard, through the monitoring API and on our graphs. The data is also stored in a postbacks capped collection inside MongoDB. A separate process transfers these JSON payloads from the postbacks collection into our RabbitMQ alertdetection queue. The web server does not queue directly to RabbitMQ because the various PHP AMQP libraries we tried caused too much load on the web server.
2: Is there an alert condition?
We have multiple RabbitMQ consumers listening to the queue waiting for new items. One of these sees there’s a new alertdetection item and pulls it down. The message pulled from the queue contains the same raw JSON payload. The data is then parsed and compared to all configured alerts to see if there is an alert condition match. In this case, load is a bit too high and so triggers an alert.
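As a rough illustration of this comparison step, here is a minimal sketch in JavaScript. It is not our consumer code: the payload field names (loadAvrg, diskFreeGB), the shape of the alert definitions and the findTriggeredAlerts function are all assumptions made up for the example.

```javascript
// Hypothetical alert definitions; the field names are illustrative only
var alerts = [
    { id: 'high-load', metric: 'loadAvrg', comparison: '>', threshold: 4 },
    { id: 'low-disk', metric: 'diskFreeGB', comparison: '<', threshold: 10 }
];

// Parse the raw JSON payload from the queue and return the ids of matching alerts
function findTriggeredAlerts(rawJson, alertConfigs)
{
    var payload = JSON.parse(rawJson);
    var triggered = [];

    for (var i = 0; i < alertConfigs.length; i++)
    {
        var alert = alertConfigs[i];
        var value = payload[alert.metric];

        // Skip metrics the agent did not report in this postback
        if (typeof value !== 'number')
        {
            continue;
        }

        var matched = (alert.comparison === '>') ? value > alert.threshold : value < alert.threshold;
        if (matched)
        {
            triggered.push(alert.id);
        }
    }

    return triggered;
}

// Load is a bit too high, disk space is fine
console.log(findTriggeredAlerts('{"loadAvrg": 5.2, "diskFreeGB": 120}', alerts)); // [ 'high-load' ]
```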
3: An alert is triggered
Alerts can have a delay so we need to check the configuration to see if we should alert right away. In this case, we do so. The alert is set to be sent via e-mail and iPhone push notification so 2 queue items are entered into the iphonealerts and emailalerts RabbitMQ queues.
4: Notifications are sent
Different consumers listening to the notification queues pick up the new queue items entered. The iPhone alert payload is built and sent to the Apple Push Notification service whilst the e-mail message is also constructed by a separate process, and then the Postmark API is called. The e-mail data is sent to Postmark to be queued and delivered.
5: The problem hasn’t been fixed
The alert is configured to alert every 5 minutes until the alert condition disappears. Every time the stats postback comes in, we run the comparisons and check already triggered alerts to see if there’s anything we need to do. 5 minutes later we see that the alert is still open and new notifications are triggered.
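The delay and repeat checks from steps 3 and 5 could be sketched like this. Again, this is an illustration under assumptions rather than our real implementation: shouldNotify and the fields on the alert object (triggeredMs, delaySeconds, repeatSeconds, lastNotifiedMs) are hypothetical names.

```javascript
// Decide whether an open alert needs a notification at the given time (milliseconds)
function shouldNotify(openAlert, nowMs)
{
    if (openAlert.lastNotifiedMs === null)
    {
        // Nothing sent yet: wait for any configured delay to pass first
        return nowMs - openAlert.triggeredMs >= openAlert.delaySeconds * 1000;
    }

    // Already notified: repeat once the configured interval has elapsed
    return nowMs - openAlert.lastNotifiedMs >= openAlert.repeatSeconds * 1000;
}

// An alert with no delay that repeats every 5 minutes
var openAlert = { triggeredMs: 0, delaySeconds: 0, repeatSeconds: 300, lastNotifiedMs: null };

console.log(shouldNotify(openAlert, 0));      // true, alert right away
openAlert.lastNotifiedMs = 0;
console.log(shouldNotify(openAlert, 60000));  // false, only 1 minute has passed
console.log(shouldNotify(openAlert, 300000)); // true, 5 minutes later
```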
6: All is well
Shortly afterwards, a postback comes in with the alert condition fixed – load is back down again. We mark the alert fixed and send notifications to tell the user that all is well again.
But what if we stop receiving data?
If your server stops reporting back then there’s no payload to trigger the alert process. As such, we run a separate set of consumers which constantly check to see if we have data from your server and if we’ve stopped receiving postbacks, we’ll trigger the no data alerts after the time period defined in the alert configuration.
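A no data check of this kind boils down to comparing the time of the last postback against the configured period. The sketch below assumes a simple map of server id to last postback time; findSilentServers and the data layout are hypothetical and only illustrate the idea.

```javascript
// Return the ids of servers whose last postback is older than the allowed period
function findSilentServers(lastPostbackMs, nowMs, noDataSeconds)
{
    var silent = [];

    for (var serverId in lastPostbackMs)
    {
        if (lastPostbackMs.hasOwnProperty(serverId) && nowMs - lastPostbackMs[serverId] > noDataSeconds * 1000)
        {
            silent.push(serverId);
        }
    }

    return silent;
}

// web1 reported 30 seconds ago, db1 has been quiet for 10 minutes
var lastSeen = { web1: 570000, db1: 0 };
console.log(findSilentServers(lastSeen, 600000, 300)); // [ 'db1' ]
```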
It takes seconds
From postback payload coming in to notifications being delivered only takes seconds because we can easily scale out the number of consumers running and processing queue items. Every action is logged and these are exposed in the alert log within the Server Density UI so you can see the times between events.
We are always working on improving this and one of the items on our roadmap is to combine the 2nd step so that the alert triggering bypasses the database and can get inserted into the queue immediately. Unfortunately the various PHP AMQP libraries available aren’t robust enough (connection pooling is the main thing missing here) to handle that many inserts so we’re investigating other queuing systems and methods of handling the high number of inserts.
Automating partitioning, sharding and failover with MongoDB
Two of the most eagerly anticipated features for MongoDB, the database backend we use for our server monitoring service, Server Density, are auto sharding and replica sets. Sharding will allow us to let MongoDB handle distribution of data across any number of nodes to maximise use of disk space and dynamically load balance queries. Replica sets provide automated failover and redundancy so you can be sure your data exists on any number of servers across multiple data centres.
This functionality has been in development for some time and it is finally entering stable in the upcoming v1.6.0 release, due out in a few days. This post will take you through the basics of setting up a MongoDB cluster using auto sharding and ensuring you have full failover using replica sets.
Starting up the replica set
You can have any number of members in a replica set and your data will exist in full on each member of the set. This allows you to have servers distributed across data centres and geographies to ensure full redundancy. One server is always the primary to which reads and writes are sent, with the other members being secondary and accepting reads only. In the event of the primary failing, another member will take over automatically.
The video embedded below shows the setup process.
You need a minimum of 2 members in each set and they must both be started before the set becomes available. We will start the first one on server1A now:
./mongod --rest --shardsvr --replSet set1/server1B
- --rest – This enables the admin web UI which is useful for viewing the status of the set. It is publicly accessible via HTTP on port 28017 so ensure it is properly firewalled.
- --shardsvr – This enables sharding on this instance, which will be configured later.
- --replSet – This uses the form setname/serverList. You must give each set a name (“set1”) and specify at least 1 other member for the set. You do not need to specify them all – if the instance gets terminated, it will re-read the config from the specified servers when it comes back up.
When in production you will want to use the --fork and --logpath parameters so that mongod spawns off into a separate process and continues running when you close your console. They’re not used here so we can see the console output. Further tips about running MongoDB in the real world can be found here.
The naming convention we are using for the server hostname is server[set][server], so this is server A in set 1 that will be connecting to server B in set 1. This just makes it a little easier to explain, but in real usage these will need to be actual hostnames that resolve correctly.
If you are running the instances on different ports, you must specify the ports as part of the parameters e.g. --replSet set1/server1B:1234,server1C:1234
You will see mongod start up with the following output to the console:
Sun Aug 1 04:27:15 [initandlisten] waiting for connections on port 27017
Sun Aug 1 04:27:15 [initandlisten] ******
Sun Aug 1 04:27:15 [initandlisten] creating replication oplog of size: 944MB... (use --oplogSize to change)
Sun Aug 1 04:27:15 allocating new datafile data/local.ns, filling with zeroes...
Sun Aug 1 04:27:15 [startReplSets] replSet can't get local.system.replset config from self or any seed (yet)
Sun Aug 1 04:27:15 done allocating datafile data/local.ns, size: 16MB, took 0.036 secs
Sun Aug 1 04:27:15 allocating new datafile data/local.0, filling with zeroes...
Sun Aug 1 04:27:15 done allocating datafile data/local.0, size: 64MB, took 0.163 secs
Sun Aug 1 04:27:15 allocating new datafile data/local.1, filling with zeroes...
Sun Aug 1 04:27:16 done allocating datafile data/local.1, size: 128MB, took 0.377 secs
Sun Aug 1 04:27:16 allocating new datafile data/local.2, filling with zeroes...
Sun Aug 1 04:27:19 done allocating datafile data/local.2, size: 1024MB, took 3.019 secs
Sun Aug 1 04:27:25 [startReplSets] replSet can't get local.system.replset config from self or any seed (yet)
Sun Aug 1 04:27:35 [startReplSets] replSet can't get local.system.replset config from self or any seed (yet)
Sun Aug 1 04:27:43 [initandlisten] ******
Sun Aug 1 04:27:43 [websvr] web admin interface listening on port 28017
Sun Aug 1 04:27:45 [initandlisten] connection accepted from 127.0.0.1:43135 #1
Sun Aug 1 04:27:45 [startReplSets] replSet can't get local.system.replset config from self or any seed (yet)
Next, we start the second member of the set (server1B). This will be connecting to server1A, the instance we just set up.
./mongod --rest --shardsvr --replSet set1/server1A
You will see similar console output on both the servers, something like:
Sun Aug 1 04:27:53 [websvr] web admin interface listening on port 28017
Sun Aug 1 04:27:55 [initandlisten] connection accepted from server1A:38289 #1
Sun Aug 1 04:27:56 [initandlisten] connection accepted from 127.0.0.1:48610 #2
Sun Aug 1 04:27:56 [startReplSets] replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)
Sun Aug 1 04:27:56 [startReplSets] replSet have you ran replSetInitiate yet?
Initialising the replica set
Now the two instances are communicating, you need to initialise the replica set. This only needs to be done on one of the servers (either, it doesn’t matter) so from the MongoDB console on that server:
./mongo localhost:27017
MongoDB shell version: 1.5.7
connecting to: localhost:27017/test
> cfg = {
... _id : "set1",
... members : [
... { _id : 0, host : "server1A:27017"},
... { _id : 1, host : "server1B:27017"}
... ] }
> rs.initiate(cfg)
{
"info" : "Config now saved locally. Should come online in about a minute.",
"ok" : 1
}
Here I created the config object and specified the members manually. This is because I wanted to specify the ports but you can just execute
rs.initiate()
and the current server, plus any members you specified on the command line parameters when starting mongod will be added automatically. You may also want to specify some extra options, all of which are documented here.
Perhaps the most important of these extra options is priority. Setting this for each host will allow you to determine in which order they become primary during failover. This is useful if you have 3 servers, 2 in the same data centre and 1 outside for disaster recovery. You might want the second server in the same DC to become primary first; setting its priority higher than the outside server allows this.
> cfg = {
_id : "set1",
members : [
{ _id : 0, host : "server1A:27017", priority : 2},
{ _id : 1, host : "server1B:27017", priority : 1},
{ _id : 2, host : "server1C:27017", priority : 0.5}
] }
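To visualise what these priorities are meant to express, the sketch below sorts the members of a config object like the one above by priority. The failoverOrder function is a hypothetical helper for illustration only; it is not a MongoDB command, and a real election also takes into account things like which member has the most recent data.

```javascript
// The same shape of config object as used with rs.initiate()
var cfg = {
    _id : "set1",
    members : [
        { _id : 0, host : "server1A:27017", priority : 2 },
        { _id : 1, host : "server1B:27017", priority : 1 },
        { _id : 2, host : "server1C:27017", priority : 0.5 }
    ] };

// Hypothetical helper: list hosts from highest to lowest priority
function failoverOrder(config)
{
    return config.members
        .slice() // copy so the original config is not reordered
        .sort(function(a, b) { return b.priority - a.priority; })
        .map(function(member) { return member.host; });
}

console.log(failoverOrder(cfg)); // [ 'server1A:27017', 'server1B:27017', 'server1C:27017' ]
```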
The web console enabled by the --rest parameter can be accessed at the standard port 28017 e.g. http://example.com:28017. This shows the live status of the mongod instance.
Adding another server to the set
Adding a new server (server1C) to the replica set is really easy. Start the instance up specifying any one of the other members (or all of them) as part of the parameters:
./mongod --rest --shardsvr --replSet set1/server1A
then on the primary server, connect to the MongoDB console and execute the add command:
./mongo localhost:27017
MongoDB shell version: 1.5.7
connecting to: localhost:27017/test
> rs.add("server1C:27017")
{ "ok" : 1 }
This server will then become part of the set and will immediately start syncing with the other members.
Setting up sharding
Now we have our 3 member replica set, we can configure sharding. This has 3 parts:
- Shard servers – the mongod instances. We have already set these up with the --shardsvr parameter when starting each mongod.
- Config servers – these are mongod instances run with a --configsvr parameter that store the meta data for the shard. As per the documentation, “a production shard cluster will have three config server processes, each existing on a separate machine. Writes to config servers use a two-phase commit to ensure an atomic and replicated transaction of the shard cluster’s metadata.”
- mongos – processes that your clients connect to which route queries to the appropriate shards. They are self contained and will usually be run on each of your application servers.

The video embedded below shows the setup process.
Config servers
Having already set up the shard servers above, the next step is to set up the config servers. We need 3 of these, which will exist on each of our shard servers but can be on their own, lower spec machines if you wish. They will not require high spec servers as they will have relatively low load, but should be positioned redundantly so a machine or data centre failure will not take them all offline.
./mongod --configsvr --dbpath config/
- --configsvr – This enables config server mode on this mongod instance.
- --dbpath – Since I am running this config server on a server that already has another mongod instance running, a separate data path is specified. This isn’t necessary if the config server is running on its own.
This is executed on each of our 3 servers already running the shards. The console output will be something like this:
Sun Aug 1 08:14:30 db version v1.5.7, pdfile version 4.5
Sun Aug 1 08:14:30 git version: 5b667e49b1c88f201cdd3912b3d1d1c1098a25b4
Sun Aug 1 08:14:30 sys info: Linux domU-12-31-39-06-79-A1 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_41
Sun Aug 1 08:14:30 [initandlisten] diagLogging = 1
Sun Aug 1 08:14:30 [initandlisten] waiting for connections on port 27019
Sun Aug 1 08:14:30 [websvr] web admin interface listening on port 28019
Nothing else happens until you connect the mongos to the config server.
Router processes
The router processes are what you connect your clients to. They download the meta data from the config servers and then route queries to the correct shard servers. They stay up to date and are independent of each other so require no redundancy per se. Specify each config server in comma separated form:
./mongos --configdb server1A,server1B,server1C
Sun Aug 1 08:19:44 mongodb-linux-x86_64-1.5.7/bin/mongos db version v1.5.7, pdfile version 4.5 starting (--help for usage)
Sun Aug 1 08:19:44 git version: 5b667e49b1c88f201cdd3912b3d1d1c1098a25b4
Sun Aug 1 08:19:44 sys info: Linux domU-12-31-39-06-79-A1 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_41
Sun Aug 1 08:19:44 SyncClusterConnection connecting to [server1A:27019]
Sun Aug 1 08:19:44 SyncClusterConnection connecting to [server1B:27019]
Sun Aug 1 08:19:44 SyncClusterConnection connecting to [server1C:27019]
Sun Aug 1 08:19:54 [websvr] web admin interface listening on port 28017
Sun Aug 1 08:19:54 [Balancer] SyncClusterConnection connecting to [server1A:27019]
Sun Aug 1 08:19:54 SyncClusterConnection connecting to [server1A:27019]
Sun Aug 1 08:19:54 waiting for connections on port 27017
Sun Aug 1 08:19:54 [Balancer] SyncClusterConnection connecting to [server1B:27019]
Sun Aug 1 08:19:54 SyncClusterConnection connecting to [server1B:27019]
Sun Aug 1 08:19:54 [Balancer] SyncClusterConnection connecting to [server1C:27019]
Sun Aug 1 08:19:54 SyncClusterConnection connecting to [server1C:27019]
Create the shard
Now you have all the mongo server instances running, you need to create the shard. Connect to the mongos instance using the MongoDB console, switch to the admin database and then add the shard.
./mongo
MongoDB shell version: 1.5.7
connecting to: test
> use admin
switched to db admin
> db.runCommand( { addshard : "set1/server1A,server1B,server1C", name : "shard1" } );
{ "shardAdded" : "shard1", "ok" : 1 }
> db.runCommand( { listshards : 1 } );
{
"shards" : [
{
"_id" : "shard1",
"host" : "server1A,server1B,server1C"
}
],
"ok" : 1
}
Note that the list of server hostnames includes the replica set name in the form [setName]/[servers]. The set name is what you called the replica set with the --replSet parameter when you started the mongod instances above. In our case we called it set1.
There are a number of config options here including the ability to set a maximum size on the shard so you can control disk space usage.
Shard the database
We can now finally use the shard by enabling sharding on a database and executing a couple of test queries. Here we will use the test database using the MongoDB console connected to the mongos instance:
> use admin
switched to db admin
> db.runCommand( { enablesharding : "test" } );
{ "ok" : 1 }
> use test
switched to db test
> db.hats.insert({hats: 5})
> db.hats.find()
{ "_id" : ObjectId("4c5568021fd8e7e6a0636729"), "hats" : 5 }
You can confirm this is working from the console output on each of the shards themselves:
Sun Aug 1 08:26:42 [conn6] CMD fsync: sync:1 lock:0
Sun Aug 1 08:26:42 [initandlisten] connection accepted from 10.255.62.79:38953 #7
Sun Aug 1 08:26:42 allocating new datafile data/test.ns, filling with zeroes...
Sun Aug 1 08:26:42 done allocating datafile data/test.ns, size: 16MB, took 0.046 secs
Sun Aug 1 08:26:42 allocating new datafile data/test.0, filling with zeroes...
Sun Aug 1 08:26:42 done allocating datafile data/test.0, size: 64MB, took 0.183 secs
Sun Aug 1 08:26:42 [conn6] building new index on { _id: 1 } for test.hats
Sun Aug 1 08:26:42 [conn6] Buildindex test.hats idxNo:0 { name: "_id_", ns: "test.hats", key: { _id: 1 } }
Sun Aug 1 08:26:42 [conn6] done for 0 records 0secs
Sun Aug 1 08:26:42 [conn6] insert test.hats 248ms
Sun Aug 1 08:26:42 [conn6] fsync from getlasterror
Sun Aug 1 08:26:42 allocating new datafile data/test.1, filling with zeroes...
Sun Aug 1 08:26:42 done allocating datafile data/test.1, size: 128MB, took 0.402 secs
Sharding the collection
The database test is sharded now but the documents will only exist on a single shard. To actually use the automated partitioning of data, you need to shard at the collection level. For example, setting the shard key on a timestamp will cause MongoDB to partition the data across shards based on that timestamp e.g. day 1 on shard 1, day 2 on shard 2, etc.
In our example above, we only have 1 shard and the test document is very simple but we could create a shard key on the number of hats:
> use admin
switched to db admin
> db.runCommand( { shardcollection : "test.hats.hats", key : { hats : 1 } } )
{ "collectionsharded" : "test.hats.hats", "ok" : 1 }
In the mongos console you will see
Mon Aug 2 22:10:55 [conn1] CMD: shardcollection: { shardcollection: "test.hats.hats", key: { hats: 1.0 } }
Mon Aug 2 22:10:55 [conn1] enable sharding on: test.hats.hats with shard key: { hats: 1.0 }
Mon Aug 2 22:10:55 [conn1] no chunks for:test.hats.hats so creating first: ns:test.hats.hats at: shard1:set1/server1A,server1B,server1C lastmod: 1|0 min: { hats: MinKey } max: { hats: MaxKey }
The default chunk size is 50MB so data will not start to be distributed to multiple shards until you hit that.
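Conceptually, each chunk covers a range of shard key values and belongs to one shard, so routing a query means finding the chunk whose range contains the key. The sketch below illustrates that idea only; the chunk list (including the split at 10 and the second shard), the shardForKey function and the use of -Infinity/Infinity in place of MinKey/MaxKey are assumptions for the example, not how mongos is implemented.

```javascript
// Hypothetical chunk list after one split; min is inclusive and max is exclusive
var chunks = [
    { min: -Infinity, max: 10, shard: 'shard1' },
    { min: 10, max: Infinity, shard: 'shard2' }
];

// Find the shard holding the chunk that covers a given shard key value
function shardForKey(chunkList, hats)
{
    for (var i = 0; i < chunkList.length; i++)
    {
        if (hats >= chunkList[i].min && hats < chunkList[i].max)
        {
            return chunkList[i].shard;
        }
    }

    return null;
}

console.log(shardForKey(chunks, 5));  // shard1
console.log(shardForKey(chunks, 10)); // shard2
```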
Notes on failover
- Terminating a mongod instance, either a config server or a shard server, will have no effect on the availability of the data and the ability to perform all operations on it. A detailed explanation of what happens when certain servers fail can be found here.
- However, in our case, if 2 out of 3 of the members of a replica set fail, the set will become read only even though there is a server remaining online. (source).
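- As such, a 3 member replica set with 2 members in one data centre and 1 member in another has a point of failure if you want the set to remain fully operational in the event of a DC outage – the 1 server on its own. To protect against this, a majority of the members needs to stay reachable when a data centre goes down; 2 members per data centre is not enough on its own because 2 out of 4 is still not a majority, so you would need a further member in a third location.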
- Multiple replica sets per shard are not supported (source)
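The read only behaviour described in these notes follows from needing a majority of the set to be reachable before a primary can exist. A minimal sketch of that rule, assuming a strict majority is required (hasMajority is a hypothetical helper, not a MongoDB function):

```javascript
// A primary can only exist while more than half of the members are reachable
function hasMajority(totalMembers, reachableMembers)
{
    return reachableMembers > totalMembers / 2;
}

console.log(hasMajority(3, 3)); // true, everything is up
console.log(hasMajority(3, 2)); // true, one member down and the set stays writable
console.log(hasMajority(3, 1)); // false, 2 out of 3 failed so the set is read only
```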
Future expansion
Now this is set up, adding additional shards is as easy as provisioning a new replica set and using the addShard command on that new set. The data will be balanced automatically so you have real horizontal scaling.
We have not yet deployed sharding and replica sets into production with Server Density – this is on our roadmap so I’ll be reporting back in a couple of months when we have been using it for some time. Subscribe to stay up to date!
Developing for monitoring Windows servers has been interesting. With Linux backgrounds, we’ve learnt a lot in the short time our Windows agent has been available for our server monitoring service, Server Density.
We have just released an update to the agent which introduces many improvements since the 1.0 release. 1.1.0 was released a week or so ago, and this is a further update to that.
The new agent uses a different system API to collect the statistics we use to make it more reliable and compatible across different Windows versions. API availability can differ across the Windows OSs we support so this provides a more consistent reporting mechanism.
CPU statistics
We have also modified the way we collect CPU statistics. On Linux, the load average is useful because it is averaged (hence the name!). On Windows, the CPU % shows the value at the time you query it. This means you might get a sudden spike or miss spikes that happen when the agent is not sampling. As such, we now take regular samples and average those over the sample period (60 seconds) to give a more useful picture of CPU load.
We’re also collecting minimum and maximum values. These are not yet exposed in our UI but are being stored for a future release so we can provide more useful troubleshooting.
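The averaging idea can be sketched in a few lines. This is written in JavaScript purely for illustration and is not the agent’s actual code; summariseCpuSamples and the sample values are made up.

```javascript
// Reduce the CPU % samples taken during one reporting period to average, minimum and maximum
function summariseCpuSamples(samples)
{
    var total = 0;
    var minimum = Infinity;
    var maximum = -Infinity;

    for (var i = 0; i < samples.length; i++)
    {
        total += samples[i];
        minimum = Math.min(minimum, samples[i]);
        maximum = Math.max(maximum, samples[i]);
    }

    return { average: total / samples.length, minimum: minimum, maximum: maximum };
}

// A short spike to 90% is smoothed out in the average but kept in the maximum
console.log(summariseCpuSamples([10, 90, 20, 40])); // { average: 40, minimum: 10, maximum: 90 }
```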
MongoDB & IIS
v1.1 also introduced MongoDB support for Windows deployments (docs), as well as improved IIS statistic collection.
Updating
As part of our security policy, the agent does not update itself so you will need to launch the config app either from the Start menu or from the Server Density tray icon. This will prompt you that there is an update and then run the updater.
Let us know if you have any problems.
We have just pushed out an update to our server monitoring service, Server Density. A few weeks ago we spent a lot of time going through every page in the app and this is the first set of UI changes as a result. We’re also including updates to our backend processing; read on for more details.
The dashboard was designed back in the days of just a few metrics and no plugins. It’s had a few iterations since then but to use it as a general status display for your server infrastructure, it really needed more flexibility.
Enter the new dashboard functionality, which allows you to reorder servers and choose which metrics to display. You can remove a server from view by editing it in the Servers tab, reorder the servers by dragging the handle in the top right corner and add metrics using the button at the bottom of each server.
All metrics are supported, including those with multiple options like disk usage (choosing the mount point) or RabbitMQ (choosing your queue name). Plugins are also supported. Your preferences are saved on a per user basis so everyone can build a dashboard to their liking.
Alert history
Our original UI was very simple – for a long time we’ve only shown the last 5 triggered alerts on the dashboard. Now, you can view a full history of each alert triggered over any date range you specify. Just click the History link next to an alert from the Alerts tab. This is the first part of more detailed information about your alerts which we have been storing for a long time, but are now able to expose properly through our UI. Note the link only shows if the alert has ever been triggered.
Notifications backend
Although there’s nothing to see and no shiny UI, the notifications back end is just as important because it’s what processes all your alerts and sends notifications off. We have pushed out improved processing daemons which are now able to handle more alerts, faster, so you will notice that notifications reach you much sooner after an alert is triggered and notification processed.
We are also now trialing a different e-mail backend – Postmark. We have been using their excellent service for some time for our transactional e-mails – payment notifications, signup e-mails, password resets – but we have now switched to using them for alert e-mails too. Since they specialise in e-mail delivery and allow us to use more complex things like bounce detection and domain keys, we can be sure that messages are getting delivered to you promptly. And when there is a problem it makes it much easier to troubleshoot.
If you open the headers of an alert e-mail from us you’ll see their magic at work! This removes the need for us to run our own mail servers, which are notorious for problems, and gives us a cool UI to track the e-mails we send. My only concern is the pricing per e-mail – it could add up to quite a large number at our volumes of mail, something I shall be keeping an eye on.
Other tweaks
We’ve made various other tweaks to the UI (including auto login!) that you’ll hopefully notice using the app, with more to come in the future!
Server Density architecture choices
Our server monitoring service, Server Density, is now over a year old. We’re continuing to release improvements but over the last 12 months, many of the decisions made at the start have come up in customer support requests – why isn’t this available? why does this work like that? This is always the same with any software product and the only way to change core elements is a complete rewrite, or major refactoring. Neither of these is necessary and a complete rewrite is almost always a bad idea. So what choices were made at the beginning?
Push only
The Server Density agents send data to our servers. We never communicate with your servers – everything is one way. The first reason for this is security – it means a breach on our servers will not affect your servers.
Secondly, it means we do not need to maintain a larger server infrastructure because we only have to handle incoming data. We do not need servers all over the world to do availability checks nor do we need to queue requests and ensure every monitored server gets contacted within a specified time period (e.g. checking every minute).
Thirdly, it means we do not require any open ports other than standard HTTP (port 80, or 443 if you choose to use HTTPS). Almost every server has outbound access to the internet so this avoids firewall and network configuration issues. We can utilise a protocol that is well defined and supported, without needing any custom programming.
No auto update
If you install our agent manually then there is no automatic update. You must manually execute a command. This is for security so that we can’t automatically install software on your servers, so a breach on our servers would not affect you.
However, if you use our OS packages (yum/apt) then we use the built in updater to automatically keep the agent up to date. This will be as part of your normal OS package management update process so you should be aware when this happens, and you can disable it or block certain packages. But most importantly, all installs/updates have to be signed with our key pair. We keep this secure so you can be sure that when an update is applied, it is really from us.
HTTP by default, not HTTPS
Our agent communicates with us over HTTP by default, which is not encrypted. Although we provide our web UI URL using HTTPS by default, the agent does not. This follows on from the above discussion about open ports. It makes it very easy to install the agent first time, and you can easily change to use HTTPS by adding a character to the URL in the config.
Core metrics not plugin based
The original version of the agent had no support for plugins and included all the metrics in the agent code out of the box. There were only a few metrics too, which made the agent extremely simple to set up and use. If we were writing the agent again we would probably make it modular but the original prototype of the agent was written quickly to get the product out as a beta.
Agent key based server identification
The agent key uniquely identifies each server by default. This allows you to change the IP and hostname of a server without affecting the monitoring. However, we also include an auto copy mechanism so that if the hostname does change, we assume it is a new server and add it to your account using agent key and hostname combined to uniquely identify it. No two servers should have the same hostname, and hostnames should never change in production.
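To make the auto copy behaviour a little more concrete, here is a minimal sketch of the lookup logic. It is only an illustration: the function name, the in-memory array and the field names are all hypothetical and are not taken from our actual code.

```javascript
// Hypothetical sketch: none of these names come from the real service.
// A report is matched on agent key + hostname; if the key is known but the
// hostname is new, we assume a copied server and register it separately.
function identifyServer(servers, agentKey, hostname) {
  // Exact match on key and hostname: a server we already know about.
  const known = servers.find(
    (s) => s.agentKey === agentKey && s.hostname === hostname
  );
  if (known) {
    return known;
  }
  // Unknown combination: add it as a new server on the account.
  const copy = { agentKey: agentKey, hostname: hostname };
  servers.push(copy);
  return copy;
}

const servers = [];
identifyServer(servers, 'abc123', 'web1');
identifyServer(servers, 'abc123', 'web1'); // same server reporting again
identifyServer(servers, 'abc123', 'web2'); // copied image with a new hostname
console.log(servers.length); // 2
```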
Interesting choices
Every choice is right at the beginning. Some might need to be adapted as customers use the product (like our agent key based identification), others would be changed if you wrote the code again (like our plugins system) and still others would never change. It’s interesting to be able to look back and see how things were different.
Map reduce and MongoDB
As a long time PHP developer and inactive member of the PHP documentation team, I am used to the excellent manual and clear examples for every aspect of the language. The lack of comparable documentation is always something I notice when using other languages; even if the documentation is good, it rarely compares to the PHP docs.
Of course, PHP has been around for a long time and expecting other, much newer projects to have the same level of coverage is unfair. So when I started to play around with the aggregation and map reduce functions within MongoDB, the database we use to power our hosted server monitoring tool, Server Density, it took some time and much trial and error to figure out how to use them. This was partly because I’d not used map reduce before, and partly because the documentation and examples seemed to assume a minimum level of experience.
But with a bit of searching and running queries myself, I was able to figure it out. This short post will provide a few examples of what I discovered, and point to a few useful resources elsewhere, with the hope of helping anyone else who wants to use map reduce and MongoDB.
The problem
We store a lot of numerical time series data for each server that reports into Server Density. I wanted to extract some statistics about the average values. For example, getting the average CPU load over a specific time period.
Group
This problem can be solved quite easily using a group query. This allows you to do almost exactly what you can do with a map reduce function – perform operations on the returned data. In my case, grouping every document in a collection and then performing an average calculation on the results would look like this:
db.sd_boxedice_checksLoadAvrg.group(
    {
        initial: {count: 0, running_average: 0},
        reduce: function(doc, out)
        {
            out.count++;
            out.running_average+=doc.v;
        },
        finalize: function(out)
        {
            out.average = out.running_average / out.count;
        }
    }
);
which generates output directly to the console:
[
    {
        "count" : 204856,
        "running_average" : 131204.59999999776,
        "average" : 0.6404723317842669
    }
]
You can also pass a cond parameter to the group() function so that it only processes documents matching that condition. This would allow you to specify a date range, for example.
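As a rough sketch of what that might look like, the helper below builds the argument for group() with a date range in cond. The helper name and the timestamp field name t are assumptions for illustration only; use whichever field your documents actually store their time in.

```javascript
// Builds the argument for db.<collection>.group(), restricted to a date range.
// ASSUMPTION: each document stores its timestamp in a field named "t".
function buildGroupArgs(start, end) {
  return {
    // Only documents with start <= t < end are passed to reduce.
    cond: { t: { $gte: start, $lt: end } },
    initial: { count: 0, running_average: 0 },
    reduce: function (doc, out) {
      out.count++;
      out.running_average += doc.v;
    },
    finalize: function (out) {
      out.average = out.running_average / out.count;
    }
  };
}

// In the mongo shell you would then run something like:
// db.sd_boxedice_checksLoadAvrg.group(
//     buildGroupArgs(new Date(2010, 0, 1), new Date(2010, 1, 1)));
```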
However, the group() function does not work in a sharded environment. Although not in production, we are planning to move to sharding when it is ready and so it made no sense to write code that made use of a function that will not be supported in the future.
Further, “the result is returned as a single BSON object and for this reason must be fairly small – less than 10,000 keys, else you will get an exception.” (from the docs) Map reduce is therefore the only option.
Map reduce
When we were evaluating database options this time last year, one of the reasons I dismissed CouchDB was its requirement that all queries be map reduce. MongoDB allows for ad-hoc queries which suited our requirements much better. But MongoDB added map reduce in v1.2 and it is perfect for this kind of use case, even if it is a little more complicated to understand.
This is not a tutorial on the fundamentals of map reduce but it basically consists of two steps – get the data (map) then perform some intended function (reduce), with an optional one time end operation (finalize). The result is stored in a new collection so you can query it separately. This allows you to run jobs in the background whilst the older results are still being used.
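If the flow is hard to picture, the following is a tiny in-memory imitation of map, reduce and finalize in plain JavaScript. It is not how MongoDB implements anything and all the names are made up for the example; unlike in MongoDB, emit is passed in as an argument rather than being a global. It does show one thing worth knowing: reduce can be run again on its own output, which is why the values emitted carry a sum and a record count rather than a ready-made average.

```javascript
// Minimal in-memory imitation of the map -> reduce -> finalize flow.
// Illustration only; this is NOT how MongoDB works internally.
function mapReduce(docs, map, reduce, finalize) {
  const groups = new Map(); // serialised key -> { key, values }
  const emit = function (key, value) {
    const id = JSON.stringify(key);
    if (!groups.has(id)) {
      groups.set(id, { key: key, values: [] });
    }
    groups.get(id).values.push(value);
  };

  // Map step: each document is bound to `this` and emits key/value pairs.
  docs.forEach(function (doc) {
    map.call(doc, emit);
  });

  const results = [];
  for (const group of groups.values()) {
    // Reduce in two batches, then reduce the partial results again, to mimic
    // reduce being re-run on its own output.
    const half = Math.ceil(group.values.length / 2);
    const partials = [reduce(group.key, group.values.slice(0, half))];
    if (group.values.length > half) {
      partials.push(reduce(group.key, group.values.slice(half)));
    }
    const reduced = reduce(group.key, partials);
    // Finalize runs once per key, after all reducing is done.
    results.push({ _id: group.key, value: finalize(group.key, reduced) });
  }
  return results;
}

const mapFn = function (emit) {
  emit({}, { sum: this.v, recs: 1 });
};
const reduceFn = function (key, vals) {
  const ret = { sum: 0, recs: 0 };
  for (const val of vals) {
    ret.sum += val.sum;
    ret.recs += val.recs;
  }
  return ret;
};
const finalizeFn = function (key, val) {
  val.avg = val.sum / val.recs;
  return val;
};

const out = mapReduce([{ v: 1 }, { v: 2 }, { v: 3 }, { v: 6 }], mapFn, reduceFn, finalizeFn);
console.log(out[0].value); // { sum: 12, recs: 4, avg: 3 }
```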
In our case, the direct translation of the previous group query is:
res = db.runCommand( {
    mapreduce: 'loadAverages',
    map: function() {
        emit( { }, { sum: this.v, recs: 1 } );
    },
    reduce: function(key, vals) {
        var ret = { sum: 0, recs: 0 };
        for ( var i = 0; i < vals.length; i++ ) {
            ret.sum += vals[i].sum;
            ret.recs += vals[i].recs;
        }
        return ret;
    },
    finalize: function (key, val) {
        val.avg = val.sum / val.recs;
        return val;
    },
    out: 'result1',
    verbose: true
} );
This drops the results directly into a new collection called result1. I can then output the results by querying that collection:
> db[res.result].find()
{ "_id" : { }, "value" : { "sum" : 131204.59999999776, "recs" : 204856, "avg" : 0.6404723317842669 } }
In this case it has executed over every document in the collection but I can provide many other parameters to the map reduce function to narrow the results down (e.g. by date range) – query, sort and limit are supported.
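A small sketch of narrowing by date range: the helper below copies a map reduce command and adds a query option to it. The helper name and the timestamp field t are assumptions made for the example, so adjust them to match your own documents. The sort and limit options sit alongside query in the same way.

```javascript
// Returns a copy of a mapreduce command with a date range filter added.
// ASSUMPTION: documents carry a timestamp field named "t" (hypothetical).
function withDateRange(command, start, end) {
  return Object.assign({}, command, {
    // Only documents with start <= t < end are fed to the map function.
    query: { t: { $gte: start, $lt: end } }
  });
}

// Stub command for illustration; a real one also needs the map, reduce and
// finalize functions from the command shown earlier.
const baseCommand = { mapreduce: 'loadAverages', out: 'result1' };
const january = withDateRange(baseCommand, new Date(2010, 0, 1), new Date(2010, 1, 1));

// In the mongo shell: res = db.runCommand(january);
console.log(Object.keys(january)); // [ 'mapreduce', 'out', 'query' ]
```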
A few things to be aware of
As of the current stable version (1.4.x):
- Map reduce seems quite slow over a lot of documents. This is because it runs using the JavaScript engine which is single threaded. The query above was executed on 204,856 documents and took 24,385ms to complete.
- Sharding will help with this problem because the job will be distributed across servers (if your data is sharded of course).
- Map reduce “almost” blocks in MongoDB 1.4. It doesn’t technically block since it “yields” where necessary, but we saw some blocking-like effects during the time the query was running. The next release addresses some of this to make it friendlier to other queries. See here.
- You can run the map reduce query on your slave by executing the command db.getMongo().setSlaveOk() before your query. However, the group query did not appear to work even with this flag set. I have reported this as a bug.
- Group is usually faster than map reduce. In our examples above it completed in about 3 seconds. This is expected. (source).
- There is still development work planned to improve map reduce. See here.
Translate SQL to MongoDB map reduce
Perhaps the most useful resource I found was a PDF describing how to directly translate a SQL query into a MongoDB map reduce query. This was created by Rick Osborne but I have uploaded a copy here as it’s too useful to risk it disappearing!
Other useful resources
- MongoDB Aggregation I: Counting and Grouping
- MongoDB Aggregation II: Grouping Elaborated
- MongoDB Aggregation III: Map-Reduce Basics
- MongoDB Tutorial: MapReduce
Recommendation to doc writers
Start very simple and build up to the advanced queries, clearly explaining each element and the expected result. There is a lot of documentation that explains the really cool advanced stuff you can do but skips over the basics, which are just as important (if not more so).












