smartd Settings on a CentOS Server

smartd is a great tool for keeping track of the health of your server’s disks. It reads their S.M.A.R.T. data at regular intervals (every 30 minutes by default) and warns you if anything goes wrong. Even though it is quite simple, people can get lost while setting up its configuration. Here I’ll explain my generic settings. Keep in mind that this is for CentOS servers.

To install the service, simply get the smartmontools package via yum. This will also install mailx if it isn’t already installed.

yum install smartmontools -y

The package installs the configuration file /etc/smartd.conf. This is where we tell smartd what to do. First, find the names of your disks using fdisk.

root@eaVT:~# fdisk -l

Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders, total 976773168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0006f1aa

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048   943237119   471617536   83  Linux
/dev/sda2       943239166   976771071    16765953    5  Extended
/dev/sda5       943239168   976771071    16765952   82  Linux swap / Solaris

This output shows that I have one physical disk (/dev/sda) with three partitions (/dev/sda1, /dev/sda2 and /dev/sda5, where /dev/sda2 is the extended partition that holds the swap partition /dev/sda5). smartd works on physical devices, not partitions, so it only needs to deal with /dev/sda.
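On a server with several disks, or with LVM or software RAID volumes that also show up in the fdisk listing, picking out the physical disks by eye gets tedious. The helper below is a minimal sketch of doing it with awk. `list_disks` is a hypothetical name; it assumes the `Disk /dev/...:` header format shown above and skips `/dev/mapper/` and `/dev/md*` entries, which are not physical disks.

```shell
# list_disks (hypothetical helper): read `fdisk -l` output on stdin and
# print only whole-disk device names such as /dev/sda. Partition rows,
# "Disk identifier" lines, device-mapper (LVM) and md RAID devices are skipped.
list_disks() {
    awk '$1 == "Disk" && $2 ~ /^\/dev\// && $2 !~ /^\/dev\/(mapper\/|md)/ {
        sub(/:$/, "", $2)   # drop the colon that follows the device name
        print $2
    }'
}

# Usage on the server (as root):
#   fdisk -l 2>/dev/null | list_disks
# smartctl can also list the devices it detects:
#   smartctl --scan
```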

Open /etc/smartd.conf in your favourite text editor (vi?). Find the line that says
DEVICESCAN -H -m root
and comment it out. Then add this line:
DEVICESCAN -S on -o on -a -m youremail@yourdomain.com -s (S/../.././02|L/../../6/03) -M test
Here -a turns on smartd’s standard set of checks (health status, attributes, error and self-test logs), -S on and -o on turn on the disk’s attribute autosave and automatic offline testing, -s (S/../.././02|L/../../6/03) schedules a short self-test every day at 2 a.m. and a long self-test every Saturday at 3 a.m., and -M test makes smartd send a test email as soon as it starts.
The result should look like this:

# The word DEVICESCAN will cause any remaining lines in this
# configuration file to be ignored: it tells smartd to scan for all
# ATA and SCSI devices.  DEVICESCAN may be followed by any of the
# Directives listed below, which will be applied to all devices that
# are found.  Most users should comment out DEVICESCAN and explicitly
# list the devices that they wish to monitor.
#DEVICESCAN -H -m root
DEVICESCAN -S on -o on -a -m youremail@yourdomain.com -s (S/../.././02|L/../../6/03) -M test
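If you prefer to make this change without opening an editor, it can be scripted with sed. This is only a sketch: `enable_test_devicescan` is a hypothetical helper name, it assumes GNU sed (the default on CentOS), and it appends the test line (with the L/../../6/03 long-test schedule, Saturdays at 3 a.m.) at the end of the file rather than next to the original DEVICESCAN line.

```shell
# enable_test_devicescan (hypothetical helper): comment out every active
# DEVICESCAN line in the given smartd.conf, then append the test line.
# Usage: enable_test_devicescan /etc/smartd.conf youremail@yourdomain.com
enable_test_devicescan() {
    local conf="$1" email="$2"
    # Prefix active DEVICESCAN lines with "#"; the original is kept as $conf.bak.
    sed -i.bak 's/^DEVICESCAN/#DEVICESCAN/' "$conf" || return 1
    # Daily short test at 02:00, long test on Saturdays at 03:00,
    # and a test email when smartd starts (-M test).
    printf '%s\n' "DEVICESCAN -S on -o on -a -m $email -s (S/../.././02|L/../../6/03) -M test" >> "$conf"
}
```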

Of course, don’t forget to replace youremail@yourdomain.com with your own email address. After this, simply restart the smartd service.

service smartd restart

Now wait a while and check your email. In my experience, it takes around 5-10 minutes to arrive. You will get a TEST email that looks like a warning about your disks; that is expected, because -M test sends one when smartd starts. Now that we’ve established you can get the email when an error occurs, let’s set it up for the real case.
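If nothing arrives, it helps to separate mail problems from smartd problems. The sketch below assumes the mailx `mail` command that comes with smartmontools, and that smartd logs through syslog to /var/log/messages, which is the CentOS default; `smartd_log` is a hypothetical helper name.

```shell
# 1. Check mail delivery on its own (replace the address with yours).
#    If this message never arrives, fix mail first; smartd is not the problem.
#      echo "mailx test from $(hostname)" | mail -s "mailx test" youremail@yourdomain.com
#
# 2. smartd_log (hypothetical helper): show the latest smartd entries from
#    the system log. Defaults: /var/log/messages, last 20 lines.
smartd_log() {
    local log="${1:-/var/log/messages}" count="${2:-20}"
    grep 'smartd' "$log" | tail -n "$count"
}

# Usage (as root):
#   smartd_log
```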

Go back to /etc/smartd.conf and comment out the DEVICESCAN line you added. Make sure no active line starting with DEVICESCAN remains in this file; otherwise smartd stops reading the configuration at that line and ignores everything after it.
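Because a leftover DEVICESCAN line silently hides every device line after it, a quick check before restarting can save some head-scratching. `check_no_devicescan` is a hypothetical helper; it treats any line whose first non-blank word is DEVICESCAN as active.

```shell
# check_no_devicescan (hypothetical helper): succeed only if the file has
# no active (uncommented) DEVICESCAN line. Otherwise print the offending
# lines with their line numbers and fail.
check_no_devicescan() {
    local conf="${1:-/etc/smartd.conf}"
    [ -r "$conf" ] || { echo "cannot read $conf" >&2; return 2; }
    # grep succeeds (and prints the lines) when an active DEVICESCAN exists.
    if grep -n '^[[:space:]]*DEVICESCAN' "$conf"; then
        echo "active DEVICESCAN found: smartd ignores the lines after it" >&2
        return 1
    fi
    return 0
}

# Usage (as root):
#   check_no_devicescan && service smartd restart
```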

Now add the following lines to /etc/smartd.conf:

/dev/sda -H -C 0 -U 0 -m youremail@yourdomain.com
/dev/sda -d scsi -s L/../../1/01 -m youremail@yourdomain.com

Of course, replace /dev/sda and the email address with your own.

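The first line tells smartd to check the overall S.M.A.R.T. health status of /dev/sda (-H) and email us if it fails. -C 0 and -U 0 turn off the reports on pending and offline-uncorrectable sector counts, so this is a quiet check that only warns when the health status itself fails.
The second line schedules a long self-test (-s L/../../1/01) every Monday at 1 a.m., with -d scsi telling smartd to treat the disk as a SCSI device, and any error is mailed to us. The schedule has the form T/MM/DD/d/HH: test type, month, day of month, day of week (1 = Monday, 7 = Sunday) and hour, where each dot matches any single character. If we wanted to run the test every Sunday at 6 p.m., the setting would have been -s L/../../7/18 -m youremail@yourdomain.com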
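To see which hours a given -s expression selects, you can mimic the T/MM/DD/d/HH string and test it with grep -E. This is an approximation for illustration only: `schedule_matches` is a hypothetical helper, it assumes GNU date for `date -d`, and the real scheduling happens inside smartd. If your smartd version supports it, `smartd -q showtests` prints the self-tests it has actually scheduled.

```shell
# schedule_matches (hypothetical helper): approximate smartd's -s check.
# It builds T/MM/DD/d/HH for the given test type and time (d: 1=Monday
# ... 7=Sunday, HH: 00-23) and succeeds if REGEX matches the whole string.
# Usage: schedule_matches REGEX TYPE "DATE"   (DATE in GNU date -d syntax)
schedule_matches() {
    local regex="$1" type="$2" when="$3" stamp
    stamp="$type/$(date -d "$when" +%m/%d/%u/%H)" || return 2
    printf '%s\n' "$stamp" | grep -Eq "^($regex)\$"
}

# Example: 2024-01-07 was a Sunday, so this prints "yes".
#   schedule_matches 'L/../../7/18' L '2024-01-07 18:00' && echo yes
```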

If you’d like to add another disk (for example /dev/sdb), simply add lines for it too:

/dev/sda -H -C 0 -U 0 -m youremail@yourdomain.com
/dev/sda -d scsi -s L/../../1/01 -m youremail@yourdomain.com
/dev/sdb -H -C 0 -U 0 -m youremail@yourdomain.com
/dev/sdb -d scsi -s L/../../1/01 -m youremail@yourdomain.com
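With several disks the lines differ only in the device name, so they can be generated instead of typed. `smartd_lines` is a hypothetical helper that prints the same two entries used above for each disk you pass; check its output before appending it to the file.

```shell
# smartd_lines (hypothetical helper): print the two smartd.conf entries
# used in this setup for every disk given on the command line.
# Usage: smartd_lines EMAIL DISK...
smartd_lines() {
    local email="$1" disk
    shift
    for disk in "$@"; do
        # Quiet health check, then the Monday 1 a.m. long self-test.
        printf '%s -H -C 0 -U 0 -m %s\n' "$disk" "$email"
        printf '%s -d scsi -s L/../../1/01 -m %s\n' "$disk" "$email"
    done
}

# Review the output, then append it (as root):
#   smartd_lines youremail@yourdomain.com /dev/sda /dev/sdb >> /etc/smartd.conf
```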

Now save the file and restart the service again.

service smartd restart

The service may not be set to start automatically at boot. On a CentOS box you enable it with chkconfig. To check and enable it:

[root@emre ~]# chkconfig --list |grep smartd
smartd         	0:off	1:off	2:off	3:off	4:off	5:off	6:off
[root@emre ~]# chkconfig smartd on
[root@emre ~]# chkconfig --list |grep smartd
smartd             0:off    1:off    2:on    3:on    4:on    5:on    6:off

This means that smartd will start in runlevels 2, 3, 4 and 5. What those runlevels are is a different story.
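To confirm that smartd is on for the runlevel the server actually boots into, compare the chkconfig line with the output of `runlevel`, which prints the previous and current runlevel (for example "N 3"). `enabled_in_runlevel` is a hypothetical helper that assumes the `N:on`/`N:off` format shown above.

```shell
# enabled_in_runlevel (hypothetical helper): read one `chkconfig --list`
# line on stdin and succeed if the service is "on" for runlevel $1.
enabled_in_runlevel() {
    grep -Eq "(^|[[:space:]])$1:on([[:space:]]|\$)"
}

# Usage on the server: the second field of `runlevel` is the current level.
#   chkconfig --list smartd | enabled_in_runlevel "$(runlevel | awk '{print $2}')" && echo "smartd starts at boot"
```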

So that’s it for now.
