smartd is a great tool for keeping track of the health of your server disks. It polls the S.M.A.R.T. records at a configurable interval and warns you when anything goes wrong. Even though it is quite simple, people can get lost while setting up its configuration. Here I’ll explain how my generic settings go. Keep in mind that this is for CentOS servers.
To install the service, simply get the smartmontools package via yum. This will also install mailx if it isn’t already installed.
yum install smartmontools -y
Now a file named /etc/smartd.conf will be created. This is where we tell smartd what to do. First, learn the names of your devices using fdisk.
root@eaVT:~# fdisk -l

Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders, total 976773168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0006f1aa

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048   943237119   471617536   83  Linux
/dev/sda2       943239166   976771071    16765953    5  Extended
/dev/sda5       943239168   976771071    16765952   82  Linux swap / Solaris
This output tells us that I have one physical disk (/dev/sda) with three partitions (/dev/sda1, /dev/sda2, /dev/sda5). We are only interested in the physical devices, which means smartd will only deal with /dev/sda.
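On a box with many disks, the whole-disk names can be pulled out of the fdisk output with a small filter. This is only a minimal sketch: list_disks is a made-up helper name, and it assumes every disk is introduced by a "Disk /dev/...:" header line like the one in my output.

```shell
# list_disks (hypothetical helper): read `fdisk -l` output on stdin and
# print only the whole-disk device names, skipping the partition rows.
# Assumes each disk starts with a header line such as "Disk /dev/sda: ...".
list_disks() {
    sed -n 's|^Disk \(/dev/[^:]*\):.*|\1|p'
}

# Usage (as root):
#   fdisk -l 2>/dev/null | list_disks
```

Anything else that fdisk reports with the same kind of header (LVM or software RAID volumes, for example) may show up in the list too, so review it before copying names into smartd.conf.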
Open /etc/smartd.conf using your favourite (vi?) text editor. Find the line that says
DEVICESCAN -H -m root
and comment it out. Then add this line
DEVICESCAN -S on -o on -a -m firstname.lastname@example.org -s (S/../.././02|L/../../6/03) -M test
The result should look like this:
# The word DEVICESCAN will cause any remaining lines in this
# configuration file to be ignored: it tells smartd to scan for all
# ATA and SCSI devices.  DEVICESCAN may be followed by any of the
# Directives listed below, which will be applied to all devices that
# are found.  Most users should comment out DEVICESCAN and explicitly
# list the devices that they wish to monitor.
#DEVICESCAN -H -m root
DEVICESCAN -S on -o on -a -m firstname.lastname@example.org -s (S/../.././02|L/../../6/03) -M test
Of course, don’t forget to replace the address with your own. After this, simply restart the smartd service.
service smartd restart
Now wait for a while and check your email. In my experience, it takes around 5-10 minutes to arrive. You will get a TEST email saying that your disks have errors. Now that we’ve established that you will get an email when an error occurs, let’s set things up for the real case.
Go back to /etc/smartd.conf and comment out the line starting with DEVICESCAN. There shouldn’t be any active line starting with DEVICESCAN in this file; otherwise smartd will ignore every line that comes after it.
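Since a forgotten DEVICESCAN line quietly disables everything below it, a quick check before restarting can help. The following is just a sketch under my own assumptions: active_devicescan and disable_devicescan are hypothetical helper names, and the file path is passed as an argument so you can try them on a copy first.

```shell
# active_devicescan FILE (hypothetical helper): print every uncommented
# DEVICESCAN line with its line number; exit status is non-zero if none exist.
active_devicescan() {
    grep -nE '^[[:space:]]*DEVICESCAN' "$1"
}

# disable_devicescan FILE (hypothetical helper): comment out every active
# DEVICESCAN line in place, keeping the previous version as FILE.bak.
disable_devicescan() {
    sed -i.bak 's/^\([[:space:]]*\)DEVICESCAN/\1#DEVICESCAN/' "$1"
}

# Usage (as root):
#   active_devicescan /etc/smartd.conf && disable_devicescan /etc/smartd.conf
```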
Now add the following lines to /etc/smartd.conf:
/dev/sda -H -C 0 -U 0 -m firstname.lastname@example.org
/dev/sda -d scsi -s L/../../1/01 -m firstname.lastname@example.org
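Of course, replace /dev/sda and the email address with your own.
The first line tells smartd to silently check the overall health status of the /dev/sda disk (-H) and email us on any error. The -C 0 and -U 0 options turn off the reports about pending and offline-uncorrectable sectors.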
The second line schedules a long self-test every Monday at 1 a.m. and mails us on any error. The schedule has the form T/MM/DD/d/HH: test type, month, day of the month, day of the week (1 is Monday, 7 is Sunday) and hour. If we wanted to run the test every Sunday at 6 p.m., the setting would be -s L/../../7/18 -m firstname.lastname@example.org
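According to the smartd.conf man page, a self-test is started when the -s expression matches all 12 characters of the string T/MM/DD/d/HH for the current time. The sketch below imitates that match with grep so an expression can be tried out before it goes into the config. Note that schedule_matches is a hypothetical helper of mine, and grep is only an approximation of the regex engine smartd actually uses.

```shell
# schedule_matches REGEXP TYPE MONTH DAY WEEKDAY HOUR (hypothetical helper)
# Build the string T/MM/DD/d/HH and test whether REGEXP matches all of it.
# WEEKDAY is 1 (Monday) to 7 (Sunday); MONTH, DAY and HOUR are two digits.
schedule_matches() {
    printf '%s/%s/%s/%s/%s\n' "$2" "$3" "$4" "$5" "$6" |
        grep -Eqx -e "$1"
}

# A Monday (weekday 1) at 01:00, long test:
schedule_matches 'L/../../1/01' L 03 04 1 01 && echo "Monday 1 a.m.: match"
# The same moment does not match the Sunday 6 p.m. schedule:
schedule_matches 'L/../../7/18' L 03 04 1 01 || echo "Sunday 6 p.m.: no match"
```

An expression that never prints a match for any weekday and hour you feed it is a good hint that a slash or a digit went missing somewhere.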
If you’d like to add another disk (for example /dev/sdb), simply add its lines below the existing ones.
/dev/sda -H -C 0 -U 0 -m firstname.lastname@example.org
/dev/sda -d scsi -s L/../../1/01 -m firstname.lastname@example.org
/dev/sdb -H -C 0 -U 0 -m firstname.lastname@example.org
/dev/sdb -d scsi -s L/../../1/01 -m firstname.lastname@example.org
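With more than a couple of disks, typing these pairs by hand gets tedious. Here is a rough sketch of a generator that prints the same two lines from this post for each device; smartd_lines is a hypothetical helper name, and the directives are simply copied from my config above, not a recommendation for every setup.

```shell
# smartd_lines EMAIL DEVICE... (hypothetical helper): print, for every device,
# the same two directive lines used in this post.
smartd_lines() {
    email=$1
    shift
    for dev in "$@"; do
        printf '%s -H -C 0 -U 0 -m %s\n' "$dev" "$email"
        printf '%s -d scsi -s L/../../1/01 -m %s\n' "$dev" "$email"
    done
}

# Review the output first, then append it to the config as root:
#   smartd_lines firstname.lastname@example.org /dev/sda /dev/sdb >> /etc/smartd.conf
```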
Now save the file and restart the service again.
service smartd restart
The service may not start automatically on reboot. On a CentOS box you must enable it with chkconfig so that it runs at boot. To check it and then turn it on:
[root@emre ~]# chkconfig --list |grep smartd
smartd          0:off   1:off   2:off   3:off   4:off   5:off   6:off

[root@emre ~]# chkconfig smartd on
[root@emre ~]# chkconfig --list |grep smartd
smartd          0:off   1:off   2:on    3:on    4:on    5:on    6:off
This means that it will run on runlevels 2, 3, 4 and 5. What the runlevels mean is a different story.
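If you check this on several servers, the on/off columns can be boiled down to a short summary. A small sketch, assuming the chkconfig --list format shown above; enabled_runlevels is a hypothetical helper name of my own.

```shell
# enabled_runlevels (hypothetical helper): read `chkconfig --list` lines on
# stdin and print, per service, the runlevels that are switched on.
enabled_runlevels() {
    awk '{
        out = ""
        for (i = 2; i <= NF; i++)
            if ($i ~ /^[0-6]:on$/)
                out = out (out != "" ? " " : "") substr($i, 1, 1)
        print $1 ": " (out != "" ? out : "none")
    }'
}

# Usage:
#   chkconfig --list | grep smartd | enabled_runlevels
```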
So that’s it for now.