While working on RDS database, several times we observe that RDS slowdowns due to high CPU load which can cause due to lots of queries hitting to DB and run at the same time. surely you can setup a monitoring system which can send you alerts related to CPU load. But manually checking the db, finding running queries is little bit lengthy task when the db is in production. isn’t it? Instead of manual work, we can setup or tweak monitoring system like Nagios to send reports with running database queries when Cpu load is high on RDS. This post exactly shows how to do it…! As we all know that Nagios is capable of event handling so it can send alerts depends of the service state change. So while monitoring RDS cpu load, when Nagios detects that cpu load is more than the defined value, it will login to RDS instance, collect the running queries and send it as a report with a alert email. It will be easy for administrator to find out faulty queries and perform the remaining task.To achieve this, we need working Nagios setup and Nagios AWS Cloud Watch metric plugin which can be found at this link ( click on link)There are 4 files,
a) check_cloudwatch_status.rb : The ruby script that retrieve the metric from AWS EC2, ELB or RDS.
b) check_cloudwatch_status.cfg : The definition of the nagios command.
c) sample_service.cfg : Sample of use for retrieving CPU usage of an instance on Amazon EC2.
d) encrypt_credentials.rb : Ruby script that encrypts Amazon Web Services Credentials.
check_cloudwatch_status.rb is the main monitoring script and you need ruby and rubygems packages installed on your system. Once you install these softwares, dont forget to run “gem install amazon-ec2”. This script uses amazon-ec2 to call instance api’s on AWS. Now we will do the real work.a) Log on to your AWS account with admin account and download your security keys using IAM.b) Copy AWS .pem key in directory where you downloaded cloudwatch metric files as it will be necessary to create credentials to connect the instance you wish to monitor.c) Edit encrypt_credentials.rb and check_cloudwatch_status.rb.Replace “/etc/cloutomate/cloutomate.pem ” with “/pathto/.pem” ( your .pem file path ) .d) Create credentials as below example command,
RUBYOPT=rubygems ruby encrypt_credentials.rb -A XXXXXXXXXXXXXXXXXXXX -S XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX -f ec2_credentials.cfg ( Where -A is your User ID Key and -S is your security key.) You should see your keys are stored in encrypted format inside ec2_credentials.cfg.
e) Unfortunately check_cloudwatch_status.rb has some coding bugs which I saw when I initially ran this script to check my aws instance so to fix it , edit check_cloudwatch_status.rb and replace all, + e with #{e}” .f) This version of check_cloudwatch_status.rb call apis on region 1 for example, us-east-1, us-west-1, So if you planning to monitor instance on region 2, add urls in check_cloudwatch_status.rb for respective regions for example, “ec2.us-west-2.amazonaws.com“.g) check_cloudwatch_status.rb has multiple options to check multiple metrics on aws instance, such as ec2, rds ,elb etc. You can list them by hitting “ruby check_cloudwatch_status.rb”. Here is the example command of monitoring Cpu usage of ec2 instance,
RUBYOPT=rubygems ruby lib/nagios/check_cloudwatch_status.rb -a ec2-XXX-XXX-XXX-XXX.compute-1.amazonaws.com -i i-XXXXXXXX -f ec2_credentials_1.cfg -C CPUUtilization -c 90 -w 75 ( where -a is instance public dns url, -i is a instance id ) Output you get is , CloudWatch Metric: CPUUtilization, Average: 16.81, Maximum: 22.93, Minimum: 11.56|metric_average=16.81 metric_maximum=22.93 metric_minimum=11.56
At this point our initial setup is good and next part is integrating it with Nagios.You can refer check_cloudwatch_status.cfg and sample_service.cfg to get an idea about Cloudwatch metic Nagios setup. For this setup, assuming there is a working nagios setup, I am writing steps for adding RDS Cpu utilization monitoring with email alerts containing running database queries when nagios state changes.1) Create RDS instance if it is not already created. Connect to RDS database with mysql client and create user which will have access over all other users activities.2) Run check_cloudwatch_status.rb script on command line to check Cpu utilization of RDS database ,
RUBYOPT=rubygems ruby lib/nagios/check_cloudwatch_status.rb -a db1-XXX-XXX-XXX-XXX.compute-1.amazonaws.com -i db1 -f ec2_credentials_1.cfg -D CPUUtilization -c 90 -w 75 ( -D is a metric option for RDS ) and you should get result as shown above.
3) Install ssmtp package and configure your email settings in /etc/ssmtp dir by editing revalises and ssmtp.conf files ( ssmtp is a sendmail client).4) Nagios configuration Part : Change dir to Nagios config dir example : /usr/local/nagios/etc/objects.a) Edit command.cfg and add
define command{
command_name notify-mysql-by-email
command_line /usr/bin/printf “To: $CONTACTEMAIL$nFrom: nagios@fqdnnSubject: RDS – ** mysql CPU Load Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ *nn `mysqladmin -h $HOSTADDRESS$ -u dbuser -ppassword processlist`n”|/usr/sbin/ssmtp -t
}
b) Edit contacts.cfg and add
define contact{
contact_name mysqlcpu
alias mysqlcpu
use generic-contact
service_notification_commands notify-mysql-by-email
email user1@example.com
}
define contactgroup{
contactgroup_name mysql
members mysqlcpu
}
c) Create check_cloudwatch_status.cfg and add
define command {
command_name check_cloudwatch_status_rds
command_line /pathto/ruby /pathto/check_cloudwatch_status.rb -a $HOSTADDRESS$ -i $ARG1$ -f $ARG2$ -D $ARG3$ –warning $ARG4$ –critical $ARG5$
}
d) Create rds-service.cfg and add,
define host {
use generic-host
host_name db1.XXXXXXXX.us-west-2.rds.amazonaws.com
address db1.XXXXXXXX.us-west-2.rds.amazonaws.com
}
define service {
use generic-service
host_name db1.XXXXXXXX.us-west-2.rds.amazonaws.com
service_description Cpu Usage
check_command check_cloudwatch_status_rds!db1!/pathto/ec2_credentials_1.cfg!CPUUtilization!1!2
#max_check_attempts 3
normal_check_interval 3
notifications_enabled 1
contact_groups mysql
}
e) Edit /usr/local/nagios/etc/nagios.cfg file and add
cfg_file=/usr/local/nagios/etc/objects/check_cloudwatch_status.cfg
cfg_file=/usr/local/nagios/etc/objects/rds-service.cfg
f) Restart the nagios service and if there is no configuration error, You can see the new service added on the nagios web front end. Next time, when CPU load service state changes, you start with getting email reports with RDS running queries.Thats it, You can add similar services for other AWS services. Have Fun.