After quite a while, I've finally achieved the redundancy I've been
looking for...at least for now. The basic idea is to have two Raspberry Pis running as an
active-passive failover 'cluster'. I say 'cluster' because it's not REALLY an enterprise cluster
with shared NAS/SAN.
Here are some of the details:
Using a bash shell script, I created a simple ping test that runs on the secondary Pi server. See the
code and comments below:
#!/bin/bash
#---------------------------------------------------------------------------
# GLOBAL VARS - assigned up here to make future edits easier.
PRIMARY=192.x.x.x   # IP address of the primary server
FAIL="100%"         # ping reports "100% packet loss" on total ICMP failure, "0%" on success
NOW=$(date)         # timestamp used in the log entries
#---------------------------------------------------------------------------
# HERE COME THE FUNCTIONS...
function failOVER
{
rm -f ping.test
ping $PRIMARY -c 1 >> ping.test
.
.
.
# The ping above is repeated 7 times. I learned that with '-c 7' my count check failed
# (it only registered one ICMP instance, presumably because ping prints a single summary
# line per run), so I stacked 7 separate ICMP ping instances.
# failOVER is designed to run a more intense ping scan to make sure things are really
# broken.
TEST=$(grep -c "$FAIL" ping.test)
if [ "$TEST" -eq 7 ]
then resetSERVER
fi
}
function resetSERVER
{
# reset network
# I have two interfaces files in /etc/network, one for current and one for failover...
rm -f /etc/network/interfaces
mv /etc/network/interfaces.failover /etc/network/interfaces
# reset hostname (delete /etc/hostname and recreate)
rm -f /etc/hostname
echo xiphos-tech.info >> /etc/hostname
# reset hosts file
rm -f /etc/hosts
touch /etc/hosts
echo 127.0.0.1 xiphos-tech.info >> /etc/hosts
echo ::1 localhost ip6-localhost ip6-loopback >> /etc/hosts
echo fe00::0 ip6-localnet >> /etc/hosts
echo ff00::0 ip6-mcastprefix >> /etc/hosts
echo ff02::1 ip6-allnodes >> /etc/hosts
echo ff02::2 ip6-allrouters >> /etc/hosts
echo 127.0.1.1 xiphos-tech.info >> /etc/hosts
echo $NOW FAILOVER: Server assuming XIPHOS-TECH.INFO Primary >> /var/log/messages
/sbin/shutdown -r now
# I had trouble with plain 'restart' or 'shutdown', but the full path plus explicitly stating the time 'now' worked
}
#----------------------------------------------------------------------------------------------------
# MAIN where we really start...if you've made it down this far...
ping $PRIMARY -c 1 >> ping.test
TEST=$(grep -c "$FAIL" ping.test)
if [ "$TEST" -eq 1 ]
then
failOVER
echo $NOW Heartbeat to XIPHOS-TECH.INFO - P failed >> /var/log/messages
else
echo $NOW Heartbeat to XIPHOS-TECH.INFO - P was successful >> /var/log/messages
fi
rm -f ping.test
#------------------------------------------------------------------------------------------------------
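The temp file and grep work, but ping's own exit status can carry the same information. Here's a rough sketch of that idea, not what I actually run: PROBE_CMD and count_failures are made-up names for illustration, and the -W timeout flag assumes the Linux (iputils) ping, which exits 0 when a reply comes back and non-zero when none does.

```shell
#!/bin/bash
# Sketch only: PROBE_CMD and count_failures are hypothetical names,
# not part of the heartbeat script above.

# Command used for a single probe; it must exit 0 when the primary answers.
# -c 1 sends one echo request, -W 2 waits at most 2 seconds for the reply
# (assumes Linux iputils ping).
PROBE_CMD=${PROBE_CMD:-"ping -c 1 -W 2"}

# Probe HOST a given number of times and print how many attempts failed.
count_failures() {
    local host=$1 attempts=$2 failures=0 i=0
    while [ "$i" -lt "$attempts" ]; do
        # PROBE_CMD is left unquoted on purpose so its options split into words.
        if ! $PROBE_CMD "$host" > /dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
}
```

With something like this, the check inside failOVER could shrink to `[ "$(count_failures "$PRIMARY" 7)" -eq 7 ] && resetSERVER`, and there would be no ping.test file to clean up.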
While my ping script is really exciting...*not*, what really makes this work is the cron job that runs it every 5 minutes.
It's not very complicated, just
*/5 * * * * /opt/my-heartbeat
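One thing to keep in mind: standard cron can't schedule anything more often than once a minute. If a tighter heartbeat is ever needed, one option is to have cron start a small wrapper loop every minute and let the loop do the short sleeps. This is only a sketch under that assumption; run_every is a hypothetical helper, not something from my setup.

```shell
#!/bin/bash
# Sketch only: run_every is a hypothetical helper, not part of the original setup.

# Run a command a fixed number of times, pausing INTERVAL seconds between runs.
# Usage: run_every INTERVAL COUNT COMMAND [ARGS...]
run_every() {
    local interval=$1 count=$2 n=0
    shift 2
    while [ "$n" -lt "$count" ]; do
        # Keep looping even if one run fails, but say so on stderr.
        "$@" || echo "run_every: '$*' exited non-zero" >&2
        n=$((n + 1))
        # No pause after the final run.
        if [ "$n" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

A wrapper script that cron starts once a minute could then call `run_every 5 12 /opt/my-heartbeat` to get roughly one check every 5 seconds. The time each check takes adds to the interval, so a run can spill into the next minute and overlap with the next wrapper; a lower count leaves some slack.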