Device Health Monitoring System
The Device Health Monitoring System automatically monitors all active and approved devices for heartbeat activity and sends alerts when devices go offline for extended periods.
Features
Automatic Health Monitoring
- Continuous Monitoring: Checks device health every 5 minutes
- Offline Detection: A device is considered offline after 30 minutes without a heartbeat
- Recovery Detection: Automatically detects when offline devices come back online
- Alert Integration: Uses the existing alert system for SMS/email/webhook notifications
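The offline and recovery detection described above can be sketched as a single check cycle. This is only an illustration of the idea, not the service's actual code: the function name `classify_devices`, the in-memory data structures, and the choice to treat "more than 30 minutes" as offline are all assumptions.

```python
from datetime import datetime, timedelta, timezone

# Default threshold taken from this document; everything else is hypothetical.
OFFLINE_THRESHOLD = timedelta(minutes=30)


def classify_devices(last_heartbeats, known_offline, now):
    """Return (newly_offline, recovered) sets of device IDs for one check cycle.

    last_heartbeats: dict mapping device ID -> datetime of the last heartbeat
    known_offline:   set of device IDs already flagged as offline
    """
    newly_offline, recovered = set(), set()
    for device_id, last_seen in last_heartbeats.items():
        is_offline = now - last_seen > OFFLINE_THRESHOLD
        if is_offline and device_id not in known_offline:
            newly_offline.add(device_id)   # would trigger an offline alert
        elif not is_offline and device_id in known_offline:
            recovered.add(device_id)       # would trigger a recovery alert
    return newly_offline, recovered


now = datetime(2025, 9, 7, 15, 15, tzinfo=timezone.utc)
heartbeats = {
    1941875381: now - timedelta(minutes=45),  # silent for too long
    1941875382: now - timedelta(minutes=2),   # healthy
    1941875383: now - timedelta(minutes=1),   # previously offline, now back
}
newly_offline, recovered = classify_devices(heartbeats, {1941875383}, now)
```

Tracking the set of already-flagged devices is what keeps a device from being re-alerted on every 5-minute cycle and what makes the "all clear" message possible.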
Alert Capabilities
- SMS Alerts: Send SMS notifications when devices go offline or recover
- Email Alerts: Send email notifications (when configured)
- Webhook Integration: Send webhook notifications for external systems
- Recovery Notifications: Automatic "all clear" messages when devices recover
Configuration
- Customizable Thresholds: Configure offline detection timeouts
- Alert Rules: Use existing alert rule system to configure recipients
- Channel Selection: Choose SMS, email, webhook, or multiple channels
- Device-Specific Rules: Create rules for specific devices or all devices
Setup
1. Alert Rule Configuration
Create alert rules for device offline monitoring through the web interface or the API. The device_ids condition is optional and restricts the rule to specific devices:
{
  "name": "Device Offline Alert",
  "description": "Alert when security devices go offline",
  "conditions": {
    "device_offline": true,
    "device_ids": [1941875381, 1941875382]
  },
  "alert_channels": ["sms", "email"],
  "sms_phone_number": "+46701234567",
  "email": "admin@company.com",
  "is_active": true,
  "priority": "high"
}
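To make the rule semantics concrete, here is a minimal sketch of how a rule like the one above might be matched against a device. The helper `rule_applies` is hypothetical, and it assumes that a missing or empty `device_ids` list means "all devices"; the real alert system may evaluate rules differently.

```python
def rule_applies(rule, device_id):
    """Return True if an active device-offline rule covers device_id."""
    if not rule.get("is_active", False):
        return False
    conditions = rule.get("conditions", {})
    if not conditions.get("device_offline", False):
        return False
    device_ids = conditions.get("device_ids")
    # Assumption: no device_ids (or an empty list) means the rule covers all devices.
    return not device_ids or device_id in device_ids


rule = {
    "name": "Device Offline Alert",
    "conditions": {"device_offline": True, "device_ids": [1941875381, 1941875382]},
    "alert_channels": ["sms", "email"],
    "is_active": True,
}
all_devices_rule = {"conditions": {"device_offline": True}, "is_active": True}

matches_listed = rule_applies(rule, 1941875381)
matches_unlisted = rule_applies(rule, 1941875383)
matches_any = rule_applies(all_devices_rule, 1941875383)
```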
2. Service Configuration
The service automatically starts with the server and can be configured with environment variables:
- Check Interval: How often to check device health (default: 5 minutes)
- Offline Threshold: How long a device can go without a heartbeat before it is considered offline (default: 30 minutes)
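This document does not name the environment variables, so the sketch below uses two hypothetical names, `DEVICE_HEALTH_CHECK_INTERVAL_MINUTES` and `DEVICE_HEALTH_OFFLINE_THRESHOLD_MINUTES`, purely to illustrate how such settings could be read with the documented defaults. Check the server's own configuration for the real names.

```python
import os

# Hypothetical variable names; the real service may use different ones.
CHECK_INTERVAL_VAR = "DEVICE_HEALTH_CHECK_INTERVAL_MINUTES"
OFFLINE_THRESHOLD_VAR = "DEVICE_HEALTH_OFFLINE_THRESHOLD_MINUTES"


def load_health_config(env=None):
    """Read the two settings from env, falling back to the documented defaults."""
    env = os.environ if env is None else env

    def minutes(name, default):
        raw = env.get(name)
        if raw is None or raw.strip() == "":
            return default
        value = int(raw)  # raises ValueError on non-numeric input
        if value <= 0:
            raise ValueError(f"{name} must be a positive number of minutes")
        return value

    return {
        "check_interval_minutes": minutes(CHECK_INTERVAL_VAR, 5),
        "offline_threshold_minutes": minutes(OFFLINE_THRESHOLD_VAR, 30),
    }


defaults = load_health_config({})
custom = load_health_config({OFFLINE_THRESHOLD_VAR: "60"})
```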
3. SMS Configuration
For SMS alerts, configure Twilio credentials:
TWILIO_ACCOUNT_SID=your_account_sid
TWILIO_AUTH_TOKEN=your_auth_token
TWILIO_PHONE_NUMBER=your_twilio_phone
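A quick way to confirm that the three variables above are set before relying on SMS alerts is a small standalone check like the following. The helper `missing_twilio_vars` is hypothetical and not part of the system; it only inspects the environment and does not contact Twilio.

```python
import os

REQUIRED_TWILIO_VARS = (
    "TWILIO_ACCOUNT_SID",
    "TWILIO_AUTH_TOKEN",
    "TWILIO_PHONE_NUMBER",
)


def missing_twilio_vars(env=None):
    """Return the names of required Twilio variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_TWILIO_VARS if not env.get(name)]


example_env = {"TWILIO_ACCOUNT_SID": "your_account_sid", "TWILIO_AUTH_TOKEN": ""}
missing = missing_twilio_vars(example_env)
if missing:
    print("SMS alerts are not fully configured; missing:", ", ".join(missing))
```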
API Endpoints
Get Service Status
GET /api/device-health/status
Returns the current status of the device health monitoring service:
{
  "success": true,
  "data": {
    "isRunning": true,
    "checkIntervalMinutes": 5,
    "offlineThresholdMinutes": 30,
    "offlineDevicesCount": 1,
    "offlineDevices": [
      {
        "deviceId": 1941875383,
        "deviceName": "Guard Tower 3",
        "offlineSince": "2025-09-07T10:00:00Z",
        "alertSent": true
      }
    ]
  }
}
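As an illustration of consuming this response, the sketch below parses a status body and reports how long each device has been offline. The function name `summarize_status` is hypothetical, and the code assumes `offlineSince` is always a UTC timestamp ending in `Z`, as in the example above.

```python
import json
from datetime import datetime, timezone


def summarize_status(body, now):
    """Return one human-readable line per offline device in a status response."""
    payload = json.loads(body)
    if not payload.get("success"):
        raise RuntimeError("status request did not succeed")
    lines = []
    for device in payload["data"]["offlineDevices"]:
        # Replace the trailing "Z" so older Python versions can parse the timestamp.
        since = datetime.fromisoformat(device["offlineSince"].replace("Z", "+00:00"))
        minutes = int((now - since).total_seconds() // 60)
        lines.append(
            f'{device["deviceName"]} ({device["deviceId"]}): offline for {minutes} min'
        )
    return lines


sample_body = json.dumps({
    "success": True,
    "data": {
        "isRunning": True,
        "offlineDevicesCount": 1,
        "offlineDevices": [{
            "deviceId": 1941875383,
            "deviceName": "Guard Tower 3",
            "offlineSince": "2025-09-07T10:00:00Z",
            "alertSent": True,
        }],
    },
})
summary = summarize_status(sample_body, datetime(2025, 9, 7, 10, 45, tzinfo=timezone.utc))
```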
Trigger Manual Health Check
POST /api/device-health/check
Forces an immediate health check of all devices.
Start/Stop Service
POST /api/device-health/start
POST /api/device-health/stop
Start or stop the health monitoring service. The service normally starts automatically with the server, so these endpoints are only needed for manual control.
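The four endpoints can be called from a short script using only the standard library. The sketch below is written under several assumptions that this document does not confirm: the base URL `http://localhost:3000`, no authentication, an empty POST body, and JSON responses from every endpoint. Adjust these for your deployment.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3000"  # assumption: replace with your server's address

# HTTP methods as documented for each endpoint.
METHODS = {"status": "GET", "check": "POST", "start": "POST", "stop": "POST"}


def build_request(action):
    """Build (but do not send) a request for one device-health endpoint."""
    if action not in METHODS:
        raise ValueError(f"unknown action: {action}")
    method = METHODS[action]
    # Assumption: POST endpoints accept an empty body.
    data = b"" if method == "POST" else None
    return urllib.request.Request(
        f"{BASE_URL}/api/device-health/{action}", data=data, method=method
    )


def call(action, timeout=10):
    """Send the request and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(build_request(action), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


check_request = build_request("check")
status_request = build_request("status")
```

With a running server, `call("status")` would return the status payload and `call("check")` would force an immediate health check.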
Alert Messages
Offline Alert
🚨 DEVICE OFFLINE ALERT 🚨
📍 LOCATION: Stockholm Castle
🔧 DEVICE: Guard Tower 1
⏰ OFFLINE FOR: 45 minutes
📅 LAST SEEN: 2025-09-07 14:30:00
❌ Device has stopped sending heartbeats.
🔧 Check device power, network connection, or physical access.
⚠️ Security monitoring may be compromised in this area.
Recovery Alert
✅ DEVICE RECOVERED ✅
📍 LOCATION: Stockholm Castle
🔧 DEVICE: Guard Tower 1
⏰ RECOVERED AT: 2025-09-07 15:15:00
✅ Device is now sending heartbeats again.
🛡️ Security monitoring restored for this area.
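For reference, messages in the two formats above could be assembled as follows. This is a sketch that mirrors the sample text, not the system's actual template code; the function names and the rounding of the offline duration down to whole minutes are assumptions.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def format_offline_alert(location, device_name, last_seen, now):
    """Build the offline alert text; duration is whole minutes since last_seen."""
    minutes = int((now - last_seen).total_seconds() // 60)
    return "\n".join([
        "🚨 DEVICE OFFLINE ALERT 🚨",
        f"📍 LOCATION: {location}",
        f"🔧 DEVICE: {device_name}",
        f"⏰ OFFLINE FOR: {minutes} minutes",
        f"📅 LAST SEEN: {last_seen.strftime(TIME_FORMAT)}",
        "❌ Device has stopped sending heartbeats.",
        "🔧 Check device power, network connection, or physical access.",
        "⚠️ Security monitoring may be compromised in this area.",
    ])


def format_recovery_alert(location, device_name, recovered_at):
    """Build the recovery ("all clear") alert text."""
    return "\n".join([
        "✅ DEVICE RECOVERED ✅",
        f"📍 LOCATION: {location}",
        f"🔧 DEVICE: {device_name}",
        f"⏰ RECOVERED AT: {recovered_at.strftime(TIME_FORMAT)}",
        "✅ Device is now sending heartbeats again.",
        "🛡️ Security monitoring restored for this area.",
    ])


last_seen = datetime(2025, 9, 7, 14, 30, 0)
recovered_at = datetime(2025, 9, 7, 15, 15, 0)
offline_text = format_offline_alert("Stockholm Castle", "Guard Tower 1", last_seen, recovered_at)
recovery_text = format_recovery_alert("Stockholm Castle", "Guard Tower 1", recovered_at)
```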
Testing
Use the provided test script to verify the system is working:
python3 test_device_health.py
This will:
- Check the device health service status
- List all devices and their current health status
- Show configured alert rules for device offline monitoring
- Trigger a manual health check
Integration with Existing Systems
Device health monitoring builds on the following existing components:
- Existing Alert System: Uses the same alert rules, channels, and logging
- Device Management: Works with the existing device approval and activation system
- Heartbeat System: Uses the existing heartbeat infrastructure
- Dashboard: Device status is already displayed in the device list
Troubleshooting
No Alerts Received
- Check if device offline alert rules are configured and active
- Verify SMS/email credentials are properly configured
- Check device health service status via API
- Ensure devices are marked as active and approved
False Positives
- Adjust the offline threshold if devices have irregular heartbeat patterns
- Check network connectivity between devices and server
- Verify heartbeat intervals are properly configured for each device
Service Not Running
- Check server logs for startup errors
- Verify database connectivity
- Restart the server to reinitialize the service
Monitoring and Logs
- Service status is logged to console with timestamps
- Alert sending is logged with recipient and status information
- Manual health checks can be triggered via API for testing
- The service shuts down gracefully when the server restarts