Fix jwt-token
183
docs/DEVICE_HEALTH_MONITORING.md
Normal file
@@ -0,0 +1,183 @@
# Device Health Monitoring System

The Device Health Monitoring System automatically monitors all active and approved devices for heartbeat activity and sends alerts when a device stays offline longer than the configured threshold (30 minutes by default).

## Features

### Automatic Health Monitoring
- **Continuous Monitoring**: Checks device health every 5 minutes
- **Offline Detection**: Devices are considered offline after 30 minutes without a heartbeat
- **Recovery Detection**: Automatically detects when offline devices come back online
- **Alert Integration**: Uses the existing alert system for SMS/email/webhook notifications
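
The offline and recovery rules above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `classify`, its arguments, and the strict "greater than 30 minutes" comparison are hypothetical and are not taken from the service's code.

```python
from datetime import datetime, timedelta, timezone

# Documented default: offline after 30 minutes without a heartbeat.
OFFLINE_THRESHOLD = timedelta(minutes=30)


def classify(last_heartbeats, previously_offline, now):
    """Return (newly_offline, recovered) sets of device IDs.

    last_heartbeats: dict mapping device ID -> datetime of the last heartbeat
    previously_offline: set of device IDs flagged offline by the previous check
    """
    offline_now = {
        device_id
        for device_id, seen in last_heartbeats.items()
        if now - seen > OFFLINE_THRESHOLD
    }
    # Devices that crossed the threshold since the previous check.
    newly_offline = offline_now - previously_offline
    # Devices that were flagged before and are sending heartbeats again.
    recovered = {
        device_id
        for device_id in previously_offline
        if device_id in last_heartbeats and device_id not in offline_now
    }
    return newly_offline, recovered


now = datetime(2025, 9, 7, 15, 15, tzinfo=timezone.utc)
heartbeats = {
    1941875381: now - timedelta(minutes=2),   # healthy
    1941875382: now - timedelta(minutes=45),  # silent for 45 minutes
    1941875383: now - timedelta(minutes=1),   # was offline, now back
}
newly_offline, recovered = classify(heartbeats, {1941875383}, now)
print(newly_offline, recovered)
```

Tracking the previously flagged set is what lets a check send one offline alert per outage and one recovery notice afterwards, rather than repeating the alert every 5 minutes.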

### Alert Capabilities
- **SMS Alerts**: Send SMS notifications when devices go offline or recover
- **Email Alerts**: Send email notifications (when configured)
- **Webhook Integration**: Send webhook notifications for external systems
- **Recovery Notifications**: Automatic "all clear" messages when devices recover

### Configuration
- **Customizable Thresholds**: Configure offline detection timeouts
- **Alert Rules**: Use the existing alert rule system to configure recipients
- **Channel Selection**: Choose SMS, email, webhook, or multiple channels
- **Device-Specific Rules**: Create rules for specific devices or all devices

## Setup

### 1. Alert Rule Configuration

Create alert rules for device offline monitoring using the web interface or API. The `device_ids` condition is optional and limits the rule to specific devices:

```json
{
  "name": "Device Offline Alert",
  "description": "Alert when security devices go offline",
  "conditions": {
    "device_offline": true,
    "device_ids": [1941875381, 1941875382]
  },
  "alert_channels": ["sms", "email"],
  "sms_phone_number": "+46701234567",
  "email": "admin@company.com",
  "is_active": true,
  "priority": "high"
}
```
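
If rules are created from a script, it can help to build and sanity-check the payload before sending it. The sketch below reuses the field names from the example above; the helper `build_offline_rule` and its validation rules (for instance, requiring a phone number when `sms` is selected) are assumptions for illustration, not documented server behaviour.

```python
import json

# Channels named in this document.
ALLOWED_CHANNELS = {"sms", "email", "webhook"}


def build_offline_rule(name, channels, phone=None, email=None, device_ids=None):
    """Build an alert-rule payload shaped like the example above (hypothetical helper)."""
    unknown = set(channels) - ALLOWED_CHANNELS
    if unknown:
        raise ValueError(f"unknown alert channels: {sorted(unknown)}")
    if "sms" in channels and not phone:
        raise ValueError("the sms channel needs sms_phone_number")
    if "email" in channels and not email:
        raise ValueError("the email channel needs email")

    conditions = {"device_offline": True}
    if device_ids:
        # Optional: restrict the rule to specific devices.
        conditions["device_ids"] = list(device_ids)

    rule = {
        "name": name,
        "conditions": conditions,
        "alert_channels": list(channels),
        "is_active": True,
        "priority": "high",
    }
    if phone:
        rule["sms_phone_number"] = phone
    if email:
        rule["email"] = email
    return rule


rule = build_offline_rule(
    "Device Offline Alert",
    ["sms", "email"],
    phone="+46701234567",
    email="admin@company.com",
    device_ids=[1941875381, 1941875382],
)
print(json.dumps(rule, indent=2))
```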

### 2. Service Configuration

The service starts automatically with the server and can be configured with environment variables:

- **Check Interval**: How often to check device health (default: 5 minutes)
- **Offline Threshold**: How long a device can go without a heartbeat before it is considered offline (default: 30 minutes)
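
This section does not name the environment variables, so the sketch below uses hypothetical names (`DEVICE_HEALTH_CHECK_INTERVAL_MINUTES` and `DEVICE_HEALTH_OFFLINE_THRESHOLD_MINUTES`) purely to show how such settings might be read with the documented defaults. Check the server code for the real names before relying on it.

```python
import os


def read_minutes(env, name, default):
    """Read a positive integer number of minutes, falling back to the default."""
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    value = int(raw)
    if value <= 0:
        raise ValueError(f"{name} must be a positive number of minutes")
    return value


def load_config(env=os.environ):
    # Variable names are hypothetical; the defaults come from this document.
    return {
        "check_interval_minutes": read_minutes(
            env, "DEVICE_HEALTH_CHECK_INTERVAL_MINUTES", 5
        ),
        "offline_threshold_minutes": read_minutes(
            env, "DEVICE_HEALTH_OFFLINE_THRESHOLD_MINUTES", 30
        ),
    }


# With no overrides set, the documented defaults apply.
defaults = load_config({})
print(defaults)
```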

### 3. SMS Configuration

For SMS alerts, configure Twilio credentials:

```bash
TWILIO_ACCOUNT_SID=your_account_sid
TWILIO_AUTH_TOKEN=your_auth_token
TWILIO_PHONE_NUMBER=your_twilio_phone
```
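
A quick way to confirm the three variables above are visible to a process is a small check like the following. It only inspects the environment; it does not contact Twilio, and the helper name `missing_twilio_settings` is made up for this example.

```python
import os

REQUIRED = ("TWILIO_ACCOUNT_SID", "TWILIO_AUTH_TOKEN", "TWILIO_PHONE_NUMBER")


def missing_twilio_settings(env=os.environ):
    """Return the names of required variables that are unset or blank."""
    return [name for name in REQUIRED if not env.get(name, "").strip()]


missing = missing_twilio_settings()
if missing:
    print("Missing Twilio settings:", ", ".join(missing))
else:
    print("All Twilio settings are present")
```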

## API Endpoints

### Get Service Status
```
GET /api/device-health/status
```

Returns the current status of the device health monitoring service:

```json
{
  "success": true,
  "data": {
    "isRunning": true,
    "checkIntervalMinutes": 5,
    "offlineThresholdMinutes": 30,
    "offlineDevicesCount": 1,
    "offlineDevices": [
      {
        "deviceId": 1941875383,
        "deviceName": "Guard Tower 3",
        "offlineSince": "2025-09-07T10:00:00Z",
        "alertSent": true
      }
    ]
  }
}
```
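
As an illustration of consuming this response, the sketch below parses the sample payload and reports how long each listed device has been offline. The function name `summarize_offline` and the fixed `now` value are assumptions chosen so the example is reproducible; a real client would fetch the JSON from the endpoint and use the current time.

```python
import json
from datetime import datetime, timezone

# The sample response from this section, embedded so the example runs offline.
SAMPLE_RESPONSE = """
{
  "success": true,
  "data": {
    "isRunning": true,
    "checkIntervalMinutes": 5,
    "offlineThresholdMinutes": 30,
    "offlineDevicesCount": 1,
    "offlineDevices": [
      {
        "deviceId": 1941875383,
        "deviceName": "Guard Tower 3",
        "offlineSince": "2025-09-07T10:00:00Z",
        "alertSent": true
      }
    ]
  }
}
"""


def summarize_offline(payload, now):
    """Return one human-readable line per offline device."""
    lines = []
    for device in payload["data"]["offlineDevices"]:
        # Older Python versions do not parse a trailing "Z", so normalise it.
        since = datetime.fromisoformat(device["offlineSince"].replace("Z", "+00:00"))
        minutes = int((now - since).total_seconds() // 60)
        alerted = "alert sent" if device["alertSent"] else "no alert yet"
        lines.append(
            f"{device['deviceName']} ({device['deviceId']}): "
            f"offline for {minutes} min, {alerted}"
        )
    return lines


payload = json.loads(SAMPLE_RESPONSE)
now = datetime(2025, 9, 7, 10, 45, tzinfo=timezone.utc)
summary = summarize_offline(payload, now)
print("\n".join(summary))
```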

### Trigger Manual Health Check
```
POST /api/device-health/check
```

Forces an immediate health check of all devices.

### Start/Stop Service
```
POST /api/device-health/start
POST /api/device-health/stop
```

Start or stop the health monitoring service manually. The service normally runs automatically with the server.
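
The three control endpoints can be called from any HTTP client. The sketch below only builds the requests with Python's standard library and does not send them; the base URL `http://localhost:8000` is a placeholder, and any authentication headers the server may require are not covered in this section and are left out.

```python
import urllib.request

# Actions that map to the POST endpoints listed above.
ACTIONS = ("check", "start", "stop")


def control_request(base_url, action):
    """Build (but do not send) a POST request for a device-health action."""
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    url = f"{base_url.rstrip('/')}/api/device-health/{action}"
    return urllib.request.Request(url, method="POST")


request = control_request("http://localhost:8000", "check")
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it to a running server.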

## Alert Messages

### Offline Alert
```
🚨 DEVICE OFFLINE ALERT 🚨

📍 LOCATION: Stockholm Castle
🔧 DEVICE: Guard Tower 1
⏰ OFFLINE FOR: 45 minutes
📅 LAST SEEN: 2025-09-07 14:30:00

❌ Device has stopped sending heartbeats.
🔧 Check device power, network connection, or physical access.

⚠️ Security monitoring may be compromised in this area.
```
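
To show how the variable fields of the offline alert relate to each other, here is a rough sketch that derives "OFFLINE FOR" from the last-seen time and fills in the header lines of the template. The function `format_offline_alert` is hypothetical and is not the service's actual formatter.

```python
from datetime import datetime


def format_offline_alert(location, device, last_seen, now):
    """Fill in the variable part of the offline alert template (illustrative only)."""
    minutes = int((now - last_seen).total_seconds() // 60)
    return "\n".join(
        [
            "🚨 DEVICE OFFLINE ALERT 🚨",
            "",
            f"📍 LOCATION: {location}",
            f"🔧 DEVICE: {device}",
            f"⏰ OFFLINE FOR: {minutes} minutes",
            f"📅 LAST SEEN: {last_seen:%Y-%m-%d %H:%M:%S}",
        ]
    )


message = format_offline_alert(
    "Stockholm Castle",
    "Guard Tower 1",
    last_seen=datetime(2025, 9, 7, 14, 30),
    now=datetime(2025, 9, 7, 15, 15),
)
```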

### Recovery Alert
```
✅ DEVICE RECOVERED ✅

📍 LOCATION: Stockholm Castle
🔧 DEVICE: Guard Tower 1
⏰ RECOVERED AT: 2025-09-07 15:15:00

✅ Device is now sending heartbeats again.
🛡️ Security monitoring restored for this area.
```

## Testing

Use the provided test script to verify the system is working:

```bash
python3 test_device_health.py
```

This will:
- Check the device health service status
- List all devices and their current health status
- Show configured alert rules for device offline monitoring
- Trigger a manual health check

## Integration with Existing Systems

Device health monitoring integrates with:

1. **Existing Alert System**: Uses the same alert rules, channels, and logging
2. **Device Management**: Works with the existing device approval and activation system
3. **Heartbeat System**: Uses the existing heartbeat infrastructure
4. **Dashboard**: Device status is already displayed in the device list

## Troubleshooting

### No Alerts Received
1. Check if device offline alert rules are configured and active
2. Verify SMS/email credentials are properly configured
3. Check device health service status via API
4. Ensure devices are marked as active and approved

### False Positives
1. Adjust the offline threshold if devices have irregular heartbeat patterns
2. Check network connectivity between devices and server
3. Verify heartbeat intervals are properly configured for each device
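
One way to judge whether the threshold suits a device is to compare it with the largest gap between that device's recorded heartbeats. The sketch below assumes you can export heartbeat timestamps for a device; the timestamps shown are made up for illustration.

```python
from datetime import datetime, timedelta


def largest_gap(timestamps):
    """Return the longest interval between consecutive heartbeats."""
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return max(gaps, default=timedelta(0))


# Hypothetical heartbeat history for one device.
beats = [
    datetime(2025, 9, 7, 10, 0),
    datetime(2025, 9, 7, 10, 10),
    datetime(2025, 9, 7, 10, 42),
    datetime(2025, 9, 7, 10, 50),
]
threshold = timedelta(minutes=30)  # documented default
gap = largest_gap(beats)
if gap > threshold:
    print(f"Largest gap {gap} exceeds the threshold; expect false positives")
```

If the largest normal gap regularly exceeds the threshold, raising the threshold or shortening the device's heartbeat interval are the two options the steps above point to.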

### Service Not Running
1. Check server logs for startup errors
2. Verify database connectivity
3. Restart the server to reinitialize the service

## Monitoring and Logs

- Service status is logged to console with timestamps
- Alert sending is logged with recipient and status information
- Manual health checks can be triggered via API for testing
- Service automatically handles graceful shutdown on server restart