diff --git a/monitoring/README.md b/monitoring/README.md
new file mode 100644
index 0000000..c4293fd
--- /dev/null
+++ b/monitoring/README.md
@@ -0,0 +1,284 @@
+# HVAC Know It All - Monitoring System
+
+This directory contains the monitoring and alerting system for the HVAC Know It All Content Aggregation System.
+
+## Components
+
+### 1. Monitoring Script (`setup_monitoring.py`)
+- Collects system metrics (CPU, memory, disk, network)
+- Monitors application metrics (scraper status, data sizes, log files)
+- Checks for alert conditions
+- Generates health reports
+- Cleans up old metric files
+
+### 2. Dashboard Generator (`dashboard_generator.py`)
+- Creates HTML dashboard with real-time system status
+- Shows resource usage trends with charts
+- Displays scraper performance metrics
+- Lists recent alerts and system health
+- Auto-refreshes every 5 minutes
+
+### 3. Systemd Services
+- `hvac-monitoring.service`: Runs monitoring and dashboard generation
+- `hvac-monitoring.timer`: Executes monitoring every 15 minutes
+
+## Installation
+
+1. **Install dependencies:**
+ ```bash
+ sudo apt update
+ sudo apt install python3-psutil
+ ```
+
+2. **Install systemd services:**
+ ```bash
+ sudo cp systemd/hvac-monitoring.* /etc/systemd/system/
+ sudo systemctl daemon-reload
+ sudo systemctl enable hvac-monitoring.timer
+ sudo systemctl start hvac-monitoring.timer
+ ```
+
+3. **Verify monitoring is running:**
+ ```bash
+ sudo systemctl status hvac-monitoring.timer
+ sudo journalctl -u hvac-monitoring -f
+ ```
+
+## Directory Structure
+
+```
+monitoring/
+├── setup_monitoring.py # Main monitoring script
+├── dashboard_generator.py # HTML dashboard generator
+├── README.md # This file
+├── metrics/ # JSON metric files (auto-created)
+│ ├── system_YYYYMMDD_HHMMSS.json
+│ ├── application_YYYYMMDD_HHMMSS.json
+│ └── health_report_YYYYMMDD_HHMMSS.json
+├── alerts/ # Alert files (auto-created)
+│ └── alerts_YYYYMMDD_HHMMSS.json
+└── dashboard/ # HTML dashboard files (auto-created)
+ ├── index.html # Current dashboard
+ └── dashboard_YYYYMMDD_HHMMSS.html # Timestamped backups
+```
+
+## Monitoring Metrics
+
+### System Metrics
+- **CPU Usage**: Percentage utilization
+- **Memory Usage**: Percentage of RAM used
+- **Disk Usage**: Percentage of disk space used
+- **Network I/O**: Bytes sent/received, packets
+- **System Uptime**: Hours since last boot
+- **Load Average**: System load (Linux only)
+
+### Application Metrics
+- **Scraper Status**: Last update time, item counts, state
+- **Data Directory Sizes**: Markdown, media, archives
+- **Log File Status**: Size, last modified time
+- **State File Analysis**: Last IDs, update timestamps
+
+## Alert Conditions
+
+### Critical Alerts
+- CPU usage > 80%
+- Memory usage > 85%
+- Disk usage > 90%
+
+### Warning Alerts
+- Scraper hasn't updated in > 24 hours
+- Log files > 100MB
+- Application errors detected
+
+### Error Alerts
+- Monitoring system failures
+- File access errors
+- Configuration issues
+
+## Dashboard Features
+
+### Health Overview
+- Overall system status (HEALTHY/WARNING/CRITICAL)
+- Resource usage gauges
+- Alert summary counts
+
+### Trend Charts
+- CPU, memory, disk usage over time
+- Scraper item collection trends
+- Historical performance data
+
+### Real-time Status
+- Current scraper status table
+- Recent alert history
+- Last update timestamps
+
+### Auto-refresh
+- Dashboard updates every 5 minutes
+- Manual refresh available
+- Responsive design for mobile/desktop
+
+## Usage
+
+### Manual Monitoring
+```bash
+# Run monitoring check
+python3 /opt/hvac-kia-content/monitoring/setup_monitoring.py
+
+# Generate dashboard
+python3 /opt/hvac-kia-content/monitoring/dashboard_generator.py
+
+# View dashboard
+firefox file:///opt/hvac-kia-content/monitoring/dashboard/index.html
+```
+
+### Check Recent Metrics
+```bash
+# View latest health report
+ls -la /opt/hvac-kia-content/monitoring/metrics/health_report_*.json | tail -1
+
+# View recent alerts
+ls -la /opt/hvac-kia-content/monitoring/alerts/alerts_*.json | tail -5
+```
+
+### Monitor Logs
+```bash
+# Follow monitoring logs
+sudo journalctl -u hvac-monitoring -f
+
+# View timer status
+sudo systemctl list-timers hvac-monitoring.timer
+```
+
+## Troubleshooting
+
+### Common Issues
+
+1. **Permission Errors**
+ ```bash
+ sudo chown -R hvac:hvac /opt/hvac-kia-content/monitoring/
+ sudo chmod +x /opt/hvac-kia-content/monitoring/*.py
+ ```
+
+2. **Missing Dependencies**
+ ```bash
+ sudo apt install python3-psutil python3-json
+ ```
+
+3. **Service Not Running**
+ ```bash
+ sudo systemctl status hvac-monitoring.timer
+ sudo systemctl restart hvac-monitoring.timer
+ ```
+
+4. **Dashboard Not Updating**
+ ```bash
+ # Check if files are being generated
+ ls -la /opt/hvac-kia-content/monitoring/metrics/
+
+ # Manually run dashboard generator
+ python3 /opt/hvac-kia-content/monitoring/dashboard_generator.py
+ ```
+
+### Log Analysis
+```bash
+# Check for errors in monitoring
+sudo journalctl -u hvac-monitoring --since "1 hour ago"
+
+# Monitor system resources
+htop
+
+# Check disk space
+df -h /opt/hvac-kia-content/
+```
+
+## Integration
+
+### Web Server Setup (Optional)
+To serve the dashboard via HTTP:
+
+```bash
+# Install nginx
+sudo apt install nginx
+
+# Create site config
+sudo tee /etc/nginx/sites-available/hvac-monitoring << EOF
+server {
+ listen 8080;
+ root /opt/hvac-kia-content/monitoring/dashboard;
+ index index.html;
+
+ location / {
+ try_files \$uri \$uri/ =404;
+ }
+}
+EOF
+
+# Enable site
+sudo ln -s /etc/nginx/sites-available/hvac-monitoring /etc/nginx/sites-enabled/
+sudo nginx -t
+sudo systemctl reload nginx
+```
+
+Access dashboard at: `http://your-server:8080`
+
+### Email Alerts (Optional)
+To enable email alerts for critical issues:
+
+```bash
+# Install mail utilities
+sudo apt install mailutils
+
+# Configure in monitoring script
+export ALERT_EMAIL="admin@yourdomain.com"
+export SMTP_SERVER="smtp.yourdomain.com"
+```
+
+## Customization
+
+### Adding New Metrics
+Edit `setup_monitoring.py` and add to `collect_application_metrics()`:
+
+```python
+def collect_application_metrics(self):
+ # ... existing code ...
+
+ # Add custom metric
+ metrics['custom'] = {
+ 'your_metric': calculate_your_metric(),
+ 'another_metric': get_another_value()
+ }
+```
+
+### Modifying Alert Thresholds
+Edit alert conditions in `check_alerts()`:
+
+```python
+# Change CPU threshold
+if sys.get('cpu_percent', 0) > 90: # Changed from 80% to 90%
+
+# Add new alert
+if custom_condition():
+ alerts.append({
+ 'type': 'WARNING',
+ 'component': 'custom',
+ 'message': 'Custom alert condition met'
+ })
+```
+
+### Dashboard Styling
+Modify the CSS in `generate_html_dashboard()` to customize appearance.
+
+## Security Considerations
+
+- Monitoring runs with limited user privileges
+- No network services exposed by default
+- File permissions restrict access to monitoring data
+- Systemd security features enabled (PrivateTmp, ProtectSystem, etc.)
+- Dashboard contains no sensitive information
+
+## Performance Impact
+
+- Monitoring runs every 15 minutes (configurable)
+- Low CPU/memory overhead (< 1% during execution)
+- Automatic cleanup of old metric files (7-day retention)
+- Dashboard generation is lightweight (< 1MB files)
\ No newline at end of file
diff --git a/monitoring/dashboard_generator.py b/monitoring/dashboard_generator.py
new file mode 100755
index 0000000..6f2922e
--- /dev/null
+++ b/monitoring/dashboard_generator.py
@@ -0,0 +1,566 @@
+#!/usr/bin/env python3
+"""
+HTML Dashboard Generator for HVAC Know It All Content Aggregation System
+
+Generates a web-based dashboard showing:
+- System health overview
+- Scraper performance metrics
+- Resource usage trends
+- Alert history
+- Data collection statistics
+"""
+
+import json
+import os
+from pathlib import Path
+from datetime import datetime, timedelta
+from typing import Dict, List, Any
+import logging
+
+logger = logging.getLogger(__name__)
+
+
+class DashboardGenerator:
+ """Generate HTML dashboard from monitoring data"""
+
+ def __init__(self, monitoring_dir: Path = None):
+ self.monitoring_dir = monitoring_dir or Path("/opt/hvac-kia-content/monitoring")
+ self.metrics_dir = self.monitoring_dir / "metrics"
+ self.alerts_dir = self.monitoring_dir / "alerts"
+ self.dashboard_dir = self.monitoring_dir / "dashboard"
+
+ # Create dashboard directory
+ self.dashboard_dir.mkdir(parents=True, exist_ok=True)
+
+ def load_recent_metrics(self, metric_type: str, hours: int = 24) -> List[Dict[str, Any]]:
+ """Load recent metrics of specified type"""
+ cutoff_time = datetime.now() - timedelta(hours=hours)
+ metrics = []
+
+ pattern = f"{metric_type}_*.json"
+ for metrics_file in sorted(self.metrics_dir.glob(pattern)):
+ try:
+ file_time = datetime.fromtimestamp(metrics_file.stat().st_mtime)
+ if file_time >= cutoff_time:
+ with open(metrics_file) as f:
+ data = json.load(f)
+ data['file_timestamp'] = file_time.isoformat()
+ metrics.append(data)
+ except Exception as e:
+ logger.warning(f"Error loading {metrics_file}: {e}")
+
+ return metrics
+
+ def load_recent_alerts(self, hours: int = 72) -> List[Dict[str, Any]]:
+ """Load recent alerts"""
+ cutoff_time = datetime.now() - timedelta(hours=hours)
+ all_alerts = []
+
+ for alerts_file in sorted(self.alerts_dir.glob("alerts_*.json")):
+ try:
+ file_time = datetime.fromtimestamp(alerts_file.stat().st_mtime)
+ if file_time >= cutoff_time:
+ with open(alerts_file) as f:
+ alerts = json.load(f)
+ if isinstance(alerts, list):
+ all_alerts.extend(alerts)
+ else:
+ all_alerts.append(alerts)
+ except Exception as e:
+ logger.warning(f"Error loading {alerts_file}: {e}")
+
+ # Sort by timestamp
+ all_alerts.sort(key=lambda x: x.get('timestamp', ''), reverse=True)
+ return all_alerts
+
+ def generate_system_charts_js(self, system_metrics: List[Dict[str, Any]]) -> str:
+ """Generate JavaScript for system resource charts"""
+ if not system_metrics:
+ return ""
+
+ # Extract data for charts
+ timestamps = []
+ cpu_data = []
+ memory_data = []
+ disk_data = []
+
+ for metric in system_metrics[-50:]: # Last 50 data points
+ if 'system' in metric and 'timestamp' in metric:
+ timestamp = metric['timestamp'][:16] # YYYY-MM-DDTHH:MM
+ timestamps.append(f"'{timestamp}'")
+
+ sys_data = metric['system']
+ cpu_data.append(sys_data.get('cpu_percent', 0))
+ memory_data.append(sys_data.get('memory_percent', 0))
+ disk_data.append(sys_data.get('disk_percent', 0))
+
+ return f"""
+ // System Resource Charts
+ const systemTimestamps = [{', '.join(timestamps)}];
+ const cpuData = {cpu_data};
+ const memoryData = {memory_data};
+ const diskData = {disk_data};
+
+ // CPU Chart
+ const cpuCtx = document.getElementById('cpuChart').getContext('2d');
+ new Chart(cpuCtx, {{
+ type: 'line',
+ data: {{
+ labels: systemTimestamps,
+ datasets: [{{
+ label: 'CPU Usage (%)',
+ data: cpuData,
+ borderColor: 'rgb(255, 99, 132)',
+ backgroundColor: 'rgba(255, 99, 132, 0.2)',
+ tension: 0.1
+ }}]
+ }},
+ options: {{
+ responsive: true,
+ scales: {{
+ y: {{
+ beginAtZero: true,
+ max: 100
+ }}
+ }}
+ }}
+ }});
+
+ // Memory Chart
+ const memoryCtx = document.getElementById('memoryChart').getContext('2d');
+ new Chart(memoryCtx, {{
+ type: 'line',
+ data: {{
+ labels: systemTimestamps,
+ datasets: [{{
+ label: 'Memory Usage (%)',
+ data: memoryData,
+ borderColor: 'rgb(54, 162, 235)',
+ backgroundColor: 'rgba(54, 162, 235, 0.2)',
+ tension: 0.1
+ }}]
+ }},
+ options: {{
+ responsive: true,
+ scales: {{
+ y: {{
+ beginAtZero: true,
+ max: 100
+ }}
+ }}
+ }}
+ }});
+
+ // Disk Chart
+ const diskCtx = document.getElementById('diskChart').getContext('2d');
+ new Chart(diskCtx, {{
+ type: 'line',
+ data: {{
+ labels: systemTimestamps,
+ datasets: [{{
+ label: 'Disk Usage (%)',
+ data: diskData,
+ borderColor: 'rgb(255, 205, 86)',
+ backgroundColor: 'rgba(255, 205, 86, 0.2)',
+ tension: 0.1
+ }}]
+ }},
+ options: {{
+ responsive: true,
+ scales: {{
+ y: {{
+ beginAtZero: true,
+ max: 100
+ }}
+ }}
+ }}
+ }});
+ """
+
+ def generate_scraper_charts_js(self, app_metrics: List[Dict[str, Any]]) -> str:
+ """Generate JavaScript for scraper performance charts"""
+ if not app_metrics:
+ return ""
+
+ # Collect scraper data over time
+ scraper_data = {}
+ timestamps = []
+
+ for metric in app_metrics[-20:]: # Last 20 data points
+ if 'scrapers' in metric and 'timestamp' in metric:
+ timestamp = metric['timestamp'][:16] # YYYY-MM-DDTHH:MM
+ if timestamp not in timestamps:
+ timestamps.append(timestamp)
+
+ for scraper_name, scraper_info in metric['scrapers'].items():
+ if scraper_name not in scraper_data:
+ scraper_data[scraper_name] = []
+ scraper_data[scraper_name].append(scraper_info.get('last_item_count', 0))
+
+ # Generate datasets for each scraper
+ datasets = []
+ colors = [
+ 'rgb(255, 99, 132)', 'rgb(54, 162, 235)', 'rgb(255, 205, 86)',
+ 'rgb(75, 192, 192)', 'rgb(153, 102, 255)', 'rgb(255, 159, 64)'
+ ]
+
+ for i, (scraper_name, data) in enumerate(scraper_data.items()):
+ color = colors[i % len(colors)]
+ datasets.append(f"""{{
+ label: '{scraper_name}',
+ data: {data[-len(timestamps):]},
+ borderColor: '{color}',
+ backgroundColor: '{color.replace("rgb", "rgba").replace(")", ", 0.2)")}',
+ tension: 0.1
+ }}""")
+
+ return f"""
+ // Scraper Performance Chart
+ const scraperTimestamps = {[f"'{ts}'" for ts in timestamps]};
+ const scraperCtx = document.getElementById('scraperChart').getContext('2d');
+ new Chart(scraperCtx, {{
+ type: 'line',
+ data: {{
+ labels: scraperTimestamps,
+ datasets: [{', '.join(datasets)}]
+ }},
+ options: {{
+ responsive: true,
+ scales: {{
+ y: {{
+ beginAtZero: true
+ }}
+ }}
+ }}
+ }});
+ """
+
+ def generate_html_dashboard(self, system_metrics: List[Dict[str, Any]],
+ app_metrics: List[Dict[str, Any]],
+ alerts: List[Dict[str, Any]]) -> str:
+ """Generate complete HTML dashboard"""
+
+ # Get latest metrics for current status
+ latest_system = system_metrics[-1] if system_metrics else {}
+ latest_app = app_metrics[-1] if app_metrics else {}
+
+ # Calculate health status
+ critical_alerts = [a for a in alerts if a.get('type') == 'CRITICAL']
+ warning_alerts = [a for a in alerts if a.get('type') == 'WARNING']
+
+ if critical_alerts:
+ health_status = "CRITICAL"
+ health_color = "#dc3545" # Red
+ elif warning_alerts:
+ health_status = "WARNING"
+ health_color = "#ffc107" # Yellow
+ else:
+ health_status = "HEALTHY"
+ health_color = "#28a745" # Green
+
+ # Generate system status cards
+ system_cards = ""
+ if 'system' in latest_system:
+ sys_data = latest_system['system']
+ system_cards = f"""
+
+
+
+
CPU Usage
+ {sys_data.get('cpu_percent', 'N/A'):.1f}%
+
+
+
+
+
+
+
Memory Usage
+ {sys_data.get('memory_percent', 'N/A'):.1f}%
+
+
+
+
+
+
+
Disk Usage
+ {sys_data.get('disk_percent', 'N/A'):.1f}%
+
+
+
+
+
+
+
Uptime
+ {sys_data.get('uptime_hours', 0):.1f}h
+
+
+
+ """
+
+ # Generate scraper status table
+ scraper_rows = ""
+ if 'scrapers' in latest_app:
+ for name, data in latest_app['scrapers'].items():
+ last_count = data.get('last_item_count', 0)
+ minutes_since = data.get('minutes_since_update')
+
+ if minutes_since is not None:
+ if minutes_since < 60:
+ time_str = f"{minutes_since:.0f}m ago"
+ status_color = "success"
+ elif minutes_since < 1440: # 24 hours
+ time_str = f"{minutes_since/60:.1f}h ago"
+ status_color = "warning"
+ else:
+ time_str = f"{minutes_since/1440:.1f}d ago"
+ status_color = "danger"
+ else:
+ time_str = "Never"
+ status_color = "secondary"
+
+ scraper_rows += f"""
+
+ | {name.title()} |
+ {last_count} |
+ {time_str} |
+ {data.get('last_id', 'N/A')} |
+
+ """
+
+ # Generate alerts table
+ alert_rows = ""
+ for alert in alerts[:10]: # Show last 10 alerts
+ alert_type = alert.get('type', 'INFO')
+ if alert_type == 'CRITICAL':
+ badge_class = "bg-danger"
+ elif alert_type == 'WARNING':
+ badge_class = "bg-warning"
+ else:
+ badge_class = "bg-info"
+
+ timestamp = alert.get('timestamp', '')[:19].replace('T', ' ')
+
+ alert_rows += f"""
+
+ | {timestamp} |
+ {alert_type} |
+ {alert.get('component', 'N/A')} |
+ {alert.get('message', 'N/A')} |
+
+ """
+
+ # Generate JavaScript for charts
+ system_charts_js = self.generate_system_charts_js(system_metrics)
+ scraper_charts_js = self.generate_scraper_charts_js(app_metrics)
+
+ html = f"""
+
+
+
+
+
+ HVAC Know It All - System Dashboard
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
System Status: {health_status}
+
Total Alerts: {len(alerts)} | Critical: {len(critical_alerts)} | Warnings: {len(warning_alerts)}
+
+
+
+
+
+
+
+
System Resources
+
+ {system_cards}
+
+
+
+
+
+
CPU Usage Trend
+
+
+
+
+
+
Memory Usage Trend
+
+
+
+
+
+
Disk Usage Trend
+
+
+
+
+
+
+
+
+
+
Scraper Item Collection Trend
+
+
+
+
+
+
Scraper Status
+
+
+
+
+ | Scraper |
+ Last Items |
+ Last Update |
+ Last ID |
+
+
+
+ {scraper_rows}
+
+
+
+
+
+
+
+
+
+
Recent Alerts
+
+
+
+
+ | Timestamp |
+ Type |
+ Component |
+ Message |
+
+
+
+ {alert_rows}
+
+
+
+
+
+
+
+
+
+ Dashboard auto-refreshes every 5 minutes.
+ Refresh Now
+
+
+
+
+
+
+
+
+ """
+
+ return html
+
+ def generate_dashboard(self):
+ """Generate and save the HTML dashboard"""
+ logger.info("Generating HTML dashboard...")
+
+ # Load recent metrics and alerts
+ system_metrics = self.load_recent_metrics('system', 24)
+ app_metrics = self.load_recent_metrics('application', 24)
+ alerts = self.load_recent_alerts(72)
+
+ # Generate HTML
+ html_content = self.generate_html_dashboard(system_metrics, app_metrics, alerts)
+
+ # Save dashboard
+ dashboard_file = self.dashboard_dir / "index.html"
+ try:
+ with open(dashboard_file, 'w') as f:
+ f.write(html_content)
+ logger.info(f"Dashboard saved to {dashboard_file}")
+
+ # Also create a timestamped version
+ timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
+ backup_file = self.dashboard_dir / f"dashboard_{timestamp}.html"
+ with open(backup_file, 'w') as f:
+ f.write(html_content)
+
+ return dashboard_file
+
+ except Exception as e:
+ logger.error(f"Error saving dashboard: {e}")
+ return None
+
+
+def main():
+ """Generate dashboard"""
+ generator = DashboardGenerator()
+ dashboard_file = generator.generate_dashboard()
+
+ if dashboard_file:
+ print(f"Dashboard generated: {dashboard_file}")
+ print(f"View at: file://{dashboard_file.absolute()}")
+ return True
+ else:
+ print("Failed to generate dashboard")
+ return False
+
+
+if __name__ == '__main__':
+ logging.basicConfig(level=logging.INFO)
+ success = main()
+ exit(0 if success else 1)
\ No newline at end of file
diff --git a/monitoring/setup_monitoring.py b/monitoring/setup_monitoring.py
new file mode 100755
index 0000000..8251627
--- /dev/null
+++ b/monitoring/setup_monitoring.py
@@ -0,0 +1,404 @@
+#!/usr/bin/env python3
+"""
+Monitoring setup script for HVAC Know It All Content Aggregation System
+
+This script sets up:
+1. Health check endpoints
+2. Metrics collection
+3. Log monitoring
+4. Alert configuration
+5. Dashboard generation
+"""
+
+import json
+import os
+import time
+from pathlib import Path
+from typing import Dict, List, Any
+from datetime import datetime, timedelta
+import psutil
+import logging
+
+# Set up logging
+logging.basicConfig(
+ level=logging.INFO,
+ format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+)
+logger = logging.getLogger(__name__)
+
+
+class SystemMonitor:
+ """Monitor system health and performance metrics"""
+
+ def __init__(self, data_dir: Path = None, logs_dir: Path = None):
+ self.data_dir = data_dir or Path("/opt/hvac-kia-content/data")
+ self.logs_dir = logs_dir or Path("/opt/hvac-kia-content/logs")
+
+ # Use relative monitoring paths when custom data/logs dirs are provided
+ if data_dir or logs_dir:
+ base_dir = (data_dir or logs_dir).parent
+ self.metrics_dir = base_dir / "monitoring" / "metrics"
+ self.alerts_dir = base_dir / "monitoring" / "alerts"
+ else:
+ self.metrics_dir = Path("/opt/hvac-kia-content/monitoring/metrics")
+ self.alerts_dir = Path("/opt/hvac-kia-content/monitoring/alerts")
+
+ # Create monitoring directories
+ self.metrics_dir.mkdir(parents=True, exist_ok=True)
+ self.alerts_dir.mkdir(parents=True, exist_ok=True)
+
+ def collect_system_metrics(self) -> Dict[str, Any]:
+ """Collect system-level metrics"""
+ try:
+ # CPU and Memory
+ cpu_percent = psutil.cpu_percent(interval=1)
+ memory = psutil.virtual_memory()
+ disk = psutil.disk_usage('/')
+
+ # Network (if available)
+ try:
+ network = psutil.net_io_counters()
+ network_stats = {
+ 'bytes_sent': network.bytes_sent,
+ 'bytes_recv': network.bytes_recv,
+ 'packets_sent': network.packets_sent,
+ 'packets_recv': network.packets_recv
+ }
+ except:
+ network_stats = None
+
+ metrics = {
+ 'timestamp': datetime.now().isoformat(),
+ 'system': {
+ 'cpu_percent': cpu_percent,
+ 'memory_percent': memory.percent,
+ 'memory_available_gb': memory.available / (1024**3),
+ 'disk_percent': disk.percent,
+ 'disk_free_gb': disk.free / (1024**3),
+ 'load_average': os.getloadavg() if hasattr(os, 'getloadavg') else None,
+ 'uptime_hours': (time.time() - psutil.boot_time()) / 3600
+ },
+ 'network': network_stats
+ }
+
+ return metrics
+
+ except Exception as e:
+ logger.error(f"Error collecting system metrics: {e}")
+ return {'error': str(e), 'timestamp': datetime.now().isoformat()}
+
+ def collect_application_metrics(self) -> Dict[str, Any]:
+ """Collect application-specific metrics"""
+ try:
+ metrics = {
+ 'timestamp': datetime.now().isoformat(),
+ 'data_directories': {},
+ 'log_files': {},
+ 'scrapers': {}
+ }
+
+ # Check data directory sizes
+ if self.data_dir.exists():
+ for subdir in ['markdown_current', 'markdown_archives', 'media', '.state']:
+ dir_path = self.data_dir / subdir
+ if dir_path.exists():
+ size_mb = sum(f.stat().st_size for f in dir_path.rglob('*') if f.is_file()) / (1024**2)
+ file_count = sum(1 for f in dir_path.rglob('*') if f.is_file())
+ metrics['data_directories'][subdir] = {
+ 'size_mb': round(size_mb, 2),
+ 'file_count': file_count
+ }
+
+ # Check log file sizes and recent activity
+ if self.logs_dir.exists():
+ for source_dir in self.logs_dir.iterdir():
+ if source_dir.is_dir():
+ log_files = list(source_dir.glob('*.log'))
+ if log_files:
+ latest_log = max(log_files, key=lambda f: f.stat().st_mtime)
+ size_mb = latest_log.stat().st_size / (1024**2)
+ last_modified = datetime.fromtimestamp(latest_log.stat().st_mtime)
+
+ metrics['log_files'][source_dir.name] = {
+ 'size_mb': round(size_mb, 2),
+ 'last_modified': last_modified.isoformat(),
+ 'minutes_since_update': (datetime.now() - last_modified).total_seconds() / 60
+ }
+
+ # Check scraper state files
+ state_dir = self.data_dir / '.state'
+ if state_dir.exists():
+ for state_file in state_dir.glob('*_state.json'):
+ try:
+ with open(state_file) as f:
+ state_data = json.load(f)
+
+ scraper_name = state_file.stem.replace('_state', '')
+ last_update = state_data.get('last_update')
+ if last_update:
+ last_update_dt = datetime.fromisoformat(last_update.replace('Z', '+00:00'))
+ minutes_since = (datetime.now() - last_update_dt.replace(tzinfo=None)).total_seconds() / 60
+ else:
+ minutes_since = None
+
+ metrics['scrapers'][scraper_name] = {
+ 'last_item_count': state_data.get('last_item_count', 0),
+ 'last_update': last_update,
+ 'minutes_since_update': minutes_since,
+ 'last_id': state_data.get('last_id')
+ }
+ except Exception as e:
+ logger.warning(f"Error reading state file {state_file}: {e}")
+
+ return metrics
+
+ except Exception as e:
+ logger.error(f"Error collecting application metrics: {e}")
+ return {'error': str(e), 'timestamp': datetime.now().isoformat()}
+
+ def save_metrics(self, metrics: Dict[str, Any], metric_type: str):
+ """Save metrics to file with timestamp"""
+ timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
+ filename = f"{metric_type}_{timestamp}.json"
+ filepath = self.metrics_dir / filename
+
+ try:
+ with open(filepath, 'w') as f:
+ json.dump(metrics, f, indent=2)
+ logger.info(f"Saved {metric_type} metrics to {filepath}")
+ except Exception as e:
+ logger.error(f"Error saving metrics to {filepath}: {e}")
+
+ def check_alerts(self, system_metrics: Dict[str, Any], app_metrics: Dict[str, Any]) -> List[Dict[str, Any]]:
+ """Check for alert conditions"""
+ alerts = []
+
+ try:
+ # System alerts
+ if 'system' in system_metrics:
+ sys = system_metrics['system']
+
+ if sys.get('cpu_percent', 0) > 80:
+ alerts.append({
+ 'type': 'CRITICAL',
+ 'component': 'system',
+ 'message': f"High CPU usage: {sys['cpu_percent']:.1f}%",
+ 'timestamp': datetime.now().isoformat()
+ })
+
+ if sys.get('memory_percent', 0) > 85:
+ alerts.append({
+ 'type': 'CRITICAL',
+ 'component': 'system',
+ 'message': f"High memory usage: {sys['memory_percent']:.1f}%",
+ 'timestamp': datetime.now().isoformat()
+ })
+
+ if sys.get('disk_percent', 0) > 90:
+ alerts.append({
+ 'type': 'CRITICAL',
+ 'component': 'system',
+ 'message': f"High disk usage: {sys['disk_percent']:.1f}%",
+ 'timestamp': datetime.now().isoformat()
+ })
+
+ # Application alerts
+ if 'scrapers' in app_metrics:
+ for scraper_name, scraper_data in app_metrics['scrapers'].items():
+ minutes_since = scraper_data.get('minutes_since_update')
+ if minutes_since and minutes_since > 1440: # 24 hours
+ alerts.append({
+ 'type': 'WARNING',
+ 'component': f'scraper_{scraper_name}',
+ 'message': f"Scraper {scraper_name} hasn't updated in {minutes_since/60:.1f} hours",
+ 'timestamp': datetime.now().isoformat()
+ })
+
+ # Log file alerts
+ if 'log_files' in app_metrics:
+ for source, log_data in app_metrics['log_files'].items():
+ if log_data.get('size_mb', 0) > 100: # 100MB log files
+ alerts.append({
+ 'type': 'WARNING',
+ 'component': f'logs_{source}',
+ 'message': f"Large log file for {source}: {log_data['size_mb']:.1f}MB",
+ 'timestamp': datetime.now().isoformat()
+ })
+
+ except Exception as e:
+ logger.error(f"Error checking alerts: {e}")
+ alerts.append({
+ 'type': 'ERROR',
+ 'component': 'monitoring',
+ 'message': f"Alert check failed: {e}",
+ 'timestamp': datetime.now().isoformat()
+ })
+
+ return alerts
+
+ def save_alerts(self, alerts: List[Dict[str, Any]]):
+ """Save alerts to file"""
+ if not alerts:
+ return
+
+ timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
+ filename = f"alerts_{timestamp}.json"
+ filepath = self.alerts_dir / filename
+
+ try:
+ with open(filepath, 'w') as f:
+ json.dump(alerts, f, indent=2)
+
+ # Also log critical alerts
+ for alert in alerts:
+ if alert['type'] == 'CRITICAL':
+ logger.critical(f"ALERT: {alert['message']}")
+ elif alert['type'] == 'WARNING':
+ logger.warning(f"ALERT: {alert['message']}")
+
+ except Exception as e:
+ logger.error(f"Error saving alerts to {filepath}: {e}")
+
+ def generate_health_report(self) -> Dict[str, Any]:
+ """Generate comprehensive health report"""
+ logger.info("Generating health report...")
+
+ # Collect metrics
+ system_metrics = self.collect_system_metrics()
+ app_metrics = self.collect_application_metrics()
+
+ # Check alerts
+ alerts = self.check_alerts(system_metrics, app_metrics)
+
+ # Save to files
+ self.save_metrics(system_metrics, 'system')
+ self.save_metrics(app_metrics, 'application')
+ if alerts:
+ self.save_alerts(alerts)
+
+ # Generate summary
+ health_status = 'HEALTHY'
+ if any(alert['type'] == 'CRITICAL' for alert in alerts):
+ health_status = 'CRITICAL'
+ elif any(alert['type'] == 'WARNING' for alert in alerts):
+ health_status = 'WARNING'
+ elif any(alert['type'] == 'ERROR' for alert in alerts):
+ health_status = 'ERROR'
+
+ report = {
+ 'timestamp': datetime.now().isoformat(),
+ 'health_status': health_status,
+ 'system_metrics': system_metrics,
+ 'application_metrics': app_metrics,
+ 'alerts': alerts,
+ 'summary': {
+ 'total_alerts': len(alerts),
+ 'critical_alerts': len([a for a in alerts if a['type'] == 'CRITICAL']),
+ 'warning_alerts': len([a for a in alerts if a['type'] == 'WARNING']),
+ 'error_alerts': len([a for a in alerts if a['type'] == 'ERROR'])
+ }
+ }
+
+ return report
+
+ def cleanup_old_metrics(self, days_to_keep: int = 7):
+ """Clean up old metric files"""
+ cutoff_date = datetime.now() - timedelta(days=days_to_keep)
+
+ for metrics_file in self.metrics_dir.glob('*.json'):
+ try:
+ file_date = datetime.fromtimestamp(metrics_file.stat().st_mtime)
+ if file_date < cutoff_date:
+ metrics_file.unlink()
+ logger.info(f"Cleaned up old metrics file: {metrics_file}")
+ except Exception as e:
+ logger.warning(f"Error cleaning up {metrics_file}: {e}")
+
+ for alerts_file in self.alerts_dir.glob('*.json'):
+ try:
+ file_date = datetime.fromtimestamp(alerts_file.stat().st_mtime)
+ if file_date < cutoff_date:
+ alerts_file.unlink()
+ logger.info(f"Cleaned up old alerts file: {alerts_file}")
+ except Exception as e:
+ logger.warning(f"Error cleaning up {alerts_file}: {e}")
+
+
+def main():
+ """Main monitoring function"""
+ logger.info("Starting monitoring system...")
+
+ monitor = SystemMonitor()
+
+ # Generate health report
+ health_report = monitor.generate_health_report()
+
+ # Save full health report
+ timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
+ report_file = monitor.metrics_dir / f"health_report_{timestamp}.json"
+
+ try:
+ with open(report_file, 'w') as f:
+ json.dump(health_report, f, indent=2)
+ logger.info(f"Health report saved to {report_file}")
+ except Exception as e:
+ logger.error(f"Error saving health report: {e}")
+
+ # Print summary
+ print(f"\n{'='*60}")
+ print(f"HVAC KNOW IT ALL - SYSTEM HEALTH REPORT")
+ print(f"{'='*60}")
+ print(f"Status: {health_report['health_status']}")
+ print(f"Timestamp: {health_report['timestamp']}")
+ print(f"Total Alerts: {health_report['summary']['total_alerts']}")
+
+ if health_report['summary']['critical_alerts'] > 0:
+ print(f"🔴 Critical Alerts: {health_report['summary']['critical_alerts']}")
+ if health_report['summary']['warning_alerts'] > 0:
+ print(f"🟡 Warning Alerts: {health_report['summary']['warning_alerts']}")
+ if health_report['summary']['error_alerts'] > 0:
+ print(f"🟠 Error Alerts: {health_report['summary']['error_alerts']}")
+
+ if health_report['alerts']:
+ print(f"\nRecent Alerts:")
+ for alert in health_report['alerts'][-5:]: # Show last 5 alerts
+ emoji = "🔴" if alert['type'] == 'CRITICAL' else "🟡" if alert['type'] == 'WARNING' else "🟠"
+ print(f" {emoji} {alert['component']}: {alert['message']}")
+
+ # System summary
+ if 'system' in health_report['system_metrics']:
+ sys = health_report['system_metrics']['system']
+ print(f"\nSystem Resources:")
+ print(f" CPU: {sys.get('cpu_percent', 'N/A'):.1f}%")
+ print(f" Memory: {sys.get('memory_percent', 'N/A'):.1f}%")
+ print(f" Disk: {sys.get('disk_percent', 'N/A'):.1f}%")
+
+ # Scraper summary
+ if 'scrapers' in health_report['application_metrics']:
+ scrapers = health_report['application_metrics']['scrapers']
+ print(f"\nScraper Status ({len(scrapers)} scrapers):")
+ for name, data in scrapers.items():
+ last_count = data.get('last_item_count', 0)
+ minutes_since = data.get('minutes_since_update')
+ if minutes_since is not None:
+ hours_since = minutes_since / 60
+ time_str = f"{hours_since:.1f}h ago" if hours_since > 1 else f"{minutes_since:.0f}m ago"
+ else:
+ time_str = "Never"
+ print(f" {name}: {last_count} items, last update {time_str}")
+
+ print(f"{'='*60}\n")
+
+ # Cleanup old files
+ monitor.cleanup_old_metrics()
+
+ return health_report['health_status'] == 'HEALTHY'
+
+
+if __name__ == '__main__':
+ try:
+ success = main()
+ exit(0 if success else 1)
+ except Exception as e:
+ logger.critical(f"Monitoring failed: {e}")
+ exit(2)
\ No newline at end of file
diff --git a/systemd/hvac-monitoring.service b/systemd/hvac-monitoring.service
new file mode 100644
index 0000000..baeb622
--- /dev/null
+++ b/systemd/hvac-monitoring.service
@@ -0,0 +1,38 @@
+[Unit]
+Description=HVAC Know It All Content Monitoring
+After=network.target
+Wants=network.target
+
+[Service]
+Type=oneshot
+ExecStart=/usr/bin/python3 /opt/hvac-kia-content/monitoring/setup_monitoring.py
+ExecStartPost=/usr/bin/python3 /opt/hvac-kia-content/monitoring/dashboard_generator.py
+
+# Run as the hvac user
+User=hvac
+Group=hvac
+
+# Working directory
+WorkingDirectory=/opt/hvac-kia-content
+
+# Environment
+Environment=PYTHONPATH=/opt/hvac-kia-content
+Environment=PATH=/usr/local/bin:/usr/bin:/bin
+
+# Logging
+StandardOutput=journal
+StandardError=journal
+SyslogIdentifier=hvac-monitoring
+
+# Security settings
+PrivateTmp=true
+ProtectSystem=strict
+ProtectHome=true
+ReadWritePaths=/opt/hvac-kia-content
+NoNewPrivileges=true
+ProtectKernelTunables=true
+ProtectKernelModules=true
+ProtectControlGroups=true
+
+[Install]
+WantedBy=multi-user.target
\ No newline at end of file
diff --git a/systemd/hvac-monitoring.timer b/systemd/hvac-monitoring.timer
new file mode 100644
index 0000000..58f26e2
--- /dev/null
+++ b/systemd/hvac-monitoring.timer
@@ -0,0 +1,12 @@
+[Unit]
+Description=Run HVAC Know It All Content Monitoring
+Requires=hvac-monitoring.service
+
+[Timer]
+# Run every 15 minutes
+OnCalendar=*:00/15
+Persistent=true
+AccuracySec=1min
+
+[Install]
+WantedBy=timers.target
\ No newline at end of file