# 9.4 Monitoring

## Monitoring - Basics

### What is monitored?

```
Monitoring areas
├── Availability (Is the system reachable?)
├── Performance (CPU, RAM, disk)
├── Network (traffic, latency)
├── Applications (logs, errors)
└── Security (intrusion attempts)
```

### Key metrics

| Metric | Description | Example target |
|--------|-------------|----------------|
| CPU | Utilization | < 80% |
| RAM | Memory utilization | < 85% |
| Disk | Disk space usage | < 90% |
| Network | Throughput | e.g. 100 Mbps |
| Latency | Response time | < 200 ms |

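A minimal sketch of how the example targets from the table could be checked automatically. The threshold values come from the table; the metric keys, the function `check_thresholds`, and the sample readings are hypothetical. Disk usage is read with the standard library's `shutil.disk_usage`; CPU, RAM, and latency readings would in practice come from an agent or exporter and are simply passed in as numbers here.

```python
import shutil

# Upper limits from the table (percent / milliseconds); key names are hypothetical
THRESHOLDS = {
    "cpu_percent": 80.0,
    "ram_percent": 85.0,
    "disk_percent": 90.0,
    "latency_ms": 200.0,
}


def disk_usage_percent(path: str = "/") -> float:
    """Used space of the filesystem containing `path`, in percent."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100


def check_thresholds(readings: dict[str, float]) -> list[str]:
    """Return one message per metric that has reached its limit."""
    violations = []
    for name, limit in THRESHOLDS.items():
        value = readings.get(name)
        if value is not None and value >= limit:
            violations.append(f"{name}={value:.1f} (limit {limit:.0f})")
    return violations


if __name__ == "__main__":
    readings = {
        "cpu_percent": 92.5,  # hypothetical sample value
        "ram_percent": 60.0,  # hypothetical sample value
        "disk_percent": disk_usage_percent("/"),
        "latency_ms": 150.0,  # hypothetical sample value
    }
    for message in check_thresholds(readings):
        print("ALERT:", message)
```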
---

## Monitoring Tools

### Nagios

```
Nagios - Features
├── Host monitoring
├── Service monitoring
├── Alerting
├── Plugins
└── Web interface
```

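Nagios checks are performed by plugins: small programs that print one status line and report the result through their exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN), optionally followed by performance data after a `|`. A minimal sketch of such a disk-space plugin in Python; the name `check_disk_sketch`, the path `/`, and the 80%/90% limits are assumptions for illustration.

```python
import shutil
import sys

# Nagios plugin exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(used_percent: float, warn: float = 80.0, crit: float = 90.0) -> tuple[int, str]:
    """Map a disk usage value to a Nagios state and a one-line status message."""
    if used_percent >= crit:
        state = CRITICAL
    elif used_percent >= warn:
        state = WARNING
    else:
        state = OK
    # Text before "|" is for humans; after "|" comes perfdata: label=value[UOM];warn;crit
    message = (f"DISK {STATE_NAMES[state]} - {used_percent:.1f}% used"
               f" | used={used_percent:.1f}%;{warn:.0f};{crit:.0f}")
    return state, message


def check_disk_sketch(path: str = "/") -> int:
    """Print the status line and return the exit code for `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        print(f"DISK UNKNOWN - {exc}")
        return UNKNOWN
    state, message = evaluate(usage.used / usage.total * 100)
    print(message)
    return state


if __name__ == "__main__":
    sys.exit(check_disk_sketch("/"))
```

Nagios runs the plugin on a schedule and only evaluates the exit code and this output line; real plugins usually take the limits as command-line arguments instead of hard-coding them.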
### Prometheus + Grafana

```
Stack
├── Prometheus: collects metrics
├── Alertmanager: alerts (routing notifications)
├── Grafana: visualization
└── Exporters: data sources
```

### Prometheus - Example

```yaml
# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'node'
    static_configs:
      # node_exporter listens on port 9100 by default
      - targets: ['localhost:9100']
```

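Prometheus pulls metrics over HTTP from each configured target (path `/metrics` by default) in a plain-text exposition format. The following stdlib-only sketch shows roughly what an exporter serves; in practice the node_exporter or the official `prometheus_client` library would be used. The metric name `app_uptime_seconds` and port 8000 are assumptions.

```python
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

START = time.monotonic()  # reference point for the hypothetical uptime gauge


def render_metrics(uptime_seconds: float) -> str:
    """Render one gauge in the Prometheus text exposition format."""
    return (
        "# HELP app_uptime_seconds Seconds since the application started.\n"
        "# TYPE app_uptime_seconds gauge\n"
        f"app_uptime_seconds {uptime_seconds:.3f}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(time.monotonic() - START).encode("utf-8")
        self.send_response(200)
        # Content type of the text-based exposition format (version 0.0.4)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    HTTPServer(("", 8000), MetricsHandler).serve_forever()
```

To scrape it, `prometheus.yml` would need an additional job with the target `localhost:8000`.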
### Practical example: Alert rules

```yaml
# alerts.yml
groups:
  - name: server_alerts
    rules:
      # High CPU utilization
      - alert: HighCPU
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU utilization on {{ $labels.instance }}"
          description: "CPU utilization has been above 80% for 5 minutes"

      # Low disk space
      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 10
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"

      # Service down
      - alert: ServiceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.job }} service is down"
```

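The `HighCPU` expression derives utilization from idle time: `rate(node_cpu_seconds_total{mode="idle"}[5m])` gives the idle seconds per second of each core (between 0 and 1), `avg by (instance)` averages over the cores, and `100 - ... * 100` turns that into busy percent. The same arithmetic in Python, as a simplified sketch with two counter samples per core; the real `rate()` also handles counter resets and extrapolation, which this ignores, and the sample numbers are made up.

```python
def cpu_busy_percent(idle_before: list[float], idle_after: list[float],
                     interval_seconds: float) -> float:
    """Busy CPU in percent from two samples of per-core idle-second counters.

    Mirrors: 100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[w])) * 100)
    """
    if len(idle_before) != len(idle_after) or not idle_before:
        raise ValueError("need one before/after sample per core")
    # Per-core idle rate: idle seconds accumulated per wall-clock second
    idle_rates = [(after - before) / interval_seconds
                  for before, after in zip(idle_before, idle_after)]
    avg_idle = sum(idle_rates) / len(idle_rates)
    return 100 - avg_idle * 100


# Two cores sampled 300 s (5 min) apart: core 0 idled 30 s, core 1 idled 60 s
busy = cpu_busy_percent([1000.0, 2000.0], [1030.0, 2060.0], 300.0)
print(f"{busy:.1f}% busy, above 80%: {busy > 80}")
```

With an average idle rate of 0.15, the result is about 85% busy, so the rule's condition would be true.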
### Practical example: Python logging

```python
import logging
import logging.handlers
import sys

# Configure the root logger: console output plus a size-based rotating log file
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.StreamHandler(sys.stdout),
        logging.handlers.RotatingFileHandler(
            'app.log',
            maxBytes=10_000_000,  # rotate at 10 MB
            backupCount=5         # keep app.log.1 ... app.log.5
        )
    ]
)

logger = logging.getLogger(__name__)

# Using the log levels
logger.debug("Detailed debug info")  # not emitted: the configured level is INFO
logger.info("Application started")
logger.warning("Warning: configuration missing")
logger.error("Error: database unreachable")
logger.critical("Critical: system must be shut down")
```

### Grafana dashboard

```
Grafana - Usage
1. Add a data source (Prometheus)
2. Create a dashboard
3. Configure panels (Graph, Stat, Table)
4. Set up alerts
```

---

## Logging

### Log management

```
Log levels
├── DEBUG: Detailed information
├── INFO: General information
├── WARNING: Warnings
├── ERROR: Errors
└── CRITICAL: Critical errors
```

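In Python's `logging` module these levels are ordered numerically (DEBUG=10, INFO=20, WARNING=30, ERROR=40, CRITICAL=50), and a logger drops every record below its configured level. A small sketch that makes this visible by collecting records in a list; the handler class `ListHandler` and the logger name `demo` are made up for illustration.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical handler that stores formatted records in a list."""

    def __init__(self) -> None:
        super().__init__()
        self.lines: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.lines.append(self.format(record))


logger = logging.getLogger("demo")
logger.propagate = False  # keep the demo output away from the root logger
handler = ListHandler()
handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))
logger.addHandler(handler)

logger.setLevel(logging.WARNING)  # only WARNING and above pass
logger.debug("cache miss")
logger.info("request handled")
logger.warning("disk 85% full")
logger.error("database unreachable")

print(handler.lines)
```

Only the WARNING and ERROR records end up in `handler.lines`; the DEBUG and INFO calls are discarded by the logger.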
### Centralized logging

```
ELK stack
├── Elasticsearch: search engine
├── Logstash: processing
├── Kibana: visualization
└── Beats: collection
```

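A central pipeline such as the ELK stack is easier to feed with structured log lines (one JSON object per line) than with free text, because fields like level and timestamp then do not have to be extracted with parsing patterns. A minimal sketch of a JSON formatter for Python's `logging`; the field names and the logger name `shop` are assumptions, and shipping the lines (e.g. with Beats) is not shown.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Format each log record as a single JSON line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            # ISO 8601 timestamp in UTC, taken from the record's creation time
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry, ensure_ascii=False)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("shop")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("Order %s placed", 4711)
```

Each call then produces one line such as `{"timestamp": "...", "level": "INFO", "logger": "shop", "message": "Order 4711 placed"}`.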
### Logging in Python

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

logger = logging.getLogger(__name__)
logger.info("Application started")
logger.error("An error occurred")
```

---

## Alerting

### Alert rules

```yaml
groups:
  - name: example
    rules:
      - alert: HighCPU
        # cpu_usage is a simplified placeholder metric; with node_exporter,
        # use the node_cpu_seconds_total expression from alerts.yml
        expr: cpu_usage > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU utilization on {{ $labels.instance }}"
```

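The `for: 5m` clause means that a rule whose expression is true first goes into the *pending* state and only becomes *firing* once the condition has held for the whole duration; if the condition clears in between, the alert returns to inactive. A simplified sketch of that state logic in Python, evaluated once per evaluation interval; the class `AlertRule` and the timestamps are hypothetical, and Prometheus itself additionally tracks labels, multiple series, and resolved notifications.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertRule:
    """Simplified model of an alert rule 'value > threshold' with a `for` duration."""
    threshold: float
    for_seconds: float
    active_since: Optional[float] = None  # when the condition first became true

    def evaluate(self, value: float, now: float) -> str:
        """Return 'inactive', 'pending' or 'firing' for one evaluation."""
        if value <= self.threshold:
            self.active_since = None  # condition cleared: reset the timer
            return "inactive"
        if self.active_since is None:
            self.active_since = now
        if now - self.active_since >= self.for_seconds:
            return "firing"
        return "pending"


# cpu_usage > 80 for 5m, evaluated every 60 s (timestamps in seconds)
rule = AlertRule(threshold=80, for_seconds=300)
samples = [85, 90, 88, 92, 95, 91, 70]
states = [rule.evaluate(v, now=i * 60) for i, v in enumerate(samples)]
print(states)
```

The rule stays pending for the first five evaluations, fires at the sixth (300 s after the condition first held), and returns to inactive once the value drops to 70. This is why short spikes do not trigger the alert.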
### Notification channels

| Channel | Use case |
|---------|----------|
| Email | Default |
| Slack | Team communication |
| PagerDuty | Incident management |
| SMS | Critical alerts |

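In the Prometheus stack the Alertmanager decides, based on labels such as `severity`, which channel an alert is sent to. A minimal sketch of this kind of routing in Python; the mapping of severities to the channels from the table is an assumption, and the actual sending (email, Slack webhook, PagerDuty, SMS gateway) is omitted.

```python
# Hypothetical routing: severity label -> notification channels from the table
ROUTES: dict[str, list[str]] = {
    "critical": ["PagerDuty", "SMS"],
    "warning": ["Slack"],
}
DEFAULT_CHANNELS = ["Email"]  # fallback for alerts without a matching severity


def route(alert: dict) -> list[str]:
    """Pick notification channels for an alert based on its severity label."""
    severity = alert.get("labels", {}).get("severity")
    return list(ROUTES.get(severity, DEFAULT_CHANNELS))


alert = {"labels": {"alertname": "DiskSpaceLow", "severity": "critical"}}
for channel in route(alert):
    print(f"notify via {channel}: {alert['labels']['alertname']}")
```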
---

## Availability

### Uptime

```
Uptime calculation (maximum downtime per year, 365 days = 8,760 h)
99%     → 87.6 hours/year offline (≈ 3.65 days)
99.9%   → 8.76 hours/year offline
99.99%  → 52.6 minutes/year offline
99.999% → 5.3 minutes/year offline
```

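The values follow from the permitted downtime share (100 - availability, in percent) multiplied by the length of the period. A short Python sketch, assuming a 365-day year (8,760 hours) and ignoring leap years.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 h; leap years ignored


def downtime_per_year_hours(availability_percent: float) -> float:
    """Maximum downtime per year in hours for a given availability."""
    return (100 - availability_percent) / 100 * HOURS_PER_YEAR


for availability in (99.0, 99.9, 99.99, 99.999):
    hours = downtime_per_year_hours(availability)
    print(f"{availability}% -> {hours:.2f} h = {hours * 60:.1f} min per year")
```

Each additional "nine" reduces the permitted downtime by a factor of ten.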
### Checks

```
Availability checks
├── Ping
├── Port check
├── HTTP response
├── Certificate (validity)
└── Transaction (scripted user workflow)
```

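Port and HTTP checks can be sketched with the Python standard library alone. A minimal version; the host, the ports, the health-check URL, and the 5-second timeout are assumptions. Ping requires ICMP (raw sockets or the system `ping` command) and is left out, as is the certificate check, which would additionally inspect the certificate's expiry date.

```python
import http.client
import socket
import urllib.request


def port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Port check: can a TCP connection to host:port be established?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, DNS failure
        return False


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """HTTP check: does the URL answer with a 2xx status code?"""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        # URLError and HTTPError (4xx/5xx responses) are subclasses of OSError
        return False


if __name__ == "__main__":
    print("SSH port 22:", port_open("localhost", 22))
    print("Health URL:", http_ok("http://localhost:8080/health"))  # hypothetical URL
```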
---

## Cross-references

- [[LF9-03-Virtualisierung|Back: Virtualization]]
- [[Wissen/Wirtschafts-Sozialkunde/WISO-Zusammenfassung|WISO: Business administration]]

---

*As of: 2024*