Initial commit: IHK Ausbildung materials
2-Ausbildungsjahr/LF8-Datenintegration/LF8-01-Datenquellen.md
@@ -0,0 +1,130 @@
# 8.1 Data Sources

## Types of Data Sources

### Databases

```
Database types
├── Relational (MySQL, PostgreSQL, Oracle)
├── NoSQL (MongoDB, Cassandra)
├── Graph (Neo4j)
└── Time series (InfluxDB)
```

### File Systems

| Format | Type | Typical use |
|--------|------|-------------|
| CSV | Text | Tabular data |
| JSON | Text | Structured data |
| XML | Text | Configuration, data exchange |
| Excel | Binary | Spreadsheets |
| PDF | Binary | Documents |

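XML from the table above can be read with Python's standard library. The following is a minimal sketch; the document structure (`<readings>` containing `<reading>` elements) is invented purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML payload; element and attribute names are assumptions
XML_TEXT = """
<readings>
    <reading sensor="A">21.5</reading>
    <reading sensor="B">19.0</reading>
</readings>
"""

def parse_readings(xml_text):
    """Return a list of (sensor, value) tuples from the XML text."""
    root = ET.fromstring(xml_text.strip())
    # findall() matches direct children of the root element by tag name
    return [(r.get("sensor"), float(r.text)) for r in root.findall("reading")]

rows = parse_readings(XML_TEXT)
print(rows)
```

For a file on disk, `ET.parse(path).getroot()` yields the same kind of root element.
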
### External Sources

```
External data sources
├── APIs (REST, SOAP)
├── Web services
├── Cloud storage
├── Streaming data (IoT)
└── Third-party providers
```

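A REST API from the list above typically delivers JSON over HTTP. The following is a minimal sketch using only the standard library; `fetch_json` is a hypothetical helper name, and any URL passed to it is up to the caller.

```python
import json
import urllib.request

def fetch_json(url, timeout=10.0):
    """Fetch a URL and decode the response body as JSON."""
    # urlopen raises urllib.error.URLError (including HTTPError for
    # 4xx/5xx responses) when the request fails
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

For example, `fetch_json("https://api.example.com/items")` (a placeholder endpoint, not a real service) would return the decoded payload as Python dicts and lists.
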
---

## Database Connectivity

### JDBC (Java)

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Connect to the database; try-with-resources closes all three resources.
// Must run inside a method that handles or declares SQLException.
try (Connection conn = DriverManager.getConnection(
         "jdbc:mysql://localhost:3306/datenbank",
         "benutzer",
         "passwort");
     Statement stmt = conn.createStatement();
     // Run the query
     ResultSet rs = stmt.executeQuery("SELECT * FROM tabelle")) {
    while (rs.next()) {
        System.out.println(rs.getString(1)); // first column of each row
    }
}
```

### Python (SQLAlchemy)

```python
import pandas as pd
from sqlalchemy import create_engine

# Requires a PostgreSQL driver such as psycopg2
engine = create_engine(
    'postgresql://user:password@localhost:5432/database'
)

# Read data
df = pd.read_sql("SELECT * FROM tabelle", engine)
```

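To try the same read pattern without a database server, the standard library's `sqlite3` module is sufficient. This is a sketch under stated assumptions: the in-memory database, the `customers` table, and the helper `customers_in` are all invented for this example.

```python
import sqlite3

# In-memory database, so nothing is written to disk
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Meier", "Berlin"), ("Schulz", "Hamburg"), ("Wagner", "Berlin")],
)

def customers_in(connection, city):
    """Return the names of customers in a city, using a bound parameter."""
    cursor = connection.execute(
        "SELECT name FROM customers WHERE city = ? ORDER BY name", (city,)
    )
    return [row[0] for row in cursor.fetchall()]

berlin = customers_in(conn, "Berlin")
print(berlin)
```

Passing values through `?` placeholders instead of string formatting keeps user input out of the SQL text and so guards against SQL injection.
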
---

## Reading Files

### CSV

```python
import pandas as pd

# Read a CSV file (comma-separated by default)
df = pd.read_csv('daten.csv')

# Specify a different delimiter, here a tab
df = pd.read_csv('daten.txt', sep='\t')
```

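CSV exports from German-language tools often use a semicolon as the delimiter and a comma as the decimal separator. The following sketch handles that case with the standard library's `csv` module; the column names and sample rows are made up.

```python
import csv
import io

# Hypothetical export: semicolon-delimited with decimal commas
CSV_TEXT = "artikel;preis\nSchraube;0,25\nMutter;0,10\n"

def read_prices(text):
    """Parse semicolon-delimited CSV text into a dict of article -> price."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    # Replace the decimal comma before converting to float
    return {row["artikel"]: float(row["preis"].replace(",", ".")) for row in reader}

prices = read_prices(CSV_TEXT)
print(prices)
```

With pandas, the equivalent call would be `pd.read_csv(path, sep=';', decimal=',')`.
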
### JSON

```python
import json

import pandas as pd

# Load JSON
with open('daten.json', encoding='utf-8') as f:
    daten = json.load(f)

# Convert to a DataFrame (nested objects become dotted column names)
df = pd.json_normalize(daten)
```

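`pd.json_normalize` flattens nested objects into columns with dotted names. The idea can be sketched without pandas; `flatten` below is a hypothetical helper, not part of any library, and it handles nested dicts only (lists are left as they are).

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into a single dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the key prefix along
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat

# Invented sample record
record = {"id": 1, "address": {"city": "Berlin", "geo": {"lat": 52.5}}, "tags": ["a", "b"]}
flat = flatten(record)
print(flat)
```

Each flattened record corresponds to one row, and each dotted key to one column.
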
### Excel

```python
import pandas as pd

# Read an Excel file (first sheet by default; .xlsx files need openpyxl)
df = pd.read_excel('daten.xlsx')

# Read a specific sheet
df = pd.read_excel('daten.xlsx', sheet_name='Tabelle1')
```

---

## Database Types Compared

| Criterion | Relational | NoSQL |
|-----------|------------|-------|
| Schema | Fixed | Flexible |
| Scaling | Mostly vertical | Mostly horizontal |
| Transactions | ACID | Often eventual consistency (varies by system) |
| Complexity | Medium | Low |
| Examples | MySQL, PostgreSQL | MongoDB, Redis |

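The ACID entry in the table can be made concrete with a transfer that either completes fully or not at all. The following sketch uses `sqlite3`; the `accounts` table, the names, and the `transfer` helper are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(connection, sender, receiver, amount):
    """Move an amount between accounts atomically; return True on success."""
    try:
        # 'with connection' commits on success and rolls back on an exception
        with connection:
            connection.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            connection.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "alice", "bob", 30)       # both updates are committed
failed = transfer(conn, "alice", "bob", 500)  # CHECK fails, both updates are undone
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

In the failed transfer the credit to the receiver has already been executed when the debit violates the constraint; the rollback undoes it, which is the atomicity part of ACID.
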
---

## Cross-References

- [[LF8-02-Schnittstellen|Next topic: Interfaces]]
- [[LF3-Datenbanken|Databases]]

---

*Last updated: 2024*