Machine Learning¶
SWEN includes a self-contained ML service that automatically classifies transactions by predicting which counter-account should be assigned. For example, a transaction to REWE would typically be assigned to an account such as 'Groceries'. The task is hard because users are free to define their own categories, so there is no fixed set of labels to rely on.

What the ML Service Does¶
When a BankTransaction is imported, SWEN calls the ML service to predict the most likely counter-account (e.g. "Groceries", "Salary", "Rent"). The prediction appears as a suggestion on the Draft transaction. You either accept it (post) or correct it.
Corrections are learning signals — every time you change a suggested account and post the transaction, that example is stored and improves future predictions.
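
The feedback loop described above can be pictured as a small table of labelled examples. The following is only a minimal sketch: the table name `examples`, its columns, and the helper names are hypothetical and not taken from SWEN's actual schema. It stores corrections only, since that is what the text above confirms.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a store of training examples.

    The schema is a hypothetical illustration, not SWEN's real one.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS examples (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               text TEXT NOT NULL,      -- cleaned transaction text
               account TEXT NOT NULL    -- counter-account chosen by the user
           )"""
    )
    return conn


def record_correction(conn, text, corrected_account):
    """Store one corrected transaction as a training example."""
    conn.execute(
        "INSERT INTO examples (text, account) VALUES (?, ?)",
        (text, corrected_account),
    )
    conn.commit()


def examples_per_account(conn):
    """Count stored examples per account."""
    cursor = conn.execute(
        "SELECT account, COUNT(*) FROM examples GROUP BY account"
    )
    return dict(cursor.fetchall())
```

A per-account count like the one returned by `examples_per_account` is the kind of figure that would tell you how close an account is to the steady-state phase.
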
Cold Start vs Learning Phase¶
| Phase | Behaviour |
|---|---|
| Cold start (no examples yet) | Uses the Anchor Classifier — embedding similarity between transaction text and account name embeddings. Web enrichment augments sparse descriptions |
| Early use (few examples) | Example Classifier may be unreliable with few examples; Anchor Classifier compensates |
| Steady state (50+ examples per account) | Example Classifier dominates; most transactions resolved before reaching the Anchor Classifier |
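
To make the cold-start row more concrete, here is a rough sketch of classification by similarity to account names. It is not the service's implementation: `toy_embed` is a character-trigram stand-in for the real sentence-transformer embeddings, and all function names are hypothetical. The account names are German only because a trigram toy cannot relate "REWE" to "Groceries" the way a real embedding model can.

```python
import math
from collections import Counter


def toy_embed(text):
    """Character-trigram counts; a toy stand-in for a sentence embedding."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def anchor_classify(text, account_names, embed=toy_embed):
    """Return (best_account, similarity) by comparing text to account names."""
    query = embed(text)
    scored = [(cosine(query, embed(name)), name) for name in account_names]
    score, name = max(scored)
    return name, score


# Usage: no training examples are needed, only the account names themselves.
best, score = anchor_classify("MIETE MAI 2024", ["Lebensmittel", "Miete", "Gehalt"])
```

Passing a different `embed` callable is where a real embedding model would plug in. The comparison logic stays the same.
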
Confidence Thresholds¶
Each prediction comes with a confidence score (0–1):
| Score | Displayed as | Action required |
|---|---|---|
| ≥ 0.85 | High confidence | Auto-suggested; one click to post |
| 0.70 to < 0.85 | Medium confidence | Suggested with caution indicator |
| 0.35 to < 0.70 | Low confidence | Anchor Classifier result, shown with low-confidence indicator |
| < 0.35 | Unresolved | No suggestion; flagged as “Needs review” |
Thresholds are configurable via environment variables (see ML Service internals).
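
The table above amounts to a simple banding function. The sketch below assumes the default boundaries from the table. The environment variable names (`ML_THRESHOLD_HIGH` and so on) are hypothetical placeholders, so check the ML Service internals page for the real ones.

```python
import os


def load_thresholds(env=os.environ):
    """Read thresholds from the environment, falling back to the table defaults.

    The variable names are hypothetical, chosen only for illustration.
    """
    return {
        "high": float(env.get("ML_THRESHOLD_HIGH", "0.85")),
        "medium": float(env.get("ML_THRESHOLD_MEDIUM", "0.70")),
        "low": float(env.get("ML_THRESHOLD_LOW", "0.35")),
    }


def confidence_band(score, thresholds):
    """Map a confidence score in [0, 1] to the label shown in the UI."""
    if score >= thresholds["high"]:
        return "High confidence"
    if score >= thresholds["medium"]:
        return "Medium confidence"
    if score >= thresholds["low"]:
        return "Low confidence"
    return "Unresolved"
```

Checking the bands from highest to lowest means every score falls into exactly one band, including values such as 0.845 that sit between two rounded table boundaries.
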
What the ML Service Is Not¶
- Not a cloud service — runs entirely locally, no data leaves your machine
- Not a large language model — it uses a domain-specific German sentence-transformer, not GPT
- Not infallible — it makes mistakes, especially early on; your corrections are the feedback loop
- Not required — SWEN works without the ML service; you just classify manually
Architecture Overview¶
The ML service is a separate FastAPI microservice that the backend calls over HTTP. It maintains its own SQLite database of training examples and a loaded sentence-transformer model.
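
As an illustration of the HTTP boundary between the two services, a caller might look roughly like this. Only the `POST /classify` route and port 8100 come from this page. The host, the payload fields (`description`, `amount`), and the shape of the response are assumptions.

```python
import json
import urllib.request

ML_SERVICE_URL = "http://localhost:8100"  # port from this page; host is assumed


def build_classify_request(description, amount, base_url=ML_SERVICE_URL):
    """Build the POST /classify request. Payload field names are hypothetical."""
    payload = {"description": description, "amount": amount}
    return urllib.request.Request(
        url=f"{base_url}/classify",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(description, amount, timeout=2.0):
    """Return the parsed JSON response, or None if the service is unreachable."""
    request = build_classify_request(description, amount)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.load(response)
    except OSError:
        # URLError and timeouts are OSError subclasses; the caller then
        # falls back to manual classification.
        return None
```

Returning `None` on connection failure reflects the point made earlier that SWEN keeps working without the ML service.
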
```mermaid
graph LR
    backend["Backend<br>(FastAPI)"] -->|"POST /classify"| ml["ML Service<br>(FastAPI :8100)"]
    ml --> pre["Preprocessing<br>(text cleaning)"]
    pre --> example["Example Classifier<br>(embedding similarity)"]
    example -->|unresolved| enrich["Enrichment<br>(keywords + SearXNG)"]
    enrich --> anchor["Anchor Classifier<br>(cold start)"]
    anchor -->|unresolved| fallback["Unresolved<br>(manual review)"]
    enrich --> searxng["SearXNG<br>(web enrichment)"]
```
For a deep dive into the pipeline see Classification Pipeline.
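
The control flow in the diagram can be summarised as a cascade in which each tier runs only if the previous one left the transaction unresolved. The sketch below takes the tiers as injected callables. The function name, the `(account, confidence)` return convention, and the result dictionary are hypothetical and not the service's actual interface.

```python
from typing import Callable, Optional, Tuple

# A tier returns (account, confidence), or None when it cannot resolve the text.
Prediction = Optional[Tuple[str, float]]


def classify_cascade(
    raw_text: str,
    preprocess: Callable[[str], str],
    example_classifier: Callable[[str], Prediction],
    enrich: Callable[[str], str],
    anchor_classifier: Callable[[str], Prediction],
) -> dict:
    """Run the tiers in diagram order and report which one resolved the text."""
    text = preprocess(raw_text)

    # Tier 1: similarity to previously stored examples.
    result = example_classifier(text)
    if result is not None:
        return {"account": result[0], "confidence": result[1], "tier": "example"}

    # Tier 2: enrichment runs only when the Example Classifier gave up.
    enriched = enrich(text)

    # Tier 3: similarity to account names, using the enriched text.
    result = anchor_classifier(enriched)
    if result is not None:
        return {"account": result[0], "confidence": result[1], "tier": "anchor"}

    # Tier 4: nothing matched, so the transaction goes to manual review.
    return {"account": None, "confidence": 0.0, "tier": "unresolved"}
```

Because enrichment sits behind the Example Classifier, it is skipped entirely for transactions that stored examples already resolve.
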
Section Contents¶
- The four-tier architecture in depth: when each tier fires, confidence scores, and the feedback loop.
- The German sentence-transformer model, HuggingFace cache setup, and pooling strategies.
- How SearXNG is used to look up merchant names and enrich sparse transaction descriptions.