ML Service¶

The ML service is a separate FastAPI microservice (swen-ml), communicating with the backend over HTTP.

Package Layout¶

services/ml/swen_ml/
├── api/            ← FastAPI app, routers (/classify, /examples, /health)
├── config/         ← pydantic-settings for ML-specific config
├── data_models/    ← domain models (Anchor, Example, Noise, Enrichment)
├── inference/
│   ├── shared.py               ← SharedInfrastructure dataclass
│   ├── _models/                ← Encoder protocol + backends (sentence-transformers, HuggingFace)
│   └── classification/
│       ├── orchestrator.py     ← ClassificationOrchestrator
│       ├── tiers.py            ← PreprocessingTier, ExampleTier, EnrichmentTier, AnchorTier
│       ├── context.py          ← PipelineContext, TransactionContext
│       ├── result.py           ← ClassificationResult
│       ├── classifiers/
│       │   ├── anchor.py       ← AnchorClassifier (account embedding similarity)
│       │   └── example.py      ← ExampleClassifier (user history k-NN)
│       ├── enrichment/
│       │   ├── service.py      ← EnrichmentService (keyword + SearXNG)
│       │   ├── keywords/       ← FileKeywordAdapter + keywords_de.txt
│       │   └── search/         ← SearXNGAdapter
│       └── preprocessing/
│           └── text_cleaner.py ← TextCleaner + NoiseModel
├── storage/        ← SQLAlchemy models + async repos (swen_ml DB)
├── training/       ← Example ingestion, embedding computation, storage
└── evaluation/     ← Offline evaluation tooling (__main__.py)

Lifespan¶

On startup, the ML service performs the following steps in order (FastAPI lifespan):

DB init — Create swen_ml schema if it doesn't exist
Encoder load — Load the configured sentence encoder from HuggingFace (or local cache); default: paraphrase-multilingual-MiniLM-L12-v2
Warm-up — Run one dummy inference to compile CUDA/CPU kernels
Enrichment init — Verify SearXNG connectivity (non-fatal if unreachable)
SharedInfrastructure — Assemble the shared object and attach to app.state

Until step 3 completes, the /health endpoint returns {"status": "loading"}. The backend waits for a healthy ML service before sending classification requests.

SharedInfrastructure¶

All request handlers receive a SharedInfrastructure object via FastAPI Depends:

@dataclass
class SharedInfrastructure:
    encoder: Encoder          # protocol — sentence-transformers or HuggingFace backend
    settings: Settings        # ML service config
    keyword_adapter: KeywordPort | None = None    # keyword enrichment (always loaded)
    searxng_adapter: SearXNGAdapter | None = None # web search enrichment (optional)

This avoids re-loading the model on every request and centralises resource management.

Storage¶

The ML service uses its own SQLite / PostgreSQL database (swen_ml), separate from the main swen database. This separation means:

The ML service can be scaled or replaced independently
ML training data (examples, embeddings) does not pollute the main DB
The main backend never reads ML storage directly

Tables: - user_examples — stored transaction texts + their known counter-account + embedding vector - anchor_embeddings — per-account anchor embeddings (account name/description encoded as vectors) - user_noise_models — per-user IDF noise model (boilerplate token frequencies) - enrichment_cache — SearXNG lookup results (keyed by query hash, with TTL)

Training Data Flow¶

sequenceDiagram
    participant User
    participant Backend
    participant MLService

    User->>Backend: POST /transactions/{id}/post
(with corrected account)
    Backend->>MLService: POST /examples
{text, account_id, account_number}
    MLService->>MLService: Encode text → embedding vector
    MLService->>MLService: Store in user_examples table
    Note over MLService: Available for ExampleClassifier on next classify request

The backend sends a training example whenever a transaction is posted with a correction (or on first post if no suggestion was made). No retraining loop — the example is immediately available for k-NN retrieval.

Evaluation Tooling¶

swen_ml/evaluation/__main__.py provides an offline evaluation script:

uv run --package swen-ml python -m swen_ml.evaluation \
    --test-set data/eval.jsonl \
    --output eval_results.json

This runs the full classification pipeline against a labelled test set and reports accuracy per tier, per account, and an overall precision/recall breakdown.