Advanced · 6 topic areas · 26+ exercises

ML Platform Engineer

ML Platform Engineers build the internal infrastructure that enables data scientists and ML engineers to train, deploy, monitor, and iterate on machine learning models at scale. Their English work spans technical documentation (feature store design specs, platform runbooks), cross-functional communication (explaining drift detection to product managers, presenting experiment tracking governance to compliance), and internal developer advocacy (teaching teams to use the platform correctly). This path focuses on the vocabulary of MLOps infrastructure from a platform ownership perspective.

Topics covered

  • Feature store design
  • Model registry & versioning
  • Model drift detection
  • Experiment tracking
  • Batch vs real-time inference
  • ML governance & reproducibility

Vocabulary spotlight

4 terms every ML Platform Engineer should know in English:

training-serving skew n.

The divergence between feature computation in a training pipeline (usually Python/pandas) and in the production serving layer (often a different language or framework), causing a model to perform differently in production than in evaluation

"The training-serving skew was traced to a normalisation function implemented differently in the Spark training job and the Java serving microservice — the feature store solved this by making both use the same feature definition."
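The fix described in the example can be sketched in a few lines: a minimal, hypothetical feature definition that both the training pipeline and the serving layer import, so the computation cannot diverge (the function name and cap value are illustrative, not from any real feature store).

```python
# Hypothetical sketch: one shared feature definition imported by both
# the training pipeline and the serving path, so the normalisation
# logic cannot be re-implemented differently in two codebases.

def normalise_amount(amount_cents: int, cap_cents: int = 1_000_000) -> float:
    """Single source of truth for the 'amount' feature."""
    clipped = min(max(amount_cents, 0), cap_cents)
    return clipped / cap_cents

# Batch training job and request-time serving both call the same code.
training_feature = normalise_amount(250_000)   # from a historical record
serving_feature = normalise_amount(250_000)    # from a live request
assert training_feature == serving_feature     # no skew by construction
```

This is the core idea behind a feature registry: the definition lives in one place, and both stores materialise from it.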
point-in-time correctness n.

The property of retrieving feature values as they existed at the moment of a historical prediction, not their current values — required to prevent future data leakage into training labels

"Our feature store enforces point-in-time correctness for all training queries: the system joins features at the timestamp of each label, not the latest available value."
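A point-in-time join can be illustrated with pandas' `merge_asof`, which matches each label to the most recent feature value at or before its timestamp (the table contents below are invented for illustration):

```python
# Minimal sketch of a point-in-time join: each label row picks up the
# latest feature value at or before its own timestamp, never a later one.
import pandas as pd

features = pd.DataFrame({
    "user_id": [1, 1, 2],
    "ts": pd.to_datetime(["2024-01-01", "2024-02-01", "2024-01-15"]),
    "avg_spend": [10.0, 50.0, 7.0],
}).sort_values("ts")

labels = pd.DataFrame({
    "user_id": [1, 2],
    "ts": pd.to_datetime(["2024-01-20", "2024-01-20"]),
    "churned": [0, 1],
}).sort_values("ts")

# direction="backward" (the default) joins the most recent feature value
# *before* each label timestamp — user 1's later avg_spend of 50.0,
# written on 2024-02-01, never leaks into the 2024-01-20 training row.
training_set = pd.merge_asof(labels, features, on="ts", by="user_id")
```

Joining on the latest value instead (a plain merge on `user_id`) is exactly the future-leakage bug the definition warns about.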
concept drift n.

A change in the statistical relationship between input features and the target variable over time, causing a model trained on historical data to degrade in production — requires ground truth labels to detect

"Concept drift was confirmed after the marketing team changed the customer segmentation strategy — the churn model's predictions degraded because the same features now corresponded to different behaviour patterns."
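Because concept drift requires ground-truth labels to detect, a common check is to compare recent model accuracy against the accuracy measured at deployment time. A minimal sketch (threshold and function names are assumptions, not a standard):

```python
# Hypothetical sketch: concept drift needs ground-truth labels, so one
# simple detector compares rolling accuracy against a deployment baseline.

def rolling_accuracy(preds: list[int], labels: list[int]) -> float:
    correct = sum(p == y for p, y in zip(preds, labels))
    return correct / len(labels)

def concept_drift_alert(baseline_acc: float, recent_acc: float,
                        tolerance: float = 0.05) -> bool:
    """Flag drift when recent accuracy drops more than `tolerance`
    below the accuracy measured at deployment time."""
    return (baseline_acc - recent_acc) > tolerance

baseline = rolling_accuracy([1, 0, 1, 1], [1, 0, 1, 1])  # at deployment
recent = rolling_accuracy([1, 0, 1, 1], [0, 1, 1, 1])    # after the change
assert concept_drift_alert(baseline, recent)
```

In practice the baseline window and tolerance are tuned per model, and label latency (see the vocabulary reference below) determines how quickly such an alert can fire.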
SBOM (ML context) n.

Software Bill of Materials applied to ML artefacts — a manifest of the model's training data version, code version, framework dependencies, and hardware environment, used for reproducibility and compliance auditing

"Our platform generates an SBOM for every model promoted to production, enabling full reproducibility of any training run and compliance with the organisation's AI governance policy."
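To make the SBOM idea concrete, here is an illustrative manifest with a content hash for tamper-evidence. The field names are assumptions for the sketch, not a standard schema:

```python
# Illustrative sketch of what an ML SBOM manifest might record for a
# promoted model (field names and values are invented for the example).
import hashlib
import json

sbom = {
    "model_name": "churn-classifier",
    "model_version": "3.2.0",
    "training_data": {"dataset": "events_v14", "snapshot": "2024-06-01"},
    "code": {"repo": "ml-platform/churn", "commit": "a1b2c3d"},
    "dependencies": {"scikit-learn": "1.4.2", "pandas": "2.2.1"},
    "hardware": {"accelerator": "none", "cpu_arch": "x86_64"},
}

# Hash the canonicalised manifest so the registry can later verify
# that the recorded provenance was not altered.
digest = hashlib.sha256(
    json.dumps(sbom, sort_keys=True).encode()
).hexdigest()
```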
Open full glossary →

📚 Vocabulary Reference

Key terms organised by category for ML Platform Engineers:

Feature Store

feature store · offline store · online store · feature pipeline · feature definition · training-serving skew · point-in-time correctness · feature registry · feature group · materialisation

Model Lifecycle

model registry · experiment run · model version · model stage · champion/challenger · model promotion · model rollback · reproducibility bundle · SBOM (ML) · artefact lineage

Drift & Monitoring

data drift · concept drift · covariate shift · PSI (Population Stability Index) · KS test · prediction distribution shift · ground truth label · label latency · drift score · retraining trigger
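PSI, listed above, is one of the most common drift scores. A minimal sketch of computing it over two pre-binned percentage distributions (the bin counts and the 0.2 rule of thumb are conventional, but treat the exact thresholds as an assumption):

```python
# Minimal PSI sketch over pre-binned distributions: compares the share
# of traffic in each bin at training time vs. in production.
import math

def psi(expected_pct: list[float], actual_pct: list[float],
        eps: float = 1e-6) -> float:
    """Population Stability Index between two binned distributions."""
    total = 0.0
    for e, a in zip(expected_pct, actual_pct):
        e, a = max(e, eps), max(a, eps)  # guard against empty bins
        total += (a - e) * math.log(a / e)
    return total

# Identical distributions give a PSI of 0; a common rule of thumb
# treats PSI above ~0.2 as significant drift worth investigating.
assert psi([0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25]) == 0.0
```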

Inference Infrastructure

batch inference · real-time inference · model serving · Triton Inference Server · ONNX · p99 latency · cold start · model warm-up · inference pipeline · shadow mode

Study full vocabulary modules →

Recommended exercises

Real-world scenarios you'll practise

  • Explaining training-serving skew to a data science team and presenting the feature store as the solution during a platform onboarding session
  • Writing a design spec for the experiment tracking governance policy: what must be logged before a model can be promoted to the registry
  • Presenting model drift monitoring to a product manager: explaining data drift vs. concept drift in non-technical terms
  • Justifying the two-store architecture (offline + online) for the feature store to a VP Engineering during a platform budget review

Recommended reading

Explore another role

⚙️ Developer Enablement Lead

Open path →