ML Platform Engineer
ML Platform Engineers build the internal infrastructure that enables data scientists and ML engineers to train, deploy, monitor, and iterate on machine learning models at scale. Their English work spans technical documentation (feature store design specs, platform runbooks), cross-functional communication (explaining drift detection to product managers, presenting experiment tracking governance to compliance), and internal developer advocacy (teaching teams to use the platform correctly). This path focuses on the vocabulary of MLOps infrastructure from a platform ownership perspective.
Topics covered
- Feature store design
- Model registry & versioning
- Model drift detection
- Experiment tracking
- Batch vs real-time inference
- ML governance & reproducibility
Vocabulary spotlight
4 terms every ML Platform Engineer should know in English:
Training-serving skew: the divergence between feature computation in the training pipeline (usually Python/pandas) and in the production serving layer (often a different language or framework), which causes a model to perform differently in production than in evaluation
"The training-serving skew was traced to a normalisation function implemented differently in the Spark training job and the Java serving microservice — the feature store solved this by making both use the same feature definition."
Point-in-time correctness: the property of retrieving feature values as they existed at the moment of a historical prediction, not their current values; required to prevent future data from leaking into training examples
"Our feature store enforces point-in-time correctness for all training queries: the system joins features at the timestamp of each label, not the latest available value."
Concept drift: a change in the statistical relationship between input features and the target variable over time, causing a model trained on historical data to degrade in production; ground-truth labels are required to detect it
"Concept drift was confirmed after the marketing team changed the customer segmentation strategy — the churn model's predictions degraded because the same features now corresponded to different behaviour patterns."
Model SBOM: a Software Bill of Materials applied to ML artefacts; a manifest of the model's training data version, code version, framework dependencies, and hardware environment, used for reproducibility and compliance auditing
"Our platform generates an SBOM for every model promoted to production, enabling full reproducibility of any training run and compliance with the organisation's AI governance policy."
📚 Vocabulary Reference
Key terms organised by category for ML Platform Engineers:
- Feature Store
- Model Lifecycle
- Drift & Monitoring
- Inference Infrastructure
Recommended exercises
Real-world scenarios you'll practise
- Explaining training-serving skew to a data science team and presenting the feature store as the solution during a platform onboarding session
- Writing a design spec for the experiment tracking governance policy: what must be logged before a model can be promoted to the registry
- Presenting model drift detection monitoring to a product manager: explaining data drift vs. concept drift in non-technical terms
- Justifying the two-store architecture (offline + online) for the feature store to a VP Engineering during a platform budget review