427033- Associate Director-AI Services | ScaleneWorks INC

full-time

Posted on 23-10-2025

Job Description

Machine Learning Platform Engineering Lead

Company Overview

(Company information not provided.)

Job Summary

The Machine Learning Platform Engineering Lead oversees the development and operationalization of the organization's machine learning platforms. The role ensures that the team delivers efficient, scalable, and compliant AI solutions aligned with the organization's strategic objectives.

Responsibilities

Lead and mentor a high-performing team in machine learning platform engineering, fostering a dynamic work environment.
Design and implement MLOps/LLMOps strategies to streamline machine learning workflows.
Oversee the deployment and management of machine learning models using tools such as Weights & Biases (W&B), CI/CD pipelines (e.g., GitHub Actions, Argo), and Kubernetes.
Ensure observability and monitoring of machine learning systems using technologies like Prometheus, Grafana, and the ELK stack.
Convert telemetry data into executive-ready business insights through strong analytical capabilities.
Manage stakeholder relationships across product, data science, engineering, and risk/compliance teams to ensure alignment with regulatory and quality standards.

Qualifications

15+ years of experience in machine learning platform engineering, MLOps/LLMOps, or AI Site Reliability Engineering (SRE).
At least 3 years of experience in leading teams.
Proven experience with GenAI in production settings, including retrieval-augmented generation (RAG), prompt versioning, evaluations, guardrails, and cost/latency management.
Strong cloud experience with Azure, AWS, and GCP, as well as on-premises GPU orchestration.
Strong analytical skills to transform complex data into strategic insights.
Exceptional stakeholder management skills, particularly in regulated environments.

Preferred Skills

Experience with agent frameworks and evaluation, including multi-agent orchestration, task routing, and reliability tooling.
Familiarity with regulated environments (e.g., good practice (GxP) guidelines, model risk governance) and data privacy regulations.
Knowledge of big data technologies like Ray, Spark, and Databricks; vector databases; and API gateways.
Proficiency in Python for accelerating experiments and proof-of-concept developments.

Experience

Candidates should have demonstrated experience leading machine learning and AI teams, specifically in regulated or quality-managed environments.

Environment

(Work setting, location, and physical conditions not provided.)

Salary

(Salary information not provided.)

Growth Opportunities

(Career advancement opportunities not specified.)

Benefits

(Benefits information not provided.)