ML Ops Engineer | Codersbrain

full-time

Posted on September 15, 2025

Job Description

AI Ops Engineer

Company Overview

Company details are not specified.

Job Summary

We are seeking a hands-on and proactive AI Ops Engineer to operationalize and support the deployment of large language model (LLM) workflows, including agentic AI applications, across Marvell’s enterprise ecosystem. This role requires strong prompt engineering capabilities, the ability to triage AI pipeline issues, and a deep understanding of how LLM-based agents interact with tools, memory, and APIs. The successful candidate will be expected to diagnose and remediate real-time problems, addressing issues such as prompt quality and model behavior anomalies.

Responsibilities

Design, fine-tune, and manage prompts for various LLM use cases tailored to Marvell’s enterprise operations.
Operate, monitor, and troubleshoot agentic AI applications, identifying whether issues stem from:
- Prompt quality or structure
- Model configuration or performance
- Tool usage, API failures, or memory/recall issues
Build diagnostics and playbooks to triage LLM-driven failures, including handling fallback strategies, retries, or re-routing to human workflows.
Collaborate with architects, machine learning engineers, and DevOps to optimize agent orchestration across platforms like LangGraph, CrewAI, AutoGen, or similar.
Support integration of agentic systems with enterprise applications such as Jira, ServiceNow, Glean, or Confluence using REST APIs, webhooks, and adapters.
Implement observability and logging best practices for model outputs, latency, and agent performance metrics.
Contribute to building self-healing mechanisms and alerting strategies for production-grade AI workflows.

Qualifications

Experience: 3–6 years of experience in software engineering, DevOps, or ML Ops with exposure to AI/LLM workflows.
Skills:
- Strong foundation in prompt engineering and experience with LLMs like GPT, Claude, LLaMA, etc.
- Practical understanding of AIOps platforms or operational AI use cases (incident triage, log summarization, root cause analysis, etc.).
- Exposure to agentic AI architectures, such as LangGraph, AutoGen, CrewAI, etc.
- Familiarity with scripting (Python), RESTful APIs, and basic system debugging.
- Strong analytical skills and the ability to trace issues across multi-step pipelines and asynchronous agents.

Preferred Skills

Familiarity with tools and platforms such as:
- Glean
- DevRev
- Codium
- Cursor
- Atlassian AI
- Databricks Mosaic AI

Experience

3–6 years of experience in software engineering, DevOps, or ML Ops with specific exposure to AI/LLM workflows.

Environment

Work location and environment details are not specified.

Salary

Salary details are not specified.

Growth Opportunities

Potential growth opportunities within the company are not specified.

Benefits

Details regarding benefits are not specified.