Production Operations Engineer | ScaleneWorks INC
Full-time
Posted on August 28, 2025
Job Description
Data Platform ProdOps Engineers
Job Summary
We are seeking a proactive and detail-oriented Production Operations Engineer to join our Data Platforms team. This role is critical to maintaining the stability and reliability of large-scale data platforms through hands-on operational support, system monitoring, and first-line incident triage. The ideal candidate is comfortable working in containerized environments, has solid Linux fundamentals, and shows strong discipline in executing standard operating procedures and documenting outcomes. The position requires close collaboration with developers and product teams across time zones.
Responsibilities
- Monitor production systems and job pipelines; respond promptly to alerts and anomalies.
- Troubleshoot operational issues in collaboration with the development team.
- Investigate incidents using logs, metrics, and observability tools (e.g., Grafana, Kibana).
- Perform recovery actions such as restarting pods, rerunning jobs, or applying known mitigations.
- Operate in Kubernetes environments to inspect, debug, and manage components.
- Support deployment activities through post-release validations and basic checks.
- Validate data quality and flag anomalies to the relevant engineering teams.
- Maintain clear documentation of incidents, actions taken, and resolution outcomes.
- Communicate effectively with remote teams for operational handoffs and follow-ups.
Qualifications
- At least 3 years of experience in production operations, system support, or DevOps roles.
- Solid Linux skills (e.g., file system navigation, log analysis, process/network troubleshooting).
- Hands-on experience with Kubernetes and Docker in production environments.
- Familiarity with observability tools (e.g., Grafana, Kibana, Prometheus).
- English proficiency for reading, writing, and asynchronous communication.
- Strong execution discipline and ability to follow structured operational procedures.
Preferred Skills
- Scripting ability (Python or Shell) for log parsing and automation.
- Basic SQL skills for data verification or debugging.
- Experience with Hadoop and Flink pipelines for batch and stream processing is a strong plus.
- Experience with large-scale distributed data systems or job scheduling frameworks.
Experience
- Minimum of 3 years of relevant experience in production operations or DevOps roles.
Environment
- Location: Bangalore.
- The role often involves collaboration with remote teams and requires effective communication across time zones.
Growth Opportunities
- Opportunity to work with a highly experienced global engineering team.
- Exposure to enterprise-scale data systems and platform operations.
- Structured onboarding and mentoring support.
- Long-term growth potential in DevOps, platform, or data infrastructure domains.