Site Reliability Engineer
Company Overview
Company details are not specified.
Job Summary
The Site Reliability Engineer (SRE) plays a critical role in ensuring the reliability, availability, and performance of our systems. This position requires a deep understanding of cloud infrastructure, automation, and monitoring to effectively manage large-scale systems. The SRE will collaborate with development and operations teams to troubleshoot issues, implement automation solutions, and support cloud-based services.
Responsibilities
- Monitor and maintain the reliability, performance, and availability of large-scale systems.
- Implement automation for repetitive tasks and workflows to enhance operational efficiency.
- Lead incident response, and conduct root cause analysis and post-incident reviews to prevent recurrence.
- Collaborate with development teams to support continuous integration and delivery pipelines.
- Optimize system performance, scalability, and availability using industry best practices.
- Document processes and procedures, including creating runbooks for operational tasks.
Qualifications
Preferred Skills
- Experience with cloud platforms, particularly Azure.
- Relevant certifications in cloud engineering or DevOps.
- Familiarity with microservices architecture and supporting AI/ML solutions.
- Previous experience in large-scale system management and configuration.
Experience
- Minimum of 10 years of relevant experience in site reliability engineering, cloud engineering, or a related field.
Environment
- The position is based in Andheri, Mumbai. The work arrangement (remote, in-office, or hybrid) is not specified.
Salary
Salary details are not specified.
Growth Opportunities
Career advancement opportunities are not specified.
Benefits
Benefits offered are not specified.