Site Reliability Engineer | Codersbrain
Full-time
Posted on September 5, 2025
Job Description
Site Reliability Engineer
Company Overview
Company details are not specified.
Job Summary
The Site Reliability Engineer (SRE) plays a critical role in ensuring the reliability, availability, and performance of our systems. This position requires a deep understanding of cloud infrastructure, automation, and monitoring to effectively manage large-scale systems. The SRE will collaborate with development and operations teams to troubleshoot issues, implement automation solutions, and support cloud-based services.
Responsibilities
- Monitor and maintain the reliability, performance, and availability of large-scale systems.
- Implement automation for repetitive tasks and workflows to enhance operational efficiency.
- Conduct incident response, root cause analysis, and post-incident reviews to ensure system reliability.
- Collaborate with development teams to support continuous integration and delivery pipelines.
- Optimize system performance, scalability, and availability using industry best practices.
- Document processes and procedures, including creating runbooks for operational tasks.
Qualifications
- Education: Bachelor’s degree in Computer Science, Engineering, or a related field.
- Technical Skills:
  - Proficiency in programming languages, particularly Python.
  - Deep understanding of Linux/Windows operating systems and networking concepts.
  - Experience with Azure, including services, architecture, and best practices.
  - Hands-on experience with containerization and orchestration tools like Docker and Kubernetes.
  - Familiarity with Infrastructure as Code (IaC) tools such as Terraform, CloudFormation, or Azure CLI.
  - Experience with monitoring tools like Splunk, New Relic, or Azure Monitor.
  - Knowledge of CI/CD practices and tools like GitHub and GitHub Actions.
  - Expertise in supporting Azure Machine Learning, Databricks, and other related SaaS tools.
- Soft Skills:
  - Strong problem-solving ability to troubleshoot complex distributed systems independently.
  - Excellent written and verbal communication skills to facilitate collaboration across teams.
Preferred Skills
- Experience with specific cloud platforms, particularly Azure.
- Relevant certifications in cloud engineering or DevOps.
- Familiarity with microservices architecture and supporting AI/ML solutions.
- Previous experience in large-scale system management and configuration.
Experience
- Minimum of 10 years of relevant experience in site reliability engineering, cloud engineering, or a related field.
Environment
- Position based in Andheri, Mumbai. Details about work setting and flexibility (remote, in-office, hybrid) are not specified.
Salary
Salary details are not specified.
Growth Opportunities
Information on career advancement opportunities within the company is not specified.
Benefits
Benefits offered are not specified.