Site Reliability Engineer | Scrabble
full-time
Posted on July 1, 2025
Job Description
Leadership – Site Reliability & Platform Architect
Job Summary
This role is pivotal in scaling a high-performance SaaS platform that powers logistics automation for over 500 global enterprises. As the Leadership – Site Reliability & Platform Architect, you will lead the evolution of our cloud infrastructure, DevOps maturity, and backend platform architecture. You will blend deep DevOps/SRE expertise with backend architectural thinking to build resilient, observable, and scalable systems from the ground up.
Responsibilities
- Own Infrastructure Architecture: Design and evolve cloud-native systems to ensure scalability, high availability, cost efficiency, and security.
- Lead Backend Platform Design: Collaborate with product and engineering teams to design performant, modular, and reliable backend systems.
- CI/CD & Deployment Strategy: Build and scale deployment pipelines, optimize rollouts with blue-green/canary deployments, and ensure smooth delivery processes.
- Orchestrate Systems: Manage containerized workloads using orchestration tools such as Kubernetes (EKS/GKE), ECS, or comparable platforms.
- Observability & Performance: Standardize monitoring, tracing, and logging across systems; lead capacity planning and performance tuning.
- Infrastructure as Code (IaC): Define and maintain scalable infrastructure using tools such as Terraform and Helm.
- Mentor & Lead: Guide engineering teams in cloud architecture, system design, and operational excellence.
- Champion Reliability & Security: Define SLOs, SLIs, and incident response processes while enforcing best practices for infrastructure and application security.
Qualifications
- 4+ years of experience in backend or infrastructure roles working with high-scale, production-grade systems.
- 3+ years of hands-on backend development experience with languages and runtimes such as Ruby, Node.js, Python, or Java.
- 3+ years of experience in system design, API development, and performance optimization.
- 1+ years in a technical leadership role focusing on DevOps/SRE/platform engineering.
- Proven experience architecting and running infrastructure on AWS (preferred), GCP, or Azure.
- Deep understanding of cloud-native architecture, microservices, and distributed systems.
- Hands-on experience with Docker, Kubernetes, Terraform, and observability tools (e.g., Prometheus, Grafana, ELK, OpenTelemetry).
- Strong programming/scripting skills in Python, Go, or Bash, and the ability to review production backend code (Ruby/Node.js).
- Experience with relational and NoSQL databases such as Postgres, MongoDB, and Redis.
Preferred Skills
- Experience with service mesh, multi-region high-availability (HA) systems, or event-driven architectures.
- Background in security, compliance, or cost optimization.
- Prior experience leading backend engineering teams, with deep involvement in designing and scaling core systems.
- A comprehensive grasp of backend fundamentals (data modeling, API design, asynchronous jobs, caching) and a passion for building fast, resilient, and observable systems.
- A builder, architect, and operator mindset with strong awareness of business context.
Experience
- 7–14 years of total industry experience with significant exposure to backend, infrastructure, and technical leadership roles in high-scale production environments.
Environment
- Location: Bangalore
- Type: Full-time
You will work within a dynamic, modern engineering environment, engaging with cross-functional teams to drive continuous improvement in cloud infrastructure, deployment methodologies, and system performance.