Site Reliability Engineer | Scrabble

full-time

Posted on 01-07-2025

Job Description

Leadership – Site Reliability & Platform Architect

Job Summary

This role is pivotal in scaling a high-performance SaaS platform that powers logistics automation for over 500 global enterprises. As the Leadership – Site Reliability & Platform Architect, you will lead the evolution of our cloud infrastructure, DevOps maturity, and backend platform architecture. You will blend deep DevOps/SRE expertise with backend architectural thinking to build resilient, observable, and scalable systems from the ground up.

Responsibilities

Own Infrastructure Architecture: Design and evolve cloud-native systems to ensure scalability, high availability, cost efficiency, and security.
Lead Backend Platform Design: Collaborate with product and engineering teams to design performant, modular, and reliable backend systems.
CI/CD & Deployment Strategy: Build and scale deployment pipelines, optimize rollouts with blue-green/canary deployments, and ensure smooth delivery processes.
Orchestrate Systems: Manage containerized workloads using orchestration tools such as Kubernetes (EKS/GKE) or ECS.
Observability & Performance: Standardize monitoring, tracing, and logging across systems; lead capacity planning and performance tuning.
Infrastructure as Code (IaC): Define and maintain scalable infrastructure using tools such as Terraform and Helm.
Mentor & Lead: Guide engineering teams in cloud architecture, system design, and operational excellence.
Champion Reliability & Security: Define SLOs, SLIs, and incident response processes while enforcing best practices for infrastructure and application security.

Qualifications

4+ years of experience in backend or infrastructure roles working with high-scale, production-grade systems.
3+ years of hands-on backend development experience with languages such as Ruby, Node.js, Python, or Java.
3+ years of experience in system design, API development, and performance optimization.
1+ years in a technical leadership role focusing on DevOps/SRE/platform engineering.
Proven experience architecting and running infrastructure on AWS (preferred), GCP, or Azure.
Deep understanding of cloud-native architecture, microservices, and distributed systems.
Hands-on experience with Docker, Kubernetes, Terraform, and observability tools (e.g., Prometheus, Grafana, ELK, OpenTelemetry).
Strong programming/scripting skills in Python, Go, or Bash, and the ability to review production backend code (Ruby/Node.js).
Experience with relational and NoSQL databases such as Postgres, MongoDB, and Redis.

Preferred Skills

Experience with service mesh, multi-region high availability (HA) systems, or event-driven architectures.
Background in security, compliance, or cost optimization.
Prior experience leading backend engineering teams, with deep involvement in designing and scaling core systems.
A comprehensive grasp of backend fundamentals (data modeling, API design, asynchronous jobs, caching) and a passion for building fast, resilient, and observable systems.
A builder, architect, and operator mindset with strong awareness of business context.

Experience

7–14 years of total industry experience with significant exposure to backend, infrastructure, and technical leadership roles in high-scale production environments.

Environment

Location: Bangalore
Type: Full-time
You will work within a dynamic, modern engineering environment, engaging with cross-functional teams to drive continuous improvement in cloud infrastructure, deployment methodologies, and system performance.