Job Description
Sr TechOps Lead Engineer (AWS Cloud)
Department: Technology / Engineering
Role Overview
We are seeking a highly experienced TechOps SME/Lead Engineer with deep AWS cloud expertise to lead our cloud infrastructure, DevOps practices, reliability engineering, and operational excellence initiatives. This role is both strategic and hands-on: it covers designing scalable architectures, improving automation, ensuring system reliability, and leading the TechOps team.
Key Responsibilities
• Architect and manage secure, scalable, and highly available infrastructure on AWS.
• Design multi-account AWS environments using AWS Organizations.
• Implement VPC architecture, IAM policies, networking, and security best practices.
• Oversee EC2, ECS/EKS, Lambda, RDS, S3, CloudFront, and related AWS services.
• Optimize AWS costs and resource utilization.
Reliability & Production Operations
• Implement Site Reliability Engineering (SRE) best practices.
• Define SLIs, SLOs, and error budgets.
• Manage monitoring and alerting (CloudWatch, Datadog, Prometheus, Grafana).
• Lead incident response, root cause analysis (RCA), and postmortems.
• Ensure 24/7 availability and operational resilience.
Security & Compliance
• Implement IAM best practices and least-privilege access controls.
• Manage secrets and encryption keys (AWS KMS, Secrets Manager).
• Conduct vulnerability management and patching.
• Support compliance initiatives (SOC 2, ISO 27001, GDPR as applicable).
• Lead disaster recovery planning and backup strategies.
Leadership & Strategy
• Lead and mentor a team of DevOps/TechOps engineers.
• Establish operational KPIs and performance benchmarks.
• Manage on-call rotations and escalation processes.
• Collaborate with Engineering, Product, Security, and Data teams.
• Contribute to long-term infrastructure strategy and cloud roadmap.
Required Qualifications
• Bachelor's degree in Computer Science, Engineering, or equivalent experience.
• 10+ years of experience in DevOps, Cloud Engineering, or Infrastructure roles.
• 5+ years leading technical teams.
• Strong hands-on experience with AWS services (EC2, EKS, RDS, S3, IAM, VPC, Lambda).
• Deep knowledge of networking, Linux systems, and distributed systems.
• Experience with Infrastructure-as-Code (Terraform or CloudFormation).
• Strong scripting skills (Python, Bash, or similar).
• Experience with containerization (Docker) and Kubernetes (EKS preferred).
Key Competencies
• Strong architectural thinking
• Hands-on technical leadership
• Crisis and incident management
• Strategic planning and execution
• Excellent cross-functional communication
Success Metrics
• 99.9%+ production uptime
• Reduced deployment lead time
• Reduced incident frequency and MTTR
• Improved cost efficiency
• High-performing and scalable TechOps function
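To make the 99.9% uptime target concrete, it can be expressed as an error budget, the amount of downtime the target allows over a given window. The sketch below is illustrative only: the function name error_budget_minutes and the 30-day window are assumptions, not part of this role's defined SLOs.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Return the downtime (in minutes) that an availability SLO allows.

    slo is a fraction such as 0.999 for 99.9%; window_days is an assumed
    measurement window (30 days here, purely for illustration).
    """
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be a fraction in (0, 1]")
    total_minutes = window_days * 24 * 60
    return total_minutes * (1.0 - slo)


if __name__ == "__main__":
    budget = error_budget_minutes(0.999, window_days=30)
    print(f"Allowed downtime at 99.9% over 30 days: {budget:.1f} minutes")
```

Under these assumptions, a 99.9% target leaves roughly 43 minutes of downtime per 30-day window, which is the budget that incident frequency and MTTR draw against.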