Site Reliability Engineering Lead
WORK LOCATION: Hybrid, 3 days onsite and 2 days remote (onsite at either SCC at Victoria Park and McNicoll, or Yonge and Dundas)
HOURLY RATE: $70-$87
LENGTH: June 2, 2025 to August 19, 2026 (more than 14 months)
- 5-7 years of hands-on SRE experience, including proven leadership in automating operations and reducing toil, plus experience with OTEL (OpenTelemetry) and APM tools.
- Minimum 3 years' experience providing L2/L3 engineering support.
- Team leadership experience is required.
Your New Company
Join a global technology leader recognized for delivering innovative solutions to major financial institutions. As a trusted vendor partner to a prestigious client in the banking sector, the company is hiring for a long-term contract. You will join a high-performing team focused on Site Reliability Engineering (SRE) excellence and digital transformation.
Your New Role
As the Site Reliability Engineering Lead, you will oversee the design, implementation, and continuous improvement of SRE practices for critical banking applications and infrastructure. This is a hands-on leadership position in which you will drive automation, observability, and reliability while mentoring a talented technical team. The contract runs from June 2025 to August 2026 and is based in the Greater Toronto Area (GTA) under a hybrid model: three days onsite (Victoria Park & McNicoll, or Yonge & Dundas) and two days remote.
Key Responsibilities:
- Lead SRE initiatives to automate operations, reduce toil, and enhance system reliability.
- Provide expert L2/L3 engineering support and troubleshooting for performance and non-functional issues.
- Apply advanced SRE concepts such as chaos engineering, observability, self-healing, error budgets, and disaster recovery exercises.
- Implement and optimize monitoring using monitoring and APM tools (Grafana, Prometheus, Splunk, AppDynamics, or similar) and observability platforms (Dynatrace, ELK stack).
- Leverage OTEL (OpenTelemetry) for distributed tracing and metrics.
- Develop and maintain solutions in at least one of Node.js, Python, or Java.
- Use AWS services (EC2, Lambda, ELB, S3, CloudWatch, IAM, KMS, VPC, DynamoDB, RDS) to support scalable, secure cloud infrastructure.
What You'll Need to Succeed
- 5-7 years of hands-on SRE experience, including proven leadership in automating operations and reducing toil.
- Minimum 3 years' experience providing L2/L3 engineering support.
- Demonstrated expertise in troubleshooting performance and non-functional issues.
- Strong working knowledge of OTEL and APM tools, with experience applying them to SRE monitoring.
- Advanced proficiency in at least one of Node.js, Python, or Java.
- Deep understanding of SRE concepts and practices (chaos engineering, observability, self-healing, error budgets, DR exercises) and tools such as Dynatrace and the ELK stack.
- Solid experience with AWS technologies (EC2, Lambda, ELB, S3, CloudWatch, IAM, KMS, VPC, DynamoDB, RDS).
- Excellent communication, leadership, and problem-solving skills.
What You'll Get in Return
- Competitive hourly rate ($70-$87).
- Long-term contract opportunity with a leading technology provider and a major financial institution.
- Hybrid work model (3 days onsite, 2 days remote) at one of two GTA locations.
- Exposure to the latest SRE, cloud, and observability technologies.
- The chance to lead and make a significant impact on mission-critical banking systems.
What You Need to Do Now
Ready to take your SRE leadership skills to the next level? Apply today with an updated resume that highlights your relevant experience and states your expected hourly rate. Our recruitment team will contact you to discuss your fit for this high-profile opportunity.
Note: Only candidates eligible to work in Canada will be considered. This is a contract position from June 2025 to August 2026.