We're seeking a Junior to Mid-Level Observability Engineer (3+ years of experience) to support the tools and systems that enable platform visibility across a large-scale, cloud-native environment. This role ensures that our product and infrastructure teams can effectively monitor performance, detect issues, and maintain the reliability of the platform while continuing to ship high-quality features at speed.
If you've worked in Site Reliability Engineering (SRE), Platform Operations, or DevOps and have a strong interest in observability tooling, this opportunity offers the chance to deepen your expertise in monitoring, automation, and infrastructure reliability at scale.
Open to candidates who reside in Alberta or Ontario - Fully Remote (Canada Only)
Not open to sponsorship / worker must be a T4 employee
Contract role: 6+ months
Rate: $40 - $45 CAD per hour (this is the client's maximum budget; flexibility is still to be confirmed)
RESPONSIBILITIES
Support production-grade services with a focus on availability and performance.
Help design and operate resilient systems capable of scaling with business growth.
Deploy and manage observability tools across metrics, logs, and traces.
Automate and improve processes related to infrastructure monitoring and reliability.
Potential Projects:
Diagnose and resolve performance bottlenecks or system incidents.
Build or refine automation scripts and operational workflows.
Provide input on load testing, performance tuning, and tooling enhancements.
Collaborate with engineering teams to improve visibility into their systems and services.
Support infrastructure provisioning using infrastructure-as-code practices.
QUALIFICATIONS
3+ years of experience in SRE, DevOps, or platform operations roles.
Solid background with public cloud platforms such as AWS, GCP, or Azure.
Hands-on experience with observability tools such as Datadog (preferred) or similar.
Familiarity with error monitoring tools like Sentry (preferred) or alternatives.
Proficiency in infrastructure-as-code tools like Terraform.
Experience coding in Golang, Node.js, or Python.
Proven ability to monitor and support distributed systems at scale.
Strong understanding of microservices and container orchestration using Kubernetes.
Collaborative mindset with a willingness to contribute ideas and challenge assumptions.