Role: Site Reliability Engineer (SRE)
Location: Montreal (Hybrid, 3 Days Onsite)
Type: Full-time
Overview:
We're seeking a dependable Site Reliability Engineer with 5+ years of software development experience to support and enhance our ServiceNow platform and related systems. The role requires strong Python skills, collaboration, and a focus on reliability, automation, and operational excellence.
Key Responsibilities:
- Improve system availability and performance via automation and tool development.
- Troubleshoot issues in ServiceNow and on-premises Linux systems.
- Enhance observability (metrics, logs, alerts, tracing).
- Participate in the on-call rotation (with time off in lieu).
- Document ServiceNow instances and dependencies.
- Identify and prioritize technical debt that impacts efficiency.
- Contribute to policy and procedure improvements for SRE practices.
Requirements:
- 2–5 years in software development.
- Proficiency in Python and/or ServiceNow.
- Strong communication and team collaboration skills.
- Ability to handle occasional production emergencies.