Site Reliability Engineer

CYNET SYSTEMS - 540 jobs
Toronto, ON
Posted yesterday
Job details:
Full-time
Experienced

Job Description:
  • Provide hands-on SRE technical support at the squad level, including 24x7 coverage.
  • Drive transformation by continuously looking for ways to automate existing processes.
  • Track, audit, monitor, and deliver on technical work streams.
  • Act as the portfolio Subject Matter Expert (SME): understand and document the common components, core functionality, and infrastructure of supported applications.
  • Serve as an escalation point in the on-call rotation, and support maintenance, scheduled work, and release deployment requirements.
  • Help with incident management and problem management for applications in scope, and own and complete root cause analysis (RCA) action items.
  • Focus on continuous improvement and technical standards; drive improvements in productivity, monitoring, tooling, and best practices.
  • Manage technology currency (server patching, certificate renewal, compliance, etc.) with a keen eye for automation opportunities.
  • Drive best-in-class technical solutions by closely tracking industry-leading solutions and applying them to the client's environment and needs.
  • Draw on unit, department, and enterprise-wide teams to develop better solutions and foster a cross-enterprise mindset.
  • Help drive the overall SRE strategy and own the roadmap build.
  • 2-5 years of experience as an SRE.
  • 4-5 years of experience in a related field.
  • A Bachelor's degree in Computer Science or a related technical field (for example, Mathematics, Engineering, or Physics), or equivalent practical experience.
  • Advanced knowledge of the following SRE practices and technologies:
  • Python, YAML, Shell scripting.
  • Azure, Linux.
  • Dynatrace, Prometheus, PagerDuty, Moog, Client, Elastic, Azure Monitor.
  • Chaos Engineering.
  • MQ, Kafka.
  • Perform a production support role, including off-hours support.
  • Ability to influence at the Senior and/or Principal level.
  • In-depth, hands-on experience with a variety of SRE tools (Ansible, Azure Automation, Catchpoint).
  • Provide hands-on, 24x7 SRE support, including incident management, problem management, root cause analysis, monitoring, alerting, infrastructure maintenance, and compliance.
  • Lead incident management and problem management for applications in scope, and own and complete root cause analysis (RCA) action items.
Engineering:
  • Develop SRE solutions (monitoring and alerting, machine learning anomaly detection, self-healing and reliability testing).
  • Apply design thinking and an agile mindset when working with SREs, Scrum Masters, and Incident Leads.
  • Contribute to and leverage best practices in SRE.
  • Simplify development by building repeatable solutions for manual tasks.
  • Support the unit's goal of adopting automation solutions for applications in scope.
Production Support:
  • Perform a production support role, including off-hours and rotational on-call support, compensated with overtime pay, lieu time, and an on-call allowance.
  • Assist in incident management and problem management for applications in scope.
  • Continuously evaluate what went well, what went wrong, and what can be done to improve and prevent recurrence.
  • Maintain technology currency (server patching, certificate renewal, etc.) with a keen eye for automation opportunities.
  • Ensure availability and uptime of applications in scope, as per service level objectives.
  • Ensure compliance of all systems and applications in scope, including maintaining segregation of duties.
Technical Consultation:
  • Support initiatives outside the application or squad-level scope; consult with other teams in RBPT and across the enterprise on product builds.
Innovation and Learning:
  • Stay abreast of technology changes and learn continuously through official training assignments and self-directed learning.
  • Demo new technology findings to the wider team.