Site Reliability Engineer (SRE with Java development)

Compunnel Inc. - 33 Jobs
Toronto, ON
Posted today
Job Details:
Full-time
Experienced

Job Description

SRE

Toronto, ON

Contract: 6+ months (extendable)

The client expects a candidate with SRE experience plus development experience in Java or cloud.

Mandatory Skills: AWS, CloudWatch, Lambda, Python, observability, and monitoring tools such as Dynatrace.

Responsibilities:

•Work in collaboration with Application Development, Quality, Product and Data Engineering teams to champion SRE/DevOps culture and practices.

•Take a strategic approach with clear objectives to improve service/product availability and performance, reduce incident MTTR, raise change success rate, and ensure a feedback loop to development teams.

•Build and maintain reliable systems and platforms using SRE and DevSecOps principles, with a special focus on observability, resiliency (proactive impact prevention), self-healing, and reliability testing.

•Work with App & Business teams to establish SLOs/SLIs and SRE dashboards that provide multiple views (by LOB, business process, or app) to track value and enable effective decision making.

•Take an innovative approach to reliability, from the architecture and feasibility phase through operations and continuous improvement, following the product model and Agile methodologies.

•Keep up with the latest technology trends in observability, automation, and platform technology and tools, including AIOps and MLOps reliability and resiliency.

•Ensure toil is addressed from inception and throughout operations (self-healing, self-configuration, self-provisioning, and optimization) by leveraging sense-and-respond techniques and advanced monitoring (synthetic and RUM).

•Lead or participate in the Community of Practice (CoP) to connect and collaborate with like-minded teams and to set objectives, roadmaps, and implementation plans, including SRE office hours.

Qualifications:

•SRE: In-depth knowledge and experience in observability, toil management, monitoring tools (Dynatrace, CloudWatch, Azure Monitor), resilient architecture, IaC, CaC, JSON, TypeScript, and API and webhook development using Python, Node.js, Ruby, PowerShell, and shell scripting.

•Cloud Experience: In-depth knowledge of cloud-native tools and services: CDK, CloudWatch, EKS, EC2, ELB, S3, Lambda, and SSM.

•In-depth understanding of Dynatrace advanced features (DT Guardian, RUM, synthetic testing and monitoring, AI event correlation).

•Experience in log ingestion (AWS Firehose, DT OpenPipeline), reporting and dashboard tools, and operational metrics and analytics.

•Automation: Leverage Ansible Tower, AWS SSM, and Bitbucket/GitHub to build automated workflows that eliminate toil, improve response time, and streamline the deployment pipeline.

•Experience with cloud orchestration tools (AWS Step Functions, containers, Apache Airflow), with a special focus on data batch processing and pipelines.

•Deep knowledge of data management, data warehouses, data lakes, and database reliability (Redshift, RDS, Aurora, PostgreSQL, SQL Server, Oracle), with DevOps experience.

•Exceptional problem-solving and knowledge management skills; an effective communicator who can speak the language of people, process, and technology.

•Decisive, energetic, focused team player who builds and leads high-performing teams and CoPs and fosters a culture of diversity, inclusion, recognition, and growth.
