Cloud Site Reliability Engineer

TEEMA
Toronto, ON
Job Details:
Full-time
Experienced


Primary Functions
    • Collaborate with our Security Operations teams to help define and implement best practices around Cloud Service Provider configuration for AWS, Azure and other cloud providers.

    • Develop, implement and coordinate a multi-tenant approach to service offerings such as DB, Container platform, Authentication, Certificates, and Product Registries.

    • Develop and maintain cost/utilization tracking and attribution processes for all Cloud Service Providers.

    • Create documentation around Cloud Service Provider offerings detailing use cases, best practices, and implementation details.

    • Develop and maintain technical relationships with our core Cloud Service Providers.

    • Implement and maintain a secure and scalable infrastructure platform for delivering Cloud Services applications.

    • Ensure that internal and external SLAs meet or exceed expectations, and that system-centric KPIs are continuously monitored and improved.

    • Create tools for automating deployment, monitoring and operations of the overall platform.

    • Participate in an on-call rotation to provide application support, incident management, and troubleshooting.

    • Provide ongoing maintenance and support of internal tools, and improve system health and reliability.

    • Assist customers with on-premises deployments when needed.

    • Ongoing compliance with organizational policies, procedures and practices (such as, but not limited to, security policies) is a requirement of the employment or contractual agreement.

    • Comply with privacy, security and confidentiality policies.

Prerequisites
    • Demonstrated expertise in cloud service providers and in best practices for their implementation and configuration, preferably gained managing Azure on behalf of multiple teams for a company that delivers SaaS products.

    • Experience with Kubernetes, OpenShift, Kafka, and the Elastic Stack.

    • Proven experience with security and compliance best practices (SOC 2, HIPAA, ISO 27001) and with implementing controls that support high-velocity software delivery teams.

    • Proficiency in Terraform, Ansible or Chef.

    • Expertise in troubleshooting, support escalation, on-call process optimization and knowledge documentation.

    • Passion for infrastructure as code, automation, and developing solutions that help developers move quickly and safely.

    • Familiarity with infrastructure management and operations lifecycle concepts and ecosystem.

    • Experience operating and maintaining production systems in a Linux and public cloud environment.

    • Prior experience working with high-performance or distributed systems is preferred, although we strive to hire at a variety of experience levels.

    • Working knowledge of industry best practices with regard to information security.

    • Previous experience building or maintaining a large-scale cloud service.

    • Proven ability to prioritize and track multiple projects in parallel.

    • Proven ability to be highly responsive and customer-focused.