Cloud Site Reliability Engineer Job at TEEMA

Cloud Site Reliability Engineer

TEEMA

Toronto, ON

This position has been closed and is no longer accepting applicants.

Job Details:

Full-time

Experienced

Primary Functions

- Collaborate with our Security Operations teams to help define and implement best practices around Cloud Service Provider configuration for AWS, Azure and other cloud providers.
- Develop, implement and coordinate a multi-tenant approach to service offerings such as databases, container platforms, authentication, certificates, and product registries.
- Develop and maintain cost/utilization tracking and attribution processes for all Cloud Service Providers.
- Create documentation around Cloud Service Provider offerings detailing use cases, best practices, and implementation details.
- Develop and maintain technical relationships with our core Cloud Service Providers.
- Implement and maintain a secure and scalable infrastructure platform for delivering Cloud Services applications.
- Ensure that internal and external SLAs meet or exceed expectations, and that system-centric KPIs are continuously monitored and improved.
- Create tools for automating deployment, monitoring and operations of the overall platform.
- Participate in an on-call rotation to provide application support, incident management, and troubleshooting.
- Provide ongoing maintenance and support of internal tools, and improve system health and reliability.
- Assist customers with on-premises deployments when needed.
- Compliance with organizational policies, procedures and practices (including, but not limited to, security policies) is an ongoing requirement of the employment or contractual agreement.
- Comply with privacy, security and confidentiality policies.

Prerequisites

- Demonstrated expertise in cloud service providers and best practices for their implementation and configuration, preferably gained managing Azure on behalf of multiple teams at a company that delivers SaaS products.
- Experience with Kubernetes, OpenShift, Kafka, and the Elastic Stack.
- Proven experience with security and compliance best practices (SOC 2, HIPAA, ISO 27001) and with implementing controls that support high-velocity software delivery teams.
- Proficiency in Terraform, Ansible or Chef.
- Expertise in troubleshooting, support escalation, on-call process optimization, and knowledge documentation.
- Passion for infrastructure as code, automation, and developing solutions that help developers move quickly and safely.
- Familiarity with infrastructure management and operations lifecycle concepts and ecosystem.
- Experience operating and maintaining production systems in a Linux and public cloud environment.
- Prior experience working with high-performance or distributed systems, although we strive to hire at a variety of experience levels.
- Working knowledge of industry best practices with regard to information security.
- Previous experience building or maintaining a large-scale cloud service.
- Proven ability to prioritize and track multiple projects in parallel.
- Proven ability to be highly responsive and customer-focused.
