GuruLink
Toronto, ON
Job details:
Location: REMOTE / Nashville, Tennessee
This job allows you to work remotely.
Our client is transforming the experience of specialty care. Their comprehensive care program takes a profoundly personal, evidence-based approach to improving patient outcomes for joint, back, and muscle conditions. By carefully assessing patients' symptoms, health histories, preferences, and goals with predictive data and the latest evidence-based guidelines, they help patients choose and navigate the most effective treatment pathway every step of the way.
The company values the experiences and perspectives of individuals from all backgrounds. They are a highly collaborative, curious, and determined team passionate about scaling a high-growth start-up to improve the lives of those in pain.
The Role:
The Principal DevOps Engineer owns the infrastructure function end-to-end: reliability, security, scalability, and operational governance of the company's infrastructure, plus the team that delivers it. You will be a peer to the Director of Software Engineering, Director of Data Engineering, and Director of Data Science, own the Infrastructure & SRE scorecard in front of the executive team, and lead vendor escalations.
This is a player-coach role. In year one you will spend roughly 60% of your time hands-on (writing Terraform, leading incidents, doing architecture work) and 40% building the team and the practice. As the team scales, that ratio shifts toward leadership, but you will never stop being technical.
You will:
•Converge all AWS resources to Terraform, eliminate manual provisioning, and establish reproducible dev/staging/production environments with proper isolation and parity
•Standardize CI/CD pipelines across all engineering teams
•Define and operate SLOs, SLIs, and error budgets for all production systems; build observability across AWS, Salesforce, telephony, and integrations
•Stand up on-call rotation, incident management, and post-incident review discipline, including RCAs; own uptime, MTTR, and incident-volume trends as published metrics
•Design, implement, and regularly test a disaster recovery (DR) strategy with documented RPO/RTO commitments aligned to HITRUST and HIPAA expectations
•Stabilize Salesforce, telephony/omni-channel, and Cresta integrations; close persistent gaps in skills-based routing, warm transfers, and telephony data parity
•Partner with Data Engineering on reliability of data ingest paths (Fivetran, SFTP, S3) and Salesforce bulk API flows
•Translate Security & Compliance policy into enforced infrastructure controls: IAM, encryption, network segmentation, secrets management, and audit logging
•Partner with the Security & Compliance team on HITRUST evidence, audit readiness, and remediation; own vulnerability management across cloud and application layers
•Build and maintain test, staging, and ephemeral environments engineers actually use; reduce cycle time, remove infrastructure friction from the SDLC, and establish self-service tooling
•Hire, level, develop, and retain the Infrastructure & SRE team
Must Have Skills:
•10+ years in Infrastructure Engineering, SRE, or DevOps
•Experience with Terraform, AWS, and production incident response
•Track record of hiring, leveling, and developing infrastructure or SRE engineers
•Deep AWS expertise: VPC, IAM, ECS/EKS, Lambda, RDS, DynamoDB, S3, API Gateway, WAF, Connect
•Production Terraform experience at scale (modules, state management, multi-environment)
•Hands-on with observability stacks (CloudWatch, Datadog, Grafana, or equivalents)
•Demonstrated experience standing up SRE practices: SLOs, on-call, incident management, blameless postmortems
•Experience operating in a HIPAA or comparably regulated environment (PCI, SOC 2 Type II, HITRUST, FedRAMP)
•CI/CD pipeline design (GitHub Actions, GitLab CI, or equivalent)
•Experience with Amazon Connect or a comparable contact center telephony platform