- Provide hands-on SRE technical support at the squad level, including 24x7 coverage.
- Drive transformation by continuously looking for ways to automate existing processes.
- Track, audit, monitor, and deliver on technical work streams.
- Act as the portfolio Subject Matter Expert (SME): understand and document the common components, core functionality, and infrastructure of supported applications.
- Serve as an escalation point in the on-call rotation, and support maintenance, scheduled work, and release deployment requirements.
- Help with incident management and problem management for applications in scope, and own the fulfillment of root cause analysis (RCA) action items.
- Focus on continuous improvement and technical standards. Drive improvements in productivity, monitoring, tooling, and best practices.
- Manage technology currency (server patching, certificate renewal, compliance, etc.) with a keen eye for automation opportunities.
- Drive best-in-class technical solutions by closely tracking industry-leading solutions and applying them to the client environment and its needs.
- Leverage the value of unit, department, and enterprise-wide teams to develop better solutions and foster a cross-enterprise mindset.
- Contribute to driving the overall SRE strategy, including ownership of the roadmap build.
- 2-5 years of experience as an SRE.
- 4-5 years of experience in a related field.
- A Bachelor's degree in Computer Science or a related technical field (for example, Mathematics, Engineering, or Physics), or equivalent practical experience.
- Advanced knowledge of the following SRE practices and technologies:
- Python, YAML, Shell scripting.
- Azure, Linux.
- Dynatrace, Prometheus, PagerDuty, Moog, Client, Elastic, Azure Monitor.
- Chaos Engineering.
- MQ, Kafka.
- Perform a production support role, including off-hours support.
- Ability to influence at the Senior and/or Principal level.
- In-depth, hands-on experience with a variety of SRE tools (Ansible, Azure Automation, Catchpoint).
- Provide hands-on, 24x7 SRE support, including incident management, problem management, root cause analysis, monitoring, alerting, infrastructure maintenance, and compliance.
- Lead incident management and problem management for applications in scope, and own the fulfillment of RCA action items.
- Develop SRE solutions (monitoring and alerting, machine learning anomaly detection, self-healing and reliability testing).
- Apply design thinking and an agile mindset when working with SREs, Scrum Masters, and Incident Leads.
- Contribute to and leverage best practices in SRE.
- Simplify development by building repeatable solutions for manual tasks.
- Support the unit's goals to adopt automation solutions for applications in scope.
- Perform a production support role, including off-hours and rotational on-call support, compensated with overtime pay, lieu time, and an on-call allowance.
- Assist in incident management and problem management for applications in scope.
- Continuously evaluate what went well, what went wrong, and what can be done to improve and to prevent recurrence.
- Maintain technology currency (server patching, certificate renewal, etc.) with a keen eye for automation opportunities.
- Ensure availability and uptime of applications in scope, as per service level objectives.
- Ensure compliance of all systems and applications in scope, including maintaining segregation of duties.
- Support initiatives outside the application or squad-level scope. Consult with other teams in RBPT and across the enterprise on product builds.
- Stay abreast of technology changes and learn continuously through official training assignments and self-assigned learning.
- Demo new technology findings to the wider team.