Job Description
What does a day look like for you here?
- Use the key practices of SRE to provide operational support to customers.
- Work with the customer to establish SLOs, SLIs, and SLAs, along with an appropriate monitoring process to support these service levels.
- Manage the release of new features/components against the pre-agreed error budget.
- Work with the customer to establish an effective process for Pre-Production Reviews.
- Spend approximately 50% of your time developing tools and automation to streamline deployment, monitoring, and maintenance processes.
- Support the engineering team in developing automated operational tests to demonstrate a reliability baseline.
- Interface directly with the Change Squad to address poorly performing services.
- Collaborate with cross-functional teams to identify and address performance bottlenecks and reliability issues.
- Conduct regular performance analysis and capacity planning to ensure optimal system performance and resource utilisation.
- Implement and maintain monitoring, alerting, and logging solutions to proactively identify and address issues.
- Serve as a technical point of contact for clients, providing guidance on their infrastructure, technology selection, and best practices.
- Participate in client meetings and project discussions to understand business objectives and requirements, and align technical solutions accordingly.
- Provide ongoing support and troubleshooting assistance to address clients' technical issues and concerns (including out-of-hours support where required).
Qualifications
So, what are we looking for?
- Proven experience as a customer-facing Site Reliability Engineer (SRE).
- Experience working with IaC tools such as Terraform, alongside Git and CI/CD pipelines.
- Working knowledge of a configuration management platform such as Azure DevOps.
- Experience in implementing and managing monitoring and logging solutions.
- Experience in implementing and automating solutions on public cloud platforms (Azure, GCP, AWS).
- Exposure to containerisation technologies such as Docker and container orchestration platforms like Kubernetes.
- Understanding of security, networking, cloud computing, and distributed systems concepts.
See more jobs at Daisy Group