Senior Site Reliability Engineer

Posted in: Engineering in England | Last updated: 2024-06-20

Job type: Full-time

Salary:

72,000 - 88,000 GBP/Year

Senior Site Reliability Engineer (SRE)
Mainly remote - 1 trip per month to office in Manchester
Permanent
£72,000pa - £88,000pa
My client is currently looking to recruit a Senior SRE for a role working mainly remotely, with occasional travel to Manchester.
This role will report to the technical lead.
Candidates must have strong experience with Azure and Kubernetes, along with a strong grasp of networking concepts. Azure and Kubernetes experience is crucial.
Ideally, candidates will have experience working in a start-up and will be used to the fast-paced environment that comes with smaller companies.
Candidates must be senior: they will lead the platform and be the go-to person when issues need escalating.
Skills required
* Detailed knowledge of public cloud, primarily Azure.
* Experience of modern hosting options like Functions, Logic Apps, ADF, Container Apps, etc.
* Experience with Kubernetes, preferably AKS, and the connections needed to use external Azure resources within the clusters.
* Programming and/or scripting skills in Bash/PowerShell
* Strong networking concept knowledge, including TCP/IP, DNS, load balancing, and routing.
* In-depth knowledge of monitoring and logging tools, such as Datadog, Grafana, ELK stack, etc., for effective system monitoring and troubleshooting.
* Knowledge of continuous integration and deployment practices, preferably with Azure DevOps
* Confidence to work independently, use your initiative, and guide and train less experienced engineers.
* Enjoys solving technical problems.
* Experience in the use of Windows and Linux operating systems
* Coaching other engineers on infrastructure engineering principles
* Good communication skills, both written and verbal.
* Good collaboration skills with people of all levels of technical ability
* Good understanding of industry best practices for SRE
* Exposure to Terraform is a plus
Working behaviour
* Collaborate with development teams to define and implement infrastructure solutions, focusing on reliability, scalability, and performance.
* Design and develop automated tools and scripts for continuous monitoring, deployment, and management of production systems.
* Troubleshoot and resolve complex production issues in a timely manner, applying root cause analysis to prevent recurrence.
* Manage and maintain cloud-based infrastructure and services.
* Develop and maintain documentation for processes, procedures, and best practices to enable knowledge sharing and upskilling of other engineers.
* Help projects implement better ways of working using DevOps principles.
* Help the team identify engineering tasks and prioritise the backlog in relation to project and support needs.
* Enhance operational reliability and scalability of existing products.
* Identify simple, innovative technical solutions to complex engineering problems.
* Improve the Infrastructure as Code testing capabilities through examples, documentation and industry best practice.
* Manage your time to ensure relevant projects are completed within deadlines whilst ensuring support tasks are completed.
* Take part in the on-call rota with the rest of the team.