Site Reliability Engineer (Remote)

Company: M3USA
Job type: Full-time

Job Description
Design and implement improvements to NAS's system infrastructure to meet performance, availability, resilience, security, and compliance objectives.
Monitor and improve system performance, identifying potential enhancements and troubleshooting issues as necessary.
Collaborate with application developers to reduce and mitigate errors and improve quality of service for users and customers.
Develop automated alerting and response systems to manage reliability risks.
Deploy and maintain cloud infrastructure, particularly on Microsoft Azure, using Infrastructure-as-Code and automated scripts whenever possible.
Work alongside developers to ensure that systems are reliable and performant.
Lead scalability and reliability enhancement projects.
Document system architecture and maintenance procedures.
Create runbooks for common fault scenarios and lead incident postmortems.
Monitor critical third-party services and aid in the selection of new services as needed.
Proactively work to improve cost efficiency while meeting service level objectives.
Write scripts and integrate services to automate repetitive work and reduce toil.