Site Reliability Engineer, Remote, USA |
Email: [email protected] |
http://bit.ly/4ey8w48
https://jobs.nvoids.com/job_details.jsp?id=725704&uid=
From: Sindhuja, USG [email protected]
Reply to: [email protected]

Primary Responsibilities:
- Define and set up industry best practices for alerting and monitoring across the line of business, and design/architect efficient monitoring dashboards on Splunk/DTSaas/DataDog/Grafana common to all applications/products across the line of business
- Participate in the 5-9 program and other peak season readiness initiatives, collaborating with application teams to evaluate applications from a resiliency, availability, and reliability perspective
- Act as a gatekeeper for changes rolling into production
- Embrace continuous learning of engineering practices to ensure industry best practices and technology adoption, including DevOps, Cloud and Agile thinking
- Drive tech debt reduction/tech transformation, including open source/inner source adoption, Cloud adoption, HCP assessment and adoption
- Improve processes/runbooks and lead automation of manual support tasks to cut down manual toil
- Participate in on-call rotation
- Improve operational tooling and frameworks, and perform chaos engineering activities
- Respond to platform emergencies, alerts, and escalations from Customer Support

Required Qualifications:
- Overall, 10-12 years of experience in the IT industry across the entire SDLC
- Proven work experience as a Site Reliability Engineer or similar role
- 5+ years of experience in integrating monitoring and alerting into cloud software solutions
- 5+ years of coding experience with one or more of the following languages: Java, C#, C/C++, Go, Python, Perl, PowerShell or JavaScript, with a willingness and ability to learn new ones
- 2+ years of experience building and programmatically consuming REST APIs
- 3+ years of experience with Splunk / Dynatrace / DataDog / Grafana / Telemetry or similar monitoring tools
- Experience with programmatic interaction with a relational database (SQL Server/MySQL/PostgreSQL)
- Experience planning and supporting 99.999% availability for critical applications in production
- Solid understanding of engineering fundamentals: unit testing, performance testing, code reviews, telemetry, agile and DevOps
- Strong understanding of continuous integration / continuous delivery tools, serverless architecture, containerization, public / private cloud, application observability and/or messaging / stream architecture
- Ability to communicate effectively with both technical and non-technical, globally distributed audiences
- Technical writing skills (creating flow diagrams, end user documentation, etc.)

Keywords: cprogramm cplusplus csharp information technology golang |
11:39 PM 06-Oct-23 |