Sr Site Reliability Engineer at Plano, Texas, USA |
Email: [email protected] |
http://bit.ly/4ey8w48 https://jobs.nvoids.com/job_details.jsp?id=356887&uid=

From: Bhumika, Adifice Technologies [email protected]
Reply to: [email protected]

Sr Site Reliability Engineer
Location: Plano, TX or Jersey City, NJ (Onsite from Day 1)
Duration: Long Term
Visa: No H1B/OPT/CPT
Client: Cognizant/JPMC (candidates need 10+ years of experience)

THIS IS NOT A DEVOPS ROLE.

Required Qualifications:
Bachelor's degree or equivalent experience in a software engineering discipline.
Highly skilled SRE with 9+ years of experience.
SRE mindset (exposure to SRE tenets, including observability, monitoring, alerting, logging, toil, automation, and SLO/SLI/SLA/error budgets).
Experience deploying and managing services on modern platforms (AWS, GCP, Azure, PCF).
In-depth OS experience (e.g., RHEL, Ubuntu, Windows Server) with strong debugging, troubleshooting, and problem-solving skills.
Background as a software developer (experience in cloud-native, distributed application design and implementation), with proficiency in languages such as Java, Python, C++, and Go.
Strong experience with industry-standard monitoring tools, e.g., AppDynamics, Dynatrace, APICA, Splunk, ELK, FluentD, Prometheus, Kibana, Elasticsearch, Grafana, Nagios, Datadog, and New Relic.
Expertise in modern development practices and tools, e.g., Agile, CI/CD, Git, Terraform, and Jenkins.
Knowledge of Internet protocols and web services technologies, e.g., HTTP, DNS, TCP/UDP, SOAP, JSON, and REST.

Responsibilities:
Design and development tasks such as creating new resiliency features, scaling the system, and implementing code to improve efficiency and observability.
Establish SLOs that capture end-user experience and defend them so users stay happy.
Monitor SLOs and test them in pre-production with intelligent quality gates to detect issues earlier in the development cycle.
Own how code and applications are monitored, as well as the availability, latency, change management, emergency response, and capacity management of services already in production or going to production.
Resolve complex incidents across public cloud, private cloud, third-party, and on-premises technology platforms.
Drive AIOps, automation, and design efforts for self-service, auto-detection, and auto-healing.
Use chaos engineering to find and prevent future problems and to confirm that fixes from past incidents function as intended.
Partner with development teams to implement changes that increase availability and performance, based on empirical evidence.

Keywords: cplusplus continuous integration continuous deployment golang New Jersey Texas |
01:06 AM 11-Feb-23 |