Job Details

Sr Site Reliability Engineer at Plano, Texas, USA

http://bit.ly/4ey8w48
https://jobs.nvoids.com/job_details.jsp?id=356887&uid=

From:

Bhumika,

Adifice Technologies

bhumika@adificeusa.com

Reply to: bhumika@adificeusa.com

Sr Site Reliability Engineer

Location: Plano, TX or Jersey City, NJ (Onsite from Day 1)

Duration: Long Term

Visa: No H1B/OPT/CPT

Client: Cognizant/JPMC

(Candidates need 10+ years of experience.)

THIS IS NOT A DEVOPS ROLE.

Required Qualifications:

Bachelor's degree or equivalent experience in a software engineering discipline.

Highly skilled SRE with 9+ years of experience.

SRE mindset (exposure to SRE tenets, including observability, monitoring, alerting, logging, toil reduction, automation, and SLOs/SLIs/SLAs/error budgets).

Experience deploying and managing services on modern platforms (AWS, GCP, Azure, PCF).

In-depth OS experience (e.g., RHEL, Ubuntu, Windows Server) with strong debugging, troubleshooting, and problem-solving skills.

Background as a software developer (experience in cloud-native, distributed application design and implementation) and proficiency in languages such as Java, Python, C++, or Go.

Strong experience with industry-standard monitoring tools, e.g., AppDynamics, Dynatrace, APICA, Splunk, ELK, Fluentd, Prometheus, Kibana, Elasticsearch, Grafana, Nagios, Datadog, and New Relic.

Expertise in modern development practices and tools, e.g., Agile, CI/CD, Git, Terraform, and Jenkins.

Knowledge of Internet protocols and web service technologies, e.g., HTTP, DNS, TCP/UDP, SOAP, JSON, and REST.

Responsibilities:

Design and development tasks such as creating new resiliency features, scaling the system, and implementing code to improve efficiency and observability.

Establish SLOs that capture end-user experiences and defend them so that users stay happy.

Monitor SLOs and test them in pre-production with intelligent quality gates to detect issues earlier in the development cycle.

Responsible for how code and applications are monitored, as well as the availability, latency, change management, emergency response, and capacity management of services already in and going to production.

Complex incident resolution across public cloud, private cloud, 3rd party, and on-premises technology platforms.

AIOps/Automation/Design efforts for self-service, auto-detection and auto-healing.

Use chaos engineering to find and prevent future problems and to confirm that fixes from past incidents function as intended.

Partner with development teams to implement changes to increase availability and performance based on empirical evidence.

Keywords: cplusplus continuous integration continuous deployment golang New Jersey Texas
01:06 AM 11-Feb-23

Location: Plano, Texas