Site Reliability Engineer (SRE), Remote, USA

http://bit.ly/4ey8w48
https://jobs.nvoids.com/job_details.jsp?id=1244671&uid=

From: Vikrama Rao, ValiantIQ INC, vrao@valiant-iq.com

Reply to: vrao@valiant-iq.com

Position: Site Reliability Engineer (SRE)

Duration: 9 Months

Location: 100% Remote

Duration: 6 to 12 Months

Site Reliability Engineer (SRE)

The Site Reliability Engineer will play a key role in building observability and resilience capabilities on the Azure cloud platform. Responsibilities of the SRE:

Build and configure alerts, tracing, telemetry, and instrumentation required for Infrastructure Monitoring and Application Performance Management.

Implement dashboards to monitor and share observability data at various levels (engineering teams, portfolio, senior management).

Support resilience engineering (application and infrastructure resilience) to meet availability requirements.

Work with development engineers, cloud engineers, product teams, and support engineers to gather requirements, implement, and evolve observability and resilience solutions.

Key Skillsets:

Extensive knowledge of observability and Application Performance Monitoring best practices and of KPIs/metrics on cloud platforms

Experience with monitoring tools: Dynatrace and Splunk

Experience with incident resolution (on-call support) and with troubleshooting application errors and performance issues using Dynatrace and Splunk to assist application teams with root cause analysis

Experience working with SLOs and error budgets; understanding of SLA/SLI/SLO

Expertise with the Splunk Search Processing Language (SPL)

Experience building monitoring solutions for container-based workloads (Java / Spring Boot desirable), databases, Kafka, and Kubernetes

Experience in resilience engineering and implementing high-availability solutions

Experience creating Monitoring dashboards using Dynatrace and Splunk

Ability to work in a fast-paced, agile environment

SRE Maturity Level 3 (Expectation)

DevOps Observability

DORA Metrics are visible.

Deployment frequency, Mean Time to Restore (MTTR), Cycle time, Change failure rate

IaC (Infrastructure as Code)

Platforms leverage IaC.

Test / Release automation

Unit tests

Test in a vacuum

Integration tests

Load test results validated against SLOs.

Tests run as part of the CI/CD pipeline.

Automated rollback

Business Continuity Plan for Recovering Service(s)

Capacity planning review

Show saturation of service as compared to load test and production peak load.

Product Management (Security)

Security scanning

Documented procedures for Vulnerability Management

Integrated into CI/CD pipeline (partner with security)

SRE Maturity Level 4 (Advanced)

Modernized application.

Deployment to Kubernetes, Azure, or SaaS via CI/CD pipeline

Synthetic Monitoring

Canary / Blue-Green Deployment

Self-Healing

Auto scaling

Identify KPIs for business performance.

Chaos Engineering

Thanks & Regards,

Vikrama Rao

Recruitment Executive

ValiantIQ Inc

"Searching Best Minds"

Email: vrao@valiant-iq.com

P. 704-249-2259 F. (302) 482-3672


Keywords: continuous integration, continuous deployment

09:42 PM 22-Mar-24
