
Lead Site Reliability Engineer at Hartford, Connecticut, USA
Email: [email protected]
http://bit.ly/4ey8w48
https://jobs.nvoids.com/job_details.jsp?id=968564&uid=

From:

Satyam Gond,

Tek Inspirations LLC

[email protected]

Reply to:   [email protected]

Hi,

Hope you are doing great!!!

Please share suitable profiles.

Job Description -

Please mention visa status and location.

Lead Site Reliability Engineer

Duration: 6 months

Location: Hartford, CT - Hybrid

MUST BE LOCAL TO CT ONLY - No Fake Representation

Strong communication skills are the top priority here; technical and process skills are a slightly lower priority.

MUST HAVE LEAD EXPERIENCE AND INSURANCE DOMAIN EXPERIENCE, INCLUDING PROPERTY AND CASUALTY INSURANCE

Key Responsibilities:

Looking for a process-oriented person to work as an SRE and set up observability and monitoring metrics for each line of business (LOB) to support cloud systems.

Will lead transformations end to end, from development through deployment (blue/green, canary).

Recommend SLOs, SLIs, and SLAs. Set up holistic, open-source processes from beginning to end: IaC/IaaS, automation, DevOps, observability, and CI/CD pipelines; use metrics and create dashboards.

Champion migration to open-source platforms to establish standards.

Agile: manage backlogs and backlog refinement, metrics, and golden signals.

The ideal candidate should have a strong background in SRE and IT operations, as well as proficiency in various programming languages. Position requires a strong technical understanding of complex IT environments, cloud, and evolving technologies.

Skills:

Solid understanding of AWS, DevSecOps practices, SAFe Agile methodologies 
Knowledge of Amazon Web Services including but not limited to EC2, S3, ECS, RDS, CloudWatch, SNS, CloudTrail, SQS, Service Catalog.
Expertise with cloud platforms like AWS and microservices architecture 
Familiarity with enterprise software solutions such as GitHub, Jenkins, Nexus, Ansible, Jira, Rally, etc.
Observability and monitoring tools and metrics: Dynatrace, Splunk, Nagios, CloudWatch, ELK, Grafana, Prometheus, etc.
Familiarity with programming languages (Python, Go) and AWS Lambda
Experience in Infrastructure as Code (IaC) using CloudFormation & Terraform templates, YAML files, build specifications 
Must have exceptional communication skills (written, oral, presentation and facilitation) 
Solid understanding of technologies that support the services offered for cloud applications

Qualifications:

8+ years of relevant technical experience

BS degree in Engineering, Computer Science, or equivalent practical experience

Expertise designing, analyzing, and troubleshooting large-scale distributed systems. 

Experience in implementing Infrastructure as code 

Experience building software and maintaining systems in a highly secure, regulated or compliant industry 

Experience in monitoring infrastructure and application service level objectives to ensure functional and performance objectives are met.

Experience in implementing service dashboards for monitoring objectives and metrics

Experience developing and/or administering software in AWS cloud infrastructure 

System administration skills, including automation and orchestration of environments using Terraform or CloudFormation and configuration management 

3-5 years of experience in languages such as Python, Ruby, Bash, Power

Experience with container orchestration tools and container management (Docker, Kubernetes, etc.)

Proficiency with continuous integration and continuous delivery tooling and practices 

Must have exceptional communication skills (written, oral, presentation and facilitation) 

Responsibilities:

Influence and design architecture, infrastructure, standards and methods for large-scale cloud systems 

Engage in and improve the software development life cycle through CI/CD; improve the build-to-deployment process to establish greater reliability and a sustainable release process.

Oversee release gating; establish deployment metrics (DORA). 

Monitor and develop SLOs and SLIs through the customer user journey; advise on SLAs; establish error budgets.

Observability and custom monitoring tool integrations; introduce telemetry to support SLOs

Automate system scalability and continually work to improve system resiliency, performance, and efficiency; recommend design changes that improve the reliability of HA systems.

Deploy software through highly available deployment strategies: rolling, blue-green, or canary.

Provide mentorship to reliability engineering squads under a consistent framework for the Development, Testing and Alerting processes

Practice sustainable incident response through blameless RCA and postmortems 

Advise on performance testing and capacity planning.

Communicate proactively with colleagues and formally present work product outcomes and risk analysis to the product team and management.

Follow the Agile/Scrum working methodologies 

Establish dashboarding for monitoring capabilities and metrics

Regards,

Satyam Gond

Keywords: continuous integration, continuous deployment, S3, information technology, Golang, Connecticut
07:57 PM 28-Dec-23


