Site Reliability Engineer Needed: Owings Mills, Maryland

http://bit.ly/4ey8w48
https://jobs.nvoids.com/job_details.jsp?id=2239407&uid=

From: Satyajit Nayak, TEK Inspirations (satyajit@tekinspirations.com)

Reply to: satyajit@tekinspirations.com

Job Description

Senior Site Reliability Engineer (SRE): Qualification Call Notes
Interview Process: An initial Zoom interview, followed by an on-site interview (candidates must be local, must have no existing representation, and need a legitimate LinkedIn profile).
Work Schedule: Hybrid, 2 days on-site in Owings Mills (Monday and Tuesday) and 3 days remote.
Degree Requirement: Mandatory
Candidates will need to use their own device to connect via Citrix.
Must have an established LinkedIn profile with a profile photo; the submission must include a write-up explaining why the candidate is a fit for this role.

Key Requirements & Skills
Primary Focus:
Strong analytical skills in monitoring and automation within an SRE role
Experience with Grafana, Prometheus, and other monitoring tools listed in the JD
Must have performed the SRE role in a cloud environment (AWS), particularly in the last two projects
Scripting:
Python & Bash are critical
Perl is NOT needed (considered outdated)
AWS Tech Stack:
CloudWatch (metrics and observability)
ECS & EKS (primary container orchestration tools)
Infrastructure Automation:
Terraform is preferred, but Ansible is also acceptable
Industry Experience:
Financial services experience is NOT required; SRE principles are consistent across industries.

Expectations for Candidates/Resumes
Concise, focused resumes only; no long resumes. Highlight key skills and value-added roles. The client does not want to see 10-page resumes.
Candidates must have hands-on experience with monitoring, automation, and cloud reliability
Not a DevOps role: while full-stack engineers who have done SRE work are welcome, they will not be doing development work.
Looking for self-motivated individuals who can bring new ideas to the team
Candidates should clearly articulate their hands-on SRE experience, including daily tasks vs. prototype work
On-call rotation: approximately every 4-6 weeks, but this may vary.

Why the Role Is Open
Expansion hire (not a backfill) to support a growing team and scope.

TRP Job Description:
Work Location: Owings Mills, hybrid; must be on-site from day 1. This resource will be required to sit on-site in Owings Mills 2 days a week and work remotely 3 days a week. They will need to use their own device to connect via Citrix. The first round of interviews will be via Zoom, and the second round will be in person on-site.

Senior Site Reliability Engineer

Overview

The Technology Engineering team is looking for an experienced Site Reliability Engineer to join us as we reimagine production application and infrastructure management. The team is responsible for engineering scalable and resilient hybrid cloud solutions (both AWS and on-premises). You will be responsible for creating tooling and software that monitors and improves the reliability of our systems. In this role, you will research problems, evaluate modern technologies, create prototypes, develop observability tooling (integrated processes, automation, and standards), and provide SRE consulting on complex projects.

Requires specialized, in-depth knowledge and expertise in your own discipline and in the Amazon Web Services (AWS) platform and/or other cloud-based platforms, along with deep experience integrating knowledge from related disciplines
Works independently, receives minimal guidance
Accountable for your own work and the work of others; sets the standards by which others operate
Proactively identifies problems and can present and implement solutions to these problems

Role summary and job responsibilities
Design and implement highly automated systems/services that ensure the availability, reliability, and scalability of infrastructure and applications.
Build and maintain monitoring and alerting to provide timely feedback on the performance and health of systems, network, and applications.
Design and implement automation tools to reduce manual toil, streamline repetitive tasks, and enhance overall operational efficiency.
Design and build Service Level Indicator (SLI) metrics and the reliability targets built on them, including but not limited to Service Level Objectives (SLOs), error budgets, and burn-rate alerts.
Work closely with development teams to embed reliability best practices into the software development process. Provide mentorship and training to cross-functional teams on SRE principles, encouraging a shared responsibility for the reliability of our services.
Collaborate with our support, operations, and engineering teams to investigate and troubleshoot complex problems.
Observe and monitor systems to gain insight into their performance, health, and availability, and into what is happening internally.
Understand what to monitor based on the systems you manage, how the monitoring data is stored, and how to interpret that data to decide on future actions.
Participate in continuous improvement efforts that span multiple cross-functional domains, and inform the creation of new standards.
Be part of an on-call rotation, continuously improve automation and documentation, and mentor others on standard infrastructure automation practices to encourage adoption.
Able to overcome differences of opinion and drive team alignment around a specific goal or solution
Holds associates and teams accountable for adhering to practices and policies

Business knowledge
Demonstrates deep knowledge of products/flows within supported businesses
Decomposes the most complex problems into discrete work units.
Identifies non-obvious relationships and anomalies often overlooked by others.
Balances strategic and pragmatic concerns when solving problems.
Makes sound decisions with limited facts or resources.
Makes decisions that are cognizant of the firm's broader business strategy.
Articulates broader business concerns and/or the regulatory landscape, including key risks and controls (e.g., GDPR, MiFID, SOX).

Requirements
Strong experience with monitoring and alerting tools such as Prometheus, Grafana, and New Relic
Experience with container orchestration solutions in AWS using ECS and Fargate
Docker container development experience
Scripting languages such as Python, Groovy, PowerShell, Bash, Perl, etc.
Skilled in building and maintaining dashboards using tools like Grafana, Prometheus, and StatsD to provide critical insights
Has worked with a Service Reliability Engineering team to design SLIs and SLOs for the respective applications
Strong experience with AWS cloud infrastructure and container orchestration operating in a GitOps framework
A solid core foundation in infrastructure and systems engineering including Unix/Linux compute, networking, storage, and monitoring stacks.
Experience using automation tools such as Terraform and Ansible
Excellent written and oral communication skills
Strong interpersonal skills, adaptable and able to learn quickly
Availability for off-hours implementations is required
Ability to build positive working relationships with business contacts, within our IT team, and with other IT departments
Ability to identify tasks and help develop project plans for medium and large-scale projects

Preferred
College degree in computer science or related technical field with 7+ years of systems design, programming, implementation, and integration experience
3+ years of experience within the Amazon Web Services platform
AWS and Kubernetes certifications

Regards,

Satyajit Nayak

Sr. Technical Recruiter

TEK Inspirations LLC | 13573 Tabasco Cat Trail, Frisco, TX 75035

E: satyajit@tekinspirations.com

LinkedIn: linkedin.com/in/satyajeet-nayak-85751625b

03:06 AM 08-Mar-25
