Reliability Engineer I (REMOTE)

Full Time

DICK'S Sporting Goods

The Reliability Engineer I will work with other Reliability Engineers (RE), Product Managers, Software Engineers, and Architects to produce mission-critical infrastructure, tools, performance improvements, actionable and meaningful performance measurements, and communication to stakeholders. The RE will work with the business and technical teams to close gaps between solution design and business processes and to assess impacts to services and processes across the enterprise. The RE role at DICK'S Sporting Goods (DSG) provides an opportunity to blend system design and software engineering skills with a passion for troubleshooting and defect elimination to address ever-changing applications and environments with scalability and reliability challenges.
This is a full-time remote opportunity. We are looking to hire immediately.
Responsibilities:
Perform engineering and technical tasks as assigned by applying general engineering principles.
Perform independent research in support of technical tasks.
Contribute positively to open-source projects developed by DSG and join existing communities. Navigate this broader ecosystem and structure projects with upstream/downstream opportunities in mind.
Participate in an on-call rotation, communicate clearly in writing, and develop working relationships with coworkers.
Provide technical expertise and consultation through direct involvement to identify and resolve problems.
Work frequently with Product teams on shared goals.
Troubleshoot infrastructure and application performance and availability issues.
Drive root cause analysis and investigations by identifying, analyzing, and remediating service performance and availability issues to ensure maximum service uptime.
Bring experience, pragmatism, empathy, and composure to interactions with teams outside of the RE organization.
Balance planned and reactive work using basic project planning techniques and technical roadmaps.
Identify and integrate with third-party solutions where it makes the most sense.
Use data to understand the availability, reliability, and sustainability of our software.
Qualifications:
Bachelor’s Degree in Computer Science
1-3 years of relevant experience
Valuable Technologies Like: Cloud computing, Web Services, Kubernetes, repository management (git/svn/etc.), Ansible, Terraform, Virtualization, Docker Containers, Kafka, RabbitMQ, Redis, Netbox, Akamai/Apigee
Valuable Methodologies Like: Agile, Scrum, Reliability Engineering, 12-factor apps, microservice architecture, public cloud architecture
Valuable Languages Like: React, Node, Kotlin, C#, Java, JavaScript, Linux shell, PowerShell, SQL, HTML, CSS
Valuable Databases/Operating Systems Like: Non-relational databases (NoSQL, Elasticsearch, CosmosDB), MySQL, Postgres, SQL Server, Oracle, DB2, Windows, Linux
Service Management Tools Like: Jira, Pivotal Tracker, xMatters
Intellectual curiosity, problem solving, and openness are key to success in this role. Has a mindset for solving production system issues and understanding root causes through “detective work” and automating away toil. Dislikes boring, repetitive tasks and enjoys digging into new problems.
Capable of digging into common system performance issues, such as “this is slow”, then developing metrics and driving measurable improvements.
Can work on different tasks in different systems week to week
Knows when to ask for help and when to dig more on their own
Understanding of and comfort with the GNU/Linux operating system.
Proficiency in high-level languages such as Ruby, Python, and Bash.
Exposure to system-level languages such as Go.
Familiarity with configuration management software such as Puppet, Chef, Ansible, or Salt.
Source control, branching, & merging: git/svn/etc (Repository Management)
Familiarity with Infrastructure as Code
Databases – at a minimum understands the basics – select/insert
Familiarity with standard infrastructure concepts like load balancers, firewalls, object storage and where/when they might be used.
Service Management – Incident Response, Change, and Problem Management.
Experience with Kubernetes and Docker.
Networking basics: TCP vs UDP, basic troubleshooting, HTTP – load balancing, firewall, private networks, multi-tier design, scale-out, persistent data
Cloud computing concepts (not necessarily provider specific) – VMs vs Docker Containers, block storage vs object storage, infra automation vs install automation.
Experience operating a platform, software as a service, or shipping software.
First-hand experience with Prometheus and Istio.
Experience as an open-source contributor.
