
Site Reliability/DevOps Engineer

ID: 2025-3472
Category: Technology-Engineering
Position Type: Regular Full-Time

Overview

ViaPath is seeking a Site Reliability Engineer (SRE) for our Enterprise Operations department. SREs combine engineering experience and an innate drive to improve existing systems and processes with the creativity to develop novel solutions to evolving challenges. The SRE team is responsible for the availability and reliability of critical platform services and applications, including launching product updates, locating production errors and issues, and building integrations that improve users’ experience.

The SRE will support our Product Engineering pipeline, cloud, and datacenter environments. This position requires participation in an on-call rotation to provide 24/7 operations support.

This is a hybrid position (office/home) based out of one of the following ViaPath offices: Altoona, PA; Dallas, TX; Fruitland, ID; or Pittsburgh, PA.

Responsibilities

Run the production environment by monitoring availability and taking a holistic view of system health
Build software and systems to manage platform infrastructure and applications
Improve reliability, quality, and time-to-market of our suite of software solutions
Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve
Provide primary operational support and engineering for multiple large, distributed software applications
Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
Partner with development teams to improve services through rigorous testing and release procedures
Participate in system design consulting, platform management, and capacity planning
Balance feature development speed and reliability with well-defined service level objectives

Qualifications

Bachelor’s degree in computer science or another highly technical or scientific discipline preferred; equivalent years of related experience will be considered in lieu of a degree.
A minimum of 2 years of site reliability, NetOps, DevOps, or similar experience, including responsibility for supporting production systems.
Experience administering Linux, including installing, configuring, and maintaining Linux operating systems, and analyzing and resolving problems associated with operating systems, hardware, applications, and software.
A proactive approach to spotting problems, areas for improvement, and performance bottlenecks.
Excellent written and verbal communication skills, including the ability to listen actively, identify essential issues, and read and interpret technical instructions and documentation.
Excellent problem-solving skills.

Preferred Experience with the following technologies

Experience with core AWS services, including ALB, ELB, EC2, RDS, and S3, is preferred.
Experience with storage technologies such as NFS, HDFS, and S3, as well as orchestration and automation tools (Kubernetes, Cinc, Jenkins), is preferred.
Coding experience beyond simple scripts, with the ability to program (structured and object-oriented) in one or more high-level languages such as Python, Java, C/C++, Ruby, or JavaScript.
Previous success in technical engineering
Cinc
GitLab
Kubernetes
Proxmox
MySQL
