Site Reliability/DevOps Engineer

Job Locations: US-TX-Dallas
ID: 2025-3472
Category: Technology-Engineering
Position Type: Regular Full-Time

Overview

ViaPath is seeking a Site Reliability Engineer to join our Enterprise Operations department. SRE personnel combine engineering experience and an innate drive to improve existing systems and processes with the creativity to develop novel solutions to evolving challenges. The SRE team is responsible for the availability and reliability of critical platform services and applications, including launching product updates, locating production errors and issues, and building integrations that improve the user experience.

The SRE will support our Product Engineering pipeline, cloud, and datacenter environments. This position requires participation in an on-call rotation to provide 24/7 operations support.

This is a hybrid position (office and home based) working out of one of the following ViaPath offices: Altoona, PA; Dallas, TX; Fruitland, ID; or Pittsburgh, PA.

Responsibilities

  • Run the production environment by monitoring availability and taking a holistic view of system health
  • Build software and systems to manage platform infrastructure and applications
  • Improve reliability, quality, and time-to-market of our suite of software solutions
  • Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve
  • Provide primary operational support and engineering for multiple large, distributed software applications
  • Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
  • Partner with development teams to improve services through rigorous testing and release procedures
  • Participate in system design consulting, platform management, and capacity planning
  • Balance feature development speed and reliability with well-defined service level objectives

Qualifications

  • Bachelor’s degree in computer science or another highly technical or scientific discipline preferred; equivalent years of related experience will be considered in lieu of a degree.
  • A minimum of 2 years of site reliability, NetOps, DevOps, or similar experience, including responsibility for supporting production systems.
  • Experience administering Linux, including installing, configuring, and maintaining Linux operating systems, and the ability to analyze and resolve problems with operating systems, hardware, applications, and software.
  • A proactive approach to spotting problems, areas for improvement, and performance bottlenecks.
  • Excellent written and verbal communication skills, including the ability to actively listen, identify essential issues, and read and interpret technical instructions and documentation.
  • Excellent problem-solving skills.

Preferred Experience

  • Experience with core AWS services, including ALB, ELB, EC2, RDS, and S3.
  • Experience with distributed storage technologies such as NFS, HDFS, and S3, as well as with dynamic resource management and automation tooling such as Kubernetes, Cinc, and Jenkins.
  • Coding experience beyond simple scripts: the ability to write structured and object-oriented programs in one or more high-level languages, such as Python, Java, C/C++, Ruby, or JavaScript
  • A track record of success in technical engineering roles
  • Cinc
  • GitLab
  • Kubernetes
  • Proxmox
  • MySQL
