We are seeking a talented and experienced Site Reliability Engineer (SRE) to ensure the uptime and reliability of our mission-critical systems. In this role, you’ll automate and streamline operational tasks, continuously looking for ways to improve performance, efficiency, and scalability.
You’ll work closely with developers to provide infrastructure-focused feedback that enhances product performance. This is a unique opportunity to sharpen your SRE skill set and become an invaluable member of the core Operations team.
Responsibilities:
- Maintain uptime of LogicMonitor’s SaaS-based platform and implement technical and process improvements to enhance system reliability.
- Ensure the security and stability of the production environment through proactive monitoring and risk mitigation strategies.
- Design, deploy, and manage scalable infrastructure and system integrations to support business growth and technical innovation.
- Write code to automate infrastructure maintenance, deployments, and routine operational tasks to increase efficiency and reduce manual effort.
- Partner closely with development teams to support and influence operational architecture and design changes.
- Lead cross-functional, technically complex projects, driving execution and alignment across teams.
- Act as a strategic technical resource across the organization, developing and delivering presentations for internal teams, customers, and external conferences.
- Mentor junior team members, fostering growth, knowledge sharing, and operational excellence.
- Set a high standard for documentation and runbook quality, leading by example to promote clarity, consistency, and operational readiness.
Requirements:
- 3+ years of experience in a Linux engineering role, preferably in a SaaS-based company.
- Solid understanding of Linux system administration in distributed environments.
- Experience with configuration management tools such as Chef, Puppet, or Ansible.
- Experience with virtualization and container technologies (e.g., Docker, Kubernetes).
- Programming/scripting experience (Python, shell, Go).
- Knowledge of security as it relates to Linux systems, applications, and networking.
- High-level understanding of networking technologies, including routing, switching, firewalls, and iptables.
- Ability to work independently and self-direct projects.
Benefits:
- Comprehensive health, dental, and vision coverage
- Generous parental leave policies
- Access to our Employee Assistance Program and wellness programs
- A 401(k) plan with company matching
- A learning and development stipend
- An unlimited vacation policy
How to Apply:
Interested in this position? Please submit your resume and cover letter through the application portal.