The Cybereason Production team delivers high availability and
performance by using Infrastructure as Code methodologies (Terraform and
Ansible) to build and maintain a large-scale, multi-cloud production
environment.
Responsibilities
* Write playbooks and scripts to build, deploy and maintain our production infrastructure on GCP, AWS and OCI.
* Analyze complex system behavior, performance, and application issues.
* Ensure all infrastructure and application alerts are actionable and, where possible, backed by self-healing automation.
* Work closely with the Ops team – offering education and guidance on integration, support, and monitoring across the toolset.
* Demonstrate advanced troubleshooting skills and deep knowledge of the services running on the infrastructure.
* Live Site Management – as an SRE, you will play a crucial role in a global team running huge-scale live sites 24/7, gaining a deep understanding of availability, performance, and security.
Requirements
* At least 2 years of experience as a DevOps engineer, SRE, or developer.
* Experience in cloud environments (AWS/GCP).
* Experience with Docker, Kubernetes and Helm.
* Experience with infrastructure-as-code automation (Terraform, Ansible).
* Experience in service automation using Python, Bash, or Golang.
* Experience in Linux system administration.
* Experience with distributed systems, networking, and hardware.
* Capable of technical deep-dives into networking, service design, operating systems, and storage.
* BS in Computer Science, or equivalent technical certifications or experience.