Purpose of the role
As a Site Reliability Engineer, you will be a key part of our agile DevOps team, which designs, builds and maintains the company’s internal production solutions based on Kubernetes and Infrastructure as Code. You will make sure that production services meet their SLAs and design monitoring and alerting mechanisms that help prevent outages. You will work closely with product teams to continuously improve products and deployment configurations.
Key responsibilities and tasks
- Making sure that production services meet their SLAs
- Designing monitoring and alerting mechanisms that help prevent outages
- Reviewing incidents and documenting the findings to enable informed decision-making
- Applying best practices for deploying and updating cloud-native solutions in an on-premises Kubernetes environment (Rancher, KubeVirt, Ceph, Vault)
- Working closely with product teams to continuously improve the quality of product-specific build and deployment configurations
Professional skills and knowledge
- Docker and Kubernetes
- Linux OS
- TCP/IP networking, DNS and data storage
- Monitoring tools (Grafana, Prometheus)
- Infrastructure as Code and provisioning tools (e.g. Terraform, Ansible)
- Programming (e.g. Python) and shell scripting
- SFTP, SSH and TLS/SSL protocols; REST APIs
- Experience with on-call duties and resolving production incidents
- Source code version control systems (Git)
- CI/CD tools (e.g. GitLab, Jenkins, Artifactory)
- Database knowledge (e.g. PostgreSQL)
Additional know-how and skills in the following areas are a plus:
- Cloud platforms (AWS, GCP, MS Azure)
- Programming languages (Java, Go)
- NoSQL databases (e.g. MongoDB)
Required domain knowledge
- Performance, scalability and security of distributed systems
- Cloud-native design principles
- Infrastructure as Code (design, configuration and programming)
- Linux OS
Personal skills
Value-driven professional
Humble
- Ready to listen and to share expertise and best practices across the team
- Open to feedback and flexible enough to adapt
Ambitious
- Taking accountability to drive results
- Going the extra mile for great customer experience
Smart
- Able to prioritize multiple tasks and stay focused
- Passionate about finding pragmatic solutions and innovating
Humane
- Caring for others’ well-being in a remote working environment
- Advocating for diversity and fairness
Qualification
- At least 2 years of hands-on experience in building, monitoring and troubleshooting reliable systems based on container technologies (Kubernetes, Docker)
- At least 5 years of experience with Linux OS, software development and infrastructure automation
Why join us
- Individual freedom – tell us what’s important for YOU!
- Culture of trust, appreciation, innovation and opportunities
- The opportunity to make a difference in an agile and fun environment
- Work with clients in various industries around the world
- The opportunity to grow continuously in a dynamic technology field
- Competitive compensation and benefits package including profit sharing