Description
Responsibilities:
- Design, implement, and maintain robust platform infrastructure using Infrastructure as Code (IaC) tools such as Terraform, ensuring secure and scalable environments in our private cloud ecosystem.
- Embrace DevSecOps principles to bake security best practices into every aspect of the platform, from infrastructure design to pipeline automation.
- Drive automation for deployment and management processes using GitOps workflows as well as CI/CD pipelines.
- Build and maintain orchestration environments with Kubernetes, enabling containerized workflows and hybrid solutions that meet the platform's scientific workload requirements.
- Administer platform tools such as Ansible, Vault, Consul, Prometheus, and Grafana to support core functions like configuration management, secrets management, monitoring, and observability.
- Respond to and resolve operational incidents; identify the root causes of critical issues; implement strategies to prevent recurrence and improve platform resiliency.
- Proactively create and manage monitoring, logging, and alerting systems to ensure high availability, performance, and visibility across all services.
- Continuously evaluate and adopt best practices and emerging tools and technologies that improve DevOps processes, platform capabilities, and development workflows.
- Write and maintain comprehensive documentation for platform processes and configurations to support both team members and end-users.
- Mentor and coach junior engineers on the team; foster a collaborative, high-performing culture.
Qualifications:
- Deep knowledge of Ansible, Vault, Terraform, Consul, Prometheus, and Grafana, including implementing robust monitoring, observability, and secrets management practices.
- Solid foundation in Linux system administration and the most common related services and network protocols.
- Proven success in implementing DevSecOps practices, including secure development, compliance automation, and minimizing risks across the development lifecycle.