Description
Company Overview
Paymentology is the first truly global issuer-processor, giving banks and fintechs the technology, team and experience to rapidly issue and process Mastercard, Visa and UnionPay cards across more than 60 countries, at scale. Our advanced, multi-cloud platform, offering both shared and dedicated processing instances, vast global presence and richer, real-time data, set us apart as the leader in payments.
Position Overview
We're on the hunt for an exceptional Site Reliability Engineer (SRE) to join our dedicated team. As an SRE at Paymentology, you'll be the superhero responsible for maintaining, improving, and ensuring the high availability, scalability, and performance of our platform.
Key Responsibilities
Platform Reliability and Scalability:
- Build software that enhances Paymentology services' scalability and reliability.
- Ensure platform services meet required uptime and service quality levels.
- Contribute to the design of reliable cloud infrastructure and implement reusable cloud-uptime components as code.
- Regularly review and optimise SRE practices, tools, and methodologies to enhance overall system reliability and team efficiency.
Observability and Automation:
- Contribute to the design, implementation, and maintenance of observability and monitoring solutions to track platform health.
- Develop automation scripts and tools to streamline operations and reduce manual intervention.
- Enable product teams to self-serve by participating in developing a developer platform.
Production Issue Resolution:
- Play an active role in incident response teams, diagnosing and resolving production issues quickly.
Standards Compliance:
- Support product teams in building services that adhere to security and quality standards.
Cross-team Collaboration:
- Work closely with engineering, operations, and product teams to ensure reliability is considered throughout the software development lifecycle, through advocacy and by developing a culture of reliability.
Requirements
Education & Experience:
- Bachelor's Degree in Computer Science or related field.
- Minimum of 3 years in a dedicated SRE role; 5+ years of prior software development experience.
Technical Skills:
- Comprehensive understanding of large-scale distributed platform architecture.
- Extensive hands-on cloud experience (particularly AWS).
- Proven experience developing scalable modular infrastructure-as-code projects using Terraform/CloudFormation/Puppet/Ansible.
- Practical experience with Docker and container orchestrators (AWS ECS, EKS, Kubernetes).
- Experience administering and integrating identity management systems for SSO (AWS IAM/Okta/Active Directory).
- Experience with disaster recovery and redundancy strategies in cloud and on-premises environments.
Monitoring & Programming Skills:
- Proficiency with leading monitoring tools (Datadog/Splunk/Prometheus/Grafana/ELK Stack/New Relic).
- Programming expertise, especially in systems programming languages (Java/Kotlin/Scala) and databases (SQL Server/PostgreSQL).
CI/CD Tools Knowledge:
- Familiarity with industry-leading CI/CD tools such as Jenkins, GitHub Actions, GitLab CI, AWS CodePipeline, CircleCI, and ArgoCD.
Performance Metrics:
- Track record of achieving platform-level and end-to-end SLIs, SLOs, and SLAs, and of fostering accountability.
Incident Management:
- Ability to navigate complex situations and lead effective post-incident reviews (PIRs).
- Knowledge of implementing solutions that reduce Mean Time to Identify (MTTI) and Mean Time to Resolve (MTTR).
Best Practices Implementation:
- Expertise in implementing best practices for load balancing, fault tolerance, and resource allocation to maintain service quality and effectiveness at scale.
- Understanding of security best practices within cloud environments.
Soft Skills:
- Collaborative mindset, working seamlessly across teams to drive innovative solutions.
- Exceptional communication skills in English, conveying ideas and recommendations clearly.
Additional Information
As a key member of our technical team, you will be expected to maintain high availability and be ready to address critical incidents, ensuring the continuous performance of our systems. This includes being part of an on-call schedule to support 24/7 operations.
Benefits
Full-time remote position with flexible hours; an inclusive, supportive work environment that values diversity; the chance to work on cutting-edge technology projects that make a difference; opportunities for continuous learning and development.
Ready to Join Us?
If you're a gadget guru who thrives on optimizing infrastructure, automating all things, and delivering sky-high availability and performance, we want to hear from you! Apply now to be part of a company that values your skills and fosters growth.
At Paymentology, we value making a difference in the lives of the people who work with us and live in the communities where we operate. You can look forward to working with a diverse, global team where Paymentologists at all levels play an important part in our global mission to advance the world through payments and make a difference on a global scale.