Description
Platform Reliability and Scalability:
- Build software that enhances Paymentology services' scalability and reliability.
- Ensure platform services meet required uptime and service quality levels.
- Contribute to the design of reliable cloud infrastructure and implement reusable cloud-uptime components as code.
- Regularly review and optimise SRE practices, tools, and methodologies to enhance overall system reliability and team efficiency.
Observability and Automation:
- Contribute to the design, implementation, and maintenance of observability and monitoring solutions that track platform health, cost-effectiveness, reliability, and scalability, and that identify potential issues to feed back to product and platform engineering in a continuous improvement loop.
- Develop and implement automation scripts and tools to streamline operations and reduce manual interventions.
- Enable product teams to self-serve by participating in the development of a developer platform.
Production Issue Resolution:
- Play an active role in incident response, working with the incident response teams to diagnose and resolve production issues quickly and minimise downtime.
Standards Compliance:
- Support product teams in building services that adhere to our security and quality standards.
Cross-team Collaboration: