Lead SRE

As the Lead SRE at Paysera, you will be responsible for ensuring the availability, performance, and security of Paysera's IT infrastructure, systems, and applications. You will work closely with our development teams and system administrators, providing guidance and support for designing and deploying applications that meet Paysera's high availability and reliability standards. The ideal candidate is passionate about building scalable systems, has a deep understanding of system architecture, and is committed to improving uptime and service quality. We are looking for people who are committed to self-improvement and are not afraid to use innovative AI tools in their daily work to drive progress.

What you will do:

Design and implement processes that ensure the high availability and performance of Paysera's systems;

Collaborate with the engineering teams to advocate for and implement reliability practices during system design and development;

Establish proactive monitoring systems and practices to detect and prevent potential issues before they escalate;

Analyse system trends and usage to predict potential future issues;

Build and lead the incident management processes;

Lead efforts to quickly resolve any system outages, ensuring minimal impact on customers;

Drive the improvement of Mean Time to Detect (MTTD) and Mean Time to Recovery (MTTR) through effective monitoring, alerting, and response processes;

Set and work towards targets for Mean Time Between Failures (MTBF) and system Service Level Agreements (SLAs), aiming for an SLA of 99.9% for critical systems;

Regularly review and report on performance metrics, ensuring that systems are consistently meeting set standards and goals;

Conduct post-mortem reviews of any system outages, derive insights, and drive process and tooling improvements;

Foster a culture of continuous improvement within the team and across the organisation;

Perform routine daily tasks using ChatGPT or a similar tool to enhance efficiency and productivity.
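
For context on the 99.9% SLA target mentioned above, here is a minimal sketch of the downtime budget such a target implies. It assumes the SLA is measured as availability over a 30-day month or a 365-day year; the measurement windows and the function name `downtime_budget_minutes` are illustrative assumptions, not Paysera's definitions.

```python
def downtime_budget_minutes(availability_pct: float, period_days: float) -> float:
    """Return the downtime (in minutes) allowed by an availability target."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


# Assumed measurement windows: a 30-day month and a 365-day year.
monthly = downtime_budget_minutes(99.9, 30)
yearly = downtime_budget_minutes(99.9, 365)

print(f"Monthly budget: {monthly:.1f} minutes")
print(f"Yearly budget: {yearly / 60:.2f} hours")
```

Under these assumptions, the budget is roughly 43 minutes per month, or just under 9 hours per year, which is why fast detection and recovery matter so much for this role.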

What we expect:

A minimum of 5 years of experience in Site Reliability Engineering, System Administration, Incident Management, or a closely related field;

Demonstrated experience in designing and managing the reliability of large-scale systems;

Familiarity with modern infrastructure technologies and deployment processes;

Strong proficiency in monitoring tools and methodologies, such as ELK, Grafana, New Relic, Datadog, and Zabbix;

Strong knowledge of networking and security protocols such as TCP/IP, HTTP/S, and SSL/TLS;

Strong experience with containerisation technologies such as Docker and Kubernetes;

Strong problem-solving skills with a proactive approach to issue resolution;

Ability to work efficiently under pressure and manage multiple priorities;

Excellent communication skills, with the ability to explain complex technical issues to non-technical stakeholders;

A collaborative team player with a strong desire to mentor and share knowledge;

Fluency in English;

Proven experience with AI tools such as ChatGPT, and the ability to integrate them seamlessly into daily tasks.

For candidates

If you would like to join our team, please send your CV with the subject "Lead SRE" to the email address [email protected]. Only selected candidates will be contacted, but we are grateful to all who send their CV.

DETAILS

Full time

LOCATION

Poland, Ukraine, Georgia, Bulgaria, Belarus

SALARY

Depends on candidate's experience and competence

