View Jobs at Interswitch

Permanent

Lagos

Posted 2 years ago

JOB TITLE: Site Reliability Engineer

JOB LOCATION: Lagos
Employment Type: Permanent
Department: Technology

JOB DETAILS:

Manage Availability and Capacity on the Core Applications. Provide support for the Applications and ensure their optimal performance. Implement setup of new Applications in the company’s environment.

Duties and Responsibilities

Deployment of Applications
Support the deployment of Applications on the production environment
Implement projects involving Setup and deployment of new Applications and enhancement of existing applications
Automation
Automate the activities involved in managing Applications.
Application Environment Management
Ensure 24×7 Availability of all Core Applications
Carry out Capacity planning to ensure Applications are always available to meet demand.
Create visibility into site health and key performance indicators of the Application Systems
Ensure up-to-date patching and full compliance with security standards for the Application Systems.
Ensure up-to-date documentation of all Core Applications and of changes made to them
Balance feature development speed and reliability with well-defined Service Level Objectives (SLO) and Service Level Indicators (SLI)
Monitor Systems
Monitor the performance, health, and capacity of:
- Servers
- Databases
- Services
- Storage
- Network Links
Use a variety of monitoring tools such as Nagios, SolarWinds, Kibana, PagerDuty, and AppDynamics.
Troubleshooting
Troubleshoot reported issues, and proactively identify areas in need of optimization
Work with technical support engineers to resolve critical incidents
Create and update clear troubleshooting guides for Applications
Requests Fulfilment
Implement Requests relevant to the operation and enhancement of the Core Processing Applications.

Qualifications

Academic Qualification(s) – Good First Degree in Computer Science / Computer Engineering or other related fields
Professional Qualification(s) – A Service Management certification (e.g. ITIL) is an advantage.
Experience (Number of relevant years) – Minimum of one (1) year of relevant experience.

Requirements:

Expertise in Linux and Windows Operating systems and Shell scripting
Technical experience working with cloud technologies
Build and Deployment Management (Jenkins) in a CI/CD workflow
Experience with Chef, Puppet or Ansible, automating all aspects of system and server management
Good understanding of distributed systems and of container infrastructure and orchestration technologies such as Docker and Kubernetes
Good understanding of SLO and SLI for Applications
Experience with DNS, Networking and High Availability solutions
Proficient in at least one of the following languages: Python, Ruby, Go
Ability to work across teams to continuously analyze system performance in production, troubleshoot reported issues, and proactively identify areas in need of optimization
Previous experience with developing and driving real time monitoring solutions that provide visibility into site health and key performance indicators
Working knowledge of databases
Working understanding of Load balancing technologies.
Working understanding of IT service management (Incident, Problem, Change and Knowledge management).
Ability to work within a technical team of support engineers through day-to-day operations and critical incidents.