Full Time

Lagos

Posted 2 years ago

JOB TITLE: Site Reliability Engineer

JOB LOCATION: Lekki Phase 1, Lagos
Employment Type: Full-time

JOB DETAILS:

Kudi software engineers build solutions that will forever change the face of finance and banking in Africa by bringing affordable banking services to the doorstep of people across the continent.
We’re looking for engineers who can bring fresh ideas and experience to the table from all areas of expertise, including distributed system design, mobile development, systems architecture, networking, security and more.
As a Site Reliability Engineer, you will view operations as a software problem and use programming and automation extensively to complete operations tasks including configuring, deploying and provisioning applications and dependencies across all environments.
You will be responsible for ensuring our services and applications serve customers consistently and reliably.
You will be part of an operations team which works closely with software engineers, using DevOps processes and principles to quickly and reliably deliver value to customers.
You will react in real time to production incidents and work to contain and resolve them as quickly as possible.
You will build and maintain CI pipelines which entirely automate the build, test and deployment of all software changes throughout the organisation.

About the Position

Ensure your team is immediately aware of production errors and prioritizes their repair.
Provide architectural input to the teams’ development process from an operations and infrastructure POV, including but not limited to monitoring, alerting, persistence, tradeoffs given the state of the available hardware, etc.
Provision cluster resources, repositories, CI/CD pipelines, and credentials for the team and systems you are responsible for.
Provide updates to the entire company during outages, downtime, scheduled maintenance and more in a professional, respectful, and timely manner.
Strive to work at the highest standards possible along with the rest of your team.

About You

Bachelor’s degree or higher in a STEM field.
3 years working as a software engineer/site reliability engineer professionally.
3 years developing professionally in Python on Linux/Mac/Unix environments, using git.
3 years working with Linux/Unix user environments, e.g. bash, grep, awk, sed, etc.
2 years of experience working with cloud infrastructure, e.g. GCP, AWS.
2 years working with CI/CD tools, e.g. Jenkins, CircleCI, TravisCI, Semaphore.
2 years working with SQL and NoSQL databases, e.g. PostgreSQL, Cassandra, MongoDB.
2 years working with infrastructure-as-code tools such as Terraform, Ansible, SaltStack, Chef, Puppet.
Solid knowledge and experience in networking, e.g. HTTP, TCP, UDP, DNS, VPN (IPsec, WireGuard), routing, firewalls, etc.
Solid knowledge and experience in encryption and security, e.g. AES, ECC, PKCS, PKI, OpenSSL, JWT.
Experience with Linux system administration, e.g. systemd, iptables, top, stat commands, kernel tuning, user management.
Experience working with containers & container orchestration, e.g. Docker, Kubernetes.
Experience with logging, monitoring, and incident management tools, e.g. Prometheus, Grafana, Cloud Logging, Opsgenie, PagerDuty.
Experience working with Web Servers/Load Balancers, e.g. Nginx, Apache, HAProxy.
Love for automation.
Ability and willingness to pick up new technologies quickly and be productive.

Nice to have:

Multilingual (programming) skills, in particular Python, Java, JavaScript/TypeScript, Golang.
Experience with Bazel.
Experience with identity and access management solutions, e.g. Keycloak.
Experience implementing PCI DSS, ISO 27001, ISO 22301 policies/standards.
Experience with Google Bigtable.
Experience managing GitHub organizations and repositories.

Application Closing Date
Not Specified.