We build and operate infrastructure for a broad blockchain ecosystem, including wallet infrastructure, stablecoin products, and supporting financial services. We are looking for a Senior DevOps / Infrastructure Engineer to help design, maintain, and scale secure production environments across cloud and on-premise setups.
The role combines hands-on infrastructure engineering, operational support, reliability planning, incident response, and coordination across multiple technical and business stakeholders.
Requirements:
7+ years of professional experience in infrastructure support, infrastructure engineering, DevOps, SRE, or platform engineering.
Deep understanding of Microsoft Azure, including hands-on production experience with its core infrastructure, networking, security, compute, storage, and managed service offerings.
Experience with on-premise infrastructure, including environments with and without Kubernetes.
Strong hands-on experience with Kubernetes and containerized workloads, including production operations, troubleshooting, scaling, and maintenance.
Experience with modern delivery tooling, including GitHub or GitLab, CI/CD pipelines, Argo CD, and related automation practices.
Strong understanding of infrastructure-as-code practices, preferably with Terraform.
Solid understanding of network architecture, including segmentation, segregation, VPNs, routing, DNS, load balancing, NSGs, firewalls, private connectivity, and secure access patterns.
Experience applying security practices, and the ability to interpret compliance requirements and implement them in infrastructure and operational processes.
Experience maintaining infrastructure for financial systems or other high-reliability, regulated, or business-critical platforms.
Strong experience with monitoring, logging, observability, alerting, and traceability using modern stacks such as Prometheus, Grafana, Loki, ELK/OpenSearch, OpenTelemetry, or similar tools.
Understanding of RTO, RPO, backup strategies, restore procedures, failover planning, and their practical implications for system design.
Proven experience participating in on-call rotations, incident response, post-incident reviews, and operational improvements.
Hands-on experience with infrastructure planning, reliability planning, disaster recovery planning, and failover testing.
Strong communication skills, with the ability to coordinate infrastructure work involving multiple teams, vendors, clients, and stakeholders with different levels of technical understanding.
Ability to identify infrastructure and operational risks early, explain trade-offs clearly, and drive mitigation actions to completion.
Responsibilities:
Design, maintain, and improve production infrastructure across Azure, on-premise, and hybrid environments.
Plan and operate Kubernetes-based and non-Kubernetes infrastructure depending on project needs.
Build and maintain CI/CD and GitOps workflows using GitHub Actions or GitLab CI/CD, Argo CD, and related tools.
Develop and maintain infrastructure-as-code modules for reproducible and auditable environments.
Define and maintain network, compute, storage, access control, and security configurations for production and pre-production systems.
Implement and improve monitoring, logging, alerting, tracing, and operational visibility across services and infrastructure components.
Support backup, restore, disaster recovery, and failover planning, including practical validation of RTO and RPO assumptions.
Participate in incident response, on-call escalation, root cause analysis, and follow-up remediation.
Help project teams resolve infrastructure, deployment, networking, security, and reliability issues.
Participate in infrastructure planning for new systems, releases, migrations, integrations, and production launches.
Translate compliance, security, and operational requirements into practical infrastructure changes.
Coordinate operational work involving engineering teams, delivery teams, vendors, clients, and other stakeholders.
Evaluate and introduce tools or practices that improve reliability, security, delivery speed, or operational maintainability.