Modernization without reliability is just a faster way to fail. At DuskByte, our "Risk-First" engineering philosophy means that stability is our North Star. As a Platform Reliability Engineer, you are the guardian of operational continuity. You will build the frameworks, observability, and guardrails that allow us to modernize legacy B2B SaaS platforms without risking a single second of unplanned downtime.
What You Will Do (The Role)
Error Budgeting & SLOs
Define and monitor Service Level Objectives (SLOs) and Service Level Indicators (SLIs), and manage the error budgets they imply, so that we can move as fast as possible without compromising reliability.
Chaos Engineering
Proactively test system resilience by injecting controlled failures into modernized environments to identify "hidden" technical debt.
Incident Management & Post-Mortems
Lead the "Blameless Post-Mortem" culture, turning every production hiccup into a permanent architectural fix.
Observability Architecture
Build world-class monitoring stacks (Prometheus, Grafana, Datadog) that provide deep visibility into hybrid systems that combine legacy and modern components.
Automated Guardrails
Develop "Self-Healing" infrastructure that automatically scales, rolls back, or isolates failing components across AWS, GCP, and Azure.
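The "Error Budgeting & SLOs" item above rests on simple arithmetic: an SLO target implies a budget of requests that are allowed to fail, and shipping continues only while that budget lasts. The sketch below illustrates the idea under stated assumptions: `ErrorBudget` and `can_ship` are hypothetical names, the budget is request-based (one of several common ways to define it), and it does not describe DuskByte's actual tooling.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Request-based error budget for one SLO window (hypothetical helper)."""

    slo_target: float     # e.g. 0.999 means 99.9% of requests must be "good" per the SLI
    total_requests: int   # requests observed so far in the window
    failed_requests: int  # requests that violated the SLI (errors, timeouts, etc.)

    @property
    def allowed_failures(self) -> float:
        # The budget is the share of traffic the SLO permits to fail.
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def remaining_fraction(self) -> float:
        # 1.0 = budget untouched; 0.0 or below = budget exhausted or SLO breached.
        if self.allowed_failures == 0:
            return 1.0 if self.failed_requests == 0 else 0.0
        return 1.0 - self.failed_requests / self.allowed_failures

    def can_ship(self, min_remaining: float = 0.0) -> bool:
        # Hypothetical release gate: allow deployments only while budget remains.
        return self.remaining_fraction > min_remaining


if __name__ == "__main__":
    healthy = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
    burned = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=1_200)
    for name, budget in (("healthy", healthy), ("burned", burned)):
        print(name, round(budget.remaining_fraction, 2), budget.can_ship())
```

With a 99.9% target over one million requests, roughly 1,000 failures are allowed: 250 failures spend about a quarter of the budget and the gate stays open, while 1,200 failures overspend it and the gate closes. A production gate would usually also consider how fast the budget is burning, not just the cumulative total.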
The SRE Tech Stack
You are the master of the "Safety Net":
Orchestration
Kubernetes (EKS/GKE/AKS), Docker, Service Meshes (Istio/Linkerd)
Observability
Prometheus, Grafana, ELK Stack, New Relic, Datadog
You think in "nines" (99.99%). You are obsessed with edge cases and race conditions.
Automation Over Action
You hate manual tasks. If you have to do something twice, you write a script to do it forever.
The Calm Architect
You are at your best when things are breaking. You have the discipline to follow a runbook while the "fire" is being put out.
Experience
8+ years in DevOps or SRE roles, with a deep understanding of distributed systems and cloud-native safety patterns.
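Thinking in "nines" is ultimately arithmetic: each extra nine cuts the permitted downtime by a factor of ten. The helper below is an illustrative sketch (the function name is hypothetical) that assumes availability is measured as a simple fraction of wall-clock time. At 99.99% it allows about 4.3 minutes of downtime per 30-day month, or roughly 52.6 minutes per year.

```python
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(availability: float, period_days: float) -> float:
    """Downtime (in minutes) an availability target permits over a period.

    Assumes availability is the fraction of wall-clock time the service is up.
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * period_days * MINUTES_PER_DAY


if __name__ == "__main__":
    # Compare three common targets over a 30-day month and a 365-day year.
    for target in (0.999, 0.9999, 0.99999):
        month = allowed_downtime_minutes(target, 30)
        year = allowed_downtime_minutes(target, 365)
        print(f"{target:.3%}: {month:.2f} min per 30 days, {year:.1f} min per year")
```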
Why This Role Is Unique at DuskByte
You aren't just "maintaining" a server. You are the high-level consultant who tells the development team when they are moving too fast. You have the authority to halt a deployment if it doesn't meet our Risk-First standards. You are the reason our enterprise clients sleep soundly at night.