CloudLinux

Senior Database Reliability Engineer (DBRE) (worldwide remote)

Remote, posted 29 days ago
Full Time, Senior, Remote

Job Description

CloudLinux / TuxCare is a remote-first infrastructure and security company. More than 300 engineers build and operate products used by hosting providers, enterprises, and internal service teams worldw

Key Highlights

  • Own production PostgreSQL reliability: HA design, Patroni, PgBouncer, replication, failover, upgrades, vacuum/bloat control, query tuning, locks, indexes, capacity, backups, PITR, and restore validation.
  • Improve disaster recovery and operational evidence: tested restores, documented recovery paths, measurable RTO/RPO targets, runbooks, and safe maintenance plans.
  • Support the wider database estate: ClickHouse, MongoDB, and Redis. You will troubleshoot incidents, review access and data-safety changes, improve monitoring, and learn the production ClickHouse patterns already in use.
  • Automate DBA workflows with Ansible, Terraform/OpenTofu, GitLab CI/CD, scripts, and reproducible runbooks for provisioning, grants, backups, restores, health checks, and ownership metadata.
  • Help build DBaaS-style self-service capabilities so engineering teams can request databases, access, credentials, and operational checks with less manual DBA intervention.

Qualifications

Required Qualifications

  • Deep hands-on PostgreSQL experience in business-critical production environments, typically 5+ years or equivalent depth.
  • Strong understanding of PostgreSQL internals and operations: MVCC, WAL, transactions, locks, indexes, query planning, replication, autovacuum, bloat, major upgrades, backups, PITR, and restore testing.
  • Proven experience with highly available databases and the ability to reason about quorum, split-brain risk, failover, rollback, and recovery.
  • Strong Linux and infrastructure fundamentals: systemd, networking, storage, filesystems, CPU/memory/disk bottlenecks, TLS, DNS, firewalls, and root-cause troubleshooting.
  • Automation skills with Ansible and scripting. Terraform/OpenTofu, GitLab CI/CD, and merge-request-based delivery are strong advantages.
  • Ability to support more than one database engine. You do not need to be a ClickHouse expert on day one, but you must be ready to learn it quickly and take responsibility for it.
  • Practical use of AI engineering assistants such as Claude and Codex. We expect you to use them to improve speed and quality, while personally verifying generated SQL, commands, scripts, and operational conclusions.
  • Clear written English for asynchronous work in Jira, Slack, GitLab, Slite, and runbooks.

Preferred Qualifications

  • ClickHouse operations: replication, Keeper/ZooKeeper, MergeTree engines, distributed DDL, grants, row policies, backups, query troubleshooting, and cluster recovery.
  • MongoDB replica sets and Percona Backup for MongoDB.
  • Redis/Sentinel and broker/cache failure modes.
  • Database observability, SLOs, golden signals, alert tuning, and executable incident runbooks.
  • Building internal platforms, self-service portals, or DBaaS workflows for engineering teams.

Skills & Technologies

PostgreSQL, MongoDB, Redis, Linux, Ansible, Terraform, CI/CD, SQL, Jira

Job Details

Employment Type: Full Time
Experience Level: Senior
Location: Remote
Work Mode: Remote
Posted: 29 days ago