BTSE
DevOps / Infrastructure Engineer
Department: Management Office
Location: Taipei
About BTSE:
彼特思方舟 is a specialized service provider dedicated to delivering a full spectrum of front-office and back-office support solutions, each tailored to the unique needs of global financial technology firms.
彼特思方舟 is engaged by BTSE Group to offer several key positions, enabling the delivery of cutting-edge technology and tailored solutions that meet the evolving demands of the fintech industry in a competitive global market.
BTSE Group is a leading global fintech and blockchain company that is committed to building innovative technology and infrastructure. BTSE empowers businesses and corporate clients with the advanced tools they need to excel in a rapidly evolving and competitive market. BTSE has pioneered numerous trading technologies that have been widely adopted across the industry, setting new benchmarks for innovation, performance, and security in fintech. BTSE’s diverse business lines serve both retail (B2C) customers and institutional (B2B) clients, enabling them to launch, operate, and scale fintech businesses. BTSE is seeking ambitious, motivated professionals to join our B2C and B2B teams.
About the opportunity:
You keep the platform running reliably. For the first client operating in crypto markets, this means 24/7 uptime with zero maintenance windows. You build multi-tenant Kubernetes infrastructure with per-tenant namespace isolation, manage GPU scheduling for AI model serving, set up CI/CD for rapid iteration, and own monitoring and on-call. You also automate tenant provisioning so that scaling from one client to ten is an operational exercise, not an engineering project.
Responsibilities
- Set up a multi-tenant Kubernetes cluster: shared services namespace, per-tenant namespaces for isolated workloads, GPU node pools for model inference.
- Build a CI/CD pipeline: source control → container build → automated deployment with zero-downtime rolling updates.
- Configure GPU management: scheduling, resource quotas per tenant, device plugins.
- Set up comprehensive monitoring: per-tenant metrics, SLA tracking, data pipeline health, GPU utilisation, API latency percentiles, WebSocket connection stability.
- Implement backup and disaster recovery: cross-region replication, automated database backups.
- Build tenant provisioning automation: scripted creation of new tenant namespaces, storage, network policies, and service accounts.
- Harden security: network policies between namespaces, vulnerability scanning, audit logging.
- Provide 24/7 on-call coverage during the initial pilot (rotating with the Tech Lead).
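To give a sense of the tenant provisioning item above, here is a minimal sketch of what scripted tenant creation might look like. It only generates Kubernetes manifests as plain Python dictionaries. The function name `tenant_manifests`, the `tenant-` namespace prefix, the `nvidia.com/gpu` resource name, and the default sizes are all illustrative assumptions, not a description of the actual platform.

```python
import json


def tenant_manifests(tenant: str, gpu_limit: int = 1, storage: str = "10Gi") -> list[dict]:
    """Build the per-tenant objects named in the posting: namespace,
    service account, GPU quota, storage claim, and network policy.
    All names and defaults here are hypothetical."""
    ns = f"tenant-{tenant}"  # assumed naming convention

    def meta(name: str) -> dict:
        return {"name": name, "namespace": ns}

    return [
        {"apiVersion": "v1", "kind": "Namespace",
         "metadata": {"name": ns, "labels": {"tenant": tenant}}},
        {"apiVersion": "v1", "kind": "ServiceAccount",
         "metadata": meta(f"{tenant}-workload")},
        # Caps GPU requests in the namespace; assumes GPUs are exposed
        # as the extended resource "nvidia.com/gpu".
        {"apiVersion": "v1", "kind": "ResourceQuota",
         "metadata": meta("gpu-quota"),
         "spec": {"hard": {"requests.nvidia.com/gpu": str(gpu_limit)}}},
        {"apiVersion": "v1", "kind": "PersistentVolumeClaim",
         "metadata": meta(f"{tenant}-data"),
         "spec": {"accessModes": ["ReadWriteOnce"],
                  "resources": {"requests": {"storage": storage}}}},
        # Selects every pod in the namespace and admits ingress only
        # from pods in that same namespace.
        {"apiVersion": "networking.k8s.io/v1", "kind": "NetworkPolicy",
         "metadata": meta("same-namespace-only"),
         "spec": {"podSelector": {},
                  "policyTypes": ["Ingress"],
                  "ingress": [{"from": [{"podSelector": {}}]}]}},
    ]


if __name__ == "__main__":
    # Emit a single List object as JSON for review or for applying.
    bundle = {"apiVersion": "v1", "kind": "List",
              "items": tenant_manifests("acme", gpu_limit=2)}
    print(json.dumps(bundle, indent=2))
```

Because onboarding a new tenant reduces to calling one function with a different name, going from one client to ten stays an operational step. In practice the same idea would more likely live in Terraform modules or a GitOps repository than in a standalone script.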
Requirements
- 4+ years of DevOps/SRE experience; Kubernetes cluster operations, including multi-tenant patterns.
- GPU workloads on Kubernetes (GPU Operator, device plugins, resource scheduling).
- CI/CD pipelines: GitHub Actions, ArgoCD or FluxCD.
- Terraform IaC.
- On-call experience and incident management.
Nice to have
- Kubernetes namespace isolation and network policies for multi-tenancy.
- 24/7 systems experience (crypto, gaming, or global SaaS).
- Monitoring WebSocket-heavy architectures and streaming data pipelines.
- GPU cluster management for ML inference.