BTSE
ML Engineer / AI Platform Lead
Department
Management Office
Location
Taipei
About BTSE:
彼特思方舟 is a specialized service provider delivering a full spectrum of front-office and back-office support solutions, each tailored to the unique needs of global financial technology firms.
彼特思方舟 is engaged by BTSE Group to offer several key positions, enabling the delivery of cutting-edge technology and tailored solutions that meet the evolving demands of the fintech industry in a competitive global market.
BTSE Group is a leading global fintech and blockchain company that is committed to building innovative technology and infrastructure. BTSE empowers businesses and corporate clients with the advanced tools they need to excel in a rapidly evolving and competitive market. BTSE has pioneered numerous trading technologies that have been widely adopted across the industry, setting new benchmarks for innovation, performance, and security in fintech. BTSE’s diverse business lines serve both retail (B2C) customers and institutional (B2B) clients, enabling them to launch, operate, and scale fintech businesses. BTSE is seeking ambitious, motivated professionals to join our B2C and B2B teams.
About the opportunity:
You own the AI core: model serving, the retrieval-augmented generation (RAG) pipeline, prompt engineering, and the feedback-to-training pipeline. In Phase 1, you make the base model perform as well as possible through context engineering — system prompts, few-shot exemplars, and retrieval optimisation — without modifying model weights. You also design the custom model training workflow so that enterprise clients can train their own fine-tuned models in Phase 2. This is the highest-leverage individual contributor role on the founding team.
Responsibilities
- Deploy and optimise a large language model for production inference: quantisation, continuous batching, low-latency serving.
- Build the RAG pipeline: document chunking, embedding generation, vector storage, cross-encoder reranking, and context assembly optimised for a 128K-token context window.
- Build the context layer: per-tenant system prompts, dynamically retrieved few-shot exemplars, task routing (classifying incoming requests to the right prompt configuration).
- Build defensive output parsing: structured JSON output from an unmodified base model with graceful fallbacks.
- Design and implement the feedback collection pipeline: capturing user corrections and ratings, automatically generating training data candidates for future fine-tuning.
- Design the custom model training workflow: tenant-scoped LoRA training on client-specific data, model evaluation, A/B testing, and isolated deployment.
- Monitor and improve inference quality: parsing failure rates, citation accuracy, hallucination rates, latency — all tracked per tenant.
- Iterate on prompts daily with the domain expert during the pilot phase.
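To give a flavour of the defensive output parsing listed above, here is a minimal Python sketch. It assumes the base model may wrap its JSON in surrounding prose or a Markdown code fence. The function name `parse_model_json`, the `status`/`data` envelope, and the fallback payload are hypothetical illustrations, not a description of BTSE's actual stack.

```python
import json
import re
from typing import Any

# Hypothetical fallback payload; the posting does not specify one.
FALLBACK: dict[str, Any] = {"status": "parse_failed", "data": None}

# Build the Markdown fence marker programmatically (three backticks).
_FENCE = "`" * 3
_FENCE_RE = re.compile(_FENCE + r"(?:json)?\s*(.*?)" + _FENCE, re.DOTALL)


def parse_model_json(raw: str) -> dict[str, Any]:
    """Try progressively looser strategies to recover a JSON object."""
    # Strategy 1: the whole response is already valid JSON.
    candidates = [raw.strip()]

    # Strategy 2: the JSON sits inside a fenced block.
    fenced = _FENCE_RE.search(raw)
    if fenced:
        candidates.append(fenced.group(1).strip())

    # Strategy 3: take everything between the first "{" and the last "}".
    start, end = raw.find("{"), raw.rfind("}")
    if 0 <= start < end:
        candidates.append(raw[start:end + 1])

    for text in candidates:
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return {"status": "ok", "data": parsed}

    # Graceful fallback: never raise, always return a well-formed envelope.
    return dict(FALLBACK)
```

In a setup like this, the rate at which `status` comes back as `parse_failed` could feed the per-tenant parsing failure metric mentioned in the list above.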
Requirements
- 5+ years of ML engineering experience, including 2+ years working with large language models in production.
- Hands-on experience with LLM serving frameworks (vLLM, TGI, or equivalent).
- Deep experience building RAG pipelines: chunking strategies, embedding models, vector databases, reranking.
- Strong prompt engineering skills for production applications — you know how to make a base model produce consistent, structured, high-quality output.
- Python: PyTorch, Transformers, FastAPI.
- Familiarity with LoRA/QLoRA fine-tuning workflows.
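As a rough illustration of what "chunking strategies" can mean in the requirements above, the sketch below shows one of the simplest: fixed-size word windows with overlap. The function name `chunk_words` and the default sizes are assumptions chosen for the example. A production pipeline would more likely count tokens with the embedding model's tokenizer and respect document structure.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping word windows (a simple baseline strategy)."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks: list[str] = []

    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a window reaches the end, to avoid a redundant tail chunk.
        if start + chunk_size >= len(words):
            break

    return chunks
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of some duplicated text in the vector store.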
Nice to have
- Experience building multi-tenant ML serving infrastructure.
- Experience with financial or crypto AI applications.
- Experience with cross-encoder reranking models (DeBERTa or similar).
- Understanding of data isolation requirements for ML training pipelines.
Salary
N/A