Polymarket

Senior Data Engineer

Location

Remote

About Polymarket

Polymarket is the world's largest prediction market platform. We enable individuals to express views on real-world events by trading on outcomes across politics, economics, sports, culture, and current affairs. Built as a peer-to-peer marketplace with no centralized "house," Polymarket aggregates diverse opinions into transparent, market-based probabilities that reflect collective expectations about the future.

We're growing fast, both in trading volume ($21B traded in 2025) and in adoption as an alternative news source. Our ambition is to become a ubiquitous beacon of truth in global media, and we need your help adding fuel to the fire.

About the Role

Polymarket is looking for a Senior Data Engineer to help scale our data platform to support the next phase of the company's growth. Every trade, position, and resolution flows through a data layer that powers our entire user-facing product – leaderboards, PnL, activity feeds, positions, volume analytics – with sub-100ms latency targets at crypto scale.

The team is small and high-ownership. We need a senior operator who can drive entire workloads end-to-end, not just execute on a defined roadmap. You'll be hands-on with design, architecture, and delivery. Your mandate: make Polymarket's analytics and serving data layer something the rest of the company can build on without thinking. Schema drift, parity gaps under migration, latency regressions, and cost blowouts all land on this role. When a product team asks "can the data layer answer this query under X ms at Y $/month," you're the one who can say yes confidently, or redesign the slice that makes it possible.

What You'll Do

  • Own the OLAP analytics layer. Drive our columnar warehouse environment end-to-end: raw event ingestion → cleaned facts and dimensions → business aggregates → API-serving views. You'll own materialized view design, refresh cadences, the dictionary catalog, query planning, and cost optimization (a materialized-view sketch follows this list).

  • Partner on the OLTP serving layer. Work closely with the team on high-write serving tables that back our product APIs – partition strategy, indexing, trigger pipelines, autovacuum tuning, and bloat monitoring – with sub-100ms read-path discipline.

  • Shape streaming and data lake infrastructure. Define Kafka topic schema contracts, evolve the S3 lake layout with modern table formats, and contribute to parity-validation tooling that guards data correctness under migration pressure.

  • Design data models at scale. Work with event-sourced, append-mostly data with chain-reorg semantics (a reorg-handling sketch follows this list). Design the derivative analytics – PnL, realized/unrealized position tracking, cohort metrics – and formalize ownership boundaries between upstream ingestion and downstream analytics.

  • Coordinate across teams. Negotiate schema contracts with the warehouse-owning team and downstream consumers including frontend, notifications, and third-party integrators.
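
For a sense of the first bullet's day-to-day, here is a minimal sketch of an incremental rollup, assuming ClickHouse (our stated preference) and hypothetical table and column names rather than Polymarket's actual schema:

```sql
-- Hypothetical ClickHouse pipeline: raw trade events rolled up into a
-- per-market hourly volume aggregate that API-serving views can read cheaply.
CREATE TABLE trades_raw
(
    event_time  DateTime,
    market_id   UInt64,
    usdc_amount Decimal(38, 6)
)
ENGINE = MergeTree
ORDER BY (market_id, event_time);

-- Pre-aggregated target table; SummingMergeTree folds rows sharing the
-- same sorting key at merge time.
CREATE TABLE market_volume_hourly
(
    market_id UInt64,
    hour      DateTime,
    volume    Decimal(38, 6)
)
ENGINE = SummingMergeTree
ORDER BY (market_id, hour);

-- Incremental materialized view: fires on every insert into trades_raw,
-- so the aggregate stays fresh without a scheduled refresh job.
CREATE MATERIALIZED VIEW mv_market_volume_hourly
TO market_volume_hourly AS
SELECT
    market_id,
    toStartOfHour(event_time) AS hour,
    sum(usdc_amount)          AS volume
FROM trades_raw
GROUP BY market_id, hour;
```

Refresh cadence, TTLs, and cost tuning all live on top of this shape.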
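
And a sketch of the ingestion pattern the data-modeling bullet implies: events keyed by their on-chain coordinates so replays are idempotent, with orphaned rows rolled back when a chain reorg is detected. Postgres-dialect SQL; table names, columns, and values are placeholders:

```sql
-- Hypothetical event table keyed on on-chain coordinates.
CREATE TABLE chain_events (
    block_number BIGINT  NOT NULL,
    log_index    INTEGER NOT NULL,
    tx_hash      TEXT    NOT NULL,
    payload      JSONB   NOT NULL,
    PRIMARY KEY (block_number, log_index)
);

-- Idempotent ingest: replaying a block after a crash or restart is a no-op.
INSERT INTO chain_events (block_number, log_index, tx_hash, payload)
VALUES (123456, 7, '0xabc123', '{"kind": "trade"}')  -- placeholder values
ON CONFLICT (block_number, log_index) DO NOTHING;

-- On a reorg back to block 123450: drop orphaned events above the last
-- common block, re-ingest the canonical chain, and let downstream
-- aggregates rebuild from this table.
DELETE FROM chain_events WHERE block_number > 123450;
```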

What We're Looking For

  • 5+ years of data engineering on production systems serving real users at scale

  • Deep knowledge of OLTP/OLAP split architectures: you know when a row store wins, when a column store wins, and when to use both

  • Columnar warehouse expertise: ClickHouse strongly preferred; Snowflake, BigQuery, Redshift, or Apache Pinot accepted if fundamentals are solid

  • Data lake experience: Parquet, Iceberg (or Delta/Hudi), compaction strategies, S3 layout discipline

  • Streaming pipeline experience: Kafka, exactly-once vs. at-least-once reasoning, backpressure, consumer-group patterns, schema evolution

  • Strong data modeling fundamentals: star/snowflake, SCD patterns, CDC, idempotent event sourcing, dimensional vs. event-log tradeoffs

  • PostgreSQL at scale: partitioning, index design, autovacuum/bloat remediation, query planning, CDC triggers vs. logical replication (a partitioning sketch follows this list)

  • SQL fluency at warehouse scale: window functions, CTEs, dictionary-based enrichment, dialect specifics (an example query follows this list)

  • Distributed systems reasoning: consistency models, event ordering, replay semantics, write-once vs. mutable state, reorg handling

  • (Plus) EVM indexing experience: rindexer, subgraphs, or comparable – this shortens ramp considerably

  • (Plus) Rust: you'll touch indexer and validation tooling codebases; comfortable reading and contributing

  • (Plus) Domain knowledge in DeFi, prediction markets, or order-book systems

  • (Plus) Observability and SLO thinking: Prometheus metrics design, dashboard discipline, alert-fatigue avoidance

  • (Plus) Python for SQL tooling, ad-hoc analysis, and one-off migrations

  • (Plus) Track record shipping a platform migration or greenfield data stack under a hard deadline
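
For the PostgreSQL bullet, a hedged sketch of the partitioning-plus-indexing shape we mean on a hot serving table; the partition count, names, and columns are illustrative, not our schema:

```sql
-- Hypothetical positions table, hash-partitioned on user address so each
-- partition (and its indexes, vacuum work, and bloat) stays small.
CREATE TABLE positions (
    user_address TEXT        NOT NULL,
    market_id    BIGINT      NOT NULL,
    size         NUMERIC     NOT NULL,
    avg_price    NUMERIC     NOT NULL,
    updated_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
    PRIMARY KEY (user_address, market_id)
) PARTITION BY HASH (user_address);

CREATE TABLE positions_p0 PARTITION OF positions
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
-- ...positions_p1 through positions_p3 analogous; a real partition count
-- would be sized to the data.

-- Covering index so "all positions for a user" can be an index-only scan.
CREATE INDEX idx_positions_user ON positions (user_address)
    INCLUDE (market_id, size, avg_price);
```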
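
And for the SQL-fluency bullet, the flavor of warehouse query this role lives in: a CTE plus window functions computing a simplified mark-to-market PnL. The schema and the naive average-cost basis are assumptions for illustration, not our production logic:

```sql
WITH net_positions AS (
    -- Net position and a naive cost basis per user and market.
    SELECT
        user_address,
        market_id,
        SUM(signed_size) AS position,  -- +long / -short
        SUM(signed_size * price) / NULLIF(SUM(signed_size), 0) AS avg_entry
    FROM trade_fills
    GROUP BY user_address, market_id
),
latest_prices AS (
    -- Window function picks each market's most recent trade price.
    SELECT DISTINCT
        market_id,
        LAST_VALUE(price) OVER (
            PARTITION BY market_id
            ORDER BY event_time
            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        ) AS mark_price
    FROM trade_fills
)
SELECT
    p.user_address,
    p.market_id,
    p.position,
    (l.mark_price - p.avg_entry) * p.position AS unrealized_pnl
FROM net_positions p
JOIN latest_prices l USING (market_id)
WHERE p.position <> 0;
```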

Benefits

  • Competitive salary & equity

  • Unlimited PTO

  • Full Health, Vision, & Dental coverage

  • 401k match

  • Hardware setup: new MacBook Pro, big display, & accessories

Interested in this job?
Apply Now