Job Description
We are seeking a Data Platform Engineer to build and maintain a secure, scalable data pipeline that runs in air-gapped environments and supports multi-tenant ingestion, transformation, warehousing, and dashboarding.
You'll work across the stack: ingesting diverse data sources, transforming them with SQL or Python tools, storing them in OLAP-optimized warehouses, and surfacing insights through customizable BI dashboards.
Key Responsibilities
1. Data Ingestion (ETL Engine)
- Design pipelines for files (CSV, Excel, PDF), APIs (REST/XML/GraphQL), and JDBC databases
- Use Airflow, Apache NiFi, Kafka, and Redis Streams for ingestion
- Build custom Python connectors and process binary data (PyPDF2, protobuf, OpenCV, Tesseract)
- Ensure secure storage using MinIO, GlusterFS, or comparable on-prem storage systems
2. Transformation Layer
- Use dbt-core, Dask, Pandas, and Apache Spark for modular, scalable transformations
- Implement optional data validation with Great Expectations
- Optimize pipelines for latency, memory, and parallelism
3. Data Warehouse (On-Prem)
- Deploy ClickHouse, Apache Druid, PostgreSQL, Greenplum, or DuckDB
- Design multi-tenant schema isolation and OLAP/OLTP strategies
- Tune query and storage performance and ensure data consistency
4. BI Dashboards
- Configure Metabase, Superset, Redash, and Grafana dashboards
- Support per-tenant embedding and scheduled PDF/email reports
- Define KPIs for marketing, operations, and executive teams
Required Skills & Qualifications
- 5+ years of experience with ETL tools, BI platforms, Python, and SQL
- Experience with NiFi, Airflow, Kafka, and Redis Streams
- Hands-on experience with ClickHouse, Druid, PostgreSQL, Greenplum, or DuckDB
- Strong data modeling and optimization expertise
- Comfortable working in air-gapped or on-premises environments
- Familiar with RBAC, data security, and data governance
Nice-to-Have Skills
- Experience in regulated domains: BFSI, telecom, or government
- Familiarity with Docker/Podman and Kubernetes/OpenShift
- Experience with Great Expectations and embedded dashboards
What We Offer
- Opportunity to build an open-source-first data platform
- Collaborative team culture focused on secure and scalable systems
- Competitive compensation and long-term growth potential