Founding Data Engineer

About us

Our mission

Parsio makes the cost locked inside engineering decisions visible and computable, so industrial manufacturers can optimize what they buy and stay competitive at the global level.

Our story

Industrial procurement is worth trillions of euros, yet it’s one of the last great domains still run on PDFs and spreadsheets. Mid-market industrial manufacturers are world leaders in their niches and the backbone of European competitiveness — but the procurement software they’ve been handed was never designed for the engineering complexity of industry. Parsio exists to change that.

Our ambition

What we’re building wasn’t possible six months ago. The bottleneck was never insight; it was the capacity to access the knowledge locked in engineering files. That constraint just disappeared. There is an 18-month window to build the category winner in AI-native procurement intelligence for industry.

Our team

A small, synchronous, trust-first team. No org chart between you and the founders: daily standup, decisions made out loud. On your scope, we expect ownership, autonomy and initiative.

Why we’re hiring for this role

Manufacturing data — design decisions, over-specs, and the rest — lives in PDFs, scanned drawings and heterogeneous spreadsheets. Yet 70–80% of product cost is locked in that engineering knowledge. We built Parsio to make it understandable, auditable and challengeable for procurement teams. And every time, it starts with the data: turning messy industrial files into typed, auditable, decision-ready data.

What you’ll do

In a nutshell. You own the pillars of our data stack: extraction, structuring the data model, and enrichment. AI lets us turn messy, sparse files into computable data at scale.

Your main responsibilities:

Data extraction. Build an LLM & OCR pipeline that reads clients’ PDF drawings and CAD files the way their best engineers would: pulling the right specs from structured title blocks (cartouches) and unstructured schematics, and doing in a few hours what would take their engineers hundreds.
Data structuring & ELT. Normalize the extracted client data into our ontology / data model, and evolve that ontology so it can precisely represent every existing industrial process.
Data enrichment. Build the scraping stack that enriches the model with external signals — starting with material and energy price time series, and anything else that drives our clients’ running costs.
Own the architecture on your scope. The orchestrator choice (Airflow, Dagster, Prefect, or something else) is deliberately left open. It’s one of the first calls we expect you to make.
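
To make "typed, auditable" concrete for the extraction pillar, here is a minimal Python sketch. It assumes the text of a title block (cartouche) has already been OCR'd into "Key: value" lines. The ExtractedSpec shape, the parse_title_block helper and the field names are hypothetical illustrations, not Parsio's actual data model or pipeline.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedSpec:
    """One spec value read from a drawing, kept with its provenance (hypothetical shape)."""
    name: str         # normalized key, e.g. "material"
    value: str        # text exactly as read from the file
    source_file: str  # document the value came from
    page: int         # page number, so a reviewer can audit the value


def parse_title_block(text: str, source_file: str, page: int) -> list[ExtractedSpec]:
    """Turn 'Key: value' lines of already-OCR'd text into typed records.

    Lines without a 'Key: value' shape are skipped rather than guessed at.
    """
    specs = []
    for line in text.splitlines():
        match = re.match(r"^\s*([A-Za-z ]+?)\s*:\s*(.+?)\s*$", line)
        if match:
            key = match.group(1).strip().lower().replace(" ", "_")
            specs.append(ExtractedSpec(key, match.group(2), source_file, page))
    return specs


# Made-up sample text for illustration only.
sample = "Material: S235JR\nSurface finish: Ra 3.2\nunreadable stamp"
for spec in parse_title_block(sample, "drawing.pdf", 1):
    print(spec)
```

Keeping the source file and page on every record is one simple way to let a procurement team trace a value back to the drawing it came from.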

Who you’ll work with

As our first founding engineer, you report to the Co-founder & CTO and work directly with the CEO — no layers in between. Decisions are made together (the what and the when), then we ship fast (the how). On your scope — extraction, ELT, or enrichment — you own the data architecture calls.

The stack

The data stack is autonomous (its own repo and DB). It structures data through successive schemas of increasing quality (raw → stg_ → marts), and the marts schema is then cloned into the app DB.
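
The layering described above can be sketched in a few lines. This is an illustration only: it uses Python's built-in sqlite3 with table prefixes standing in for schemas, whereas the real stack would use dbt models on Postgres. The table names, columns and sample rows are all made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- raw: data exactly as it arrived, everything stored as text
CREATE TABLE raw_prices (material TEXT, price_eur_per_kg TEXT, observed_on TEXT);
INSERT INTO raw_prices VALUES
  (' Steel ', '0.82', '2024-01-01'),
  ('steel',   '0.85', '2024-02-01'),
  ('Copper',  'n/a',  '2024-01-01'),
  ('copper',  '8.10', '2024-02-01');

-- stg: cleaned and typed; rows whose price is not numeric are dropped
CREATE TABLE stg_prices AS
SELECT lower(trim(material))          AS material,
       CAST(price_eur_per_kg AS REAL) AS price_eur_per_kg,
       observed_on
FROM raw_prices
WHERE price_eur_per_kg GLOB '[0-9]*';

-- marts: decision-ready view, here the latest price per material
CREATE TABLE marts_latest_price AS
SELECT material, price_eur_per_kg, observed_on
FROM stg_prices AS s
WHERE observed_on = (
  SELECT max(observed_on) FROM stg_prices WHERE material = s.material
);
""")

latest = dict(
    conn.execute(
        "SELECT material, price_eur_per_kg FROM marts_latest_price ORDER BY material"
    ).fetchall()
)
print(latest)
```

Each layer only reads from the one before it, so a bad value can be traced back from the marts table to the raw row it came from.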

Data: Python, dbt, dlt, PydanticAI + Logfire, GCP Vertex AI (Gemini, Claude), FreeCAD, Postgres / Neon. Orchestrator deliberately left open.

App: React, Hono (Drizzle), a separate FastAPI + PydanticAI LLM backend, GCP Cloud Run.

We have an opinion on every brick, but none is set in stone.

What success looks like

At 30 days. You own a pillar of the data stack (extraction, ELT/ontology, or enrichment) and have shipped your first improvements to production.
At 90 days. The pipelines you own run reliably end-to-end, from file reception to typed data in PostgreSQL, with the monitoring, tests and auditability you put in place.
At 6 months. Your work measurably raises extraction quality and ontology coverage, directly enabling sourcing decisions and savings for live clients. You’re trusted to make the architectural calls on your scope.

Your profile

Must-haves

Solid SQL and Python. You read and write production-grade SQL and Python.
Comfortable with dbt — or ready to get there in a month. Models, tests, contracts, slim CI.
Experience with relational databases.
Pragmatic with LLM tooling. Claude Code (or equivalent) as your primary coding interface, not a curiosity — you know when to trust it and when not to.
Operational French and English (international clients).
5+ years in tech, 3+ as a data engineer.

Nice-to-haves

Experience with the agentic stack (PydanticAI or equivalent, LLM evals).
Familiarity with CAD / STEP files (CadQuery, FreeCAD).
You’ve shipped a typed product taxonomy or ontology.
Real industrial exposure.

This role is probably not for you if…

You want a fully defined scope with no ambiguity.
You’re uncomfortable switching between architectural decisions and hands-on, scrappy execution.
You need a large team and established processes to be effective.
You see LLM tooling as a gimmick rather than a core part of how you ship.
You’re not comfortable operating in both French and English.

Career path

We level engineering on a clear track: Data Engineer → Senior Data Engineer → Lead / Staff Data Engineer.

You join owning a pillar of the stack end-to-end. As you grow, you’ll take on broader architectural ownership across extraction, ELT and enrichment, set the engineering standards, and mentor the engineers we hire next.