Careers · Product

The PM at Nilo is not what you think.

Not an executor. Not a roadmap manager. Not a ticket writer. A systems thinker with commercial instincts — the person who decides what not to build when capacity is infinite.

"Most PMs were never actually bottlenecked by execution. They were bottlenecked by taste and judgment. Team capacity was a governor that prevented bad ideas from shipping. Remove that governor and you discover who was driving and who was just steering."

— Head of Product, Gemini

This role

Is NOT

—Executor
—Roadmap manager
—Ticket writer
—PRD-writer
—Stakeholder-manager

Is

+Systems thinker
+Commercial operator
+Killer of bets
+Owner of P&L consequence
+Writer of short bets

The Booking.com-era PM was a coordination function — PRDs, sprints, tickets, stakeholder alignment. That role is gone. What replaces it is not a faster version of the same. It's a different job: reasoning about a probabilistic system, owning a commercial outcome, and having the judgment to pick the right bet from an infinite menu.

Open role · Flagship

Product Manager — Agentic Demand

Lisbon or remote (EU hours). Reports to the two founders. One of six PM roles at Nilo — one per agent: Handshake, Demand, Fraud, Compliance, Migration, Review.

We are not hiring someone to manage a roadmap. We're hiring someone to own a commercial outcome inside a probabilistic system. The Demand agent places Nilo inside Claude, Gemini, Perplexity and traveler-side agents — which means your job is not to "drive citation rate." It is to understand how a change in Demand ripples into Handshake, Fraud and Review, to decide which second-order effect is worth the first-order win, and to defend that decision in front of a P&L.

Critical thinking is non-negotiable. You will be handed a steady supply of internal and external analyses, most of them confident, many of them wrong. The skill is separating signal from noise when the noise is well-formatted. Strategic prioritization is the muscle that atrophies fastest in an agent-era PM — because you can build anything, you will be tempted to build too much. Our best PMs kill more bets than they ship.

About documents: we don't write PRDs because PRDs were for coordinating expensive execution. Execution isn't expensive anymore. But every bet does get written down — one page, one P&L line, one kill criterion. We call it a bet memo. It's a PRD that stopped pretending.

And above all, you will have taste — the editorial judgment to know what "good" looks like when the agents can build anything. Taste is the one skill the agents cannot carry for you. It is the reason this role exists.

The outcome they own

One number: agentic-sourced GMV. Total booking value arriving through Claude, Gemini, Perplexity and traveler-side agents. Measured weekly. Target: €40M per quarter by end of year 2.

Not citation rate. Not click-through. Not model evals. GMV. If you find yourself optimizing a metric that doesn't translate to money on a hotelier's ledger, you're measuring the wrong thing.

How they think

Three mental models. Every day, every decision.

Skills are inputs; judgment is the output. These are the frames the best PMs at Nilo use to turn one into the other.

Kill, don't queue.

When execution is cheap and capacity is infinite, prioritization is no longer "what's next on the list." It's "what is worth existing at all." Our best PMs kill two bets for every one they ship. A backlog is evidence you haven't decided.

Commercial consequence, not feature output.

Every "ship it" is a P&L decision. You will name the unit-economic consequence before you ship, not after. If you can't articulate what this change does to CAC, margin, or retention in one sentence, you haven't thought hard enough yet.

Own the seams.

Six PMs, one per agent. That only works if someone owns what happens between agents. When Demand hallucinates a room that doesn't exist, when a DSA auditor asks about citation bias, when a model-vendor downgrades our partnership rate — the seams are where the real work lives. You own them without being told to.

Rhythm

A week in the life.

Think → bet → orchestrate → kill → listen. Five days. No sprints.

Mon

Map the system.

Opens the P&L dashboard. Traces last week's Demand agent changes into Handshake, Fraud and Review. Writes down three 2nd-order effects nobody flagged — because nobody was looking.

Tue

Write the bet.

A one-page memo: what we'd try, why now, the commercial consequence if we're right, the commercial consequence if we're wrong. Not a PRD. The argument, not the spec.

Wed

Orchestrate the agents.

Sets the failure modes. Writes the evals the work has to clear before ship. Defines what the agents must not do. Delegates the build itself to the stack.

Thu

Kill something.

Reviews three in-flight bets against the P&L. Kills one. Not because it failed, but because even if it succeeds, it's not the thing that moves the business. Records a 3-min Loom on why.

Fri

Listen.

Three calls: one hotelier whose share-of-bookings dropped, one whose share tripled, one traveler who bounced off a citation. By video, not survey. Writes down what they actually said, not what it sounded like.

Processes · The centerpiece

The moment that tells you what this job actually is.

A Tuesday with the Demand agent.

The day the PM kills a feature that was objectively winning — because winning was measured wrong.

~/nilo/demand-agent — tuesday.log

07:30 | PM opens the P&L dashboard first, as every morning. Demand-sourced GMV is up 9% WoW. Looks like a win.

07:45 | Cross-checks the share-of-bookings cohort. Independent hotels' SoB is down 2.1 points. That's the moat.

08:10 | Opens LangSmith. Reads 40 traces from last week. Demand is citing chain hotels over independents in 31% of Paris queries, up from 18%.

08:40 | Checks the existing eval suite. Every eval passes — because every eval was measuring citation accuracy, not hotelier mix. A measurement gap, not a model gap.

09:20 | Writes 6 new evals: "query X, expected mix ≥70% independents." All 6 fail. Regression is confirmed.

10:05 | Traces the cause. Last week's retrieval tweak surfaced chain listings because chains have cleaner structured data — the agent optimized for what it could see, not what we wanted seen.

10:30 | Writes the kill memo. Three sentences: "Short-term GMV win is coming from the mix we don't want. The new evals will keep it from happening again. Rolling back now."

11:00 | Rolls the retrieval change back to 0% traffic. Demand-sourced GMV will likely drop back. That's fine. The eval suite now encodes that we optimize for share-of-bookings on independents, not for GMV at any cost.

14:00 | Records a 4-min Loom: "Why I killed +9% GMV this morning." Shared to the team. No meeting, no deck, no retrospective.

16:30 | Drafts the next bet: re-weight independents explicitly in Demand's retrieval. Prototype in Cursor, 90 lines. Evals from this morning become the acceptance criteria.

18:00 | Closes laptop. One commercial call made, one false metric retired, six new evals added, the moat defended.
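The 09:20 entry describes evals of the form "query X, expected mix ≥70% independents." A minimal sketch of what one such check could look like is below. It is an illustration under stated assumptions, not Nilo's actual suite: the `Citation` type, the `is_independent` flag, and the function names are hypothetical, and only the 70% threshold comes from the log.

```python
from dataclasses import dataclass

# Threshold taken from the log entry "expected mix >= 70% independents".
MIN_INDEPENDENT_SHARE = 0.70


@dataclass(frozen=True)
class Citation:
    """One hotel cited by the agent for a query (hypothetical shape)."""
    hotel_id: str
    is_independent: bool  # assumed to be known from the hotel's own record


def independent_share(citations: list[Citation]) -> float:
    """Fraction of cited hotels that are independents (0.0 for an empty list)."""
    if not citations:
        return 0.0
    return sum(c.is_independent for c in citations) / len(citations)


def passes_mix_eval(
    citations: list[Citation], threshold: float = MIN_INDEPENDENT_SHARE
) -> bool:
    """True when the independent share meets or exceeds the threshold."""
    return independent_share(citations) >= threshold


# Illustrative data only: 2 independents out of 5 cited hotels.
paris_run = [
    Citation("chain-1", False),
    Citation("chain-2", False),
    Citation("chain-3", False),
    Citation("indie-1", True),
    Citation("indie-2", True),
]

print(independent_share(paris_run))  # 0.4
print(passes_mix_eval(paris_run))    # False
```

A citation-accuracy eval would pass this run as long as every cited hotel is real and correctly described. The mix check fails it, which is the measurement gap the 08:40 entry points at.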

The annotation

What just happened.

Opened the P&L first. Every morning.

→ 07:30

Spotted the 2nd-order regression. Nobody flagged it.

→ 07:45

Wrote the evals the agent should have had.

→ 09:20

Killed a +9% win because it ate the moat.

→ 11:00

"The model did what we asked. We asked the wrong thing."

That's the question the PM at Nilo reaches for when a metric wins — not "is this a real lift?" but "is this the metric that's actually the business?"

Habits

What they refuse to do.

Each of these was a ritual that worked when execution was scarce and thinking was optional. It doesn't work now. If any of these feel like they'd be hard to give up, this role isn't for you.

No feature factories.

Owning a backlog is not the job. Deciding what not to build when capacity is infinite is the job. A full Linear board is a confession, not a plan.

No PRDs.

Write the bet, not the spec. A PRD is a document you write when execution is expensive. It isn't anymore. If your bet doesn't fit on one page with a P&L consequence, it isn't a bet yet.

No product decisions divorced from P&L.

Every "ship it" names the unit-economic consequence. CAC, margin, retention, share-of-bookings — one of those moves, or the decision isn't finished.

No optimizing outputs over outcomes.

The agent can drive any metric. Pick the right one, then defend it. Citation rate went up 9% while hotelier share-of-bookings dropped two points — that's not a win, that's a regression you can't see from inside the feature.

No 6-month roadmaps.

In a stack reshaped by every model release, anything beyond six weeks is a narrative. We plan in six-week horizons, review weekly, and treat any further-out commitment as a lie dressed as a Gantt chart.

The practicalities

The facts, before the philosophy.

If you've read this far and you're serious, here's what you actually need to know.

Reports to

Inês Marques (CEO, ex-Mews) and Rui Tavares (CTO, ex-Stripe + Anthropic applied team).

Team you'd join

Five other PMs (one per agent: Handshake, Fraud, Compliance, Migration, Review), plus two eng leads, one designer, and a direct working relationship with the founders.

Comp

€140–180k base + 0.4–0.9% equity. Comp band published; no ranges-on-request theatre.

Location

Lisbon office or EU-hours remote. Visa + relocation covered. Four days/quarter in Lisbon minimum.

First 90 days

1. Rewrite the Demand agent's eval suite from scratch.

2. Ship one kill memo on a current bet.

3. Own the Q3 agentic-sourced GMV number (€10M target).

Interview loop

Four rounds: written memo → 90-min systems case with a founder → live eval-design session (paired, 2 hours) → video call with an actual hotelier. No take-homes beyond the memo. No culture-fit round.

What you also own — the seams.

The work that doesn't fit on any agent's roadmap. It fits on yours.

Model-vendor risk

When Anthropic changes a pricing tier or Perplexity shifts citation logic, you're the first to see it and the first to respond.

Citation-bias posture

When a DSA auditor or an NGO asks why independents surface less than chains in a given week, you have the answer and the evidence.

Incident response

When the agent confidently hallucinates a room or a price, you own the rollback, the comms to the hotelier, and the eval that prevents the recurrence.

Hotelier trust

When share-of-bookings drops for a segment, you call three of them by video the same week and write down what they said, not what it felt like.

Tools · The stack

What they actually live in.

Explicitly not Jira. Explicitly not Confluence. Explicitly not roadmap decks in Google Slides.

The P&L · Metabase

The first tab, every morning. Before evals, before traces. If you can't name the commercial state of your agent in one sentence, you don't open the next tool.

Claude Code · Cursor

Prototyping and reading code, not outsourcing it. Operator fluency is table stakes — not the job, but the entry ticket.

Braintrust

Evals are the new A/B test. You write them, you own them, you ship against them.

LangSmith

Observability on every agent run. Where the agent burned tokens and where it confidently went wrong.

Linear

Issues, not sprints. No backlog grooming. No points. No ceremonies. The board is small on purpose.

Loom + a phone

User research artifacts. Three minutes of a hotelier showing you what broke beats a 20-page research deck.

Claude / GPT, directly — as a thinking partner

Red-team of one. Systems-thinking scratch pad. Eval case generator. We use the models to stress-test our own arguments before we ship them to the team — "here's my plan, what am I missing?" is the most-used prompt at Nilo.

If any of this was uncomfortable to read, this job is for someone else.

If you've already been working like this — unofficially, inside an org that rewards roadmaps over judgment — we'd like to meet you.

Send us a bet you killed

No CV needed. One page on something you decided not to ship, and why, will do.

Or — under NDA

A one-page memo on a bet you'd kill at Nilo, based on this page. 250 words. A human reads every one. Inês or Rui (the founders) will reply within 5 working days.