How I built Piso Propio
The technical story behind Piso Propio: unifying dozens of official Spanish protected-housing sources with scraping, AI extraction, and an eligibility engine.
My girlfriend is from Esplugues de Llobregat, just outside Barcelona. A few months ago she set out to buy her first home, and protected housing was the best option for her. The hard part was making sense of the bureaucracy: do I qualify, what does it cost, when does the application open and when does it close?
The information isn't missing. It's there, just spread across dozens of official websites, registers, and PDFs that feel designed for nobody to actually read. Every town hall publishes differently, every metropolitan housing operator has its own portal, and the deadlines slip by while you're still trying to work out where things stand.
So I started building Piso Propio: a tool for Spain's protected-housing market that tells you which offers you qualify for, what they cost, when they close, and your real odds of actually getting one. For now it covers 13 regions, pulls together 22 official sources, and watches 228 offers.
Here's how it works under the hood.
The hard part isn't the app, it's the data
The frontend was the least difficult part. The hard part was taking all that mess of official information and turning it into something you can actually query.
Protected housing in Spain doesn't have one place to look. There are the registers, like Barcelona's RSHPOB or the Registre de Sol·licitants de Catalunya. There are the metropolitan operators, like AMB/IMPSOL and Habitatge Metròpolis Barcelona. There are the individual municipal offices. And on top of all that there are the official gazettes (BOJA, DOCM, BOCM, DOG, BOIB, BOPV), where the calls show up buried among a thousand other things.
All told I've got around 39 sources defined, 22 of them running today. No two are alike. I started with just Barcelona and its metropolitan area and slowly added more until I hit 13 regions. Each new region basically means starting from scratch with yet another format.
All of it runs on six BullMQ queues. Five of them form a pipeline in which each stage enqueues the next:
flowchart LR
F([fetch]) --> P([parse])
P --> N([normalize])
N --> M([match])
M --> A([notify])
fetch pulls the raw content, parse makes sense of it, normalize maps it onto a common model, match cross-checks it against your profile, and notify warns you about deadlines and changes. There's a sixth queue, maintenance, for the dull stuff: retries, cleanup, and checking which sources stopped answering.
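A minimal sketch of the "each stage enqueues the next" idea, using only the stage names from the diagram. The `nextStage` and `trace` helpers are hypothetical, and an in-memory loop stands in for the real BullMQ queues, so this illustrates the ordering rather than the actual worker code.

```typescript
// Stage names come from the flowchart; everything else is illustrative.
const STAGES = ["fetch", "parse", "normalize", "match", "notify"] as const;
type Stage = (typeof STAGES)[number];

// Returns the stage a finished job should enqueue next, or null at the end.
function nextStage(stage: Stage): Stage | null {
  const index = STAGES.indexOf(stage);
  return STAGES[index + 1] ?? null;
}

// Hypothetical stand-in for the queues: follow one job from a starting stage
// and record the order in which stages would be visited.
function trace(start: Stage): Stage[] {
  const visited: Stage[] = [];
  for (let s: Stage | null = start; s !== null; s = nextStage(s)) {
    visited.push(s);
  }
  return visited;
}
```

In a real worker, the body of each stage would finish by adding a job to the queue named by `nextStage`, which keeps the ordering in one place instead of scattering queue names across handlers.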
For the user, the last two are what matter. match compares each offer against your situation (income, how many people live with you, which municipalities you're searching), and notify warns you before a deadline passes. It sounds obvious, but half the value is right there: almost nobody loses out because they don't qualify, they lose out because they found out too late.
Most sources I handle with the cheapest thing going: an HTTP fetch and Cheerio to read the HTML. I save Playwright for the ones that seriously block bots, like AMB/IMPSOL, because spinning up a headless browser for everything is expensive. PDFs I read with unpdf.
The stage I didn't expect to eat so much of my time was normalize. The same development shows up in three sources under three different names, and you have to recognize they're the same one. I solve it with pg_trgm, the PostgreSQL extension that compares strings by similarity, plus a canonical key built from the municipality code and a slug of the title. And when the listing's HTML says one thing and the call's PDF says another, I trust the PDF: the terms are the legal document, the listing is just a hint.
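To make the deduplication idea concrete, here is a rough sketch of a canonical key and a trigram similarity score. It assumes a key format of `municipalityCode:slug`, which is a guess at the shape, not the project's actual format. The `similarity` function only approximates what pg_trgm does (padded word trigrams, shared over total); in the real system that comparison would happen inside PostgreSQL. The municipality code and development names are placeholder values.

```typescript
// Lowercase, strip accents, and collapse anything non-alphanumeric to "-".
function slugify(title: string): string {
  return title
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "") // remove combining accent marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Assumed key format: "<municipality code>:<slug of the title>".
function canonicalKey(municipalityCode: string, title: string): string {
  return `${municipalityCode}:${slugify(title)}`;
}

// Approximation of pg_trgm: each word is padded with two leading spaces and
// one trailing space, then cut into 3-character windows.
function trigrams(text: string): Set<string> {
  const result = new Set<string>();
  for (const word of slugify(text).split("-").filter(Boolean)) {
    const padded = `  ${word} `;
    for (let i = 0; i + 3 <= padded.length; i++) {
      result.add(padded.slice(i, i + 3));
    }
  }
  return result;
}

// Shared trigrams divided by distinct trigrams across both strings (0 to 1).
function similarity(a: string, b: string): number {
  const ta = trigrams(a);
  const tb = trigrams(b);
  let shared = 0;
  for (const t of ta) if (tb.has(t)) shared++;
  const union = ta.size + tb.size - shared;
  return union === 0 ? 0 : shared / union;
}
```

The exact key catches identical titles cheaply; the similarity score is for the near-misses, such as "fase 2" versus "fase II", where a threshold has to be tuned by hand.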
My favorite part of all this is an irony. The site that kicked off the project, esplugues.cat, turned out to be the most broken of the lot: its TLS certificate chain is incomplete, so a strict client fails certificate verification and drops the connection. I had to turn on insecureTls for that one source specifically. My girlfriend's city, the one that started everything, was the hardest to scrape.
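One way to keep that exception contained is to make it a per-source setting that defaults to strict verification. The sketch below is an assumption about how such a config could look: the `SourceConfig` shape, the source ids, and the strategy assigned to each source are illustrative, and only the `insecureTls` name comes from the text. `rejectUnauthorized` is the standard Node.js TLS option for certificate verification.

```typescript
type FetchStrategy = "http" | "browser";

interface SourceConfig {
  id: string;              // hypothetical identifier
  strategy: FetchStrategy; // plain HTTP + HTML parsing, or a headless browser
  insecureTls?: boolean;   // opt-in only; absent means strict verification
}

// Illustrative entries, not the project's real source list.
const sources: SourceConfig[] = [
  { id: "esplugues", strategy: "http", insecureTls: true },
  { id: "amb-impsol", strategy: "browser" },
];

// Translate the per-source flag into the TLS option Node.js understands.
// Verification is only skipped when the flag is explicitly true.
function tlsOptions(source: SourceConfig): { rejectUnauthorized: boolean } {
  return { rejectUnauthorized: source.insecureTls !== true };
}
```

Scoping the flag to a single source means one misconfigured server never weakens verification for the other sources.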
AI, where it actually pulls its weight
A protected-housing call is usually a PDF that runs several pages, in legal Catalan or Spanish, with income caps, quotas per group, deadlines, and the list of documents you have to submit. Pulling that out with regular expressions is a losing fight: you write the regex for one town hall and it breaks on the next.
This is where Claude does something none of my heuristics could match: I use the claude-opus-4-8 model to extract the structured fields from each PDF and each gazette notice. But I don't take it on faith: the output has to pass a Zod schema. The model doesn't hand me free text, it hands me an object that satisfies a contract, and if it doesn't fit I throw it out. And there's one line in the system prompt that matters a lot with legal data: "use null when a field is missing, don't guess." I'll take an empty value over an invented one any day; a wrong closing date in a housing app isn't a small thing.
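The contract idea can be sketched without any dependencies. The article uses Zod; here a hand-written type guard stands in for the schema so the example runs on its own. The field names (`closingDate`, `incomeCapEur`, `requiredDocuments`) are hypothetical. The point is that a missing fact must arrive as an explicit `null`, and anything that doesn't fit the shape is discarded.

```typescript
// Hypothetical subset of the fields extracted from a call.
interface CallFields {
  closingDate: string | null;  // ISO date (YYYY-MM-DD), or null if not stated
  incomeCapEur: number | null; // null if the document gives no figure
  requiredDocuments: string[];
}

const ISO_DATE = /^\d{4}-\d{2}-\d{2}$/;

// Stand-in for a Zod schema: accepts only objects that satisfy the contract.
// A key that is absent (undefined) is rejected; "unknown" must be null.
function isCallFields(value: unknown): value is CallFields {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const dateOk =
    v.closingDate === null ||
    (typeof v.closingDate === "string" && ISO_DATE.test(v.closingDate));
  const capOk =
    v.incomeCapEur === null ||
    (typeof v.incomeCapEur === "number" &&
      Number.isFinite(v.incomeCapEur) &&
      v.incomeCapEur >= 0);
  const docsOk =
    Array.isArray(v.requiredDocuments) &&
    v.requiredDocuments.every((d) => typeof d === "string");
  return dateOk && capOk && docsOk;
}

// Parse the model's raw output; anything that breaks the contract is dropped.
function parseModelOutput(raw: string): CallFields | null {
  try {
    const parsed: unknown = JSON.parse(raw);
    return isCallFields(parsed) ? parsed : null;
  } catch {
    return null; // not even valid JSON
  }
}
```

With Zod the guard would collapse into a schema declaration, but the behavior to preserve is the same: reject the whole object rather than patch up a bad field.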
There are two separate extractions, one for the PDF calls and one for the gazette notices, and I cache them by hash (the SHA-256 of the text) so the same document is never processed twice. I also didn't want this to depend on an API key: if there isn't one, extraction falls back to the local claude -p CLI, and if that's gone too, it returns null and I carry on with what I scraped from the HTML. AI helps a lot, but the system doesn't fall over without it.
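A sketch of the cache-plus-fallback shape described above, under a few assumptions: the extractors are modeled as synchronous functions for brevity (real API and CLI calls would be async), an in-memory `Map` stands in for whatever store holds the cache, and the `createCachedExtractor` name is hypothetical. An extractor that is unavailable simply returns `null`, and the next one in the list gets a turn.

```typescript
import { createHash } from "node:crypto";

type Extraction = Record<string, unknown>;
// An extractor returns null when it is unavailable or cannot extract.
type Extractor = (text: string) => Extraction | null;

// Cache key: SHA-256 of the document text, hex-encoded.
function sha256Hex(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Try each extractor in order (e.g. API first, then local CLI) and cache
// successful results by content hash so a document is processed only once.
function createCachedExtractor(extractors: Extractor[]): Extractor {
  const cache = new Map<string, Extraction>();
  return (text) => {
    const key = sha256Hex(text);
    const hit = cache.get(key);
    if (hit !== undefined) return hit;

    for (const extract of extractors) {
      let result: Extraction | null = null;
      try {
        result = extract(text);
      } catch {
        result = null; // a failing extractor falls through to the next one
      }
      if (result !== null) {
        cache.set(key, result);
        return result;
      }
    }
    // Nothing worked: the caller carries on with the scraped HTML fields.
    return null;
  };
}
```

Failures are deliberately not cached in this sketch, so a document that could not be extracted during an outage gets another chance on the next run.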
The other messy front is the gazettes. Search by keyword and you get about ten times more noise than useful stuff: land tenders, sanctions, awards that were already settled. That's where Claude acts as a filter: every notice goes through an extraction that decides whether it's even a housing offer at all, and none gets published without clearing that check.
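The "none gets published without clearing that check" rule is easiest to get right as a fail-closed filter: a notice is published only on an explicit yes, and a failed extraction counts as a no. The types and the `publishable` name below are hypothetical, and the classifier is passed in so the sketch doesn't depend on any model call.

```typescript
interface GazetteNotice {
  id: string;
  title: string;
}

interface Classification {
  isHousingOffer: boolean;
}

// Returns null when the extraction failed or was unavailable.
type Classifier = (notice: GazetteNotice) => Classification | null;

// Fail closed: only an explicit `isHousingOffer: true` lets a notice through.
function publishable(
  notices: GazetteNotice[],
  classify: Classifier,
): GazetteNotice[] {
  return notices.filter((notice) => classify(notice)?.isHousingOffer === true);
}
```

The trade-off is that a model outage delays real offers instead of leaking noise, which is the safer failure for a gazette feed that is mostly noise to begin with.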
Codifying the rules: the dull, critical part
If AI extraction is the flashy bit, the eligibility engine is the dull one. And by a long way, it's what took the most effort to get right.
Qualifying for protected housing in Catalonia isn't clearing a single threshold. It depends on the IRSC, the housing regime, the municipality's zone, and how many people are in the household. The 2026 IRSC is 801.85 € a month, 9,622.18 € a year; you apply each regime's multiplier on top of that figure, and income gets weighted by two coefficients:
// weighted income = gross annual income × C[zone] × F[members]
const weightedIncome = grossAnnual * zoneCoefficient[zone] * memberFactor[members];
The zone coefficient and the household-composition factor come straight out of the decree, so I've got all 947 Catalan municipalities mapped by zone and by whether they're an area of strong demand (ADFA). Some rules look like a detail until you see who they affect: if someone in the household has a disability, the calculation counts it as one more person, which raises the income cap. It's a couple of lines in the code; for that family it can be the difference between getting in or being left out.
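Putting the pieces together, a hedged sketch of the calculation might look like this. The annual IRSC figure is the one quoted above. The coefficient tables, the zone labels, and the regime multiplier of 2 are placeholders chosen to make the arithmetic easy to follow; they are NOT the decree's values. The function names are hypothetical as well.

```typescript
const IRSC_ANNUAL_2026 = 9622.18; // EUR per year, as quoted in the text

interface Household {
  grossAnnual: number;    // EUR
  members: number;
  hasDisability: boolean; // someone in the household has a disability
  zone: string;
}

interface Coefficients {
  zone: Record<string, number>;
  members: Record<number, number>;
}

// PLACEHOLDER numbers for illustration only, not taken from the decree.
const PLACEHOLDER_COEFFICIENTS: Coefficients = {
  zone: { A: 0.75, B: 1 },
  members: { 1: 1, 2: 0.875, 3: 0.75 },
};

// A disability counts as one extra household member.
function effectiveMembers(h: Household): number {
  return h.members + (h.hasDisability ? 1 : 0);
}

// weighted income = gross annual income × C[zone] × F[members]
function weightedIncome(h: Household, c: Coefficients): number {
  const zoneCoefficient = c.zone[h.zone];
  const memberFactor = c.members[effectiveMembers(h)];
  if (zoneCoefficient === undefined || memberFactor === undefined) {
    throw new Error("no coefficient for this zone or household size");
  }
  return h.grossAnnual * zoneCoefficient * memberFactor;
}

// The cap is the regime's multiplier applied to the annual IRSC.
function isWithinCap(weighted: number, regimeMultiplier: number): boolean {
  return weighted <= regimeMultiplier * IRSC_ANNUAL_2026;
}
```

With these placeholder numbers, a two-person household earning 32,000 € in zone A has a weighted income of 21,000 € and misses a cap of 2 × IRSC (19,244.36 €). The same household with a disability is treated as three people, drops to 18,000 €, and gets in.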
This part is worthless if it's wrong, so I locked it down with tests: the 8 published Zona-A income caps come out to the cent, and if a refactor moves a single euro, the build complains. And I set myself one rule across the whole engine: when in doubt, don't leave anyone out. If a municipality isn't in the map, it falls back to a conservative classification. I'd rather show you one offer too many than hide one you actually qualified for.
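The fallback rule can be sketched as a lookup that never fails and always says whether it guessed. Everything here is an assumption about shape: the `zoneFor` name, the `assumed` flag, and the idea that the caller supplies whichever zone is the most permissive. The text doesn't say which zone that is, so the sketch takes it as a parameter.

```typescript
interface ZoneLookup {
  zone: string;
  assumed: boolean; // true when the municipality was not in the map
}

// Look up a municipality's zone; if it is unknown, fall back to the zone the
// caller considers least likely to exclude someone who actually qualifies.
function zoneFor(
  municipalityCode: string,
  zoneByMunicipality: ReadonlyMap<string, string>,
  mostPermissiveZone: string,
): ZoneLookup {
  const zone = zoneByMunicipality.get(municipalityCode);
  return zone === undefined
    ? { zone: mostPermissiveZone, assumed: true }
    : { zone, assumed: false };
}
```

Returning the `assumed` flag alongside the zone lets the interface mark a result as provisional instead of silently presenting a guess as fact.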
Nothing to sell you
Looking for protected housing leaves you with two kinds of sources, and neither is on your side. On one hand, the real-estate portals and agencies: their business is selling you something, so they have every incentive to tell you what you want to hear. On the other, the official sites: they're not selling anything, but they're so buried in bureaucracy and so badly built that just understanding your situation becomes a job in itself.
Piso Propio is neither. I have nothing to sell you, and that's exactly why I can afford to be honest. The temptation is still there: a giant green "You qualify!" and move on. But qualifying isn't winning.
Protected housing is almost always allocated by lottery, and the gap between being eligible and actually getting a place can be huge. The example I use in the app is real: the Illa Glòries development in Barcelona had 11,243 admitted applications for 238 homes. That's roughly one home for every 47 applications, or odds of about 2%.
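The arithmetic behind that figure is simple enough to show. This sketch assumes a uniform lottery where every admitted application has the same chance, and it ignores quotas per group, which would change the odds for any individual applicant. The `lotteryOdds` name is hypothetical.

```typescript
interface Odds {
  probability: number;         // chance that one application wins, 0 to 1
  applicationsPerHome: number; // how many applications compete for each home
}

// Uniform-lottery estimate: every admitted application is equally likely.
function lotteryOdds(applications: number, homes: number): Odds {
  if (applications <= 0 || homes < 0) {
    throw new RangeError("applications must be positive and homes non-negative");
  }
  return {
    probability: Math.min(1, homes / applications),
    applicationsPerHome: homes > 0 ? applications / homes : Infinity,
  };
}

// Illa Glòries, using the numbers from the text.
const glories = lotteryOdds(11243, 238);
console.log(
  `1 in ${Math.round(glories.applicationsPerHome)} ` +
    `(${(glories.probability * 100).toFixed(1)}%)`,
);
```

For Illa Glòries this works out to about 47 applications per home, a little over a 2% chance per application.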
So Piso Propio shows you the real odds instead of selling you hope. That's the promise and nothing more: it helps you not miss a deadline and understand where you stand. It doesn't promise you a home. Taking the triumphant tone out of the whole interface was, for me, the most important product decision.
The stack and the deploy
Underneath, the stack is fairly boring, and that's on purpose. The interesting part is the data, not the infrastructure.
- Frontend and API: TanStack Start (React 19) with tRPC v11.
- Database: PostgreSQL 16 with Drizzle, around 20 tables.
- Queues: BullMQ on Redis 7.
- Storage: MinIO, where I keep a snapshot of every fetch forever.
- Auth: Better Auth. Notifications: Resend and Web Push. Languages: Spanish, Catalan, and English.
The snapshots are worth a separate mention. Every time the system downloads an HTML page or a PDF, it saves it with its content hash. That buys me two things that ended up being key: I can re-extract everything when a better model ships, without bothering the sources again, and I can trace where any given value came from months later. In something people use to make a real decision, knowing where each piece comes from isn't up for debate.
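As an illustration of content-addressed snapshots, the sketch below derives a storage key from the SHA-256 of the downloaded bytes. The `snapshots/<source>/<hash>.<ext>` layout, the `Snapshot` record, and the `describeSnapshot` name are all assumptions; the text only says each download is saved with its content hash.

```typescript
import { createHash } from "node:crypto";

// Hypothetical provenance record stored next to each extracted value.
interface Snapshot {
  sourceId: string;
  contentHash: string; // SHA-256 of the raw bytes, hex-encoded
  objectKey: string;   // where the bytes live in object storage
  fetchedAt: string;   // ISO timestamp of the download
}

// Build the record for a freshly downloaded page or PDF. Identical bytes
// always map to the same key, so an unchanged document is stored once.
function describeSnapshot(
  sourceId: string,
  body: Uint8Array,
  extension: "html" | "pdf",
  fetchedAt: Date,
): Snapshot {
  const contentHash = createHash("sha256").update(body).digest("hex");
  return {
    sourceId,
    contentHash,
    objectKey: `snapshots/${sourceId}/${contentHash}.${extension}`, // assumed layout
    fetchedAt: fetchedAt.toISOString(),
  };
}
```

If every extracted value keeps the `contentHash` it came from, re-running a better model is just a loop over stored objects, and tracing a suspicious date back to its PDF is a single lookup.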
The deploy is deliberately simple: a single Hetzner VPS running Docker Compose, managed with Coolify, with seven services (caddy, web, worker, scheduler, postgres, redis, and minio). And there's a separation that lets me sleep at night: production is a read-only replica with scraping turned off. All the collection and AI enrichment runs on my machine, and I only sync the already-verified result to production.
Built with Claude Code
I built most of it with Claude Code, leaning on a few specific skills: deep-research to make sense of the tangle of protected-housing rules before writing a line of code, impeccable so the interface wouldn't end up looking like a government form, and the SEO/GEO skill set by aaron-he-zhu for search ranking and generative-engine optimization. The AI isn't only inside the product: it was also part of how I built it.
What I'm taking away
The strangest part of building this is that the AI, the bit that sounds impressive, was nowhere near the hardest. The hard part was everything else: codifying legal rules and checking they came out exact, keeping the provenance of every value, filtering the gazette noise without letting through notices that aren't real offers, and holding the line on showing honest odds instead of inflating expectations.
Claude solved something that was nearly impossible a couple of years ago: reading ugly official PDFs and handing back data I can trust. But that trust doesn't come from the model alone, it comes from wrapping it in a schema, a cache, a fallback, and a test that breaks the build. AI pays off far more when you treat it as one more part of the system, with its limits, and not as a magic box that's always right.