
data pipeline · open source

UK house prices

Full historical ingest of HM Land Registry Price Paid Data, enriched with EPC energy certificates for floor area and £/m². Includes postcode geocoding, Grafana dashboards, and a conversational analytics agent driven by a local LLM over MCP, all kept current with automated monthly updates.

GitHub ↗

A full historical ingest of HM Land Registry Price Paid Data — every recorded residential sale in England and Wales since 1995, ~28 million rows — loaded into Postgres and geocoded to coordinates from OS Code-Point Open for regional and map-based analysis.

On top sits an EPC (Energy Performance Certificate) layer: ~23 million domestic certificates address-matched to sales. Land Registry records carry no UPRN, so the join is built on normalised address keys. The match adds floor area, £ per m², and energy ratings to ~68% of transactions (88–92% in recent years). Grafana dashboards drive exploration, and a local LLM powers both a monthly market summary and a conversational analytics agent shaped around what a first-time buyer actually asks: what fits my budget, is this good value, what is the £ per m², where is the market moving. The agent's tools are served over an MCP server, with every figure coverage-gated and drawn from the underlying data.
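The address-key join and the coverage gate can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `address_key` and `price_per_m2`, the key format, the sample addresses, and the `min_coverage` threshold of 0.5 are all assumptions made for the example.

```python
import re
from statistics import median


def address_key(postcode: str, *parts: str) -> str:
    """Build a comparable key from a postcode and free-form address parts.

    Hypothetical normalisation: strip spaces from the postcode, uppercase
    everything, and replace punctuation with spaces before re-joining tokens.
    """
    pc = re.sub(r"\s+", "", postcode).upper()
    text = " ".join(p for p in parts if p).upper()
    text = re.sub(r"[^A-Z0-9 ]+", " ", text)  # punctuation becomes a space
    return pc + "|" + " ".join(text.split())


def price_per_m2(sales, epc_area_by_key, min_coverage=0.5):
    """Median price per square metre, always reported with its match rate.

    The value is None when too few sales matched an EPC record, so a thin
    sample is never presented as a market figure.
    """
    matched = [
        s["price"] / epc_area_by_key[s["key"]]
        for s in sales
        if s["key"] in epc_area_by_key
    ]
    coverage = len(matched) / len(sales) if sales else 0.0
    value = round(median(matched)) if matched and coverage >= min_coverage else None
    return {"median_gbp_per_m2": value, "coverage": coverage, "n_matched": len(matched)}


# Illustrative data only: two sales, one of which has a matching certificate.
sales = [
    {"key": address_key("AB1 2CD", "FLAT 3", "12", "HIGH STREET"), "price": 300_000},
    {"key": address_key("ab12cd", "", "14", "High Street"), "price": 450_000},
]
epc_area_by_key = {address_key("AB1 2CD", "Flat 3, 12 High Street"): 60.0}

result = price_per_m2(sales, epc_area_by_key)
print(result)
```

Real addresses need more handling than this (abbreviations such as "ST" vs "STREET", flat and house number ordering, missing fields), which is one reason match coverage stays partial and has to be reported alongside the figure.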

data source
HM Land Registry Price Paid Data: full historical ingest, ~28M sales from 1995 to present, plus ~23M EPC domestic certificates and OS Code-Point Open postcodes.
geocoding
OS Code-Point Open resolves each postcode to coordinates (lat/lng) for map heatmaps and regional rollups.
value enrichment
EPC certificates are address-matched to sales (no UPRN in Land Registry) to add floor area, £/m² and energy rating. Coverage is partial, so £/m² is only ever reported with its match rate — never fabricated.
dashboards
Grafana over the geocoded data: price heatmaps, YoY trends, borough/county breakdowns, and inside/outside-the-M25 views.
analytics agent
A local LLM (Qwen3-8B via llama.cpp) drives a small set of buyer-intent tools over an MCP server — budget fit, value-vs-market, £/m², market movers — answering in natural language with coverage-honest figures. A golden-question eval suite keeps its tool-routing and honesty in check.
updates
A systemd timer pulls the monthly Land Registry delta and upserts it automatically, with no manual reload. EPC data tops up quarterly via the developer API.
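As a rough sketch of the monthly update step: the Land Registry monthly file marks each row with a record status (A for addition, C for change, D for delete), which maps naturally onto an upsert plus a delete. The example below uses an in-memory SQLite database as a stand-in for Postgres so that it runs on its own. The table name `sales`, its columns, and the function `apply_delta` are assumptions for illustration, not the project's schema.

```python
import sqlite3

# The same ON CONFLICT ... DO UPDATE shape works in Postgres
# (with %s placeholders instead of ? when using a Postgres driver).
UPSERT = """
INSERT INTO sales (transaction_id, price, sale_date, postcode)
VALUES (?, ?, ?, ?)
ON CONFLICT (transaction_id) DO UPDATE SET
    price = excluded.price,
    sale_date = excluded.sale_date,
    postcode = excluded.postcode
"""


def apply_delta(conn, rows):
    """Apply one monthly delta: upsert additions and changes, remove deletions."""
    for txn_id, price, sale_date, postcode, status in rows:
        if status == "D":
            conn.execute("DELETE FROM sales WHERE transaction_id = ?", (txn_id,))
        else:  # "A" (addition) or "C" (change)
            conn.execute(UPSERT, (txn_id, price, sale_date, postcode))
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "transaction_id TEXT PRIMARY KEY, price INTEGER, sale_date TEXT, postcode TEXT)"
)

# Illustrative rows only: a first load, then a delta that corrects t1 and removes t2.
apply_delta(conn, [
    ("t1", 100_000, "2024-01-05", "AB1 2CD", "A"),
    ("t2", 250_000, "2024-01-09", "AB1 2CE", "A"),
])
apply_delta(conn, [
    ("t1", 105_000, "2024-01-05", "AB1 2CD", "C"),
    ("t2", 250_000, "2024-01-09", "AB1 2CE", "D"),
])

rows = conn.execute("SELECT transaction_id, price FROM sales ORDER BY transaction_id").fetchall()
print(rows)
```

Keying the upsert on the transaction identifier makes the job safe to re-run: applying the same delta twice leaves the table in the same state.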
Grafana dashboard — price trends and breakdowns inside the M25
Grafana dashboard — outside the M25 (Kent)
Conversational analytics agent over MCP — buyer questions answered in natural language
PostgreSQL · FastAPI · Grafana · llama.cpp · Qwen3-8B · MCP