Lorentz Yeung PRO

Entz

AI & ML interests

None yet

Recent Activity

upvoted a changelog about 4 hours ago
Protected Spaces with Public URLs
replied to their post about 4 hours ago
London Property Market Analyst: GPT + HM Land Registry + Gradio (full pipeline breakdown)

---

Built a production RAG-style chatbot over 32 million rows of official UK property transaction data. Sharing the architecture in case it's useful for others building structured-data Q&A systems.

This is the latest step in a project that started in 2021 as a basic Python analyst script, grew into automatic PDF report generation, then became a fully automatic daily pipeline, and is now a conversational AI anyone can query.

**Live demo:** https://uk-property-app.entzai.com/
**My Space:** https://huggingface.co/spaces/Entz/uk-property-app

---

### The problem with LLMs over tabular data

The naive approach, dumping your CSV into the context window, breaks down fast at scale. The raw HM Land Registry file is a 4–5GB CSV covering 32 million transactions across England & Wales. I filtered it to ~1.76M London transactions (2010–2026), but even that doesn't fit in any context window. And even if it did, asking GPT to do a GROUP BY in its head invites hallucinations.

The solution: **a structured analytics layer between the LLM and the data.**

---

### Architecture

```
User query (natural language)
        ↓
Triage Agent (GPT)
  - classifies intent
  - extracts structured params: district, property type, new/old build,
    time window, metric (median/mean/count)
        ↓
Analytics Engine (pure Python + Pandas)
  - queries pre-aggregated Parquet files (not raw CSV)
  - returns structured JSON
        ↓
Synthesis Agent (GPT)
  - receives structured JSON, writes prose analysis
  - hard rules prevent hallucination of years, ranges, stats
        ↓
Chart Agent (Matplotlib)
  - 10 chart types, including line, multi-line, stacked bar, h-bar,
    diverging bar, band trend, growth ranking, table
  - returned as base64 PNG → gr.Image on frontend
```
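The core idea of the analytics layer is to aggregate once offline, then have each query look up a few pre-computed numbers that the LLM receives as JSON. The sketch below illustrates that idea under stated assumptions. It uses only the Python standard library instead of Pandas and Parquet so it runs without dependencies. The names `QueryParams`, `pre_aggregate`, and `run_query`, the `(district, property_type, year, price)` row shape, and the sample rows are all hypothetical and do not come from the app's actual code or data.

```python
import json
import statistics
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable

# Hypothetical raw-row shape: (district, property_type, year, price).
Row = tuple[str, str, int, int]
# Pre-aggregated table: (district, property_type, year) -> {metric: value}.
Agg = dict[tuple[str, str, int], dict[str, float]]


@dataclass(frozen=True)
class QueryParams:
    """Structured params a triage step might extract (hypothetical schema)."""
    district: str
    property_type: str
    start_year: int
    end_year: int
    metric: str = "median"  # one of "median", "mean", "count"


def pre_aggregate(rows: Iterable[Row]) -> Agg:
    """Collapse raw transactions into one summary per (district, type, year).

    Intended to run offline, so per-query work never touches raw rows.
    """
    groups: dict[tuple[str, str, int], list[int]] = defaultdict(list)
    for district, ptype, year, price in rows:
        groups[(district, ptype, year)].append(price)
    return {
        key: {
            "median": statistics.median(prices),
            "mean": statistics.fmean(prices),
            "count": len(prices),
        }
        for key, prices in groups.items()
    }


def run_query(agg: Agg, params: QueryParams) -> str:
    """Return the requested metric per year as JSON for the synthesis step."""
    if params.metric not in ("median", "mean", "count"):
        raise ValueError(f"unsupported metric: {params.metric}")
    series = []
    for year in range(params.start_year, params.end_year + 1):
        stats = agg.get((params.district, params.property_type, year))
        if stats is not None:  # skip years with no transactions
            series.append({"year": year, params.metric: stats[params.metric]})
    return json.dumps({
        "district": params.district,
        "property_type": params.property_type,
        "metric": params.metric,
        "series": series,
    })


# Usage with made-up rows (illustrative only, not Land Registry data).
sample_rows = [
    ("CAMDEN", "F", 2023, 500_000),
    ("CAMDEN", "F", 2023, 600_000),
    ("CAMDEN", "F", 2024, 550_000),
    ("HACKNEY", "T", 2024, 900_000),
]
agg = pre_aggregate(sample_rows)
print(run_query(agg, QueryParams("CAMDEN", "F", 2023, 2024)))
```

Because the LLM only sees the small JSON payload, every figure it can cite traces back to a deterministic computation instead of arithmetic done in the model.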
posted an update about 4 hours ago

Organizations

None yet