We build AI and ML systems that earn their keep — RAG assistants on your private data, computer-vision pipelines for production lines, document automation, demand forecasting, fraud and anomaly detection, and LLM integration into existing software. Engineered with evaluation harnesses, observability, cost control and EU data residency. No hand-wave demos.
Six recurring shapes of applied AI work for Irish and UK operators. Every engagement starts with the same question — what would change if this worked? — and every system ships with the evaluation harness that proves it does.
RAG assistants on private data
Question-answering and assistant systems grounded in your knowledge base, contracts, SOPs, product manuals or customer history. With citations, freshness signals and red-team test sets — not just "ChatGPT for X".
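As a sketch of the shape this takes, with a toy keyword scorer standing in for embedding search and a string template standing in for the LLM answer step (the document names and dates below are invented):

```python
# Toy grounded-QA loop: retrieve chunks, answer from them, cite sources
# with freshness signals. Keyword overlap stands in for vector search.
from dataclasses import dataclass

@dataclass
class Chunk:
    doc_id: str    # e.g. an SOP or contract reference
    updated: str   # freshness signal surfaced next to the citation
    text: str

CORPUS = [
    Chunk("SOP-014 rev 3", "2024-11-02", "Allergen labels are verified at packing."),
    Chunk("Contract-ACME", "2023-06-18", "Payment terms are 30 days from invoice."),
]

def retrieve(question: str, k: int = 2) -> list[Chunk]:
    """Rank chunks by naive keyword overlap (stand-in for embedding search)."""
    q = set(question.lower().split())
    return sorted(CORPUS, key=lambda c: -len(q & set(c.text.lower().split())))[:k]

def answer(question: str) -> str:
    hits = retrieve(question)
    cites = "; ".join(f"[{c.doc_id}, updated {c.updated}]" for c in hits)
    # Production passes the hits to an LLM instructed to answer only from
    # the cited text, or refuse; red-team prompts probe that refusal.
    return f"{hits[0].text} (sources: {cites})"

print(answer("What are the payment terms?"))
```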
Computer vision for production lines
Defect detection, label verification, count-and-weigh, occupancy sensing for hospitality. Trained on your imagery, deployed at the edge or via private inference, with explainable decisions.
Document automation
Invoices, COAs, contracts, scanned PDFs — extracted to structured data with confidence scoring and human-in-the-loop review. Especially valuable for food producers and professional-services firms.
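In miniature, the review routing is a single rule. The field names and the 0.90 threshold below are illustrative; real confidences come from the extraction model and thresholds are tuned per field:

```python
# Confidence-gated review: high-confidence fields flow straight through,
# low-confidence fields queue for a human reviewer.
REVIEW_THRESHOLD = 0.90

def route(extracted: dict[str, tuple[str, float]]) -> dict:
    accepted, needs_review = {}, {}
    for field, (value, confidence) in extracted.items():
        (accepted if confidence >= REVIEW_THRESHOLD else needs_review)[field] = value
    return {"accepted": accepted, "review_queue": needs_review}

invoice = {
    "supplier": ("Acme Foods Ltd", 0.99),
    "total":    ("€1,240.00", 0.97),
    "due_date": ("2025-03-01", 0.62),  # low confidence, goes to a reviewer
}
print(route(invoice))
```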
Demand forecasting
Time-series forecasting for stock, production capacity, hospitality occupancy and cashflow. Built with the discipline of a real forecasting harness — backtested, MAPE-honest, not vibes.
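A minimal sketch of what a rolling-origin backtest scored with MAPE (mean absolute percentage error) looks like; the seasonal-naive forecaster and the toy demand series are placeholders for a real model and real history:

```python
# Walk-forward backtest: each forecast window sees only the data available
# at that point in time, then is scored against what actually happened.
def mape(actual: list[float], forecast: list[float]) -> float:
    return 100 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)

def seasonal_naive(history: list[float], horizon: int, season: int = 7) -> list[float]:
    """Forecast by repeating last week's values (a baseline any model must beat)."""
    return [history[-season + (h % season)] for h in range(horizon)]

def backtest(series: list[float], horizon: int = 7, min_train: int = 28) -> float:
    errors = []
    for cut in range(min_train, len(series) - horizon, horizon):
        train, test = series[:cut], series[cut:cut + horizon]
        errors.append(mape(test, seasonal_naive(train, horizon)))
    return sum(errors) / len(errors)

daily_demand = [100 + (day % 7) * 12 for day in range(90)]  # toy weekly pattern
print(f"backtest MAPE: {backtest(daily_demand):.1f}%")
```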
Fraud and anomaly detection
Transaction anomaly detection for ecommerce, login-anomaly detection for B2B portals, supplier-onboarding red-flag screening. Tuned for low false-positive rates so ops teams actually trust the alerts.
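One way the false-positive tuning can look, sketched with toy scores and labels: pick the most lenient alert threshold that still meets a precision target, so nearly every alert is a real hit:

```python
# Threshold selection for a precision target: scan from strict to lenient
# and keep the last threshold whose alerts stay above target precision.
def pick_threshold(scores: list[float], labels: list[int],
                   target_precision: float = 0.95) -> float | None:
    best = None
    for t in sorted(set(scores), reverse=True):
        flagged = [(s, y) for s, y in zip(scores, labels) if s >= t]
        precision = sum(y for _, y in flagged) / len(flagged)
        if precision >= target_precision:
            best = t          # still precise enough; try a lower threshold
        else:
            break             # precision dropped below target, stop here
    return best

scores = [0.99, 0.97, 0.95, 0.80, 0.60, 0.40]  # model anomaly scores
labels = [1,    1,    1,    0,    1,    0]     # 1 = confirmed fraud
print(pick_threshold(scores, labels))          # -> 0.95
```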
LLM integration into existing software
Drafting, summarisation, classification and routing inside the tools your team already uses. With prompt versioning, cost dashboards, model-fallback chains and refusal handling.
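A sketch of the fallback-chain shape: the lambdas below stand in for real provider SDK calls, an empty output stands in for a refusal, and the prompt-version tag is an assumed naming scheme:

```python
# Try each model in order; treat empty output as a soft refusal and
# provider exceptions as failures, falling through to the next model.
from collections.abc import Callable

PROMPT_VERSION = "summarise-v3"  # assumed versioning tag, logged per call

def with_fallback(chain: list[Callable[[str], str]], prompt: str) -> str:
    last_error = None
    for call_model in chain:
        try:
            out = call_model(prompt)
            if out.strip():
                return out
        except Exception as err:   # timeout, rate limit, provider outage
            last_error = err
    raise RuntimeError(f"all models failed for {PROMPT_VERSION}") from last_error

cheap  = lambda p: ""                         # refuses (empty output)
backup = lambda p: f"summary of: {p[:24]}..."
print(with_fallback([cheap, backup], "Q3 supplier review notes"))
```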
We've seen too many AI proof-of-concepts that demo well, ship slow, and silently degrade in production. Our delivery treats AI features like any other software feature — but with extra rigour where the failure mode is "confidently wrong".
Evaluation harnesses
Before a single LLM call ships, we build the eval set — real prompts, real expected outputs, real scoring. CI gates the deploy on regression.
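In miniature, the harness is an ordinary test suite that CI runs before release. run_assistant and both cases below are stubs; real eval sets hold real prompts with scored expectations:

```python
# Eval set as a CI gate: each case pairs a real prompt with an expected
# behaviour, and the deploy is blocked if the pass rate regresses.
EVAL_SET = [
    {"prompt": "What are ACME's payment terms?", "must_contain": "30 days"},
    {"prompt": "What is our CEO's salary?",      "must_contain": "can't share"},
]

def run_assistant(prompt: str) -> str:
    """Stub for the deploy candidate that CI would exercise."""
    if "salary" in prompt:
        return "Sorry, I can't share that."
    return "Payment terms are 30 days from invoice."

def score(eval_set: list[dict]) -> float:
    passed = sum(case["must_contain"] in run_assistant(case["prompt"])
                 for case in eval_set)
    return passed / len(eval_set)

def test_no_regression():
    assert score(EVAL_SET) >= 0.95  # below baseline -> CI fails, no deploy
```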
Observability
Every inference call logged with prompt, output, cost, latency and version. Cost dashboards, model-fallback chains, refusal-rate alerts — visible from day one.
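A sketch of the wrapper shape: one record per call, with a print standing in for the metrics pipeline and toy per-character rates standing in for real token pricing:

```python
# Log every inference call with prompt, output, cost, latency and version;
# the "refused" flag feeds refusal-rate alerting.
import json
import time

def estimate_cost(model: str, prompt: str, output: str) -> float:
    rate = {"small": 0.2, "large": 3.0}[model]  # toy € per 1M characters
    return round(rate * (len(prompt) + len(output)) / 1_000_000, 6)

def logged_call(model: str, version: str, prompt: str, call) -> str:
    start = time.monotonic()
    output = call(prompt)
    record = {
        "ts": time.time(),
        "model": model,
        "prompt_version": version,
        "prompt": prompt,
        "output": output,
        "latency_ms": round((time.monotonic() - start) * 1000, 1),
        "cost_eur": estimate_cost(model, prompt, output),
        "refused": output.strip() == "",
    }
    print(json.dumps(record))  # stand-in for the metrics pipeline
    return output

logged_call("small", "summarise-v3", "Summarise the Q3 supplier review.",
            lambda p: "Three suppliers reviewed; one flagged for late COAs.")
```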
Cost control
Budget caps per tenant. Cached embeddings. Cheap-model-first routing. We've seen six-figure cloud bills from sloppy AI features; you won't have one.
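A sketch of the guard-then-escalate pattern behind budget caps and cheap-model-first routing; the budgets, stub models and good_enough check are illustrative assumptions:

```python
# Per-tenant budget cap plus cheap-first routing: block calls once the cap
# is hit, try the cheap model first, escalate only when the draft is weak.
BUDGETS = {"tenant-a": 50.0}           # toy monthly caps in €
spend: dict[str, float] = {}

def call_cheap(prompt: str) -> tuple[str, float]:
    return f"draft: {prompt[:24]}", 0.001      # stub model, toy cost

def call_expensive(prompt: str) -> tuple[str, float]:
    return f"polished: {prompt[:24]}", 0.020   # stub model, toy cost

def good_enough(text: str) -> bool:
    return len(text) > 10                      # stand-in for a quality check

def route(tenant: str, prompt: str) -> str:
    if spend.get(tenant, 0.0) >= BUDGETS[tenant]:
        raise RuntimeError(f"{tenant} over budget: call blocked, ops alerted")
    draft, cost = call_cheap(prompt)
    if not good_enough(draft):
        draft, extra = call_expensive(prompt)
        cost += extra
    spend[tenant] = spend.get(tenant, 0.0) + cost
    return draft

print(route("tenant-a", "Summarise this supplier contract"))
```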
EU data residency
EU regions by default. Vendor selection biased toward EU-based or self-host-friendly options. PII redaction, retention policies, model-vendor data-sharing review.
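An illustration of PII redaction at the boundary, before a prompt leaves the EU. The two patterns below are deliberately simplified; production redaction uses vetted entity recognition, not a pair of regexes:

```python
# Replace emails and phone numbers with placeholder tokens before the
# prompt is sent to any vendor outside the residency boundary.
import re

PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}

def redact(text: str) -> str:
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

prompt = "Customer Sinead (sinead@example.ie, +353 86 123 4567) asked about terms."
print(redact(prompt))  # only the redacted text crosses the boundary
```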
Who this is for
Established operators with a real workflow, real data, and a specific bottleneck where AI looks like it might earn its keep — invoice processing, retailer-COA generation, defect detection on a production line, internal Q&A across years of SOPs, fraud screening on B2B sign-ups.
We're not the right fit if you want to chase the model-of-the-month, build a thin GPT wrapper to raise funding, or treat AI as the marketing story. Our credibility runs through delivery, not demos.
Which models do you use?
We're model-agnostic by design. We use GPT-5 / Claude / Gemini for general-purpose LLM workloads, open-weight models (Llama, Mistral) for cost or data-residency reasons, and bespoke fine-tuned models where the use case justifies it. For vision we use Florence, YOLO, and custom-trained models. Selection follows the use case, the data-residency constraint and the unit economics — not the press cycle.
How do you handle data residency?
EU residency is the default. We bias toward EU-hosted inference (Azure EU regions, AWS Frankfurt / Dublin) and EU-friendly vendors, with PII redaction at the boundary where US-hosted vendors are unavoidable. For sensitive workloads we self-host open-weight models on Hetzner / private GPU. Data-sharing terms with model vendors are reviewed before any client data touches them.
How do you keep LLM outputs reliable?
Three layers. First, an evaluation harness with real prompts and expected outputs that gates every deploy. Second, confidence-scored outputs with human-in-the-loop on low-confidence cases. Third, observability — every call logged with citations, sources and refusal-rate metrics so degradation shows up before users complain.
What does an engagement cost?
Discovery is one paid week, typically €4k–€8k, that lands a concrete scope, success criteria and a fixed-price phase-1 quote. Phase-1 pilots usually run €20k–€60k for a working feature shipped to production. Larger builds (vision pipelines, multi-tenant LLM systems) range from €60k–€300k depending on data volume, evaluation rigour and compliance scope. Inference costs are passed through transparently or estimated up-front.
Can you add AI to the software we already use?
Yes — that's the most common shape. We add LLM features inside the apps your team already uses (your CRM, your ERP, your internal tooling, your Shopify admin, your support inbox) rather than building a parallel AI product nobody opens.
Do you use off-the-shelf models or train custom ones?
Both — depending on the use case. For most LLM workloads, modern API models plus good RAG plus careful prompts outperform fine-tuning. For vision, classification and forecasting we routinely train custom models on client data. We'll never recommend fine-tuning when it isn't earning its keep.
Tell us the workflow in a 20-minute discovery call. We'll be honest about whether AI is the right tool for it.
Book a discovery call