SQL → NHS Quickstart
SQL is the lingua franca for NHS data: warehouses, reporting marts, and analytics tools all speak it. Writing portable SQL means the same logic can run on SQL Server in your Trust, PostgreSQL in research, or DuckDB on your laptop.
Great for: BI Analysts · Data Scientists · Developers · Data Engineers.
10-minute connect (pick your engine)
- SQL Server (T-SQL)
- PostgreSQL
- DuckDB (file)
Client: SSMS or VS Code (SQLTools or the mssql extension).
Connection: SERVER=HOSTNAME; Trusted_Connection=Yes; Encrypt=Yes;
Named instance: SERVER=HOST\INSTANCE
SELECT SUSER_SNAME() AS whoami, DB_NAME() AS dbname, SYSUTCDATETIME() AS now_utc;
Client: psql, pgAdmin, or VS Code (SQLTools).
URI: postgresql://user:password@host:5432/dbname
SELECT current_user AS whoami, current_database() AS dbname, now() AT TIME ZONE 'UTC' AS now_utc;
Client: duckdb CLI or Python package.
No server; database is a single file (nhs.duckdb).
SELECT 'ok' AS status, now() AS now_utc;
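Whichever engine you pick, it is worth scripting the sanity check so it can run from CI or a scheduled job rather than by hand. A minimal sketch using Python's stdlib sqlite3 as a stand-in connection (swap the `connect` call for pyodbc, psycopg, or duckdb in practice):

```python
import sqlite3

def sanity_check(conn):
    """Run a trivial round-trip query to confirm the connection works."""
    return conn.execute("SELECT 'ok' AS status").fetchone()[0]

# Stand-in engine for illustration; use your driver's connect() here.
conn = sqlite3.connect(":memory:")
print(sanity_check(conn))  # -> ok
```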
"Hello NHS" queries (portable patterns)
Top practices by appointments (past 30d)
- ANSI-ish SQL (portable)
- T-SQL (SQL Server)
- PostgreSQL
- DuckDB
SELECT practice_id, COUNT(*) AS total_appointments
FROM appointments
WHERE start_time >= (CURRENT_DATE - INTERVAL '30 day')
GROUP BY practice_id
ORDER BY total_appointments DESC
FETCH FIRST 20 ROWS ONLY;
SELECT TOP (20) practice_id, COUNT(*) AS total_appointments
FROM dbo.appointments
WHERE start_time >= DATEADD(DAY, -30, SYSUTCDATETIME())
GROUP BY practice_id
ORDER BY total_appointments DESC;
SELECT practice_id, COUNT(*) AS total_appointments
FROM public.appointments
WHERE start_time >= (now() AT TIME ZONE 'UTC') - INTERVAL '30 days'
GROUP BY practice_id
ORDER BY total_appointments DESC
LIMIT 20;
SELECT practice_id, COUNT(*) AS total_appointments
FROM read_parquet('data/appointments/*.parquet')
WHERE start_time >= now() - INTERVAL 30 DAY
GROUP BY practice_id
ORDER BY total_appointments DESC
LIMIT 20;
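The shape of this query (filter, group, order, limit) can be exercised end-to-end on synthetic data before pointing it at a warehouse. A sketch using Python's stdlib sqlite3, with hypothetical practice IDs; SQLite accepts `LIMIT` like PostgreSQL and DuckDB:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE appointments (practice_id TEXT, start_time TEXT)")
# Synthetic rows, all dated today so they fall inside the 30-day window.
conn.executemany(
    "INSERT INTO appointments VALUES (?, date('now'))",
    [("P1",)] * 3 + [("P2",)] * 5 + [("P3",)] * 1,
)

rows = conn.execute("""
    SELECT practice_id, COUNT(*) AS total_appointments
    FROM appointments
    WHERE start_time >= date('now', '-30 day')
    GROUP BY practice_id
    ORDER BY total_appointments DESC
    LIMIT 20
""").fetchall()
print(rows)  # [('P2', 5), ('P1', 3), ('P3', 1)]
```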
Reusable view (attendance rate)
- ANSI-ish (concept)
- T-SQL
- PostgreSQL
- DuckDB
CREATE VIEW vw_attendance AS
SELECT practice_id,
AVG(CASE WHEN attended THEN 1.0 ELSE 0.0 END) AS attendance_rate
FROM appointments
GROUP BY practice_id;
CREATE OR ALTER VIEW dbo.vw_Attendance AS
SELECT practice_id,
AVG(CASE WHEN attended = 1 THEN 1.0 ELSE 0.0 END) AS attendance_rate
FROM dbo.appointments
GROUP BY practice_id;
CREATE OR REPLACE VIEW public.vw_attendance AS
SELECT practice_id,
AVG(CASE WHEN attended THEN 1.0 ELSE 0.0 END) AS attendance_rate
FROM public.appointments
GROUP BY practice_id;
CREATE OR REPLACE VIEW vw_attendance AS
SELECT practice_id,
AVG(CASE WHEN attended THEN 1.0 ELSE 0.0 END) AS attendance_rate
FROM read_parquet('data/appointments/*.parquet')
GROUP BY practice_id;
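The attendance-rate view can be checked the same way. A sketch with stdlib sqlite3; since SQLite has no BOOLEAN type, `attended` is stored as 0/1, matching the T-SQL variant above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE appointments (practice_id TEXT, attended INTEGER)")
conn.executemany(
    "INSERT INTO appointments VALUES (?, ?)",
    [("P1", 1), ("P1", 1), ("P1", 0), ("P2", 1)],
)
conn.execute("""
    CREATE VIEW vw_attendance AS
    SELECT practice_id,
           AVG(CASE WHEN attended = 1 THEN 1.0 ELSE 0.0 END) AS attendance_rate
    FROM appointments
    GROUP BY practice_id
""")
rates = dict(conn.execute("SELECT * FROM vw_attendance").fetchall())
print(rates)  # P1 attended 2 of 3 (~0.667), P2 attended 1 of 1 (1.0)
```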
Parameterised access (safe from apps)
Avoid string concatenation. Use bound parameters from your app layer.
from sqlalchemy import create_engine, text
import pandas as pd
engine = create_engine("mssql+pyodbc:///?odbc_connect=...") # or postgresql+psycopg / duckdb://
q = text("""
SELECT practice_id, total_appointments
FROM vw_PracticeKPI_Monthly
WHERE month = :month
ORDER BY total_appointments DESC
""")
df = pd.read_sql(q, engine, params={"month":"2025-07"})
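The same bound-parameter discipline applies with plain DB-API drivers: the SQL stays static and the driver handles escaping, so user input never changes the query's structure. A sketch with stdlib sqlite3 and a hypothetical `kpi` table (sqlite3 and pyodbc use `?` placeholders; psycopg uses `%s`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kpi (month TEXT, practice_id TEXT, total INTEGER)")
conn.execute("INSERT INTO kpi VALUES ('2025-07', 'P1', 42)")

month = "2025-07"  # imagine this arrived from user input
rows = conn.execute(
    "SELECT practice_id, total FROM kpi WHERE month = ? ORDER BY total DESC",
    (month,),  # bound parameter: never concatenated into the SQL string
).fetchall()
print(rows)  # [('P1', 42)]
```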
Indexing 101 (make dashboards fast)
Choose indexes for WHERE, JOIN, and ORDER BY columns used by your dashboards.
- SQL Server
- PostgreSQL
- DuckDB
-- Filter by start_time; group by practice_id
CREATE INDEX IX_appts_start_practice
ON dbo.appointments(start_time, practice_id);
-- B-tree is default; add composite index in filter order
CREATE INDEX IF NOT EXISTS appts_start_practice_idx
ON public.appointments (start_time, practice_id);
DuckDB scans Parquet/Arrow efficiently; use partitioning (e.g., by month) and maintain statistics. Pre-aggregate to Parquet for best performance.
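One way to confirm an index is actually used is to inspect the query plan before shipping a dashboard query. A sketch using stdlib sqlite3's `EXPLAIN QUERY PLAN` as an illustration (SQL Server exposes actual execution plans in SSMS; PostgreSQL has `EXPLAIN (ANALYZE)`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE appointments (practice_id TEXT, start_time TEXT)")
# Composite index in filter order, as in the examples above.
conn.execute(
    "CREATE INDEX appts_start_practice_idx "
    "ON appointments (start_time, practice_id)"
)
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT COUNT(*) FROM appointments
    WHERE start_time >= '2025-01-01'
""").fetchall()
print(plan)  # the plan detail should name appts_start_practice_idx
```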
Materialisation for speed
- SQL Server
- PostgreSQL
- DuckDB
- Use a scheduled job to populate a summary table (preferred).
- Indexed views can help but have trade-offs; test carefully.
CREATE MATERIALIZED VIEW mv_kpi_monthly AS
SELECT date_trunc('month', start_time)::date AS month,
practice_id,
COUNT(*) AS total
FROM appointments
GROUP BY 1,2;
-- refresh nightly; CONCURRENTLY requires a unique index on the matview
CREATE UNIQUE INDEX mv_kpi_monthly_key ON mv_kpi_monthly (month, practice_id);
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_kpi_monthly;
Write pre-aggregated Parquet and read it directly from dashboards (see DuckDB page).
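The scheduled-job pattern is the same on every engine: rebuild a small summary table from the detail table, then point dashboards at the summary. A sketch with stdlib sqlite3 (the `CREATE TABLE ... AS SELECT` works, with minor syntax changes, on SQL Server and PostgreSQL; `substr` stands in for `date_trunc`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE appointments (practice_id TEXT, start_time TEXT)")
conn.executemany(
    "INSERT INTO appointments VALUES (?, ?)",
    [("P1", "2025-07-03"), ("P1", "2025-07-20"), ("P2", "2025-08-01")],
)

def refresh_kpi_monthly(conn):
    """Nightly job: rebuild the monthly summary from the detail table."""
    conn.execute("DROP TABLE IF EXISTS kpi_monthly")
    conn.execute("""
        CREATE TABLE kpi_monthly AS
        SELECT substr(start_time, 1, 7) AS month,
               practice_id,
               COUNT(*) AS total
        FROM appointments
        GROUP BY substr(start_time, 1, 7), practice_id
    """)

refresh_kpi_monthly(conn)
rows = conn.execute(
    "SELECT * FROM kpi_monthly ORDER BY month, practice_id"
).fetchall()
print(rows)  # [('2025-07', 'P1', 2), ('2025-08', 'P2', 1)]
```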
Row-level security (RLS) & access control
Restrict users to the rows they are allowed to see.
- SQL Server (predicate)
- PostgreSQL (POLICY)
- DuckDB
- Create a predicate function that returns 1/0 for allowed rows.
- Attach a security policy to the table.
-- Example sketch; adapt to your Trust context
CREATE SCHEMA sec;
GO
CREATE FUNCTION sec.fn_limit_practice(@practice_id nvarchar(20))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN SELECT 1 AS allow
WHERE @practice_id IN (SELECT practice_id FROM dbo.user_practices WHERE user_name = USER_NAME());
GO
CREATE SECURITY POLICY sec.policy_practice
ADD FILTER PREDICATE sec.fn_limit_practice(practice_id) ON dbo.appointments
WITH (STATE = ON);
ALTER TABLE public.appointments ENABLE ROW LEVEL SECURITY;
CREATE POLICY practice_rls ON public.appointments
USING (practice_id = current_setting('app.practice_id', true));
-- In app connection, SET app.practice_id to the user's scope.
No built-in RLS; control file permissions and expose only pre-aggregated Parquet to analysts/dashboards.
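Where the engine has no RLS, the equivalent control lives in the app layer: resolve the user's allowed scope first, then bind it into every query. A hedged sketch with stdlib sqlite3, reusing a `user_practices` mapping table like the one in the T-SQL example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE appointments (practice_id TEXT, start_time TEXT)")
conn.execute("CREATE TABLE user_practices (user_name TEXT, practice_id TEXT)")
conn.executemany(
    "INSERT INTO appointments VALUES (?, ?)",
    [("P1", "2025-07-01"), ("P2", "2025-07-02")],
)
conn.execute("INSERT INTO user_practices VALUES ('alice', 'P1')")

def appointments_for(conn, user_name):
    """App-layer RLS: every query is filtered by the user's allowed practices."""
    return conn.execute("""
        SELECT a.practice_id, a.start_time
        FROM appointments AS a
        WHERE a.practice_id IN (
            SELECT practice_id FROM user_practices WHERE user_name = ?
        )
    """, (user_name,)).fetchall()

print(appointments_for(conn, "alice"))  # [('P1', '2025-07-01')]
```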
Temporal/audit patterns
- SQL Server: system-versioned temporal tables for history; or CDC if enabled by your DBA.
- PostgreSQL: use audit triggers (e.g., created_at, valid_from/valid_to) or extensions (e.g., pgaudit); materialise snapshots.
- DuckDB: append-only datasets with version columns; keep dated Parquet partitions.
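The version-column pattern can be kept engine-agnostic: never update a row in place; instead close the current version's valid_to and append a new one. A sketch with stdlib sqlite3 and a hypothetical practice dimension table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE practice_dim (
        practice_id TEXT, name TEXT,
        valid_from TEXT, valid_to TEXT  -- NULL valid_to marks the current row
    )
""")
conn.execute("INSERT INTO practice_dim VALUES ('P1', 'Old Name', '2024-01-01', NULL)")

def update_versioned(conn, practice_id, new_name, as_of):
    """Close the current version and append a new one (no in-place update)."""
    conn.execute(
        "UPDATE practice_dim SET valid_to = ? "
        "WHERE practice_id = ? AND valid_to IS NULL",
        (as_of, practice_id),
    )
    conn.execute(
        "INSERT INTO practice_dim VALUES (?, ?, ?, NULL)",
        (practice_id, new_name, as_of),
    )

update_versioned(conn, "P1", "New Name", "2025-07-01")
current, = conn.execute(
    "SELECT name FROM practice_dim WHERE practice_id = 'P1' AND valid_to IS NULL"
).fetchone()
print(current)  # New Name (history row for 'Old Name' is preserved)
```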
Safe extracts for analysis
Prefer columnar Parquet for extracts, regenerated on a schedule.
import pandas as pd
# df = pd.read_sql(..., engine)
df.to_parquet("out/kpi_monthly.parquet", index=False)
Point Dash/Evidence.dev at this file for fast, reproducible loads.
IG & safety checklist
- Follow least-privilege: read-only roles for analysts; separate service accounts for apps.
- Use parameterised queries; avoid ad-hoc concatenation from user input.
- Apply small-number suppression and avoid free-text PHI in exports.
- Log who ran what (e.g., via app layer) and keep a KPI definitions register.
- Store connection strings in a secret store; never in code or Git.
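Small-number suppression from the checklist can be applied mechanically before any export. A sketch in plain Python; the threshold of 5 below is a common convention, but follow your Trust's disclosure-control policy:

```python
def suppress_small_numbers(rows, count_key="total", threshold=5):
    """Replace counts below the threshold with None before export."""
    return [
        {**row, count_key: row[count_key] if row[count_key] >= threshold else None}
        for row in rows
    ]

kpis = [{"practice_id": "P1", "total": 42}, {"practice_id": "P2", "total": 3}]
print(suppress_small_numbers(kpis))
# [{'practice_id': 'P1', 'total': 42}, {'practice_id': 'P2', 'total': None}]
```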
Measuring impact
- Latency: query → dashboard render time; nightly refresh duration.
- Stability: failed refreshes; long-running query count.
- Quality: validation pass rate; divergence between view vs. source.
- Reusability: number of teams using the same views/materialised outputs.
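Query latency can be captured with a thin timing wrapper at the app layer and logged wherever your run-logging lives. A sketch using time.perf_counter with stdlib sqlite3 as a stand-in connection:

```python
import sqlite3
import time

def timed_query(conn, sql, params=()):
    """Run a query and return (rows, elapsed_seconds) for latency logging."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    return rows, time.perf_counter() - start

conn = sqlite3.connect(":memory:")  # stand-in; use your engine's driver
rows, elapsed = timed_query(conn, "SELECT 1")
print(rows, f"{elapsed:.6f}s")
```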
See also
What's next?
You've completed the Learn → SQL stage. Keep momentum: