SQL → NHS Quickstart
SQL is the lingua franca for NHS data: warehouses, reporting marts, and analytics tools all speak it. Writing portable SQL means the same logic can run on SQL Server in your Trust, PostgreSQL in research, or DuckDB on your laptop.
Great for: BI Analysts · Data Scientists · Developers · Data Engineers.
10-minute connect (pick your engine)
- SQL Server (T-SQL)
- PostgreSQL
- DuckDB (file)
Client: SSMS or VS Code (SQLTools or the mssql extension).
Connection: SERVER=HOSTNAME; Trusted_Connection=Yes; Encrypt=Yes;
Named instance: SERVER=HOST\INSTANCE
SELECT SUSER_SNAME() AS whoami, DB_NAME() AS dbname, SYSUTCDATETIME() AS now_utc;
Client: psql, pgAdmin, or VS Code (SQLTools).
URI: postgresql://user:password@host:5432/dbname
SELECT current_user AS whoami, current_database() AS dbname, now() AT TIME ZONE 'UTC' AS now_utc;
Client: duckdb CLI or Python package.
No server; database is a single file (nhs.duckdb).
SELECT 'ok' AS status, now() AS now_utc;
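Whichever engine you pick, it is worth scripting the sanity check so it can run from CI or a scheduled job rather than by hand. A minimal sketch using Python's stdlib sqlite3 as a stand-in connection (swap the `connect` call for pyodbc, psycopg, or duckdb in practice):

```python
import sqlite3

def sanity_check(conn):
    """Run a trivial round-trip query to confirm the connection works."""
    return conn.execute("SELECT 'ok' AS status").fetchone()[0]

# Stand-in engine for illustration; use your driver's connect() here.
conn = sqlite3.connect(":memory:")
print(sanity_check(conn))  # -> ok
```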
"Hello NHS" queries (portable patterns)
Top practices by appointments (past 30d)
- ANSI-ish SQL (portable)
- T-SQL (SQL Server)
- PostgreSQL
- DuckDB
SELECT practice_id, COUNT(*) AS total_appointments
FROM appointments
WHERE start_time >= (CURRENT_DATE - INTERVAL '30 day')
GROUP BY practice_id
ORDER BY total_appointments DESC
FETCH FIRST 20 ROWS ONLY;
SELECT TOP (20) practice_id, COUNT(*) AS total_appointments
FROM dbo.appointments
WHERE start_time >= DATEADD(DAY, -30, SYSUTCDATETIME())
GROUP BY practice_id
ORDER BY total_appointments DESC;
SELECT practice_id, COUNT(*) AS total_appointments
FROM public.appointments
WHERE start_time >= (now() AT TIME ZONE 'UTC') - INTERVAL '30 days'
GROUP BY practice_id
ORDER BY total_appointments DESC
LIMIT 20;
SELECT practice_id, COUNT(*) AS total_appointments
FROM read_parquet('data/appointments/*.parquet')
WHERE start_time >= now() - INTERVAL 30 DAY
GROUP BY practice_id
ORDER BY total_appointments DESC
LIMIT 20;
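The shape of this query (filter, group, order, limit) can be exercised end-to-end on synthetic data before pointing it at a warehouse. A sketch using Python's stdlib sqlite3, with hypothetical practice IDs; SQLite accepts `LIMIT` like PostgreSQL and DuckDB:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE appointments (practice_id TEXT, start_time TEXT)")
# Synthetic rows, all dated today so they fall inside the 30-day window.
conn.executemany(
    "INSERT INTO appointments VALUES (?, date('now'))",
    [("P1",)] * 3 + [("P2",)] * 5 + [("P3",)] * 1,
)

rows = conn.execute("""
    SELECT practice_id, COUNT(*) AS total_appointments
    FROM appointments
    WHERE start_time >= date('now', '-30 day')
    GROUP BY practice_id
    ORDER BY total_appointments DESC
    LIMIT 20
""").fetchall()
print(rows)  # [('P2', 5), ('P1', 3), ('P3', 1)]
```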
Reusable view (attendance rate)
- ANSI-ish (concept)
- T-SQL
- PostgreSQL
- DuckDB
CREATE VIEW vw_attendance AS
SELECT practice_id,
AVG(CASE WHEN attended THEN 1.0 ELSE 0.0 END) AS attendance_rate
FROM appointments
GROUP BY practice_id;
CREATE OR ALTER VIEW dbo.vw_Attendance AS
SELECT practice_id,
AVG(CASE WHEN attended = 1 THEN 1.0 ELSE 0.0 END) AS attendance_rate
FROM dbo.appointments
GROUP BY practice_id;
CREATE OR REPLACE VIEW public.vw_attendance AS
SELECT practice_id,
AVG(CASE WHEN attended THEN 1.0 ELSE 0.0 END) AS attendance_rate
FROM public.appointments
GROUP BY practice_id;
CREATE OR REPLACE VIEW vw_attendance AS
SELECT practice_id,
AVG(CASE WHEN attended THEN 1.0 ELSE 0.0 END) AS attendance_rate
FROM read_parquet('data/appointments/*.parquet')
GROUP BY practice_id;
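The attendance-rate view can be checked the same way. A sketch with stdlib sqlite3; since SQLite has no BOOLEAN type, `attended` is stored as 0/1, matching the T-SQL variant above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE appointments (practice_id TEXT, attended INTEGER)")
conn.executemany(
    "INSERT INTO appointments VALUES (?, ?)",
    [("P1", 1), ("P1", 1), ("P1", 0), ("P2", 1)],
)
conn.execute("""
    CREATE VIEW vw_attendance AS
    SELECT practice_id,
           AVG(CASE WHEN attended = 1 THEN 1.0 ELSE 0.0 END) AS attendance_rate
    FROM appointments
    GROUP BY practice_id
""")
rates = dict(conn.execute("SELECT * FROM vw_attendance").fetchall())
print(rates)  # P1 attended 2 of 3 (~0.667), P2 attended 1 of 1 (1.0)
```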
Parameterised access (safe from apps)
Avoid string concatenation. Use bound parameters from your app layer.
from sqlalchemy import create_engine, text
import pandas as pd
engine = create_engine("mssql+pyodbc:///?odbc_connect=...") # or postgresql+psycopg / duckdb://
q = text("""
SELECT practice_id, total_appointments
FROM vw_PracticeKPI_Monthly
WHERE month = :month
ORDER BY total_appointments DESC
""")
df = pd.read_sql(q, engine, params={"month":"2025-07"})
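The same bound-parameter discipline applies with plain DB-API drivers: the SQL stays static and the driver handles escaping, so user input never changes the query's structure. A sketch with stdlib sqlite3 and a hypothetical `kpi` table (sqlite3 and pyodbc use `?` placeholders; psycopg uses `%s`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kpi (month TEXT, practice_id TEXT, total INTEGER)")
conn.execute("INSERT INTO kpi VALUES ('2025-07', 'P1', 42)")

month = "2025-07"  # imagine this arrived from user input
rows = conn.execute(
    "SELECT practice_id, total FROM kpi WHERE month = ? ORDER BY total DESC",
    (month,),  # bound parameter: never concatenated into the SQL string
).fetchall()
print(rows)  # [('P1', 42)]
```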
Indexing 101 (make dashboards fast)
Choose indexes for WHERE, JOIN, and ORDER BY columns used by your dashboards.
- SQL Server
- PostgreSQL
- DuckDB
-- Filter by start_time; group by practice_id
CREATE INDEX IX_appts_start_practice
ON dbo.appointments(start_time, practice_id);
-- B-tree is default; add composite index in filter order
CREATE INDEX IF NOT EXISTS appts_start_practice_idx
ON public.appointments (start_time, practice_id);
DuckDB scans Parquet/Arrow efficiently; use partitioning (e.g., by month) and maintain statistics. Pre-aggregate to Parquet for best performance.
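One way to confirm an index is actually used is to inspect the query plan before shipping a dashboard query. A sketch using stdlib sqlite3's `EXPLAIN QUERY PLAN` as an illustration (SQL Server exposes actual execution plans in SSMS; PostgreSQL has `EXPLAIN (ANALYZE)`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE appointments (practice_id TEXT, start_time TEXT)")
# Composite index in filter order, as in the examples above.
conn.execute(
    "CREATE INDEX appts_start_practice_idx "
    "ON appointments (start_time, practice_id)"
)
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT COUNT(*) FROM appointments
    WHERE start_time >= '2025-01-01'
""").fetchall()
print(plan)  # the plan detail should name appts_start_practice_idx
```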
Materialisation for speed
- SQL Server
- PostgreSQL
- DuckDB
- Use a scheduled job to populate a summary table (preferred).
- Indexed views can help but have trade-offs; test carefully.
CREATE MATERIALIZED VIEW mv_kpi_monthly AS
SELECT date_trunc('month', start_time)::date AS month,
practice_id,
COUNT(*) AS total
FROM appointments
GROUP BY 1,2;
-- refresh nightly; CONCURRENTLY requires a unique index on the matview
CREATE UNIQUE INDEX mv_kpi_monthly_key ON mv_kpi_monthly (month, practice_id);
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_kpi_monthly;
Write pre-aggregated Parquet and read it directly from dashboards (see DuckDB page).
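The scheduled-job pattern is the same on every engine: rebuild a small summary table from the detail table, then point dashboards at the summary. A sketch with stdlib sqlite3 (the `CREATE TABLE ... AS SELECT` works, with minor syntax changes, on SQL Server and PostgreSQL; `substr` stands in for `date_trunc`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE appointments (practice_id TEXT, start_time TEXT)")
conn.executemany(
    "INSERT INTO appointments VALUES (?, ?)",
    [("P1", "2025-07-03"), ("P1", "2025-07-20"), ("P2", "2025-08-01")],
)

def refresh_kpi_monthly(conn):
    """Nightly job: rebuild the monthly summary from the detail table."""
    conn.execute("DROP TABLE IF EXISTS kpi_monthly")
    conn.execute("""
        CREATE TABLE kpi_monthly AS
        SELECT substr(start_time, 1, 7) AS month,
               practice_id,
               COUNT(*) AS total
        FROM appointments
        GROUP BY substr(start_time, 1, 7), practice_id
    """)

refresh_kpi_monthly(conn)
rows = conn.execute(
    "SELECT * FROM kpi_monthly ORDER BY month, practice_id"
).fetchall()
print(rows)  # [('2025-07', 'P1', 2), ('2025-08', 'P2', 1)]
```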
Row-level security (RLS) & access control
Restrict users to the rows they are allowed to see.
- SQL Server (predicate)
- PostgreSQL (POLICY)
- DuckDB
- Create a predicate function that returns 1/0 for allowed rows.
- Attach a security policy to the table.
-- Example sketch; adapt to your Trust context
CREATE SCHEMA sec;
GO
CREATE FUNCTION sec.fn_limit_practice(@practice_id nvarchar(20))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN SELECT 1 AS allow
WHERE @practice_id IN (SELECT practice_id FROM dbo.user_practices WHERE user_name = USER_NAME());
GO
CREATE SECURITY POLICY sec.policy_practice
ADD FILTER PREDICATE sec.fn_limit_practice(practice_id) ON dbo.appointments
WITH (STATE = ON);
ALTER TABLE public.appointments ENABLE ROW LEVEL SECURITY;
CREATE POLICY practice_rls ON public.appointments
USING (practice_id = current_setting('app.practice_id', true));
-- In app connection, SET app.practice_id to the user's scope.
No built-in RLS; control file permissions and expose only pre-aggregated Parquet to analysts/dashboards.
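Where the engine has no RLS, the equivalent control lives in the app layer: resolve the user's allowed scope first, then bind it into every query. A hedged sketch with stdlib sqlite3, reusing a `user_practices` mapping table like the one in the T-SQL example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE appointments (practice_id TEXT, start_time TEXT)")
conn.execute("CREATE TABLE user_practices (user_name TEXT, practice_id TEXT)")
conn.executemany(
    "INSERT INTO appointments VALUES (?, ?)",
    [("P1", "2025-07-01"), ("P2", "2025-07-02")],
)
conn.execute("INSERT INTO user_practices VALUES ('alice', 'P1')")

def appointments_for(conn, user_name):
    """App-layer RLS: every query is filtered by the user's allowed practices."""
    return conn.execute("""
        SELECT a.practice_id, a.start_time
        FROM appointments AS a
        WHERE a.practice_id IN (
            SELECT practice_id FROM user_practices WHERE user_name = ?
        )
    """, (user_name,)).fetchall()

print(appointments_for(conn, "alice"))  # [('P1', '2025-07-01')]
```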
Temporal/audit patterns
- SQL Server: system-versioned temporal tables for history; or CDC if enabled by your DBA.
- PostgreSQL: use audit triggers (e.g., created_at, valid_from/valid_to) or extensions (e.g., pgaudit); materialise snapshots.
- DuckDB: append-only datasets with version columns; keep dated Parquet partitions.
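The version-column pattern can be kept engine-agnostic: never update a row in place; instead close the current version's valid_to and append a new one. A sketch with stdlib sqlite3 and a hypothetical practice dimension table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE practice_dim (
        practice_id TEXT, name TEXT,
        valid_from TEXT, valid_to TEXT  -- NULL valid_to marks the current row
    )
""")
conn.execute("INSERT INTO practice_dim VALUES ('P1', 'Old Name', '2024-01-01', NULL)")

def update_versioned(conn, practice_id, new_name, as_of):
    """Close the current version and append a new one (no in-place update)."""
    conn.execute(
        "UPDATE practice_dim SET valid_to = ? "
        "WHERE practice_id = ? AND valid_to IS NULL",
        (as_of, practice_id),
    )
    conn.execute(
        "INSERT INTO practice_dim VALUES (?, ?, ?, NULL)",
        (practice_id, new_name, as_of),
    )

update_versioned(conn, "P1", "New Name", "2025-07-01")
current, = conn.execute(
    "SELECT name FROM practice_dim WHERE practice_id = 'P1' AND valid_to IS NULL"
).fetchone()
print(current)  # New Name (history row for 'Old Name' is preserved)
```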
Safe extracts for analysis
Prefer columnar Parquet for extracts, regenerated on a schedule.
import pandas as pd
# df = pd.read_sql(..., engine)
df.to_parquet("out/kpi_monthly.parquet", index=False)
Point Dash/Evidence.dev at this file for fast, reproducible loads.
IG & safety checklist
- Follow least-privilege: read-only roles for analysts; separate service accounts for apps.
- Use parameterised queries; avoid ad-hoc concatenation from user input.
- Apply small-number suppression and avoid free-text PHI in exports.
- Log who ran what (e.g., via app layer) and keep a KPI definitions register.
- Store connection strings in a secret store; never in code or Git.
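Small-number suppression from the checklist can be applied mechanically before any export. A sketch in plain Python; the threshold of 5 below is a common convention, but follow your Trust's disclosure-control policy:

```python
def suppress_small_numbers(rows, count_key="total", threshold=5):
    """Replace counts below the threshold with None before export."""
    return [
        {**row, count_key: row[count_key] if row[count_key] >= threshold else None}
        for row in rows
    ]

kpis = [{"practice_id": "P1", "total": 42}, {"practice_id": "P2", "total": 3}]
print(suppress_small_numbers(kpis))
# [{'practice_id': 'P1', 'total': 42}, {'practice_id': 'P2', 'total': None}]
```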
Measuring impact
- Latency: query → dashboard render time; nightly refresh duration.
- Stability: failed refreshes; long-running query count.
- Quality: validation pass rate; divergence between view vs. source.
- Reusability: number of teams using the same views/materialised outputs.
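Query latency can be captured with a thin timing wrapper at the app layer and logged wherever your run-logging lives. A sketch using time.perf_counter with stdlib sqlite3 as a stand-in connection:

```python
import sqlite3
import time

def timed_query(conn, sql, params=()):
    """Run a query and return (rows, elapsed_seconds) for latency logging."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    return rows, time.perf_counter() - start

conn = sqlite3.connect(":memory:")  # stand-in; use your engine's driver
rows, elapsed = timed_query(conn, "SELECT 1")
print(rows, f"{elapsed:.6f}s")
```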
See also
What's next?
You've completed the Learn → SQL stage. Keep momentum: