Information Governance
The NHS shift to digital, preventative, community-oriented care only works when data is handled safely. This page gives IG leads a concrete, repeatable way to embed privacy, security, and auditability into open‑source projects from day one.
👤 Role snapshot
You protect patient data and ensure compliance with NHS, legal, and ethical frameworks for data use — while enabling safe, timely access for care and analytics.
🧰 Core NHS toolchain
- DSP Toolkit (DSPT) for organisational assurance
- DPIA/Caldicott templates & approvals workflow
- NHS Mail / M365 for secure sharing and controls
🔗 Open‑source augmentations
- Audit logging: Python JSON logs; SQL audit tables; optional ELK stack.
- Secure transfer: SFTP; S3/MinIO with server‑side encryption and bucket policies.
- Documentation: MkDocs/Docusaurus for living guidance and runbooks.
- Masking & suppression: SQL functions; hash/pseudonymise; small‑number suppression.
- Project guardrails: GitHub Issue templates and PR gates.
⚙️ 90‑minute quickstart
Goal: create an IG‑ready starter repo with a DPIA checklist, secrets setup, audit logging, masking helpers, and a secure transfer example.
1) Project guardrails (choose template style)
- GitHub Issue Form (recommended)
- Markdown Checklist (simple)
name: IG Checklist
description: Minimum IG checks for new or updated data projects
title: "IG: <project/feature>"
labels: ["IG"]
body:
  - type: checkboxes
    attributes:
      label: Data classification
      options:
        - label: No direct patient identifiers used in development data
        - label: DPIA reference recorded in README
        - label: Data dictionary updated (fields, types, source, owner)
  - type: checkboxes
    attributes:
      label: Security
      options:
        - label: Secrets are in env/secret store (not in code or Git)
        - label: Access is least-privilege (service accounts, RBAC)
        - label: Audit logs enabled for read/write operations
  - type: checkboxes
    attributes:
      label: Sharing & outputs
      options:
        - label: Small-number suppression applied where required
        - label: Aggregated outputs only; no free-text PHI in logs
        - label: Data retention and deletion plan documented
# Information Governance — Minimum Checklist
- [ ] DPIA reference: _______
- [ ] Data dictionary updated (name, type, definition, source, owner)
- [ ] Secrets in env/secret store (never in code/Git)
- [ ] Least-privilege access; service accounts only
- [ ] Audit logging enabled (read/write, dataset, actor, purpose)
- [ ] Small-number suppression in outputs
- [ ] Retention & deletion plan documented
2) Secrets & configuration
# SQL Server
SQLSERVER_SERVER=YOURSERVER
SQLSERVER_DATABASE=NHS_Analytics
# S3/MinIO (optional)
S3_ENDPOINT=https://s3.example.local
S3_BUCKET=nhs-secure
S3_ACCESS_KEY=
S3_SECRET_KEY=
# App/API
API_KEY=rotate_me
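One way to consume these settings is to read them from the environment at start-up and fail fast when a required value is missing, so secrets never need to appear in code. The sketch below is illustrative only: `load_settings`, `redacted`, `REQUIRED_KEYS`, and `OPTIONAL_KEYS` are hypothetical names, and it assumes the variables above have already been exported into the process environment (for example by your shell or a secret store).

```python
import os
from collections.abc import Mapping

# Key names taken from the example settings above; adjust to your project.
REQUIRED_KEYS = ("SQLSERVER_SERVER", "SQLSERVER_DATABASE", "API_KEY")
OPTIONAL_KEYS = ("S3_ENDPOINT", "S3_BUCKET", "S3_ACCESS_KEY", "S3_SECRET_KEY")


def load_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return settings from the environment; raise if a required key is unset or empty."""
    missing = [k for k in REQUIRED_KEYS if not env.get(k)]
    if missing:
        # Report key names only, never the secret values themselves.
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    settings = {k: env[k] for k in REQUIRED_KEYS}
    # Optional keys are included only when they are set and non-empty.
    settings.update({k: env[k] for k in OPTIONAL_KEYS if env.get(k)})
    return settings


def redacted(settings: Mapping[str, str]) -> dict[str, str]:
    """Copy of the settings that is safe to log: values of *KEY fields are masked."""
    return {k: ("***" if k.endswith("KEY") else v) for k, v in settings.items()}
```

Logging `redacted(load_settings())` rather than the raw dictionary keeps key material out of log output while still showing which endpoints a job is using.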
3) Audit logging (pick Python or SQL)
- Python JSON logs
- SQL audit table + insert
# audit.py: structured JSON audit logging (needs Python 3.10+ for "int | None")
import json, logging, time, uuid

logger = logging.getLogger("audit")
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter('%(message)s'))  # emit the JSON payload only
logger.addHandler(handler)
logger.setLevel(logging.INFO)

def audit(event: str, dataset: str, actor: str = "service", count: int | None = None, purpose: str = "analytics"):
    entry = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),  # UTC timestamp
        "event": event,      # read/write/export/delete
        "dataset": dataset,
        "actor": actor,      # service/user id (no PHI)
        "purpose": purpose,
        "count": count,
        "corr_id": str(uuid.uuid4())  # new correlation id per call
    }
    logger.info(json.dumps(entry))

if __name__ == "__main__":
    audit("read", "dbo.vw_PracticeKPI", actor="svc-analytics", count=1203)
-- SQL audit table + insert (SQL Server)
CREATE TABLE dbo.AuditLog(
    id        BIGINT IDENTITY(1,1) PRIMARY KEY,
    ts        DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME(),
    event     NVARCHAR(24) NOT NULL,
    dataset   NVARCHAR(128) NOT NULL,
    actor     NVARCHAR(128) NULL,
    purpose   NVARCHAR(64) NULL,
    row_count INT NULL,
    corr_id   UNIQUEIDENTIFIER NULL
);
GO
INSERT INTO dbo.AuditLog(event, dataset, actor, purpose, row_count, corr_id)
VALUES('read', 'dbo.vw_PracticeKPI', 'svc-analytics', 'refresh', 1203, NEWID());
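Both examples above generate a fresh correlation ID for every event. If you want to trace one pipeline run across its read, write, and export steps, one option is to create the ID once per run and reuse it. The following is a minimal sketch using only the standard library; the `AuditRun` class is a hypothetical helper, not part of any NHS tooling.

```python
import json
import sys
import uuid
from datetime import datetime, timezone
from typing import Optional, TextIO


class AuditRun:
    """Emit JSON audit lines that share one correlation ID for a whole pipeline run."""

    def __init__(self, actor: str, purpose: str, stream: TextIO = sys.stdout):
        self.actor = actor      # service/user id, never PHI
        self.purpose = purpose
        self.corr_id = str(uuid.uuid4())  # generated once, reused by every event
        self._stream = stream

    def log(self, event: str, dataset: str, count: Optional[int] = None) -> dict:
        entry = {
            "ts": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "event": event,     # read/write/export/delete
            "dataset": dataset,
            "actor": self.actor,
            "purpose": self.purpose,
            "count": count,
            "corr_id": self.corr_id,
        }
        self._stream.write(json.dumps(entry) + "\n")
        return entry


if __name__ == "__main__":
    run = AuditRun(actor="svc-analytics", purpose="refresh")
    run.log("read", "dbo.vw_PracticeKPI", count=1203)
    run.log("export", "out/aggregates.csv", count=1203)
```

Filtering the log on a single `corr_id` then returns every step of that run in order.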
4) Masking & suppression
- SQL masking helper
- Python export check
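-- Pseudonymise an identifier
-- Caution: SELECT * also returns the raw NHS_NUMBER column; in a real export,
-- list only the non-identifying columns you need.
-- Caution: an unsalted hash of a 10-digit number can be reversed by brute force;
-- consider a keyed or salted hash with the secret held outside the database.
SELECT HASHBYTES('SHA2_256', CAST(NHS_NUMBER AS VARBINARY(32))) AS nhs_hash, *
FROM dbo.patient_demo;
-- Redact free text entirely (if present)
SELECT practice_id, LEFT(note, 0) AS note_redacted -- zero characters kept, so exports never carry note text
FROM dbo.notes_export;
-- Small-number suppression example
SELECT org_code, CASE WHEN COUNT(*) < 5 THEN NULL ELSE COUNT(*) END AS count_suppressed
FROM dbo.rare_event
GROUP BY org_code;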
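# export_guard.py: small-number suppression check before export
import pandas as pd

def suppress_small_numbers(df: pd.DataFrame, group_cols: list[str], threshold: int = 5) -> pd.DataFrame:
    """Count rows per group and blank out any count below the threshold."""
    g = df.groupby(group_cols).size().reset_index(name="n")
    # Nullable Int64 keeps counts as integers while allowing suppressed (<NA>) cells.
    g["n"] = g["n"].astype("Int64").mask(g["n"] < threshold)
    return g

if __name__ == "__main__":
    # demo: every group here has fewer than 5 rows, so all counts are suppressed
    raw = pd.DataFrame({"org":["A","A","B","B","B","C"], "event":[1,1,1,1,1,1]})
    print(suppress_small_numbers(raw, ["org"], threshold=5))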
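A plain SHA-256 of a 10-digit identifier can be reversed by hashing every possible value, so a keyed hash (HMAC) is a common alternative for pseudonymisation. The sketch below uses only the Python standard library. The function name `pseudonymise` and the idea of a `PSEUDONYM_KEY` environment variable are assumptions for illustration, and its output will not match the SQL `HASHBYTES` example because the input encoding and the key differ.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, key: bytes) -> str:
    """Return a hex HMAC-SHA256 pseudonym; the same identifier and key always give the same value."""
    if not key:
        raise ValueError("A non-empty secret key is required")
    # Normalise so that '943 476 5919' and '9434765919' map to the same pseudonym.
    normalised = identifier.replace(" ", "").replace("-", "")
    return hmac.new(key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    # Demo key only. In practice, load the key from a secret store, for example a
    # hypothetical PSEUDONYM_KEY environment variable, and never commit it.
    demo_key = b"demo-key-not-for-production"
    print(pseudonymise("943 476 5919", demo_key))
```

Because the pseudonym depends on the key, rotating the key breaks linkage with earlier extracts; whether that is desirable is a decision to record in the DPIA.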
5) Secure transfer (SFTP or S3/MinIO)
- SFTP
- S3/MinIO (encrypted)
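#!/usr/bin/env bash
# sftp_upload.sh: expects SFTP_USER and SFTP_HOST in the environment.
# Batch mode (-b) cannot prompt, so use non-interactive auth such as an SSH key.
set -euo pipefail
sftp -b - "$SFTP_USER@$SFTP_HOST" <<EOF
put out/aggregates.csv secure/inbox/aggregates.csv
bye
EOF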
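#!/usr/bin/env bash
# s3_upload.sh: expects the S3_* settings from step 2 in the environment.
set -euo pipefail
# The AWS CLI reads AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY, so map the S3_* names onto them.
export AWS_ACCESS_KEY_ID="$S3_ACCESS_KEY" AWS_SECRET_ACCESS_KEY="$S3_SECRET_KEY"
aws --endpoint-url "$S3_ENDPOINT" s3 cp out/aggregates.parquet "s3://$S3_BUCKET/exports/aggregates.parquet" \
  --sse AES256 --metadata project=nhs,classification=aggregated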
▶️ Run (local demo)
python audit.py
python export_guard.py
bash sftp_upload.sh # or
bash s3_upload.sh
🗓️ Week‑one build (repeatable, safe)
Day 1 — Scope & DPIA
- Confirm purpose, legal basis, data flows; log DPIA ref in README.
- Create data dictionary and owners.
Day 2 — Secrets & roles
- Move secrets to env/secret store; use least‑privilege service accounts.
- Add audit logs (Python or SQL) to all read/write steps.
Day 3 — Masking & suppression
- Add SQL/Python masking and small‑number rules to exports.
- Add “data last updated” and sample size to reports.
Day 4 — Transfer & retention
- Configure SFTP or S3/MinIO with encryption; document retention/deletion.
- Add PR checks requiring IG checklist completion.
Day 5 — Review & share
- Peer review with IG + product team; publish a living guidance page (Docusaurus/MkDocs).
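The Day 4 item about PR checks requiring IG checklist completion could be implemented in several ways. As a rough sketch, a script can parse a markdown checklist (such as the minimum checklist in step 1, pasted into a PR description) and report any unticked items. `checklist_status` is a hypothetical helper; how the PR text reaches it depends on your CI setup and is not shown here.

```python
import re

# Matches markdown task-list items such as "- [ ] text" or "- [x] text".
TASK_RE = re.compile(r"^\s*[-*]\s+\[( |x|X)\]\s+(.*)$")


def checklist_status(markdown: str) -> tuple[int, list[str]]:
    """Return (number of task items found, text of the items that are still unticked)."""
    total = 0
    pending: list[str] = []
    for line in markdown.splitlines():
        match = TASK_RE.match(line)
        if not match:
            continue  # headings and ordinary prose are ignored
        total += 1
        if match.group(1) == " ":
            pending.append(match.group(2).strip())
    return total, pending


def gate_passes(markdown: str) -> bool:
    """Pass only when a checklist is present and every item is ticked."""
    total, pending = checklist_status(markdown)
    return total > 0 and not pending
```

A CI job could call `gate_passes` on the PR body and fail the check when it returns `False`. Treating a missing checklist as a failure stops the gate from being bypassed by deleting the template.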
🛡️ Always‑on IG checklist
- De‑identified/synthetic data in development examples
- No secrets in code or Git; rotate keys regularly
- Least‑privilege access; role‑based permissions; audit logs enabled
- Aggregation first; small‑number suppression in outputs
- DPIA reference and data dictionary kept up to date
- Avoid PHI in logs, tickets, and commit messages
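To support the "avoid PHI in logs, tickets, and commit messages" item, a lightweight scan for strings that look like NHS numbers can act as a safety net. NHS numbers are 10 digits with a Modulus 11 check digit, which the sketch below verifies. The function names are hypothetical, and the scan is a heuristic: it can flag unrelated 10-digit values that happen to pass the check, and it will not catch other kinds of identifier.

```python
import re

# Ten ASCII digits, optionally grouped 3-3-4 with spaces or hyphens.
CANDIDATE_RE = re.compile(r"(?<![0-9])[0-9]{3}[ -]?[0-9]{3}[ -]?[0-9]{4}(?![0-9])")


def is_valid_nhs_number(value: str) -> bool:
    """Check the Modulus 11 check digit of a 10-digit NHS number."""
    cleaned = re.sub(r"[ -]", "", value)
    if not re.fullmatch(r"[0-9]{10}", cleaned):
        return False
    digits = [int(c) for c in cleaned]
    # Weight the first nine digits 10, 9, ..., 2 and sum the products.
    total = sum(d * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = 11 - (total % 11)
    if check == 11:
        check = 0
    # A computed check digit of 10 means the number is invalid.
    return check != 10 and check == digits[9]


def find_nhs_numbers(text: str) -> list[str]:
    """Return substrings that look like NHS numbers and pass the check-digit test."""
    return [m.group(0) for m in CANDIDATE_RE.finditer(text) if is_valid_nhs_number(m.group(0))]
```

For example, `find_nhs_numbers("pt 943 476 5919, ext 1234567890")` returns only the first value, because `1234567890` fails the check-digit test. The same function could run over log lines or commit messages in a pre-commit hook.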
📏 Measuring impact
- Coverage: % projects with completed IG checklist & DPIA reference
- Security: zero committed secrets; time to rotate compromised keys
- Auditability: % pipelines logging reads/writes with correlation IDs
- Timeliness: time from request → approved secure share
- Quality: leakage incidents (target: zero); suppression rule adherence
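The coverage and auditability measures above are simple ratios. As a sketch of how they might be computed from a project inventory, the example below assumes a hypothetical `ProjectRecord` structure; the field names are illustrative and would need to match however your organisation tracks projects.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProjectRecord:
    name: str
    checklist_complete: bool
    dpia_ref: Optional[str]   # None when no DPIA reference is recorded
    pipelines_total: int
    pipelines_audited: int    # pipelines logging reads/writes with correlation IDs


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place; 0.0 when there is nothing to measure."""
    return round(100 * part / whole, 1) if whole else 0.0


def ig_metrics(projects: list[ProjectRecord]) -> dict[str, float]:
    # Coverage: projects with both a completed checklist and a DPIA reference.
    covered = sum(1 for p in projects if p.checklist_complete and p.dpia_ref)
    pipelines = sum(p.pipelines_total for p in projects)
    audited = sum(p.pipelines_audited for p in projects)
    return {
        "coverage_pct": pct(covered, len(projects)),
        "auditability_pct": pct(audited, pipelines),
    }
```

Tracking these figures over time gives a trend rather than a one-off snapshot, which fits the "always-on" framing of the checklist.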
📚 References & next
See also: Secrets & .env · Docker · GitHub · Evidence.dev · FastAPI · AWS · Azure