Skip to main content

R

R is a powerful open-source programming language and environment for statistical computing and graphics. Developed in 1993 by Ross Ihaka and Robert Gentleman at the University of Auckland, R has since become one of the most popular tools for data analysis, visualization, and statistical modeling. It offers a wide range of statistical and graphical techniques, including linear and nonlinear modeling, time-series analysis, classification, and clustering.

Why R for the NHS?​

R is particularly well-suited for use in the NHS for several reasons:

  • Cost-effective: As an open-source tool, R is free to use and distribute, making it an economical choice for NHS trusts. The University of Bradford reports that "If the NHS was to transition to using R, there is the potential to save millions of pounds in licence fees, and training and support—because these would essentially be free"1.
  • Versatility: R can handle various types of healthcare data, from patient records to epidemiological studies. The Health Foundation states that R can be used to "improve data analysis and develop shared solutions to common analytic challenges" in the NHS, indicating its ability to handle various types of healthcare data2.
  • Reproducibility: R scripts ensure that analyses are reproducible, which is crucial for maintaining transparency in healthcare research and decision-making.
  • Strong Community Support: The large R user community, including many in healthcare and biostatistics, provides extensive resources and packages tailored to medical research. The Health Foundation mentions "a worldwide R Community that freely shares learning and resources" and notes the development of an NHS-R Community that is "promoting the use of R in the NHS"23.
  • Integration Capabilities: NHS England Digital reports that their national Secure Data Environment platform includes RStudio, "a data analysis tool for R," integrated with other tools like Databricks and Gitlab, demonstrating R's ability to work within NHS systems4.
  • Predicting Patient Outcomes: The University of Bradford article mentions that R could be used "to create new statistical models based on the vast amount of data available to the NHS to support clinical and non-clinical decision making in the NHS".

Getting Started with R​

Before diving into R, you'll need to set up your environment:

Installation​

  1. Download R from the Comprehensive R Archive Network (CRAN).
  2. Follow the installation instructions for your operating system.

Installing RStudio​

RStudio is an integrated development environment (IDE) that enhances your R programming experience:

  1. Download RStudio:
  2. Install RStudio:
    • Run the downloaded installer file and follow the installation instructions.
    • For Windows and macOS, this typically involves running an .exe or .dmg file respectively.

Installing Packages​

Enhance R's functionality by installing additional packages:

  1. Using the console:
    • Use the command install.packages("package_name") in the R console.
    • For multiple packages: install.packages(c("dplyr", "ggplot2")).
  2. Using RStudio's interface:
    • In RStudio, go to Tools > Install Packages.
    • Search for the package you want to install and click "Install".
  3. Updating packages:
    • Use the update.packages() command to check for and install updates to your installed packages.

Basic R Programming Concepts​

Begin your journey with R by understanding these basic concepts:

Variables and Data Types​

R supports various data types, including:

  • Numeric (e.g., 3.14)
  • Integer (e.g., 42L)
  • Character (e.g., "Hello")
  • Logical (TRUE/FALSE)
  • Complex (e.g., 3+2i)

Variables in R are assigned using the <- operator:

x <- 5
name <- "John"

R also has special data types like factors for categorical data and Date for date-time information5.

Data Frames and Lists​

Data frames are table-like structures that are fundamental to data analysis in R. They can contain different types of data in each column:

df <- data.frame(
name = c("Alice", "Bob", "Charlie"),
age = c(25, 30, 35),
height = c(165, 180, 175)
)

Lists are versatile structures that can contain elements of different types and lengths:

my_list <- list(
numbers = 1:5,
text = "Hello",
data = data.frame(x = 1:3, y = c("a", "b", "c"))
)

These structures are crucial for organizing and manipulating data in R6.

Basic Functions​

R comes with many built-in functions for data manipulation and analysis:

# Calculate mean
mean(c(1, 2, 3, 4, 5))

# Find maximum value
max(c(10, 5, 8, 12, 3))

# Create a sequence
seq(1, 10, by = 2)

You can also create your own functions:

square <- function(x) {
return(x^2)
}

Functions are essential for performing operations and creating reusable code57.

Data Import and Export​

R provides functions to read data from various file formats:

# Read CSV file
data <- read.csv("file.csv")

# Read Excel file (requires readxl package)
library(readxl)
excel_data <- read_excel("file.xlsx")

Similarly, you can write data to files:

# Write CSV file
write.csv(data, "output.csv")

# Write Excel file (requires writexl package)
library(writexl)
write_xlsx(data, "output.xlsx")

These functions allow you to work with external data sources, which is crucial for real-world data analysis58.

R Packages for Healthcare​

Several R packages are specifically useful for NHS staff:

dplyr​

Part of the tidyverse, dplyr is essential for data manipulation and analysis. It provides a set of tools for efficiently manipulating datasets in R. Key functions include:

  • filter() for subsetting data
  • arrange() for reordering rows
  • select() for picking variables by name
  • mutate() for creating new variables
  • summarise() for reducing multiple values to a single summary

NHS analysts can use dplyr to clean and prepare patient data, aggregate statistics, and perform complex data transformations efficiently29.

ggplot2​

Another tidyverse package, ggplot2 is powerful for creating advanced graphs and visualizations. It's based on the grammar of graphics, allowing users to build charts layer by layer. NHS staff can use ggplot2 to create:

  • Time series plots of patient outcomes
  • Heatmaps of disease prevalence
  • Box plots of treatment efficacy
  • Scatter plots of health indicators

ggplot2's flexibility makes it ideal for creating publication-quality figures for reports and presentations29.

caret​

The caret package (Classification And REgression Training) provides a unified interface for training and plotting predictive models. It's particularly useful for machine learning workflows in healthcare, such as:

  • Predicting patient readmissions
  • Classifying disease risk
  • Forecasting resource needs

Caret simplifies the process of model training, tuning, and evaluation, making it easier for NHS analysts to implement machine learning solutions10.

survival​

The survival package is crucial for survival analysis in clinical research. It provides a comprehensive set of tools for analyzing time-to-event data, which is common in healthcare studies. Functions include:

  • survfit() for fitting survival curves
  • coxph() for Cox proportional hazards regression
  • survdiff() for comparing survival curves

NHS researchers can use this package to analyze patient survival rates, treatment efficacy over time, and other time-dependent outcomes1.

Examples of R in Action​

  • The NHS-R Community is a vibrant group that promotes the use of R in the NHS. They provide resources, training, and support for NHS staff to learn and use R for data analysis and visualization.
  • The Health Economics Unit in the NHS has used R to create causal models for Population Health Management (PHM). These models can have tangible impacts on healthcare delivery and decision-making.
  • In East London, analysts have used R to analyze joined-up health and social care data. For example, they used exact matching in R to study the relationship between social isolation and healthcare costs. This analysis used a dataset showing how residents of the London Borough of Barking & Dagenham interact with NHS and council services.
  • The NHS-R Community has created an R package called NHSRdatasets, which contains several healthcare-related datasets specifically designed to help NHS, Public Health, and related analysts learn to use R. This package includes datasets with help files and vignette examples, making it easier for healthcare professionals to practice data analysis in a relevant context.
  • The CSP (Chartered Society of Physiotherapy) has highlighted how communities like NHS-R can help maximize shareable data in the NHS, indicating a growing recognition of R's potential in healthcare analytics.

Further Learning Resources​

Enhance your R skills with additional resources that are specifically tailored for healthcare applications:

  • DataCamp's R Courses: Offers a variety of courses on R, including those focused on data science techniques.
  • Coursera R Programming Courses: Provides comprehensive courses on R programming, with some focusing on biostatistics and epidemiology which are highly relevant to the NHS.

Footnotes​

  1. Anonymous. "NHS could save millions by switching to open-source software ‘R’" University of Bradford, 2022, Link ↩ ↩2

  2. Anonymous. "The power of R for the NHS" The Health Foundation, Link ↩ ↩2 ↩3 ↩4

  3. Mohammed, A Mohammed. "The Strategy Unit have been awarded funding from the Health Foundation to continue promoting the use of R in the NHS via the NHS-R Community" The Strategy Unit, 2019, Link ↩

  4. Chapman, Michael. "One giant step for secure data access" NHS Digital, 2023, Link ↩

  5. Schork, Joachim. "Introduction to the R Programming Language – Basic Concepts for Beginners" data36, 2021, Link ↩ ↩2 ↩3

  6. Anonymous. "R Tutorial" tutorialspoint, Link ↩

  7. Mehta, Siddharth. "Basic R Concepts" MSSQLTips, Link ↩

  8. Neth, Hansjörg. "Data Science for Psychologists - Chapter 1: Basic R concepts and commands" Bookdown, 2024, Link ↩

  9. Turner, Zoë. Jemmett, Tom. Hutson, Gary. Blatter, Cath. Mainey, Chris. "NHS-R Community Datasets " NHS-R Community, 2021 Link ↩ ↩2

  10. Pott, Jason. Jemmett, Tom. Beeley, Chris. "NHS-R Recommended Packages" NHS-R Community, 2021 Link ↩