What is a Function?
A function is a reusable block of code that takes zero or more input arguments, runs a set of statements, and returns a result. Functions let you wrap a large piece of code in a short, named call. A function can be built into R, supplied by a package, or user defined (created by you). R comes with many pre-built functions for common tasks. Some examples include:
- unique() - returns the unique values from a vector or data frame column
- summary() - provides descriptive statistics for a data frame or vector
- length() - returns the number of elements in a vector
- mean(), median(), sd() - calculate descriptive statistics
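As a quick illustration, the built-in functions listed above can be tried on a small vector. The vector `scores` is made up for this sketch and is not part of any data set used later.

```r
# A small made-up vector to try the built-in functions on
scores <- c(2, 4, 4, 6, 8)

unique_scores <- unique(scores)   # distinct values, in order of first appearance
n_scores      <- length(scores)   # number of elements
mean_score    <- mean(scores)     # arithmetic mean
median_score  <- median(scores)   # middle value

print(unique_scores)
print(n_scores)
print(mean_score)
print(median_score)
summary(scores)                   # minimum, quartiles, mean and maximum in one call
```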
Why Create User Defined Functions?
User-defined functions offer several important advantages:
- Code reusability: Write the code once and call it multiple times, reducing redundancy.
- Optimisation: Consolidate repeated code into a single, efficient block.
- Error reduction: Minimise copy-paste errors and inconsistencies in code.
- Readability: Make your code more organized and easier to understand.
- Maintainability: Update code/information in one place rather than multiple locations.
Creating a function in R
To define a function, use this format:
function_name <- function(parameters){body}
function_name # The function name is the identifier you use to call your function. When you define a function, it is stored as an object in your R environment. Choose names that are concise, clear, and meaningful to describe what the function does.
function(parameters) # Parameters (also called formal arguments) are variables defined in the function definition that represent inputs the function will receive. Parameters are enclosed in parentheses and separated by commas. For example
circumference <- function(r){
  2 * pi * r
}
print(circumference(2))
{body} # The function body contains the code statements that execute when the function is called. This is where you perform calculations, manipulations, or other operations on the input parameters. The function body is enclosed in curly braces {}.
# Return Values = Functions typically return output using the return() statement. This specifies what result the function produces. If no explicit return() statement is included, R returns the value of the last expression evaluated in the function.
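To illustrate the two ways of returning a value, here is a minimal sketch. The name `calculate_average` matches the call used in the next paragraph, but its two-argument definition is an assumption; `calculate_average_short` is a hypothetical second name for the same calculation.

```r
# Explicit return(): the result is handed back by name
calculate_average <- function(a, b) {
  total <- a + b
  return(total / 2)
}

# Implicit return: the value of the last expression is returned
calculate_average_short <- function(a, b) {
  (a + b) / 2
}

print(calculate_average(5, 10))
print(calculate_average_short(5, 10))
```

Both versions give the same result; the explicit form is mainly useful when a function needs to stop early.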
# Calling a Function = Once defined, you call a function by using its name followed by parentheses containing the argument values: calculate_average(5, 10) # Returns 7.5
Example
calculate_calories_women <- function(weight, height, age){
  (10 * weight) + (6.25 * height) - (5 * age) - 161
}
- This calculates the basal metabolic rate (weight in kg, height in cm, age in years).
- For women the formula is (10 x weight) + (6.25 x height) - (5 x age) - 161
- To calculate the daily calories (basal metabolic rate) for a woman who is 30, weighs 60 kg and is 165 cm tall:
calculate_calories_women(weight = 60, height = 165, age = 30)
## [1] 1320.25
Function Creation
Plotting
fast_plot = the name of the function
function = the keyword that creates the function
(v, = a placeholder for the variable to plot (like x in algebra)
col_name) = a placeholder for information passed into the function. In this case col_name stores the name of the column, which is displayed as the x-axis label.
{ = opens the function body
ggplot() = starts the plot
data.frame(x = factor(v), Gender = df$Gender) = creates a data frame with two columns
+ x = the variable pulled in from the original data set (converted to categories with factor())
+ Gender = the Gender column from the data set
aes(x = x, fill = Gender) = sets up the aesthetics
+ x = x = puts the variable on the x-axis
+ fill = Gender = colours the bars by Gender
geom_bar(position = "dodge") = creates a bar chart and places the bars side by side
theme_light() = applies the light theme
theme(axis.text.x = element_text(angle = 90)) = rotates the x-axis labels by 90 degrees
labs(x = col_name) = adds the x-axis label to the plot
Here is an example. It uses x in place of v and a Sex column in place of Gender.
library(ggplot2) # needed for ggplot(); df is assumed to be a data frame already loaded in your environment
# Creates a function called fast_plot1 that takes two inputs: x (a column of data) and col_name (a string for the x-axis label).
fast_plot1 <- function(x, col_name){
  # data.frame(x = factor(x), Sex = df$Sex) creates a temporary data frame with two columns: x (your variable converted to a category with factor()) and Sex (pulled from the global df)
  ggplot(data.frame(x = factor(x), Sex = df$Sex), aes(x = x, fill = Sex)) +
    geom_bar(position = "dodge") +
    theme_light() +
    theme(axis.text.x = element_text(angle = 0)) +
    labs(x = col_name)
}
fast_plot1(df$Smoke_nic, "Smoke_nic")
as.character
To make sure every value in the column is plotted as a category, convert it to text with as.character() before calling factor().
Change factor(x) to factor(as.character(x)) at the beginning of the plotting code.
This way each distinct value in the data set, whatever type it was stored as, becomes its own category label.
fast_plot1 <- function(x, col_name){
ggplot(data.frame(x = factor(as.character(x)), Sex = df$Sex), aes(x = x, fill = Sex)) +
geom_bar(position = "dodge") +
theme_light() +
theme(axis.text.x = element_text(angle = 0)) +
labs(x = col_name)
}
fast_plot1(df$`Ever_sectioned?`, "Ever_sectioned")
Converting the NAs
A data set can contain two kinds of missing value: the text "NA" that someone typed in by hand, and true missing values (NA) that appear when a cell was left blank. To include both in the graph as a single category, fast_plot1 is updated in this section with extra lines that handle them. The same approach shows how to recode values so they carry a different meaning, which can be useful.
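A small standalone sketch of the difference between the two kinds of missing value, using a made-up vector named `answers` (it is not part of the data set):

```r
# A made-up answer column with one true missing value and one typed "NA"
answers <- c("Yes", NA, "NA", "No")

# is.na() is TRUE only for the true missing value, not for the text "NA"
print(is.na(answers))

# Replace the true missing value with the text "NA"
answers[is.na(answers)] <- "NA"

# Both kinds now fall into the same category
print(table(answers))
```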
fast_plot1 <- function(x, col_name){
x <- as.character(x) # convert to text first so the replacement below also works for factor columns
x[is.na(x)] <- "NA" # detects true R missing values (the actual NA type) in the vector and replaces them with the string "NA"
x[x == "NA"] <- "NA" # values already typed in as the text "NA" are reassigned to the same string; this changes nothing, but makes it explicit that both kinds end up in one "NA" category
ggplot(data.frame(x = factor(as.character(x)), Sex = df$Sex), aes(x = x, fill = Sex)) +
geom_bar(position = "dodge") +
theme_light() +
theme(axis.text.x = element_text(angle = 0)) +
labs(x = col_name)
}
fast_plot1(df$Psychosis_ever_sectioned, "Psychosis_ever_sectioned")
Numeric Values
Data sets can include numeric variables such as age or weight. After the values have been converted to text with as.character(), convert them back to numbers with as.numeric(). This means that when they are plotted R treats them as numbers instead of text.
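One reason to go through as.character() first, sketched below with a made-up factor named `ages`: calling as.numeric() directly on a factor returns the internal level codes, not the numbers that are displayed.

```r
# A made-up column of ages that has been stored as a factor
ages <- factor(c("30", "45", "30", "62"))

codes  <- as.numeric(ages)                 # internal level codes, not the ages
values <- as.numeric(as.character(ages))   # the ages as displayed

print(codes)
print(values)
```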
fast_plot_numeric <- function(x, col_name){
x <- as.character(x) # as.character = converts every value to text first
x[is.na(x)] <- "-9" # true missing values become the placeholder "-9"
x[x == "NA"] <- "-9" # the text "NA" becomes "-9" as well
x <- as.numeric(x) # as.numeric = changes the data back into numeric format for plotting
ggplot(data.frame(x = x, Sex = df$Sex), aes(x = x, fill = Sex)) +
geom_bar(position = "dodge") +
theme_light() +
theme(axis.text.x = element_text(angle = 0, hjust = 0.5)) +
labs(x = col_name)
}
fast_plot_numeric(df$Age_at_assessment, "Age_at_assessment")
Adding assertions
An assertion is a statement that checks whether a condition is true. If the condition is false, execution stops and R reports an error. In the plotting function in this section, stopifnot(is.numeric(x)) is the assertion.
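Before the full plotting function, a minimal standalone sketch of how stopifnot() behaves. The helper `double_value` is hypothetical, and tryCatch() is used here only so the error message can be shown without halting the script.

```r
# Hypothetical helper: doubles its input, but only if the input is numeric
double_value <- function(x) {
  stopifnot(is.numeric(x))  # assertion: halts here when x is not numeric
  x * 2
}

print(double_value(4))

# tryCatch() captures the error so the script can carry on and show the message
result <- tryCatch(
  double_value("four"),
  error = function(e) paste("Stopped:", conditionMessage(e))
)
print(result)
```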
fast_plot_numeric <- function(x, col_name){
stopifnot(is.numeric(x))
x[is.na(x)] <- -9 # replace NA with -9 (no quotes, keeping it numeric)
plot_df <- data.frame(x = x, Sex = df$Sex)
ggplot(plot_df, aes(x = x, fill = Sex)) +
geom_bar(position = "dodge") +
theme_light() +
theme(axis.text.x = element_text(angle = 0)) +
labs(x = col_name)
}
fast_plot_numeric(df$Age_at_assessment, "Age_at_assessment")
fast_plot_numeric(df$Smoke_nic, "smoke_nic") # stops with an error because Smoke_nic is not numeric
# Here the numeric variable passes the assertion and the function plots age at assessment. The smoking nicotine variable contains character values, so the function halts before plotting.
Known encoding
Use this version when you know how a variable is encoded (for example 0 = No and 1 = Yes) and want the codes converted to readable labels for plotting.
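The recoding step on its own can be sketched as follows, using a made-up vector of codes. The named lookup vector is an alternative to line-by-line replacements, shown here as an assumption about how the idea might be extended to more codes; it is not taken from the original function.

```r
# A made-up column of stored codes, including one true missing value
codes <- c("0", "1", NA, "1", "0")

# Named lookup vector: names are the stored codes, values are the labels
lookup <- c("0" = "No", "1" = "Yes")

labels <- unname(lookup[codes])   # look each code up by name
labels[is.na(labels)] <- "NA"     # codes with no match (including missing) become "NA"

print(labels)
```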
fast_plot_known_encoding <- function(x, col_name){
x <- as.character(x)
x[is.na(x)] <- "NA"
x[x == "NA"] <- "NA"
x[x == "0"] <- "No"
x[x == "1"] <- "Yes"
ggplot(data.frame(x = x, Sex = df$Sex), aes(x = x, fill = Sex)) +
geom_bar(position = "dodge") +
theme_light() +
theme(axis.text.x = element_text(angle = 0, hjust = 0.5)) +
labs(x = col_name)
}
fast_plot_known_encoding(df$Talking_Therapy, "Talking_Therapy")
Histogram
This version replaces the bar chart with a histogram, which suits decimal (continuous) values.
# This example uses a made-up set of decimal numbers, so it is not split by sex.
df3 <- c(52.3, 55.8, 58.2, 61.5, 64.1, 66.8, 69.2, 71.5, 73.8, 76.2, 78.5, 80.9, 83.2, 85.6, 87.9, 90.2, 92.5, 94.8, 97.1, 99.4, 54.2, 57.1, 60.3, 63.4, 66.5, 69.7, 72.8, 75.9, 79.1, 82.2, 85.3, 88.5, 91.6, 94.7, 97.9, 52.8, 56.4, 59.7, 62.9, 66.1, 69.3, 72.5, 75.7, 78.9, 82.1, 85.3, 88.5, 91.7, 94.9, 98.1, 53.5, 56.9, 60.1, 63.2, 66.3, 69.4, 72.5, 75.6, 78.7, 81.8, 84.9, 88.0, 91.1, 94.2, 97.3, 54.7, 57.8, 60.9, 64.0, 67.1, 70.2, 73.3, 76.4, 79.5, 82.6, 85.7, 88.8, 91.9, 95.0, 98.1, 55.3, 58.5, 61.7, 64.9, 68.1, 71.3, 74.5, 77.7, 80.9, 84.1, 87.3, 90.5, 93.7, 96.9, 60.2, 65.4, 70.6, 75.8, 81.0, 86.2, 91.4, 96.6)
fast_plot_histogram <- function(x, col_name){
x <- as.character(x)
x[is.na(x)] <- "-9"
x[x == "NA"] <- "-9"
x <- as.numeric(x)
ggplot(data.frame(x = x), aes(x = x)) +
geom_histogram(position = "dodge", bins = 200, color = "white", fill = "tomato") +
theme_light() +
theme(axis.text.x = element_text(angle = 0, hjust = 0.5)) +
labs(x = col_name)
}
fast_plot_histogram(df3, "df3")
Regression line example
# Create a dataset
set.seed(42) # For reproducibility: this fixes the starting point of R's random number generator. Without it, rnorm() generates different random values every time you run the script. With set.seed() you get the same data set each time.
n <- 100 # number of values
height <- rnorm(n, mean = 170, sd = 10) # Generates 100 random height values that follow a normal (bell-curve) distribution. mean = 170 means the average height is about 170 cm, and sd = 10 means about two thirds of the values fall within 10 cm of that average. Most heights land roughly between 150 and 190 cm.
weight <- 0.5 * height + rnorm(n, mean = -35, sd = 15) # Creates weight values with a deliberate relationship to height: each height is multiplied by 0.5, then rnorm(n, mean = -35, sd = 15) subtracts 35 on average and adds random noise, which makes the data more realistic.
# Combine into a data frame
data <- data.frame(height = height, weight = weight)
regression_function <- function(w, h) {
  regression_model <- lm(h ~ w) # Fits the regression model: h (height) as a function of "~" w (weight).
  print(summary(regression_model)) # Prints the useful information about the regression line.
  plot(w, h, main = "Height vs Weight", xlab = "Weight (kg)", ylab = "Height (cm)") # w goes on the x-axis and h on the y-axis; main = adds a title
  abline(regression_model, col = "seagreen") # Adds the regression line
}
regression_function(data$weight, data$height)
##
## Call:
## lm(formula = h ~ w)
##
## Residuals:
## Min 1Q Median 3Q Max
## -34.617 -6.909 -0.123 6.647 18.676
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 157.03095 3.37430 46.537 < 2e-16 ***
## w 0.27214 0.06618 4.112 8.16e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.666 on 98 degrees of freedom
## Multiple R-squared: 0.1472, Adjusted R-squared: 0.1385
## F-statistic: 16.91 on 1 and 98 DF, p-value: 8.158e-05
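The numbers in the printed summary can also be pulled out in code. A minimal sketch, rebuilding the same simulated data and assuming (as the output above suggests) that w is weight and h is height; the 60 kg value is chosen only for illustration.

```r
# Rebuild the simulated data so this block runs on its own
set.seed(42)
n <- 100
height <- rnorm(n, mean = 170, sd = 10)
weight <- 0.5 * height + rnorm(n, mean = -35, sd = 15)

model <- lm(height ~ weight)

coefs     <- coef(model)               # named vector: "(Intercept)" and "weight"
slope     <- coefs[["weight"]]         # change in height (cm) per extra kg
r_squared <- summary(model)$r.squared  # share of variance explained

# Predicted height for a hypothetical weight of 60 kg
predicted <- predict(model, newdata = data.frame(weight = 60))

print(coefs)
print(r_squared)
print(predicted)
```

Working from coef() and predict() avoids copying numbers by hand from the printed summary.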
Using the if/else code
The next step is to write one function which checks whether the input is numeric or a character to then choose the appropriate plotting pathway.
Step 1
To determine if a variable contains numeric values, you can use is.numeric(). However, this function only checks the data type, not whether text-based data looks numeric (like “123” stored as text).
Instead suppressWarnings(as.numeric(x)) is used. This attempts to convert the values to numeric format. If at least some values convert successfully (even if others become NA), the data is treated as numeric-like. The suppressWarnings() part hides the conversion warnings that would otherwise appear when non-numeric text cannot be converted.
This pattern works because:
- Text that looks numeric (like “42” or “3.14”) will convert successfully
- Non-numeric text (like “Male” or “Unknown”) will become NA without stopping the function
- suppressWarnings() keeps your output clean by hiding the expected conversion warnings
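The test described in this step can be sketched on its own. The helper name `looks_numeric` is hypothetical, and the input vectors are made up.

```r
# Hypothetical helper wrapping the numeric-like test
looks_numeric <- function(x) {
  converted <- suppressWarnings(as.numeric(x))  # non-numeric text becomes NA
  !all(is.na(converted))                        # TRUE if anything converted
}

type_check    <- is.numeric(c("42", "3.14"))         # checks the type only
numbers_check <- looks_numeric(c("42", "3.14"))      # text that converts to numbers
text_check    <- looks_numeric(c("Male", "Unknown")) # text that does not convert

print(type_check)
print(numbers_check)
print(text_check)
```

Note that a column mixing numbers and text would still count as numeric-like under this test, because only one value needs to convert.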
Step 2.a
The next step is using the if/else code. This is how R makes decisions. It means:
- IF something is true, do this
- ELSE, do something different.
The condition must evaluate to a single TRUE or FALSE.
The code for each case goes inside curly brackets.
- if = the R keyword that starts the decision
- (condition) = the test you want the “if” to check
- { } = curly brackets around the code to run when the condition is TRUE
- else { } = the code to run when the condition is FALSE
For example
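A minimal sketch of the if/else layout, using a made-up variable `temperature` and a threshold chosen only for illustration:

```r
temperature <- 25   # made-up value

if (temperature > 20) {
  message_text <- "warm"   # runs when the condition is TRUE
} else {
  message_text <- "cool"   # runs when the condition is FALSE
}

print(message_text)
```

In R the `else` has to sit on the same line as the closing curly bracket of the `if` block when the code is run at the top level of a script.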
Step 2.b
Inside a function, if decides which block of code runs.
The function returns the value of whichever block is chosen.
The other block is skipped entirely.
Example
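A sketch of a function that would produce output of this shape. The function name `check_sign` and the two test values are assumptions made for illustration.

```r
# Hypothetical function: the if/else decides which string is returned
check_sign <- function(x) {
  if (x > 0) {
    "positive"
  } else {
    "zero or negative"
  }
}

print(check_sign(5))
print(check_sign(-3))
```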
## [1] "positive"
## [1] "zero or negative"
Example
## [1] "yippee R is amazing"
## [1] "not yet :("
Text and numeric example
fast_plot2 <- function(x, col_name) { # This names the function. x is the input.
  # Test if values are numeric-like
  numeric_test <- suppressWarnings(as.numeric(x)) # Tries to convert all the values in "x" to numbers. A value that cannot be converted because it is text becomes NA. suppressWarnings() hides any warning messages from the conversion.
  is_num <- !all(is.na(numeric_test)) # all(is.na(numeric_test)) checks whether every value in numeric_test is NA. It returns TRUE if all values are NA (nothing could be converted to numeric) and FALSE if at least some values were converted.
  # The "!" flips this result, so is_num is TRUE if the data is numeric-like and FALSE if it is all text.
  if (is_num) { # Checks if the data is numeric; if TRUE, this code block is executed
    # ----- NUMERIC CASE -----
    x <- suppressWarnings(as.numeric(x)) # Converts all the values in x to numeric format
    x[is.na(x)] <- -9 # Converts the NAs into -9.
    ggplot( # creates a ggplot
      data.frame(x = x), # Puts the numeric values into a data frame with the column name "x"
      aes(x = x) # shows the "x" column on the x-axis.
    ) +
      geom_bar(position = "dodge", fill = "orange2") + # makes a bar graph, where the bars are placed side by side and coloured "orange2"
      theme_light() + # Uses the light theme
      labs(x = col_name) # Sets the x-axis label to the column name
  } else { # Everything in these {} brackets runs when the data fails the numeric test.
    # ----- TEXT CASE -----
    x <- as.character(x)
    x[is.na(x)] <- "NA"
    x[x == "0"] <- "No"
    x[x == "1"] <- "Yes"
    ggplot(data.frame(x = x), aes(x = x)) +
      geom_bar(position = "dodge", fill = "orchid") +
      theme_light() +
      theme(axis.text.x = element_text(angle = 90, hjust = 0.5)) +
      labs(x = col_name)
  }
}
Text, numeric and histogram example
When there are more than two possible outcomes you need to use an “if” inside another “if”. This is called a “nested if statement”.
In this example the first if checks whether the data is numeric or text. If the data is numeric, a nested if checks for decimals: if the numbers contain decimals a histogram is plotted, and if they are all whole numbers a bar chart is plotted. However, if the data is text, the variable skips the nested if and goes straight to the text plotting.
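The decimal check that drives the nested if can be tried on its own. A small sketch with two made-up vectors:

```r
whole_numbers <- c(18, 25, 40)      # made-up integer-like values
measurements  <- c(52.3, 60, 71.5)  # made-up values with decimals

# x %% 1 is the remainder after dividing by 1, so it is non-zero only
# for values that have a decimal part
print(measurements %% 1 != 0)

has_decimals_whole <- any(whole_numbers %% 1 != 0)
has_decimals_meas  <- any(measurements %% 1 != 0)

print(has_decimals_whole)
print(has_decimals_meas)
```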
fast_plot3 <- function(x, col_name) {
# Test if values are numeric-like
numeric_test <- suppressWarnings(as.numeric(x))
is_num <- !all(is.na(numeric_test))
if (is_num) {
# ----- NUMERIC CASE -----
x <- suppressWarnings(as.numeric(x))
x[is.na(x)] <- -9
# Check if values contain decimals
is_decimal <- any(x %% 1 != 0, na.rm = TRUE) # x %% 1 = the modulo operator, which returns the remainder after division, e.g.
# 2.5 %% 1 = 0.5 (remainder after dividing by 1)
# != 0 = checks if the remainder is NOT equal to zero, e.g.
# 0.5 != 0 -> TRUE (has a decimal part)
# 0 != 0 -> FALSE (no decimal part, so it is a whole number)
# na.rm = TRUE = ignores NA values when checking
if (is_decimal) {
# ----- DECIMAL/CONTINUOUS CASE -----
ggplot(
data.frame(x = x),
aes(x = x)
) +
geom_histogram(bins = 30, fill = "seagreen4") +
theme_light() +
labs(x = col_name, y = "Frequency")
} else {
# ----- INTEGER CASE -----
ggplot(
data.frame(x = x),
aes(x = x)
) +
geom_bar(position = "dodge", fill = "orange2") +
theme_light() +
labs(x = col_name)
}
} else {
# ----- TEXT CASE -----
x <- as.character(x)
x[is.na(x)] <- "NA"
x[x == "0"] <- "No"
x[x == "1"] <- "Yes"
ggplot(data.frame(x = x), aes(x = x)) +
geom_bar(position = "dodge", fill = "orchid") +
theme_light() +
theme(axis.text.x = element_text(angle = 90, hjust = 0.5)) +
labs(x = col_name)
}
}
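The decision logic of the nested if can be checked without drawing anything. The sketch below uses a hypothetical helper, `choose_plot_type`, that follows the same branches but returns a label instead of a plot; the label strings are made up for illustration.

```r
# Hypothetical helper: same branching as the plotting function, no plotting
choose_plot_type <- function(x) {
  numeric_test <- suppressWarnings(as.numeric(x))
  if (!all(is.na(numeric_test))) {
    # Numeric-like data: the nested if separates decimals from whole numbers
    if (any(numeric_test %% 1 != 0, na.rm = TRUE)) {
      "histogram"
    } else {
      "numeric bar chart"
    }
  } else {
    # Nothing converted to a number, so treat the data as text
    "text bar chart"
  }
}

print(choose_plot_type(c("52.3", "60", "71.5")))
print(choose_plot_type(c(18, 25, NA)))
print(choose_plot_type(c("Male", "Female")))
```

Testing the branches this way makes it easier to confirm which path a column will take before looking at the plot itself.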