How to restructure data frames with R’s melt function

R’s melt() function converts data frames from a wide format to a long format, which makes it easier to adapt your data to different requirements. Many methods of analysis, such as linear models and ANOVA, work best with data in a long format, where each row holds a single measured value together with its identifiers.

What is R’s melt() function used for?

R’s melt() function belongs to the reshape2 package and is used to restructure data frames, particularly to convert them from a wide format to a long format. In a wide format, each variable has its own column. In a long format, the values of those variables are stacked in a single column, and a second column records which variable each value came from. This layout is often better suited to analyses and visualizations.

The melt() function in R is an essential tool for transforming data. It’s especially relevant when information is only available in a wide format, but certain analyses or graphics require a long format. This option for restructuring data increases the flexibility of data frames and allows for optimal use of various R analysis tools and visualization libraries.

What is the syntax of R’s melt() function?

The melt() function in R can be customized using different arguments.

melt(data.frame, na.rm = FALSE, value.name = "name", id.vars = "columns")

  • data.frame: The data frame that you want to restructure.
  • na.rm: An optional argument that controls whether rows with missing values (NA) are removed from the result. The default is FALSE, which keeps them.
  • value.name: An optional argument that names the column holding the values of the restructured variables in the new data frame. Here, "name" is a placeholder.
  • id.vars: An optional argument that indicates which columns should be kept as identifiers. Here, "columns" is a placeholder for one or more column names.
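To see how these arguments work together, here is a minimal sketch. It assumes the reshape2 package is installed and also uses two further melt() arguments that this article does not cover, measure.vars and variable.name. The data frame scores and its column names are hypothetical and only serve as an illustration.

```r
library(reshape2)

# Hypothetical wide data frame: one row per student, one column per test
scores <- data.frame(
  student = c("Ana", "Ben"),
  test1   = c(90, 75),
  test2   = c(85, 80),
  stringsAsFactors = FALSE
)

# id.vars is kept as the identifier, measure.vars lists the columns to stack,
# variable.name and value.name label the two new columns
long_scores <- melt(
  scores,
  id.vars       = "student",
  measure.vars  = c("test1", "test2"),
  variable.name = "test",
  value.name    = "score"
)

print(long_scores)
```

Each student now appears once per test, with the former column names stored in the test column and the numbers in the score column.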

Let’s look at an example:

df <- data.frame(ID = 1:3, A = c(4, 7, NA), B = c(8, NA, 5))

The resulting data frame looks as follows:

  ID  A  B
1  1  4  8
2  2  7 NA
3  3 NA  5

Now we’ll use melt() and transform the data frame into a long format:

library(reshape2)  # melt() is provided by the reshape2 package
melted_df <- melt(df, na.rm = FALSE, value.name = "Value", id.vars = "ID")

The restructured data frame melted_df looks like this:

  ID variable Value
1  1        A     4
2  2        A     7
3  3        A    NA
4  1        B     8
5  2        B    NA
6  3        B     5

The result is a data frame that has been restructured into a long format. The ID column was retained as an identifier, the variable column contains what were previously column names (A and B), and the Value column contains the corresponding values. Because na.rm = FALSE, the missing values (marked with NA) are kept in the result.
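One way to check that no information was lost is to reshape the long data back into a wide format. The sketch below assumes reshape2 is installed and uses its dcast() function, which this article does not otherwise cover. The name wide_again is chosen only for illustration.

```r
library(reshape2)

df <- data.frame(ID = 1:3, A = c(4, 7, NA), B = c(8, NA, 5))
melted_df <- melt(df, na.rm = FALSE, value.name = "Value", id.vars = "ID")

# One row per ID/variable pair: 3 IDs x 2 measured columns = 6 rows
stopifnot(nrow(melted_df) == nrow(df) * 2)

# dcast() spreads the long data back out, one column per level of `variable`
wide_again <- dcast(melted_df, ID ~ variable, value.var = "Value")
print(wide_again)
```

Because the NA rows were kept during melting, the reshaped data frame has the same columns and values as the original df.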

How to remove NA entries with R’s melt()

You can drop rows with missing values from the melted result by setting na.rm = TRUE. Note that R’s logical constants are written in uppercase, so na.rm = True will not work.

Let’s define a new data frame:

df <- data.frame(ID = 1:4, A = c(3, 8, NA, 5), B = c(6, NA, 2, 9), C = c(NA, 7, 4, 1))

The data frame has the following form:

  ID  A  B  C
1  1  3  6 NA
2  2  8 NA  7
3  3 NA  2  4
4  4  5  9  1

Now we’ll restructure the data frame using melt():

melted_df <- melt(df, na.rm = TRUE, value.name = "Value", id.vars = "ID")

The new data frame melted_df now exists in a long format without NA values:

  ID variable Value
1  1        A     3
2  2        A     8
3  4        A     5
4  1        B     6
5  3        B     2
6  4        B     9
7  2        C     7
8  3        C     4
9  4        C     1
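To confirm what na.rm = TRUE actually removes, you can compare both variants of the call. This is a minimal sketch that assumes reshape2 is installed. The names with_na, without_na, dropped, na_cells and filtered are introduced here only for illustration.

```r
library(reshape2)

df <- data.frame(ID = 1:4, A = c(3, 8, NA, 5), B = c(6, NA, 2, 9), C = c(NA, 7, 4, 1))

with_na    <- melt(df, na.rm = FALSE, value.name = "Value", id.vars = "ID")
without_na <- melt(df, na.rm = TRUE,  value.name = "Value", id.vars = "ID")

# The number of dropped rows should match the number of NA cells
# in the measured columns A, B and C
dropped  <- nrow(with_na) - nrow(without_na)
na_cells <- sum(is.na(df[, c("A", "B", "C")]))
print(c(dropped = dropped, na_cells = na_cells))

# Filtering the full result afterwards yields the same values as na.rm = TRUE
filtered <- with_na[!is.na(with_na$Value), ]
stopifnot(identical(filtered$Value, without_na$Value))
```

The comparison deliberately looks only at the Value column, so it does not depend on how the rows are labeled in the printed output.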
Tip

If you want to learn about how to manipulate strings in R, take a look at the R substring() and R paste() tutorials in our Digital Guide.
