Dplyr
- An R package (part of the tidyverse) for manipulating and analyzing tabular data.
- Provides concise functions to filter/subset rows and to group and summarise data.
- A common workflow chains verbs such as filter(), group_by(), and summarise() with the pipe operator (%>%), which passes each result on to the next step.
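A minimal sketch of that workflow follows. It assumes a small made-up data frame named `sales` with hypothetical columns `region` and `amount`. The pipe passes the result of each verb as the first argument of the next verb.

```r
library(dplyr)

# Hypothetical example data; the names and values are illustrative only
sales <- data.frame(
  region = c("north", "south", "north", "east", "south"),
  amount = c(120, 80, 45, 200, 60),
  stringsAsFactors = FALSE
)

# Keep sales of at least 50, then total them per region
sales_by_region <- sales %>%
  filter(amount >= 50) %>%
  group_by(region) %>%
  summarise(total = sum(amount), n_sales = n())

print(sales_by_region)
```

The same chain can be read top to bottom as "take sales, keep the larger ones, group by region, summarise each group".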
Definition
dplyr is an R package for data manipulation and analysis. It is part of the tidyverse, a collection of packages designed for data science in R. dplyr provides a set of functions, often called verbs, for filtering rows, grouping data, and summarising datasets. It is widely used for everyday data analysis in R.
Explanation
One key feature of dplyr is filter(), which subsets the rows that meet given criteria. For example, to select only the houses with 3 bedrooms:
housing_prices %>%
  filter(bedrooms == 3)
This returns a new data frame containing only the rows where bedrooms == 3, so that analysis can focus on that subset. housing_prices itself is left unchanged.
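The following runnable sketch builds a tiny stand-in for `housing_prices`. The `price` column and all values are hypothetical. It also shows that several conditions separated by commas inside filter() are combined with a logical AND.

```r
library(dplyr)

# Hypothetical housing_prices data; values are made up for illustration
housing_prices <- data.frame(
  bedrooms = c(2, 3, 3, 4, 3),
  price    = c(210000, 285000, 310000, 420000, 265000)
)

# Rows with exactly 3 bedrooms
three_bed <- housing_prices %>%
  filter(bedrooms == 3)

# Comma-separated conditions must all be TRUE (logical AND)
three_bed_under_300k <- housing_prices %>%
  filter(bedrooms == 3, price < 300000)

print(three_bed)
print(three_bed_under_300k)
```

Rows where a condition evaluates to NA are dropped by filter(), as are rows where it is FALSE.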
Another useful capability is computing summary statistics per group with group_by() and summarise(). group_by() defines the groups, and summarise() reduces each group to a single row. For example, to compute the total amount spent by each customer:
customer_transactions %>%
  group_by(customer_id) %>%
  summarise(total_spent = sum(amount))
This produces a data frame with one row per customer_id and a total_spent column, which makes it easy to compare spending across customers.
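Below is a self-contained sketch using a small hypothetical `customer_transactions` table with one row per purchase. It computes more than one statistic per group. arrange() is another dplyr verb, used here only to sort the result; it is not part of the original example.

```r
library(dplyr)

# Hypothetical customer_transactions data: one row per purchase
customer_transactions <- data.frame(
  customer_id = c(1, 2, 1, 3, 2, 1),
  amount      = c(25.0, 40.0, 15.5, 60.0, 5.0, 9.5)
)

spend_per_customer <- customer_transactions %>%
  group_by(customer_id) %>%
  summarise(
    total_spent = sum(amount),  # sum of all purchases per customer
    n_purchases = n()           # number of rows in each group
  ) %>%
  arrange(desc(total_spent))    # largest spenders first

print(spend_per_customer)
```

Each additional name = expression pair inside summarise() adds another column, computed separately for every group.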
Examples
Filtering example
housing_prices %>%
  filter(bedrooms == 3)
Grouping and summarising example
customer_transactions %>%
  group_by(customer_id) %>%
  summarise(total_spent = sum(amount))
Related terms
- tidyverse
- R
- filter()
- group_by()
- summarise()