Combine two data frames
- bind_cols()
- inner_join()
- left_join()
- full_join()
Our toy data frames
# A tibble: 3 × 2
name band
<chr> <chr>
1 Mick Stones
2 John Beatles
3 Paul Beatles
#band_members2
band_instruments
# A tibble: 3 × 2
name plays
<chr> <chr>
1 John guitar
2 Paul bass
3 Keith guitar
bind_cols()
This function just smashes two dataframes together.
Only use if you are sure that the rows match up together.
bind_cols( band_instruments, band_members)
# A tibble: 3 × 4
name...1 plays name...3 band
<chr> <chr> <chr> <chr>
1 John guitar Mick Stones
2 Paul bass John Beatles
3 Keith guitar Paul Beatles
full_join()
Keeps all the data and by default matches by a column name.
full_join(band_instruments, band_members)
# A tibble: 4 × 3
name plays band
<chr> <chr> <chr>
1 John guitar Beatles
2 Paul bass Beatles
3 Keith guitar <NA>
4 Mick <NA> Stones
left_join()
Keeps all data in the left dataframe, adding rows from the right dataframe if there is a key match.
left_join(band_instruments, band_members)
# A tibble: 3 × 3
name plays band
<chr> <chr> <chr>
1 John guitar Beatles
2 Paul bass Beatles
3 Keith guitar <NA>
inner_join()
Keeps only the data when keys match in both data frames. This is helpful when you want to know what values the dataframes have in common.
inner_join(band_instruments, band_members)
# A tibble: 2 × 3
name plays band
<chr> <chr> <chr>
1 John guitar Beatles
2 Paul bass Beatles
Different Key names
If the keys in both data frames after different we must specify how to join the data.
left_join(
band_instruments2,
band_members,
by = c("artist" = "name")
)
# A tibble: 3 × 3
artist plays band
<chr> <chr> <chr>
1 John guitar Beatles
2 Paul bass Beatles
3 Keith guitar <NA>