I have been searching for a clean solution to building a date dimension so here I am writing my own. In R!
If you don’t speak Kimballese, a date dimension is a calendar. It contains a row for each day in a year as well as attributes about those dates. For example, a date dimension for 2023 would contain 365 rows and have columns such as week number, day of week, whether the day is a holiday etc. There’s really no correct way of building a date dimension as different organisations might have different ways of aggregating data across time but it is useful to have a “starter” table.
Building the date table
First, we’ll build the table using lubridate. My locale is Lithuanian, but your output should adapt to your locale:
library(lubridate)
Attaching package: 'lubridate'
The following objects are masked from 'package:base':
date, intersect, setdiff, union
# A tibble: 6 × 12
date full_date_description day_of_week day_of_week_name
<date> <chr> <dbl> <ord>
1 2011-01-01 2011 m. sausis 01 d. 6 šeštadienis
2 2011-01-02 2011 m. sausis 02 d. 7 sekmadienis
3 2011-01-03 2011 m. sausis 03 d. 1 pirmadienis
4 2011-01-04 2011 m. sausis 04 d. 2 antradienis
5 2011-01-05 2011 m. sausis 05 d. 3 trečiadienis
6 2011-01-06 2011 m. sausis 06 d. 4 ketvirtadienis
# ℹ 8 more variables: calendar_iso_week <dbl>, calendar_week <dbl>,
# calendar_month <dbl>, calendar_month_name <ord>, calendar_quarter <int>,
# calendar_quarter_name <chr>, calendar_year <dbl>, is_weekday <int>
If this were a typical date dimension post, all you would have to do is adjust the date range and either export the table or write to a database. But this date dimension could be improved:
It doesn’t contain bank holidays
If your use case is different enough from mine, you would have to edit more than the date range to get a desired result.
Let’s use Nager.Date for this exercise. The API returns a list of dates that are bank holidays with quite a few useful attributes but I am keeping just the English and local holiday names.
# A tibble: 14 × 3
date holiday_name_local holiday_name
<date> <chr> <chr>
1 2023-01-01 Naujieji metai New Year's Day
2 2023-02-16 Lietuvos valstybės atkūrimo diena The Day of Restoration o…
3 2023-03-11 Lietuvos nepriklausomybės atkūrimo diena Day of Restoration of In…
4 2023-04-09 Velykos Easter Sunday
5 2023-04-10 Antroji Velykų diena Easter Monday
6 2023-05-01 Tarptautinė darbo diena International Working Day
7 2023-06-24 Joninės, Rasos St. John's Day
8 2023-07-06 Valstybės diena Statehood Day
9 2023-08-15 Žolinė Assumption Day
10 2023-11-01 Visų šventųjų diena All Saints' Day
11 2023-11-02 Vėlinės All Souls' Day
12 2023-12-24 Šv. Kūčios Christmas Eve
13 2023-12-25 Šv. Kalėdos Christmas Day
14 2023-12-26 Šv. Kalėdos St. Stephen's Day
Make it a function
To turn the code above into a function we are making the following changes:
Pulling start and end dates, date format, week start number and country code as function arguments
Passing the years from the date dimension to the get_holiday function. I want to create tables that span multiple years so we’re using purrr::map_df to get holidays for multiple years.