---
title: "Time"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Time}
%\VignetteEngine{knitr::rmarkdown}
\usepackage[utf8]{inputenc}
---
```{r include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%",
echo = TRUE,
warning = FALSE,
eval = TRUE
)
```
```{r, echo = FALSE, eval = TRUE, include = FALSE, message = FALSE}
library(bdc)
```
#### **Introduction**
This module of the *bdc* package extracts the collection year from complete and legitimate date information whenever possible, and flags records whose collecting year is dubious (e.g., 07/07/10), illegitimate (e.g., 1300, 2100), or not supplied (e.g., 0 or NA).
#### **Installation**
See [**here**](https://brunobrr.github.io/bdc/#installation) for how to install the *bdc* package.
#### **Reading the database**
Read the database created in the [**Space**](https://brunobrr.github.io/bdc/articles/space.html) module of the *bdc* package. It is also possible to read any dataset containing the **required** fields to run the functions (more details [here](https://brunobrr.github.io/bdc/articles/integrate_datasets.html)).
```{r echo=TRUE, eval=FALSE}
database <-
readr::read_csv(here::here("Output/Intermediate/03_space_database.csv"))
```
```{r echo=FALSE, eval=TRUE}
database <-
readr::read_csv(system.file("extdata/outpus_vignettes/03_space_database.csv", package = "bdc"), show_col_types = FALSE)
```
```{r echo=FALSE, message=FALSE, warning=FALSE, eval=TRUE}
DT::datatable(
database[1:15,], class = 'stripe', extensions = 'FixedColumns',
rownames = FALSE,
options = list(
pageLength = 3,
dom = 'Bfrtip',
scrollX = TRUE,
fixedColumns = list(leftColumns = 2)
)
)
```
⚠️**IMPORTANT:**
The results of the VALIDATION tests used to flag data quality are appended to this database as separate fields and reported as TRUE (✅ ok) or FALSE (❌ check carefully).
#### **1 - Records lacking event date information**
*VALIDATION*. This function flags records lacking event date information (e.g., empty or NA).
```{r}
check_time <-
bdc_eventDate_empty(data = database, eventDate = "verbatimEventDate")
```
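As a rough illustration of what "lacking event date information" means, the rule can be approximated in base R. This is only a sketch under stated assumptions, not the package's implementation: the toy data, the `has_event_date` column name, and the decision to treat whitespace-only strings as empty are all hypothetical.

```r
# Toy records (hypothetical data, not taken from the bdc example database)
records <- data.frame(
  id = 1:4,
  verbatimEventDate = c("2010-07-07", "", NA, "  "),
  stringsAsFactors = FALSE
)

# TRUE = a date was supplied; FALSE = empty, blank, or NA
# (treating whitespace-only strings as empty is an assumption of this sketch)
event_date <- records$verbatimEventDate
records$has_event_date <- !is.na(event_date) & trimws(event_date) != ""

records$has_event_date
```

Only the first record passes the check here; the other three would need a closer look.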
#### **2 - Extract year from event date**
*ENRICHMENT*. This function extracts four-digit years from unambiguously interpretable collecting dates.
```{r}
check_time <-
bdc_year_from_eventDate(data = check_time, eventDate = "verbatimEventDate")
```
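The idea of pulling a four-digit year out of a free-text date can be sketched with a regular expression in base R. The helper `extract_year()` below is hypothetical and much simpler than whatever the package does internally; it only assumes that an unambiguous year appears as a standalone run of four digits.

```r
# Hypothetical helper: return the first standalone four-digit run as an integer,
# or NA when no such run exists (e.g., two-digit years such as "07/07/10")
extract_year <- function(x) {
  m <- regexpr("\\b\\d{4}\\b", x, perl = TRUE)
  year <- rep(NA_integer_, length(x))
  hit <- !is.na(m) & m > 0
  # regmatches() returns only the successful matches, in order
  year[hit] <- as.integer(regmatches(x, m))
  year
}

extract_year(c("2010-07-07", "07/07/10", "15 March 1987", NA, ""))
```

Note that this sketch would also return implausible values such as 1300; screening those is a separate range check.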
#### **3 - Records with out-of-range collecting year**
*VALIDATION*. This function identifies records with illegitimate or potentially imprecise collecting years. A year can be out of range (e.g., in the future) or earlier than a threshold year supplied by the user (e.g., 1900). Older records are more likely to be imprecise because their coordinates are often derived from locality descriptions.
```{r}
check_time <-
bdc_year_outOfRange(data = check_time,
eventDate = "year",
year_threshold = 1900)
```
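The range rule itself reduces to a simple comparison. The base R sketch below uses hypothetical toy years and leaves missing years as `NA`; how the package itself flags missing years is not shown in this vignette, so that choice is an assumption.

```r
# Toy years (hypothetical data)
year <- c(1850L, 1995L, 2100L, NA)

year_threshold <- 1900L
current_year <- as.integer(format(Sys.Date(), "%Y"))

# TRUE = year lies between the threshold and the current year (inclusive)
# NA years stay NA in this sketch
in_range <- year >= year_threshold & year <= current_year
in_range
```

Lowering `year_threshold` keeps more historical records at the cost of including records whose locality information may be less precise.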
#### **Report**
Here we create a column named **.summary** summing up the results of all **VALIDATION** tests. This column is **FALSE** when a record is flagged as FALSE in any data quality test (❌ check carefully: potentially invalid or suspect record).
```{r}
check_time <- bdc_summary_col(data = check_time)
```
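Conceptually, the summary is a row-wise logical AND over the test columns. A minimal base R sketch follows; the column names `.test_a` and `.test_b` are hypothetical, and the sketch assumes that test columns can be recognized by a leading dot.

```r
# Toy flags (hypothetical column names)
flags <- data.frame(
  id = 1:3,
  .test_a = c(TRUE, TRUE, FALSE),
  .test_b = c(TRUE, FALSE, FALSE)
)

# Select the test columns by their leading dot
flag_cols <- grep("^\\.", names(flags), value = TRUE)

# A record passes only if it passes every test
flags$.summary <- Reduce(`&`, flags[flag_cols])
flags$.summary
```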
Here we create a report summarizing the results of all tests of the *bdc* package. The report can be saved automatically by setting `save_report = TRUE`.
```{r eval=FALSE}
report <-
bdc_create_report(data = check_time,
database_id = "database_id",
workflow_step = "time",
save_report = FALSE)
report
```
```{r echo=FALSE, eval=TRUE}
report <-
readr::read_csv(
system.file("extdata/outpus_vignettes/04_Report_time.csv",
package = "bdc"),
show_col_types = FALSE
)
```
```{r echo=FALSE, message=FALSE, warning=FALSE, eval=TRUE}
DT::datatable(
report, class = 'stripe', extensions = 'FixedColumns',
rownames = FALSE,
options = list(
# pageLength = 5,
dom = 'Bfrtip',
scrollX = TRUE,
fixedColumns = list(leftColumns = 2)
)
)
```
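The kind of tally such a report contains can be approximated by counting the FALSE values in each test column. This is a sketch only: the toy flags and the output column names (`test`, `n_flagged`, `perc_flagged`) are hypothetical and need not match the columns produced by `bdc_create_report()`.

```r
# Toy flags (hypothetical column names)
flags <- data.frame(
  .test_a = c(TRUE, TRUE, FALSE, TRUE),
  .test_b = c(TRUE, FALSE, FALSE, TRUE)
)

# Number of records flagged (FALSE) by each test
n_flagged <- vapply(flags, function(col) sum(!col), integer(1))

report_sketch <- data.frame(
  test = names(n_flagged),
  n_flagged = unname(n_flagged),
  perc_flagged = round(100 * unname(n_flagged) / nrow(flags), 1)
)
report_sketch
```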
#### **Figures**
Here we create figures (bar plots and histograms) to make the results of the data quality tests easier to interpret. See some examples below. Figures can be saved automatically by setting `save_figures = TRUE`.
```{r eval=FALSE}
figures <-
bdc_create_figures(data = check_time,
database_id = "database_id",
workflow_step = "time",
save_figures = FALSE)
# Check figures using
figures$time_year_BAR
```
![Number of records sampled over the years](https://raw.githubusercontent.com/brunobrr/bdc/master/vignettes/images/time_year_BAR.png)
![Summary of all tests of the time module; note that some databases lack event date information](https://raw.githubusercontent.com/brunobrr/bdc/master/vignettes/images/time_.summary_BAR.png)
![Summary of all validation tests of the bdc package](https://raw.githubusercontent.com/brunobrr/bdc/master/vignettes/images/time_summary_all_tests_BAR.png)
#### **Saving a "raw" database**
Save the original database containing the results of all data quality tests appended in separate columns. You can use [qs::qsave()]{.underline} instead of `write_csv()` to save a large database in a compressed format.
```{r eval=FALSE}
check_time %>%
readr::write_csv(.,
here::here("Output", "Intermediate", "04_time_database.csv"))
```
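For large databases, the compressed alternative from the *qs* package might look like the sketch below. It uses a hypothetical toy table and a temporary file rather than the project paths used in this vignette, and it runs only if *qs* is installed.

```r
# Hypothetical toy table standing in for check_time
toy <- data.frame(id = 1:3, year = c(1995L, 2004L, NA))

if (requireNamespace("qs", quietly = TRUE)) {
  path <- tempfile(fileext = ".qs")
  qs::qsave(toy, path)         # write in a compressed binary format
  restored <- qs::qread(path)  # read it back
  all.equal(restored, toy)
}
```

Unlike CSV, this format preserves column types, so nothing needs to be re-parsed on reading.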
#### **Filtering the database**
Let's remove potentially erroneous or suspect records flagged by the data quality tests applied in all modules of the *bdc* package to obtain a "clean", "fitness-for-use" database. Note that **25%** (45 out of 180 records) of the original records were considered fit for use after the data-cleaning process.
```{r}
output <-
check_time %>%
dplyr::filter(.summary == TRUE) %>%
bdc_filter_out_flags(data = ., col_to_remove = "all")
```
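The two steps above (keep the records that pass, then drop the test columns) can be sketched in base R as follows. The toy data are hypothetical, and the sketch assumes that every test column, including `.summary`, starts with a dot.

```r
# Toy checked database (hypothetical data and column names)
checked <- data.frame(
  id = 1:4,
  year = c(1995L, 2004L, NA, 1850L),
  .test_a = c(TRUE, TRUE, FALSE, TRUE),
  .summary = c(TRUE, TRUE, FALSE, FALSE)
)

# Keep rows that passed every test and drop all dot-prefixed columns
is_flag_col <- startsWith(names(checked), ".")
clean <- checked[checked$.summary, !is_flag_col, drop = FALSE]

# Share of the original records retained
pct_kept <- 100 * nrow(clean) / nrow(checked)
pct_kept
```

Computing the retained share this way gives the same kind of figure as the percentage quoted above for the example database.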
#### **Saving a clean "fitness-for-use" database**
You can use [qs::qsave()]{.underline} instead of write_csv to save a large database in a compressed format.
```{r eval=FALSE}
# use qs::qsave() to save the database in a compressed format and then qs::qread() to load the database
output %>%
readr::write_csv(.,
here::here("Output", "Intermediate", "05_cleaned_database.csv"))
```
```{r echo=FALSE, eval=TRUE}
output <-
readr::read_csv(system.file("extdata/outpus_vignettes/05_cleaned_database.csv", package = "bdc"), show_col_types = FALSE)
```
```{r echo=FALSE, message=FALSE, warning=FALSE, eval=TRUE}
DT::datatable(
output[1:15,], class = 'stripe', extensions = 'FixedColumns',
rownames = FALSE,
options = list(
pageLength = 3,
dom = 'Bfrtip',
scrollX = TRUE,
fixedColumns = list(leftColumns = 2)
)
)
```