Impact evaluation of Slovak active labour market policy programmes

Prepared by Soňa Dulíková, Lukáš Lafférs and Miroslav Štefánik (miroslav.stefanik@savba.sk)
Institute of Economic Research, Slovak Academy of Sciences
This work was supported by the Slovak Research and Development Agency under the contract no. APVV-17-0329.

The Support for hiring disadvantaged job seekers

This is an automated report providing evidence on Support for hiring disadvantaged job seekers, one of the active labour market policy (ALMP) programmes, implemented in Slovakia between during 2017.
The Support for hiring disadvantaged job seekers is provided based on the § 50 of the Act on Employment Services Col. 2004/5.

if(qualitative_data_condition) {
  knitr::knit_exit()
}

1. The description of the programme

Based on the Labour Market Policy Database (LMP) administrated by DG Employment of the European Commission, the Support for hiring disadvantaged job seekers is classified as an “Employment incentives” type of ALMP programme, with the programme specific code 41_SK6.

Support for hiring disadvantaged job seekers aims: Employers incentive to create generated job for disadvantaged job seekers.

Beneficiaries are registered disadvantaged job seekers.

The Eligible are disadvantaged job seekers that have been registered in the register of job seekers for at least 3 months. disadvantaged job seekers: a) a citizen under the age of 26 years who has completed a relevant level of education in full-time study within the last two years and before registration as a jobseeker had had no regular paid employment (hereinafter a graduate ). b) a citizen over the age of 50 years, c) a citizen who has been registered in the register of jobseekers for at least 12 consecutive months(hereinafter a long-term unemployed citizen ), d) a citizen who has completed education of a lower level than secondary vocational education pursuant to applicable legislation,13c) e) a citizen who did not have regular paid employment for at least 12 consecutive calendar months before the start of his/her registration in the register of jobseekers, f) a third-country national who has been granted asylum13d) or who has been given subsidiary protection,13e) g) a citizen who lives as a single adult person with one or more persons dependent on his/her care, or who cares for one or more school-age children, 13f) h) a citizen with a disability.

Implementation: The contribution for employing support a disadvantaged job seeker who has been registered in the register of job seekers for at least 3 months may be granted to the employer employing the disadvantaged job seeker in a generated job. The employer shall be obliged to maintain the job in respect of which a contribution is provided at least for 24 calendar months. The contribution shall be provided in monthly intervals and it is derived from the amount of the total price of labour calculated from the average gross wage of an employee in the Slovak Republic s economy for the previous calendar year. The sum and duration of the contribution depend on the type of the region that is entitled to receive state assistance, on the average rate of unemployment in the relevant district and on the legal status and the subject of business of the employer.

1.1 Participants and expenditures

#Preparing of participants and expenditures tables 
partSK <- subset(part, geo == 'SK' & year == format(as.Date(params$ep_start),"%Y") & age == 'TOTAL' & sex == 'T' & stk_flow == 'ENT' )
names(programesSK)[1] <- 'lmp_type'
partSK <- merge(partSK, programesSK, by = 'lmp_type', all = TRUE)
partSK <- subset(partSK, substr(Classification, 1, 1) == '2' | substr(Classification, 1, 1) == '4' | substr(Classification, 1, 1) == '5' | substr(Classification, 1, 1) == '6' | substr(Classification, 1, 1) == '7')
partSK_datapie <- subset(partSK, lmp_type == lmp_code | lmp_type == '2' | lmp_type == '4' | lmp_type == '5' | lmp_type == '6' | lmp_type == '7')
partSK_datapie$value <- ifelse(is.na(partSK_datapie$value), sum(partSK[substr(partSK$lmp_type, 1, 6) == substr(lmp_code, 1, 6) , 'value'], na.rm = TRUE), partSK_datapie$value)


expSK <- subset(exp, year == format(as.Date(params$ep_start),"%Y"))
expSK <- merge(expSK, programesSK, by = 'lmp_type', all = TRUE)
expSK <- subset(expSK, substr(Classification, 1, 1) == '2' | substr(Classification, 1, 1) == '4' | substr(Classification, 1, 1) == '5' | substr(Classification, 1, 1) == '6' | substr(Classification, 1, 1) == '7')
expSK_datapie <- subset(expSK, lmp_type == lmp_code | lmp_type == '2' | lmp_type == '4' | lmp_type == '5' | lmp_type == '6' | lmp_type == '7')
expSK_datapie$value <- ifelse(is.na(expSK_datapie$value), sum(expSK[substr(expSK$lmp_type, 1, 6) == substr(lmp_code, 1, 6) , 'value'], na.rm = TRUE), expSK_datapie$value)

#Share of expenditures and participants at programm

partSK_per <- partSK_datapie %>% select(lmp_type, value) %>% 
  subset(lmp_type != subset(qualitative, qualitative$almp == params$measure)$lmp_type.x[1]) %>%
  mutate(per = paste(round(100 * value / sum(value),2),'%'))

expSK_per <- expSK_datapie %>% select(lmp_type, value) %>% 
  subset(lmp_type != subset(qualitative, qualitative$almp == params$measure)$lmp_type.x[1]) %>%
  mutate(per = paste(round(100 * value / sum(value),2),'%'))

program_part <-  data.frame(lmp_type  = c('total', as.character(subset(qualitative, qualitative$almp == params$measure)$lmp_type.x[1])),
                            participants = c(sum(partSK_per$value), subset(partSK_datapie, partSK_datapie$almp == params$measure)$value[1]),
                            per = c('100 %', paste(round(subset(partSK_datapie, partSK_datapie$almp == params$measure)$value[1] / sum(partSK_per$value) * 100,2),'%'))
                            )

program_exp <-  data.frame(lmp_type  = c('total', as.character(subset(qualitative, qualitative$almp == params$measure)$lmp_type.x[1])),
                            expenditures = c(sum(expSK_per$value), subset(expSK_datapie, expSK_datapie$almp == params$measure)$value[1]),
                            per = c('100 %', paste(round(subset(expSK_datapie, expSK_datapie$almp == params$measure)$value[1] / sum(expSK_per$value) * 100,2),'%'))
                            )


type_share_par <- partSK_per$per[partSK_per$lmp_type == as.character(subset(qualitative, qualitative$almp == params$measure)$Classification[1])] 
prog_share_par <- program_part$per[program_part$lmp_type == as.character(subset(qualitative, qualitative$almp == params$measure)$lmp_type.x[1])]
prog_numb_par <- program_part$participants[program_part$lmp_type == as.character(subset(qualitative, qualitative$almp == params$measure)$lmp_type.x[1])]

type_share_exp <- expSK_per$per[expSK_per$lmp_type == subset(qualitative, qualitative$almp == params$measure)$Classification[1]]
prog_share_exp <- program_exp$per[program_exp$lmp_type == as.character(subset(qualitative, qualitative$almp == params$measure)$lmp_type.x[1])]
prog_numb_exp <- program_exp$expenditures[program_exp$lmp_type == as.character(subset(qualitative, qualitative$almp == params$measure)$lmp_type.x[1])]

The left pie chart Participants displays the share of participants in ALMP programmes grouped by ALMP types of the LMP classification. Shares are based on LMP Database stock figures reported for the calendar year 2017.

The right pie chart Expenditure display the share of expenditure using the same LMP typology as the Participants pie chart. Comparing the share on the total pie of participants to the share on the total pie of expenditures provides an assessment of unit costs of the programme relative to the average. If the percentage of participants is higher than on total spending, then the programme is less expensive than the average ALMP programme (and vice versa).

Graph 1: Resources flowing to Support for hiring disadvantaged job seekers during 2017

#Preparing data for Pie Chart
#PARTICIPANTS
piechart_P <- select(partSK_datapie, lmp_type, value)

piechart_P$lmp_type <- ifelse(piechart_P$lmp_type == '2' | piechart_P$lmp_type == '4' | piechart_P$lmp_type == '5' | piechart_P$lmp_type == '6' | piechart_P$lmp_type == '7',       
                              qualitative$class[match(piechart_P$lmp_type, qualitative$Classification)], 
                              paste(qualitative$class[match(piechart_P$lmp_type, qualitative$lmp_type.x)], 
                                    qualitative$Labour.market.services[match(piechart_P$lmp_type, qualitative$lmp_type.x)], 
                                    sep =': ' ))

piechart_P <- filter(piechart_P, !duplicated(piechart_P$lmp_type))
piechart_P$lmp_type <- ifelse(piechart_P$lmp_type == 'Training: [Component] Projects and programmes - Projects - Education and training','Training: [Component] Projects and programmes - Projects - Education and training Repas',piechart_P$lmp_type)

piechart_P <- piechart_P[order(piechart_P$lmp_type),]
piechart_P$focus <- ifelse(
  piechart_P$lmp_type == paste(
    subset(qualitative, qualitative$almp == params$measure)$class[1],
    subset(qualitative, qualitative$almp == params$measure)$Labour.market.services[1],
    sep =': '), 
  0.05,
  0)

#EXPENDITURES 
piechart_E <- select(expSK_datapie, lmp_type, value)

piechart_E$lmp_type <- ifelse(piechart_E$lmp_type == '2' | piechart_E$lmp_type == '4' | piechart_E$lmp_type == '5' | piechart_E$lmp_type == '6' | piechart_E$lmp_type == '7',       
                              qualitative$class[match(piechart_E$lmp_type, qualitative$Classification)], 
                              paste(qualitative$class[match(piechart_E$lmp_type, qualitative$lmp_type.x)], 
                                    qualitative$Labour.market.services[match(piechart_E$lmp_type, qualitative$lmp_type.x)], 
                                    sep =': ' ))

piechart_E <- filter(piechart_E, !duplicated(piechart_E$lmp_type))
piechart_E$lmp_type <- ifelse(piechart_E$lmp_type == 'Training: [Component] Projects and programmes - Projects - Education and training','Training: [Component] Projects and programmes - Projects - Education and training Repas',piechart_E$lmp_type)

piechart_E <- piechart_E[order(piechart_E$lmp_type),]
piechart_E$focus <- ifelse(
  piechart_E$lmp_type == paste(
    subset(qualitative, qualitative$almp == params$measure)$class[1],
    subset(qualitative, qualitative$almp == params$measure)$Labour.market.services[1],
    sep =': '), 
  0.05,
  0)


## PIE CHART
#PARTICIPANTS
piechart_P$lmp_type <- gsub("(.{25,}?)\\s", "\\1\n", piechart_P$lmp_type)
ev_measure <- gsub("(.{25,}?)\\s", "\\1\n",
                   paste(subset(qualitative, qualitative$almp == params$measure)$class[1],
                         subset(qualitative, qualitative$almp == params$measure)$Labour.market.services[1],
                         sep =': '))

ev_type <- gsub("(.{25,}?)\\s", "\\1\n",
                subset(qualitative, qualitative$almp == params$measure)$class[1])

piechart_P$value <- ifelse(piechart_P$lmp_type == ev_type, piechart_P$value - piechart_P[piechart_P$lmp_type == ev_measure,"value"], piechart_P$value)

piechart_P <- piechart_P %>% 
  mutate(per = paste0(round(100 * value / sum(value),2), "%", by = ""),
         total = sum(value),
         end_angle = 2*pi*cumsum(value)/total,      
         start_angle = lag(end_angle, default = 0),   
         mid_angle = 0.5*(start_angle + end_angle))

rpie <- 1
rlabel <- 0.6 * rpie 

plot_piechart_P <- ggplot(piechart_P) + 
  theme_no_axes() + 
  theme_void() +
  coord_fixed()+
  geom_arc_bar(aes(x0 = 0, y0 = 0, r0 = 0, r = 1, amount = value, 
                   fill = lmp_type, explode  = focus),
               data = piechart_P, stat = 'pie',color='white')+
  ggtitle('Participants') +
  geom_text(aes(x = rlabel*sin(mid_angle), y = rlabel*cos(mid_angle), label = ifelse(lmp_type == ev_type | lmp_type == ev_measure, per,"")),
            size = 4, position=position_jitter(width=0,height=0.3))+
  theme(plot.title = element_text(color="black", size=18, face="bold.italic"),
        legend.title = element_text(size=15, face="bold"),
        legend.text = element_text(size=12),
        legend.key.height=unit(1.1, "cm")) +
  guides(fill=guide_legend(title="Classification of LMP\n and measure")) +
  scale_fill_manual(values=c(ifelse(piechart_P$value != 0 &  piechart_P$lmp_type == ev_measure,  
                                    "steelblue3", 
                                    ifelse(piechart_P$lmp_type == ev_type, 
                                           "steelblue1", 
                                           gray.colors(6, start = 0.8)))))

#EXPENDITURES
piechart_E$lmp_type <- gsub("(.{25,}?)\\s", "\\1\n", piechart_E$lmp_type)

piechart_E$value <- ifelse(piechart_E$lmp_type == ev_type, piechart_E$value - piechart_E[piechart_E$lmp_type == ev_measure,"value"], piechart_E$value)

piechart_E <- piechart_E %>% 
  mutate(per = paste0(round(100 * value / sum(value),2), "%", by = ""),
         total = sum(value),
         end_angle = 2*pi*cumsum(value)/total,      
         start_angle = lag(end_angle, default = 0),   
         mid_angle = 0.5*(start_angle + end_angle))

rpie <- 1
rlabel <- 0.6 * rpie 

plot_piechart_E <- ggplot(piechart_E) + 
  theme_no_axes() + 
  theme_void() +
  coord_fixed()+
  geom_arc_bar(aes(x0 = 0, y0 = 0, r0 = 0, r = 1, amount = value, 
                   fill = lmp_type, explode  = focus),
               data = piechart_E, stat = 'pie',color='white')+
  ggtitle('Expenditure') +
  geom_text(aes(x = rlabel*sin(mid_angle), y = rlabel*cos(mid_angle), label = ifelse(lmp_type == ev_type | lmp_type == ev_measure, per,"")),
            size = 4, position=position_jitter(width=0,height=0.3))+
  theme(plot.title = element_text(color="black", size=18, face="bold.italic"),
        legend.title = element_text(size=15, face="bold"),
        legend.text = element_text(size=12),
        legend.key.height=unit(1.1, "cm")) +
  guides(fill=guide_legend(title="Classification of LMP\n and measure")) +
  scale_fill_manual(values=c(ifelse(piechart_E$value != 0 &  piechart_E$lmp_type == ev_measure,  
                                    "steelblue3", 
                                    ifelse(piechart_E$lmp_type == ev_type, 
                                           "steelblue1", 
                                           gray.colors(6, start = 0.8)))))

Based on the LMP database, 2 738 individuals participated in the Support for hiring disadvantaged job seekers during 2017. This accounted for the 1.42 % percent of the total number of participants in all Slovak ALMP programmes (LMP types 2-7), while the share of the programme on the total expenditure was 3.62 %. The total roofing LMP type Employment incentives presents 60.68 % percent of the total Slovak ALMP (type 2-7) expenditure and 61.68 % percent of the total ALMP participants.

1.2 The Support for hiring disadvantaged job seekers in the context of ALMPs in Slovakia

Using administrative data, we first provide a picture of the importance of the Support for hiring disadvantaged job seekers in the context of Slovak ALMP. The following flowchart displays flow of job seekers registered into the database of unemployment during . Flows are based on the movement of these individuals during the two years following their registration; this period is divided into six months sub-periods (0/6/12/18/24). For each of these sub-periods, we observe the flows of registered jobseekers to employment, or their de-registration for another reason.
Job seekers can also move to one of the ALMP programmes. The lines highlighted in red represent the flow of jobseekers participating in measure P050, Support for hiring disadvantaged job seekers.

Graph 2: The Support for hiring disadvantaged job seekers in the structure of the flows of jobseekers registered during 2017

The next table shows full names of programmes shown in the graph above.

Table 1: Explanatory table for Graph 2

Sankey_description <- select(qualitative, almp, Labour.market.services, Labour.market.services_SK)
Sankey_description[32,1] <- 'P054'
Sankey_description<- rbind(Sankey_description, Sankey_description[32,])
Sankey_description[nrow(Sankey_description),1] <- 'P54O'
Sankey_description[28,1] <- 'P54D'

Sankey_tabel1 <- data.frame(Measures = c(setdiff(nastroj_kod, c("another reason", "employed"))))
Sankey_tabel1$'Name of programme' <- Sankey_description$Labour.market.services[match(Sankey_tabel1$Measures, Sankey_description$almp)]

Sankey_tabel1  %>% kbl(format = 'html', booktabs = TRUE , align = 'c', row.names = FALSE)%>%
  kable_classic('hover', full_width = FALSE)%>%
  column_spec(1,  border_right = TRUE)

Measures	Name of programme
P54R	[Component] Projects and programmes - Projects - Education and training Repas
P053	Commuting allowance
P054	Projects and programmes
P051	Support for graduate work experience
P54O	[Component] Projects and programmes - Self-employment
P060	Reimbursement of operating costs of sheltered workshops and workplaces
P54K	[Component] Projects and programmes - Projects - Education and training
P052	Work in minor services for municipalities or self-governing regions
P52A	Voluntary work
P049	Self-employment grant
P050	Support for hiring disadvantaged job seekers

## Preparing of df
# we observe only  young people under XX (age) which inflow in XXXX (entry year) 
df_r <- subset(df, age <= age_group_max)
df_r <- subset(df_r, format(as.Date(df_r$entry),"%Y")== format(as.Date(params$ep_start),"%Y"))
#df_r <- df_r %>% mutate_all(na_if,"") #if cells are empty -> change it to NAs
df_r <- subset(df_r, healthy < 3)

# DOVOD VYRADENIA 
dovod_vyradenia = c('V01','V02','V03','V1','V12','V15') #zamestnali sa 

# Opatrenie P032 vyhodiť (ak sa budú meniť aj iné opatrenia ako napr. 54R a 54Rp tak tu sa to opraví (%in% c()))
delete <- setdiff(c('P032','P54P','P54D'),params$measure)
Salmps <- subset(almps, !nastroj  %in% delete)

## Spojenie DF a ALMPS 
df_almps <- merge(Salmps, df_r, by = 'klient_id')

## Vyfiltruj iba tie klient_id. pri ktorých nástroj je params$measure
partic_measure <- df_almps[df_almps$nastroj == params$measure, 'klient_id']
df_almps <- filter(df_almps, klient_id %in% partic_measure) #tu su tí, ktorí boli na aj na params$measure ale aj na iných opatreniach 

# Podmienka prieniku času pri databáze nezamestnaných a v zúčastnili sa evaluated measure
df_almps_measure <- subset(df_almps, entry <= entrya & exita <= (exit +7) & nastroj == params$measure) # tu su tí, ktorí išli v skúmanom čase do opatrenia params$measure 
##### toto je moja základňa môj prvý stĺpec 

##### Podmienka prieniku času pri databáze nezamestnaných a iných programoch ako evaluated measue ale zároveň sú to tí, ktorí už niekedy na evaluated measure už boli 
df_almps_other <- subset(df_almps, entry <= entrya & exita <= (exit +7) & nastroj != params$measure)

# upravíme si dáta ktoré budeme používať pri grafe
df_almps_measure <- select(df_almps_measure, klient_id, entry, exit, entrya, exita, nastroj, dovod_vyradenia_kod)
df_almps_other <- select(df_almps_other, klient_id, entry, exit, entrya, exita, nastroj, dovod_vyradenia_kod)


#smojím iba tých ktorí boli aj v evaluated measure aj v iných opatreniach, all.x = TRUE lebo chceme aj tých ktorý boli iba na evaluated measure (nemuseli sa zúčastniť aj iných programov)
flow <- merge(df_almps_measure, df_almps_other, by = 'klient_id', all.x = TRUE) 

#dni od začatia opatrenia entrya.x po začatie iného opatrenia entrya.y alebo po zamestnanie/odchod z iného dôvodu exit.x
flow$days <- ifelse(is.na(flow$nastroj.y),
                    as.numeric(difftime(flow$exit.x, flow$entrya.x, units = 'days')),
                    as.numeric(difftime(flow$entrya.y, flow$entrya.x, units = 'days')))

#ak je dovod vyradenia NA ale zúčastnili sa na opatrení 
flow$dovod_vyradenia_kod.y <- ifelse(is.na(flow$dovod_vyradenia_kod.y) & !is.na(flow$nastroj.y), 'V01', flow$dovod_vyradenia_kod.y)

#vyfiltruj tých ktorý boli aj na evaluated measure aj na inom opatrení alebo sa zamestnali a days nie je záporné 
#podmienka -> entrya do ďalšieho projektu musí byť väčšie ako entrya do evaluated measure
#flow$days nemôže byť záporné 
flow <- filter(flow, days >= 0)

#nastroj -> ak dovod vyradenia sa rovna nejakému prvku z vektoru dovodov vyradenia tak -> employed inak another reason
flow  <- flow %>% 
  mutate(nastroj.y = case_when(is.na(flow$nastroj.y) & flow$dovod_vyradenia_kod.x %in% dovod_vyradenia ~ 'employed',
                               is.na(flow$nastroj.y) & !flow$dovod_vyradenia_kod.x %in% dovod_vyradenia ~ 'another reason',
                               !is.na(flow$nastroj.y) ~ flow$nastroj.y)
  )


#dataframe, ktorý budem používať pri tvorbe grafu
Sankey_measure <- flow %>% select(nastroj.x, nastroj.y, days) %>% 
           mutate(month = ceiling(days/30.417))

Sankey_measure <- Sankey_measure %>%mutate(
  time = case_when(
    Sankey_measure$month %in% seq(0,6,1)  ~ 6,
    Sankey_measure$month %in% seq(7,12,1)  ~ 12, 
    Sankey_measure$month %in% seq(13,18,1)  ~ 18,
    Sankey_measure$month %in% seq(19,100,1)  ~ 24,
  ) 
)

Sankey_measure$sources <- ifelse(Sankey_measure$time == 6 | Sankey_measure$time == 12 |
                                   Sankey_measure$time == 18 | Sankey_measure$time == 24, 
                                 Sankey_measure$time - 6, 
                                 Sankey_measure$time)


#zosumarizuj, koľký mladí išli do ktorého opatrenia, zamestnali sa alebo odišli z registra z iných dôvodov
San_measure <- Sankey_measure %>% select(nastroj.y, time, sources) %>%
  group_by(nastroj.y, time, sources) %>% summarise(num = n(), .groups = 'drop') %>%
  rename(nastroj = nastroj.y)


# rozdeľ opatrenia, na tie ostatné almps - OTHER ALMPS 
nastroj_kod <- c('another reason','employed')
NastrojKod <- San_measure[!San_measure$nastroj %in% nastroj_kod,]  %>%  group_by(nastroj) %>% summarise(num = sum(num), .groups = 'drop') %>%
  mutate(perc = num*100 / sum(num),
         cut = case_when(perc >= 5 ~ 1,
                         perc < 5 ~ 0))
nastroj_kod <- c(nastroj_kod, NastrojKod$nastroj[NastrojKod$cut == '1'])

San_aplmps <- subset(San_measure, nastroj %in% nastroj_kod | nastroj == params$measure)
San_other_aplmps <- subset(San_measure, !nastroj %in% nastroj_kod & !nastroj == params$measure)

San_other_aplmps <- San_other_aplmps %>%  group_by(sources , time) %>% summarise(num = sum(num), .groups = 'drop') 
San_other_aplmps$nastroj <- 'OTHER ALMPS'
San_other_aplmps <- relocate(San_other_aplmps, c(nastroj, time), .before = sources,)

San_measure <- rbind(San_aplmps, San_other_aplmps)
remove(San_aplmps, San_other_aplmps)


# uzly grafu (jedinečné), musia tu byť všetky opatrenia
node_m <- data.frame(
  name=c(as.character(San_measure$nastroj), as.character(San_measure$sources))%>% unique()
)

# definovanie koľko registrovaných bude medzi tými rokmi  
velky_df <- data.frame()
for (i in seq(6,24,6)){
  pocet <- San_measure %>%  group_by('sources' = sources >= i) %>% summarise(num = sum(num), .groups = 'drop') 
  pocet <- subset(pocet, sources == TRUE)
  pocet$sources <- i
  velky_df <- rbind(velky_df, pocet)
}

# musím si velky_df prisposobiť tak, aby roky boli ako nodes aby som to mohla spojiť s dataframe San s ktorým potom budem ďalej robiť graf
# preto sources budu ako nastroj -> aby som spravila nodes, years su sources ale sources su years -1 v skutočnosti (v san grafe)
colnames(velky_df) <- c('nastroj', 'num')
velky_df$time <- velky_df$nastroj
velky_df$sources <- San_measure$sources[match(velky_df$time, San_measure$time)] 
velky_df <- relocate(velky_df, num, .after = sources)

San_measure <- rbind(San_measure, velky_df)

#urobím IDsources a ID target podľa uzlov aby garf vedel ten flow medzi jednotlivími uzlami 
San_measure$IDsource <- match(San_measure$sources, node_m$name)-1 
San_measure$IDtarget <- match(San_measure$nastroj, node_m$name)-1

#Color 
time <- seq(0,24,6)
NOALMP <- c('another reason', 'employed')
node_m <- node_m %>% mutate(group = case_when(node_m$name %in% NOALMP ~ 'A',
                                              !node_m$name %in% NOALMP & !node_m$name %in% time ~ 'B',
                                              node_m$name %in% time ~ 'C'
)
)

San_measure$group <- 'type_a'

my_color <- 'd3.scaleOrdinal() .domain(["type_a", "A","B", "C"]) .range(["lightgray", "darkseagreen", "thistle", "rosybrown", "red"])'

San_measure <- as.data.frame(San_measure)

Sankey_ev.measure <- sankeyNetwork(Links = San_measure, Nodes = node_m,
                                   Source = "IDsource", Target = "IDtarget",
                                   Value = "num", NodeID = "name", 
                                   sinksRight=F, fontSize = 14,
                                   fontFamily = "sans-serif",
                                   width = 900,
                                   colourScale=my_color, LinkGroup="group", NodeGroup="group",
                                   nodePadding=10)

condition_for_Sankeyplot <- length(unique(San_measure$IDsource))>=1 && length(unique(San_measure$IDtarget))>=1 && !any(is.na(San_measure$IDsource)) && !any(is.na(San_measure$IDtarget))

The second flow chart shows the further state of the participants of the evaluated programme (further branching of the red line in graph 2) and what happened with them after participation in the evaluated programme. After participation in the programme, the participants can be employed, de-registered for another reason, or they can participate in another ALMP programme. We observed this behaviour for periods of two years after their participation. This period is divided into six months sub-periods (0/6/12/18).

Graph 2.2: Flowchart of participants in evaluation progname participated during evaluation period.

if(condition_for_Sankeyplot){
  Sankey_ev.measure
}

cond_for_table <- FALSE
if(condition_for_Sankeyplot){
  Sankey_tabel2 <- data.frame(Measures = c(setdiff(nastroj_kod, c("another reason", "employed"))))
  Sankey_tabel2$'Name of programme' <- Sankey_description$Labour.market.services[match(Sankey_tabel2$Measures, Sankey_description$almp)]
  
  cond_for_table <- nrow(Sankey_tabel2) >=1
}

The next table shows full names of programmes shown in graph above.

Table 1.2: Explanatory table for Graph 2.2

if (cond_for_table){
  Sankey_tabel2  %>% kbl(format = 'html', booktabs = TRUE , align = 'c', row.names = FALSE)%>%
    kable_classic('hover', full_width = FALSE)%>%
    column_spec(1,  border_right = TRUE)
}

Measures	Name of programme
P053	Commuting allowance
P54O	[Component] Projects and programmes - Self-employment
P54R	[Component] Projects and programmes - Projects - Education and training Repas

2. Data and the evaluation sample

This evaluation report is based on administrative data on registered unemployed jobseekers in Slovakia, inter-linkable with the database of participants in ALMP programmes. The export was provided by the Central Office of Labour, Social Affairs and Family of the Slovak Republic (COLSAF) at the beginning of 2021. It covered a period from January 2014 until December 2020. The raw data were processed using a data-preparation script, available on request from: miroslav.stefanik@savba.sk.
The data frame “df” covers all unemployment spells of unemployed jobseekers, with attributes collected at the moment of their registration as unemployed jobseekers (Application Form).

### DEFINE THE EVALUATION PERIOD #
ep_start <- as.Date(params$ep_start)
ep_end <- as.Date(params$ep_end)
un_spell <- spell
measure <- params$measure

    ########################################x
    ## SELECTING THE EVALUATION SAMPLE #
    ########################################x,
    
treated<-filter(almps, nastroj==toString(params$measure))

#Sub-groups to be dropped: 
# - those with ALMP participation 2 years before the EP
IDalmps_before<-unique(almps$klient_id[almps$entrya<ep_start & almps$entrya>=ep_start-730]) 
# - those with ALMP participation in other ALMP during the EP
IDalmps_during_ep<-unique(almps$klient_id[(almps$entrya<=ep_end & almps$entrya>=ep_start) & almps$nastroj!=toString(params$measure)])

###DEFINE THE ELIGIBILITY CRITERIA 
#the EC are measure specific, in the case of looping over multiple measures EC need to be elaborated t a form of table or a list and added to the parameters
#SUBSETTING THE BASE EVALUATION DATASET OF ELIGIBLE 
cond0<-as.logical(df$entry<=ep_end & df$exit>=ep_start) # Being on the register of unemployed during the evaluation period
cond1<-as.logical(df$age < age_group_max) 
cond2<-as.logical((df$exit-df$entry)>=un_spell) #LENGTH OF PREVIOUS UNEMPLOYMENT SPELL
cond3<-as.logical(df$entry>=ep_start-730) # Dropping old unemployment spells (cases inflowing more than 730 days before the start of the evaluation period)

dfe<-df[cond0 & cond1 & cond2 & cond3,]
n1 <- dim(dfe)[1]
sampleIDs<-unique(df$klient_id[cond0 & cond1 & cond2 & cond3])
n2 <- length(sampleIDs)

###
#### ONLY KEEP THE SPELLS OF PARTICIPANTS DURING WHICH THEY PARTICIPATED 
#### Creating dataframe of participants in the evaluated programme during the evaluation period.
dfa<-filter(treated, entrya<=ep_end & entrya>=ep_start)
npart0<-length(unique(dfa$klient_id))
npart1<-dim(dfa)[1]

#Drop other ALMP participations from the group of participants as well as the eligible non-participants
n3 <- nrow(filter(dfa, klient_id %in% IDalmps_before))
n4 <- nrow(filter(dfe, klient_id %in% IDalmps_before))
dfa<-filter(dfa, !klient_id %in% IDalmps_before)
dfe<-filter(dfe, !klient_id %in% IDalmps_before)

n5 <- nrow(filter(dfa, klient_id %in% IDalmps_during_ep))
n6 <- nrow(filter(dfe, klient_id %in% IDalmps_during_ep))
dfa<-filter(dfa, !klient_id %in% IDalmps_during_ep)
dfe<-filter(dfe, !klient_id %in% IDalmps_during_ep)



#### Only participants with one participation during the evaluation period are sampled. 
#### JS with multiple participations are droped from the sample
dfa<-dfa %>%
  group_by(klient_id) %>% 
  mutate(rep=n()) # rep is the number of participations of one JS repeating during 2014

n7 <- nrow(filter(dfa, rep!=1))
dfa<-filter(dfa, rep==1)
npart2<-dim(dfa)[1] # Number of participants after cleaning with multiple ALMP participations

###Participants who also participated in other ALMP measures (§54) are dropped
progOUT<- setdiff(c("P050", "P50A", "P50C","P50J", "P50K" ,"P51A", "P054", "P54D", "P54E", "P54O", "P54P", "P54U"),params$measure)
outIDs<-unique(almps$klient_id[(almps$entrya>=as.Date(params$ep_start) & almps$entrya<=as.Date(params$ep_end)+730) & as.logical(almps$nastroj %in% progOUT)])

n8 <- nrow(filter(dfa, klient_id %in% outIDs))
n9 <- nrow(filter(dfe, klient_id %in% outIDs))

dfa<-(filter(dfa, !klient_id %in% outIDs))
dfe<-(filter(dfe, !klient_id %in% outIDs))

npart3<-length(unique(dfa$klient_id)) # The number of participants after we drop participations in supported employment during the outcome observation period 

#MERGING PARTICIPATIONS AND UNEMPLOYMENT SPELLS
#First we add the date of the entry and exit from the registration into the table of participations in measure (evaluated measure params$measure). We only import entry dates for the individuals in the evaluation sample. 
dfa<-merge(dfa, select(dfe, klient_id, entry, exit), by="klient_id", all.x = TRUE)
nrowdfa <- nrow(dfa)

#Second we filter only the registrations of members of the evaluation sample during which the programme participation took place. 
dfa<-filter(dfa, dfa$exit+30>=dfa$entrya & dfa$entrya<=ep_end & entrya >= entry) # Keeping only the participations happening during an unemployment spell
n10 <- nrowdfa-dim(dfa)[1]
npart4<-dim(dfa)[1] # Number of participants after cleaning participations outside an unemployment spell (data quality issue)

## Participants #
particIDs<-unique(dfa$klient_id)
## Eligible #
# nonpartIDs<-sampleIDs[!(sampleIDs %in% particIDs)]
nonpart<-filter(dfe, !(klient_id %in% particIDs))
###Out of the participants only one-time participations happening during an unemployment spell are used 
partic<-merge(dfe, dfa, by = c("klient_id", "entry"), all.x = FALSE)

#LL: sanity check
#nonpart$klient_id
#partic$klient_id
#intersect(nonpart$klient_id,partic$klient_id)
#this should be empty. OK

### Cleaning and renaming #
partic$exit.y<-NULL
partic<-partic %>% rename(exit=exit.x)

nonpart$entrya<-NA
nonpart$exita<-NA
nonpart$nastroj<-NA
nonpart$naklady<-NA
nonpart$projekt<-NA
partic$rep<-NULL

#Filter extreme values (1%) of the waiting time until participation in the evaluated measure 
wte<-quantile(as.numeric(partic$entrya)-as.numeric(partic$entry),na.rm=TRUE, probs=0.99)
n11 <- nrow(filter(filter(partic, as.numeric(entrya)-as.numeric(entry)>wte)))
partic<-filter(partic, as.numeric(entrya)-as.numeric(entry)<=wte)

esample<-rbind(nonpart, partic)
esample$treated<-!is.na(esample$entrya)
#Filter extreme values of the waiting time until participation in the evaluated measure 

#LL:
#summary(esample$treated)

###Unemployment spells ending with LM placements
#esample<-filter(esample, dovod_vyradenia_kod == 'V01' | dovod_vyradenia_kod == 'V02' | dovod_vyradenia_kod == 'V03' | dovod_vyradenia_kod == 'V1' | dovod_vyradenia_kod == 'V12' | dovod_vyradenia_kod == 'V15')

    ########################################x
    ## GENERATING EXPLANATORY VARIABLES #
    ########################################x,
    esample$ent <- as.numeric(as.Date(ep_start))-as.numeric(as.Date(esample$entry))
    
    #LL: quantile(esample$ent)
    #niektore unemployment spells boli 
    # negativne (JS zacal byt nezamestnany po 1-1-2017), 
    # niektore pozitivne (JS zacal byt nezamestnany pred 1-1-2017)

    ########AGEG
    esample$ageg <- cut_interval(esample$age, 5, labels=FALSE)
    esample$ageg <- as.factor(esample$ageg)
    
    ####Extra columns for dummy variables go into the esample_est for further testing
    esample <- dummy_cols(esample, select_columns = c("ageg"), remove_first_dummy = TRUE)
    ageg_dummies<-colnames(esample)[grepl("ageg_", colnames(esample))]

  #######Regional Unemployment rate during the implementation period
    esample[, "UR_region"] <- esample[, paste0("UR_region_",year(ep_start), "")]
    esample[,grepl("UR_region_", colnames(esample))]<-NULL
    #esample$UR_region <- as.numeric(gsub(",", ".", gsub("\\.", "", esample$UR_region)))
    
    ##########Difference between entry into unemployment register and started of participation in measure
    esample$diff_entry <- ceiling(as.integer(as.Date(esample$entrya) - as.Date(esample$entry))/30.417)

#Cleaning
#nonpart<-NULL
#partic<-NULL




# Share of repeated unemployment after participation in ALMP
# Merge esample, history table
h_esample <- merge(select(esample, klient_id, entry, exit, dovod_vyradenia_kod, entrya, exita, nastroj, treated), dfh, by = 'klient_id', all.x = TRUE, all.y = FALSE)

# as.Date
entry <- paste('entry', seq(1:11), sep = '')
exit <- paste('exit', seq(1:11), sep = '')

for (e in entry){
  h_esample[ ,e] <- as.Date(h_esample[ ,e], origin = '1970-01-01')
}
for (x in exit){
  h_esample[ ,x] <- as.Date(h_esample[ ,x], origin = '1970-01-01')
}
h_esample$exita <- as.Date(h_esample$exita, origin = '1970-01-01')

#Dropping cases with over than 15 unemployment spells 
#h_esample<-filter(h_esample, is.na(entry16)) #subset (klient_id) jobseekers who became unemployment only 15 times 
#esample <- subset(esample, klient_id %in% h_esample$klient_id) 
#remove entry16+ and exit16+ columns
#h_esample<-select(h_esample, -entry16, -entry17, -entry18, -entry19, -entry20, -exit16, -exit17, -exit18, -exit19, -exit20)

esample_condition <- (mean(esample$treated) < 0.00001)

if(esample_condition) {
  knitr::knit_exit()
}

2.1 Description of participants and eligible

Our evaluation sample consists of 300 626 eligible individuals, registered as jobseekers during the evaluation period starting from 2017-01-01 until 2017-12-31, These jobseekers have generated in total 316 875 unemployment registrations during the period 2014-2020. Out of them, in 2 665 jobseekers participated in the evaluated measure Support for hiring disadvantaged job seekers during the evaluation period. 1 475 participants and 127 311 eligible JS were dropped from the sample, because of multiple participations in the evaluated program (or other relevant ALMPs) during and after the evaluation period. After cleaning we have sampled 1 190 JS with one-time participation during the evaluation period. At the same time, there was 189 564 eligible non-participants present in the register of unemployed jobseekers during 2017.

The group of participants and eligible show differences in a number of observed characteristics. Table 2 displays an overview of these differences on mean values of selected characteristics.

Table 2: Descriptive statistics of participants and eligible (selected characteristics)

####
## number of participants and eligible 
####

#separate table dfe with participants in ALMP  and eligible 
elig <- distinct_at(nonpart,vars(klient_id),.keep_all = TRUE)
part <- distinct_at(partic,vars(klient_id),.keep_all = TRUE)

#BASIC DESCRIPTIVE TABLE

# The number of participants and eligible in the sample'
#tab1<-cbind(sum(!is.na(dfe$entrya)),sum(is.na(dfe$entrya)))
tab1 <- cbind(format(length(unique(esample$klient_id[esample$treated==TRUE])), big.mark=" ", scientific=FALSE), format(length(unique(esample$klient_id[esample$treated==FALSE])), big.mark=" ", scientific=FALSE))
colnames(tab1) <- c('Participants', 'Eligible')
tab1 <- data.frame(cbind(Description = 'Number of observations', tab1))

####
## Age distribution
####

age_par <- part %>% select(age) %>% group_by(age) %>% dplyr::summarise(Participants_total = n())  %>%
  mutate(Participants_percent = paste0(round(100 * Participants_total / sum(Participants_total),2), "%"))

age_elig <- elig %>% select(age) %>% group_by(age) %>% dplyr::summarise(Eligible_total = n())  %>%
  mutate(Eligible_percent = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%"))

age <- merge(age_par, age_elig, by='age', all = TRUE)
age$Participants_percent <- ifelse(is.na(age$Participants_percent), paste(0,'%'), age$Participants_percent)
age$Participants_total <- ifelse(is.na(age$Participants_total), 0, age$Participants_total)

mean_age_par <- round(mean(part$age),1)
mean_age_elig <- round(mean(elig$age),1)

##### MEAN 
mean_age<-data.frame(mean_age_par,mean_age_elig)
mean_age <- cbind(Description = 'Age (years)', mean_age)
mean_age <- rename(mean_age, Participants = mean_age_par, Eligible=mean_age_elig)
####

age_elig$desc <- 'Eligible'
age_par$desc <- 'Participants'
age_elig <- age_elig %>% rename(total = Eligible_total, percent = Eligible_percent)
age_par <- age_par %>% rename(total = Participants_total, percent = Participants_percent)
age_r <- rbind(age_par,age_elig)

age_plot <- ggplot(age_r, aes(x = age, y = total, group= desc)) +
  geom_point(aes(color = desc), size = 1.5)+
  geom_line(aes(color = desc), size = 1) + 
  ylim(0,23000) +
  theme_light() +
  geom_text(aes(label = paste(percent, '\n\n' )), col ='black', size = 3, fontface ='italic')+
  labs(
    title = "Compare of age distribution (%)",
    x = "Age",
    y = "Total Count"
  )

####
## Gender distribution
####

gender_par <- part %>% select(male) %>% group_by(male) %>% dplyr::summarise(Participants_total = n())  %>%
  mutate(Participants_percent = paste0(round(100 * Participants_total / sum(Participants_total),2), "%"))

gender_elig <- elig %>% select(male) %>% group_by(male) %>% dplyr::summarise(Eligible_total = n())  %>%
  mutate(Eligible_percent = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%"))

gender <- merge(gender_par, gender_elig, by='male', all = TRUE)

####
male <- data.frame(gender_par[2,3], gender_elig[2,3])
male <- cbind(Description = 'Male', male)
male <- rename(male, Participants = Participants_percent, Eligible=Eligible_percent)


####
##  Education distribution
####

education_par <- part %>% select(noedu, primary, lsec, usec, tertiary) %>% 
  group_by(noedu, primary, lsec, usec, tertiary) %>% dplyr::summarise(Participants_total = n(), .groups = 'drop')  %>%
  mutate(Participants = round(100 * Participants_total / sum(Participants_total),2))
education_par <- education_par[!(is.na(education_par$noedu)),]
education_par <- reshape2::melt(education_par, measure.vars = c('noedu', 'primary', 'lsec', 'usec', 'tertiary'))
education_par <- education_par[education_par$value == 1,] 
education_par <- select(education_par, -value)

education_elig <- elig %>% select(noedu, primary, lsec, usec, tertiary) %>% 
  group_by(noedu, primary, lsec, usec, tertiary) %>% dplyr::summarise(Eligible_total = n(), .groups = 'drop')  %>%
  mutate(Eligible = round(100 * Eligible_total / sum(Eligible_total),2))
education_elig <- education_elig[!(is.na(education_elig$noedu)),]
education_elig <- reshape2::melt(education_elig, measure.vars = c('noedu', 'primary', 'lsec', 'usec', 'tertiary'))
education_elig <- education_elig[education_elig$value == 1,] 
education_elig <- select(education_elig, -value)

education <- merge(education_par, education_elig, by='variable', all = TRUE)
education <- select(education, variable, Participants, Eligible)
education <- rename(education, Description = variable)

education <- education%>%mutate(
  Description = case_when(
    education$Description ==  'noedu' ~ 'No education',    
    education$Description ==  'primary' ~ 'Primary', 
    education$Description ==  'lsec' ~ 'Lower secondary',
    education$Description ==  'usec' ~ 'Upper secondary',
    education$Description ==   'tertiary' ~ 'Tertiary', 
    TRUE~as.character(education$Description)
  ) 
)

education <- education %>% group_by(Description) %>%
  dplyr::summarise(Participants = paste0(sum(Participants),  "%"), Eligible = paste0(sum(Eligible),  "%")) 

x <- c('No education','Primary','Lower secondary', 'Upper secondary', 'Tertiary')
education <- education[match(x, education$Description),]

####
##  skills
####

l_skills_par <- part %>%  select(flang) %>%  
  mutate(flang = case_when(part$flang == 1 ~ 'Foreign language')) %>%  
  group_by(flang) %>% summarise(Participants_total = n())  %>%
  mutate(Participants_percent = paste0(round(100 * Participants_total / sum(Participants_total),2), "%")) %>%
  rename(Description = flang) %>% filter(row_number() == 1L)

l_skills_elig <- elig %>%  select(flang) %>%  
  mutate(flang = case_when(elig$flang == 1 ~ 'Foreign language')) %>%  
  group_by(flang) %>% summarise(Eligible_total = n())  %>%
  mutate(Eligible_percent = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%")) %>%
  rename(Description = flang) %>% filter(row_number() == 1L)

PC_skills_par <- part %>%  select(pc) %>%  
  mutate(pc = case_when(part$pc == 1 ~ 'PC skill')) %>%  
  group_by(pc) %>% summarise(Participants_total = n())  %>%
  mutate(Participants_percent = paste0(round(100 * Participants_total / sum(Participants_total),2), "%")) %>%
  rename(Description = pc) %>% filter(row_number() == 1L)

PC_skills_elig <- elig %>%  select(pc) %>%  
  mutate(pc = case_when(elig$pc == 1 ~ 'PC skill')) %>%  
  group_by(pc) %>% summarise(Eligible_total = n())  %>%
  mutate(Eligible_percent = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%")) %>%
  rename(Description = pc) %>% filter(row_number() == 1L)

d_skills_par <- part %>%  select(drive) %>%  
  mutate(drive = case_when(part$drive == 1 ~ 'Driving license')) %>%  
  group_by(drive) %>% summarise(Participants_total = n())  %>%
  mutate(Participants_percent = paste0(round(100 * Participants_total / sum(Participants_total),2), "%")) %>%
  rename(Description = drive) %>% filter(row_number() == 1L)

d_skills_elig <- elig %>%  select(drive) %>%  
  mutate(drive = case_when(elig$drive == 1 ~ 'Driving license')) %>%  
  group_by(drive) %>% summarise(Eligible_total = n())  %>%
  mutate(Eligible_percent = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%")) %>%
  rename(Description = drive) %>% filter(row_number() == 1L)


l_skills <- merge(l_skills_par, l_skills_elig, by='Description', all = TRUE)
PC_skills <- merge(PC_skills_par, PC_skills_elig, by='Description', all = TRUE)
d_skills <- merge(d_skills_par, d_skills_elig, by='Description', all = TRUE)

skills <- rbind(l_skills, PC_skills, d_skills)
skills <- select(skills, Description, Participants_percent, Eligible_percent)
skills <- rename(skills, Participants = Participants_percent, Eligible=Eligible_percent)

####
##  region
####

part <- part %>% mutate(
  okres = case_when(
    grepl('SK010',okres)  ~ 'Bratislavský', 
    grepl('SK021',okres)  ~ 'Trnavský', 
    grepl('SK022',okres)  ~ 'Trenčiansky', 
    grepl('SK023',okres)  ~ 'Nitriansky', 
    grepl('SK031',okres)  ~ 'Žilinský', 
    grepl('SK032',okres)  ~ 'Banskobystrický', 
    grepl('SK041',okres)  ~ 'Prešovský', 
    grepl('SK042',okres)  ~ 'Košický', 
    TRUE~as.character(okres)
  )
)

elig <- elig %>% mutate(
  okres = case_when(
    grepl('SK010',okres)  ~ 'Bratislavský', 
    grepl('SK021',okres)  ~ 'Trnavský', 
    grepl('SK022',okres)  ~ 'Trenčiansky', 
    grepl('SK023',okres)  ~ 'Nitriansky', 
    grepl('SK031',okres)  ~ 'Žilinský', 
    grepl('SK032',okres)  ~ 'Banskobystrický', 
    grepl('SK041',okres)  ~ 'Prešovský', 
    grepl('SK042',okres)  ~ 'Košický', 
    TRUE~as.character(okres)
  )
)

okres_par <- part %>% select(okres) %>% group_by(okres) %>% dplyr::summarise(Participants_total = n())  %>%
  mutate(Participants = paste0(round(100 * Participants_total / sum(Participants_total),2), "%"))

okres_elig <- elig %>% select(okres) %>% group_by(okres) %>% dplyr::summarise(Eligible_total = n())  %>%
  mutate(Eligible = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%"))

okres <- merge(okres_par, okres_elig, by='okres', all = TRUE)
okres <- select(okres, okres, Participants, Eligible)
okres <- rename(okres, Description = okres)
okres <- okres[okres$Description != 'N/A',]

####
##  Previous employment
####

prev_emp_part <- part  %>% select(empl) %>% group_by(empl) %>% summarise(Participants_total = n())  %>%
  mutate(Participants_percent = paste0(round(100 * Participants_total / sum(Participants_total),2), "%")) %>% 
  rename(Description = empl, Participants = Participants_percent)  

prev_emp_elig <- elig  %>% select(empl) %>% group_by(empl) %>% summarise(Eligible_total = n())  %>%
  mutate(Eligible_percent = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%")) %>%
  rename(Description = empl, Eligible = Eligible_percent)

prev_emp <- merge(prev_emp_part, prev_emp_elig, by='Description', all = TRUE)
prev_emp <- select(prev_emp, Description, Participants, Eligible)
prev_emp <- prev_emp[prev_emp$Description == 1,] 
prev_emp$Description[prev_emp$Description == 1 ] <- 'Previous employment'

####
##  Nationality
####

nat_part <- part  %>% select(slovak, hungarian, roma, czech, othern) %>% 
  group_by(slovak, hungarian, roma, czech, othern) %>% summarise(Participants_total = n(), .groups = 'drop') %>%
  mutate(Participants = paste0(round(100 * Participants_total / sum(Participants_total),2),'%')) 
nat_part$othern <- ifelse(nat_part$othern == 'TRUE', 1,0)
nat_part <- reshape2::melt(nat_part, measure.vars = c('slovak', 'hungarian', 'roma', 'czech', 'othern'))
nat_part <- nat_part[nat_part$value == 1,] 
nat_part <- select(nat_part, -value)

nat_elig <- elig  %>% select(slovak, hungarian, roma, czech, othern) %>% 
  group_by(slovak, hungarian, roma, czech, othern) %>% summarise(Eligible_total = n(), .groups = 'drop') %>%
  mutate(Eligible = paste0(round(100 * Eligible_total / sum(Eligible_total),2),'%')) 
nat_elig$othern <- ifelse(nat_elig$othern == 'TRUE', 1,0)
nat_elig <- reshape2::melt(nat_elig, measure.vars = c('slovak', 'hungarian', 'roma', 'czech', 'othern'))
nat_elig <- nat_elig[nat_elig$value == 1,] 
nat_elig <- select(nat_elig, -value)

nat <-  merge(nat_part, nat_elig, by='variable', all = TRUE)
nat <- select(nat, variable, Participants, Eligible)
nat <- rename(nat, Description = variable)

nat <- nat %>%mutate(
  Description = case_when(
    nat$Description == 'slovak' ~ 'Slovak', 
    nat$Description == 'hungarian' ~ 'Hungarian', 
    nat$Description == 'czech'~ 'Czech', 
    nat$Description == 'roma' ~ 'Roma', 
    nat$Description == 'othern'~ 'Other', 
  ) 
)


x <- c('Slovak','Hungarian', 'Czech','Roma','Other')
nat <- nat[match(x, nat$Description),]

####
##  Length of the unemployment spell
####

part$un_spell <- as.integer(part$exit - part$entry)
elig$un_spell <- as.integer(elig$exit - elig$entry)

un_spell <- cbind(round(mean(part$un_spell),2), round(mean(elig$un_spell),2))
un_spell <- data.frame(cbind(Description = 'Length of the unemployment spell', un_spell))
un_spell <- rename(un_spell, Participants = V2, Eligible=V3)


####
##  Length of spell between unemployment and participation
####

#Rozdiel medzi evidenciou nezamestnanosti a nastúpenia do AOTP 
part$spell_b <- as.integer(part$entrya - part$entry)

spell_bup <- part %>% select(spell_b) %>% 
  mutate(month = ceiling(spell_b/30.417)) #roundup -> ceiling

spell_bup_p <- ggplot(data=spell_bup, aes(month)) + 
  geom_histogram(binwidth=1, fill="grey", color="black", alpha=0.9) +
  theme_light() +
  theme(legend.position = "none")+
  labs(
    title = "Inflow into the programme in months\nsince the start of the unemployment",
    x = "Months",
    y = "Total Count"
  )  + 
  scale_x_continuous()

####
##  Length of AOTP
####

part$spell_aotp <- as.integer(part$exita - part$entrya)

spell_aotp <- part %>% select(spell_aotp) %>% 
  mutate(month = ceiling(spell_aotp/30.417)) #roundup -> ceiling

spell_aotp_p <- ggplot(data=spell_aotp, aes(month)) + 
  geom_histogram(binwidth=1, fill="grey", color="black", alpha=0.9) +
  theme_light() +
  theme(legend.position = "none") +
  labs(
    title = "Length of participation\n (in months)",
    x = "Months",
    y = "Total Count"
  ) + 
  scale_x_continuous(breaks = scales::breaks_extended(length(unique(spell_aotp$month))))


####
##  Compare of length spell
####

spell_p <- ggarrange(spell_bup_p, spell_aotp_p)


####
##  In flow
####

a <- format(seq(as.Date(ep_start),length=3,by="1 month"),"%Y-%m")
b <- format(seq((ymd(as.Date(ep_start)) %m+% months(3)),length=3,by="1 month"),"%Y-%m")
c <- format(seq((ymd(as.Date(ep_start)) %m+% months(6)),length=3,by="1 month"),"%Y-%m")
d <- format(seq((ymd(as.Date(ep_start)) %m+% months(9)),length=3,by="1 month"),"%Y-%m")

#vstúpili do programu
in_part <- part %>% select(entrya) %>% group_by(format(as.Date(entrya),"%Y-%m")) %>%
  summarise(Participants_total = n()) %>%
  mutate(Participants_percent = round(100 * Participants_total / sum(Participants_total),2)) 
colnames(in_part)[1] <- 'Description'
in_part <- filter(in_part, str_detect(in_part$Description, (format(as.Date(ep_start),"%Y"))))

in_part <- in_part%>%mutate(
  Description = case_when(
    in_part$Description %in% a ~  paste0('1Q.', (format(as.Date(ep_start),"%Y"))), 
    in_part$Description %in% b ~  paste0('2Q.', (format(as.Date(ep_start),"%Y"))), 
    in_part$Description %in% c ~  paste0('3Q.', (format(as.Date(ep_start),"%Y"))), 
    in_part$Description %in% d ~  paste0('4Q.', (format(as.Date(ep_start),"%Y"))), 
  ) 
)

in_part <- in_part %>% select(Description, Participants_total) %>% group_by(Description) %>%
  summarise(Participants_total = sum(Participants_total)) %>% 
  mutate(Participants = paste0(round(100 * Participants_total / sum(Participants_total),2),'%'))

#sa stali nezamestnaný 
in_elig <- elig %>% select(entry) %>% group_by(format(as.Date(entry),"%Y-%m")) %>%
  summarise(Eligible_total = n()) %>%
  mutate(Eligible_percent = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%")) 
colnames(in_elig)[1] <- 'Description'
in_elig <- filter(in_elig, str_detect(in_elig$Description, (format(as.Date(ep_start),"%Y"))))

in_elig <- in_elig%>%mutate(
  Description = case_when(
    in_elig$Description %in% a ~ paste0('1Q.', (format(as.Date(ep_start),"%Y"))), 
    in_elig$Description %in% b ~ paste0('2Q.', (format(as.Date(ep_start),"%Y"))), 
    in_elig$Description %in% c ~ paste0('3Q.', (format(as.Date(ep_start),"%Y"))), 
    in_elig$Description %in% d ~ paste0('4Q.', (format(as.Date(ep_start),"%Y"))), 
  ) 
)

in_elig <- in_elig %>% select(Description, Eligible_total) %>% group_by(Description) %>%
  summarise(Eligible_total  = sum(Eligible_total)) %>%
  mutate(Eligible = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%")) 

inflow <- merge(in_part, in_elig, by='Description', all = FALSE)
inflow <- select(inflow, Description, Participants, Eligible)


####
##  Outflow
####

#vystúpili z programu

out_part <- part %>% select(exita) %>% group_by(format(as.Date(exita),"%Y-%m")) %>%
  summarise(Participants_total = n()) %>%
  mutate(Participants_percent = round(100 * Participants_total / sum(Participants_total),2)) 
colnames(out_part)[1] <- 'Description'
out_part <- filter(out_part, str_detect(out_part$Description, (format(as.Date(ep_start),"%Y"))))

out_part <- out_part%>%mutate(
  Description = case_when(
    out_part$Description %in% a ~ paste0('1Q.', (format(as.Date(ep_start),"%Y"))), 
    out_part$Description %in% b ~ paste0('2Q.', (format(as.Date(ep_start),"%Y"))), 
    out_part$Description %in% c ~ paste0('3Q.', (format(as.Date(ep_start),"%Y"))), 
    out_part$Description %in% d ~ paste0('4Q.', (format(as.Date(ep_start),"%Y"))), 
  ) 
)

out_part <- out_part %>% select(Description, Participants_total) %>% group_by(Description) %>%
  summarise(Participants_total = sum(Participants_total)) %>% 
  mutate(Participants = paste0(round(100 * Participants_total / sum(Participants_total),2),'%'))

#vystúpili z evidencie -> zamestnali sa 
out_elig <- elig %>% select(exit) %>% group_by(format(as.Date(exit),"%Y-%m")) %>%
  summarise(Eligible_total = n()) %>%
  mutate(Eligible_percent = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%")) 
colnames(out_elig)[1] <- 'Description'
out_elig <- filter(out_elig, str_detect(out_elig$Description, (format(as.Date(ep_start),"%Y"))))

out_elig <- out_elig%>%mutate(
  Description = case_when(
    out_elig$Description %in% a ~ paste0('1Q.', (format(as.Date(ep_start),"%Y"))), 
    out_elig$Description %in% b ~ paste0('2Q.', (format(as.Date(ep_start),"%Y"))), 
    out_elig$Description %in% c ~ paste0('3Q.', (format(as.Date(ep_start),"%Y"))), 
    out_elig$Description %in% d ~ paste0('4Q.', (format(as.Date(ep_start),"%Y"))), 
  ) 
)

out_elig <- out_elig %>% select(Description, Eligible_total) %>% group_by(Description) %>%
  summarise(Eligible_total  = sum(Eligible_total)) %>%
  mutate(Eligible = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%")) 

outflow <- merge(out_part, out_elig, by='Description', all = FALSE)
outflow <- select(outflow, Description, Participants, Eligible)


####
##  Children in the household
####

child_part <- part  %>% select(kids) %>% group_by(kids) %>% summarise(Participants_total = n())  %>%
  mutate(Participants = paste0(round(100 * Participants_total / sum(Participants_total),2), "%")) %>% 
  rename(Description = kids)  

child_elig <- elig  %>% select(kids) %>% group_by(kids) %>% summarise(Eligible_total = n())  %>%
  mutate(Eligible = paste0(round(100 * Eligible_total / sum(Eligible_total),2), "%")) %>%
  rename(Description = kids)

child <- merge(child_part, child_elig, by='Description', all = TRUE)
child <- select(child, Description, Participants, Eligible)
child <- child[child$Description == 1,] 
child$Description[child$Description == 1] <- 'Children in the household'


####
##  Fields of study
####

study_part <- part  %>% select(odbor) %>% group_by(odbor) %>%
  summarise(Participants_total = n()) %>%
  mutate(Participants = round(100 * Participants_total / sum(Participants_total),2)) 

study_elig <- elig  %>% select(odbor) %>% group_by(odbor) %>%
  summarise(Eligible_total = n()) %>%
  mutate(Eligible = round(100 * Eligible_total / sum(Eligible_total),2)) 

study <- merge(study_part, study_elig, by='odbor', all = TRUE)
study <- select(study, odbor, Participants, Eligible)
study$Participants <- ifelse(is.na(study$Participants), 0, study$Participants)

a <- as.character(c(seq(11,19,1)))
b <- as.character(c(seq(21,39,1)))
c <- as.character(c(seq(41,49,1)))
d <- as.character(c(seq(51,59,1)))
e <- as.character(c(seq(61,79,1)))
f <- as.character(c(seq(81,89,1)))
g <- as.character(c(seq(91,98,1)))

study <- study%>%mutate(
  odbor = case_when(
    study$odbor %in% a ~ 'Natural Science', 
    study$odbor %in% b ~ 'Technical sciences', 
    study$odbor %in% c ~ 'Agricultural, forestry and veterinary sciences', 
    study$odbor %in% d ~ 'Medical and pharmaceutical sciences', 
    study$odbor %in% e ~ 'Social sciences and services', 
    study$odbor %in% f ~ 'Sciences of culture and art', 
    study$odbor %in% g ~ 'Military and security sciences',
    study$odbor == 99 || study$odbor == 0 || study$odbor == 10 ~ 'General sciences and services ',
    TRUE~as.character(study$odbor)
  ) 
)

study <- study %>% select(odbor, Participants, Eligible) %>%
  group_by(odbor)  %>% 
  summarise(Participants = sum(Participants), Eligible = sum(Eligible))  %>%
  mutate(Participants = paste0(Participants, "%")) %>%
  mutate(Eligible = paste0(Eligible, "%")) 
study <- rename(study, Description = odbor)


####
##  SUMMARIZE ####
####

colnames(tab1)<-c("Description", "Participants", "Eligible")
colnames(mean_age)<-c("Description", "Participants", "Eligible")
colnames(male)<-c("Description", "Participants", "Eligible")
colnames(prev_emp)<-c("Description", "Participants", "Eligible")
colnames(un_spell)<-c("Description", "Participants", "Eligible")
colnames(child)<-c("Description", "Participants", "Eligible")

tables <- c('tab1', 'mean_age', 'male', 'prev_emp', 'un_spell', 'child')
basics <- data.frame()

for (name in tables){
  table <- get(name)
  table <- mutate(table, across(everything(), as.factor))
  basics <- bind_rows(basics, table)
}

tables <- c('basics', 'education', 'study', 'skills', 'okres', 'nat', 'inflow', 'outflow')

for (name in tables){
  table <- get(name)
  table <- add_column(table, Variable = name, .after = "Eligible")
  colnames(table) <- c("Description", "Participants", "Eligible", "Variable")
  assign(name, table)
}


sum_table <- rbind(basics, education, study, skills, okres, nat, inflow, outflow)
sum_table <- sum_table %>% relocate(Variable, .before = Description) %>%mutate(
  Variable = case_when(
    sum_table$Variable == 'basics' ~ 'Basics',
    sum_table$Variable == 'education' ~ 'Education level',
    sum_table$Variable == 'study' ~ 'Field of study',
    sum_table$Variable == 'skills' ~ 'Skills',
    sum_table$Variable == 'okres' ~ 'Region',
    sum_table$Variable == 'nat' ~ 'Nationality',
    sum_table$Variable == 'inflow' ~ 'Inflow',
    sum_table$Variable == 'outflow' ~ 'Outflow',
    TRUE ~ as.character(sum_table$Variable)
  )
) 

sum_table$Participants <- ifelse(is.na(sum_table$Participants) | gsub('\\D','', sum_table$Participants) == "", 
                                 ifelse(is.na(sum_table$Participants) | gsub('\\D','', sum_table$Participants) == "", 
                                        ifelse(str_detect(as.character(sum_table$Eligible), regex("%")), '0%',0), 
                                        as.character(sum_table$Participants)),
                                 as.character(sum_table$Participants))

sum_table[,2:4]  %>% kbl(format = 'html', booktabs = TRUE , align = 'c', row.names = FALSE) %>%
  column_spec(1,  border_right = TRUE) %>%
  kable_classic('hover', full_width = FALSE) %>%
  pack_rows(index = table(fct_inorder(sum_table$Variable)))

Description	Participants	Eligible
Basics
Number of observations	1 190	189 564
Age (years)	39.8	38.1
Male	41.68%	50.61%
Previous employment	12.1%	9.02%
Length of the unemployment spell	865.45	337.65
Children in the household	20.25%	12.6%
Education level
No education	0.34%	0.81%
Primary	7.73%	14.78%
Lower secondary	35.13%	29.28%
Upper secondary	42.35%	34.11%
Tertiary	14.45%	21.01%
Field of study
Agricultural, forestry and veterinary sciences	6.05%	4.54%
General sciences and services	19.92%	34.76%
Medical and pharmaceutical sciences	2.01%	1.53%
Military and security sciences	0.75%	0.36%
Natural Science	0.58%	0.69%
Sciences of culture and art	0.75%	1.09%
Social sciences and services	29.65%	25.43%
Technical sciences	40.25%	31.6%
Skills
Foreign language	60.84%	58.72%
PC skill	56.97%	51.97%
Driving license	62.44%	52.49%
Region
Banskobystrický	16.05%	12.38%
Bratislavský	2.1%	11.64%
Košický	21.43%	15.3%
Nitriansky	9.83%	12.35%
Prešovský	31.34%	17.74%
Trenčiansky	6.3%	9.67%
Trnavský	3.03%	8.74%
Žilinský	9.92%	12.18%
Nationality
Slovak	90.5%	90.17%
Hungarian	7.98%	8.52%
Czech	0.5%	0.47%
Roma	0%	0.19%
Other	1.01%	0.65%
Inflow
1Q.2017	24.37%	27.72%
2Q.2017	33.11%	24.83%
3Q.2017	27.56%	24.21%
4Q.2017	14.96%	23.24%
Outflow
1Q.2017	2.44%	28.65%
2Q.2017	17.68%	31.05%
3Q.2017	29.27%	21.66%
4Q.2017	50.61%	18.65%

Graph 3: Timing and length of participation in the evaluated program

require(gridExtra)
if (nrow(spell_bup) > 5){
  mx <- max(spell_bup$month)
  mn <- min(spell_bup$month)
  plot1 <- ggplot(data=spell_bup, aes(month)) + 
    geom_histogram(binwidth=3, bins=30, breaks=seq(mn, mx, by=0.5), alpha=0.8,col="white", fill="steelblue3")  +
    theme_minimal()+
    ggtitle("Inflow into the program in months\nsince the start of the unemployment") +
    xlab("Months") +
    ylab("Total Count")+ 
    scale_x_continuous(breaks= pretty_breaks())
  
}

if (nrow(spell_aotp) > 5){
  
  mx <- max(spell_aotp$month)
  mn <- min(spell_aotp$month)
  plot2 <- ggplot(data=spell_aotp, aes(month)) + 
    geom_histogram(binwidth=3, bins=30, breaks=seq(mn, mx, by=0.5), alpha=0.8,col="white", fill="steelblue3")  +
    theme_minimal()+
    ggtitle("Length of participation\n(in months)") +
    xlab("Months") +
    ylab("Total Count")+ 
    scale_x_continuous(breaks= pretty_breaks())
  
}
grid.arrange(plot1, plot2, ncol=2)

3. The impact of participation in the Support for hiring disadvantaged job seekers during 2017

The ultimate objective of ALMP programmes is improving employment chances of its beneficiaries. Constrained by the data provided, we construct four outcome indicators. All four of them are based on the presence in (absence from) the register of JSs administrated by COLSAF. They are, therefore, proxies of the employment status.

#Estimation parameters
#LL: there will be four different samples according to how long have JSs been unemployed prior to receiving the training
Ssamples <- seq(1,4)

# Month of participation since the start of the evaluation period
participation_month<-((year(as.Date(esample$entrya))-min(year(ep_start)))*12)+month(as.Date(esample$entrya)) 

#LL: participation quarter, if JS was treated on 3 Feb, pcpQ == 1
pcpQ<-ceiling(participation_month/3)
max_pcpQ<-max(pcpQ, na.rm = TRUE)

#LL: create an object that will store data
Mdata<-c()

#LL: these are the periods that we will look at
# the negative values correspond to "placebo" effects. We should not see any effect there, or only a small one.
OQm <- seq(-12,36,3)
OQ <- -4:12
O_vars <- c(paste("empl",OQ, sep=""), "firstempl", "cumempl")
O_vars<-str_replace(O_vars, "-", ".")
#LL: O_vars stores outcome variables
# empl.2 means employment 2 quarters before the start of the evaluation period.
# empl0 will correspond to the last quarter of the year before the start of the eval. period
# empl1 will correspond to the first quarter of the start year of the eval. period


#LL: This is a list of baseline covariates that will be used throughout the analysis
# they correspond to a reasonable minimum of information that should be controlled for
# in order to have a meaningful comparison
list_vars <- c('ent', 'male', 'married','kids',
             'slovak', 'noedu','primary', 'lsec', 'usec',
             'flang', 'drive', 'pc',
             'unpast', 'min_urad', 'min_BA',
             'UR_region', 'roma_share', 'population', 'age')

#LL: In order to have a credible comparison groups. We need to look how similar the groups are, how balanced they are.
# we compare the mean differences BEFORE adjustment and AFTER adjustment
Balance_vars <- list_vars

#LL: We allocate objects that will store the results

#LL: number of treated units
N<-nrow(esample[esample$treated==TRUE,])

#LL: number of treated units in a particular esample
N_sp <- matrix(NA, nrow=length(Ssamples))  

#LL: results array ATT. We are interested in the average treatment effect on the TREATED subpopulation.
# in this particular case, most LM programs are intended for a specific subpopulation.
# it is therefore of less interest to look at the whole population of JSs (ATE)
resultsArray_ATT  <- array(NA, dim=c(length(O_vars),length(Ssamples))) 
dimnames(resultsArray_ATT)[[1]] <- c(O_vars)

#LL: we also store standard errors that quantify STATISTICAL uncertainty of our estimates
resultsArray_se  <- array(NA, dim=c(length(O_vars),length(Ssamples))) 
dimnames(resultsArray_se)[[1]] <- c(O_vars)
results <- array(NA, dim=c(length(O_vars),2))

### 1. Counting of ATTs 
Sesample <- esample

###Four sub-samples based on the waiting time until participation in the evaluated measure (cutoffs p25, p50, p75): 
#LL: these will make 4 (roughly equally sized) different groups according to how long JSs have been unemployed
# this is an important determinant of the effect and it is important to control for it
wtc25<-quantile(as.numeric(Sesample$exita)-as.numeric(Sesample$entry),na.rm=TRUE, probs=0.25)
wtc50<-quantile(as.numeric(Sesample$exita)-as.numeric(Sesample$entry),na.rm=TRUE, probs=0.50)
wtc75<-quantile(as.numeric(Sesample$exita)-as.numeric(Sesample$entry),na.rm=TRUE, probs=0.75)

 minToP<-min(as.numeric(Sesample$entrya)-as.numeric(Sesample$entry), na.rm = TRUE)

partic<-Sesample[Sesample$treated==TRUE,]
nonpart<-Sesample[Sesample$treated==FALSE,]

#LL: Nonparticipants have only criterium of minimal length of unemployment
# (thus one non-participant can be used multiple times)
esample1<-rbind(nonpart[as.numeric(nonpart$exit)-as.numeric(nonpart$entry)>minToP,],
                partic[as.numeric(partic$exita)-as.numeric(partic$entry)<=wtc25,])
esample2<-rbind(nonpart[as.numeric(nonpart$exit)-as.numeric(nonpart$entry)>=wtc25,],
                partic[as.numeric(partic$exita)-as.numeric(partic$entry)>wtc25 &
                         as.numeric(partic$exita)-as.numeric(partic$entry)<=wtc50,])
esample3<-rbind(nonpart[as.numeric(nonpart$exit)-as.numeric(nonpart$entry)>=wtc50,], 
                partic[as.numeric(partic$exita)-as.numeric(partic$entry)>wtc50 &
                         as.numeric(partic$exita)-as.numeric(partic$entry)<=wtc75,])
esample4<-rbind(nonpart[as.numeric(nonpart$exit)-as.numeric(nonpart$entry)>=wtc75,],
                partic[as.numeric(partic$exita)-as.numeric(partic$entry)>wtc75,])

#LL: We loop over 4 different lenghts of the prior unemployments
for (s in Ssamples) {
    
    esampleS <- get(paste0('esample', s, by = ''))
    
    # we count how many participants are there in a particular group.
    N_sp[s,] <- nrow(esampleS[esampleS$treated==TRUE,])
    
    if (mean(esampleS$treated) > 0.0001){
      
      #LL: we pick a particular subsample of variables.
      esampleS <-esampleS[,c("treated", "klient_id", 
                             "entry", "exit", "entrya", 
                             list_vars)]
      
      # Month of participation since the start of the evaluation period 
      esampleS$participation_month<-((year(as.Date(esampleS$entrya))-year(ep_start))*12)+
        month(as.Date(esampleS$entrya))
      
      #LL: maybe simplify it to
      #month(as.Date(esampleS$entrya))
      #?
      # we can use this with current specification of esample ( participants in the evaluated programme during the evaluation period (ep -> one year) + eligible unemployed at the same period)
      # if we change  specification of esample (i.e. What happened with those ones who became unemployed in a specific year (2017) and they can enter into the program after 3 months since they become unemployed (but they can enter into program 24 months since they become unemployed too))
      
      # Participation quarter
      esampleS$pcpQ<-ceiling(esampleS$participation_month/3)
      
      #LL: instead, we could have just 
      #esampleS$pcpQ<-quarter(as.Date(esampleS$entrya))
      # without the need to define participation_month
      
      # Month of the start of the unemployment since the start of the evaluation period
      esampleS$Ustart_month<-((year(as.Date(esampleS$entry))-min(year(ep_start)))*12)+
        month(as.Date(esampleS$entry)) 
      #LL: notice that this is a relative number, thus it can be negative(!)
      
      # Inflow quarter 
      esampleS$infQ<-ceiling(esampleS$Ustart_month/3)
      #LL: infQ = 0 means that this JS entered the register in Q4 of 2016 (if ep_start=="2017-01-01")
      
      # Outflow quarter
      esampleS$Uend_month<-((year(as.Date(esampleS$exit))-min(year(ep_start)))*12)+
        month(as.Date(esampleS$exit)) 
      esampleS$outQ<-ceiling(esampleS$Uend_month/3)
      
      ##Difference between entry into unemployment register and started of participation in measure
      esampleS$diff_entry <- ceiling(as.integer(as.Date(esampleS$entrya) - as.Date(esampleS$entry))/30.417)
      #LL: NOTEL why 30.417?
      # table(ceiling(difftime(as.Date(esampleS$entrya), as.Date(esampleS$entry), units = "days")/30.417)) 
      # parameters of difftime units  = c("auto", "secs", "mins", "hours", "days", "weeks")
      # haven't "months" unit 
      
      #Adding unemployment history
      #LL: previous unemployment history is an important predictor of both the treatment and outcomes
      # it is important to control for it.
      esampleS<-merge(esampleS, 
                      h_esample[,c("klient_id", "entry", 
                                   paste0("entry", seq(1:11), sep=""), 
                                   paste0("exit", seq(1:11), sep=""))],
                      by=c("klient_id", "entry"), all.y = FALSE)
      
      #Imputing the start of the unemployment spell 
      #LL: wait, isn't it 
      # #Imputing the start of the programme ? YES 
      
      #LL: we wish to make a meaningful comparison.
      # But we're facing a problem because for the non-participants, we don't have date of entry to the course
      # well simply because they did not participate(!)
      # what we do is the following.
      # for every suitable non-participant we chose one particular quarter at random from the evaluation year that is "feasible"
      # "feasible" means that the non-participant _could_have_ potentially participated in that quarter 
      # given that we have a large donor pool, this randomness will not impact our estimates much.
      

            
      #LL: In Pmatrix, we will store for every non-participant, the feasible quarters.
      # e.g. if it is [0 1 1 1], it means that with prob. 1/3 we pick quarter 2,3 or 4
      Pmatrix <- matrix(NA, nrow = nrow(esampleS[esampleS$treated==FALSE,]), ncol=max_pcpQ)
  
          #LL: we loop through the participation quarters
          for (p in 1:max_pcpQ){  
            
            #LL: NOTE!!!! 
            # we restrict ourselves to only non-participants who are at the (beginning of the) particular pcpQ unemployed
            ind <- (esampleS$outQ[esampleS$treated==FALSE]>=p)
            
            # Quarter of inflow to unemployment of participants entering the programme in quarter P
            PinfQ<-unique(esampleS$infQ[esampleS$treated==TRUE & esampleS$pcpQ==p])
            # Only allowing non-participants inflowing to unemployment during the quarters when participants in this sub-sample were inflowing
            #D<-esampleS[as.logical(esampleS$treated==T & esampleS$pcpQ == p) |
            #              as.logical(esampleS$treated==F & esampleS$infQ %in% PinfQ),]
            #LL: NOTE: what is this last commented bit?

            Pmatrix[ind,p]<-as.numeric((esampleS$infQ[esampleS$treated==FALSE])[ind]  %in% PinfQ)
            Pmatrix[!ind,p] <- 0
            
            #LL: does, for this particular quarter of inflow, exists anyone from the list of participants?
            # in other words
            # can we, in a given sample for a given quarter match it to at least one participant?
            
          }
      
        hh <- function(j){
              sample(which(j==1),1)
        }
        
        sumIsZero    <- as.logical(apply(Pmatrix, 1, FUN=sum)==0)
        sumIsNonZero <- as.logical(apply(Pmatrix, 1, FUN=sum)!=0)

        esampleS[esampleS$treated == FALSE,]$pcpQ[sumIsZero] <- 0
        #LL: for every non-participant, for whose inflow we cannot match to ANY participant
        # for any of the four quarters, we assign zero
        
        esampleS[esampleS$treated == FALSE,]$pcpQ[sumIsNonZero] <- apply(Pmatrix[sumIsNonZero,],1,hh)
        #LL: for every non-participant we pick one of the feasible quarters RANDOMLY for given esampleS (we have four of these)
        # (remember: feasible means that there exists at least one participant for inflow)
        #LL: NOTE: did we fix the seed?? if not, we should.
  
        
                #LL: loop across different time periods
                for (q in OQm) { 
                  
                  cond <- FALSE
                  
                  #LL: we add variables whether someone is in the register.
                  # we consider at most 11 unemployment spells.
                    for (n in 1:11) { 
                      entry_n <- esampleS[ ,paste0('entry', n, '')]
                      exit_n <- esampleS[, paste0('exit', n, '')]
                      
                      cond <- cond | isInRegister(p = esampleS$pcpQ, q, entry_n, exit_n) 
                    }
                  
                  esampleS[ ,paste0('empl',ceiling(q/3), '')] <- ifelse(cond, 0,1)
                }
             
           #LL: NOTE: can't this be in the same loop (?)
              for(q in OQm){
                esampleS <- esampleS[!is.na(esampleS[ ,paste0('empl',ceiling(q/3), '')]),]
              }
        
            #Pre-estimation data preparation
            #Additional outcome variables
            esampleS[ ,"cumempl"] <- rowSums(esampleS[, paste0("empl", seq(0,12,1), sep="")])
            esampleS[ ,"firstempl"] <- ceiling(as.numeric((as.Date(esampleS$exit)-
                                                      (as.Date(ep_start) + months(esampleS$pcpQ*3)))/90)) 
            #LL: NOTE: here persons with multiple unempl spell WITHIN the studied year could create problems
            
            # Cleaning the Xs
            y <- data.frame(treated = esampleS[ ,'treated'])
            D <- data.frame(esampleS) #D - su tvoje data ako data.frame()
            spec <- as.formula(cbind(y,D[,c(list_vars, 
                                            paste0("empl.", seq(1,4,1), sep=""))])) 
            #je tvoja specifikacia modelu, je to object as.formula()
            colTh <- 0.8 
            #Treshold for the acceptable correlation between vars ( je maximalne tolerovana korelacia medzi dvoma premennymi)
            dumTh <- 0.0001 
            #Treshold for the acceptable concentration of dummy variables (hovori kolko (100[%] x dumTh) percent je minimalny vyskyt hodnot 1 alebo 0 pri dummy premennych. Ak je dumTh = 0.005, tak aspon pol-percent pozorovani musi mat 1 alebo 0.)
            TdumTh <- 0.005
            #Treshold for the acceptable concentration of dummy variables in the sub-group of participants (hovori kolko (100[%] x dumTh) percent je minimalny vyskyt hodnot 1 alebo 0 pri dummy premennych. Ak je dumTh = 0.005, tak aspon pol-percent pozorovani musi mat 1 alebo 0.)
            TdumThN <- 5
            #Treshold for the minimal number of observations of a dummy variables in the sub-group of participants (hovori kolko (100[%] x dumTh) percent je minimalny vyskyt hodnot 1 alebo 0 pri dummy premennych. Ak je dumTh = 0.005, tak aspon pol-percent pozorovani musi mat 1 alebo 0.)
            result <- DatPrep(D = D, spec = spec, colTh = colTh, dumTh = dumTh, TdumTh = TdumTh, TdumThN = TdumThN) 
            
        for (col in colnames(result$D)){
          D <-  D[!is.na(D[,col]),]
        }    
            
        # ESTIMATION:
        m.1 <- matchit(as.formula(paste(result$spec[2], '~', result$spec[3] , sep = ' ')), 
                       data = D,
                       method = "nearest", 
                       exact = c("infQ"),
                       distance = "glm", link = "probit")
        #plot(summary(m.1))
        m.data1 <- match.data(m.1)
        match.matrix <- data.frame("untreated" = m.1$match.matrix[,1] ,
                                   "treated" = rownames(m.1$match.matrix))
        m.data1[as.character(match.matrix[, 'untreated']), 'entrya'] <- m.data1[as.character(match.matrix[, 'treated']),'entrya']
        m.data1$diff_entry <- ceiling(as.integer(as.Date(m.data1$entrya) - as.Date(m.data1$entry))/30.417)
        
        assign(paste("Mdata",s, sep=""), m.data1)
        Mdata <- bind_rows(Mdata, m.data1)
        

        assign(paste0('balancegraph',s, by=''),summary(m.1, subclass = TRUE))
        
        
        #distance in m.data1 is Propensity score
        #trim = 0.005
        #m.data1 <- m.data1[!(m.data1$distance <= trim | m.data1$distance >= (1-trim)),]
  
        
            for (iQ in O_vars){
              fit <- lm(as.formula(paste(iQ , '~',  result$spec[2], '+' , result$spec[3], by = ' ')), 
                        data = m.data1, 
                        weights = weights)
              res <- coeftest(fit, vcov. = vcovCL, cluster = ~subclass)
              att <- res[2,1]
              se <- res[2,2]
              
              resultsArray_ATT[iQ,s] <- att
              resultsArray_se[iQ,s] <- se
              
        }
    }
  } 


for (iQ in 1:length(O_vars)){
      results[iQ,1] <- sum((resultsArray_ATT[iQ,]*(N_sp/N)))
      results[iQ,2] <- sum((resultsArray_se[iQ,]*(N_sp/N)))
      #results[iQ,2] <- sum((resultsArray_se[iQ,]^2*(N_sp/N)))
}

# LL: sanity check
#plot(resultsArray_ATT[1:17,1])
#lines(resultsArray_ATT[1:17,2])
#lines(resultsArray_ATT[1:17,3])
#lines(resultsArray_ATT[1:17,4])

results<-cbind(O_vars, results)   

#Mdata<-rbind(Mdata1, Mdata2, Mdata3, Mdata4)

resultsPSM <- matrix(NA, nrow=length(O_vars), ncol = 2)

for (iQ in 1:length(O_vars)){
            fit <- lm(as.formula(paste(O_vars[iQ] , '~',  result$spec[2], '+' , result$spec[3], by = ' ')), 
                      data = Mdata, 
                      weights = weights)
            res <- coeftest(fit, vcov. = vcovCL, cluster = ~subclass)
            #LL: source of clustering?
            att <- res[2,1]
            se <- res[2,2]
            
            resultsPSM[iQ,1] <- att
            resultsPSM[iQ,2] <- se
            
        }

resultsPSM<-cbind(O_vars, resultsPSM)

#LL: why do results and results PSM give different values?
# I see, it is the standard errors. They are effectively cut in half
# that is in line with square-root convergence, because the sample size is quadrupled.

Graph 4: Balance of the groups of participants and eligible before and after weighting

match.vars <- c('distance',list_vars,paste0("empl.", seq(1,4,1), sep=""),'infQ')


balance_g  <- array(NA, dim=c(length(match.vars),2,length(Ssamples))) 
dimnames(balance_g)[[1]] <-match.vars
dimnames(balance_g)[[2]] <- c('Balance_for_All_Data','Balance_for_Matched_Data')

for(i in 1:length(Ssamples)){
  
  if(exists(paste0('balancegraph', i, by = ''))){
    x <- get(paste0('balancegraph', i, by = ''))
    balance_g[,1,i] <- abs(x$sum.all[,3][match(rownames(balance_g), names(x$sum.all[,3]))])
    balance_g[,2,i] <- abs(x$sum.matched[,3][match(rownames(balance_g), names(x$sum.matched[,3]))])
  }                  
}

balance_f <- array(NA, dim=c(length(match.vars),2)) 
dimnames(balance_f)[[1]] <-match.vars
dimnames(balance_f)[[2]] <- c('Balance_for_All_Data','Balance_for_Matched_Data') 

for (iQ in 1:length(match.vars)){
      balance_f[iQ,1] <- sum(sum((as.numeric(balance_g[iQ,1,])*(N_sp/N)), na.rm = TRUE), na.rm = TRUE)
      balance_f[iQ,2] <-sum(sum((as.numeric(balance_g[iQ,2,])*(N_sp/N)), na.rm = TRUE), na.rm = TRUE)
}

balance_fg <- data.frame(balance = c(balance_f[,1], balance_f[,2]),
                 Balance = c(rep("before matching",length(match.vars)),rep("after matching",length(match.vars))),
                 names = c(match.vars,match.vars) ) 

p <- balance_fg %>% 
    mutate(names = fct_reorder(names, balance)) %>%
    ggplot(aes(x=balance, y=names,col=Balance)) + 
    geom_vline(xintercept = 0.05, linetype="dotted", color = 'darkgrey') + 
    geom_vline(xintercept = 0.0, color = 'darkgrey')+
    geom_vline(xintercept = 0.1, color = 'darkgrey')+
    geom_point()+
    theme_minimal()+
    ylab('Variables')+
    xlab('Absolute Standardized Meand Difference')+
    theme(plot.caption = element_text(hjust = 0), legend.position = "top")+
    labs(caption="Source: ÚPSVaR")
  
p

After inspecting the achieved balance of the “control group” and participants, we can proceed with reporting the estimated effects of participation in the Support for hiring disadvantaged job seekers during 2017.

3.1 Labour market outcomes of participants and eligible

First, we construct a proxy of the employment rate. It is a share of persons out of the register of unemployed JSs observed on the first day of quarters before and after participation (Quarter 0 was the quarter when participation started).

Graph 5: The share of individuals out of the unemployment register (proxy for employment rate)

graph <- matrix(NA, nrow=length(O_vars[1:17]), ncol = 2)
rownames(graph) <- O_vars[1:17]
for(o in O_vars[1:17]){
  graph[o,1] <- apply(Mdata[Mdata$treated == TRUE,o,drop = FALSE], 2, mean, na.rm = TRUE)
  graph[o,2] <- apply(Mdata[Mdata$treated == FALSE,o,drop = FALSE], 2, mean, na.rm = TRUE)
}

graph <- data.frame(cbind(O_vars[1:17],graph))
colnames(graph) <- c('O_vars','treated', 'untreated')
graph$treated <- as.numeric(as.character(graph$treated))
graph$untreated <- as.numeric(as.character(graph$untreated))
graph$O_vars <- factor(graph$O_vars, levels=c(O_vars[1:17]))
graph$Q<-seq(-4,12,1)

graphER<- ggplot(data=graph) +
  geom_line(aes(x=Q, y=treated, colour="Treated"), size = 1 , group = 1) +
  geom_line(aes(x=Q, y=untreated, colour="Control group"), size = 1 , group = 1) +
  scale_x_continuous(breaks=seq(-4,12,1))+
  theme_minimal()+
  labs( 
       y = "Share of treated",
       x = "Quarters before and after the start of the participation (0)",
       colour = "Group", 
       caption="Source: ÚPSVaR") +
  theme(plot.caption = element_text(hjust = 0), legend.position = "top")

graphER

Second, we look at the number of quarters until the first exit from the register of unemployed JSs. This indicates how participation in the ALMP programme contributed to the shortening of the participants’ unemployment spell. The graph displays the whole distribution of participants and the control group based on the values of this outcome indicator.

Graph 6: Quarters until the first exit

#### THE NUMBER OF MONTHS UNTIL THE FIRST EXIT
########## Months until the first exit
# how many months after the entrya JS got a job

firstempl_P <- data.frame('quarter' = Mdata[Mdata$treated==TRUE, 'firstempl']) %>% 
  group_by(quarter) %>% 
  summarise(total = n(), .groups = 'drop') %>%
  mutate(percent =  round( total / sum(total), 4),
         Group = "Participants") 

firstempl_E <- data.frame('quarter' = Mdata[Mdata$treated==FALSE, 'firstempl']) %>% 
  group_by(quarter) %>% 
  summarise(total = n(), .groups = 'drop') %>%
  mutate(percent =  round( total / sum(total), 4),
         Group = "Eligible") 

firstempl <- rbind(firstempl_P, firstempl_E)

ggplot(firstempl, aes(fill=Group, y=percent, x=quarter)) + 
          geom_bar(position="dodge", stat="identity") +
          facet_wrap(~Group) +
          theme_minimal() + 
          scale_fill_manual(values=c('grey', 'steelblue3')) +
          scale_x_continuous(breaks=seq(0,16,1)) +
          theme(legend.position="none") +
          xlab("Quarters after the start of the participation (0)") + 
          ylab("") +
          labs(caption="Source: COLSAF")+
          scale_y_continuous(labels = percent)

The following graph displays the cumulative number of months spent out of the register during the observation period following the evaluation program’s end.

Graph 7: Number of quarters out of the unemployment register after the end of the participation

## Plotting the number of months in cumulative employment
###### Participants

empl36m_P <- data.frame('quarter' = Mdata[Mdata$treated==TRUE, 'cumempl']) %>% 
  group_by(quarter) %>% 
  summarise(total = n(), .groups = 'drop') %>%
  mutate(percent =  round( total / sum(total), 4),
         Group = "Participants") 

empl36m_E <- data.frame('quarter' = Mdata[Mdata$treated==FALSE, 'cumempl']) %>% 
  group_by(quarter) %>% 
  summarise(total = n(), .groups = 'drop') %>%
  mutate(percent =  round( total / sum(total), 4),
         Group = "Eligible") 

empl36m <- rbind(empl36m_P, empl36m_E)

ggplot(empl36m, aes(fill=Group, y=percent, x=quarter)) + 
  geom_bar(position="dodge", stat="identity") +
  facet_wrap(~Group) +
  theme_minimal() + 
  scale_fill_manual(values=c('grey', 'steelblue3')) +
  scale_x_continuous(breaks=seq(0,16,1)) +
  theme(legend.position="none") +
  xlab("Quarters after the start of the participation (0)") + 
  ylab("") +
  labs(caption="Source: COLSAF")+
  scale_y_continuous(labels = percent)

3.2 Estimation of the average treatment effects on the treated (ATTs)

Now, we explore the average treatment effects on the treated (ATTs).

Graph 8: The average treatment effects on the treated (ATTs) estimated for the participation in the Support for hiring disadvantaged job seekers during 2017

graphATT

3.3 Statistical significance and heterogeneity of ATT effects

Further, we report the ATTs on the quarterly proxy for employment status, complemented by the indicators of:

Number of months until the first exit from the unemployment register JSs (firtempl)
Cumulative number of months outside the unemployment register of unemployed JSs (cumempl)

Table 3: The average treatment effect on the treated (ATT)

resultsDF <- select(resultsDF, -pval)
colnames(resultsDF) <- c("", "effect", "se", "sig.")

resultsDF %>% kbl(format = 'html', booktabs = TRUE , align = 'c') %>%
  column_spec(1,  border_right = TRUE) %>%
  kable_classic('hover', full_width = FALSE) %>%
  row_spec(1:nrow(resultsDF), font_size = '1cm') %>%
  column_spec(1:ncol(resultsDF),width = "1.1cm") %>%
  kableExtra::footnote(number = paste('Significance (sig.):',  0.0000, ' " *** " ', 0.001, ' " ** " ', 0.01, ' " * " ', 0.05, ' " . " ', 0.1, '" "',  1))

	effect	se	sig.
empl.4	-0.019	0.012
empl.3	0.018	0.012
empl.2	0.007	0.007
empl.1	0.000	0.000
empl0	-0.094	0.009	***
empl1	-0.129	0.010	***
empl2	-0.123	0.011	***
empl3	-0.081	0.013	***
empl4	-0.044	0.015	**
empl5	0.098	0.018	***
empl6	0.348	0.016	***
empl7	0.269	0.016	***
empl8	0.204	0.016	***
empl9	0.180	0.016	***
empl10	0.147	0.016	***
empl11	0.144	0.016	***
empl12	0.137	0.016	***
firstempl	-0.821	0.120	***
cumempl	1.056	0.112	***
¹ Significance (sig.): 0 " * " 0.001 " " 0.01 " * " 0.05 " . " 0.1 " " 1

Additionally, we explore the heterogeneity in the effects programme participation has on various subgroups of participants.

We explore the ATTs for subgroups based on:

Gender - differences in effects for women and men
Education level - differences in effects for JSs with no education, primary education, lower and secondary education.
Share of Roma in the city of residence - differences in effects for JSs who lives in the city with the share of Roma below 10% and above 10%
The size of the city of residence - differences in effects for JSs who lives in the city with a population less than 4 000 and more than 4 000

Table 4: The average treatment effect on the treated: subgroup based difference between entry into unemployment register and started of participation in measure

diff_entry

				Timing of participation
	Sample			0-6 months			7-12 months			13+ months
	effect	se	sig.	effect	se	sig.	effect	se	sig.	effect	se	sig.
empl.4	-0.019	0.012		-0.028	0.037		0.072	0.034		-0.036	0.007	***
empl.3	0.018	0.012		0.182	0.038	***	-0.090	0.033	**	-0.005	0.003	.
empl.2	0.007	0.007		0.096	0.033	**	-0.054	0.014	***	0.000	0.000
empl.1	0.000	0.000		-0.003	0.003		0.000	0.000		0.000	0.000
empl0	-0.094	0.009	***	-0.217	0.033	***	-0.170	0.023	***	-0.014	0.008	.
empl1	-0.129	0.010	***	-0.298	0.035	***	-0.210	0.025	***	-0.028	0.010	**
empl2	-0.123	0.011	***	-0.324	0.036	***	-0.165	0.029	***	-0.027	0.012
empl3	-0.081	0.013	***	-0.232	0.036	***	-0.102	0.030	***	-0.011	0.019
empl4	-0.044	0.015	**	-0.200	0.037	***	-0.045	0.035		0.015	0.022
empl5	0.098	0.018	***	-0.039	0.042		0.152	0.040	***	0.126	0.026	***
empl6	0.348	0.016	***	0.203	0.035	***	0.413	0.033	***	0.372	0.024	***
empl7	0.269	0.016	***	0.161	0.034	***	0.290	0.031	***	0.300	0.023	***
empl8	0.204	0.016	***	0.107	0.033	***	0.234	0.032	***	0.228	0.024	***
empl9	0.180	0.016	***	0.106	0.030	***	0.200	0.034	***	0.200	0.023	***
empl10	0.147	0.016	***	0.116	0.031	***	0.162	0.034	***	0.150	0.022	***
empl11	0.144	0.016	***	0.126	0.032	***	0.159	0.036	***	0.143	0.023	***
empl12	0.137	0.016	***	0.128	0.033	***	0.129	0.035	***	0.140	0.021	***
firstempl	-0.821	0.120	***	0.960	0.258	***	-0.594	0.255		-1.575	0.186	***
cumempl	1.056	0.112	***	-0.361	0.280		1.047	0.269	***	1.595	0.163	***
¹ Significance codes (sig.): 0 " ** " 0.001 " * " 0.01 " * " 0.05 " . " 0.1 " " 1
² Sample: 2 382 observations 0-6 months: 491 observations 7-12 months: 567 observations 13+ months: 1 322 observations

Table 5: The average treatment effect on the treated: subgroup based on gender

gender

				Gender
	Sample			Women			Men
	effect	se	sig.	effect	se	sig.	effect	se	sig.
empl.4	-0.019	0.012		0.001	0.016		-0.056	0.020	**
empl.3	0.018	0.012		0.025	0.016		0.004	0.021
empl.2	0.007	0.007		0.011	0.011		0.006	0.014
empl.1	0.000	0.000		0.000	0.000		-0.001	0.001
empl0	-0.094	0.009	***	-0.091	0.012	***	-0.100	0.015	***
empl1	-0.129	0.010	***	-0.115	0.014	***	-0.147	0.018	***
empl2	-0.123	0.011	***	-0.098	0.017	***	-0.155	0.019	***
empl3	-0.081	0.013	***	-0.034	0.020	.	-0.144	0.023	***
empl4	-0.044	0.015	**	0.008	0.023		-0.114	0.025	***
empl5	0.098	0.018	***	0.152	0.026	***	0.021	0.031
empl6	0.348	0.016	***	0.346	0.020	***	0.343	0.025	***
empl7	0.269	0.016	***	0.275	0.020	***	0.258	0.025	***
empl8	0.204	0.016	***	0.213	0.022	***	0.190	0.024	***
empl9	0.180	0.016	***	0.184	0.021	***	0.174	0.023	***
empl10	0.147	0.016	***	0.150	0.020	***	0.138	0.024	***
empl11	0.144	0.016	***	0.157	0.020	***	0.124	0.025	***
empl12	0.137	0.016	***	0.152	0.019	***	0.117	0.024	***
firstempl	-0.821	0.120	***	-1.135	0.176	***	-0.391	0.186
cumempl	1.056	0.112	***	1.300	0.159	***	0.706	0.187	***
¹ Significance codes (sig.): 0 " ** " 0.001 " * " 0.01 " * " 0.05 " . " 0.1 " " 1
² Sample: 2 382 observations Women: 1 379 observations Men: 1 003 observations

Table 6: The average treatment effect on the treated: subgroup based on education level

education

				Education
	Sample			No education			Primary			Lower secondary			Upper secondary
	effect	se	sig.	effect	se	sig.	effect	se	sig.	effect	se	sig.	effect	se	sig.
empl.4	-0.019	0.012		0.230	NaN		0.087	0.065		-0.069	0.022	**	-0.003	0.018
empl.3	0.018	0.012		-0.383	NaN		0.080	0.060		0.005	0.021		0.035	0.018	.
empl.2	0.007	0.007		-0.888	NaN		0.039	0.053		0.006	0.017		0.002	0.014
empl.1	0.000	0.000		0.000	NaN		0.000	0.000		0.002	0.002		0.000	0.000
empl0	-0.094	0.009	***	0.000	NaN		-0.156	0.046	***	-0.094	0.017	***	-0.095	0.014	***
empl1	-0.129	0.010	***	0.000	NaN		-0.209	0.067	**	-0.133	0.021	***	-0.127	0.016	***
empl2	-0.123	0.011	***	0.000	NaN		-0.129	0.079		-0.115	0.024	***	-0.126	0.019	***
empl3	-0.081	0.013	***	0.000	NaN		-0.097	0.083		-0.080	0.030	**	-0.074	0.024	**
empl4	-0.044	0.015	**	0.837	NaN		-0.031	0.086		-0.023	0.031		-0.053	0.027
empl5	0.098	0.018	***	0.000	NaN		0.090	0.087		0.110	0.034	***	0.084	0.030	**
empl6	0.348	0.016	***	-0.562	NaN		0.330	0.074	***	0.349	0.029	***	0.341	0.024	***
empl7	0.269	0.016	***	2.060	NaN		0.309	0.075	***	0.251	0.027	***	0.245	0.024	***
empl8	0.204	0.016	***	2.060	NaN		0.255	0.076	***	0.187	0.028	***	0.176	0.025	***
empl9	0.180	0.016	***	0.163	NaN		0.203	0.076	**	0.169	0.026	***	0.163	0.025	***
empl10	0.147	0.016	***	-0.724	NaN		0.204	0.071	**	0.160	0.028	***	0.116	0.024	***
empl11	0.144	0.016	***	0.163	NaN		0.256	0.071	***	0.144	0.027	***	0.098	0.024	***
empl12	0.137	0.016	***	-0.501	NaN		0.261	0.074	***	0.134	0.027	***	0.099	0.024	***
firstempl	-0.821	0.120	***	-3.552	NaN		-1.273	0.709	.	-0.791	0.225	***	-0.740	0.194	***
cumempl	1.056	0.112	***	3.497	NaN		1.287	0.682	.	1.059	0.219	***	0.849	0.192	***
¹ Significance codes (sig.): 0 " ** " 0.001 " * " 0.01 " * " 0.05 " . " 0.1 " " 1
² Sample: 2 382 observations No education: 11 observations Primary: 174 observations Lower secondary: 831 observations Upper secondary: 1 029 observations

Table 7: The average treatment effect on the treated: subgroup based on share of Roma in the city of residence

romas

				Roma share
	Sample			0-10%			10-100%
	effect	se	sig.	effect	se	sig.	effect	se	sig.
empl.4	-0.019	0.012		-0.009	0.013		-0.042	0.031
empl.3	0.018	0.012		0.022	0.014		0.004	0.027
empl.2	0.007	0.007		0.000	0.009		0.025	0.022
empl.1	0.000	0.000		-0.001	0.001		0.005	0.005
empl0	-0.094	0.009	***	-0.097	0.011	***	-0.101	0.024	***
empl1	-0.129	0.010	***	-0.129	0.012	***	-0.141	0.027	***
empl2	-0.123	0.011	***	-0.138	0.014	***	-0.105	0.031	***
empl3	-0.081	0.013	***	-0.097	0.016	***	-0.074	0.033
empl4	-0.044	0.015	**	-0.055	0.017	***	-0.049	0.038
empl5	0.098	0.018	***	0.082	0.022	***	0.087	0.042
empl6	0.348	0.016	***	0.336	0.019	***	0.343	0.033	***
empl7	0.269	0.016	***	0.253	0.019	***	0.278	0.033	***
empl8	0.204	0.016	***	0.181	0.018	***	0.255	0.035	***
empl9	0.180	0.016	***	0.150	0.018	***	0.252	0.035	***
empl10	0.147	0.016	***	0.116	0.018	***	0.211	0.036	***
empl11	0.144	0.016	***	0.104	0.019	***	0.232	0.036	***
empl12	0.137	0.016	***	0.108	0.018	***	0.201	0.034	***
firstempl	-0.821	0.120	***	-0.681	0.140	***	-0.870	0.288	**
cumempl	1.056	0.112	***	0.814	0.131	***	1.389	0.288	***
¹ Significance codes (sig.): 0 " ** " 0.001 " * " 0.01 " * " 0.05 " . " 0.1 " " 1
² Sample: 2 382 observations 0-10%: 1 752 observations 10-100%: 630 observations

Table 8: The average treatment effect on the treated: subgroup based on the size of the city of residence

City

				Type of city
	Sample			Village			City
	effect	se	sig.	effect	se	sig.	effect	se	sig.
empl.4	-0.019	0.012		-0.019	0.020		-0.021	0.018
empl.3	0.018	0.012		0.033	0.018	.	0.003	0.017
empl.2	0.007	0.007		0.004	0.013		0.010	0.013
empl.1	0.000	0.000		0.000	0.000		0.000	0.000
empl0	-0.094	0.009	***	-0.099	0.015	***	-0.093	0.013	***
empl1	-0.129	0.010	***	-0.132	0.017	***	-0.128	0.015	***
empl2	-0.123	0.011	***	-0.134	0.020	***	-0.117	0.018	***
empl3	-0.081	0.013	***	-0.076	0.023	***	-0.089	0.022	***
empl4	-0.044	0.015	**	-0.064	0.025	**	-0.033	0.024
empl5	0.098	0.018	***	0.029	0.028		0.151	0.027	***
empl6	0.348	0.016	***	0.301	0.024	***	0.385	0.022	***
empl7	0.269	0.016	***	0.236	0.024	***	0.292	0.021	***
empl8	0.204	0.016	***	0.174	0.024	***	0.225	0.023	***
empl9	0.180	0.016	***	0.176	0.024	***	0.178	0.024	***
empl10	0.147	0.016	***	0.151	0.025	***	0.134	0.022	***
empl11	0.144	0.016	***	0.138	0.024	***	0.143	0.023	***
empl12	0.137	0.016	***	0.118	0.023	***	0.146	0.022	***
firstempl	-0.821	0.120	***	-0.616	0.189	***	-0.934	0.188	***
cumempl	1.056	0.112	***	0.818	0.191	***	1.193	0.176	***
¹ Significance codes (sig.): 0 " ** " 0.001 " * " 0.01 " * " 0.05 " . " 0.1 " " 1
² Sample: 2 382 observations Village: 1 087 observations City: 1 295 observations

4. Technical appendix

The technical appendix provides additional details on the data processing procedures, complemented by results obtained by an alternative estimation strategy based on inverse probability weighting.

4.1 Sample Selection

This section displays additional detail about the selection and cleaning of the evaluation sample.

info_table <- data.frame(matrix(ncol=2,nrow=0, dimnames=list(NULL, c("Desciption", "Value"))))
info_table[1,] <- c('Start of the evaluation period', paste(ep_start))
info_table[2,] <- c('End of the evaluation period', paste(ep_end))

info_table <-
  info_table[,-3] %>% kbl(format = 'html', booktabs = TRUE , align = 'c', row.names = FALSE) %>%
  column_spec(1,  border_right = TRUE) %>%
  kable_classic('hover', full_width = FALSE)


sample_selection <- data.frame(matrix(ncol=3,nrow=0, dimnames=list(NULL, c("Desciption", "Dropped","Total"))))

sample_selection[1,] <- c('Total registrations ', 0,format(n1, big.mark=" ", scientific=FALSE))
sample_selection[2,] <- c('Total eligible jobseekers', 0,format(n2, big.mark=" ", scientific=FALSE))
sample_selection[3,] <- c('Total participations in the evaluated measure', 0,format(npart1, big.mark=" ", scientific=FALSE))
sample_selection[4,] <- c('Total participants in the evaluated measure', 0 ,format(npart0, big.mark=" ", scientific=FALSE))

x<-n1-n4
y<-npart1-n3
sample_selection[5,] <- c('Dropped eligible JSs', format(n4, big.mark=" ", scientific=FALSE), format(x, big.mark=" ", scientific=FALSE))
sample_selection[6,] <- c('Dropped participants', format(n3, big.mark=" ", scientific=FALSE), format(y, big.mark=" ", scientific=FALSE))

x<-x-n6
y<-y-n5
sample_selection[7,] <- c('Dropped eligible JSs', format(n6, big.mark=" ", scientific=FALSE),format(x, big.mark=" ", scientific=FALSE))
sample_selection[8,] <- c('Dropped participants', format(n5, big.mark=" ", scientific=FALSE),format(y, big.mark=" ", scientific=FALSE))

y<-y-n7
sample_selection[9,] <- c('Dropped participants', format(n7, big.mark=" ", scientific=FALSE),format(y, big.mark=" ", scientific=FALSE))

x<-x-n9
y<-y-n8
sample_selection[10,] <- c('Dropped eligible JSs', format(n9, big.mark=" ", scientific=FALSE),format(x, big.mark=" ", scientific=FALSE))
sample_selection[11,] <- c('Dropped participants', format(n8, big.mark=" ", scientific=FALSE),format(y, big.mark=" ", scientific=FALSE))

y<-nrowdfa-n10
sample_selection[12,] <- c('Inflating participants by merging dataframes', 0,format(nrowdfa, big.mark=" ", scientific=FALSE))
sample_selection[13,] <- c('Dropped participants', format(n10, big.mark=" ", scientific=FALSE),format(y, big.mark=" ", scientific=FALSE))

y<-y-n11
sample_selection[14,] <- c('Dropped participants', format(n11, big.mark=" ", scientific=FALSE),format(y, big.mark=" ", scientific=FALSE))

sample_selection[15,] <- c('Eligibles', format(x-(length(unique(esample$klient_id[esample$treated==FALSE]))), big.mark=" ", scientific=FALSE), format(length(unique(esample$klient_id[esample$treated==FALSE])), big.mark=" ", scientific=FALSE))
sample_selection[16,] <- c('Participants', format(y-length(unique(esample$klient_id[esample$treated==TRUE])), big.mark=" ", scientific=FALSE), format(length(unique(esample$klient_id[esample$treated==TRUE])), big.mark=" ", scientific=FALSE))

sample_selection[17,] <- c('Eligibles', 0, format(length(unique(esample$klient_id[esample$treated==FALSE])), big.mark=" ", scientific=FALSE))
sample_selection[18,] <- c('Participants', 0, format(length(unique(esample$klient_id[esample$treated==TRUE])), big.mark=" ", scientific=FALSE))

sample_selection$Variable <- c('All registrations (before cleaning)', 'All registrations (before cleaning)', 'All registrations (before cleaning)','All registrations (before cleaning)',
                             
                             'Dropped JSs with ALMP participation 2 years before the EP', 'Dropped JSs with ALMP participation 2 years before the EP',
                             
                             'Dropped JSs with ALMP participation in other ALMP during the EP', 'Dropped JSs with ALMP participation in other ALMP during the EP',
                             
                             'Dropped participants with multiple ALMP participations in evaluated programme',
                             
                             'Dropped JSs with ALMP participation in supported employment programs (ALMP) during the EP',  'Dropped JSs with ALMP participation in supported employment programs (ALMP) during the EP',
                             
                             'Dropped participations which not happening during an unemployment spell','Dropped participations which not happening during an unemployment spell', 
                             
                             'Dropped participations with extreme values (1%) of the waiting time until participation in the evaluated measure ',
                             
                             'Dropped JSs with multiple registrations', 'Dropped JSs with multiple registrations',
                             
                             'Total registrations after cleaning', 'Total registrations after cleaning'
)

sample_selection <-sample_selection[,-4] %>% kbl(format = 'html', booktabs = TRUE , align = 'c', row.names = FALSE) %>%
  column_spec(1,  border_right = TRUE) %>%
  kable_classic('hover', full_width = FALSE)%>%
  pack_rows(index = table(fct_inorder(sample_selection$Variable)))

Table 9: Information table about the beginning and end of the evaluation period

info_table

Desciption	Value
Start of the evaluation period	2017-01-01
End of the evaluation period	2017-12-31

Table 10: Data cleaning documentation

sample_selection

Desciption	Dropped	Total
All registrations (before cleaning)
Total registrations	0	316 875
Total eligible jobseekers	0	300 626
Total participations in the evaluated measure	0	2 665
Total participants in the evaluated measure	0	2 663
Dropped JSs with ALMP participation 2 years before the EP
Dropped eligible JSs	47 278	269 597
Dropped participants	542	2 123
Dropped JSs with ALMP participation in other ALMP during the EP
Dropped eligible JSs	44 685	224 912
Dropped participants	297	1 826
Dropped participants with multiple ALMP participations in evaluated programme
Dropped participants	0	1 826
Dropped JSs with ALMP participation in supported employment programs (ALMP) during the EP
Dropped eligible JSs	25 180	199 732
Dropped participants	61	1 765
Dropped participations which not happening during an unemployment spell
Inflating participants by merging dataframes	0	1 792
Dropped participants	588	1 204
Dropped participations with extreme values (1%) of the waiting time until participation in the evaluated measure
Dropped participants	13	1 191
Dropped JSs with multiple registrations
Eligibles	10 168	189 564
Participants	1	1 190
Total registrations after cleaning
Eligibles	0	189 564
Participants	0	1 190

4.2 Description of explanatory variables (used in balancing)

Description of variables, used in the estimation of ATTs. We also show their balance before and after matching in graph 4 and balance before and after weighting in graph 10.

Table 11: List and description of variables used in estimation

#TREBA DOPLNIŤ DO POSTUP_EXPORT
Q[["df"]][["flang"]] <- "knowledge of a foreign language (1: Yes, 0: No)"
Q[["df"]][["drive"]] <- "driving license holder (1: Yes, 0: No)"
Q[["df"]][["pc"]] <- "computer skills (1: Yes, 0: No)"
Q[["df"]][["unpast"]] <- "registered citizen in the past (1: Yes, 0: No), unemployed in past"
Q[["df"]][["min_urad"]] <- "traveling time to the nearest Labour office (in minutes)"
Q[["df"]][["min_BA"]] <- "traveling time to the Capital city - Bratislava (in minutes)"
Q[["df"]][["population"]] <- "number of inhabitants of the place of residence"



list_vars_table <- data.frame(Variables = list_vars)

desc <- c()
for (i in list_vars_table$Variables){
  Q[["df"]][[i]][1]
  desc <- append(desc, ifelse(is.null(Q[["df"]][[i]][1]), NA, Q[["df"]][[i]][1]))
}

list_vars_table$Description <- desc
list_vars_table[list_vars_table$Variables == "ent","Description"] <- "difference between entry into unemployment register and started of evalueted period"
list_vars_table[list_vars_table$Variables == "UR_region","Description"] <- "unemployment rate in region during valuated period"
list_vars_table[list_vars_table$Variables == "roma_share","Description"] <- "share of Roma in the place of residence"

list_vars_table[,-3] %>% kbl(format = 'html', booktabs = TRUE , align = 'c', row.names = FALSE) %>%
  column_spec(1,  border_right = TRUE) %>%
  kable_classic('hover', full_width = FALSE)

Variables	Description
ent	difference between entry into unemployment register and started of evalueted period
male	combination of klient_id and the date of entry is an Unique ID. (we created and used dup column to identify duplicates and remove them)
married	marital status: single
kids	nationality: other
slovak	marital status: other
noedu	kids under 10 years
primary	level of education: no education
lsec	level of education: primary
usec	level of education: ower secondary
flang	knowledge of a foreign language (1: Yes, 0: No)
drive	driving license holder (1: Yes, 0: No)
pc	computer skills (1: Yes, 0: No)
unpast	registered citizen in the past (1: Yes, 0: No), unemployed in past
min_urad	traveling time to the nearest Labour office (in minutes)
min_BA	traveling time to the Capital city - Bratislava (in minutes)
UR_region	unemployment rate in region during valuated period
roma_share	share of Roma in the place of residence
population	number of inhabitants of the place of residence
age	gender (1: man, 0: woman)

4.3 Alternative estimation strategy (Inverse probability weighting)

In this section, we explore the sensitivity of our results to an alternative model specifications. The main results, reported above, were obtained by [propensity score matching] (https://cran.r-project.org/web/packages/MatchIt/MatchIt.pdf). An alternative estimation strategy was applied using an [inverse probability weighting estimator] (https://www.rdocumentation.org/packages/causalweight/versions/1.0.2/topics/treatweight).

#Estimation parameters
Ssamples <- seq(1,4)
participation_month<-((year(as.Date(esample$entrya))-min(year(ep_start)))*12)+month(as.Date(esample$entrya)) # Month of participation since the start of the evaluation period
pcpQ<-ceiling(participation_month/3)
max_pcpQ<-max(pcpQ, na.rm = TRUE)

OQm <- seq(-12,36,3)
OQ <- -4:12
O_vars <- c(paste("empl",OQ, sep=""), "firstempl", "cumempl")
O_vars<-str_replace(O_vars, "-", ".")

list_vars <- c('ent', 'male', 'married','kids',
           'slovak', 'noedu','primary', 'lsec', 'usec',
           'flang', 'drive', 'pc',
           'unpast', 'min_urad', 'min_BA',
           'UR_region', 'roma_share', 'population', 'age')

#     # All potentially useful explanatory variables (Xs)
#     list_vars <- c('ent', 'male', 'single', 'married','kids',
#                    'slovak', 'hungarian', 'roma', 
#                    'noedu','primary', 'lsec', 'usec', 'tertiary'
#                    , 'zaujem_vzdel',
#                    'flang', 'drive', 'pc',
#                    'healthy', 'barrier', 'graduate', 'ziad_undn_sp', 'cvyhl_poisteu', 
#                    'empl', 'unpast', 'employee', 'selfempl', 'zaujem_szco',
#                    'look_ptime', 'commute', 'relocate', 'zaujem_zam_zahr',
#                    'min_kraj', 'min_urad', 'min_BA', 
#                    'UR_region', 'roma_share', 'population', 
#                    ageg_dummies,
# #                    paste0("urad_",seq(from=1, to=46), sep=""), 
#                    paste0("isco1_",seq(from=1, to=3), sep=""),
#                    paste0("odbor1_",seq(from=1, to=5), sep=""))
# #                   paste0(colnames(df)[grepl("nace1_",colnames(df))], sep=""))

Balance_vars <- list_vars
# Result Matrixes
  N<-nrow(esample[esample$treated==TRUE,])
  N_sp <- matrix(NA, nrow=length(Ssamples)) 
  
  resultsArray_ATT  <- array(NA, dim=c(length(O_vars),7,length(Ssamples))) 
  dimnames(resultsArray_ATT)[[1]] <- c(O_vars)
  dimnames(resultsArray_ATT)[[2]] <- c('ATT', 'se', 'pval', 'Y1', 'Y0', 'SampleSize', 'Sign.')
  results <- array(NA, dim=c(length(O_vars),4))

  balance_matrix_w <- array(NA, dim=c(length(list_vars),length(Ssamples))) # S x P x Q2
  dimnames(balance_matrix_w)[[1]] <- c(list_vars)
  
  results_bv_W <- array(NA, dim=c(1,length(list_vars)))
  
  balance_matrix_un <- array(NA, dim=c(length(list_vars),length(Ssamples))) # S x P x Q2
  dimnames(balance_matrix_un)[[1]] <- c(list_vars)
  
  results_bv_un <- array(NA, dim=c(1,length(list_vars)))
  
### 1. Counting of ATTs 
Sesample <- esample

###Four sub-samples based on the waiting time until participation in the evaluated measure (cutoffs p25, p50, p75): 
wtc25<-quantile(as.numeric(Sesample$exita)-as.numeric(Sesample$entry),na.rm=TRUE, probs=0.25)
wtc50<-quantile(as.numeric(Sesample$exita)-as.numeric(Sesample$entry),na.rm=TRUE, probs=0.50)
wtc75<-quantile(as.numeric(Sesample$exita)-as.numeric(Sesample$entry),na.rm=TRUE, probs=0.75)

 minToP<-min(as.numeric(Sesample$entrya)-as.numeric(Sesample$entry), na.rm = TRUE)

partic<-Sesample[Sesample$treated==TRUE,]
nonpart<-Sesample[Sesample$treated==FALSE,]

#NonPart majú iba spodné kritérium minimálnej dĺžky nezamestnanosti 
esample1<-rbind(nonpart[as.numeric(nonpart$exit)-as.numeric(nonpart$entry)>minToP,],
                partic[as.numeric(partic$exita)-as.numeric(partic$entry)<=wtc25,])
esample2<-rbind(nonpart[as.numeric(nonpart$exit)-as.numeric(nonpart$entry)>=wtc25,],
                partic[as.numeric(partic$exita)-as.numeric(partic$entry)>wtc25 &
                         as.numeric(partic$exita)-as.numeric(partic$entry)<=wtc50,])
esample3<-rbind(nonpart[as.numeric(nonpart$exit)-as.numeric(nonpart$entry)>=wtc50,], 
                partic[as.numeric(partic$exita)-as.numeric(partic$entry)>wtc50 &
                         as.numeric(partic$exita)-as.numeric(partic$entry)<=wtc75,])
esample4<-rbind(nonpart[as.numeric(nonpart$exit)-as.numeric(nonpart$entry)>=wtc75,],
                partic[as.numeric(partic$exita)-as.numeric(partic$entry)>wtc75,])

DataSample <- c()

for (s in Ssamples) {
    
    esampleS <- get(paste0('esample', s, by = ''))
    
    
    N_sp[s,] <- nrow(esampleS[esampleS$treated==TRUE,])
    
    if (mean(esampleS$treated) > 0.0001){
      esampleS <-esampleS[,c("treated", "klient_id", 
                             "entry", "exit", "entrya", 
                             list_vars)]
      
      # Month of participation since the start of the evaluation period 
      esampleS$participation_month<-((year(as.Date(esampleS$entrya))-year(ep_start))*12)+
        month(as.Date(esampleS$entrya))
      # Participation quarter
      esampleS$pcpQ<-ceiling(esampleS$participation_month/3)
      
      # Month of the start of the unemployment since the start of the evaluation period
      esampleS$Ustart_month<-((year(as.Date(esampleS$entry))-min(year(ep_start)))*12)+
        month(as.Date(esampleS$entry)) 
      # Inflow quarter 
      esampleS$infQ<-ceiling(esampleS$Ustart_month/3)
      
      #Adding unemployment history
      esampleS<-merge(esampleS, 
                      h_esample[,c("klient_id", "entry", 
                                   paste0("entry", seq(1:11), sep=""), 
                                   paste0("exit", seq(1:11), sep=""))],
                      by=c("klient_id", "entry"), all.y = FALSE)
      
      
      #Imputing the start of the unemployment spell 
      Pmatrix <- matrix(NA, nrow = nrow(esampleS[esampleS$treated==FALSE,]), ncol=max_pcpQ)
  
          for (p in 1:max_pcpQ){  
            
            # Quarter of inflow to unemployment of participants entering the programme in quarter P
            PinfQ<-unique(esampleS$infQ
                          [esampleS$treated==TRUE & esampleS$pcpQ==p])
            # Only allowing non-participants inflowing to unemployment during the quarters when participants in this sub-sample were inflowing
            #D<-esampleS[as.logical(esampleS$treated==T & esampleS$pcpQ == p) |
            #              as.logical(esampleS$treated==F & esampleS$infQ %in% PinfQ),]
            Pmatrix[,p]<-as.numeric(esampleS$infQ[esampleS$treated==FALSE]  %in% PinfQ)
          }
      
        hh <- function(j){
              sample(which(j==1),1)
        }
        
        esampleS$pcpQ[as.logical(apply(Pmatrix, 1, FUN=sum)==0) & esampleS$treated == FALSE] <- 0
        Pmatrix<-Pmatrix[as.logical(apply(Pmatrix, 1, FUN=sum)>0),]
        
        esampleS$pcpQ[as.logical(apply(Pmatrix, 1, FUN=sum)>0) & esampleS$treated == FALSE] <- apply(Pmatrix,1,hh)
        
  
        
                for (q in OQm) { 
                  
                  cond <- FALSE
                  
                    for (n in 1:11) { 
                      entry_n <- esampleS[ ,paste0('entry', n, '')]
                      exit_n <- esampleS[, paste0('exit', n, '')]
                      
                      cond <- cond | isInRegister(p = esampleS$pcpQ, q, entry_n, exit_n) 
                    }
                  
                  esampleS[ ,paste0('empl',ceiling(q/3), '')] <- ifelse(cond, 0,1)
                }
             
              for(q in OQm){
                esampleS <- esampleS[!is.na(esampleS[ ,paste0('empl',ceiling(q/3), '')]),]
              }
        
            #Pre-estimation data preparation
            #Additional outcome variables
            esampleS[ ,"cumempl"] <- rowSums(esampleS[, paste0("empl", seq(0,12,1), sep="")])
            esampleS[ ,"firstempl"] <- ceiling(as.numeric((as.Date(esampleS$exit)-
                                                      (as.Date(ep_start) + months(esampleS$pcpQ*3)))/90))    
        
            # Cleaning the Xs
            y <- data.frame(treated = esampleS[ ,'treated'])
            D <- data.frame(esampleS) #D - su tvoje data ako data.frame()
            spec <- as.formula(cbind(y,D[,c(list_vars, 
                                            paste0("empl.", seq(1,4,1), sep=""))])) 
            #je tvoja specifikacia modelu, je to object as.formula()
            colTh <- 0.8 
            #Treshold for the acceptable correlation between vars ( je maximalne tolerovana korelacia medzi dvoma premennymi)
            dumTh <- 0.0001 
            #Treshold for the acceptable concentration of dummy variables (hovori kolko (100[%] x dumTh) percent je minimalny vyskyt hodnot 1 alebo 0 pri dummy premennych. Ak je dumTh = 0.005, tak aspon pol-percent pozorovani musi mat 1 alebo 0.)
            TdumTh <- 0.005
            #Treshold for the acceptable concentration of dummy variables in the sub-group of participants (hovori kolko (100[%] x dumTh) percent je minimalny vyskyt hodnot 1 alebo 0 pri dummy premennych. Ak je dumTh = 0.005, tak aspon pol-percent pozorovani musi mat 1 alebo 0.)
            TdumThN <- 5
            #Treshold for the minimal number of observations of a dummy variables in the sub-group of participants (hovori kolko (100[%] x dumTh) percent je minimalny vyskyt hodnot 1 alebo 0 pri dummy premennych. Ak je dumTh = 0.005, tak aspon pol-percent pozorovani musi mat 1 alebo 0.)
            result <- DatPrep(D = D, spec = spec, colTh = colTh, dumTh = dumTh, TdumTh = TdumTh, TdumThN = TdumThN) 
            
            for (col in colnames(result$D)){
               D <-  D[!is.na(D[,col]),]
             }  
        # ESTIMATION:
            
            d = D$treated*1
            x = as.matrix(D[,c(list_vars, paste0("empl.", seq(1,4,1), sep=""))])
            y_mat <- D[,c(paste0("empl.", seq(1,4,1), sep=""), 
                                    paste0("empl", seq(0,12,1), sep=""), 
                                    'firstempl', 'cumempl')]
            
            att <- treatweight_pmp(y = y_mat, d, x, s = NULL, z = NULL, selpop = FALSE, trim = 0.05, ATET = TRUE, logit = TRUE, boot = 10)
            #att <- treatselDML(y = y_mat, d, x, s = d, z=x, selected=1)
            
            resultsArray_ATT[,1,s] <- round(att$effect,3)
            resultsArray_ATT[,2,s] <- round(att$se,3)
            resultsArray_ATT[,3,s] <- round(att$pval,3)
            resultsArray_ATT[,4,s] <- round(att$y1,3)
            resultsArray_ATT[,5,s] <- round(att$y0,3)
            resultsArray_ATT[,6,s] <- format(length(d)-att$ntrimmed, big.mark=" ", scientific=FALSE)
            resultsArray_ATT[,7,s] <- stars.pval(att$pval)
            
            
            DataSample <- bind_rows(DataSample, D)
            
            #Balance
            #Generating the propensity score variable
            PSmodel<-glm(result$spec, family=binomial(link = "logit"), data=D)
            #print(summary(PSmodel))
            D$PSvar<-as.numeric(PSmodel$fitted.values)
            
            #LL: old weights INCORRECT
            #w_ATE <- D$treated/D$PSvar + (1-D$treated)/(1-D$PSvar)
            
            #LL: new weights CORRECT
            w_ATE <- D$treated + (1-D$treated)*D$PSvar/(1-D$PSvar)
            
                          #Balance_vars <- colnames(result$D)[colnames(result$D) %in% list_vars]
                          for (bv in Balance_vars){
                          
                              if (apply(D[,bv,drop = FALSE] ,2,function(x) { all(x %in% c(0:1)) }) ) {
                                #unweighted discrete
                                p_treat <- apply(D[D$treated==1,bv,drop = FALSE],2,mean)
                                p_contr <- apply(D[D$treated==0,bv,drop = FALSE],2,mean)
                                balance_matrix_un[bv,s] <- abs( 100*(p_treat - p_contr )/sqrt( (p_treat*(1-p_treat) + p_contr*(1-p_contr))/2 ) )

                                #weighted discrete
                                p_treat <- t(w_ATE[D$treated==1]) %*% D[D$treated==1,bv] / sum(w_ATE[D$treated==1], na.rm = TRUE)
                                p_contr <- t(w_ATE[D$treated==0]) %*% D[D$treated==0,bv] / sum(w_ATE[D$treated==0], na.rm = TRUE)
                                
                                balance_matrix_w[bv,s] <- abs( 100*(p_treat - p_contr )/sqrt( (p_treat*(1-p_treat) + p_contr*(1-p_contr))/2 ) )
                            
                              } else {
                                #unweighted continuous
                                balance_matrix_un[bv,s] <- abs( 100*(apply(D[D$treated==1,bv,drop = FALSE],2,mean) - apply(D[D$treated==0,bv,drop = FALSE],2,mean))/
                                                  sqrt( (apply(D[D$treated==1,bv,drop = FALSE],2,sd)^2 + apply(D[D$treated==0,bv,drop = FALSE],2,sd)^2)/2 ) )

                                #weighted continuous
                                p_treat <- t(w_ATE[D$treated==1]) %*% D[D$treated==1,bv] / sum(w_ATE[D$treated==1], na.rm = TRUE)
                                p_contr <- t(w_ATE[D$treated==0]) %*% D[D$treated==0,bv] / sum(w_ATE[D$treated==0], na.rm = TRUE)
                                p_treat_var <- ( sum(w_ATE[D$treated==1]) / (sum(w_ATE[D$treated==1])^2 - sum(w_ATE[D$treated==1]^2)) )* 
                                  t(w_ATE[D$treated==1]) %*% ((D[D$treated==1,bv] - c(p_treat))^2)
                                p_contr_var <- ( sum(w_ATE[D$treated==0]) / (sum(w_ATE[D$treated==0])^2 - sum(w_ATE[D$treated==0]^2)) ) * 
                                  t(w_ATE[D$treated==0]) %*% ((D[D$treated==0,bv] - c(p_contr))^2)
                                
                                balance_matrix_w[bv,s] <- abs(100* (p_treat - p_contr ) / 
                                            sqrt( (p_treat_var + p_contr_var )/2 ) )
                              }

                          }
    }
} 
  

for (iQ in 1:length(O_vars)){
      results[iQ,1] <- sum(sum((as.numeric(resultsArray_ATT[iQ,'ATT',])*(N_sp/N)), na.rm = TRUE), na.rm = TRUE)
      results[iQ,2] <-sum(sum((as.numeric(resultsArray_ATT[iQ,'se',])*(N_sp/N)), na.rm = TRUE), na.rm = TRUE)
      results[iQ,3] <-sum(sum((as.numeric(resultsArray_ATT[iQ,'Y1',])*(N_sp/N)), na.rm = TRUE), na.rm = TRUE)
      results[iQ,4] <-sum(sum((as.numeric(resultsArray_ATT[iQ,'Y0',])*(N_sp/N)), na.rm = TRUE), na.rm = TRUE)
}

results<-cbind(O_vars, results) 
colnames(results) <- c('O_vars', 'ATT', 'se','Y1', 'Y0')

results_bv <- array(NA, dim=c(length(list_vars),2)) 
dimnames(results_bv)[[1]] <- c(list_vars)
dimnames(results_bv)[[2]] <- c("unweighted", "weighted")


for (bVar in 1:length(list_vars)){
      results_bv[bVar,1] <- abs(sum(sum((balance_matrix_w[bVar,]*(N_sp/N)), na.rm=TRUE), na.rm=TRUE))
      results_bv[bVar,2] <- abs(sum(sum((balance_matrix_un[bVar,]*(N_sp/N)), na.rm=TRUE), na.rm=TRUE))
}

We can explore the sensitivity of our results by displaying the average treatment effects on the treated (ATTs) computing by different approach by inverse probability weighting.

Graph 10: Balance of the groups of participants and eligible before and after weighting x

BVDF <- data.frame(balance_vars = c(rownames(results_bv),rownames(results_bv)),
                   balance = c(results_bv[,1], results_bv[,2]),
                   Balance = c(rep("before weighting",nrow(results_bv)),rep("after weighting",nrow(results_bv))))
                          
BVDF %>% subset(!is.na(balance)) %>% 
    mutate(balance_vars = fct_reorder(balance_vars, balance)) %>%
    ggplot(aes(x=balance, y=balance_vars,col=Balance)) + 
    geom_point() +
    ylab('Variables')+
    xlab('Absolute Standardized Meand Difference')+
    theme_minimal()+
    theme(plot.caption = element_text(hjust = 0), legend.position = "top")+
    labs(caption="Source: ÚPSVaR")+
    geom_vline(xintercept = 0.0, color = 'darkgrey')

We show a proxy of the employment rate to see a share of persons out of the register of unemployed JSs observed at the first day of quarters before and after the start of participation (Quarter 0).

Graph 11: The share of individuals out of the unemployment register (proxy for employment rate)

graph <- matrix(NA, nrow=length(O_vars[1:17]), ncol = 2)
rownames(graph) <- O_vars[1:17]
for(o in O_vars[1:17]){
  graph[o,1] <- apply(DataSample[DataSample$treated == TRUE,o,drop = FALSE], 2, mean, na.rm = TRUE)
  graph[o,2] <- apply(DataSample[DataSample$treated == FALSE,o,drop = FALSE], 2, mean, na.rm = TRUE)
}

graph <- data.frame(cbind(O_vars[1:17],graph))
colnames(graph) <- c('O_vars','treated', 'untreated')
graph$treated <- as.numeric(as.character(graph$treated))
graph$untreated <- as.numeric(as.character(graph$untreated))
graph$O_vars <- factor(graph$O_vars, levels=c(O_vars[1:17]))
graph$Q<-seq(-4,12,1)

graphER<- ggplot(data=graph) +
  geom_line(aes(x=Q, y=treated, colour="Treated"), size = 1 , group = 1) +
  geom_line(aes(x=Q, y=untreated, colour="Control group"), size = 1 , group = 1) +
  scale_x_continuous(breaks=seq(-4,12,1))+
  theme_minimal()+
  labs(
       y = "Share of treated",
       x = "Quarters before and after the start of the participation (0)",
       colour = "Group", 
       caption="Source: ÚPSVaR") +
  theme(plot.caption = element_text(hjust = 0), legend.position = "top")

graphER

Graph 12: The average treatment effects on the treated (ATTs) estimated for the participation in the Support for hiring disadvantaged job seekers during 2017

graphATT

Finaly, we report the ATTs in table computing computing by different approach by inverse probability weighting.

Table 12: The average treatment effect on the treated

resultsDF %>% kbl(format = 'html', booktabs = TRUE , align = 'c') %>%
  column_spec(1,  border_right = TRUE) %>%
  kable_classic('hover', full_width = FALSE) %>%
  row_spec(1:nrow(resultsDF), font_size = '1cm') %>%
  column_spec(1:ncol(resultsDF),width = "1.1cm") %>%
  kableExtra::footnote(number = paste('Significance codes (sig.):',  0.0000, ' " *** " ', 0.001, ' " ** " ', 0.01, ' " * " ', 0.05, ' " . " ', 0.1, '" "',  1))

	efekt	se	sig.
empl.4	0.000	0.000
empl.3	0.000	0.000
empl.2	0.000	0.000
empl.1	0.000	0.000
empl0	-0.104	0.010	***
empl1	-0.165	0.012	***
empl2	-0.157	0.016	***
empl3	-0.098	0.021	***
empl4	-0.060	0.024
empl5	0.088	0.026	***
empl6	0.363	0.026	***
empl7	0.301	0.020	***
empl8	0.238	0.022	***
empl9	0.208	0.021	***
empl10	0.178	0.021	***
empl11	0.179	0.022	***
empl12	0.181	0.021	***
firstempl	-1.064	0.174	***
cumempl	1.153	0.148	***
¹ Significance codes (sig.): 0 " * " 0.001 " " 0.01 " * " 0.05 " . " 0.1 " " 1

References

Daniel E. Ho, Kosuke Imai, Gary King, Elizabeth A. Stuart (2011). MatchIt: Nonparametric Preprocessing for Parametric Causal Inference. Journal of Statistical Software, Vol. 42, No. 8, pp. 1-28. https://doi.org/10.18637/jss.v042.i08

Frölich, M., Huber, M. (2014): “Treatment Evaluation With Multiple Outcome Periods Under En- dogeneity and Attrition”, Journal of the American Statistical Association, 109, 1697-1711.

https://bookdown.org/yihui/rmarkdown-cookbook/parameterized-reports.html

LMP Qualitative data in MS Excel, downloaded from (accessed at 14th of April 2021) : https://ec.europa.eu/social/main.jsp?catId=1143&intPageId=3227&langId=en