Absenteeism at Work

Sigma
Apr 9, 2022

Project contributors: Shrish Sharma, Samrudh Rahul, Krithik Chopra

Rewind a year and a half, and we were surrounded by reports of people, including our loved ones, losing their jobs owing to the pandemic. But have you ever wondered why mass layoffs became the pattern of those months? It was because of the financial losses that businesses had suffered or were likely to suffer. These losses came from a large imbalance between supply and demand, as well as government restrictions on operating factories and businesses. Many companies could not earn a profit and were operating at breakeven at best. As a result, businesses tried to cut expenses in every way they could, and the first place they looked was employee pay. For a long time, many companies were unable to generate enough work for all their employees.

So, consider the reasons why corporations let go of their most important asset, their people. Is it based on experience? Does it depend only on age? According to one poll, firms used the pandemic to penalise workers with irregular attendance by terminating them, regardless of age or experience. This habit of irregular attendance is called ‘Absenteeism at Work’.

Absenteeism refers to absence from work that extends beyond what would be considered reasonable. Companies discourage it, especially when it is paid leave or when the employee is absent on the busiest days of the year. Companies, usually MSMEs (which often have just one worker for a given position, so the workflow is disrupted if he/she is missing), strictly discourage absenteeism unless there are good reasons and the absences stay within a control limit. Terminating employees who attend only sporadically can also give those who work effectively a sense of accomplishment. Some of the top reasons for absenteeism are:

• Mental and physical sickness: Overworked employees in high-responsibility roles sometimes need a mental break because of high stress and a lack of appreciation for their contributions. Physical sickness can affect anyone, and it is quite natural to stay away from work to rest.

• Harassment: Employees who feel cornered by senior management or co-workers are more likely to take time off to avoid the unpleasant situation.

• A loved one’s sickness: Employees might have to miss days of work if they must look after loved ones who fall sick.

• Depression and substance misuse: According to the National Institute of Mental Health, depression is the leading cause of absenteeism. This condition frequently leads to drug and alcohol misuse, resulting in further missed workdays.

• Lack of interest: Employees who feel dispassionate about their jobs are likely to blow off work simply due to the lack of motivation.

• Injuries or illnesses: Employees don’t come to work for a variety of reasons, including illness, injuries, and doctor’s visits, especially during flu season.

Absenteeism clearly affects a company in multiple ways. Let us look at how a company suffers from it:

• Poor Quality of Work: When a worker is absent, a co-worker has to take on the extra work and is subjected to stress and pressure, which eventually leads to poor quality of work.

• Financial Loss for the Company and Employee: With paid leave, the company must pay the employee’s wage even though he/she is not there, which is a loss for the company. If the employee does not get paid leave, he/she loses pay instead, which affects his/her spending patterns.

• Negative Company Culture: When a particular employee is absent quite frequently, his/her workload is always shifted to co-workers, which breeds resentment towards the employee. This affects the company culture.

• Demotivated and Demoralized Employees: When an employee is frequently absent, the pressure of his/her pending work builds up gradually and, at some point, is passed on to other employees. This dampens the enthusiasm of the other workers and gradually lowers productivity.

Let’s analyse a dataset on the medical and other grounds on which employees are absent. The dataset contains 740 records of employee absences. The following is the dataset description:

1. Individual identification (ID)
2. Reason for absence (ICD).

Absences attested by the International Classification of Diseases (ICD), stratified into 21 categories (I to XXI) as follows:

I Certain infectious and parasitic diseases
II Neoplasms
III Diseases of the blood and blood-forming organs and certain disorders involving the immune mechanism
IV Endocrine, nutritional, and metabolic diseases
V Mental and behavioral disorders
VI Diseases of the nervous system
VII Diseases of the eye and adnexa
VIII Diseases of the ear and mastoid process
IX Diseases of the circulatory system
X Diseases of the respiratory system
XI Diseases of the digestive system
XII Diseases of the skin and subcutaneous tissue
XIII Diseases of the musculoskeletal system and connective tissue
XIV Diseases of the genitourinary system
XV Pregnancy, childbirth, and the puerperium
XVI Certain conditions originating in the perinatal period
XVII Congenital malformations, deformations, and chromosomal abnormalities
XVIII Symptoms, signs, and abnormal clinical and laboratory findings, not elsewhere classified
XIX Injury, poisoning and certain other consequences of external causes
XX External causes of morbidity and mortality
XXI Factors influencing health status and contact with health services.

And 7 categories without an ICD code: patient follow-up (22), medical consultation (23), blood donation (24), laboratory examination (25), unjustified absence (26), physiotherapy (27), dental consultation (28).
3. Month of absence
4. Day of the week (Monday (2), Tuesday (3), Wednesday (4), Thursday (5), Friday (6))
5. Seasons
6. Transportation expense
7. Distance from Residence to Work (kilometers)
8. Service time
9. Age
10. Workload Average/day
11. Hit target
12. Disciplinary failure (yes=1; no=0)
13. Education (high school (1), graduate (2), postgraduate (3), master and doctor (4))
14. Son (number of children)
15. Social drinker (yes=1; no=0)
16. Social smoker (yes=1; no=0)
17. Pet (number of pets)
18. Weight
19. Height
20. Body mass index
21. Absenteeism time in hours (target variable)
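
The coded columns in the description above are easier to read once they are mapped back to labels. Here is a minimal sketch, assuming that ICD categories I to XXI are stored as the numbers 1 to 21 (the description only states the codes 22 to 28 explicitly). The names `describe_reason`, `ICD_ROMAN`, `NON_ICD` and `WEEKDAYS` are hypothetical helpers, not part of the dataset.

```python
# Roman numerals for the 21 ICD categories (assumed to be coded 1 to 21).
ICD_ROMAN = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "X",
             "XI", "XII", "XIII", "XIV", "XV", "XVI", "XVII", "XVIII",
             "XIX", "XX", "XXI"]

# The 7 categories without an ICD code (codes 22 to 28, as listed above).
NON_ICD = {
    22: "patient follow-up",
    23: "medical consultation",
    24: "blood donation",
    25: "laboratory examination",
    26: "unjustified absence",
    27: "physiotherapy",
    28: "dental consultation",
}

# Day-of-week codes as given in the description.
WEEKDAYS = {2: "Monday", 3: "Tuesday", 4: "Wednesday", 5: "Thursday", 6: "Friday"}


def describe_reason(code: int) -> str:
    """Turn a numeric reason code into a readable label."""
    if 1 <= code <= 21:
        return f"ICD category {ICD_ROMAN[code - 1]}"
    if code in NON_ICD:
        return NON_ICD[code]
    return "unknown"  # any code outside the documented range


print(describe_reason(23), "on", WEEKDAYS[2])
```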

Cleaning and preprocessing

First, let us look at the count of each reason to check whether any reason has very few records, in which case bootstrapping (resampling with replacement) can be applied.
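
A rough sketch of this check, using only the Python standard library. The threshold `min_count=5`, the helper names `rare_reasons` and `bootstrap`, and the sample codes are all assumptions made for illustration; the article does not state which threshold or tooling was used.

```python
import random
from collections import Counter


def rare_reasons(reasons, min_count=5):
    """Return the reason codes that occur fewer than min_count times."""
    counts = Counter(reasons)
    return sorted(code for code, n in counts.items() if n < min_count)


def bootstrap(rows, size, seed=0):
    """Draw `size` rows with replacement from `rows`."""
    rng = random.Random(seed)
    return rng.choices(rows, k=size)


# Hypothetical reason codes, not taken from the real dataset.
sample_reasons = [23, 23, 23, 28, 28, 28, 28, 28, 23, 23, 2, 2, 17]

print(rare_reasons(sample_reasons))

# Resample the under-represented rows so that the reason has more records.
rare_rows = [r for r in sample_reasons if r == 17]
print(bootstrap(rare_rows, size=5))
```

Bootstrapping only repeats existing rows; it does not add new information about a rare reason.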

We also examine the month of absence and the day of the week for any trend.

We then look for the day on which people are most frequently absent.

So, Monday is the day with the highest absenteeism rate (some good old Monday blues playing out here xD).

We then find out the value distribution for the ‘Disciplinary failure’, ‘Social Smoker’ and ‘Social Drinker’ columns.

We also look at which seasons have high absenteeism. We observe that winter has the highest rate of absenteeism, probably because people are more likely to fall ill in winter.

We eliminate outliers in the following columns: Transportation expense, Distance from residence to work, Service time, Age, Workload average/day, Height.
The ‘Disciplinary failure’ column is dropped as it contains only one value.
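
The article does not say which outlier rule was used. One common choice, shown here purely as an assumption, is the interquartile range (IQR) rule: drop any value that lies more than 1.5 × IQR below the first quartile or above the third quartile. The helpers `iqr_bounds` and `drop_outliers` and the sample ages are hypothetical.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Lower and upper fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(rows, column, k=1.5):
    """Keep only the rows whose `column` value lies inside the fences."""
    low, high = iqr_bounds([row[column] for row in rows], k)
    return [row for row in rows if low <= row[column] <= high]


# Hypothetical ages, not taken from the real dataset.
rows = [{"Age": a} for a in [30, 31, 32, 33, 34, 35, 36, 37, 38, 100]]
cleaned = drop_outliers(rows, "Age")
print(len(rows), "rows before,", len(cleaned), "rows after")
```

The same function can be applied to each of the columns listed above in turn.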

We then look for a relationship between transportation expense and the target variable, absenteeism time in hours.

The graph shows no clear pattern, and nothing can be inferred from it, so we move on to analysing the correlation between the independent variables in order to eliminate columns that are highly correlated with each other.

From this we can clearly see that there is a strong correlation between weight and Body Mass Index, which is expected because BMI is computed from weight and height. So, we can omit the ‘Weight’ column when training the model.
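
To see why weight and BMI move together, recall that BMI is weight in kilograms divided by the square of height in metres. The sketch below computes the Pearson correlation by hand. The units (kg and cm) are an assumption, since the dataset description does not give them, and the sample values are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def bmi(weight_kg, height_cm):
    """Body mass index: weight (kg) divided by height (m) squared."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


# Hypothetical weights (kg) and heights (cm), not the real dataset.
weights = [56, 65, 73, 80, 90, 98, 106]
heights = [171, 168, 172, 170, 172, 178, 175]
bmis = [bmi(w, h) for w, h in zip(weights, heights)]

print(round(pearson(weights, bmis), 3))
```

When two features are this strongly correlated, keeping both adds little information, which is the reasoning behind dropping one of them.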

Feature Extraction:

We drop the ‘ID’ column because it is only an identifier and has nothing to do with the dependent variable, i.e. absenteeism in hours; keeping it could mislead the model.
We also drop the ‘Social smoker’ and ‘Seasons’ columns, as each has a correlation of only about 1% with the target variable.

Training and Testing the data:

We now split the dataset into two data frames, one containing the independent variables and the other the target variable. We further split them into training and test data, with the test data containing 25% of the values. We then fit linear regression, decision tree, support vector, lasso regression, ridge regression and random forest regressor models on the training dataset and compare their mean square error.
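
The split-and-score workflow can be sketched without any external library. This is only an illustration under stated assumptions: it uses synthetic data, a single feature, and a hand-written least squares line in place of the six models listed above. The helper names `train_test_split`, `fit_line` and `rmse` are hypothetical and are not the functions used in the project.

```python
import math
import random


def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows and hold out test_fraction of them."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(pairs):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(pairs, slope, intercept):
    """Root mean squared error of the fitted line on (x, y) pairs."""
    squared = [(y - (slope * x + intercept)) ** 2 for x, y in pairs]
    return math.sqrt(sum(squared) / len(squared))


# Synthetic (feature, target) pairs, not the real dataset.
rng = random.Random(0)
data = [(x, 2.0 * x + 1.0 + rng.gauss(0, 1)) for x in range(40)]

train, test = train_test_split(data)          # 75% train, 25% test
slope, intercept = fit_line(train)            # fit on training data only
print("train RMSE:", round(rmse(train, slope, intercept), 3))
print("test RMSE:", round(rmse(test, slope, intercept), 3))
```

Scoring the same model on both splits is what makes overfitting visible: a training error far below the test error suggests the model has memorised the training data.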

Of these, the decision tree regression model gives the lowest root mean square error on the training data, 0.246, followed by random forest with 2.47. That is a large difference in scores. To check whether a model is overfitting or underfitting, we next look at the scores on the test data (mean square error).

Here, model 0 is linear regression and model 5 is the random forest regressor (the models are numbered in the order listed earlier).

Conclusion on dataset:

So we can say that the random forest regression model has the best mean square error on the test data and is the best algorithm for the given dataset. The decision tree is overfitting, as is evident from the difference between its training and test scores.

Ways to reduce absenteeism:

• Take disciplinary action: Arrange a one-on-one discussion with the employee as soon as a pattern of absence arises to address it and determine the core cause. If an employee’s absences do not improve, disciplinary action, including dismissal, may be taken.

• Sessions for employees: Counselling sessions with psychiatrists can be made mandatory in all organizations; these should be especially helpful to employees dealing with family and health issues.

• High morale: Incorporating health and wellness programs, personal development opportunities, workplace culture events, and initiatives focused on employee well-being will raise employees’ standards of work and instil good morale in the workforce. Another approach may be to commend the work of punctual employees.
