Project Overview
This project explores the relationship between career progression factors and employee retention using the "HR-Employee-Attrition" dataset. The main goal is to understand how job level, years in the current role, years since the last promotion, monthly income, and stock option level influence an employee's length of tenure at a company. The analysis uses both statistical tests (t-tests) and regression models to examine these relationships.
Objectives
- Identify significant career progression factors that predict employee tenure.
- Determine the impact of financial incentives on employee retention.
- Provide actionable insights for improving retention strategies through career development.
Key Results
- The regression analysis indicates that job level, years in a current role, and years since the last promotion significantly predict the number of years employees stay at a company.
- The Durbin-Watson statistic suggests no autocorrelation in the residuals, which supports the regression assumption of independent errors.
- The R-squared value of 0.684 shows that the model explains 68.4% of the variance in employee tenure.
- Stock option level was not a statistically significant predictor of tenure.
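
For readers unfamiliar with the two diagnostics quoted above, the following is a minimal sketch of how R-squared and the Durbin-Watson statistic are defined. It uses only the Python standard library and made-up toy values; the function names `r_squared` and `durbin_watson` are illustrative and are not taken from the project notebook, where these values would more likely be read from a statsmodels regression summary.

```python
# Illustrative definitions only; the toy numbers below are NOT from the HR dataset.

def r_squared(y_true, y_pred):
    """Share of variance in y_true explained by the predictions."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)             # total sum of squares
    return 1.0 - ss_res / ss_tot


def durbin_watson(residuals):
    """Sum of squared successive residual differences over the sum of squared residuals.

    The statistic ranges from 0 to 4; values near 2 indicate little
    first-order autocorrelation in the residuals.
    """
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Hypothetical observed and fitted tenure values (years).
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 1.9, 3.2, 3.8, 5.0]
residuals = [y - p for y, p in zip(y_true, y_pred)]

r2 = r_squared(y_true, y_pred)
dw = durbin_watson(residuals)
print(f"R-squared: {r2:.3f}, Durbin-Watson: {dw:.3f}")
```

An R-squared of 0.684 computed this way means the residual sum of squares is 31.6% of the total sum of squares. The Durbin-Watson statistic assumes the residuals have a meaningful order, so it is most informative when the rows are ordered in time.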
Methodology
- Exploratory Data Analysis (EDA): Initial analysis to understand the dataset, including size, shape, data types, and descriptive statistics.
- T-tests: Two-sample tests to compare tenure between employees who left and those who stayed.
- Regression Models: Multivariate linear regression to assess the impact of various predictors on employee tenure.
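
As a rough illustration of the two-sample comparison in the methodology above, the sketch below computes a Welch (unequal-variance) t statistic and its approximate degrees of freedom with the standard library. The choice of Welch's variant, the function name `welch_t`, and the toy tenure values are all assumptions made for this example; the notebook may use a pooled-variance test instead. A p-value would normally come from a statistics library, for example `scipy.stats.ttest_ind` with `equal_var=False`.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # statistics.variance is the sample variance (n - 1 in the denominator).
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)

    se_a, se_b = var_a / n_a, var_b / n_b  # squared standard errors of each mean
    t_stat = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t_stat, df


# Hypothetical years-at-company values, NOT taken from the dataset.
tenure_left = [1, 2, 3, 2, 1]
tenure_stayed = [5, 7, 6, 8, 9]

t_stat, df = welch_t(tenure_left, tenure_stayed)
print(f"t = {t_stat:.2f}, df = {df:.2f}")
```

A negative t statistic here means the first group (employees who left) has the lower mean tenure. Whether the difference is significant depends on the p-value for that t statistic at the computed degrees of freedom.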
Skills Demonstrated
- Data Analysis: Comprehensive exploratory analysis to identify patterns and outliers.
- Statistical Testing: Conducting and interpreting T-tests to compare means between groups.
- Regression Modeling: Building and validating linear regression models to predict outcomes based on multiple predictors.
- Data Visualization: Creating plots (box plots, scatter plots, histograms) to illustrate findings and enhance understanding.
Technologies Used
- Python (pandas, numpy, matplotlib, seaborn, statsmodels)
- Jupyter Notebook
GitHub Repository
Find the full code on GitHub.
Visualizations