Machine Learning
FastAPI
Streamlit
MLOps
An end-to-end machine learning system for credit risk modeling. It provides:
- Probability of Default (PD) using XGBoost
- Loss Given Default (LGD) using LightGBM
- Expected Loss as PD × LGD (a loss rate per unit of exposure)
- Customer Segmentation using KMeans
- Behavioural Anomaly Detection using Isolation Forest
- Autoencoder embeddings for deep behavioural representation
- Default-rate time-series forecasting with ARIMA
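As an illustration of how the PD and LGD outputs combine into Expected Loss, here is a minimal sketch. The `RiskScore` class, its field names, and the `exposure` argument are hypothetical and not taken from the project code; the sketch assumes both model outputs are fractions in [0, 1].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskScore:
    """Hypothetical container for the two model outputs (not the project's API)."""

    prob_default: float        # PD, assumed to be in [0, 1]
    loss_given_default: float  # LGD as a fraction of exposure, assumed in [0, 1]

    def __post_init__(self) -> None:
        # Reject values that cannot be probabilities or loss fractions.
        for name in ("prob_default", "loss_given_default"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")

    @property
    def expected_loss_rate(self) -> float:
        """Expected loss per unit of exposure: PD x LGD."""
        return self.prob_default * self.loss_given_default

    def expected_loss(self, exposure: float) -> float:
        """Expected loss in currency units for an assumed exposure amount."""
        return self.expected_loss_rate * exposure


score = RiskScore(prob_default=0.08, loss_given_default=0.45)
print(round(score.expected_loss_rate, 4))     # 0.036
print(round(score.expected_loss(10_000), 2))  # 360.0
```

Multiplying the rate by an exposure amount turns it into a currency figure; the exposure input is an assumption of this sketch, since the list above defines Expected Loss only as PD × LGD.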
The project covers the full data → model → deployment pipeline: a FastAPI microservice serves real-time scores, and a Streamlit dashboard provides human-friendly inputs, tooltips, and live predictions.
System Architecture
- DuckDB for fast SQL-based feature engineering
- XGBoost & LightGBM for PD/LGD modeling
- PyTorch Autoencoder for behavioural embeddings
- statsmodels for ARIMA forecasting
- FastAPI REST service with model caching
- Dockerized backend for reproducible deployment
- Streamlit UI for scoring and visualizations
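The "model caching" item above can be read as loading each model artifact once per process and reusing it across requests. The sketch below shows one common way to do that with the standard library's `functools.lru_cache`. It is an assumption about the approach, not the project's actual code: `get_model`, the pickle format, and the `pd_model.pkl` file name are all hypothetical, and a plain dictionary stands in for a trained model.

```python
import pickle
import tempfile
from functools import lru_cache
from pathlib import Path

load_calls = 0  # counts disk reads, for illustration only


@lru_cache(maxsize=None)
def get_model(path: str):
    """Load a pickled model from disk once; later calls return the cached object."""
    global load_calls
    load_calls += 1
    with open(path, "rb") as fh:
        return pickle.load(fh)


with tempfile.TemporaryDirectory() as tmp:
    # Write a stand-in artifact (a dict instead of a real trained model).
    model_path = str(Path(tmp) / "pd_model.pkl")
    with open(model_path, "wb") as fh:
        pickle.dump({"name": "pd_model", "version": 1}, fh)

    first = get_model(model_path)   # reads from disk
    second = get_model(model_path)  # served from the cache

print(first is second)  # True
print(load_calls)       # 1
```

In a REST service, each request handler would call a function like `get_model(...)` instead of deserializing the artifact itself, so only the first request pays the load cost. Only unpickle files from a trusted source, since `pickle.load` can execute arbitrary code.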
This project showcases full-stack ML engineering: data cleaning, feature pipelines, supervised + unsupervised models, deep learning, microservice deployment, dashboarding, and documentation.
Tech Stack
Python, DuckDB, XGBoost, LightGBM, PyTorch, scikit-learn, statsmodels, FastAPI, Streamlit, Docker.