OverPrompt – IPL Analytics & Chat Agent


OverPrompt is an IPL-focused cricket assistant that combines ball-by-ball Cricsheet data with a tool-using LLM agent. Users ask natural-language questions about IPL matches and players, and the agent answers with deterministic statistics computed from the structured data rather than with figures recalled (and possibly hallucinated) by the model.

The system is built as a full LLM application with a clean separation of layers:

Data layer: Cricsheet IPL YAML ingested into a single Parquet file for ball-by-ball analysis.
Analytics layer: Python/Pandas tools for player stats, team performance, top-N rankings, and match-level summaries.
LLM agent layer: A custom planner that interprets the user’s question, routes it to the right tools, and formats answers that are grounded in the tool output.
Visualization layer: Momentum worms, phase-dominance charts, H2H comparisons, pressure curves, and win-probability plots.
UI layer: A Streamlit web app that exposes the agent as a chat-style interface.
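
As an illustration of the analytics layer, the sketch below shows what a deterministic player-stats tool could look like in Pandas. It is not taken from the project: the column names (`batter`, `runs_batter`, `is_wide`), the function name `batting_summary`, and the toy rows are all assumptions made for this example, and the real Parquet schema may differ.

```python
import pandas as pd

# Toy ball-by-ball rows with a hypothetical schema; the project's actual
# Parquet columns may be named differently.
balls = pd.DataFrame(
    {
        "batter": ["Batter A", "Batter A", "Batter A", "Batter B", "Batter B"],
        "runs_batter": [4, 1, 0, 6, 2],
        "is_wide": [False, False, True, False, False],
    }
)


def batting_summary(df: pd.DataFrame, player: str) -> dict:
    """Return runs, balls faced, and strike rate for one batter."""
    rows = df[df["batter"] == player]
    # Wides do not count as balls faced by the batter.
    legal = rows[~rows["is_wide"]]
    runs = int(rows["runs_batter"].sum())
    balls_faced = int(len(legal))
    strike_rate = round(100.0 * runs / balls_faced, 2) if balls_faced else 0.0
    return {
        "player": player,
        "runs": runs,
        "balls": balls_faced,
        "strike_rate": strike_rate,
    }


print(batting_summary(balls, "Batter A"))
```

Because the function returns a plain dictionary computed from the data, the agent layer can hand that result to the LLM for formatting without asking the model to supply any numbers itself.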

This project demonstrates practical LLM skills: tool calling, hybrid semantic + symbolic entity resolution, multi-provider LLM configuration (OpenAI / Gemini / Ollama), and prompt design that constrains the model to answer only from the underlying IPL dataset.
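
The entity-resolution idea can be sketched roughly as follows: try an exact (symbolic) alias lookup first, then fall back to approximate string matching. This is only a dependency-free illustration; it uses the standard library's `difflib` as a stand-in for the fuzzy and semantic matchers in the tech stack (rapidfuzz, jellyfish, sentence-transformers), and the alias table, name list, and `resolve_player` function are hypothetical.

```python
from difflib import get_close_matches
from typing import Optional

# Hypothetical canonical names and alias table, for illustration only.
CANONICAL = ["V Kohli", "MS Dhoni", "RG Sharma"]
ALIASES = {"virat": "V Kohli", "dhoni": "MS Dhoni"}


def resolve_player(query: str, cutoff: float = 0.6) -> Optional[str]:
    """Map a user-typed name to a canonical dataset name, or None."""
    key = query.strip().lower()
    # Symbolic step: exact alias lookup.
    if key in ALIASES:
        return ALIASES[key]
    # Fuzzy step: closest canonical name above the similarity cutoff.
    lowered = {name.lower(): name for name in CANONICAL}
    hits = get_close_matches(key, list(lowered), n=1, cutoff=cutoff)
    return lowered[hits[0]] if hits else None


print(resolve_player("Virat"))
print(resolve_player("ms dhonii"))
print(resolve_player("zzzz"))
```

Returning `None` when nothing clears the cutoff lets the agent ask the user to clarify instead of guessing at a player.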

Tech stack: Python, Pandas, PyArrow/Parquet, rapidfuzz, jellyfish, sentence-transformers, OpenAI/Gemini/Ollama, Streamlit.