OverPrompt is an IPL-focused cricket assistant that combines ball-by-ball Cricsheet data with a tool-using LLM agent. It lets users ask natural-language questions about IPL matches and players, then answers using deterministic statistics computed from structured data instead of hallucinated web knowledge.
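The grounding idea can be illustrated with a minimal tool-dispatch sketch. It assumes the model is prompted to emit a JSON tool call of the form `{"tool": ..., "args": {...}}`. The tool name `player_runs`, the in-memory `DELIVERIES` rows and that JSON shape are hypothetical, not OverPrompt's actual interface. The point is that the number in the final answer is produced by a Python function over the data, not by the model.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical ball-by-ball rows; a real system would load these from the Parquet file.
DELIVERIES = [
    {"match_id": 1, "batter": "V Kohli", "runs_batter": 4},
    {"match_id": 1, "batter": "V Kohli", "runs_batter": 1},
    {"match_id": 1, "batter": "MS Dhoni", "runs_batter": 6},
]


def player_runs(player: str) -> int:
    """Deterministic statistic: total runs off the bat for one player."""
    return sum(d["runs_batter"] for d in DELIVERIES if d["batter"] == player)


# Whitelist of tools the planner may call (hypothetical registry).
TOOLS: Dict[str, Callable[..., Any]] = {"player_runs": player_runs}


def dispatch(tool_call_json: str) -> Any:
    """Parse a model-emitted tool call, reject unknown tools, and run it on the data."""
    call = json.loads(tool_call_json)
    name = call.get("tool")
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    return TOOLS[name](**call.get("args", {}))


# In practice this string would come from the LLM's tool-call output.
result = dispatch('{"tool": "player_runs", "args": {"player": "V Kohli"}}')
print(f"V Kohli scored {result} runs in the sample data.")
```

Rejecting tool names outside the registry is one simple way to keep the model from inventing capabilities; the LLM only chooses which computation to run and phrases the result.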
The system is built as an end-to-end LLM application with a clean separation of layers:
- Data layer: Cricsheet IPL YAML ingested into a single Parquet file for ball-by-ball analysis.
- Analytics layer: Python/Pandas tools for player stats, team performance, top-N rankings, and match-level summaries.
- LLM agent layer: A custom planner that interprets the user's question, routes to the right tools, and formats safe, non-hallucinated answers.
- Visualization layer: Momentum worms, phase-dominance charts, head-to-head (H2H) comparisons, pressure curves, and win-probability plots.
- UI layer: A Streamlit web app that exposes the agent as a chat-style interface.
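As a rough sketch of how the data, analytics and visualization layers might connect, the code below flattens one innings in the legacy Cricsheet YAML layout (an `innings` entry whose `deliveries` list is keyed by `over.ball`), assumed to be already parsed into Python objects. It then computes batting figures and the cumulative-runs series that a momentum worm plots. The column names (`batter`, `runs_batter`, `runs_total`, `over`, `is_wide`), the helper names and the sample innings are illustrative assumptions; verify the exact Cricsheet schema before relying on this shape.

```python
import pandas as pd

# Hypothetical innings, shaped like one legacy Cricsheet YAML entry after yaml.safe_load().
innings_yaml = {
    "1st innings": {
        "team": "Team A",
        "deliveries": [
            {0.1: {"batsman": "P1", "bowler": "B1", "runs": {"batsman": 4, "extras": 0, "total": 4}}},
            {0.2: {"batsman": "P1", "bowler": "B1", "extras": {"wides": 1},
                   "runs": {"batsman": 0, "extras": 1, "total": 1}}},
            {1.1: {"batsman": "P2", "bowler": "B2", "runs": {"batsman": 6, "extras": 0, "total": 6}}},
            {1.2: {"batsman": "P2", "bowler": "B2", "runs": {"batsman": 1, "extras": 0, "total": 1}}},
        ],
    }
}


def flatten_innings(innings: dict, match_id: int) -> pd.DataFrame:
    """Turn one nested innings entry into one row per delivery."""
    label, body = next(iter(innings.items()))
    rows = []
    for seq, delivery in enumerate(body["deliveries"]):
        key, d = next(iter(delivery.items()))
        rows.append({
            "match_id": match_id,
            "innings": label,
            "team": body["team"],
            "seq": seq,                    # delivery order within the innings
            "over": int(float(key)),       # 0-based over number taken from the "over.ball" key
            "batter": d["batsman"],
            "bowler": d["bowler"],
            "is_wide": "wides" in d.get("extras", {}),
            "runs_batter": d["runs"]["batsman"],
            "runs_total": d["runs"]["total"],
        })
    return pd.DataFrame(rows)


def batting_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Runs, balls faced (wides excluded) and strike rate per batter, best first."""
    faced = df[~df["is_wide"]]
    out = faced.groupby("batter").agg(
        runs=("runs_batter", "sum"),
        balls=("runs_batter", "count"),
    )
    out["strike_rate"] = (100 * out["runs"] / out["balls"]).round(2)
    return out.sort_values("runs", ascending=False)


def momentum_worm(df: pd.DataFrame) -> pd.Series:
    """Cumulative team runs (including extras) at the end of each over."""
    return df.groupby("over")["runs_total"].sum().cumsum()


df = flatten_innings(innings_yaml, match_id=1)
print(batting_summary(df))
print(momentum_worm(df))
```

To persist the flattened rows, `df.to_parquet("ipl.parquet")` works when pyarrow is installed; the worm series can then be handed to any plotting library.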
This project demonstrates practical LLM skills: tool calling, hybrid semantic + symbolic entity resolution, multi-provider LLM configuration (OpenAI / Gemini / Ollama), and prompt design that forces the model to stay grounded in the underlying IPL dataset.
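The symbolic half of entity resolution can be sketched with the standard library alone. This version tries a hand-curated alias table first, then scores candidates by string similarity against both the raw query and an "initials + surname" form, which matches how Cricsheet typically records names (for example `V Kohli`). It uses `difflib.SequenceMatcher` as a stand-in for rapidfuzz (whose `process.extractOne` plays a similar role) and omits the semantic, embedding-based step entirely. The canonical list, aliases, helper names and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical canonical names in "initials + surname" style.
CANONICAL = ["V Kohli", "MS Dhoni", "RG Sharma", "JJ Bumrah"]
# Hypothetical hand-curated aliases (nicknames), keyed in lowercase.
ALIASES = {"mahi": "MS Dhoni", "hitman": "RG Sharma"}


def to_initials(full_name: str) -> str:
    """'Rohit Gurunath Sharma' -> 'RG Sharma': initials of given names plus surname."""
    parts = full_name.split()
    if len(parts) < 2:
        return full_name
    return "".join(p[0].upper() for p in parts[:-1]) + " " + parts[-1]


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def resolve_player(query: str, threshold: float = 0.8) -> Optional[str]:
    """Map a free-text player mention to a canonical name, or None if unsure."""
    q = query.strip()
    # 1. Exact alias lookup.
    if q.lower() in ALIASES:
        return ALIASES[q.lower()]
    # 2. Fuzzy match on both the raw query and its initials form; keep the best.
    best_name, best_score = None, 0.0
    for candidate in CANONICAL:
        score = max(similarity(q, candidate), similarity(to_initials(q), candidate))
        if score > best_score:
            best_name, best_score = candidate, score
    # 3. Refuse to guess below the threshold instead of returning a weak match.
    return best_name if best_score >= threshold else None


print(resolve_player("Virat Kohli"))
print(resolve_player("Sachin Tendulkar"))
```

Returning `None` below the threshold gives the agent a clear signal to ask the user for clarification rather than silently answering about the wrong player.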
Tech stack: Python, Pandas, PyArrow/Parquet, rapidfuzz, jellyfish, sentence-transformers, OpenAI/Gemini, Streamlit.