Usama Saleem

Software Engineer + UX Designer

Cutting Query Response Time by 75% with LLMs

During my time as an AI Research Engineer at the Data Driven Analysis Lab at Concordia University, I built an LLM-powered chatbot that lets users chat with their codebases. The project, now live at askgit.io, needed a complete rethink of how we ingested data. If it's not up anymore, the YouTube video is still around: https://www.youtube.com/watch?v=4lNj4ptWDos

The Problem

Large codebases are painful to navigate. Finding patterns, understanding architecture, tracing how a function gets used across hundreds of repos — traditional search tools just don't cut it. They're pattern matching, not reasoning.

The Solution: A Smarter Ingestion Pipeline

Instead of indexing every file indiscriminately, I built a pipeline that parses ASTs to pull out meaningful chunks, groups related code so context stays intact, and stores vector embeddings that capture actual semantic meaning.
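The AST-driven chunking step can be sketched in a few lines. This is an illustrative sketch, not the actual pipeline code: it uses Python's built-in `ast` module to pull out top-level functions and classes as coherent chunks, and the helper name `extract_chunks` and the sample source are my own for demonstration.

```python
import ast
import textwrap

def extract_chunks(source: str):
    """Parse a file's AST and pull out top-level functions and classes
    as self-contained chunks, so each embedding covers one coherent
    unit instead of an arbitrary slice of lines."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append({
                "name": node.name,
                "kind": type(node).__name__,
                "start_line": node.lineno,
                # get_source_segment recovers the exact source text of the node
                "code": ast.get_source_segment(source, node),
            })
    return chunks

sample = textwrap.dedent('''
    def add(a, b):
        """Return the sum of a and b."""
        return a + b

    class Greeter:
        def greet(self):
            return "hello"
''')

for chunk in extract_chunks(sample):
    print(chunk["kind"], chunk["name"])
```

Chunking at AST-node boundaries is what keeps context intact: a function's docstring, signature, and body always travel together into the same embedding.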

The impact: query response time dropped 75%, ingestion got 95% faster, and the system handled 1,000+ repositories at once.

Presenting at FM+SE in Mexico City


The ingestion work earned me a presentation slot at the FM+SE conference in Mexico City. The audience voted it into the top 5% of talks, which led to a tech partnership afterward.

What I Learned

The model is only as good as what's feeding it. Vector databases like Pinecone helped, but the real gains came from how we structured and chunked the code before embedding it. Chunk strategy, metadata choices, and update handling — those are the things I'd spend more time on upfront if I did it again.
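To make the metadata point concrete, here is a small sketch of what a chunk record might look like before it goes into a vector store. Everything here is hypothetical for illustration: the `embed` stand-in (a real pipeline would call an embedding model), the `to_record` helper, and the field names are my own, though the shape (id, vector, metadata) mirrors what stores like Pinecone expect.

```python
import hashlib

def embed(text: str, dim: int = 8):
    """Stand-in embedding: a deterministic hash-derived vector.
    A real pipeline would call an embedding model here."""
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255 for b in digest[:dim]]

def to_record(chunk: dict, repo: str, path: str, commit: str):
    """Shape a code chunk into a vector-store record: the embedding
    plus the metadata needed to filter results and to invalidate
    stale chunks when a repo is re-ingested."""
    return {
        "id": f"{repo}:{path}:{chunk['name']}@{commit[:8]}",
        "values": embed(chunk["code"]),
        "metadata": {
            "repo": repo,
            "path": path,
            "symbol": chunk["name"],
            "kind": chunk["kind"],
            "commit": commit,  # lets re-ingestion drop outdated chunks
        },
    }

record = to_record(
    {"name": "add", "kind": "FunctionDef",
     "code": "def add(a, b): return a + b"},
    repo="example/repo",
    path="math_utils.py",
    commit="deadbeefcafebabe",
)
print(record["id"])
```

The commit hash in both the id and the metadata is the update-handling piece: on re-ingest you can filter out records from older commits instead of rebuilding the whole index.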