Local RAG

A Python code snippet showing how to use the local Retrieval Augmented Generation (RAG) gist

Picking up a few certs at work recently, I had to complete a lab on agentic RAG via the Responses API. The original lab instructions called for using OpenAI to generate the embeddings and Pinecone as the vector store, but I was footing the bill for the tokens, and I really didn't want to sign up for a Pinecone account for a single short project, so I decided to use a quick-and-dirty local implementation instead. My version uses SentenceTransformers models with a retrieve-and-re-rank approach to semantic search.
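The retrieve-and-re-rank pattern is easy to sketch. Below is a minimal, dependency-light illustration of the two stages: a cheap similarity pass over the whole corpus to pull a shortlist, then a more expensive pairwise re-scoring of just those candidates. The `embed` and `rerank_score` functions here are toy stand-ins of my own invention; in the actual gist those roles are played by SentenceTransformers bi-encoder and cross-encoder models.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Toy stand-in for a bi-encoder: hash words into a small dense vector.
    # A real implementation would call a SentenceTransformers model here.
    vec = np.zeros(16)
    for word in text.lower().split():
        vec[hash(word) % 16] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def rerank_score(query: str, doc: str) -> float:
    # Toy stand-in for a cross-encoder, which scores query and doc jointly.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d)  # Jaccard word overlap

def search(query: str, docs: list[str], retrieve_k: int = 10, top_k: int = 3):
    # Stage 1 (retrieve): cosine similarity against every document,
    # keep the retrieve_k best candidates.
    doc_vecs = np.stack([embed(d) for d in docs])
    sims = doc_vecs @ embed(query)
    candidates = np.argsort(sims)[::-1][:retrieve_k]
    # Stage 2 (re-rank): apply the slower pairwise scorer to the
    # shortlist only, and return the top_k after re-scoring.
    rescored = sorted(candidates,
                      key=lambda i: rerank_score(query, docs[i]),
                      reverse=True)
    return [docs[i] for i in rescored[:top_k]]
```

The point of the split is cost: the first stage scales to the whole corpus because it's just a dot product against precomputed vectors, while the second stage runs the expensive model only on the handful of candidates that survive stage one.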

The code’s simple enough that it’s not really worth putting up in a repo, but I did post it as a gist for anyone interested; it’s built on the medical dataset used to train HuatuoGPT-o1. The structure of the local vector store might seem odd, but I wrote it to be (mostly) compatible with Pinecone’s API so I could swap it straight into the lab. It might be useful if you’re interested in “sovereign” data and AI, for example, and want to DIY, or as the starting point for a mocked Pinecone test integration.
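For context, here's a sketch of what such a Pinecone-style shim can look like: an in-memory index exposing `upsert` and `query`, with roughly the response shape Pinecone returns (a `matches` list of id/score/metadata dicts). This is a from-scratch illustration rather than the gist's code, and the exact fields covered are my assumption about the minimum surface a lab script would touch.

```python
import numpy as np

class LocalIndex:
    """Minimal in-memory vector store mimicking the shape of Pinecone's
    index API: upsert (id, values, metadata) tuples, query by vector.
    Cosine similarity stands in for Pinecone's configurable metric."""

    def __init__(self):
        self._ids: list[str] = []
        self._vectors: list[np.ndarray] = []
        self._metadata: dict[str, dict] = {}

    def upsert(self, vectors):
        # `vectors` is an iterable of (id, values, metadata) tuples,
        # one of the formats Pinecone's client accepts.
        for vec_id, values, metadata in vectors:
            v = np.asarray(values, dtype=float)
            if vec_id in self._metadata:  # true upsert: replace in place
                self._vectors[self._ids.index(vec_id)] = v
            else:
                self._ids.append(vec_id)
                self._vectors.append(v)
            self._metadata[vec_id] = metadata

    def query(self, vector, top_k=5, include_metadata=True):
        q = np.asarray(vector, dtype=float)
        mat = np.stack(self._vectors)
        # Cosine similarity against every stored vector.
        sims = (mat @ q) / (np.linalg.norm(mat, axis=1) * np.linalg.norm(q))
        order = np.argsort(sims)[::-1][:top_k]
        matches = []
        for i in order:
            m = {"id": self._ids[i], "score": float(sims[i])}
            if include_metadata:
                m["metadata"] = self._metadata[self._ids[i]]
            matches.append(m)
        return {"matches": matches}
```

Keeping the call signatures close to Pinecone's means the calling code doesn't care which backend it's talking to, which is also exactly what makes this usable as a mock in tests.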

If you’d like to see the vector store in action, here’s the video I submitted to complete the lab. For a complete sovereign data & AI solution, you’d probably want to swap out the calls to OpenAI for a local LLM such as Phi.

@misc{chen2024huatuogpto1medicalcomplexreasoning,
  title={HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs},
  author={Junying Chen and Zhenyang Cai and Ke Ji and Xidong Wang and Wanlong Liu and Rongsheng Wang and Jianye Hou and Benyou Wang},
  year={2024},
  eprint={2412.18925},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2412.18925},
}