In my previous post, I gave an overview of question answering (Q&A) systems that use LLMs over private enterprise data. I covered the architectures of these systems and the common tools developers use to build them when the enterprise data is structured, i.e., the data exists as records stored in some DBMS, relational or graph. I referred to these as RAG systems using structured data. In this post, I cover RAG systems that use unstructured data, such as text files, PDF documents, or internal HTML pages in an enterprise. I will refer to these as RAG-U systems or sometimes simply as RAG-U (I should have used the term RAG-S in the previous post!).
To remind readers, I decided to write these two posts after doing a lot of reading in the space to understand the role of knowledge graphs (KGs) and graph DBMSs in LLM applications. My goals are (i) to give an overview of the field to readers who want to get started but are intimidated by the area; and (ii) to point to several future work directions that I find important.1
- In this post I'm only covering approaches that ultimately retrieve some unstructured data (or a transformation of it) and put it into LLM prompts. I am not covering approaches that query a pre-existing KG directly and use the records in it as additional data in a prompt. See this post by Ben Lorica for an example. The three-point bullet list after the sentence "Knowledge graphs significantly enhance RAG models" describes such an approach. According to my organization of RAG approaches, such approaches would fall under RAG using structured data, since KGs are structured records.