The increasing number of academic papers poses significant challenges for researchers to efficiently acquire key details. While retrieval-augmented generation (RAG) shows great promise in large language model (LLM) based automated question answering, previous works often isolate neural and symbolic retrieval despite their complementary strengths. Moreover, conventional single-view chunking neglects the rich structure and layout of PDFs, e.g., sections and tables. In this work, we propose NeuSym-RAG, a hybrid neural-symbolic retrieval framework that combines both paradigms in an interactive process. By leveraging multi-view chunking and schema-based parsing, NeuSym-RAG organizes semi-structured PDF content into both a relational database and a vectorstore, enabling LLM agents to iteratively gather context until it suffices to generate answers. Experiments on three full PDF-based QA datasets, including a self-annotated one, AirQA-Real, show that NeuSym-RAG consistently outperforms both vector-based RAG and various structured baselines, highlighting its capacity to unify both retrieval schemes and utilize multiple views.
The multi-view document parsing process transforms raw PDF files into a structured database (DuckDB): metadata is retrieved for each paper, layout elements (e.g., sections, tables, figures) are parsed from multiple views, and a summary is predicted for each element. The retrieved metadata, parsed elements, and predicted summaries are all populated into the symbolic DB. We handcraft the DB schema in advance, carefully designing it to be universal across PDF documents.
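To make the population step concrete, here is a minimal sketch using the `duckdb` Python package. The table and column names are illustrative assumptions, not the paper's exact schema, which covers more element types and views.

```python
# Minimal sketch of the symbolic-DB population (illustrative schema only).
import duckdb

con = duckdb.connect("papers.duckdb")

# One row per paper: bibliographic metadata retrieved during parsing.
con.execute("""
    CREATE TABLE IF NOT EXISTS metadata (
        paper_id  VARCHAR PRIMARY KEY,
        title     VARCHAR,
        abstract  VARCHAR,
        num_pages INTEGER
    )
""")

# One row per parsed element, keyed back to its paper, page, and layout box.
con.execute("""
    CREATE TABLE IF NOT EXISTS chunks (
        chunk_id     VARCHAR PRIMARY KEY,
        paper_id     VARCHAR REFERENCES metadata (paper_id),
        page_number  INTEGER,
        element_type VARCHAR,  -- e.g. 'text', 'table', 'image'
        content      VARCHAR,  -- raw text or serialized table cells
        summary      VARCHAR,  -- model-predicted summary of the element
        bounding_box DOUBLE[]  -- [x0, y0, x1, y1] on the page
    )
""")

con.execute(
    "INSERT INTO metadata VALUES (?, ?, ?, ?)",
    ["paper-0001", "An Example Paper", "We study ...", 9],
)
```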
The multimodal vector encoding process transforms the structured database content into vector representations. The resulting data entries are inserted into the VS, categorized into different collections based on the encoding model and modality.
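As a concrete illustration of this step, the sketch below embeds text cells with `sentence-transformers` and writes them into a Milvus Lite collection; the encoder choice, collection layout, and entry fields are assumptions for illustration rather than the paper's exact configuration. Storing the source table and column alongside each vector is what lets a neural hit be traced back to the exact cell in the symbolic DB.

```python
# Minimal sketch of the neural-side population: encode DB cell values and
# insert them into the vectorstore. Encoder, collection name, and entry
# fields are illustrative assumptions.
from pymilvus import MilvusClient
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim text encoder
client = MilvusClient("papers_vs.db")              # embedded Milvus Lite

# One collection per (encoding model, modality) pair.
collection = "text_all_minilm_l6_v2"
client.create_collection(collection_name=collection, dimension=384)

# Each entry keeps a pointer back to its DB cell, so a neural hit can be
# mapped to the exact table/column/row in the symbolic DB (and vice versa).
cells = [
    {"pk": "c1", "table": "chunks", "column": "content",
     "text": "Table 2 reports the main results ..."},
]
entries = [
    {
        "id": i,
        "vector": encoder.encode(c["text"]).tolist(),
        "text": c["text"],
        "table_name": c["table"],
        "column_name": c["column"],
        "primary_key": c["pk"],
    }
    for i, c in enumerate(cells)
]
client.insert(collection_name=collection, data=entries)
```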
Once the database and vectorstore are populated, the iterative agent interaction begins. The RAG agent proactively retrieves context from both the DB and the VS: in each turn, it predicts one action, executes it against the environment, and obtains a real-time observation. Five parameterized actions are supported during the interaction (a minimal loop sketch follows the action list):
```jsonc
{
    "action_type": "RetrieveFromDatabase",
    "parameters": {
        "sql": "SELECT ..." // str, required
    }
}
```

```jsonc
{
    "action_type": "RetrieveFromVectorstore",
    "parameters": {
        "query": "...",           // str, required
        "collection_name": "...", // str, required
        "table_name": "...",      // str, required
        "column_name": "...",     // str, required
        "filter": "",             // str, optional
        "limit": 5                // int, optional
    }
}
```

```jsonc
{
    "action_type": "CalculateExpr",
    "parameters": {
        "expr": "2 + 3 * 4" // str, required
    }
}
```

```jsonc
{
    "action_type": "ViewImage",
    "parameters": {
        "paper_id": "...",   // str, required
        "page_number": 1,    // int, required
        "bounding_box": []   // List[float], optional
    }
}
```

```jsonc
{
    "action_type": "GenerateAnswer",
    "parameters": {
        "answer": ... // Any, required
    }
}
```
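Putting the pieces together, below is a minimal sketch of the interaction loop. `llm_predict` (returning the next action as a dict shaped like the JSON specs above) and `execute_action` (routing SQL to DuckDB, queries to the vectorstore, expressions to a calculator, and page crops to the vision input, then returning a textual observation) are hypothetical helper names, not names from the paper; the turn budget is likewise an assumption.

```python
# Minimal sketch of the iterative agent loop. `llm_predict` and
# `execute_action` are hypothetical helpers (see the lead-in above).
import json

MAX_TURNS = 20  # assumed interaction budget

def answer_question(question: str):
    messages = [
        {"role": "system", "content": "Interact via the five actions above."},
        {"role": "user", "content": question},
    ]
    for _ in range(MAX_TURNS):
        action = llm_predict(messages)  # predict exactly one action per turn
        if action["action_type"] == "GenerateAnswer":
            return action["parameters"]["answer"]  # terminal action
        observation = execute_action(action)       # real-time feedback
        messages.append({"role": "assistant", "content": json.dumps(action)})
        messages.append({"role": "user", "content": observation})
    return None  # budget exhausted without a final answer
```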
We evaluate NeuSym-RAG and Classic-RAG on three full-PDF QA datasets with various LLM backbones. NeuSym-RAG consistently outperforms Classic-RAG across all datasets and models; the column groups below break each dataset down by the evidence type a question targets, with AVG the per-dataset average:
| Model | AirQA text | AirQA table | AirQA image | AirQA formula | AirQA metadata | AirQA AVG | M3SciQA table | M3SciQA image | M3SciQA AVG | SciDQA table | SciDQA image | SciDQA formula | SciDQA AVG |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **Classic-RAG** | | | | | | | | | | | | | |
| GPT-4o-mini | 12.3 | 11.9 | 12.5 | 16.7 | 13.6 | 13.4 | 17.9 | 10.6 | 15.6 | 59.4 | 60.4 | 59.3 | 59.8 |
| GPT-4V | 13.2 | 13.9 | 10.0 | 13.9 | 13.6 | 14.7 | 12.1 | 8.8 | 11.1 | 56.6 | 56.8 | 58.1 | 57.4 |
| Llama-3.3-70B-Instruct | 8.7 | 7.9 | 9.5 | 16.7 | 0.0 | 10.0 | 12.7 | 8.1 | 11.3 | 56.8 | 58.8 | 58.9 | 58.0 |
| Qwen2.5-VL-72B-Instruct | 9.6 | 5.9 | 11.9 | 11.1 | 13.6 | 10.5 | 11.6 | 11.6 | 11.6 | 54.8 | 56.9 | 56.3 | 56.2 |
| DeepSeek-R1 | 11.7 | 13.9 | 9.5 | 30.6 | 9.1 | 13.9 | 11.9 | 9.5 | 11.2 | 63.9 | 61.3 | 61.7 | 62.4 |
| **NeuSym-RAG** | | | | | | | | | | | | | |
| GPT-4o-mini | 33.0 | 12.9 | 11.9 | 19.4 | 18.2 | 30.7 | 18.7 | 16.6 | 18.0 | 63.0 | 63.6 | 62.5 | 63.0 |
| GPT-4V | 38.9 | 18.8 | 23.8 | 38.9 | 27.3 | 37.3 | 13.7 | 13.4 | 13.6 | 62.6 | 63.5 | 63.2 | 63.1 |
| Llama-3.3-70B-Instruct | 30.6 | 11.9 | 16.7 | 16.7 | 27.3 | 29.3 | 26.3 | 17.6 | 23.6 | 55.5 | 57.3 | 56.6 | 56.4 |
| Qwen2.5-VL-72B-Instruct | 43.4 | 15.8 | 11.9 | 25.0 | 27.3 | 39.6 | 20.2 | 22.7 | 21.1 | 60.2 | 60.6 | 61.8 | 60.5 |
| DeepSeek-R1 | 33.2 | 16.8 | 11.9 | 27.8 | 18.2 | 32.4 | 19.0 | 13.7 | 17.4 | 64.3 | 64.6 | 63.9 | 64.5 |
To further analyze the contribution of each component, we compare NeuSym-RAG against a series of structured agent baselines, summarized in the table below, which differ in retrieval paradigm, multi-view support, and whether iterative interaction is allowed. Overall, NeuSym-RAG outperforms all of them, verifying that both the multiple views and the combination of the two retrieval strategies contribute to the final performance:
| Method | Neural | Symbolic | Multi-view | # Interactions | sgl. | multi. | retr. | subj. | obj. | AVG |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Question only | ❌ | ❌ | ❌ | 1 | 5.7 | 8.0 | 0.4 | 9.4 | 2.7 | 4.0 |
| Title + Abstract | ❌ | ❌ | ❌ | 1 | 5.7 | 14.0 | 0.0 | 13.1 | 3.6 | 5.4 |
| Full-text w/ cutoff | ❌ | ❌ | ❌ | 1 | 28.3 | 10.7 | 0.4 | 26.2 | 7.6 | 11.2 |
| Classic RAG | ✅ | ❌ | ❌ | 1 | 18.2 | 4.0 | 9.4 | 8.4 | 11.0 | 10.5 |
| Iterative Classic RAG | ✅ | ❌ | ❌ | ≥2 | 8.2 | 10.0 | 15.2 | 5.6 | 13.2 | 11.8 |
| Two-stage Neu-RAG | ✅ | ❌ | ✅ | 2 | 19.5 | 10.0 | 5.3 | 15.9 | 9.4 | 10.7 |
| Iterative Neu-RAG | ✅ | ❌ | ✅ | ≥2 | 37.7 | 18.7 | 48.4 | 32.7 | 38.3 | 37.3 |
| Two-stage Sym-RAG | ❌ | ✅ | ✅ | 2 | 12.2 | 5.4 | 9.4 | 10.6 | 8.7 | 9.1 |
| Iterative Sym-RAG | ❌ | ✅ | ✅ | ≥2 | 32.1 | 14.7 | 33.6 | 27.1 | 28.3 | 28.0 |
| Graph-RAG | ✅ | ✅ | ❌ | 2 | 22.2 | 11.1 | 0.0 | 21.1 | 11.5 | 15.6 |
| Hybrid-RAG | ✅ | ✅ | ❌ | 2 | 23.3 | 9.3 | 5.7 | 16.8 | 10.5 | 11.8 |
| NeuSym-RAG (Ours) | ✅ | ✅ | ✅ | ≥2 | 28.3 | 32.3 | 58.2 | 27.1 | 42.6 | 39.6 |

*sgl. = single-doc, multi. = multi-doc, retr. = retrieval questions; subj. = subjective, obj. = objective answers; AVG averages over all AirQA-Real questions.*
```bibtex
@inproceedings{cao-etal-2025-neusym,
title = "{N}eu{S}ym-{RAG}: Hybrid Neural Symbolic Retrieval with Multiview Structuring for {PDF} Question Answering",
author = "Cao, Ruisheng and
Zhang, Hanchong and
Huang, Tiancheng and
Kang, Zhangyi and
Zhang, Yuxin and
Sun, Liangtai and
Li, Hanqi and
Miao, Yuxun and
Fan, Shuai and
Chen, Lu and
Yu, Kai",
editor = "Che, Wanxiang and
Nabende, Joyce and
Shutova, Ekaterina and
Pilehvar, Mohammad Taher",
booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.acl-long.311/",
pages = "6211--6239",
ISBN = "979-8-89176-251-0"
}
```