Case Studies: AI-Assisted Programming & LLM SQL Generation

Transforming Software Development with AI Tools

Executive Summary

Large Language Models (LLMs) have revolutionized natural language querying for SQL generation. This study demonstrates that strategies like Retrieval-Augmented Generation (RAG) significantly improve query accuracy. Even mid-sized models (7B parameters) achieved high performance with optimization techniques, presenting cost-efficient, scalable solutions for database querying.

Experiment Design

The experiment tested SQL generation across models ranging from 1B to 90B parameters using 85 natural language questions against a complex schema:

Without Entity Mapping or RAG
With Entity Mapping
With Retrieval-Augmented Generation (RAG)

The database schema (ArcGisMasterView) included fields related to property sales, ownership, and zoning details.

Key Findings

Performance Without Optimization:

Smaller models (1B–11B) struggled with only 1%–5% accuracy.
Larger models (70B–90B) performed slightly better, reaching up to 2% accuracy.

Impact of Entity Mapping:

Improved performance significantly for 9B+ models.
gemma2-9b-it and llama-3.1-70b models achieved 97% accuracy.

Benefits of Retrieval-Augmented Generation (RAG):

gemma-7b-it reached 89.41% accuracy.
llama3-70b and llama3-90b achieved 100% accuracy.

Model Performance Results

Model	Without Entity Mapping	With Entity Mapping	With RAG
llama-3.2-1b-preview	1.18%	4.71%	23.53%
llama-3.2-3b-preview	4.71%	57.65%	41.18%
gemma-7b-it	0.00%	25.88%	89.41%
llama-3.1-8b-instant	0.00%	34.12%	67.06%
gemma2-9b-it	1.18%	97.65%	97.65%
llama3-70b-tool-use-preview	1.18%	91.76%	100.00%
llama-3.1-70b-versatile	1.18%	98.82%	98.82%
llama-3.2-90b-text-preview	2.35%	98.82%	100.00%

Analysis

The experiment highlights RAG as a transformative strategy for improving LLM accuracy in SQL generation, particularly for mid-sized models (7B–11B parameters). Combining entity mapping with RAG pushes accuracy rates near perfection for larger models.

Conclusion

RAG and entity mapping dramatically improve the performance of LLMs for SQL generation, enabling high accuracy at scale. Mid-sized models can deliver efficient, cost-effective results with these optimization techniques.