LLM benchmarking highlights balance of speed, security & cost
AI experts at Insiders Technologies have published the results of their latest benchmarking of large language models (LLMs) for the insurance and financial sectors, focusing on areas including performance, data protection, and costs.
The second edition of the LLM benchmarking tool from Insiders Technologies evaluated 25 models using data drawn from real-world insurance and financial documents. The assessment built on the previous quarter's work, which had focused on information classification and extraction, by also examining speed, data protection, and cost structure in production environments for Intelligent Document Processing (IDP).
A key finding of the analysis was the efficiency of Gemini 2 Flash, particularly its high output and processing speed, which the experts considered advantageous for handling large volumes of data under time constraints.
The benchmarking showed that globally available models retained their lead in the overall rankings. Anthropic's Claude 3.7 Sonnet took first place with a score of 90.17, narrowly ahead of January's leading model, Claude 3.5 Sonnet, at 89.61 points. OpenAI's GPT-4o came third with 86.33.
In addition to commercial LLMs, the comparative study included the Insiders Private LLM. This model, hosted within the ISO 27001-certified Insiders cloud, is optimised to prioritise data protection and compliance requirements. It is described as a deliberate compromise for processing highly sensitive documents such as SEPA mandates and medical data, emphasising data protection, local processing, operational transparency, and control: attributes considered particularly relevant to industries handling sensitive information.
The benchmarking project underscored the ongoing challenge of achieving a balance between LLM performance and information security. Insiders Technologies seeks to address this through a best-of-breed approach, integrating the leading LLMs into its products using its OvAItion Engine, which offers customers tailored options according to their needs.
"Especially in highly regulated sectors such as the insurance industry, it is not just the pure performance of an LLM that matters, but also aspects such as data protection, cost control, and processing speed," explains Dr. Alexander Lück, a member of the OvAItion/Data Management team who is responsible for LLM benchmarking at Insiders Technologies.
"Our benchmark shows that there is no one-size-fits-all model. Instead, we enable our customers to find the optimal balance between efficiency and security with a customized setup of leading LLMs and our private model."
Each new model released on the market is assessed through the Insiders LLM benchmarking process, with insights gained used to inform continued product development. This process aims to promote consistent quality for the company's clientele across the insurance and finance sectors.
The Q2 2025 edition of the LLM benchmarking assessed models against multiple criteria to help customers in highly regulated industries map their requirements to the available technologies, in line with regulatory expectations around data security and cost control. The models tested included recent high-performance offerings such as Claude 3.7 Sonnet, Gemini 2 Flash, Llama 3.3 70B, and DeepSeek.
The results will inform both the choice and the integration of LLMs for automated document processing in insurance and financial institutions, whose demands for privacy, compliance, and efficiency are significant given the nature of the documents they handle.