Logo
User Name

Dženan Hamzić

Društvene mreže:

Dzenan Hamzic, Markus Wurzenberger, Florian Skopik, Max Landauer, Andreas Rauber

The rapid advancement of Large Language Models (LLMs) has transformed natural language processing, yet comprehensive evaluation methods are necessary to ensure their reliability, particularly in Retrieval-Augmented Generation (RAG) tasks. This study aims to evaluate and compare the performance of open-source LLMs by introducing a rigorous evaluation framework. We benchmark 20 LLMs using a combination of established metrics such as BLEU, ROUGE, BERTScore, along with and a novel metric, RAGAS. The models were tested across two distinct datasets to assess their text generation quality. Our findings reveal that models like nous-hermes-2-solar-10.7b and mistral-7b-instruct-v0.1 consistently excel in tasks requiring strict instruction adherence and effective use of large contexts, while other models show areas for improvement. This research contributes to the field by offering a comprehensive evaluation framework that aids in selecting the most suitable LLMs for complex RAG applications, with implications for future developments in natural language processing and big data analysis.

The number of loan requests is rapidly growing worldwide representing a multi-billion-dollar business in the credit approval industry. Large data volumes extracted from the banking transactions that represent customers’ behavior are available, but processing loan applications is a complex and time-consuming task for banking institutions. In 2022, over 20 million Americans had open loans, totaling USD 178 billion in debt, although over 20% of loan applications were rejected. Numerous statistical methods have been deployed to estimate loan risks opening the field to estimate whether machine learning techniques can better predict the potential risks. To study the machine learning paradigm in this sector, the mental health dataset and loan approval dataset presenting survey results from 1991 individuals are used as inputs to experiment with the credit risk prediction ability of the chosen machine learning algorithms. Giving a comprehensive comparative analysis, this paper shows how the chosen machine learning algorithms can distinguish between normal and risky loan customers who might never pay their debts back. The results from the tested algorithms show that XGBoost achieves the highest accuracy of 84% in the first dataset, surpassing gradient boost (83%) and KNN (83%). In the second dataset, random forest achieved the highest accuracy of 85%, followed by decision tree and KNN with 83%. Alongside accuracy, the precision, recall, and overall performance of the algorithms were tested and a confusion matrix analysis was performed producing numerical results that emphasized the superior performance of XGBoost and random forest in the classification tasks in the first dataset, and XGBoost and decision tree in the second dataset. Researchers and practitioners can rely on these findings to form their model selection process and enhance the accuracy and precision of their classification models.

...
...
...

Pretplatite se na novosti o BH Akademskom Imeniku

Ova stranica koristi kolačiće da bi vam pružila najbolje iskustvo

Saznaj više