BH akademski imenik

Beyond RAG for Cyber Threat Intelligence: A Systematic Evaluation of Graph-Based and Agentic Retrieval

13. 4. 2026.

0

Dzenan Hamzic, Florian Skopik, Max Landauer, Markus Wurzenberger, Andreas Rauber

Cyber threat intelligence (CTI) analysts must answer complex questions over large collections of narrative security reports. Retrieval-augmented generation (RAG) systems help language models access external knowledge, but traditional vector retrieval often struggles with queries that require reasoning over relationships between entities such as threat actors, malware, and vulnerabilities. This limitation arises because relevant evidence is often distributed across multiple text fragments and documents. Knowledge graphs address this challenge by enabling structured multi-hop reasoning through explicit representations of entities and relationships. However, multiple retrieval paradigms, including graph-based, agentic, and hybrid approaches, have emerged with different assumptions and failure modes. It remains unclear how these approaches compare in realistic CTI settings and when graph grounding improves performance. We present a systematic evaluation of four RAG architectures for CTI analysis: standard vector retrieval, graph-based retrieval over a CTI knowledge graph, an agentic variant that repairs failed graph queries, and a hybrid approach combining graph queries with text retrieval. We evaluate these systems on 3,300 CTI question-answer pairs spanning factual lookups, multi-hop relational queries, analyst-style synthesis questions, and unanswerable cases. Results show that graph grounding improves performance on structured factual queries. The hybrid graph-text approach improves answer quality by up to 35 percent on multi-hop questions compared to vector RAG, while maintaining more reliable performance than graph-only systems.

Cybersecurity Text Classification: Challenging the Perceived Superiority of LLMs Over Conventional Machine Learning

8. 12. 2025.

0

Dzenan Hamzic, Markus Wurzenberger, Florian Skopik, Max Landauer, L. Linauer, Andreas Rauber

BigData Congress [Services Society]

This paper presents a comprehensive evaluation of multilingual cybersecurity text classification using conventional machine learning (ML) models, sentence-transformer embeddings, and open-source large language models (LLMs). We construct a manually labeled dataset of English and German news articles and benchmark models across zero-shot and few-shot settings while accounting for LLM knowledge cutoffs. Our results show that classic ML models, when combined with high-quality embeddings, achieve performance equal to or better than state-of-the-art LLMs. For instance, a Multi-Layer Perceptron (MLP) classifier with multilingual-e5-large embeddings reaches an F1-score of 0.99 in the pre-cutoff setting, matching Qwen2.5-72B's few-shot performance ($F_1 = 0.99$) post-cutoff. Notably, this level of performance is achieved with over 99% lower computational requirements. Several embedding-based ML pipelines outperform all zero-shot LLMs, highlighting their cost-efficiency and robustness. These findings challenge the presumed superiority of LLMs and underline the importance of cutoff-aware evaluations in practical applications.

Data-Driven Predictive Analytics for Dynamic Aviation Systems: Optimising Fleet Maintenance and Flight Operations Through Machine Learning

4. 11. 2025.

4

Elmin Marevac, E. Kadušić, Nataša Živić, Dzenan Hamzic, Narcisa Hadzajlic

Future Internet

The aviation industry operates as a complex, dynamic system generating vast volumes of data from aircraft sensors, flight schedules, and external sources. Managing this data is critical for mitigating disruptive and costly events such as mechanical failures and flight delays. This paper presents a comprehensive application of predictive analytics and machine learning to enhance aviation safety and operational efficiency. We address two core challenges: predictive maintenance of aircraft engines and forecasting flight delays. For maintenance, we utilise NASA’s C-MAPSS simulation dataset to develop and compare models, including one-dimensional convolutional neural networks (1D CNNs) and long short-term memory networks (LSTMs), for classifying engine health status and predicting the Remaining Useful Life (RUL), achieving classification accuracy up to 97%. For operational efficiency, we analyse historical flight data to build regression models for predicting departure delays, identifying key contributing factors such as airline, origin airport, and scheduled time. Our methodology highlights the critical role of Exploratory Data Analysis (EDA), feature selection, and data preprocessing in managing high-volume, heterogeneous data sources. The results demonstrate the significant potential of integrating these predictive models into aviation Business Intelligence (BI) systems to transition from reactive to proactive decision-making. The study concludes by discussing the integration challenges within existing data architectures and the future potential of these approaches for optimising complex, networked transportation systems.

Enhancing Cyber Situational Awareness with AI: A Novel Pipeline Approach for Threat Intelligence Analysis and Enrichment

2025.

2

Dzenan Hamzic, Florian Skopik, Max Landauer, Markus Wurzenberger, Andreas Rauber

ARES

TTP Classification with Minimal Labeled Data: A Retrieval-Based Few-Shot Learning Approach

2025.

1

Dzenan Hamzic, Florian Skopik, Max Landauer, Markus Wurzenberger, Andreas Rauber

ARES

Evaluation and Comparison of Open-Source LLMs Using Natural Language Generation Quality Metrics

15. 12. 2024.

4

Dzenan Hamzic, Markus Wurzenberger, Florian Skopik, Max Landauer, Andreas Rauber

BigData Congress [Services Society]

The rapid advancement of Large Language Models (LLMs) has transformed natural language processing, yet comprehensive evaluation methods are necessary to ensure their reliability, particularly in Retrieval-Augmented Generation (RAG) tasks. This study aims to evaluate and compare the performance of open-source LLMs by introducing a rigorous evaluation framework. We benchmark 20 LLMs using a combination of established metrics such as BLEU, ROUGE, and BERTScore, along with a novel metric, RAGAS. The models were tested across two distinct datasets to assess their text generation quality. Our findings reveal that models like nous-hermes-2-solar-10.7b and mistral-7b-instruct-v0.1 consistently excel in tasks requiring strict instruction adherence and effective use of large contexts, while other models show areas for improvement. This research contributes to the field by offering a comprehensive evaluation framework that aids in selecting the most suitable LLMs for complex RAG applications, with implications for future developments in natural language processing and big data analysis.

Machine Learning for an Enhanced Credit Risk Analysis: A Comparative Study of Loan Approval Prediction Models Integrating Mental Health Data

4. 1. 2024.

29

Adnan Alagic, Nataša Živić, E. Kadušić, Dzenan Hamzic, Narcisa Hadzajlic, Mejra Dizdarević, Elmedin Selmanovic

Machine Learning and Knowledge Extraction

The number of loan requests is growing rapidly worldwide, representing a multi-billion-dollar business in the credit approval industry. Large data volumes extracted from banking transactions that represent customers’ behavior are available, but processing loan applications is a complex and time-consuming task for banking institutions. In 2022, over 20 million Americans had open loans, totaling USD 178 billion in debt, although over 20% of loan applications were rejected. Numerous statistical methods have been deployed to estimate loan risks, raising the question of whether machine learning techniques can better predict the potential risks. To study the machine learning paradigm in this sector, a mental health dataset and a loan approval dataset presenting survey results from 1991 individuals are used as inputs to experiment with the credit risk prediction ability of the chosen machine learning algorithms. Through a comprehensive comparative analysis, this paper shows how the chosen machine learning algorithms can distinguish between normal and risky loan customers who might never pay their debts back. The results from the tested algorithms show that XGBoost achieves the highest accuracy of 84% in the first dataset, surpassing gradient boost (83%) and KNN (83%). In the second dataset, random forest achieved the highest accuracy of 85%, followed by decision tree and KNN with 83%. Alongside accuracy, the precision, recall, and overall performance of the algorithms were tested, and a confusion matrix analysis was performed, producing numerical results that emphasized the superior performance of XGBoost and random forest in the classification tasks in the first dataset, and XGBoost and decision tree in the second dataset. Researchers and practitioners can rely on these findings to inform their model selection process and enhance the accuracy and precision of their classification models.

Portfolio Optimization with Factor Views

1. 4. 2021.

0

Dzenan Hamzic

Dženan Hamzić
