Data-driven Approach to Information Sharing using Data Fusion and Machine Learning for Intrusion Detection


  • Lars Christian Andersen, Norwegian University of Science and Technology
  • Katrin Franke, Norwegian University of Science and Technology
  • Andrii Shalaginov, Norwegian University of Science and Technology


Intrusion Detection System (IDS) sensors are deployed at various locations in computer and communication networks to identify possible malicious activity. One of the main challenges with IDS is the high false-positive rate, which creates a large unnecessary workload for human analysts at a Security Operation Centre (SOC). Similarly, the exponential growth of raw data captured by sensors, combined with the application of Threat Intelligence (TI), creates a complex data flow. Considering these challenges, this paper presents a model for fusing and reducing heterogeneous sensor and TI data in intrusion detection. We summarize the literature we found and qualitative research interviews with security experts from law enforcement and from public and private organizations. Building on our qualitative research, we identified feature subsets for the corresponding data fusion that produce an accurate classification model in Machine Learning (ML)-aided analysis. The proposed data fusion process model was successfully evaluated on a real-world dataset from a SOC. This work contributes to the development of a data-driven approach for automated classification of IDS events through reduction of raw log data.

Author Biographies

Lars Christian Andersen, Norwegian University of Science and Technology

Norwegian Information Security Laboratory, Center for Cyber- and Information Security

Katrin Franke, Norwegian University of Science and Technology

Norwegian Information Security Laboratory, Center for Cyber- and Information Security

Andrii Shalaginov, Norwegian University of Science and Technology

Norwegian Information Security Laboratory, Center for Cyber- and Information Security







Norsk Informasjonssikkerhetskonferanse 2016