Advanced threat intelligence based on data from network traffic sensors and honeypots

Application Track:

Theme Driven

Code:

REACH-2022-THEMEDRIVEN-4PDIH_9.1

Domain:

Proposed by:

4PDIH

Summary of the entity:

4PDIH provides, connects, and supports knowledge, business and technology expertise, technologies, experimental and pilot environments, best practices, methodologies, and other activities necessary to fully enable the Slovenian industry, public administration, and communities in building digital competencies, innovation models and processes, and to support their digital transformation.

The aim of 4PDIH is to foster awareness and provide services to grow digital competencies, share digital experience and case studies locally, regionally, and internationally, and support the government to adapt regulation and open its data to foster entrepreneurship.

Summary of the challenge:

The goal of the challenge is to identify and model cyber threats from network data and honeypot data. Multiple research directions are possible, ranging from statistical data analysis and pattern detection, through advanced machine learning, to rich data visualizations.

Description:

Stakeholders:  

  • Data scientists, machine learning and AI experts, and researchers 
  • Cybersecurity experts 
  • SOC engineers 
  • Sysadmins and IT personnel 
  • Stakeholders that plan or manage digital transformation in various industries 

Description: 

Digitalization significantly increases the attack surface of modern systems. In such environments, it is crucial to understand the threat landscape, constantly reassess risks, and update our understanding of possible attack mechanisms. On the one hand, the countless simple scanning and brute-force attempts by script kiddies raise the bar for minimum security requirements. On the other hand, targeted attacks, advanced persistent threats, and zero-day exploits appear where the stakes are higher, and protecting mission-critical systems against them requires much more involved practices. For these kinds of attacks, we need to learn about novel tactics, exploits, and attack vectors. Knowledge about both kinds of threat can be extracted from attack observation platforms, honeypots, and real-world cyberattack logs and network data.

The goal of this challenge is to identify and model cyber threats from network data and honeypot data. Our datasets include logs from a darknet sensor and honeypot systems covering multiple technologies and protocols. Several research directions are possible, ranging from statistical data analysis and pattern detection in the data to using advanced machine learning or even AI to detect anomalies. Examples include detecting throttled port scans and brute-force attempts that fall below the thresholds of most detection systems, which operate on short time windows; analyzing darknet backscatter traffic to detect large denial-of-service attacks; remote access tactics analysis; malware analysis; attack phase classification; and prediction of escalation.
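As an illustration of one of the examples above, throttled port scans that stay below short-window thresholds, the following minimal Python sketch counts the distinct destination ports each source touches within a long sliding window. The record layout (timestamp, source IP, destination port), the function name `find_slow_scanners`, and the default window and threshold are assumptions made for illustration only; they do not describe the actual format of the challenge datasets or any existing detection tool.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_slow_scanners(events, window=timedelta(hours=24), min_ports=20):
    """Flag sources that probe many distinct ports within a long window.

    events: iterable of (timestamp, src_ip, dst_port) tuples.
            This layout is hypothetical, not the real dataset schema.
    Returns {src_ip: largest number of distinct ports seen in any window}
    for sources that reach at least min_ports.
    """
    by_source = defaultdict(list)
    for ts, src, port in events:
        by_source[src].append((ts, port))

    flagged = {}
    for src, hits in by_source.items():
        hits.sort()                      # order each source's probes by time
        port_counts = defaultdict(int)   # port -> hits inside current window
        left = 0                         # index of the oldest hit in the window
        best = 0
        for ts, port in hits:
            port_counts[port] += 1
            # Drop hits that are older than the window relative to this hit.
            while ts - hits[left][0] > window:
                old_port = hits[left][1]
                port_counts[old_port] -= 1
                if port_counts[old_port] == 0:
                    del port_counts[old_port]
                left += 1
            best = max(best, len(port_counts))
        if best >= min_ports:
            flagged[src] = best
    return flagged
```

A source that probes one new port every 30 minutes would rarely trip a detector that counts probes per minute, but it accumulates many distinct ports over a 24-hour window and is flagged here. A source that repeatedly hits a single port is not flagged, because only distinct ports are counted.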

Finally, data exploration and analysis by researchers and experts have proved to be an extremely productive method of gaining insights; this requires suitable preparation of data for such analysis, which involves rich 2D/3D/VR data visualization, sonification, novel data modeling and representation techniques, interactive data browsers, and more.

Data:

DIH Data Provider:

    • Network sensor data (historical logs and near-real-time traffic dumps) from several properties at the University of Ljubljana. Contains mostly unsolicited traffic from various hacking and scanning attempts; approx. 200 GB (and growing) of structured data (and possibly terabytes of unstructured PCAP data)
    • Honeypot data for multiple protocols (Telnet, SSH, several HTTP-based protocols and services) from the UL CyberLab honeynet. Contains mostly unsolicited traffic from various hacking and scanning attempts; approx. 1 TB (and growing) of structured log files

Expected outcomes:

  • Tools for visual data analysis, and drill-down/exploration 
  • Tools for automatic data classification, risk assessment, and threat detection 
  • Integration of developed tools and heuristics into security-related software and appliances (intrusion detection and prevention systems, monitoring and alerting systems, etc.)

How do we apply?

Read the Guidelines for Applicants

Doubts or questions? Read more about REACH on the About Us page, have a look at our FAQ section, or drop us an email at opencall@reach-incubator.eu.