
CTU-13 Dataset Preprocessing

Source : New dataset, CTU-13-Extended, now includes pcap files of normal traffic — Stratosphere IPS

Contents of Dataset File :
  • CTU-Malware-Capture-Botnet-42
  • CTU-Malware-Capture-Botnet-43
  • CTU-Malware-Capture-Botnet-44
  • CTU-Malware-Capture-Botnet-45
  • CTU-Malware-Capture-Botnet-46
  • CTU-Malware-Capture-Botnet-47
  • CTU-Malware-Capture-Botnet-48
  • CTU-Malware-Capture-Botnet-49
  • CTU-Malware-Capture-Botnet-50
  • CTU-Malware-Capture-Botnet-51
  • CTU-Malware-Capture-Botnet-52
  • CTU-Malware-Capture-Botnet-53
  • CTU-Malware-Capture-Botnet-54

Preprocessing

The (truncated) PCAP files in the extended dataset are processed using Zeek (The Zeek Network Security Monitor), which extracts the log files listed below.

To prepare the data for training, the files are converted in the following stages :

PCAP > ZEEK LOGS > CSV > Structured CSV > ML TRAINING
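
The first stage (PCAP > ZEEK LOGS) can be sketched as a small shell helper. This is a minimal sketch, assuming `zeek` is installed and on the PATH. The function name `pcap_to_logs` and the paths in the usage line are hypothetical and not part of the dataset. Zeek's `-r` option reads packets from a capture file and writes the resulting logs into the current working directory, so the helper changes into the output directory before running it.

```shell
# Hypothetical helper: run Zeek on one PCAP and collect its logs in a directory.
# Assumes `zeek` is installed and on the PATH.
pcap_to_logs() {
    pcap="$1"      # path to the capture file
    outdir="$2"    # directory that will receive conn.log, weird.log, ...

    # Resolve the PCAP to an absolute path before changing directory.
    pcap_dir="$(cd "$(dirname "$pcap")" && pwd)" || return 1
    pcap_abs="$pcap_dir/$(basename "$pcap")"

    mkdir -p "$outdir" || return 1

    # Zeek writes its logs into the current working directory, so run it
    # in a subshell that has changed into the output directory.
    ( cd "$outdir" && zeek -r "$pcap_abs" )
}
```

Usage (file names are placeholders): `pcap_to_logs capture.pcap zeek_logs/botnet-42`. Keeping one output directory per capture prevents the logs of different scenarios from overwriting each other.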

Extracted Files :

  • analyzer.log
  • capture_loss.log
  • conn.log
  • loaded_scripts.log
  • notice.log
  • packet_filter.log
  • stats.log
  • telemetry.log
  • weird.log


conn.log fields

ts Timestamp of first packet seen
uid Unique connection ID
id.orig_h Originator’s IP address
id.orig_p Originator’s port
id.resp_h Responder’s IP address
id.resp_p Responder’s port
proto Transport protocol (TCP/UDP/ICMP)
service Application service (http, dns, ssl) if identified
duration Connection’s total duration in seconds
orig_bytes Payload bytes from originator
resp_bytes Payload bytes from responder
conn_state Overall state of connection (e.g., SF, S0, REJ)
local_orig Whether the originator is a local host
local_resp Whether the responder is a local host
missed_bytes Bytes not captured due to packet loss
history Packet-level flags indicating handshake/data flow
orig_pkts Number of packets sent by the originator
orig_ip_bytes Total IP-layer bytes from originator (including headers)
resp_pkts Number of packets sent by the responder
resp_ip_bytes Total IP-layer bytes from responder (including headers)
tunnel_parents Reference to any parent tunnel connection (UID)
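
The field list above can be checked against an actual log, because Zeek's default tab-separated logs carry `#fields` and `#types` header lines. The following is a minimal sketch, assuming the default TSV log format (not JSON output). The function name `zeek_field_types` is hypothetical.

```shell
# Hypothetical helper: print "field<TAB>type" for every column of a Zeek TSV log.
# Assumes the default tab-separated format with #fields and #types header lines.
zeek_field_types() {
    awk -F '\t' '
        # Remember the column names from the #fields line.
        $1 == "#fields" { for (i = 2; i <= NF; i++) name[i] = $i; n = NF }
        # Pair each name with its type from the #types line, then stop reading.
        $1 == "#types"  { for (i = 2; i <= n; i++) print name[i] "\t" $i; exit }
    ' "$1"
}
```

Usage: `zeek_field_types conn.log`. Knowing the declared type of each column (for example `time`, `port`, `count`) helps when choosing column types for the structured CSV.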
# Translate tabs (\t) to commas to produce comma-delimited data
cat conn.log | zeek-cut ts uid id.orig_h id.orig_p id.resp_h id.resp_p proto service duration orig_bytes resp_bytes conn_state local_orig local_resp missed_bytes history orig_pkts orig_ip_bytes resp_pkts resp_ip_bytes tunnel_parents | tr '\t' ',' > conn.csv
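
The command above produces rows without a header, and a plain tab-to-comma translation can break a row when a field itself contains commas (set-valued fields such as tunnel_parents use a comma between elements by default). One possible way to handle the CSV > Structured CSV stage is sketched below with awk only. It is a sketch under assumptions, not the method used for this dataset: it assumes the default TSV log format, the function name `zeek_log_to_csv` is hypothetical, and it maps both Zeek's unset marker `-` and its empty marker `(empty)` to an empty CSV cell, which loses the distinction between the two.

```shell
# Hypothetical helper: convert a Zeek TSV log to CSV with a header row.
# Assumes the default tab-separated format with a #fields header line.
zeek_log_to_csv() {
    awk -F '\t' '
        # Turn one Zeek value into one CSV cell.
        function csv(v) {
            # Unset and empty markers both become an empty cell (assumption).
            if (v == "-" || v == "(empty)") return ""
            # Quote values containing a comma or a double quote.
            if (v ~ /[",]/) { gsub(/"/, "\"\"", v); return "\"" v "\"" }
            return v
        }
        # Use the #fields line as the CSV header row.
        $1 == "#fields" {
            line = $2
            for (i = 3; i <= NF; i++) line = line "," $i
            print line
            next
        }
        # Skip the remaining metadata lines (#separator, #types, #close, ...).
        /^#/ { next }
        # Emit every data row as comma-separated cells.
        {
            line = csv($1)
            for (i = 2; i <= NF; i++) line = line "," csv($i)
            print line
        }
    ' "$1"
}
```

Usage: `zeek_log_to_csv conn.log > conn.csv`. Because the header comes from the log itself, the same helper can be applied to the other extracted logs without listing their fields by hand.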