CTU-13 Dataset Preprocessing

Source : New dataset, CTU-13-Extended, now includes pcap files of normal traffic — Stratosphere IPS

Contents of Dataset File :
  • CTU-Malware-Capture-Botnet-42
  • CTU-Malware-Capture-Botnet-43
  • CTU-Malware-Capture-Botnet-44
  • CTU-Malware-Capture-Botnet-45
  • CTU-Malware-Capture-Botnet-46
  • CTU-Malware-Capture-Botnet-47
  • CTU-Malware-Capture-Botnet-48
  • CTU-Malware-Capture-Botnet-49
  • CTU-Malware-Capture-Botnet-50
  • CTU-Malware-Capture-Botnet-51
  • CTU-Malware-Capture-Botnet-52
  • CTU-Malware-Capture-Botnet-53
  • CTU-Malware-Capture-Botnet-54

Preprocessing

The (truncated) PCAP files in the extended dataset were extracted using Zeek (The Zeek Network Security Monitor).

To prepare the data for training, the files are converted in stages :

PCAP > ZEEK LOGS > CSV > Structured CSV > ML TRAINING
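
The first stage (PCAP > ZEEK LOGS) could be scripted roughly as follows. This is a minimal sketch, not the exact procedure used here: it assumes the `zeek` binary is on the PATH, and the helper names `zeek_command` and `run_zeek` are hypothetical. It relies on the fact that `zeek -r <file>` reads packets from a capture file and writes its logs into the current working directory.

```python
import subprocess
from pathlib import Path


def zeek_command(pcap_path):
    """Build the argument list for reading one capture file with Zeek."""
    # `zeek -r <file>` reads packets from a pcap instead of a live interface.
    # The path is made absolute because Zeek is started in another directory.
    return ["zeek", "-r", str(Path(pcap_path).resolve())]


def run_zeek(pcap_path, out_dir):
    """Run Zeek on one pcap and return the names of the logs it produced.

    Assumes Zeek is installed; raises CalledProcessError if Zeek fails.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Zeek writes conn.log, weird.log, etc. into its working directory,
    # so each capture gets its own output directory.
    subprocess.run(zeek_command(pcap_path), cwd=out, check=True)
    return sorted(p.name for p in out.glob("*.log"))
```

Calling `run_zeek` once per capture directory, with a separate output directory for each, keeps the logs of the individual captures apart. The pcap file names inside each capture directory are not given here, so they have to be filled in by hand.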

Extracted Files :

  • analyzer.log
  • capture_loss.log
  • conn.log
  • loaded_scripts.log
  • notice.log
  • packet_filter.log
  • stats.log
  • telemetry.log
  • weird.log
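
The second stage (ZEEK LOGS > CSV) can be sketched with the standard library alone. The sketch below assumes Zeek's default tab-separated ASCII log format, where metadata lines start with `#` and the `#fields` line names the columns; it would not apply if Zeek were configured to write JSON. The function name `zeek_log_to_csv` and the sample log are invented for illustration.

```python
import csv
import io


def zeek_log_to_csv(log_text):
    """Convert the text of a tab-separated Zeek log into CSV text."""
    fields = []
    rows = []
    for line in log_text.splitlines():
        if line.startswith("#fields"):
            # Column names follow the "#fields" token, separated by tabs.
            fields = line.split("\t")[1:]
        elif line and not line.startswith("#"):
            # Every other non-comment line is one tab-separated record.
            rows.append(line.split("\t"))
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(fields)
    writer.writerows(rows)
    return buf.getvalue()


# Invented miniature log, only to show the shape of the input.
sample = (
    "#separator \\x09\n"
    "#fields\tts\tuid\tid.orig_h\tproto\n"
    "#types\ttime\tstring\taddr\tenum\n"
    "1.0\tCabc\t10.0.0.1\ttcp\n"
    "#close\t2024-01-01-00-00-00\n"
)
print(zeek_log_to_csv(sample))
```

Using the `csv` module rather than replacing tabs with commas means that any value containing a comma is quoted correctly.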


Fields in conn.log :

Field           Description
ts              Timestamp of the first packet seen
uid             Unique connection ID
id.orig_h       Originator's IP address
id.orig_p       Originator's port
id.resp_h       Responder's IP address
id.resp_p       Responder's port
proto           Transport protocol (TCP/UDP/ICMP)
service         Application service (http, dns, ssl) if identified
duration        Total duration of the connection in seconds
orig_bytes      Payload bytes sent by the originator
resp_bytes      Payload bytes sent by the responder
conn_state      Overall state of the connection (e.g., SF, REJ)
local_orig      Whether the originator is a local host
local_resp      Whether the responder is a local host
missed_bytes    Bytes not captured due to packet loss
history         Packet-level flags indicating handshake/data flow
orig_pkts       Number of packets sent by the originator
orig_ip_bytes   Total IP-layer bytes from the originator (including headers)
resp_pkts       Number of packets sent by the responder
resp_ip_bytes   Total IP-layer bytes from the responder (including headers)
tunnel_parents  UIDs of any parent tunnel connections
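
The CSV > Structured CSV stage is not spelled out here, so the following is only one possible reading: converting the string values of each conn.log row into typed values before training. It assumes Zeek's default text conventions, where `-` marks an unset field, `(empty)` marks an empty set, and booleans are written as `T` or `F`. The name `structure_row` and the grouping of fields by type are assumptions made for this sketch.

```python
# Field groupings follow the conn.log field list above.
INT_FIELDS = {
    "id.orig_p", "id.resp_p", "orig_bytes", "resp_bytes", "missed_bytes",
    "orig_pkts", "orig_ip_bytes", "resp_pkts", "resp_ip_bytes",
}
FLOAT_FIELDS = {"ts", "duration"}
BOOL_FIELDS = {"local_orig", "local_resp"}
UNSET = {"-", "(empty)", ""}


def structure_row(row):
    """Convert one conn.log row (field name -> string) into typed values."""
    out = {}
    for name, value in row.items():
        if value in UNSET:
            # Keep missing values explicit instead of guessing a default.
            out[name] = None
        elif name in INT_FIELDS:
            out[name] = int(value)
        elif name in FLOAT_FIELDS:
            out[name] = float(value)
        elif name in BOOL_FIELDS:
            out[name] = value == "T"
        else:
            # Addresses, protocol, state, history, etc. stay as strings.
            out[name] = value
    return out
```

Rows read with `csv.DictReader` can be passed to `structure_row` directly. How the resulting `None` values are handled (dropped, imputed, or encoded) is a modelling decision left open here.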