CTU-13 Dataset Preprocessing
Source : New dataset, CTU-13-Extended, now includes pcap files of normal traffic — Stratosphere IPS
Contents of Dataset File :
- CTU-Malware-Capture-Botnet-42
- CTU-Malware-Capture-Botnet-43
- CTU-Malware-Capture-Botnet-44
- CTU-Malware-Capture-Botnet-45
- CTU-Malware-Capture-Botnet-46
- CTU-Malware-Capture-Botnet-47
- CTU-Malware-Capture-Botnet-48
- CTU-Malware-Capture-Botnet-49
- CTU-Malware-Capture-Botnet-50
- CTU-Malware-Capture-Botnet-51
- CTU-Malware-Capture-Botnet-52
- CTU-Malware-Capture-Botnet-53
- CTU-Malware-Capture-Botnet-54
Preprocessing
The (truncated) PCAP files in the extended dataset are processed with Zeek (The Zeek Network Security Monitor), which extracts a set of log files from each capture.
To prepare the data for training, the files are converted in the following stages :
PCAP > ZEEK LOGS > CSV > Structured CSV > ML TRAINING
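The first stage (PCAP > ZEEK LOGS) can be sketched roughly as below. This is a minimal sketch, not the exact procedure used here: it assumes the `zeek` binary is on the PATH, and the helper names `build_zeek_command` and `run_zeek` are hypothetical. Zeek's `-r` option reads packets from a file and `-C` ignores invalid checksums; the logs are written to Zeek's working directory.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def build_zeek_command(pcap_path: Path) -> list[str]:
    """Return the argv for reading one pcap with Zeek.

    -r reads packets from a file; -C ignores invalid checksums,
    which often appear in captured traffic.
    """
    return ["zeek", "-C", "-r", str(pcap_path.resolve())]


def run_zeek(pcap_path: Path, out_dir: Path) -> None:
    """Run Zeek on one pcap; the logs land in out_dir (Zeek's working directory)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_zeek_command(pcap_path), cwd=out_dir, check=True)


# Example call (hypothetical paths, requires Zeek to be installed):
# run_zeek(Path("capture.pcap"), Path("zeek_logs/capture"))
```

Running each capture in its own output directory keeps the logs of different scenarios from overwriting each other.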
Extracted Files :
- analyzer.log
- capture_loss.log
- conn.log
- loaded_scripts.log
- notice.log
- packet_filter.log
- stats.log
- telemetry.log
- weird.log
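The ZEEK LOGS > CSV stage could look something like the following sketch. It assumes the logs use Zeek's default tab-separated format, where metadata lines start with `#` and the `#fields` line carries the column names. The function name `zeek_tsv_to_csv` and the sample rows are made up for illustration and are not taken from the dataset.

```python
from __future__ import annotations

import csv
import io
from typing import Iterable


def zeek_tsv_to_csv(lines: Iterable[str]) -> str:
    """Convert the lines of a Zeek TSV log into CSV text.

    Column names come from the '#fields' header line; every other line
    starting with '#' is metadata and is skipped.
    """
    fields: list[str] = []
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("#"):
            if line.startswith("#fields"):
                fields = line.split("\t")[1:]  # drop the '#fields' token
                writer.writerow(fields)
            continue
        if not fields:
            raise ValueError("data row seen before the '#fields' header")
        writer.writerow(line.split("\t"))
    return out.getvalue()


# Made-up sample in the shape of a conn.log excerpt (illustration only).
SAMPLE_LOG = [
    "#separator \\x09",
    "#fields\tuid\tid.orig_h\tproto\tduration",
    "#types\tstring\taddr\tenum\tinterval",
    "C1a2b3\t10.0.0.5\ttcp\t1.5",
    "#close\t2024-01-01-00-00-00",
]

print(zeek_tsv_to_csv(SAMPLE_LOG))
```

For a real file, the same function can be fed an open file handle, for example `zeek_tsv_to_csv(open("conn.log"))`.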
conn.log Fields :
- uid: Unique connection ID
- id.orig_h: Originator’s IP address
- id.orig_p: Originator’s port
- id.resp_h: Responder’s IP address
- id.resp_p: Responder’s port
- proto: Transport protocol (TCP/UDP/ICMP)
- service: Application service (http, dns, ssl) if identified
- duration: Connection’s total duration in seconds
- orig_bytes: Payload bytes from originator
- resp_bytes: Payload bytes from responder
- conn_state: Overall state of the connection (e.g., SF, S0, REJ)
- local_orig: Whether the originator is a local host
- local_resp: Whether the responder is a local host
- missed_bytes: Bytes not captured due to packet loss
- history: Packet-level flags indicating handshake/data flow
- orig_pkts: Number of packets sent by the originator
- orig_ip_bytes: Total IP-layer bytes from originator (including headers)
- resp_pkts: Number of packets sent by the responder
- resp_ip_bytes: Total IP-layer bytes from responder (including headers)
- tunnel_parents: UIDs of any parent tunnel connections
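The CSV > Structured CSV stage is not spelled out here, so the following is only one plausible sketch of it. It assumes the CSV still carries Zeek's default markers (`-` for unset, `(empty)` for empty, `T`/`F` for booleans) and casts the fields listed above to typed values. The name `structure_row`, the choice of numeric columns, and the sample row are assumptions for illustration.

```python
from __future__ import annotations

import csv
import io

# Zeek's default markers for unset and empty values.
UNSET = {"-", "(empty)", ""}
INT_FIELDS = {
    "id.orig_p", "id.resp_p", "orig_bytes", "resp_bytes", "missed_bytes",
    "orig_pkts", "orig_ip_bytes", "resp_pkts", "resp_ip_bytes",
}
FLOAT_FIELDS = {"duration"}
BOOL_FIELDS = {"local_orig", "local_resp"}


def structure_row(row: dict[str, str]) -> dict[str, object]:
    """Cast one conn.log CSV row from strings to typed values."""
    out: dict[str, object] = {}
    for name, value in row.items():
        if value in UNSET:
            out[name] = None  # missing value, to be imputed or dropped later
        elif name in INT_FIELDS:
            out[name] = int(value)
        elif name in FLOAT_FIELDS:
            out[name] = float(value)
        elif name in BOOL_FIELDS:
            out[name] = value == "T"  # Zeek writes booleans as T / F
        else:
            out[name] = value  # keep categorical fields as strings
    return out


# Made-up row for illustration; not taken from the dataset.
CSV_TEXT = (
    "uid,id.orig_p,proto,service,duration,orig_bytes,conn_state,local_orig\n"
    "C1a2b3,49152,tcp,-,1.5,120,SF,T\n"
)
rows = [structure_row(r) for r in csv.DictReader(io.StringIO(CSV_TEXT))]
print(rows[0])
```

Categorical columns such as proto, service, and conn_state are left as strings here; how they are encoded for ML training is a separate decision.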