CTU-13 Dataset Preprocessing
Source: Stratosphere IPS, "New dataset, CTU-13-Extended, now includes pcap files of normal traffic"
Contents of the dataset file:
- CTU-Malware-Capture-Botnet-42
- CTU-Malware-Capture-Botnet-43
- CTU-Malware-Capture-Botnet-44
- CTU-Malware-Capture-Botnet-45
- CTU-Malware-Capture-Botnet-46
- CTU-Malware-Capture-Botnet-47
- CTU-Malware-Capture-Botnet-48
- CTU-Malware-Capture-Botnet-49
- CTU-Malware-Capture-Botnet-50
- CTU-Malware-Capture-Botnet-51
- CTU-Malware-Capture-Botnet-52
- CTU-Malware-Capture-Botnet-53
- CTU-Malware-Capture-Botnet-54
Preprocessing
Logs were extracted from the (truncated) PCAP files in the extended dataset using Zeek (The Zeek Network Security Monitor).
To prepare the data for training, the files are converted in the following stages:
PCAP > Zeek logs > CSV > structured CSV > ML training
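The first stage (PCAP > Zeek logs) can be scripted over every capture directory. The sketch below assumes each CTU-Malware-Capture-Botnet-NN directory stores its captures as `*.pcap` files; the helper name `extract_logs` and the output layout are hypothetical. It relies on Zeek writing its logs into the current working directory, so each PCAP is processed inside its own output directory.

```shell
#!/usr/bin/env bash
# Sketch: run Zeek over every *.pcap in one capture directory.
# Assumption: PCAPs are stored as CAPTURE_DIR/*.pcap (adjust the glob if not).
# extract_logs is a hypothetical helper, not part of Zeek.

# Usage: extract_logs CAPTURE_DIR OUT_ROOT
extract_logs() {
  local capture_dir=${1%/} out_root=$2
  local pcap pcap_abs name out_dir
  for pcap in "$capture_dir"/*.pcap; do
    [ -e "$pcap" ] || continue   # glob matched nothing: no PCAPs here
    # Absolute path, because Zeek is run from inside the output directory.
    pcap_abs="$(cd "$(dirname "$pcap")" && pwd)/$(basename "$pcap")"
    name=$(basename "$pcap" .pcap)
    out_dir="$out_root/$(basename "$capture_dir")/$name"
    mkdir -p "$out_dir"
    # Zeek writes conn.log, weird.log, ... into its working directory.
    # -r reads packets from a file; -C ignores invalid checksums, which
    # are common in traces captured with NIC checksum offloading.
    (cd "$out_dir" && zeek -C -r "$pcap_abs") || return 1
  done
}
```

For example, `for dir in CTU-Malware-Capture-Botnet-*/; do extract_logs "$dir" zeek_logs; done` would leave one log directory per PCAP under the (hypothetical) `zeek_logs/` root, which keeps logs from different captures from overwriting each other.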
Extracted files:
- analyzer.log
- capture_loss.log
- conn.log
- loaded_scripts.log
- notice.log
- packet_filter.log
- stats.log
- telemetry.log
- weird.log
conn.log fields
| Field | Description |
|---|---|
| ts | Timestamp of first packet seen |
| uid | Unique connection ID |
| id.orig_h | Originator’s IP address |
| id.orig_p | Originator’s port |
| id.resp_h | Responder’s IP address |
| id.resp_p | Responder’s port |
| proto | Transport protocol (TCP/UDP/ICMP) |
| service | Application service (http, dns, ssl) if identified |
| duration | Connection’s total duration in seconds |
| orig_bytes | Payload bytes from originator |
| resp_bytes | Payload bytes from responder |
| conn_state | Zeek connection state code (e.g., S0 = attempt seen, no reply; SF = normal establishment and termination; REJ = attempt rejected) |
| local_orig | Whether the originator is a local host |
| local_resp | Whether the responder is a local host |
| missed_bytes | Bytes not captured due to packet loss |
| history | Packet-level flags indicating handshake/data flow |
| orig_pkts | Number of packets sent by the originator |
| orig_ip_bytes | Total IP-layer bytes from originator (including headers) |
| resp_pkts | Number of packets sent by the responder |
| resp_ip_bytes | Total IP-layer bytes from responder (including headers) |
| tunnel_parents | Reference to any parent tunnel connection (UID) |
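Because the set and order of conn.log columns can differ between Zeek versions and loaded scripts, it may be worth checking the table above against the `#fields` header line that Zeek writes at the top of each ASCII log. A minimal sketch, assuming the default tab-separated ASCII log format; `list_fields` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Print the column names of a Zeek ASCII log, one per line.
# Assumes the default tab separator: the "#fields" header line holds the
# literal "#fields" followed by one tab-separated name per column.
# list_fields is a hypothetical helper, not part of Zeek.

# Usage: list_fields LOGFILE
list_fields() {
  awk -F'\t' '$1 == "#fields" { for (i = 2; i <= NF; i++) print $i; exit }' "$1"
}
```

Running `list_fields conn.log` on a real log prints the actual column list, which can then be compared with the field names passed to zeek-cut.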
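The selected fields are extracted from conn.log with zeek-cut. zeek-cut writes tab-separated values and, by default, drops Zeek's `#` header lines, so despite its extension conn.csv is tab-delimited and has no header row:
```shell
zeek-cut ts uid id.orig_h id.orig_p id.resp_h id.resp_p proto service \
  duration orig_bytes resp_bytes conn_state local_orig local_resp \
  missed_bytes history orig_pkts orig_ip_bytes resp_pkts resp_ip_bytes \
  tunnel_parents < conn.log > conn.csv
```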
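Turning the zeek-cut output into real CSV needs some care: some conn.log values (service and tunnel_parents, which Zeek writes as comma-joined lists) can themselves contain commas, so a naive tab-to-comma swap would shift columns. A minimal sketch of that step, assuming tab-separated zeek-cut output on stdin; `tsv_to_csv` is a hypothetical helper, and mapping Zeek's unset (`-`) and empty (`(empty)`) markers to empty cells is a design choice, not something the pipeline above specifies:

```shell
#!/usr/bin/env bash
# Convert tab-separated rows on stdin into RFC 4180 style CSV on stdout.
# tsv_to_csv is a hypothetical helper, not part of Zeek.
tsv_to_csv() {
  awk -F'\t' -v OFS=',' '
    {
      for (i = 1; i <= NF; i++) {
        f = $i
        # Zeek marks unset fields with "-" and empty sets with "(empty)".
        if (f == "-" || f == "(empty)") f = ""
        # Quote any field containing a comma or a double quote,
        # doubling embedded quotes as CSV requires.
        if (f ~ /[",]/) {
          gsub(/"/, "\"\"", f)
          f = "\"" f "\""
        }
        $i = f   # assigning a field rebuilds the record with OFS=","
      }
      print
    }'
}
```

For example, `tsv_to_csv < conn.csv > conn_clean.csv` (output name hypothetical). Since zeek-cut omitted the header lines, a header row with the comma-joined field names would still need to be prepended before loading the file for training.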