The dataset presented here is built from real traffic and up-to-date attacks. The data come from several NetFlow v9 collectors strategically located in the network of a Spanish ISP. It is composed of two distinct sets of data, each already split into weeks.
The main advantage of this dataset over previous ones is its usefulness for evaluating IDSs that consider long-term evolution and traffic periodicity. Models that account for differences between daytime and night, or between working days and weekends, can also be trained and evaluated with it.
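As an illustration of how such periodicity-aware models could be fed, the following minimal sketch derives day/night and weekday/weekend flags from a flow timestamp. The timestamp format, the function name `temporal_features`, and the 08:00 to 20:00 daytime window are assumptions made for this example; they are not taken from the dataset.

```python
from datetime import datetime

# Assumed boundaries of "daytime"; illustrative only, not defined by the dataset.
DAY_START_HOUR = 8
DAY_END_HOUR = 20


def temporal_features(timestamp: str) -> dict:
    """Derive coarse periodicity features from a flow timestamp.

    The 'YYYY-MM-DD HH:MM:SS' format is an assumption about how the
    timestamp is stored.
    """
    ts = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    return {
        "hour": ts.hour,
        "is_daytime": DAY_START_HOUR <= ts.hour < DAY_END_HOUR,
        # weekday() returns 0 for Monday through 6 for Sunday.
        "is_weekend": ts.weekday() >= 5,
    }
```

Flags of this kind can be appended to each flow record, or used to split the data, so that a detector is trained and evaluated separately on each regime.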
The calibration set spans 4 months, over which the traffic periodicity is clearly visible. Although this set contains no synthetically generated attacks, the red dots show that some other traffic anomalies appear in this period. These anomalies were detected by several state-of-the-art anomaly detectors.
This figure shows the number of network traffic flows for the main communication ports.
The second set includes one month of traffic. As in the calibration data, the traffic periodicity is clearly visible. The red dots show that additional types of anomalies or attacks, other than those synthetically generated, appear in this period. These anomalies were detected by several state-of-the-art anomaly detectors. For example, in this case, the biggest red dots correspond to an email spam campaign.
This figure shows the number of network traffic flows for the main communication ports.
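Per-port flow counts like those plotted in these figures could be computed along the following lines. This is only a sketch: the record layout (a timestamp string paired with a destination port), the one-hour bin size, and the function name `flows_per_port_per_hour` are assumptions for illustration, not the procedure used to produce the figures.

```python
from collections import Counter
from datetime import datetime


def flows_per_port_per_hour(flows):
    """Count flows per (hour bin, destination port).

    `flows` is an iterable of (timestamp, dst_port) pairs; this record
    layout is an assumption made for illustration.
    """
    counts = Counter()
    for timestamp, dst_port in flows:
        ts = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
        # Truncate to the start of the hour to form the time bin.
        hour_bin = ts.replace(minute=0, second=0)
        counts[(hour_bin.isoformat(sep=" "), dst_port)] += 1
    return counts
```

Plotting these counts over time, one series per port, would expose the daily and weekly periodicity, and isolated spikes on a single port (for instance the mail port during a spam campaign) would stand out against it.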