UGR'16: A New Dataset for Network IDS Evaluation

The dataset presented here is built from real traffic and up-to-date attacks. The data come from several NetFlow v9 collectors strategically located in the network of a Spanish ISP. It is composed of two distinct sets of data, calibration data and test data, each already split into weeks.

The main advantage of this dataset over previous ones is its usefulness for evaluating IDSs that take long-term evolution and traffic periodicity into account. Models that distinguish between day and night, or between working days and weekends, can also be trained and evaluated with it.
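As an illustration of the day/night and working-day/weekend distinction mentioned above, the following minimal Python sketch derives such context features from a flow timestamp. The timestamp format ("YYYY-MM-DD HH:MM:SS"), the night window (22:00 to 06:00), the Saturday/Sunday weekend, and the name time_context are assumptions made for this example; they are not defined by the dataset.

```python
from datetime import datetime

# Assumed night window (22:00-06:00); adjust to the network under study.
NIGHT_START_HOUR = 22
NIGHT_END_HOUR = 6


def time_context(timestamp: str) -> dict:
    """Derive simple periodicity features from a flow timestamp.

    The 'YYYY-MM-DD HH:MM:SS' format is an assumption for this sketch.
    """
    ts = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    return {
        "hour": ts.hour,
        "weekday": ts.weekday(),  # Monday == 0, Sunday == 6
        "is_weekend": ts.weekday() >= 5,  # assumes a Saturday/Sunday weekend
        "is_night": ts.hour >= NIGHT_START_HOUR or ts.hour < NIGHT_END_HOUR,
    }
```

Features like these could be appended to each flow record, or used to train separate models per time context, depending on the detector being evaluated.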


Calibration data

March, 2016

3 weeks

April, 2016

4 weeks

May, 2016

6 weeks

June, 2016

4 weeks

Network traffic evolution for the calibration data

It covers 4 months, over which the traffic periodicity is clearly visible. Although this data set is free of synthetically generated attacks, the red dots show that some other traffic anomalies appear in this period. These anomalies were detected by several state-of-the-art anomaly detectors.

Type of network traffic collected

This figure shows the number of network traffic flows for the main communication ports.
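A per-port breakdown of this kind can be approximated by counting flows by destination port. The sketch below is only an illustration: the record layout (a dict with a "dst_port" key) and the name flows_per_port are hypothetical, and the sample flows are made up for the example rather than taken from the dataset.

```python
from collections import Counter


def flows_per_port(flows, top_n=5):
    """Return the top_n (port, flow_count) pairs, most frequent first.

    Each flow is assumed to be a dict with a 'dst_port' key; this layout
    is hypothetical and not the dataset's actual file format.
    """
    counts = Counter(flow["dst_port"] for flow in flows)
    return counts.most_common(top_n)


# Made-up sample flows, for illustration only.
sample_flows = [
    {"dst_port": 80},
    {"dst_port": 443},
    {"dst_port": 80},
    {"dst_port": 25},
    {"dst_port": 80},
    {"dst_port": 443},
]
top_ports = flows_per_port(sample_flows, top_n=3)
```

Here top_ports lists port 80 first with 3 flows, then 443 with 2, then 25 with 1.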

Test data

July, 2016

1 week

August, 2016

5 weeks

Network traffic evolution for the test data

It includes one month of traffic. As in the calibration data, the traffic periodicity is clearly visible. The red dots show that additional types of anomalies and attacks, other than those synthetically generated, appear in this period. These anomalies were detected by several state-of-the-art anomaly detectors. For example, in this case, the biggest red dots correspond to an email spam campaign.

Type of network traffic collected

This figure shows the number of network traffic flows for the main communication ports.