UGR'16: A New Dataset for the Evaluation of Cyclostationarity-Based Network IDSs

The dataset presented here is built from real traffic and up-to-date attacks. The data come from several NetFlow v9 collectors strategically located in the network of a Spanish ISP. It is composed of two distinct sets of data, both already split into weeks:

A CALIBRATION set of data gathered from March to June 2016 (4 months), containing real background traffic data.
A TEST set of data gathered from July to August 2016, containing real background traffic plus synthetically generated traffic that corresponds to several well-known types of attacks.
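As a rough illustration of the split described above, the following Python sketch assigns a flow timestamp to one of the two sets by calendar month. The function name `assign_split` and the month-based boundaries are assumptions made for this example; the published data are organised by week, and a week may straddle a month boundary, so the actual file layout should take precedence.

```python
from datetime import datetime
from typing import Optional


def assign_split(ts: datetime) -> Optional[str]:
    """Return 'calibration', 'test', or None for a flow timestamp.

    Assumption: the split is approximated by calendar month
    (March-June 2016 for calibration, July-August 2016 for test).
    """
    if ts.year != 2016:
        return None
    if 3 <= ts.month <= 6:
        return "calibration"
    if 7 <= ts.month <= 8:
        return "test"
    return None


if __name__ == "__main__":
    for stamp in (datetime(2016, 4, 10, 12, 0), datetime(2016, 8, 2, 3, 30)):
        print(stamp.isoformat(), assign_split(stamp))
```

Timestamps outside the two periods map to `None`, which makes accidental mixing of out-of-range records easy to spot.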

The main advantage of this dataset over previous ones is its usefulness for evaluating IDSs that consider long-term evolution and traffic periodicity. Models that account for differences between daytime and night, or between working days and weekends, can also be trained and evaluated with it.
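To make the idea of periodicity-aware models concrete, here is a minimal Python sketch that derives day/night and working-day/weekend indicators from a flow timestamp. The function name `period_features` and the 08:00 to 20:00 daytime window are hypothetical choices for illustration only; they are not defined by the dataset, and local holidays are ignored.

```python
from datetime import datetime


def period_features(ts: datetime, day_start_hour: int = 8, day_end_hour: int = 20) -> dict:
    """Derive simple periodicity indicators from a timestamp.

    Assumptions: daytime is [day_start_hour, day_end_hour) and
    Saturday/Sunday count as the weekend.
    """
    weekday = ts.weekday()  # Monday is 0, Sunday is 6
    return {
        "hour": ts.hour,
        "weekday": weekday,
        "is_weekend": weekday >= 5,
        "is_daytime": day_start_hour <= ts.hour < day_end_hour,
    }


if __name__ == "__main__":
    print(period_features(datetime(2016, 3, 19, 10, 0)))
    print(period_features(datetime(2016, 7, 27, 23, 0)))
```

Features like these could be appended to per-interval flow counts so that a detector compares traffic against the same hour of the same kind of day rather than against a single global baseline.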

Reference:
Gabriel Maciá Fernández, José Camacho, Roberto Magán-Carrión, Pedro García-Teodoro, Roberto Theron, UGR'16: A New Dataset for the Evaluation of Cyclostationarity-Based Network IDSs, Computers & Security, 2017.

The authors' version of the paper is available for download.

Calibration data

March, 2016

3 weeks

April, 2016

4 weeks

May, 2016

6 weeks

June, 2016

4 weeks

Network traffic evolution for the calibration data

It covers 4 months, during which the traffic periodicity is clearly visible. Although this data set is free of synthetically generated attacks, the red dots show that some other traffic anomalies do appear in this period. These anomalies were detected by several state-of-the-art anomaly detectors.

Type of network traffic collected

This figure shows the number of network traffic flows for the main communication ports.

Test data

July, 2016

1 week

August, 2016

5 weeks

Network traffic evolution for the test data

It includes one month of traffic. As in the calibration data, the traffic periodicity is clearly visible. The red dots show that some additional types of anomalies/attacks, other than those synthetically generated, appear in this period. These anomalies were detected by several state-of-the-art anomaly detectors. For example, in this case, the biggest red dots correspond to an email spam campaign.

Type of network traffic collected

This figure shows the number of network traffic flows for the main communication ports.