Dataset

  • Extremely Imbalanced Data

We used the Fudan text classification corpus to create 8 highly skewed data sets, each with an imbalance ratio of approximately 1:123. All eight data sets share this skew ratio (1:123) but differ in the number of classes. They were used in our DMKD paper below:

Pang, G., Jin, H., & Jiang, S. (2015). CenKNN: a scalable and effective text classifier. Data Mining and Knowledge Discovery, 29(3), 593-625.

These data sets are available at SCHOLAT.
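
For readers who want to check the skew of a data set after downloading it, the following is a minimal sketch of how an imbalance ratio such as 1:123 can be computed from class labels. It assumes the ratio is the size of the smallest class relative to the largest class; this description does not state the exact definition, so treat that as an assumption. The function name `imbalance_ratio` and the toy labels are hypothetical and are not taken from the corpus.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return largest-class size divided by smallest-class size.

    Assumption: the ratio "1:N" compares the smallest class with the
    largest class. The data set description does not define this.
    """
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes to compute a ratio")
    return max(counts.values()) / min(counts.values())


# Hypothetical toy labels, not drawn from the Fudan corpus:
# 2 documents in the rare class, 246 in the common class.
labels = ["rare"] * 2 + ["common"] * 246
ratio = imbalance_ratio(labels)
print(f"imbalance ratio = 1:{ratio:.0f}")  # prints "imbalance ratio = 1:123"
```

With real data, `labels` would be the list of class labels read from one of the 8 data sets, one label per document.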