Downloads
The data is hosted on Google Drive and Baidu Yunpan. The extraction code for Baidu Yunpan is noah.
Download link for Wukong100m: [Google Drive]
Download link for Wukong100m: [Baidu Yunpan]
Download link for Wukong-Test: [Google Drive]
Download link for Wukong-Test: [Baidu Yunpan]
Data organization
The whole dataset is split into 256 files, each containing around 80,000 <image, text> pairs. After unzipping the archive, the files under the data root directory are laid out like this:
data_root
└─wukong_release
   ├─ wukong_100m_0.csv
   ├─ wukong_100m_1.csv
   ├─ wukong_100m_2.csv
   ├─ ....
   └─ wukong_100m_255.csv
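The layout above can be checked with a short script after unzipping. The following is a minimal Python sketch, not an official loader. The helper names (`shard_paths`, `missing_shards`, `iter_rows`) are hypothetical, and UTF-8 encoding is an assumption. The column layout of the CSV files is not described here, so rows are returned as raw lists of strings without interpretation.

```python
import csv
from pathlib import Path

NUM_SHARDS = 256  # the dataset is split into 256 files (wukong_100m_0 .. wukong_100m_255)


def shard_paths(data_root):
    """Return the expected CSV paths under <data_root>/wukong_release."""
    release_dir = Path(data_root) / "wukong_release"
    return [release_dir / f"wukong_100m_{i}.csv" for i in range(NUM_SHARDS)]


def missing_shards(data_root):
    """List the expected shard files that are not present on disk."""
    return [p for p in shard_paths(data_root) if not p.is_file()]


def iter_rows(csv_path):
    """Yield each CSV row as a list of strings.

    Assumption: the files are UTF-8 encoded. Rows are yielded raw,
    including any header row, because the columns are not documented here.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        yield from csv.reader(f)
```

For example, `missing_shards("data_root")` returns an empty list when all 256 files were extracted, which is a quick way to detect an incomplete download before processing.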