Models and Data

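For convenience, we have collected datasets commonly used in deep learning, along with pretrained weights for some popular models, and placed them in object storage. You can use this data directly to start your own work, which saves download time and improves productivity.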

Datasets

ImageNet

| Name | Short name | URL | Size |
| --- | --- | --- | --- |
| ILSVRC2017 Object localization dataset | CLS-LOC dataset | https://appcenter-deeplearning.sh1a.qingstor.com/dataset/imagenet/ILSVRC2017_CLS-LOC.tar.gz | 155GB |
| ILSVRC2017 Object detection dataset | DET dataset | https://appcenter-deeplearning.sh1a.qingstor.com/dataset/imagenet/ILSVRC2017_DET.tar.gz | 55GB |
| ILSVRC2017 Object detection test dataset | DET test dataset | https://appcenter-deeplearning.sh1a.qingstor.com/dataset/imagenet/ILSVRC2017_DET_test_new.tar.gz | 428MB |
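
Since the datasets are published as HTTPS URLs, one way to fetch them from a script is sketched below. This is a minimal illustration using only the Python standard library, not an official client: the `download` helper, its `dest_dir` parameter, and the temporary `.part` suffix are assumptions introduced here. It streams the response to disk in 1 MiB chunks so that a large archive is never held in memory.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def download(url, dest_dir="."):
    """Stream `url` into `dest_dir`, naming the file after the URL path.

    Hypothetical helper for illustration; it does not retry or resume.
    """
    dest = Path(dest_dir) / Path(urlparse(url).path).name
    dest.parent.mkdir(parents=True, exist_ok=True)

    # Write to a temporary name first so that an interrupted transfer
    # never leaves a truncated file under the final name.
    tmp = dest.with_name(dest.name + ".part")
    with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
        shutil.copyfileobj(response, out, length=1024 * 1024)
    tmp.replace(dest)
    return dest
```

For example, `download("https://appcenter-deeplearning.sh1a.qingstor.com/dataset/imagenet/ILSVRC2017_DET_test_new.tar.gz", "data")` would save the 428MB test archive under `data/`. For the larger archives, a tool that supports resuming interrupted transfers may be a better fit than this sketch.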

COCO

| Name | URL | Count / Size |
| --- | --- | --- |
| 2017 Train images | https://appcenter-deeplearning.sh1a.qingstor.com/dataset/coco/train2017.zip | 118K / 18GB |
| 2017 Val images | https://appcenter-deeplearning.sh1a.qingstor.com/dataset/coco/val2017.zip | 5K / 1GB |
| 2017 Test images | https://appcenter-deeplearning.sh1a.qingstor.com/dataset/coco/test2017.zip | 41K / 6GB |
| 2017 Unlabeled images | https://appcenter-deeplearning.sh1a.qingstor.com/dataset/coco/unlabeled2017.zip | 123K / 19GB |
| 2017 Train/Val annotations | https://appcenter-deeplearning.sh1a.qingstor.com/dataset/coco/annotations_trainval2017.zip | 241MB |
| 2017 Stuff Train/Val annotations | https://appcenter-deeplearning.sh1a.qingstor.com/dataset/coco/stuff_annotations_trainval2017.zip | 401MB |
| 2017 Testing Image info | https://appcenter-deeplearning.sh1a.qingstor.com/dataset/coco/image_info_test2017.zip | 1MB |
| 2017 Unlabeled Image info | https://appcenter-deeplearning.sh1a.qingstor.com/dataset/coco/image_info_unlabeled2017.zip | 4MB |

PASCAL VOC

| Name | URL | Size |
| --- | --- | --- |
| VOC2012 training/validation data | https://appcenter-deeplearning.sh1a.qingstor.com/dataset/voc/2012/VOCtrainval_11-May-2012.tar | 1.86GB |
| VOC2012 test data | https://appcenter-deeplearning.sh1a.qingstor.com/dataset/voc/2012/VOC2012test.tar | 1.72GB |
| VOC2012 development kit code and documentation | https://appcenter-deeplearning.sh1a.qingstor.com/dataset/voc/2012/VOCdevkit_18-May-2011.tar | 500KB |
| VOC2012 PDF documentation | https://appcenter-deeplearning.sh1a.qingstor.com/dataset/voc/2012/devkit_doc.pdf | 416KB |
| VOC2007 training/validation data | https://appcenter-deeplearning.sh1a.qingstor.com/dataset/voc/2007/VOCtrainval_06-Nov-2007.tar | 439MB |
| VOC2007 test data | https://appcenter-deeplearning.sh1a.qingstor.com/dataset/voc/2007/VOCtest_06-Nov-2007.tar | 430MB |
| VOC2007 development kit code and documentation | https://appcenter-deeplearning.sh1a.qingstor.com/dataset/voc/2007/VOCdevkit_08-Jun-2007.tar | 250KB |
| VOC2007 PDF documentation | https://appcenter-deeplearning.sh1a.qingstor.com/dataset/voc/2007/devkit_doc_07-Jun-2007.pdf | 175KB |

OpenSLR

| Name | Category | Summary | Files |
| --- | --- | --- | --- |
| Vystadial | Speech | English and Czech data, mirrored from the Vystadial project | data_voip_cs.tgz [1.5G], data_voip_en.tgz [2.7G] |
| TED-LIUM | Speech | English speech recognition training corpus from TED talks, created by Laboratoire d’Informatique de l’Université du Maine (LIUM) (mirrored here) | TEDLIUM_release1.tar.gz [21G] |
| THCHS-30 | Speech | A free Chinese speech corpus released by CSLT@Tsinghua University | data_thchs30.tgz [6.4G], test-noise.tgz [1.9G], resource.tgz [24M] |
| Aishell | Speech | Mandarin data, provided by Beijing Shell Shell Technology Co., Ltd | data_aishell.tgz [15G], resource_aishell.tgz [1.2M] |
| Free ST Chinese Mandarin Corpus | Speech | A free Chinese Mandarin corpus by Surfingtech (www.surfing.ai), containing 102,600 utterances from 855 speakers | ST-CMDS-20170001_1-OS.tar.gz [8.2G] |

VGGFace2

| Name | Description | URL | Size |
| --- | --- | --- | --- |
| Licence.txt | Licence for the VGGFace2 dataset. | http://www.robots.ox.ac.uk/~vgg/data/vgg_face2/licence.txt | - |
| Readme.txt | README. | http://www.robots.ox.ac.uk/~vgg/data/vgg_face2/Readme.txt | - |
| Vggface2_train.tar.gz | Loosely cropped faces for training. | https://appcenter-deeplearning.sh1a.qingstor.com/dataset/vggface2/vggface2_train.tar.gz | 36GB |
| Vggface2_test.tar.gz | Loosely cropped faces for testing. | https://appcenter-deeplearning.sh1a.qingstor.com/dataset/vggface2/vggface2_test.tar.gz | 1.9GB |
| MD5 | MD5 checksums. | http://www.robots.ox.ac.uk/~vgg/data/vgg_face2/MD5 | - |
| Meta.tar.gz | Meta information for the VGGFace2 dataset. | https://appcenter-deeplearning.sh1a.qingstor.com/dataset/vggface2/meta.tar.gz | 9MB |
| BB_Landmark.tar.gz | Bounding boxes and 5 facial landmarks for the loosely cropped faces. | https://appcenter-deeplearning.sh1a.qingstor.com/dataset/vggface2/bb_landmark.tar.gz | 170MB |
| Dev_kit.tar.gz | Development kit. | https://appcenter-deeplearning.sh1a.qingstor.com/dataset/vggface2/dev_kit.tar.gz | 3KB |
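
Because the VGGFace2 listing includes an MD5 file, a downloaded archive can be checked against it before use. The sketch below is illustrative only. It assumes the MD5 file uses the common `md5sum` layout (one `<hex digest>  <file name>` pair per line), which has not been verified against the actual file, and the helper names `md5_of`, `parse_checksums`, and `verify` are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checksums(text):
    """Map file name -> digest from md5sum-style text.

    Assumption: each non-empty line is '<hex digest>  <file name>'.
    """
    table = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            # md5sum marks binary mode with a leading '*'; drop it and
            # any directory prefix so lookups can use the bare file name.
            name = Path(parts[-1].lstrip("*")).name
            table[name] = parts[0].lower()
    return table


def verify(path, checksums):
    """Return True only if the file is listed and its digest matches."""
    expected = checksums.get(Path(path).name)
    return expected is not None and md5_of(path) == expected
```

A typical use would be to read the MD5 file as text, pass it to `parse_checksums`, and call `verify` on each downloaded archive. If the file turns out to use a different layout, only `parse_checksums` needs to change.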

Chinese and English Wikipedia Corpora

| Name | Description | URL | Size |
| --- | --- | --- | --- |
| zhwiki-latest-pages-articles.xml.bz2 | Latest Chinese Wikipedia dump as of July 23, 2018 | https://appcenter-deeplearning.sh1a.qingstor.com/dataset/wiki/zhwiki-latest-pages-articles.xml.bz2 | 1.5GB |
| enwiki-latest-pages-articles.xml.bz2 | Latest English Wikipedia dump as of July 23, 2018 | https://appcenter-deeplearning.sh1a.qingstor.com/dataset/wiki/enwiki-latest-pages-articles.xml.bz2 | 14.2GB |
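
The Wikipedia dumps are bz2-compressed XML, and they are large enough that decompressing and parsing them in one pass is usually preferable to loading them whole. The following is a minimal sketch using only the Python standard library. It assumes the usual MediaWiki export layout (`<page>` elements containing a `<title>` and a `<revision>` with a `<text>` child); `iter_pages` is a hypothetical helper name.

```python
import bz2
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip the XML namespace, e.g. '{ns}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(dump_path):
    """Yield (title, text) for each page in a bz2-compressed dump."""
    with bz2.open(dump_path, "rb") as stream:
        context = ET.iterparse(stream, events=("start", "end"))
        _, root = next(context)  # the first event is the root's start
        for event, elem in context:
            if event == "end" and _local(elem.tag) == "page":
                title = text = ""
                for child in elem.iter():
                    name = _local(child.tag)
                    if name == "title":
                        title = child.text or ""
                    elif name == "text":
                        text = child.text or ""
                yield title, text
                # Drop pages that were already processed so memory
                # use stays bounded for multi-gigabyte dumps.
                root.clear()
```

Iterating with `for title, text in iter_pages("zhwiki-latest-pages-articles.xml.bz2")` would then visit each article in turn. The yielded text is raw wiki markup, so further cleaning is typically needed before training on it.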