Unsplash’s dataset is now open source

Are you looking for image data for machine learning or deep learning research? Are you tired of open source image datasets that are limited in size, contain low-quality images, lack variety, or rely on mass labeling by third-party services? If you answered yes to all of these questions, I have a surprise for you.

Unsplash just launched the biggest open source image dataset. It has already been used by researchers at industry-leading institutions such as Stanford University, Cornell University, National Cheng Kung University, and Apple.

You can train and test models using the largest collaborative image dataset ever openly shared. The Unsplash Dataset was created from the work of over 200,000 contributing photographers and from billions of searches across thousands of applications, uses, and contexts.

In total, the dataset contains over 2M high-quality images, with 16GB of accompanying data covering:

  • Keyword-image conversions in search results
  • Community and AI-generated keywords
  • EXIF, location, and landmarks
  • Image categories and subcategories
  • User-generated collections and groupings of images
  • Image views and downloads stats

Read the documentation at github.com/unsplash/datasets

To download the dataset, visit:

Lite version: github.com/unsplash/datasets (550MB)

Full version: unsplash.typeform.com/to/HPVbjo (16GB)
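
Once you have unpacked a download, a quick way to peek at the data is to read the tab-separated files with Python's standard library. The snippet below is only a minimal sketch, not official Unsplash tooling. It assumes the archive contains TSV files split into numbered parts (for example photos.tsv000) and that each part starts with a header row; the folder name "unsplash-lite", the table name "photos", and the helper load_tsv_table are hypothetical, so check the documentation for the actual file layout.

```python
import csv
import glob
import os


def load_tsv_table(data_dir, table_name):
    """Read every file matching <table_name>.tsv* in data_dir into a list of dicts.

    Assumption: each file part starts with its own header row.
    """
    rows = []
    pattern = os.path.join(data_dir, table_name + ".tsv*")
    # Sort the parts so rows come back in a stable order.
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="", encoding="utf-8") as handle:
            # QUOTE_NONE treats quote characters as plain text in free-form fields.
            reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
            rows.extend(reader)
    return rows


if __name__ == "__main__":
    # Hypothetical folder and table names; adjust them to match your download.
    photos = load_tsv_table("unsplash-lite", "photos")
    print(len(photos), "rows loaded")
    if photos:
        print("columns:", list(photos[0].keys()))
```

Every value comes back as a string, so convert numeric columns yourself before doing any math. For the full 16GB version, a streaming approach or a dataframe library is likely a better fit than loading everything into a list.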

Have fun with it, and don't make a Terminator.