10 Great Datasets for Kids

I’m always on the lookout for open datasets that kids and youth may enjoy. Recently I asked the great people of Twitter to share their favorite kid-friendly datasets, and they came back with some wonderful suggestions! I’ll share some of these suggestions, along with a few others, below.

10 Kid-Friendly Datasets

1) Kid Dataset: Disney Plus Shows

Overview: This dataset contains information on movies and TV shows available on the Disney+ streaming service. Updated on the first of each month, it includes 19 columns of show metadata.

Available at: Kaggle's "Disney Plus Movies and TV Shows" dataset.

2) Kid Dataset: World Happiness Report

Overview: Suggested by the Australian Data Science Education Institute, the World Happiness Report is a landmark survey of the state of global happiness. The World Happiness Report 2018 ranks 156 countries by their happiness levels and 117 countries by the happiness of their immigrants.

Available at: The data is available on the World Happiness Report website at the following link.

3) Kid Dataset: Strange Sightings with Bigfoot and UFO

Overview: Suggested by Seth Rosen, these datasets provide information on Bigfoot and UFO sightings. They are great for mapping. If you are an R user, you can follow my tutorial to plot the sightings on a Google map by latitude and longitude. The Bigfoot dataset has approximately 3.8K rows (sightings) and 6 columns describing the event, time and location details. The UFO dataset has approximately 80K rows (sightings) and 11 columns describing the location, time, duration, shape and specifics of each sighting.

Available at: Kaggle Bigfoot sighting report & Kaggle UFO Sightings

4) Kid Dataset: YaRrr!

Overview: Suggested by Sergio Garcia Mora, and created by Nathaniel D. Phillips, this dataset is available as part of the online book “YaRrr! The Pirate’s Guide to R”. The book offers a fun introduction to R and analytical concepts. The pirates dataset provides survey data from 1000 pirates.

Available at: The dataset is available in the yarrr R package, which is covered in the book.

5) Kid Dataset: Pokemon

Overview: Suggested by Emily Robinson, this dataset contains information on all Pokemon. The data was originally scraped from http://serebii.net/ and has since been kept up to date by a number of contributors. It includes Base Stats, Performance against Other Types, Height, Weight, Classification, Egg Steps, Experience Points, Abilities, and more. This dataset is so fun to play with! Recently my daughter and I used it for our submission to the Women in Analytics data visualization competition.

Available at: Kaggle’s “The Complete Pokemon Dataset” page.

6) Kid Dataset: Baby Names

Overview: Suggested by Dr. Teomara Rutherford and Neal Grantham, the babynames dataset was made available by the United States Social Security Administration. For each year from 1880 to 2017, it lists the number of children of each sex given each name. All names with more than 5 uses are included.

Available at: The babynames dataset is available as part of the babynames R package.

7) Kid Dataset: Cereal

Overview: Found in Rachael Tatman’s list of beginner-friendly datasets, this dataset provides information on 80 popular cereals. For each cereal, it records basic metadata (name, manufacturer, type, etc.), nutritional information (calories, protein, fat, sodium, carbs, etc.) and other stats (weight, volume, rating, etc.).

Available at: Kaggle’s “80 Cereals” page.
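A natural first exercise with the cereal data is ranking cereals by one column, such as rating or calories. Below is a minimal Python sketch using only the standard library. It assumes the downloaded file is a CSV with a header row containing `name` plus the numeric column you want to rank by (check the header of your copy), and the three sample rows are made up purely for illustration.

```python
import csv
import io


def top_cereals(csv_file, column, n=5):
    """Return the n (name, value) pairs with the highest value in `column`.

    Assumes a header row with a `name` column and a numeric `column`
    (for example "rating" or "calories"); adjust to match your download.
    """
    reader = csv.DictReader(csv_file)
    scored = [(row["name"], float(row[column])) for row in reader]
    # Sort from highest to lowest value and keep only the first n entries.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


if __name__ == "__main__":
    # Hypothetical sample rows for illustration only; with the real file,
    # pass open("cereal.csv", newline="") instead of io.StringIO(...).
    sample = io.StringIO(
        "name,calories,rating\n"
        "Cereal A,110,40.5\n"
        "Cereal B,70,68.4\n"
        "Cereal C,120,33.2\n"
    )
    for name, value in top_cereals(sample, "rating", n=2):
        print(name, value)
```

Swapping "rating" for "calories" turns the same function into a "most filling cereal" ranking, which makes a good follow-up question for kids.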

8) Kid Dataset: Spotify

Overview: This dataset contains metadata on over 600,000 tracks, gathered via the Spotify API. Information collected includes basic track information (name, duration, artists, release date, etc.) and numerical ratings for various attributes (danceability, acousticness, tempo, etc.).

Available at: Kaggle’s “Spotify Dataset 1922-2021, ~600k Tracks” page.

9) Kid Dataset: Palmer Penguins

Overview: Suggested by Tom Mock, the palmerpenguins R package contains two datasets that its authors propose as a viable alternative to Anderson’s Iris data. The penguins dataset contains size measurements for three penguin species observed on three islands in the Palmer Archipelago, Antarctica. The curated dataset contains 8 columns with metadata for each penguin. As of posting this article, there were 344 rows (penguins).

Available via: The palmerpenguins R package.

10) Kid Dataset: Datasaurus

Overview: Suggested by Paulapivat and Dr. Trent, the Datasaurus dataset was created by Alberto Cairo to encourage people to understand their data better through exploration and data visualization. At first glance, the dataset appears to be a standard set of X and Y coordinates, but plotting it in a scatterplot reveals an unexpected shape.

Available at: Here is a link to the original blog, and here is a link directly to the data. After Alberto published this dataset, Justin Matejka and George Fitzmaurice published another version called the “Datasaurus Dozen”. When its data is grouped by dataset, each group forms a very different pattern, even though the groups share nearly identical summary statistics. Check out their write-up and available data here.

 
Other Ideas

While the above list is focused on ready-made datasets, you can have a lot of fun with other types of data!

  • Analyze your personal data - as suggested by Tad, you can work with the kids to export their personal data (e.g., Messenger, Google, Fitbit) and analyze it for trends.

  • Create their own dataset - Students could create their own dataset through a survey. Alternatively, they could build one by systematically gathering information on everyday things (e.g., counting the number of items of each color in the house).

  • Sports Data - as suggested by Tanya Cashorali, sports data is always popular. To get started, you can download NBA Stats, MLB Stats, NFL Stats and more!
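For the "count the items of each color" idea, turning a list of observations into a small frequency table takes only a few lines of Python. This is a minimal sketch; the list of colors is hypothetical, standing in for whatever the kids actually record.

```python
from collections import Counter


def tally(observations):
    """Count how many times each value was recorded, most common first."""
    return Counter(observations).most_common()


if __name__ == "__main__":
    # Hypothetical observations: one entry per item spotted around the house.
    items = ["red", "blue", "red", "green", "blue", "red"]
    for color, count in tally(items):
        print(f"{color}: {count}")
```

The same function works for survey answers (favorite animal, favorite cereal, and so on), since it only needs a list of recorded values.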

Other Kids Data Activities

If you’re looking for additional data science material for kids, please feel free to check out a few of my other blogs below.

Thank You

Thank you for reading my article on kid-friendly datasets. Please reach out to me on Twitter to let me know if you liked the article or to share your projects.

Simple EDA in R with inspectdf

3 Reasons Why You Should Pre-Sketch Your Data Visualizations
