
    10 Websites to Get Amazing Data for Data Science Projects

    By yourinfotech | April 13, 2023 | Updated: April 13, 2023 | 5 Mins Read

    “How much can anyone really care about sepal length?” my friend complained to me over coffee a few days ago. She was referring to the built-in `iris` dataset in R, which dates all the way back to 1936. “Why do college professors try to teach us data science with crappy, boring, pointless data when there’s so much great data out there for data science projects?”

    She’s right. It’s really tough to motivate yourself to learn data science or do data science projects when your data is boring or meaningless to you. I know I struggled to motivate myself to learn data science until I found some good crunchy data that interested me.

    In this article, I’m going to break down 10 amazing websites where you can grab some really awesome data for data science projects. The purpose will be to showcase a variety of data that might appeal to you. Ultimately, these websites should help you find data you care about, do a cool data science project, and use that to get a job.

    How Did I Vet These Data Sources?

    If you see a website in this article, it’s because the data it contains is:

    • Freely available. You won’t have to pay for it.
    • Community-oriented. It’s not going to be just a file; there will be some commentary and explanation around it.
    • Cool. It’s something that someone, somewhere will care about. Maybe you!
    • Clean-ish. You’ll get to practice the fun part of data science – analyzing, visualizing, sharing, and so on.
    • Language-agnostic. You can dig into these with Python, R, SQL, or any other language you like.

    1. Google’s Dataset Search

    I’m cheating a little bit, because this isn’t really a website for datasets, but rather a search engine for datasets. But it’s too good not to include.

    Google’s Dataset Search is just like Google but for datasets. You type in your query, and Google returns as many datasets as it has on that subject.

    For example, searching “cats” brings up over one hundred datasets, including a dataset containing over 9,000 images of cats.

    2. Kaggle’s Datasets

    Kaggle’s Datasets is also a search engine, but it’s both more limited and more focused.

    It’s more limited because it only contains datasets that people have published with Kaggle. But it’s more focused because the datasets aren’t just whatever random set of numbers Google scraped. Kaggle is a home for data science competitions, so the datasets it collects are extremely relevant to data science.

    This allows you to filter by your specific interest. For example, I can stumble across that same cat dataset if I search “cat” with the “computer vision” filter on.

    3. KDNuggets

    This may come as a surprise to you, but KDNuggets curates a great set of datasets. These datasets are specifically for Data Science, Machine Learning, AI & Analytics.

    Many of these aren’t KDNuggets exclusives, but it’s a good list to poke around in. It’s worth noting that when you sign up to be a KDNuggets email subscriber, you also get access to World Data AI, which itself contains 3.5 billion datasets.

    4. Government websites

    I could easily expand this list to about a million entries simply by individually listing each of the government websites I like to use to get data. I won’t. Instead, I’ll offer a small list here:

    • http://datasf.org/
    • http://data.gov.uk
    • https://www.usa.gov/About/developer-resources/1usagov.shtml
    • https://www.census.gov/data/datasets.html

    Governments are constantly collecting data to do studies, and many of them publish that data online.

    5. Pudding.cool

    If you like your data to come with a heady dose of pop culture, look no further than Pudding.cool. This website looks at topics as varied as repetitive pop lyrics, women’s pockets, and how The Big Bang Theory gets censored by the Chinese government.

    This is more of a digital magazine writing longform essays about culture, showing a lot of data alongside. I’m including it here because they tell awesome stories and share their data.

    6. 538

    Another essay-driven pop culture website with freely available data you can purloin. They focus more on sports and politics. It’s less data-driven, but I’m giving it a spot on this list because it still curates and shares datasets.

    7. Tidy Tuesdays

    Now, the reality of the matter is that data often isn’t tidy at all. Tidy Tuesdays isn’t exactly a website with datasets per se, but it’s a weekly event and community with an emphasis on using data science to explore untidy data.

    Every week, a new dataset drops. Participants are encouraged to share their cleaning techniques and visualizations with each other on GitHub and Twitter.
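
    To give a flavor of what “tidying” can mean in practice, here is a minimal sketch of one common cleaning step: reshaping a wide table (one column per year) into a long one (one row per observation). The sample data, the `wide_to_long` helper, and its parameter names are made up for illustration and do not come from any Tidy Tuesday dataset. In real projects you would more likely reach for `pivot_longer` in R’s tidyr or `melt` in pandas.

```python
import csv
import io


def wide_to_long(csv_text, id_column, name_column="variable", value_column="value"):
    """Turn one column per measurement into one row per (id, measurement)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    long_rows = []
    for row in reader:
        for column, value in row.items():
            if column == id_column:
                continue  # the id column is repeated on every output row instead
            long_rows.append(
                {id_column: row[id_column], name_column: column, value_column: value}
            )
    return long_rows


# Made-up example data: one column per year is a common untidy layout.
WIDE = """country,2021,2022
A,10,12
B,7,9
"""

tidy = wide_to_long(WIDE, id_column="country", name_column="year", value_column="count")
for record in tidy:
    print(record)
```

    Values stay as strings here because the sketch does no type conversion; turning them into numbers would be the natural next cleaning step.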

    8. GitHub

    GitHub is the home of a lot of data. You can easily search, filter, and download data to play around with on your own. However, the data quality is highly variable. Because anyone can upload data, it’s not always in great condition.

    Still, I feel the benefits make up for that.
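
    Since the condition of data on GitHub varies so much, it can help to take a quick look at how many values are missing before you commit to a dataset. The sketch below is one rough way to do that using only Python’s standard library. The `missing_value_report` helper, the list of markers treated as “missing”, and the sample rows are all assumptions made for illustration; a real file may encode missing values differently.

```python
import csv
import io
from collections import Counter


def missing_value_report(csv_text, missing_markers=("", "NA", "N/A", "null")):
    """Count missing-looking cells per column in CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = Counter({name: 0 for name in reader.fieldnames})
    total_rows = 0
    for row in reader:
        total_rows += 1
        for name in reader.fieldnames:
            value = row.get(name)
            # Short rows yield None; other gaps show up as marker strings.
            if value is None or value.strip() in missing_markers:
                missing[name] += 1
    return total_rows, dict(missing)


# Hypothetical sample standing in for a file downloaded from a repository.
SAMPLE = """name,breed,age
Tom,Tabby,3
Luna,,NA
Milo,Siamese,
"""

rows, report = missing_value_report(SAMPLE)
print(rows, report)
```

    A column that is mostly empty is not necessarily a dealbreaker, but knowing about it up front tells you how much cleaning stands between you and the fun part.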

    9. Buzzfeed

    Buzzfeed doesn’t just do quizzes that comment on the human condition by asking you to build a salad. It may not be as well known for this, but Buzzfeed does a lot of quality data journalism.

    10. Awesome Public Datasets

    I’m ending this list with a pretty self-explanatory title: Awesome Public Datasets. This repo lives on GitHub and contains (mostly) free datasets to explore. They come from online datasets, user suggestions, and research papers.
