Yesterday, Netflix made a rare move by releasing a public dataset showing all titles with over 100,000 viewership hours from January 2023 to June 2023. This report, titled “What We Watched: A Netflix Engagement Report,” contains data on more than 18,000 titles, representing 99% of all Netflix viewings and totaling nearly 100 billion hours. Netflix plans to update and release this report twice a year.
Netflix measures “viewership hours” instead of individual viewers or households since some content may be watched multiple times by the same person.
I decided to download the report, which is available as an Excel spreadsheet on Netflix’s blog, and use OpenAI’s ChatGPT (GPT-4) for data analysis.
ChatGPT provided a clear, if brief, analysis of the data. However, it ran into errors when generating charts and sometimes struggled to meet my specific requests.
Here’s how my process went: I started by asking ChatGPT to perform a basic data analysis. It complied and provided a detailed description of the data.
One key point it identified, which might have gone unnoticed by an untrained eye, was the significant number of missing values in the ‘Release Date’ column (13,359), limiting certain types of time-based analysis.
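For readers who want to reproduce that kind of check themselves, here is a minimal sketch of counting missing values in one column. The rows, titles, and numbers are made up for illustration, and the "Hours Viewed" column name is an assumption; only the 'Release Date' column is named in ChatGPT's output above.

```python
# Hypothetical sample rows; titles and values are invented, not Netflix data.
# "Hours Viewed" is an assumed column name.
rows = [
    {"Title": "Show A", "Release Date": "2023-03-23", "Hours Viewed": 812_100_000},
    {"Title": "Show B", "Release Date": None, "Hours Viewed": 302_100_000},
    {"Title": "Film C", "Release Date": "", "Hours Viewed": 1_200_000},
    {"Title": "Film D", "Release Date": "2023-01-05", "Hours Viewed": 100_000},
]


def is_missing(value):
    """Treat None and blank strings as missing; zero is a real value."""
    return value is None or str(value).strip() == ""


def count_missing(rows, column):
    """Count rows where the given column is absent, None, or blank."""
    return sum(1 for row in rows if is_missing(row.get(column)))


missing = count_missing(rows, "Release Date")
print(f"{missing} of {len(rows)} rows have no release date")
```

Run against the real spreadsheet, a count like this is what would tell you up front how much of the data can support date-based analysis.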
Interestingly, while ChatGPT highlighted the “Top 10 Most-Watched Titles (Jan-Jun 2023),” it didn’t initially list them. I had to prompt it further for that information. I also asked for the least-viewed titles, the title with the median viewership, the average hours viewed, and the title closest to that average, all of which ChatGPT was able to provide.
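The statistics I asked for are simple to compute by hand. The sketch below shows one way to do it with Python's standard library. The titles and hour counts are hypothetical placeholders, not figures from the Netflix report, and this is not necessarily how ChatGPT computed its answers.

```python
import statistics

# Hypothetical title -> hours viewed mapping; not real Netflix figures.
hours_by_title = {
    "Show A": 800,
    "Show B": 500,
    "Film C": 120,
    "Film D": 40,
    "Show E": 10,
}

# Rank titles from most to least viewed.
ranked = sorted(hours_by_title.items(), key=lambda item: item[1], reverse=True)
top_n = ranked[:3]      # most-watched titles
bottom_n = ranked[-3:]  # least-watched titles

mean_hours = statistics.mean(hours_by_title.values())
median_hours = statistics.median(hours_by_title.values())

# Title whose hours are nearest to the mean.
closest_to_mean = min(
    hours_by_title, key=lambda title: abs(hours_by_title[title] - mean_hours)
)

print("Top:", top_n)
print("Bottom:", bottom_n)
print("Mean:", mean_hours, "Median:", median_hours)
print("Closest to mean:", closest_to_mean)
```

Even in this toy sample, the mean sits well above the median because a couple of large titles pull it up, which is why it helps to ask for both.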
ChatGPT faced difficulties creating an accurate line plot of viewership hours by month, partly because the dataset only included total viewership hours per title over the six-month period, not broken down by month. Initially, it produced a plot with incorrect dates going back to 2010. After asking for a correction to focus only on the six-month period, it produced a somewhat misleading plot.
The underlying problem was that the Netflix data does not break viewership out by month. The plot instead showed total January-June hours for titles grouped by the month they were released, so a reader could easily mistake it for a chart of hours watched in each month.
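To make the distinction concrete, here is a rough sketch of the only monthly grouping this kind of data supports: summing each title's six-month total under its release month. The rows are hypothetical, and the filtering rules (dropping titles with no release date or a release date outside January-June 2023) are my assumptions, not something the report or ChatGPT specified.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows: (title, release date or None, total hours for Jan-Jun 2023).
rows = [
    ("Show A", date(2023, 1, 12), 500),
    ("Show B", date(2023, 1, 26), 200),
    ("Film C", date(2023, 5, 4), 300),
    ("Show D", date(2010, 9, 1), 150),  # older title: outside the window
    ("Film E", None, 90),               # no release date: cannot be placed
]

WINDOW_START = date(2023, 1, 1)
WINDOW_END = date(2023, 6, 30)


def hours_by_release_month(rows):
    """Sum six-month totals per release month, skipping unplaceable titles."""
    totals = defaultdict(int)
    for _title, released, hours in rows:
        if released is None or not (WINDOW_START <= released <= WINDOW_END):
            continue
        totals[released.strftime("%Y-%m")] += hours
    return dict(sorted(totals.items()))


totals = hours_by_release_month(rows)
print(totals)
```

Note what the result means: the January figure is every hour watched through June for titles that debuted in January, not the hours watched during January. Any chart built from it should be labeled "hours viewed by release month" to avoid the misreading described above.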
Generating a correctly labeled and useful chart took several tries, indicating that while ChatGPT is helpful for casual users, it still needs improvement to become a reliable and precise data analysis tool.