The Hidden World of YouTube: Fueling AI with Obscure Videos
Researchers from the University of Massachusetts Amherst have analyzed random samples of YouTube videos to understand what the platform contributes to AI training data. Their findings reveal many videos made for small, personal audiences, with a significant proportion created by children under 13. The research raises privacy and copyright concerns as companies such as OpenAI and Google use these videos to train AI models.
![The Hidden World of YouTube: Fueling AI with Obscure Videos](https://devdiscourse.blob.core.windows.net/aiimagegallery/05_06_2024_21_45_50_6361762.png)
Amherst, Jun 28 (The Conversation) - As the artificial intelligence revolution gathers pace, data remains its lifeblood. OpenAI and Google have turned to YouTube as a rich source of training data. But what exactly does this YouTube archive contain? A team from the University of Massachusetts Amherst set out to investigate, analyzing random samples of YouTube videos to demystify this extensive dataset.
Their 85-page publication sheds light on the surprising contents of YouTube. They discovered many videos intended for personal use or small groups, with a significant proportion created by children under 13.
While most users experience YouTube through algorithmically recommended videos, a vast iceberg of obscure content remains unexplored. Researchers documented thousands of personal videos with minimal views but high engagement, indicating they were meant for a small audience, such as friends and family. This contrasts with the widely known popular content, exposing another layer of YouTube as a video-centered social network for close-knit groups.
The research gains urgency in the context of a New York Times exposé revealing that OpenAI and Google are leveraging these videos to train their large language models. Concerns are growing about YouTube's terms of service, copyright, and the sheer volume of data involved, including content from children.
The researchers do not condemn Google, but they underscore that OpenAI's opacity about its training materials and the potential inclusion of user-generated content from children raise serious ethical questions. In light of the Federal Trade Commission's Children's Online Privacy Protection Rule, regulatory efforts are needed to ensure legal protections for user data, particularly as AI continues to evolve.
(This story has not been edited by Devdiscourse staff and is auto-generated from a syndicated feed.)