• A new trend is spreading among AI startups: collecting their own proprietary data instead of scraping the Internet or hiring cheap labor for data labeling, as they did before.
  • In the summer of 2025, Taylor, a freelance artist, and her roommate wore GoPro cameras on their foreheads to record themselves painting, cooking, cleaning the house, and doing other everyday tasks, creating training data for Turing's vision AI model.
  • Each day they had to produce 5 hours of synchronized video, which in practice took about 7 hours of work once breaks and recovery time were included. Taylor said that wearing the camera constantly caused headaches and left red marks on her forehead.
  • Turing hired hundreds of manual workers, including chefs, construction workers, and electricians, to collect real-world video from multiple angles, helping the AI learn sequential thinking and visual reasoning.
  • Sudarshan Sivaraman, Turing’s Director of AGI, stated: “We need diverse data from manual jobs, because only then can the model understand how humans actually work.”
  • About 75–80% of Turing’s data is synthetic, generated by expanding on the real videos. Even so, the quality of the original footage determines the accuracy of the entire system: “If the input data is poor, the synthetic data will also be poor.”
  • Fyxer, an AI startup specializing in email processing, is following the same trend. Instead of relying on mass-market data, it trains its model with a group of professional executive assistants who understand exactly when an email should get a response, a “very human” task.
  • Founder Richard Hollingsworth said: “It’s the quality of the data, not the quantity, that determines performance.” He calls this a competitive advantage (a moat) that rivals find hard to replicate.
  • Startups like Turing and Fyxer illustrate a shift: AI is no longer just about strong algorithms, but also about accurate, human-refined data with high fidelity and practical applicability.

📌 A new trend is spreading among AI startups: collecting their own proprietary data instead of scraping the Internet or hiring cheap labor for data labeling. The quality of the data, not the quantity, determines performance, and it gives these companies a competitive advantage that rivals find hard to replicate. The trend marks a shift toward data quality, in which the most effective models are built from real data, real people, and real actions.

© 2025 Vietmetric