A long list of tech companies are rushing to give themselves the right to use people's data to train AI

Kali Hays, Sep 14, 2023, 06:00 IST

Business Insider

OpenAI CEO Sam Altman at the Sun Valley conference this year. Kevin Dietsch/Getty Images

More companies are quietly updating privacy policies to use collected user data to train AI models.
Transcription tool Rev is one of the latest companies to change its terms of use to allow AI training.

The days when any activity requiring an internet connection did not benefit AI are coming to an end.

Over the last couple of months, companies as varied as X (formerly Twitter), Microsoft, Instacart, Meta, and Zoom have rushed to update their terms of service and/or privacy policies to allow the collection of information and content from people and customers as data to train generative artificial intelligence models.

Tweets, web searches and apparently even grocery shopping are now an opportunity for companies to build more predictive tools like Google's Bard and ChatGPT, which is owned by OpenAI and receives considerable backing from Microsoft. Zoom, after a public upset at the idea of video calls being used to train a large language model, is the only company to subsequently change its updated use policy to say explicitly that user videos would not be used this way.

Such backlash hasn't stopped more companies from deciding their platforms should be training grounds for AI. One of the latest to alter its terms of service is Rev, a popular service for transcribing recorded conversations and phone calls that also does things like closed captions for videos. In the latest version of Rev's Terms of Service, the company added a section it calls "Your content, including services output." That section now states that Rev not only has a broad license to use all of the content uploaded to its platform "whether publicly or privately," but can also use the information "to improve the services, e.g., to train and maintain Rev's ASR speech-to-text model, and other Rev artificial intelligence models."

Rev's Terms seem to have been updated sometime in June to include that language, according to a copy found through the Internet Archive. Users were only prompted to review updated Terms in September, in an email from the company announcing its partnership with OpenAI as "a new third-party sub-processor." OpenAI is now processing data from Rev for "an upcoming new feature." Rev did not disclose what exactly had changed in its terms. A Rev spokesperson said the terms were updated this month and that its model is "informed by a diverse collection of voice data."

"Rev now uses data perpetually, not just while being an active customer, and it is used anonymously to train Rev's proprietary AI," the spokesperson said. The spokesperson also claimed that a Rev customer can "opt out of sharing their data for training purposes," by sending an email to support@rev.com. There is no dedicated form for such a request but the spokesperson said Rev guarantees such requests will be honored.

In Instacart's August update to its terms and conditions, the online grocery shopping platform added language prohibiting anyone from using its content or data to "create, train, test, or improve" any AI tools, or the large language and machine learning models that underpin them.

More companies have been trying to do what they can to stop their data from being scraped and saved to expand datasets needed to train AI models. However, Instacart also added language that left it a window to do just that with its own customers' data, saying its license now allows it to "...otherwise enhance our machine learning algorithms, for the purposes of operating, providing, and improving the services." That language was not in its previous terms, according to a version seen through the Internet Archive. Instacart also did not specify these changes in its update.

An Instacart spokesperson said the company is preparing to deploy some sort of AI tool on its platform.

"We're incorporating generative-AI experiences into our products to assist with customers' grocery shopping questions and help them make food-related decisions," the spokesperson said. "Our updated terms clarify that generative AI is now a part of Instacart's offering subject to restrictions on misuse and the other general provisions of our terms, and the standards for those features remain the same as our entire product."

Even when companies do disclose what they've changed in an update to a Terms agreement or Privacy Policy that covers the use of data for AI, they're often vague. Microsoft's updated Terms, for which it highlighted changes going into effect Sept. 30, added a new five-point section on AI Services. The only one about user data says: "As part of providing the AI services, Microsoft will process and store your inputs to the service as well as output from the service, for purposes of monitoring for and preventing abusive or harmful uses or outputs of the service." The other four points cover Microsoft disallowing any use of its AI services for other AI tools.

Google also offers access to an archive version of its terms, updated in July. Yet the company is similarly vague on what it does with user data when it comes to AI. It can use data as its license allows for "operating and improving the services" including the creation of "new features and functionalities." In its privacy policy, also updated in July, Google mentions its Bard generative AI tool once, saying the company will "use publicly available information to help train Google's AI models and build products and features like Google Translate, Bard, and Cloud AI capabilities."

Twitter, now X under the ownership of Elon Musk, is one of the more direct platforms in saying how it uses the user data it collects for AI, as Musk has for months been building out a new AI project. "We may use the information we collect and publicly available information to help train our machine learning or artificial intelligence models for the purposes outlined in this policy," the company says in its privacy policy updated this month.

Meta, formerly known as Facebook, updated its privacy policy in June. The policy now lets users know that "your activity and information you provide on our products and services" is used to train its generative AI models, as is anything written or said while using an AI tool like Llama 2 or CM3leon. So, everything from status updates to Instagram photos to prompts can now be part of Meta's AI training data sets. The company puts the onus on the user to prevent its training data from sucking up personal information that a person may not want to be used to teach an AI tool how better to answer prompts, saying people should "be mindful about" what they say in prompts.

"As a best practice, don't include any personal information, like your home address or phone number," Meta advises. At the end of August, it created a simple form where users could "request" to opt out of their data being used to train AI models. The company does not say whether it will abide by any such request.

Are you a tech employee or someone else with insight to share? Contact Kali Hays at khays@insider.com, on secure messaging app Signal at 949-280-0267, or through Twitter DM at @hayskali. Reach out using a non-work device.
