Chinese search engine Baidu released some of its code after Eric Schmidt urged tech companies to join forces on AI
Schmidt, now executive chairman of Alphabet, Google's parent company, claimed last Monday that AI has the potential to fix some of the world's "hard problems," including population growth, climate change, human development, and education. He stressed, however, that for this to happen companies need to start working together on AI and publish their AI breakthroughs to the academic community.
Four days after Schmidt's remarks, Baidu Research's Silicon Valley AI Lab (SVAIL) published some of its AI code, known as "Warp-CTC", on the code repository GitHub.
The now-public code has been used to build a Baidu speech-recognition system called Deep Speech 2, which can recognise certain short sentences better than humans can. It's useful technology for Baidu because the company's many millions of customers often prefer to engage with Baidu services using their voice, as typing Chinese characters into a smartphone can be difficult.
Baidu's "Warp-CTC" tool can plug into existing machine learning frameworks being developed by startups and other companies to significantly speed up their AI development efforts. MIT Technology Review reports that a machine learning startup called Nervana, which offers a deep-learning framework to companies that don't have the know-how or resources to develop their own, is already using Warp-CTC in its software.
Yahoo data dump
Last Thursday Yahoo gave machine learning scientists access to a huge dataset in a bid to help them develop computer programs that can think and learn for themselves.
"Data is the life-blood of research in machine learning," said Suju Rajan, director of personalisation science at Yahoo Labs. "However, access to truly large-scale datasets is a privilege that has been traditionally reserved for machine learning researchers and data scientists working at large companies - and out of reach for most academic researchers."
The dataset is a collection of anonymised user interactions with the news feeds on websites like Yahoo News and Yahoo Sports. Yahoo says the 13.5 terabyte file contains 110 billion events, making it more than 10 times the size of the largest dataset previously released.
Google and Facebook have also published AI code, research, and datasets that help machine learning scientists.
China's internet giants have been slower off the mark, possibly because they see their code as important intellectual property that gives them a competitive advantage over their rivals.