These Berkeley PhD students and the co-founder of buzzy $6.2 billion Databricks are tackling the next really hard problem for big data programmers
- On Wednesday, the startup Anyscale announced it raised $20.6 million in series A funding led by Andreessen Horowitz.
- Anyscale was built from the open source project Ray, a distributed computing project that allows people to run large scale applications across many computers.
- Anyscale was started by University of California, Berkeley graduate students and the co-founder of the big data startup Databricks, which is now valued at $6.2 billion.
A new startup born from a research lab at the University of California, Berkeley just launched out of stealth with $20.6 million in funding.
On Wednesday, Anyscale, which started in June, announced it raised its series A funding led by Andreessen Horowitz.
Anyscale was founded by Berkeley grad students Robert Nishihara and Philipp Moritz, as well as Berkeley computer science professor Ion Stoica, who is also the co-founder of the buzzy $6.2 billion big data startup Databricks. Anyscale is an extension of the research they were doing at Berkeley's RISELab, which is led by Stoica.
At RISELab, they started an open source project called Ray, which they built to solve problems they kept encountering in their own research. As they experimented with different algorithms, they found themselves having to build their own tools and infrastructure to run them. That's when they realized this was something that would be useful to lots of people.
Stoica says the demand for machine learning is exploding, and running these powerful algorithms requires specialized hardware. Right now, developers may turn to tools like TensorFlow and PyTorch, which help teach machines to learn, but they are on their own when it comes to managing the underlying infrastructure: the computers and cloud options they need to run really big ML apps.
The power and memory required to run algorithms that tackle advanced problems need to be distributed across multiple computers, Stoica says. That's where the open source project Ray comes in.
"Building and programming distributed applications is incredibly hard," Nishihara, the CEO, told Business Insider. "It requires a lot of expertise. This is a problem we ran into ... We thought there had to be a better way."
Stoica says the goal is to make this as easy as programming your own laptop.
"This is where Ray is coming in and why we built it and what we built it for - to make it easy for developers to not only develop new applications but modify the existing application from one machine to thousands of machines," Stoica told Business Insider.
'We want to build great products'
As the open source Ray project took off and Nishihara and Stoica saw how popular it was becoming, they decided to build a company from it.
"More and more companies are using it," Stoica said. "There's a natural need and desire to have a business entity behind the open source project to ensure it has a great future. The other reason we started the company is that we are very excited about the opportunity."
As the co-founder of Databricks, Stoica already knew Ben Horowitz, co-founder and general partner at Andreessen Horowitz. Horowitz had invested in Databricks, and when Stoica told him about Anyscale, he decided to lead its series A round as well. Databricks just raised $400 million in October.
With the funding, Anyscale plans to build commercial support and features that enhance Ray, but it is promising not to abandon the Ray open source project.
"At a high level, our No. 1 goal is to make Ray open source hugely successful and make it the de facto standard for building distributed applications," Stoica said. "As a company we want to build great products."
'Too high of a barrier of entry'
Nishihara says Ray is for people who are not experts in distributed computing but who still need to build large-scale applications. For example, a biologist might want to use machine learning to analyze tons of data when doing research. Learning to build the infrastructure to run it or hiring people to do that can be time-consuming and expensive.
"You not only have to be a biology expert, you have to be a distributed computer expert," Nishihara said. "That's too high of a barrier of entry to most people. We're trying to simplify this and make it so that these people who are working on these problems are not necessarily computer science experts, but can benefit from the same tool."
Stoica has previously worked with Apache Spark, another open source project for processing big data. He's not worried about handling another large project, but he says that one challenge may be figuring out how to improve features based on the requirements and feedback of users, since people will use it for a variety of different needs.
"Adapting the system and improving the system is not easy," Stoica said. "On the other hand it's very gratifying to see all these applications. In the future it's about moving fast and building the right product."
Nishihara says he only expects this kind of computing to keep growing.
"We are incredibly excited about this because this is where we see the trend going," Nishihara said. "Distributed computing is becoming increasingly essential and will become the default mode of computation. The tools are just not there yet. It's incredibly complex to do distributed computing. It will take quite a bit of work to solve that problem."