Ray Summit: Sharing Our Experiences with the Community

Ray is a popular open source framework, maintained by Anyscale, for scaling AI and Python workloads. At deepset, we use it in both of our products: deepset Cloud and Haystack. In deepset Cloud, Ray handles the crucial job of deploying indexing and query pipelines and serving them to users. For Haystack, we offer a Ray integration for running pipelines.
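
To give a feel for the Haystack side, here is a minimal sketch of what the Ray integration looks like in Haystack 1.x; the YAML file, pipeline name, and query are hypothetical placeholders rather than a tested setup:

```python
from haystack.pipelines import RayPipeline

# Minimal sketch: load a pipeline definition from YAML and run its
# components as Ray Serve deployments. "pipelines.yaml" and the
# pipeline name are hypothetical placeholders.
pipeline = RayPipeline.load_from_yaml(
    path="pipelines.yaml",
    pipeline_name="ray_query_pipeline",
)
result = pipeline.run(query="What is Ray Serve?")
```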

At deepset, we love open source and we celebrate the exchange between different communities. Hence, we were excited when Anyscale invited us to share our experiences with running Ray for production workloads at their very first in-person Ray Summit.

Ray Summit 2022

After years of virtual conferences, Ray Summit gathered the community for an in-person event in San Francisco. People from over fifteen countries met at the Hyatt Regency Embarcadero Hotel for three days of workshops, news about the latest Ray 2.0 release, and talks by the community and by companies using Ray. With the exception of the keynotes and the workshops, four tracks ran in parallel, leaving attendees spoiled for choice.

We were lucky that our talk, "Running a question-answering system on Ray Serve," was scheduled right after the opening keynote by Anyscale, OpenAI, and Meta, so we could spend the rest of the conference enjoying the other talks and mingling with the Ray community.

Running at Scale with Ray 2.0

At deepset, I spend most of my time working on the backend of deepset Cloud, where we use Ray Serve to index documents and deploy query pipelines. Indexing is an especially challenging task, as we need to index millions of documents for our customers in a reasonable time. This requires high throughput and adds the complexity of managing GPU workloads. Ray Summit, with all its production use cases, was the perfect place for me to see how other companies approach this and how Ray 2.0 addresses some of the limits we had started to hit.
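
To make this concrete, here is a rough sketch of a GPU-backed indexing deployment in Ray Serve 2.0; the class name, replica count, and request handling are illustrative assumptions, not our actual production code:

```python
from ray import serve


@serve.deployment(num_replicas=4, ray_actor_options={"num_gpus": 1})
class EmbeddingIndexer:
    """Hypothetical indexing deployment: each replica reserves one GPU."""

    def __init__(self):
        # In a real pipeline, load the embedding model onto the GPU here.
        self.model = None

    async def __call__(self, request) -> dict:
        documents = await request.json()
        # Embed the documents and write them to the document store here.
        return {"indexed": len(documents)}


# Deploy onto the running Ray cluster (Ray 2.0 API).
serve.run(EmbeddingIndexer.bind())
```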

High Availability Deployments with Ray 2.0

Our customers rely on deepset Cloud to be stable and performant. We regularly redeploy deepset Cloud to ship new features or bug fixes. Until now, this tended to cause us headaches: every new deployment meant restarting our Ray cluster, which could cause downtime for our deployed pipelines.

You can probably imagine my joy when Anyscale announced at the conference that Ray 2.0 supports a highly available architecture for online serving. For this, Ray 2.0 re-introduces Redis as an external storage backend for the Global Control Service (GCS). Until now, this important cluster metadata was managed and stored by the Ray head node, which meant that a restart of the head node inevitably made our Ray cluster unreachable. Making Redis optional in Ray 1.11 only to re-introduce it as a key component in Ray 2.0 is not without irony, and the Redis team paid homage with their conference talk The return of the Redi: Redis and Ray in the ML Ecosystem (the missing "s" is on purpose). This change is a real game changer for us, as it allows us to ship frequent updates to deepset Cloud without downtime for our users.

KubeRay as the New Default Cluster Deployment Method

Until now, the default method for deploying Ray, which we also used at deepset, was to create a cluster on AWS EC2 machines. A community-driven project called KubeRay added support for deploying Ray clusters on top of Kubernetes. As I found out at the conference, some particularly demanding Ray users had already started using KubeRay for their deployments. Ray 2.0 embraces this and makes KubeRay the recommended way of deploying Ray clusters. The old deployment methods will continue to be maintained, but the Anyscale team stressed that they are doubling down on fully integrating KubeRay and making Ray resources native Kubernetes resources.
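
As a sketch of what this looks like in practice, here is a hypothetical, untested KubeRay manifest with placeholder names; it also wires in the Redis-backed GCS from the previous section:

```yaml
# Hypothetical RayCluster manifest for the KubeRay operator (2022-era CRD).
apiVersion: ray.io/v1alpha1
kind: RayCluster
metadata:
  name: deepset-cloud-pipelines
  annotations:
    ray.io/ft-enabled: "true"          # enable GCS fault tolerance
spec:
  headGroupSpec:
    rayStartParams:
      dashboard-host: "0.0.0.0"
    template:
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:2.0.0
            env:
              - name: RAY_REDIS_ADDRESS  # external Redis backing the GCS
                value: "redis.internal:6379"
  workerGroupSpecs:
    - groupName: cpu-workers
      replicas: 2
      minReplicas: 0
      maxReplicas: 10
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-worker
              image: rayproject/ray:2.0.0
```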

Once again, this was great news for us at deepset. We already use Kubernetes to deploy our other services, and moving our Ray clusters to Kubernetes has numerous advantages. First, it helps unify our stack and our observability tooling. Second, we can leverage Kubernetes' battle-tested auto-scaling capabilities and rolling deployments. Lastly, it gives us more options for customizing deployments, which will come in handy as we stabilize and harden deepset Cloud.

Clusters, Clusters, Clusters

KubeRay makes it really easy to create and destroy Ray clusters, which makes them far more disposable. At the conference, a fellow developer told me that they create between 100 and 1,000 Ray clusters a day on Kubernetes, which was mind-blowing to me. This conversation, along with the subsequent talks on KubeRay, changed how I think about Ray clusters and inspired me to reconsider what constitutes a Ray cluster from the deepset Cloud perspective. For us, this has the potential to significantly decrease deployment risk and to allow for different kinds of clusters.

Thanks for a Great Summit Experience, Anyscale!

Ray Summit 2022 was truly a blast. It was an honor to represent our work at deepset and to share our experiences with the community. Talking to other advanced Ray users and the Ray team, and listening to the talks, gave me great insights that will be super helpful for our work at deepset.

I've focused on just a few highlights here, but Ray 2.0 brings many more new features, such as scale-to-zero support and more transparent Ray Serve deployments. I am sure that Ray's rapid growth and strong community will accelerate our work at deepset and help us deliver a scalable and robust end-to-end platform for NLP-powered search systems.