Postgres query plan

8/31/2023

Making Heap fast is a unique and particularly difficult adventure in performance engineering. Our customers run hundreds of thousands of queries per week and each one is unique. What’s more, our product is designed for rich, ad hoc analyses, so the resulting SQL is unboundedly complex.

For some background, Heap is a tool for analyzing customer interactions. We gather data from web, iOS, Android, and a growing list of third-party tools such as Stripe and Salesforce, and provide a simple UI our customers can use to query that data.

A core concept in our product is the idea of an “event definition”, which is a mapping between a human-level event like “Signup” or “Checkout” and what that event means in terms of raw clicks. This layer of indirection between the concept being analyzed and the underlying data makes the product very powerful and easy to use: you can add a new event definition and run an analysis over all of your historical data, as if you had been tracking “Signups” since you first installed the product.

But it comes at a cost: everything we do becomes more complicated because the analysis is in terms of a dynamic schema that isn’t fully known until read time. Handling this indirection at all is its own fascinating topic.

To execute a query, we take the query expressed in the UI, compile it to SQL, inline the event definitions, and then run the resulting SQL query across a cluster of PostgreSQL instances. This process can easily result in incredibly hairy SQL queries.

As an example, consider a “simple” query: graphing daily unique users who view a blog post, broken down by their initial referrer. This analysis is about as simple as it gets, and yet it produces a hairy SQL query.

The queries only become more complicated as they make use of the more powerful features we support. For example, we might want to filter the above for users who had never seen a blog post before, or for users who later viewed our pricing page, or break down the above query by the specific blog post viewed in addition to the initial referrer of the user viewing the blog post. A query can include filters on the number of times a user has done a given event in an arbitrary time range as well as break down the results by multiple different fields simultaneously.
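
To make the compile step more concrete, here is a minimal sketch of inlining an event definition into SQL at read time. It is not Heap's actual compiler or schema: the `events` table, its columns (`user_id`, `time`, `type`, `path`, `target_text`, `initial_referrer`), the `EVENT_DEFINITIONS` mapping, and the `compile_daily_uniques` function are all hypothetical names chosen for illustration.

```python
# Hypothetical event definitions: a human-level name mapped to a predicate
# over raw events. Table and column names are assumptions for illustration.
EVENT_DEFINITIONS = {
    "View Blog Post": "type = 'pageview' AND path LIKE '/blog/%'",
    "Signup": "type = 'click' AND target_text = 'Sign up'",
}


def compile_daily_uniques(event_name, definitions=EVENT_DEFINITIONS):
    """Compile 'daily unique users who did <event>, by initial referrer' to SQL.

    The definition is looked up and inlined when the query is compiled, so a
    definition added today applies to every raw event already stored.
    """
    predicate = definitions[event_name]  # raises KeyError for unknown events
    return (
        "SELECT date(time) AS day, initial_referrer, "
        "COUNT(DISTINCT user_id) AS unique_users\n"
        "FROM events\n"
        f"WHERE ({predicate})\n"
        "GROUP BY 1, 2\n"
        "ORDER BY 1, 2"
    )


sql = compile_daily_uniques("View Blog Post")
print(sql)
```

Even in this toy form, the shape of the SQL depends on definitions that are only known when the query is compiled, which is why every query ends up different. The sketch treats the predicates as trusted strings; a real system would presumably build them from structured, validated definitions rather than concatenating text.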
