Construct sessions out of individual user event streams in BigQuery

I have even data logged in bigquery, where I want to group the requests in the following way:

  1. if a request happens more than x minutes apart from the last request, start a new session
  2. if the session grows longer than y minutes, close the session and start a new one

The only two columns essential for defining a session (above requirements) are a user_id column and a timestamp of when an event took place.

The goal is to construct sessions out of individual user event streams.

1 answer

  • answered 2018-02-13 02:25 Gordon Linoff

    You cannot do this in BigQuery. The first condition is easy -- you just need to know the time difference between adjacent rows.

    Unfortunately, the second requires "memory". Basically, two individual rows do not give you the information you need. SQL, in general, does have a solution, recursive CTEs. However, these are not supported in BigQuery.

    If you wanted to break the sessions at a particular time, say midnight, then you could proceed.