Dataflow BigQuery to BigQuery

I am trying to create a Dataflow script that reads from BigQuery and writes back to BigQuery. Our main table is massive and has multiple nested fields, which breaks the extract capabilities. I'd like to create a simple table, containing all the relevant information, that can be extracted.

The SQL query 'Select * from `table.orders` where paid = false limit 10' is a simple one just to make sure the pipeline works. The real query connects to multiple tables within the same project.

This seems to work, but what can I do to test it out? Also, how can I get it to run automatically every morning? I've put two rough sketches of what I'm considering after the code below.

Thank you

from __future__ import absolute_import

import argparse
import logging

import apache_beam as beam


def run():
    argv = [
        # Pipeline options for the Dataflow service. The project and
        # bucket values below are placeholders, not real resources.
        '--project=YOUR_PROJECT',
        '--job_name=orders-extract',
        '--staging_location=gs://YOUR_BUCKET/staging',
        '--temp_location=gs://YOUR_BUCKET/temp',
        '--runner=DataflowRunner',
    ]

    with beam.Pipeline(argv=argv) as p:

        # Read the table rows into a PCollection.
        rows = p | 'read' >> beam.io.Read(beam.io.BigQuerySource(
            query='Select * from `table.orders` where paid = false limit 10',
            use_standard_sql=True))

        # Write the output using a "Write" transform that has side effects.
        rows | 'Write' >> beam.io.WriteToBigQuery(
            # Placeholder table spec. CREATE_NEVER assumes the destination
            # table already exists, so no schema needs to be supplied here.
            'YOUR_PROJECT:YOUR_DATASET.orders_extract',
            write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE,
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER)

if __name__ == '__main__':
    logging.getLogger().setLevel(logging.INFO)
    run()

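For testing, the idea I have is to run the same pipeline locally with Beam's DirectRunner before submitting it to the Dataflow service; the limit 10 in the query keeps a local run cheap. A minimal sketch, swapping this list in for argv in run() above (project and bucket names are placeholders):

    # Hypothetical local-test options: the DirectRunner executes the
    # pipeline on the local machine, so no Dataflow job is created. The
    # query still runs against the real BigQuery tables.
    test_argv = [
        '--project=YOUR_PROJECT',                 # placeholder
        '--temp_location=gs://YOUR_BUCKET/temp',  # placeholder
        '--runner=DirectRunner',
    ]
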
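For the automatic morning run, one pattern I'm considering (an assumption on my side, not something I've verified) is to stage the pipeline as a Dataflow template via --template_location and have a scheduler, e.g. Cloud Scheduler or a plain cron job, launch that template every morning; a cron job that simply re-runs this script daily would also work. A sketch of the staging options, all placeholders:

    # Hypothetical template-staging options: with --template_location the
    # DataflowRunner stages the pipeline to GCS instead of executing it,
    # so a scheduler can launch it each morning without this script.
    template_argv = [
        '--project=YOUR_PROJECT',                                 # placeholder
        '--staging_location=gs://YOUR_BUCKET/staging',            # placeholder
        '--temp_location=gs://YOUR_BUCKET/temp',                  # placeholder
        '--template_location=gs://YOUR_BUCKET/templates/orders',  # placeholder
        '--runner=DataflowRunner',
    ]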