python pipeline.py --streaming --runner DataflowRunner --project xxx --temp_location gs://xxx-dflow-stream/temp --staging_location gs://xxx-dflow-stream/staging --job_name tweeps
Traceback (most recent call last):
  File "pipeline.py", line 1, in <module>
    from apache_beam.options.pipeline_options import PipelineOptions
ModuleNotFoundError: No module named 'apache_beam'
My requirements.txt has apache_beam[gcp] in it, and I installed the requirements using pip.
I can see Apache Beam in tweeper/lib/python3.7/site-packages/apache_beam.
I resolved it by installing with "pip3 install -r requirements.txt" and launching the scripts with "python3"
(the "3" added in both cases).
I also had to add --region us-east1, as the job was failing with a missing region error.
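For anyone hitting the same ModuleNotFoundError, a quick way to see which interpreter a command actually runs, and whether that interpreter can find apache_beam, is a small check script. This is only a sketch: the file name check_env.py and the helper module_available are made up for illustration, and it uses nothing beyond the standard library. Run it once with "python check_env.py" and once with "python3 check_env.py" and compare the output.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be found by the
    interpreter running this script (hypothetical helper, not part of Beam)."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Path of the interpreter that is executing this script.
    print("interpreter:", sys.executable)
    # Short version string, e.g. the major.minor.patch part only.
    print("version:", sys.version.split()[0])
    # Whether apache_beam is installed for *this* interpreter.
    print("apache_beam importable:", module_available("apache_beam"))
```

If the two runs report different interpreters and only one of them can import apache_beam, then pip and python are probably pointing at different Python installations, which would match the fix described above.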
Thanks for posting this, Thomas!
The default Python version in the Cloud Shell terminal has changed since this video was recorded. I’ll mark it down as requiring a refresh.
In the meantime, your notes will hopefully help other students out.
python pipeline.py --streaming --runner DataflowRunner --project xxx --region xxxx --temp_location gs://xxxx-stream/temp --staging_location gs://xxx-stream/staging --job_name tweeps