Google Certified Professional Data Engineer

Getting a Python error in the tweeper lab

python pipeline.py --streaming --runner DataflowRunner --project xxx --temp_location gs://xxx-dflow-stream/temp --staging_location gs://xxx-dflow-stream/staging --job_name tweeps

Traceback (most recent call last):
  File "pipeline.py", line 1, in <module>
    from apache_beam.options.pipeline_options import PipelineOptions
ModuleNotFoundError: No module named 'apache_beam'

My requirements.txt has apache_beam[gcp] in it, and I installed the requirements using pip.

I can see Apache Beam in tweeper/lib/python3.7/site-packages/apache_beam

I resolved it by using "pip3 install -r requirements.txt" and launching the scripts with "python3" (the "3" added in both cases).

I also had to add --region us-east1, as the job was failing with a missing region error.
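
For anyone hitting the same ModuleNotFoundError, a quick way to confirm that it is an interpreter mismatch (and not a broken install) is to ask the interpreter itself. The following is only a minimal sketch using the standard library; the helper name module_available is made up for illustration and is not part of the lab.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be found by the
    interpreter that is running this script (hypothetical helper)."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Shows which binary "python" or "python3" actually resolved to.
    print("interpreter:", sys.executable)
    print("version:", sys.version.split()[0])
    # False here means this interpreter cannot see the package,
    # even if another interpreter on the machine can.
    print("apache_beam importable:", module_available("apache_beam"))
```

Running this once with "python" and once with "python3" should show whether the two commands point at different interpreters, which would be consistent with the fix described above.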

2 Answers

Thanks for posting this Thomas!

The default Python version in the Cloud Shell terminal has changed since this video was recorded. I’ll mark it down as requiring a refresh.

In the meantime your notes will hopefully help other students out.

Cheers

Full command:

python pipeline.py --streaming --runner DataflowRunner --project xxx --region xxxx --temp_location gs://xxxx-stream/temp --staging_location gs://xxx-stream/staging --job_name tweeps
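
If you would rather not depend on what "python" resolves to in your shell, one option is to assemble the same command from a small Python script and reuse the interpreter that is already running (and that has the requirements installed). This is just a sketch: build_launch_command is a hypothetical helper, and the xxx values are the same placeholders used in this thread.

```python
import sys


def build_launch_command(project, region, temp_location, staging_location, job_name):
    """Assemble the argument list for launching pipeline.py (hypothetical helper).

    sys.executable is the interpreter running this script, so the launched
    pipeline uses the same Python that the packages were installed into.
    """
    return [
        sys.executable, "pipeline.py",
        "--streaming",
        "--runner", "DataflowRunner",
        "--project", project,
        "--region", region,  # the flag that was missing in the original command
        "--temp_location", temp_location,
        "--staging_location", staging_location,
        "--job_name", job_name,
    ]


if __name__ == "__main__":
    cmd = build_launch_command(
        project="xxx",
        region="us-east1",
        temp_location="gs://xxx-stream/temp",
        staging_location="gs://xxx-stream/staging",
        job_name="tweeps",
    )
    # Printed only; pass `cmd` to subprocess.run(cmd, check=True) to launch it.
    print(" ".join(cmd))
```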
