TensorFlow .so file conflict with new PyArrow version

These days I have been using TensorFlow to train models on an ECS machine. Everything was fine for the first few days, then suddenly TensorFlow stopped working: even a trivial command like import tensorflow could not be executed.
Python crashed with a segmentation fault, so I enabled faulthandler to get a Python-level traceback.
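Faulthandler can be turned on right before the import; a minimal sketch:

import faulthandler
faulthandler.enable()   # dump the Python traceback on fatal signals such as SIGSEGV

import tensorflow       # still crashes, but now with a readable traceback

(Running the script with python -X faulthandler has the same effect.)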
The error was as follows:

Fatal Python error: Segmentation fault

Current thread 0x00007f5f2acd5740 (most recent call first):
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 922 in create_module
  File "<frozen importlib._bootstrap>", line 571 in module_from_spec
  File "<frozen importlib._bootstrap>", line 658 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 684 in _load
  File "/usr/lib/conda/envs/python3.6/lib/python3.6/imp.py", line 343 in load_dynamic
  File "/usr/lib/conda/envs/python3.6/lib/python3.6/imp.py", line 243 in load_module
  File "/usr/lib/conda/envs/python3.6/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28 in swig_import_helper
  File "/usr/lib/conda/envs/python3.6/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 32 in <module>

The code is simple:

# inside swig_import_helper(): load the compiled extension as a module
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)

It just loads a .so file.

It confused me for two days; I had to go back over everything I had done before TensorFlow stopped working. Finally I remembered that I had just upgraded pyarrow from 0.10.0 to 0.14.0.
So I downgraded it back from 0.14.0 to 0.10.0, and everything worked again.
It is very hard to tell what the real cause is when Python throws a segmentation fault. If I had not remembered upgrading pyarrow, I would never have suspected a conflict between .so files.
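A quick way to confirm the fix, assuming pyarrow was pinned back with pip install pyarrow==0.10.0, is to import both libraries in the same interpreter:

import pyarrow
import tensorflow as tf

# with the conflicting .so this line was never reached
print(pyarrow.__version__, tf.__version__)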

PySpark with Jupyter

First, generate the Jupyter config file:

jupyter-notebook --generate-config

Then edit the Jupyter config file (by default ~/.jupyter/jupyter_notebook_config.py):

c.NotebookApp.port = 18888
c.NotebookApp.ip = '0.0.0.0'
c.NotebookApp.allow_root = True

Of course, Spark itself has to be configured properly; on the EMR environment Spark is already fully set up. Next, configure the PySpark environment variables:

export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'

Then just start pyspark:

pyspark --master yarn
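
Once the notebook is up, a quick sanity check in a cell (assuming Spark 2.x, where the pyspark shell pre-creates sc and spark):

print(sc.master)             # expect 'yarn'
spark.range(100).count()     # run a trivial job to confirm executors start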