Mock pipeline v2.0
Revision as of 13:23, 28 November 2019
Dependencies
- Python 3.8 for PEP 574 (Pickle protocol 5 with out-of-band data)
- Spark 3.0.0 for SPARK-28198 (Add mapPartitionsInPandas to allow an iterator of DataFrames)
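As a rough illustration of why PEP 574 matters here, the sketch below pickles a large buffer out-of-band with protocol 5, so that the pickle stream carries only a placeholder and the payload can be moved separately without extra copies. The variable names and the 1 MiB size are illustrative assumptions, not part of the pipeline.

```python
import pickle

# Illustrative payload; in practice this would be a large array buffer.
data = bytearray(b"x" * (1 << 20))

# Buffers handed to buffer_callback are kept out of the pickle stream.
buffers = []
payload = pickle.dumps(pickle.PickleBuffer(data), protocol=5,
                       buffer_callback=buffers.append)

# The same buffers must be supplied, in the same order, when unpickling.
restored = pickle.loads(payload, buffers=buffers)

print(len(payload), len(buffers))
```

The pickle stream stays tiny regardless of the payload size, which is the property that makes protocol 5 attractive for passing large arrays between processes.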
Installation
Development Toolset v8
Required in order to compile Python 3.8, as GCC v4.8 is not compatible.
 sudo yum install centos-release-scl
 sudo yum install devtoolset-8
 scl enable devtoolset-8 bash
Python 3.8.0
Get system compilation flags using:
 >>> import sysconfig
 >>> sysconfig.get_config_var('CONFIG_ARGS')
Configure:
 export CFLAGS="-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv"
 export LDFLAGS="-Wl,-z,relro -g"
 export PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig
 ./configure --prefix=/software/astro/centos7/python/3.8.0 --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu \
   --disable-dependency-tracking --enable-ipv6 --enable-shared --with-computed-gotos=yes --with-dbmliborder=gdbm:ndbm:bdb \
   --with-system-expat --with-system-ffi --enable-loadable-sqlite-extensions --with-dtrace --with-valgrind --without-ensurepip --enable-optimizations
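CONFIG_ARGS comes back as one long shell-quoted string, which is awkward to compare against a configure line such as the one above. A minimal sketch, using only the standard library, that splits it into individual arguments; the helper name `configure_args` is hypothetical.

```python
import shlex
import sysconfig


def configure_args():
    """Return the interpreter's ./configure arguments as a list of strings."""
    raw = sysconfig.get_config_var("CONFIG_ARGS") or ""  # None on some platforms
    return shlex.split(raw)


if __name__ == "__main__":
    for arg in configure_args():
        print(arg)
```

Running this with the system interpreter and again with the freshly built one gives two lists that can be compared line by line.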
Compile & install:
 make -j 24
 make install
Fix the RPATH so that the interpreter, built with --enable-shared, finds its own libpython at run time:
/software/astro/sl6/patchelf/0.8/bin/patchelf --set-rpath /software/astro/centos7/python/3.8.0/lib/ /software/astro/centos7/python/3.8.0/bin/python3.8
Pip & virtualenv
 cd /tmp
 curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
 /software/astro/centos7/python/3.8.0/bin/python3 get-pip.py
 /software/astro/centos7/python/3.8.0/bin/pip install virtualenv
Virtual environment
 cd ~/env
 /software/astro/centos7/python/3.8.0/bin/virtualenv mocks3
 source mocks3/bin/activate
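After activation, it can be worth confirming that `python` really resolves to the new environment. A small sketch using only the standard library; the function name is illustrative. It covers both older virtualenv releases, which set `sys.real_prefix`, and the venv-style layout, where `sys.base_prefix` differs from `sys.prefix`.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    if hasattr(sys, "real_prefix"):  # set by older virtualenv releases
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print(sys.executable, in_virtualenv())
```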
Spark v3.0.0
 cd /tmp
 wget https://www-eu.apache.org/dist/spark/spark-3.0.0-preview/spark-3.0.0-preview-bin-hadoop3.2.tgz
 tar xvvzf spark-3.0.0-preview-bin-hadoop3.2.tgz
 mkdir -p /software/astro/centos7/spark/3.8.0
 mv spark-3.0.0-preview-bin-hadoop3.2/* /software/astro/centos7/spark/3.8.0

 cd /software/astro/centos7/spark/3.8.0/python
 pip install pypandoc
 python setup.py sdist
 pip install dist/pyspark-3.0.0.dev0.tar.gz
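To check that the locally built PySpark package landed in the active environment, one option is to query the installed distribution metadata. The sketch below uses `importlib.metadata`, which is available from Python 3.8; the helper name `installed_version` is an assumption for illustration, and the version string printed depends on the build.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("pyspark:", installed_version("pyspark"))
```

A result of `None` suggests the `pip install` step ran against a different interpreter than the one in the virtual environment.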