Pandas development environment setup
I have recently started contributing to pandas, a Python data-processing library that I use at work. While the code change I proposed passed all CI tests on GitHub, I am still interested in setting up a working copy of the development environment to better understand what it is like.
It turns out that even for a project of only moderate size like pandas, the myriad of C libraries, binaries, and Python package dependencies is still a nightmare to get right on the first try without using the project's recommended toolchain. I very specifically switched from conda to pyenv, so it is a real pain to have to use mamba, which uses conda under the hood. Within 2 hours, I did manage to get the whole thing to work, but my distaste for conda eventually got me to uninstall mamba and give up the progress I had made.
Note that this is done on a 2020 M1 MacBook Air with macOS Ventura (so brew's home directory is /opt/homebrew). Python is installed and managed through pyenv 2.3.4.
Virtual environment
python -m venv .venv
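The environment also needs to be activated before installing anything into it; on macOS with zsh or bash that is:
source .venv/bin/activate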
The .gitignore file in the pandas-dev/pandas repository does not exclude .venv (although GitHub's template will). Without modifying the repository's .gitignore, the workaround I found was to use a global .gitignore, which can then be synced across computers through GitHub.
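As a sketch of what that setup looks like (the file name ~/.gitignore_global is my own choice, not something pandas or GitHub prescribes):
# Point git at a user-level ignore file that applies to every repository
git config --global core.excludesFile ~/.gitignore_global
# Add the entries that should never be committed anywhere
echo ".venv/" >> ~/.gitignore_global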
Install python dependencies
pip install --upgrade pip wheel setuptools
pip install -r requirements-dev.txt
Installing psycopg2 from source
First problem: psycopg2 cannot be installed because pip could not find the pg_config binary.
Collecting psycopg2
Using cached psycopg2-2.9.5.tar.gz (384 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [25 lines of output]
/Users/ganyuxu/Documents/projects/pandas/.venv/lib/python3.10/site-packages/setuptools/config/setupcfg.py:508: SetuptoolsDeprecationWarning: The license_file parameter is deprecated, use license_files instead.
warnings.warn(msg, warning_class)
running egg_info
creating /private/var/folders/92/rxm5_d4n50ddj3y4xzl8s2700000gn/T/pip-pip-egg-info-v5njs25m/psycopg2.egg-info
writing /private/var/folders/92/rxm5_d4n50ddj3y4xzl8s2700000gn/T/pip-pip-egg-info-v5njs25m/psycopg2.egg-info/PKG-INFO
writing dependency_links to /private/var/folders/92/rxm5_d4n50ddj3y4xzl8s2700000gn/T/pip-pip-egg-info-v5njs25m/psycopg2.egg-info/dependency_links.txt
writing top-level names to /private/var/folders/92/rxm5_d4n50ddj3y4xzl8s2700000gn/T/pip-pip-egg-info-v5njs25m/psycopg2.egg-info/top_level.txt
writing manifest file '/private/var/folders/92/rxm5_d4n50ddj3y4xzl8s2700000gn/T/pip-pip-egg-info-v5njs25m/psycopg2.egg-info/SOURCES.txt'
Error: pg_config executable not found.
pg_config is required to build psycopg2 from source. Please add the directory
containing pg_config to the $PATH or specify the full executable path with the
option:
python setup.py build_ext --pg-config /path/to/pg_config build ...
or with the pg_config option in 'setup.cfg'.
If you prefer to avoid building psycopg2 from source, please install the PyPI
'psycopg2-binary' package instead.
For further information please check the 'doc/src/install.rst' file (also at
<https://www.psycopg.org/docs/install.html>).
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
Solution? Install PostgreSQL through brew, then either add the installation path to PATH or create a symlink to the pg_config binary.
brew install postgresql will work, but I want to try something lighter-weight since I don't need the server-side things.
brew install libpq
PATH="/opt/homebrew/Cellar/libpq/15.1/bin:${PATH}" pip install -r requirements-dev.txt
Installing tables
I am also missing something related to HDF5, which stopped me from installing tables==3.7.0:
Collecting tables
Downloading tables-3.7.0.tar.gz (8.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.2/8.2 MB 57.6 MB/s eta 0:00:00
Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [12 lines of output]
/var/folders/92/rxm5_d4n50ddj3y4xzl8s2700000gn/T/H5closesyfucpba.c:2:5: error: implicit declaration of function 'H5close' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
H5close();
^
1 error generated.
cpuinfo failed, assuming no CPU features: No module named 'cpuinfo'
* Using Python 3.10.4 (main, Apr 20 2022, 23:51:46) [Clang 13.1.6 (clang-1316.0.21.2.3)]
* Found cython 0.29.32
* USE_PKGCONFIG: False
.. ERROR:: Could not find a local HDF5 installation.
You may need to explicitly state where your local HDF5 headers and
library can be found by setting the ``HDF5_DIR`` environment
variable or by using the ``--hdf5`` command-line option.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip
First try brew:
brew install hdf5
Even with hdf5 installed, the build still needs to be told where it lives:
HDF5_DIR=/opt/homebrew/Cellar/hdf5/1.12.2_2 \
PATH="/opt/homebrew/Cellar/libpq/15.1/bin:${PATH}" \
pip install -r requirements-dev.txt
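The hard-coded Cellar versions above will break the next time brew upgrades hdf5 or libpq; asking brew for the prefixes keeps the same command working (same idea, just less brittle):
HDF5_DIR="$(brew --prefix hdf5)" \
PATH="$(brew --prefix libpq)/bin:${PATH}" \
pip install -r requirements-dev.txt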
boto3 and botocore version pinning
pip taking a very long time to resolve dependencies seems to be a known issue called backtracking. Installing boto3 and botocore ahead of time doesn't work.
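One mitigation I did not try: give the resolver less room to backtrack by pinning the troublesome packages in a constraints file and passing it alongside the requirements. The pins below are placeholders to illustrate the idea, not versions I have verified:
# constraints.txt — hypothetical pins so pip stops exploring old releases
boto3==1.26.*
botocore==1.29.*
pip install -r requirements-dev.txt -c constraints.txt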
fiona and its library dependencies
In trying to install fiona==1.8.22, I ran into this error message:
Collecting fiona>=1.8
Downloading Fiona-1.8.22.tar.gz (1.4 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.4/1.4 MB 53.9 MB/s eta 0:00:00
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [2 lines of output]
Failed to get options via gdal-config: [Errno 2] No such file or directory: 'gdal-config'
A GDAL API version must be specified. Provide a path to gdal-config using a GDAL_CONFIG environment variable or use a GDAL_VERSION environment variable.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip
Again, try brew first:
brew install gdal
After installation, the problem was resolved.
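Had gdal-config not ended up on PATH, the error message itself points at the fallback: pass its location through the GDAL_CONFIG environment variable (path assumes the default Homebrew prefix):
GDAL_CONFIG="$(brew --prefix gdal)/bin/gdal-config" pip install fiona==1.8.22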
Giving up
After resolving the problems above, I still ran into additional problems with psycopg2 and brotlipy… At this point I had sunk 2 hours into running pip install -r requirements-dev.txt, with the "build" and "install" parts still ahead of me. The risk of messing up my laptop does not feel worth the reward, so I will pivot to a different method for setting up the development environment.
Using Mamba
Thanks to Apple putting a measly 8GB of RAM on the MacBook Air, I can't afford to run Docker (which I will use as a last resort, but which would also require a beefier laptop). I will try mamba as recommended by pandas' own website.
First you need to install mamba using the installation script, then from the project root:
# Create and activate the build environment
mamba env create
mamba activate pandas-dev
# Build and install pandas
python setup.py build_ext -j 4
python -m pip install -e . --no-build-isolation --no-use-pep517
# Verify build
python -c "import pandas; print(pandas.__version__);"
This all works, but it overrides my pyenv installation, which I don't like.
rm -rf /Users/ganyuxu/mambaforge
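One thing to note, assuming the installer was allowed to run its shell hook: mambaforge also appends an initialization block to ~/.zshrc (or ~/.bashrc), which needs to be deleted by hand as well, otherwise every new shell complains about the now-missing installation. The block is delimited by markers like:
# >>> conda initialize >>>
# ... lines managed by 'conda init' ...
# <<< conda initialize <<<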
I would rather use Docker and VSCode than let an unfamiliar environment manager mess with my laptop.
Conclusion
- It would be great if there were some way to install only a small portion of the dependencies
- pip needs to get its dependency resolver fixed
- Maybe I should think about using Codespaces or GitPod, but I don't like VSCode, and the Vim plugin for VSCode just won't do it. Maybe I should consider doing a project where I can have Neovim in a container…