Pandas development environment setup, the sequel
About a week ago I attempted to set up a complete development environment for pandas by following this guidance. The guide lists three approaches, but each of them has a flaw that I found unacceptable: the mamba approach requires installing mamba on my laptop, the pip approach requires installing a large number of additional C libraries and configuring them into the right paths, and the Docker approach only supports development using VSCode. I don’t like mamba (which is built on top of conda) because I specifically switched from conda to pyenv, I don’t want to risk breaking my development setup for other projects by tinkering with external libraries that I don’t quite understand, and I don’t want to use VSCode.
At the same time, I was also hatching an idea about making my Neovim setup more portable using containers. My idea at the time was to build a Docker image that has Neovim binaries, my configurations, and the language servers, so that I can run a single docker run command and have my complete setup ready to go.
Putting two and two together, I figured that building my own Neovim image was a promising lead. This blog post details the steps I took to build a working pandas development environment with Neovim.
Neovim
I chose to work with a Debian base image (more precisely python:3.9, which was based on debian:buster), but for reasons beyond my understanding, apt-get install neovim will install version v0.4, while at the time of writing this post, the latest stable version is v0.8.
So I chose to build Neovim from source, which resulted in the following Dockerfile section:
# Note: this step runs as root
RUN git clone https://github.com/neovim/neovim \
&& cd neovim \
&& git checkout stable \
&& make CMAKE_BUILD_TYPE=RelWithDebInfo \
&& make install
Building and installing Neovim from source took around 3 minutes on a 6-core Intel 16-inch MacBook Pro. After the initial build this layer is cached, so subsequent builds should be very fast.
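Since the whole point of building from source is to get a newer release than the packaged v0.4, a small version check can confirm that the build produced what was expected. The sketch below assumes the first line of nvim --version looks like "NVIM v0.8.0"; the function name nvim_at_least is made up for illustration and is not part of Neovim.

```shell
# Hypothetical helper: succeed only if `nvim --version` reports at least
# the given MAJOR.MINOR version.
nvim_at_least() {
    want_major=$1
    want_minor=$2
    # Assumption: the first line looks like "NVIM v0.8.0".
    version=$(nvim --version | head -n 1 | sed 's/^NVIM v//')
    major=${version%%.*}
    rest=${version#*.}
    minor=${rest%%.*}
    [ "$major" -gt "$want_major" ] ||
        { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}
```

Calling nvim_at_least 0 8 inside the container returns a zero exit status when the installed Neovim is v0.8 or newer, so it could also guard a later build step.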
Non-root user
Per best practice, I created a non-root user. Below are the commands I ran to set up the non-root user and grant it sudo privilege. Note that NVIM_HOME_DIR and NVIM_USER are both build arguments declared using the ARG keyword.
RUN adduser --home ${NVIM_HOME_DIR} \
--shell "/bin/bash" \
--disabled-password \
${NVIM_USER} \
# grant sudo privilege
&& usermod -aG sudo ${NVIM_USER} \
# invoke sudo without password
&& echo "ALL ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers
Language servers
I chose to let language servers be installed by the non-root user (although some of the steps did require root-level access, which is why sudo was granted to the non-root user).
# pyright language server
RUN sudo apt-get install -y nodejs npm \
&& sudo npm install -g pyright
# rust-analyzer, which I chose to install through rustup
ENV PATH="${NVIM_HOME_DIR}/.cargo/bin:${PATH}"
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs -o ./rustup.sh \
&& sh ./rustup.sh -y \
&& rm ./rustup.sh \
&& rustup component add rust-analyzer \
&& sudo ln -s $(rustup which --toolchain stable rust-analyzer) /usr/local/bin/rust-analyzer
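A quick way to confirm that the language servers actually ended up on the PATH is to ask the shell for each executable. This is only a sketch: missing_tools is a hypothetical helper, and the executable names in the usage note are assumptions about what the steps above install.

```shell
# Hypothetical helper: print every named executable that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

Inside the container, missing_tools nvim pyright-langserver rust-analyzer should print nothing if everything was installed as intended; any name it prints still needs attention.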
Personal configuration
Finally, with all external dependencies fulfilled, I am ready to put in my personal configuration:
# Create XDG_CONFIG_HOME + Personal config + Packer
ENV XDG_CONFIG_HOME=${NVIM_HOME_DIR}/.config
RUN mkdir ${XDG_CONFIG_HOME} \
&& git clone https://github.com/xuganyu96/xuganyu96.github.io.git \
&& ln -s $(pwd)/xuganyu96.github.io/neovim ~/.config/nvim \
&& git clone --depth 1 https://github.com/wbthomason/packer.nvim \
~/.local/share/nvim/site/pack/packer/start/packer.nvim
However, this only copies the configuration files and Packer; the plugins listed in the configuration files are not installed. Unfortunately I could not figure out a way to invoke the PackerSync command from the build stage, so running the container to execute PackerSync from within Neovim was the only option left to me.
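For readers who want to try the build-stage route anyway, the packer.nvim README describes a headless invocation that quits Neovim once packer fires its PackerComplete event. The following is an untested sketch for this particular image, not something this post relies on; the function name sync_plugins_headless is made up for illustration.

```shell
# Hypothetical helper: run PackerSync without a UI and quit Neovim
# as soon as packer signals completion through the PackerComplete event.
sync_plugins_headless() {
    nvim --headless \
        -c 'autocmd User PackerComplete quitall' \
        -c 'PackerSync'
}
```

In a Dockerfile the same nvim command could sit in a RUN instruction placed after the configuration is linked, assuming the configuration loads cleanly before any plugin is installed.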
Fortunately, there is still a way out using docker commit. This is not the most elegant solution, but with docker commit I will only need to call PackerSync once per machine (provided that I do not use Docker Hub; if I upload the image to Docker Hub, then I will only need to call PackerSync once per build). Here is how it is done:
First, build the current Dockerfile, without Neovim plugins installed:
# from Dockerfile's directory
docker build -t neovim:latest .
Then, launch the container. Give the container a name for easier reference later:
docker run -it --name "neovim" neovim:latest # don't add --rm flag
From within the container, launch nvim and execute PackerSync. Exit Neovim and quit out of the container. The container should be stopped but not removed (you can check that the container is not removed using docker ps -a).
The PackerSync command downloads all the necessary plugin source code into the container’s file system, and a stopped container keeps its writable layer until it is removed, so those files are still there. At this point, we can call docker commit to add the plugin files into the Docker image:
docker commit neovim neovim:latest
Finally, clean up the first container:
docker container rm neovim
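The four steps above can be collected into one small script. This is a sketch under a few assumptions: the image and container names mirror the ones used in this post, while DRY_RUN, run, and bake_plugins are hypothetical names introduced here so the sequence can be previewed without touching Docker.

```shell
# Names taken from the steps above.
IMAGE="neovim:latest"
CONTAINER="neovim"

# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the build -> run -> commit -> clean-up cycle.
bake_plugins() {
    run docker build -t "$IMAGE" . || return 1
    # Interactive step: open nvim, run PackerSync, then exit the shell.
    # Its exit status is ignored because it reflects the last command typed.
    run docker run -it --name "$CONTAINER" "$IMAGE" || true
    # Snapshot the stopped container's file system onto the same tag.
    run docker commit "$CONTAINER" "$IMAGE" || return 1
    run docker container rm "$CONTAINER"
}
```

Setting DRY_RUN=1 before calling bake_plugins prints the four docker commands in order instead of running them, which is a cheap way to check the names before committing anything.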
Pandas
With Neovim, personal configs, plugins, language servers, and Python (comes with the base image!) all good to go, setting up the toolchain for pandas development is fairly straightforward:
- Install mamba
- Create virtual environment
- Enter virtual environment
- Build pandas from source
- Start working
Steps 1 and 2 are purely file changes, so we can use the docker commit trick to preserve them. There are two reasons why I didn’t put them in the Dockerfile:
- I want the Dockerfile to not be specific to any project
- The headless execution of mamba’s installation script is tedious work
Setting up the toolchain
Navigate to the pandas repository, then run the neovim:latest image with the repository mounted into the container:
# on the host, from pandas project root
docker run -it --rm --name "pandas_dev" -v "$(pwd)":/home/nvim/pandas neovim:latest
# inside the container, from the home directory
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh" \
&& bash Mambaforge-$(uname)-$(uname -m).sh \
&& source .bashrc \
&& cd pandas \
&& mamba env create
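Because the container was started with --rm, it is removed as soon as it exits, so the commit has to happen from a second terminal on the host while the container is still running. I will commit the container into a separate image tagged pandas-dev:latest: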
docker commit pandas_dev pandas-dev:latest
Validating the commit
Launch the container again, this time with the image we committed in the step above:
# from pandas project root
docker run -it --rm --name "pandas_dev" -v $(pwd):/home/nvim/pandas pandas-dev:latest
Then navigate into the pandas project, activate the virtual environment, build the project, and attempt to import pandas:
cd pandas \
&& mamba activate pandas-dev \
&& python setup.py build_ext -j 4 \
&& python -m pip install -e . --no-build-isolation --no-use-pep517 \
&& python -c "import pandas; print(pandas.__version__)"