Autorun Python Notebooks in AWS SageMaker
Problem Statement: Deploy machine learning models on AWS without getting into Docker, containers, and other advanced deployment tools.
I created a few Python notebooks, each containing one machine learning model prefaced by some feature engineering and data manipulation steps. Without getting into advanced deployment options, I just wanted to run all the Jupyter notebooks, within the SageMaker notebook instance, on a weekly basis. Of course, I did not want to babysit and manually run each notebook either.
Solution: As part of a bigger solution,
1. Lambda functions can auto-start the SageMaker notebook instance at a certain time
2. Lifecycle configuration can execute the notebooks on start and then stop the SageMaker instance once execution is done
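For reference, the Lambda side of step 1 boils down to a single SageMaker API call. A minimal sketch using the AWS CLI (MyNotebookInstance is a placeholder name, not from this post):

```shell
# Start the notebook instance; the lifecycle configuration's on-start
# script then takes over and runs the notebooks.
# "MyNotebookInstance" is a placeholder for your instance name.
aws sagemaker start-notebook-instance \
    --notebook-instance-name MyNotebookInstance
```

A scheduled EventBridge rule can trigger a Lambda that makes this same call weekly.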
In this blog post, I am capturing how to use the Lifecycle configuration.
There are two parts to the Lifecycle configuration:
1. Run the notebooks on start: Use nbconvert to run multiple Python notebooks within the SageMaker instance
2. Stop the SageMaker notebook: Use the auto-stop-idle configuration that is widely available on GitHub and AWS blogs. However, I intend to capture some nuances.
Part 1: Run the notebook on start
Some housekeeping edits are needed in the notebooks, since these operations are not supported by nbconvert and lifecycle configurations:
1. Take out all package installations from your notebook cells
2. Remove matplotlib and all related visualizations
Lifecycle configuration script
set -e
ENVIRONMENT=python3

# Declare all the Jupyter notebooks that need to run within the SageMaker instance
FILE1="/home/ec2-user/SageMaker/Notebook1.ipynb"
FILE2="/home/ec2-user/SageMaker/Notebook2.ipynb"
FILE3="/home/ec2-user/SageMaker/Notebook3.ipynb"

# Activate the Python environment. The lifecycle configuration cannot autodetect it
source /home/ec2-user/anaconda3/bin/activate "$ENVIRONMENT"

# Install packages here instead of putting them inside the notebooks
pip install --upgrade pip
pip install PyAthena

# Execute the notebooks in the background
nohup jupyter nbconvert "$FILE1" "$FILE2" "$FILE3" --ExecutePreprocessor.kernel_name=python3 --to notebook --inplace --ExecutePreprocessor.timeout=7200 --execute &

# Deactivate the Python environment
source /home/ec2-user/anaconda3/bin/deactivate
Decoding the script
nohup
nohup jupyter nbconvert "$FILE1" "$FILE2" "$FILE3" --ExecutePreprocessor.kernel_name=python3 --to notebook --inplace --ExecutePreprocessor.timeout=7200 --execute &
A lifecycle configuration script has 5 minutes to finish; if it does not, the notebook instance fails to start. Using "nohup" runs the nbconvert job in the background, so the script can exit on time while the notebooks continue executing. If your notebooks take less than 5 minutes to run, you can ignore this.
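To see why backgrounding matters, here is a small self-contained sketch, with sleep standing in for the long nbconvert job: the script exits almost immediately even though the job keeps running.

```shell
#!/bin/bash
start=$(date +%s)

# Stand-in for the long-running nbconvert command: nohup + & detaches it,
# so this script finishes well inside the 5-minute lifecycle limit.
nohup sleep 10 >/tmp/job.log 2>&1 &

end=$(date +%s)
echo "script exited after $((end - start))s while the job keeps running"
```

Without the trailing &, the script would block for the full duration of the job.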
nbconvert
nohup jupyter nbconvert "$FILE1" "$FILE2" "$FILE3" --ExecutePreprocessor.kernel_name=python3 --to notebook --inplace --ExecutePreprocessor.timeout=7200 --execute &
Executes the notebooks. nbconvert's default output format is HTML, but the "--to notebook" flag here saves the executed result as a notebook instead. To run the notebooks sequentially, club them all into one nbconvert command; to run them in parallel, use multiple nbconvert statements.
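For the parallel variant, one plausible sketch (reusing the same file variables from the script above) backgrounds a separate nbconvert process per notebook:

```shell
# One nbconvert process per notebook runs them concurrently;
# each process writes to its own log file for easier debugging.
nohup jupyter nbconvert "$FILE1" --ExecutePreprocessor.kernel_name=python3 --to notebook --inplace --execute > /home/ec2-user/SageMaker/nb1.log 2>&1 &
nohup jupyter nbconvert "$FILE2" --ExecutePreprocessor.kernel_name=python3 --to notebook --inplace --execute > /home/ec2-user/SageMaker/nb2.log 2>&1 &
nohup jupyter nbconvert "$FILE3" --ExecutePreprocessor.kernel_name=python3 --to notebook --inplace --execute > /home/ec2-user/SageMaker/nb3.log 2>&1 &
```

Note that parallel runs only make sense when the notebooks are independent of each other and the instance has enough memory for all kernels at once.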
to notebook and inplace
nohup jupyter nbconvert "$FILE1" "$FILE2" "$FILE3" --ExecutePreprocessor.kernel_name=python3 --to notebook --inplace --ExecutePreprocessor.timeout=7200 --execute &
Updates the same notebook in place once execution finishes. Without "--inplace", nbconvert saves the result in a separate copy of the .ipynb notebook.
timeout
nohup jupyter nbconvert "$FILE1" "$FILE2" "$FILE3" --ExecutePreprocessor.kernel_name=python3 --to notebook --inplace --ExecutePreprocessor.timeout=7200 --execute &
The arguments passed to ExecutePreprocessor are configuration options called traitlets. The timeout traitlet defines the maximum time (in seconds) each notebook cell is allowed to run; if execution takes longer, an exception is raised. The default is 30 seconds, so for long-running cells you may want to specify a higher value. The timeout option can also be set to None or -1 to remove any restriction on execution time.
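For example, to lift the per-cell limit entirely (same command as above, with the timeout traitlet changed):

```shell
# timeout=-1 disables the per-cell execution time limit
jupyter nbconvert "$FILE1" --ExecutePreprocessor.kernel_name=python3 --to notebook --inplace --ExecutePreprocessor.timeout=-1 --execute
```

This is convenient for unpredictable runtimes, but a finite timeout is safer: a hung cell will otherwise run until the auto-stop script shuts the instance down.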
For further customization, detailed documentation on nbconvert is available online.
Debugging errors
1. Track the logs in CloudWatch by clicking "View logs" below the Lifecycle configuration
2. Run the commands in the following order one by one in the terminal
$ cd /home/ec2-user/SageMaker
$ source /home/ec2-user/anaconda3/bin/activate python3
$ pip install <packages>
$ jupyter nbconvert TestNotebook.ipynb --ExecutePreprocessor.kernel_name=python3 --to notebook --inplace --execute
$ source /home/ec2-user/anaconda3/bin/deactivate
Part 2: Stop the Sagemaker notebook
Appending the following to the lifecycle configuration will stop the notebook instance after it has been idle for the number of seconds specified in IDLE_TIME.
IDLE_TIME=7200 # 2 hrs
echo "Fetching the autostop script"
wget https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-notebook-instance-lifecycle-config-samples/master/scripts/auto-stop-idle/autostop.py
echo "Starting the SageMaker autostop script in cron"
(crontab -l 2>/dev/null; echo "*/1 * * * * /usr/bin/python $PWD/autostop.py --time $IDLE_TIME --ignore-connections") | crontab
The key here is to set the idle time to at least the total runtime of your notebooks. In my case, the notebooks take ~2 hours to run, so I set IDLE_TIME to 7200 seconds.
Bad character error
You may be wondering where the ^M and \r showing up in your logs are coming from. It is not your code; it is a Unix/Windows line-ending mismatch, where the Windows newline (CRLF) is not interpreted well by Unix. There are a few ways to deal with this: use Visual Studio Code or Notepad++ to save the script with Unix line endings, or copy the code directly from GitHub.
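If editing the file elsewhere is not an option, the stray carriage returns can also be stripped on the instance itself with sed (the file path below is just an example):

```shell
# Create a demo file with Windows (CRLF) line endings
printf 'echo hello\r\necho world\r\n' > /tmp/on-start.sh

# Strip the trailing \r from every line (same effect as dos2unix)
sed -i 's/\r$//' /tmp/on-start.sh

# Verify that no carriage returns remain
if grep -q $'\r' /tmp/on-start.sh; then echo "still has CR"; else echo "clean"; fi
# prints "clean"
```

The dos2unix utility does the same job if it is installed on the instance.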