Sunday, September 2, 2018

Jupyter Notebook - Tricks

Magics

You probably know that you can start notebooks with different kernels (e.g., R, Julia) — not just Python. What you might not know is that even within a notebook, you can run different types of code in different cells. With “magics”, it is possible to use different languages. The magics that are available vary per notebook kernel, however.
By running %lsmagic in a cell you get a list of all the available magics. A single % starts a line magic, which applies to the rest of that line. A double %% starts a cell magic, which applies to the whole cell.
Some of my favorites are:
  • %env to list your environment variables.
  • !: to run a shell command. E.g., !pip freeze | grep pandas to see what version of pandas is installed.
  • %matplotlib inline to show matplotlib plots inline in the notebook.
  • %pastebin 'file.py' to upload code to pastebin and get the URL returned.
  • %%bash to run the cell with bash in a subprocess.
  • %time to time a single evaluation of whatever you run.
  • %%latex to render cell contents as LaTeX.
  • %timeit to run whatever you evaluate multiple times and summarize the timings (the best or the average, depending on your IPython version).
  • %prun, %lprun, and %mprun to profile a function or script. %prun gives a function-level breakdown of time, while %lprun and %mprun (from the line_profiler and memory_profiler extensions) give line-by-line breakdowns of time and memory usage. See a good tutorial here.
  • %%HTML to render the cell as HTML. So you can even embed an image or other media in your notebook.
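Outside a notebook, the idea behind %timeit can be approximated with Python's standard timeit module. The following is a minimal sketch, not what IPython does internally. The statement being timed and the repeat counts are arbitrary choices made for illustration.

```python
import statistics
import timeit


def time_statement(stmt, repeat=5, number=1000):
    """Return (best, mean) seconds per loop for `stmt`, similar in spirit to %timeit."""
    # Each entry is the total time for `number` executions of the statement.
    totals = timeit.repeat(stmt, repeat=repeat, number=number)
    per_loop = [total / number for total in totals]
    return min(per_loop), statistics.mean(per_loop)


best, mean = time_statement("sum(range(100))")
print(f"best: {best * 1e6:.2f} us per loop, mean: {mean * 1e6:.2f} us per loop")
```

Reporting both numbers mirrors the list item above: the best run is the least disturbed by other processes, while the mean shows typical behavior.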
You can even use magics to mix languages in a single notebook. For example, rmagics lets you run R code (including plotting) in a Python notebook. Note that you first need to load the rmagics extension, which is provided by the rpy2 package (%load_ext rpy2.ipython).
As described in the rmagics documentation, you can use %Rpush and %Rpull to move values back and forth between R and Python.
You can find other examples of language-magics online, including SQL magics and Cython magics. You can read about more common magics here. Seriously, you could spend an entire day learning about these!

Pipelines

Magics are handy on their own, but they really shine when you combine them. These functions can help you create pipelines in one visual flow by combining steps in different languages. Getting familiar with magics gives you the power to use the most efficient solution per subtask and bind them together for your project.
When used this way, Jupyter notebooks become “visual shell scripts” tailored for data science work. Each cell can be a step in a pipeline that can use a high-level language directly (e.g., R, Python), or a lower-level shell command. At the same time, your “script” can also contain nicely formatted documentation and visual output from the steps in the process. It can even document its own performance, automatically recording CPU and memory utilization in its output.
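In a notebook, the shell step would typically be a ! command or a %%bash cell. As a rough plain-Python sketch of the same idea, the code below runs an external command with subprocess and hands its output to a Python step. The command (a child Python process printing three made-up lines) is only a stand-in chosen so the example runs anywhere, and the function names are hypothetical.

```python
import subprocess
import sys
from collections import Counter


def shell_step(argv):
    """Run an external command and return its standard output as a list of lines."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


def python_step(lines):
    """Count how often each line occurs in the shell step's output."""
    return Counter(lines)


# Stand-in for a real shell command: a child Python process that prints three lines.
lines = shell_step(
    [sys.executable, "-c", "print('pandas'); print('numpy'); print('pandas')"]
)
counts = python_step(lines)
print(counts.most_common())
```

Keeping each step as a small function with plain inputs and outputs is what makes it easy to swap a step for a different tool later.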

Batch, scheduling, and reports

Like any other Python script, a notebook can also be run in batch mode. With nbconvert, you can execute an entire notebook non-interactively, saving it in place or converting it to a variety of other formats.
This capability makes notebooks a powerful tool for ETL and for reporting. For a report, just schedule your notebook to run automatically on a recurring basis, and have it update its contents or email its results to colleagues. Or, using the magics techniques described above, a notebook can implement a data pipeline or ETL task that runs on an automatic schedule as well.
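As a small sketch of what a batch run might look like, the helper below builds a jupyter nbconvert command line that executes a notebook and either converts it or saves it in place. The helper name and the notebook file name are hypothetical. The flags used (--to, --execute, --inplace) are standard nbconvert options.

```python
import shlex


def nbconvert_command(notebook, to="html", execute=True, inplace=False):
    """Build the argument list for a non-interactive `jupyter nbconvert` run."""
    argv = ["jupyter", "nbconvert", "--to", to]
    if execute:
        argv.append("--execute")  # run all cells before converting
    if inplace:
        argv.append("--inplace")  # overwrite the input file (use with to="notebook")
    argv.append(notebook)
    return argv


# "report.ipynb" is a hypothetical file name used only for illustration.
cmd = nbconvert_command("report.ipynb", to="notebook", inplace=True)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

A scheduler such as cron could then invoke a script like this on whatever interval the report needs.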

Scheduled dashboard

Let’s say that you have to regularly send a folium map to your colleague’s email with all the earthquakes of the past day.
To be able to do that, you first need an earthquake data set that updates regularly (at least daily). A data feed that updates every 5 minutes can be found here. Then, you can use Jupyter to write the code to load this data and create the map.
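The loading step could look roughly like the sketch below. It assumes the feed is a GeoJSON FeatureCollection whose features carry "mag" and "place" properties. The format, the property names, and the sample data are all assumptions made for illustration, and the sample is made up rather than taken from a real feed.

```python
import json

# Made-up sample in GeoJSON FeatureCollection form; not real earthquake data.
SAMPLE_FEED = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"mag": 4.5, "place": "hypothetical place A"},
     "geometry": {"type": "Point", "coordinates": [-122.0, 37.5, 10.0]}},
    {"type": "Feature",
     "properties": {"mag": 2.1, "place": "hypothetical place B"},
     "geometry": {"type": "Point", "coordinates": [140.0, 36.0, 35.0]}}
  ]
}
"""


def extract_quakes(feed_text, min_magnitude=0.0):
    """Return (latitude, longitude, magnitude, place) tuples from a GeoJSON feed."""
    feed = json.loads(feed_text)
    quakes = []
    for feature in feed["features"]:
        # GeoJSON stores positions as [longitude, latitude, (optional) depth].
        lon, lat = feature["geometry"]["coordinates"][:2]
        props = feature["properties"]
        mag = props.get("mag")
        if mag is not None and mag >= min_magnitude:
            quakes.append((lat, lon, mag, props.get("place")))
    return quakes


for lat, lon, mag, place in extract_quakes(SAMPLE_FEED, min_magnitude=3.0):
    print(f"M{mag} at ({lat}, {lon}): {place}")
```

Each tuple could then become a marker on the map, for example with folium.Marker([lat, lon], popup=place).add_to(m) on a folium.Map instance m.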
Domino lets you schedule any script to run on a regular basis, and this works for ipynb files just like anything else. When it runs a scheduled execution of batchdemo.ipynb, Domino will execute the notebook and update its cells with the newest results.
Collaborators can visit the page to view the updated notebook in the browser, without running a Jupyter server. So your notebook has become a dashboard that’s always up to date.

Scheduled dashboard with magics and HTML export

A step further is to combine a magics pipeline with exporting the whole notebook as an HTML report. The next example first uses a shell script to retrieve a webpage (http://www.sfgate.com) and then visualizes its contents as a word cloud with Python. Then, as part of the scheduled run, the notebook is converted to an HTML page containing the results of the run. You can set up your scheduled runs to automatically email any results (e.g., your notebook rendered as HTML) to your colleagues.
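The Python half of such a pipeline needs word counts before anything can be drawn. Below is a minimal sketch of that step using only the standard library. The HTML is a made-up inline sample rather than a downloaded page, and the class and function names are hypothetical.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def word_frequencies(html, min_length=4):
    """Return a Counter of lower-cased words with at least `min_length` letters."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[a-z]+", " ".join(parser.chunks).lower())
    return Counter(w for w in words if len(w) >= min_length)


# Made-up page used only for illustration.
SAMPLE_HTML = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>Giants win</h1><p>The Giants win again in San Francisco.</p></body></html>"
)
freqs = word_frequencies(SAMPLE_HTML)
print(freqs.most_common(3))
```

The resulting mapping of words to counts is the kind of input a plotting library such as wordcloud accepts, for instance through its WordCloud.generate_from_frequencies method.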
When you finish your notebook with inline visualizations, create a shell script that is similar to:
jupyter nbconvert --to html pipelinedashboard.ipynb
After scheduling this shell script, the result will be a regular HTML version of the last run of your notebook.

Stay tuned for Part II, where we’ll explore creating interactive dashboards in Jupyter notebooks.

Saturday, September 1, 2018

Image Pyramids with Python and OpenCV

  • Part #1: Image Pyramids with Python and OpenCV.
  • Part #2: Sliding Windows for Image Classification with Python and OpenCV.
An “image pyramid” is a multi-scale representation of an image.
Utilizing an image pyramid allows us to find objects at different scales in an image. And when combined with a sliding window, we can find objects at various locations as well.
At the bottom of the pyramid we have the original image at its original size (in terms of width and height). And at each subsequent layer, the image is resized (subsampled) and optionally smoothed (usually via Gaussian blurring).
The image is progressively subsampled until some stopping criterion is met, normally that a minimum size has been reached and no further subsampling needs to take place.

Method #1: Image Pyramids with Python and OpenCV

The first method we’ll explore to construct image pyramids will utilize Python + OpenCV.
In fact, this is the exact same image pyramid implementation that I utilize in my own projects!
Let’s go ahead and get this example started. Create a new file, name it helpers.py, and insert the following code:
We start by importing the imutils package, which contains a handful of commonly used image processing convenience functions for resizing, rotating, translating, etc. You can read more about the imutils package here. You can also grab it off my GitHub. The package is also pip-installable (pip install imutils).
Next up, we define our pyramid function on Line 4. In addition to the image itself, this function takes two arguments. The first argument is the scale, which controls by how much the image is resized at each layer. A small scale yields more layers in the pyramid. And a larger scale yields fewer layers.
Secondly, we define the minSize, which is the minimum required width and height of the layer. If an image in the pyramid falls below this minSize, we stop constructing the image pyramid.
Line 6 yields the original image in the pyramid (the bottom layer).
From there, we start looping over the image pyramid on Line 9.
Lines 11 and 12 handle computing the size of the image in the next layer of the pyramid (while preserving the aspect ratio). The amount of resizing is controlled by the scale factor.
On Lines 16 and 17 we make a check to ensure that the image meets the minSize requirements. If it does not, we break from the loop.
Finally, Line 20 yields our resized image.
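The steps just described can be sketched as the generator below. This is a reconstruction from the description, not the original listing, so the line numbers cited in the text will not match it. The default values for scale and minSize are assumptions. To keep the sketch runnable with NumPy alone, a hypothetical nearest-neighbour helper, resize_to_width, stands in for the imutils resizing call.

```python
import numpy as np


def resize_to_width(image, width):
    """Nearest-neighbour resize that keeps the aspect ratio (stand-in for imutils)."""
    h, w = image.shape[:2]
    height = int(h * width / w)
    # Map every output row and column back to the nearest source index.
    rows = (np.arange(height) * h / height).astype(int)
    cols = (np.arange(width) * w / width).astype(int)
    return image[rows][:, cols]


def pyramid(image, scale=1.5, minSize=(30, 30)):
    # yield the original image (the bottom layer of the pyramid)
    yield image

    while True:
        # compute the width of the next layer and resize, keeping the aspect ratio
        w = int(image.shape[1] / scale)
        image = resize_to_width(image, w)

        # stop once the layer is smaller than the minimum (width, height)
        if image.shape[0] < minSize[1] or image.shape[1] < minSize[0]:
            break

        yield image


# Blank 400x300 test image; a real script would load one from disk instead.
image = np.zeros((300, 400, 3), dtype=np.uint8)
shapes = [layer.shape[:2] for layer in pyramid(image, scale=2)]
print(shapes)
```

Writing the function as a generator means each layer is produced only when the caller asks for it, so a consumer that stops early never pays for the smaller layers.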
But before we get into examples of using our image pyramid, let’s quickly review the second method.

Method #2: Image pyramids with Python + scikit-image

The second method to image pyramid construction utilizes Python and scikit-image. The scikit-image library already has a built-in method for constructing image pyramids called pyramid_gaussian, which you can read more about here.
Here’s an example of how to use the pyramid_gaussian function in scikit-image:
Similar to the example above, we simply loop over the image pyramid and make a check to ensure that the image has a sufficient minimum size. Here we specify downscale=2 to indicate that we are halving the size of the image at each layer of the pyramid.
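A loop of the kind just described might look like the following sketch. It assumes scikit-image and NumPy are installed. The random grayscale input and the 30-pixel minimum are assumptions chosen for illustration, not values taken from the original example.

```python
import numpy as np
from skimage.transform import pyramid_gaussian

# Random grayscale image with values in [0, 1); stands in for a real photo.
image = np.random.default_rng(0).random((128, 96))

kept = []
for layer in pyramid_gaussian(image, downscale=2):
    # stop once either dimension drops below the (assumed) 30-pixel minimum
    if layer.shape[0] < 30 or layer.shape[1] < 30:
        break
    kept.append(layer.shape)

print(kept)
```

Unlike a plain resize, pyramid_gaussian smooths each layer before subsampling, which is the optional Gaussian blurring step mentioned in the introduction.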

Image pyramids in action

Now that we have our two methods defined, let’s create a driver script to execute our code. Create a new file, name it pyramid.py, and let’s get to work:
We’ll start by importing our required packages. I put my personal pyramid function in a helpers sub-module of pyimagesearch for organizational purposes.
You can download the code at the bottom of this blog post for my project files and directory structure.
We then import the scikit-image pyramid_gaussian function, argparse for parsing command line arguments, and cv2 for our OpenCV bindings.
Next up, we need to parse some command line arguments on Lines 9-11. Our script requires only two switches: --image, which is the path to the image we are going to construct an image pyramid for, and --scale, which is the scale factor that controls how the image will be resized in the pyramid.
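A parser for those two switches could be sketched as follows. The short flags, the default scale of 1.5, and making --image required are assumptions. The example parses an explicit argument list with a hypothetical file name so that it runs without a real command line.

```python
import argparse


def build_parser():
    """Parser for the two switches described in the text: --image and --scale."""
    ap = argparse.ArgumentParser(description="Construct an image pyramid")
    ap.add_argument("-i", "--image", required=True, help="path to the input image")
    ap.add_argument("-s", "--scale", type=float, default=1.5, help="pyramid scale factor")
    return ap


# Parse an explicit argument list so the sketch runs without a real command line.
args = vars(build_parser().parse_args(["--image", "example.jpg", "--scale", "2"]))
print(args)
```

Declaring type=float lets argparse reject a non-numeric scale before any image is loaded.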
Line 14 then loads our image from disk.
We can start utilizing our image pyramid Method #1 (my personal method) on Lines 18-21, where we simply loop over each layer of the pyramid and display it on screen.
Then from Lines 27-34 we utilize the scikit-image method (Method #2) for image pyramid construction.
To see our script in action, open up a terminal, change directory to where your code lives, and execute the following command:
