Debug Python from the shell with ipdb

By: Jonathan Marcus

IDEs are great. I love PyCharm and VSCode as much as anyone. But vim, emacs, and nano have their place too – sometimes all you have is a shell, and you need to edit a file.

Same goes for Python debugging. Visual debuggers are wonderful but not always available. Here's a guide for what to do when your Python script breaks while you're SSHed into a host or working inside a Docker container.

Example Pandas error

Consider this toy script that builds a Pandas DataFrame, prints it, and computes a statistic from it.

import io, pandas as pd

df = pd.read_csv(io.StringIO("name,prc\napple,1\nbanana,2\ncarrot,3"))
total = df["price"].sum()
print(df)
print(f"Total price: {total}")

Running it produces an error:

$ python script.py
  File "script.py", line 4, in <module>
    total = df["price"].sum()
  ... [some internal pandas messages]
KeyError: 'price'

Drop into the debugger

pdb is your friend. ipdb is your best friend. Install it with pip:

$ pip install ipdb

Add breakpoint() to your Python code, just before the line that reads the data:

import io, pandas as pd

breakpoint()
df = pd.read_csv(io.StringIO("name,prc\napple,1\nbanana,2\ncarrot,3"))
total = df["price"].sum()
print(df)
print(f"Total price: {total}")

Set an environment variable, then run your script again to drop into the ipdb debugger:

$ export PYTHONBREAKPOINT=ipdb.set_trace
$ python ~/script.py
> /Users/jmarcus/script.py(4)<module>()
      3 breakpoint()
----> 4 df = pd.read_csv(io.StringIO("name,prc\napple,1\nbanana,2\ncarrot,3"))
      5 total = df["price"].sum()

ipdb>
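
For context on why the environment variable matters: breakpoint() does not start a debugger itself. It calls sys.breakpointhook(), and the default hook reads PYTHONBREAKPOINT to decide which callable to import and run (pdb.set_trace when the variable is unset). The sketch below illustrates that dispatch. The name recording_hook is hypothetical and stands in for ipdb.set_trace so the example runs without opening an interactive prompt.

```python
import sys

calls = []


def recording_hook(*args, **kwargs):
    """Hypothetical stand-in for a debugger entry point such as ipdb.set_trace."""
    calls.append((args, kwargs))


# Temporarily replace the hook that breakpoint() dispatches to.
original_hook = sys.breakpointhook
sys.breakpointhook = recording_hook
try:
    breakpoint()  # calls recording_hook instead of opening a debugger
finally:
    sys.breakpointhook = original_hook  # always restore the previous hook

print(len(calls))
```

In normal use you would leave sys.breakpointhook alone and let the environment variable pick the debugger.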

Step-by-step debugging

Start by listing the source code around the breakpoint:

ipdb> l
      1 import io, pandas as pd
      2 
      3 breakpoint()
----> 4 df = pd.read_csv(io.StringIO("name,prc\napple,1\nbanana,2\ncarrot,3"))
      5 total = df["price"].sum()
      6 print(df)
      7 print(f"Total price: {total}")

ipdb>

Next, reproduce the error. Use n to single-step twice: once to read in the data, and a second time to trigger the error:

ipdb> n
> /Users/jmarcus/script.py(5)<module>()
      4 df = pd.read_csv(io.StringIO("name,prc\napple,1\nbanana,2\ncarrot,3"))
----> 5 total = df["price"].sum()
      6 print(df)

ipdb> n
KeyError: 'price'
> /Users/jmarcus/script.py(5)<module>()
      4 df = pd.read_csv(io.StringIO("name,prc\napple,1\nbanana,2\ncarrot,3"))
----> 5 total = df["price"].sum()
      6 print(df)

ipdb>

Next, inspect the current state of the variables:

ipdb> df.columns
Index(['name', 'prc'], dtype='object')
ipdb> pp df
     name  prc
0   apple    1
1  banana    2
2  carrot    3
ipdb>

It now seems clear that my script had the wrong column name, price instead of prc. No problem, let's try a fix:

ipdb> df["price"].sum()
*** KeyError: 'price'
ipdb> df["prc"].sum()
6
ipdb>

Success! Next, modify the script to use the fix:

import io, pandas as pd

breakpoint()
df = pd.read_csv(io.StringIO("name,prc\napple,1\nbanana,2\ncarrot,3"))
total = df["prc"].sum()
print(df)
print(f"Total price: {total}")

Run the script again. At the breakpoint, use c to let the program continue to completion:

$ python ~/script.py
> /Users/jmarcus/script.py(4)<module>()
      3 breakpoint()
----> 4 df = pd.read_csv(io.StringIO("name,prc\napple,1\nbanana,2\ncarrot,3"))
      5 total = df["prc"].sum()

ipdb> c
     name  prc
0   apple    1
1  banana    2
2  carrot    3
Total price: 6
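
If you would rather not stop at the breakpoint at all, for example when the same script runs unattended, Python treats PYTHONBREAKPOINT=0 as "ignore every breakpoint() call". Here is a minimal sketch of that behavior, assuming a throwaway one-line program in place of script.py:

```python
import os
import subprocess
import sys

# Copy the current environment and switch breakpoint() off for the child process.
env = dict(os.environ, PYTHONBREAKPOINT="0")

# The inline program is a hypothetical stand-in for script.py.
result = subprocess.run(
    [sys.executable, "-c", "breakpoint(); print('done')"],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)

# The child ran straight through the breakpoint() call without pausing.
print(result.stdout.strip())
```

This leaves the breakpoint() line in the file for the next debugging session while letting routine runs pass through it.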

What we learned

The bug was simple, but solving it with the ipdb debugger exercised the same skills you need for harder bugs.

  • Install ipdb. Use export PYTHONBREAKPOINT=ipdb.set_trace and breakpoint() to start an interactive shell.
  • l lists the source code.
  • n runs until the next line.
  • You can evaluate arbitrary Python expressions.
  • pp pretty prints objects.
  • c continues onward.
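
To practice these commands without typing them by hand, a session can be scripted. The sketch below uses the standard library's pdb rather than ipdb, an assumption made so it runs with no extra install; n, p, and c are standard pdb commands that ipdb accepts as well. The function total_price is hypothetical and stands in for the script.

```python
import io
import pdb


def total_price(prices):
    """Hypothetical stand-in for the script's computation."""
    total = sum(prices)
    return total


# Commands the debugger reads instead of keyboard input:
# n (run the current line), p total (print a variable), c (continue).
commands = io.StringIO("n\np total\nc\n")
transcript = io.StringIO()

# readrc=False keeps any personal .pdbrc from changing the session.
debugger = pdb.Pdb(stdin=commands, stdout=transcript, readrc=False)
result = debugger.runcall(total_price, [1, 2, 3])

print(transcript.getvalue())  # the recorded debugger session
print(result)
```

The transcript shows the debugger stopping on the first line of the function, stepping once, printing total, and then continuing to the end.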

For more ipdb commands, check out this ipdb cheat sheet.

Did I miss your favorite ipdb command? Let me know on Twitter (@jm_conducto).

Disclaimer: I've done this debug cycle a lot. I founded Conducto, a pipeline tool that makes it easy to 1) reproduce errors in an interactive shell, 2) debug with ipdb, 3) write your fix using your native editor, and 4) update your pipeline and continue without starting over.