Debugging

Overview

When you have a task that needs a pipeline, one of the first steps is to name the subtasks that you want it to perform. You might name subsubtasks and subsubsubtasks too--whatever makes it easier to communicate the pipeline's state.

Seeing which nodes succeed and which nodes fail is often enough context to work with. The name of a failed parent should tell you what went wrong, and looking at the state of its children should tell you why. That's the power of organizing your computation into pipelines: knowledge at a glance.

But if you're looking at an Exec node (which has no children) and you need to go deeper still, then this page is for you. It will explain both types of debug sessions that Conducto provides and prepare you to choose the right one.

If you want to follow along, make sure that you can run Conducto pipelines and then run the commands below:

git clone https://github.com/conducto/examples
cd examples/eratosthenes
python pipeline.py primes_less_than 30 --run --local

Too Close for Pipelines, Switching to Shells

For Conducto to be as flexible as it is, there has to be some point where it lets go of control so you can work your magic. That point is the interface between containers and processes. Managing containers is Conducto's job; starting processes is up to you.

If you're hunting a bug at the process level, a pipeline probably isn't the best fit. There's a better paradigm for that: the interactive shell. Each Exec node in a Conducto pipeline provides debug commands which will give you a shell in a container for that node.

A Prime Example

Let's debug a failed pipeline.

A failed test, indicating a problem with an earlier node

The failing node is called "is 2 included?", and the output of the "find primes" node shows that 2 doesn't appear. That looks like the problem, but why?

We can see from the command that find primes runs a Python script called sieve.py, which probably has the answer. And since the pipeline definition includes:

img = co.Image(copy_dir=".")

...we can expect it to be in the same directory as the pipeline definition. Open it with your editor of choice.

# https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes
primes = []
for i in range(3, num):
    if all([i % p for p in primes]):
        primes.append(i)

You don't have to read the Wikipedia page in that comment too closely to realize that it's full of listings that start with 2, but the for-loop in sieve.py starts with 3. Maybe the original author thought that 3 was the first prime number (an easy mistake to make).
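To see why the starting value matters, here is a small sketch that wraps the same loop in a function. The name trial_division and the start parameter are ours, added for illustration; they are not part of sieve.py.

```python
def trial_division(num, start):
    """Collect every i in [start, num) that no earlier collected value divides."""
    primes = []
    for i in range(start, num):
        # i % p is 0 (falsy) when p divides i, so all(...) means "nothing divides i"
        if all(i % p for p in primes):
            primes.append(i)
    return primes


print(trial_division(30, start=3))  # [3, 4, 5, 7, 11, 13, 17, 19, 23, 29]
print(trial_division(30, start=2))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Starting at 3 does more than drop 2: with 2 missing from the list, nothing divides 4, so 4 slips through as well.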

So we've got an idea about how to fix this, but before we make any changes let's start a debug session.

Debugging Live Code

We call code live if you can make a change and see the effects right away. It's more fun to build things with live code, so it's the default (but it's not always available).

Let's use a live debug session to fix the problem in the example from the previous section. Click the bug icon or use the dropdown; this will copy a command.

Two choices for debugging this exec node

Open a terminal, paste the command, and run it.

$ /path/to/python -m conducto livedebug --id=rmn-ujn --node='/find primes' --timestamp=0
    Launching docker container...
    Context will be mounted read-write
    Make modifications on your local machine, and they will be reflected in the container.
    Execute command by running sh /cmd.conducto

root@6e078f34eba8:/mnt/conducto# █

Before putting you in a shell, Conducto warns that there are local files mounted in the new container. That's what separates "Debug (Live Code)" from "Debug (Snapshot)".

So the sieve.py that's open in your editor is the sieve.py in the container, not a copy. Saving it in your editor will update it in the container too.

Before making any changes, let's look around a bit.

root@6e078f34eba8:/mnt/conducto# ls
    check.py  pipeline.py  sieve.py

root@6e078f34eba8:/mnt/conducto# cat /cmd.conducto
    python sieve.py 30

This node's image was configured with copy_dir=".", so pipeline.py and its siblings are in our working directory. (If this were a snapshot debug session, they would be copies.)

Also, /cmd.conducto contains the node's command, which calls sieve.py. Running it will recreate the node's behavior.

root@6e078f34eba8:/mnt/conducto# sh /cmd.conducto
3
4
5
...

That's the problematic output. We've recreated the bug, now let's make the change in our editor and save it.

primes = []
# for i in range(3, num):
for i in range(2, num):
    if all([i % p for p in primes]):
        primes.append(i)

In the debug shell, arrow-up to recall the previous command, and rerun it with your changes.

root@6e078f34eba8:/mnt/conducto# sh /cmd.conducto
2
3
5
...

Those look more like prime numbers; I think we fixed it. We should rerun the affected nodes just to be sure.

Our changes are now in that file, but the problematic node is still using the image from before. The "Rebuild and Reset" button will update the image and clear the node's state.

Trigger an image build and reset the node

Since we reset the child of a serial node, Conducto also resets the nodes that follow it, so they'll also rerun after the image builds.

The bug is fixed

Once they've completed, we can see that the bug is fixed.

Not Always Available

Conducto can only create containers with live code if it knows where to find that code in your filesystem. If you used copy_dir to place your files in the image, then Conducto just mounts that directory.

But if your files were copied into the image some other way, you might need to:

  • Get a copy of the files
  • Set path_map={"./path/from/definition/to/files":"."} when you initialize the image object for that node.

This will not change normal pipeline operation, but it will enable the "Live Code" option. You can read more about path_map in Images.

Not Always What You Want

If you want to make "what-if" changes and see their results, then live debug is usually the way to go. It removes the need to rebuild the image every time you want to see a change in action. But it comes with some risks.

For instance, if your local files mismatch the ones in the image, your debug container will behave differently than the node does. You could end up debugging an old version of the file.
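One way to rule out a mismatch is to compare a digest of the local file with a digest of the same file inside a snapshot debug container, where the files are copies taken from the image. The sketch below is only a suggestion: the helper names sha256_of and same_contents are ours, and it assumes Python is available in both places.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def same_contents(path_a, path_b):
    """True if both files hold byte-for-byte identical contents."""
    return sha256_of(path_a) == sha256_of(path_b)
```

Running sha256_of("sieve.py") on your machine and again in the snapshot container should print the same string if the two files match.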

Another potential hazard is that sh /cmd.conducto might modify or create a file during a live debug session. Since a local directory was mounted, the change will persist after the session ends. This can be especially confusing because the owner and group of a file created in a container might not jibe with your local operating system's configuration.
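If you suspect that a live session left files behind, a quick scan for entries that your user doesn't own can help find them. This is a minimal sketch for POSIX systems; the function name files_not_owned_by is ours, and it relies only on the standard library.

```python
import os


def files_not_owned_by(root, uid=None):
    """List paths under root whose owner differs from uid (default: current user)."""
    uid = os.getuid() if uid is None else uid
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat inspects the entry itself rather than following symlinks
            if os.lstat(path).st_uid != uid:
                found.append(path)
    return sorted(found)
```

Calling files_not_owned_by(".") from the mounted directory after a session would list, for example, any files that a process running as root in the container created there.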

These are easy things to deal with if you're aware that they're happening. But if you're not making "what-if" changes, they can be avoided altogether by choosing a snapshot debug session instead.

Debugging a Snapshot

Sometimes you don't need to make changes that live longer than your debug session. If all you want is answers, it may be simpler to debug in a snapshot instead.

The idea behind the word "snapshot" is that a pipeline running from start to finish is like a story. If so, then whenever a node's container completes, the story advances a bit. So if you pluck a single container out of that story and run it in isolation, you're running a snapshot.

In the example above, there was a node called check distribution which succeeded each time we saw it. A test that always passes is pointless, so let's give it some scrutiny and make sure that it's actually helping us.

A passing test

Nagura's Theorem, hmm, sounds complicated. Rather than scrutinizing the code too closely, let's just tamper with the input and see if we can make it fail.

There's no need to involve live code in this case, so we'll copy a command for a snapshot debug session and paste it into a terminal.

❯ /path/to/python -m conducto debug --id=rmn-ujn --node='/check distribution' --timestamp=0
    Launching docker container...
    Execute command by running sh /cmd.conducto

root@838a92506168:/mnt/conducto# ls
    check.py  pipeline.py  sieve.py

Like before, we see that the pipeline definition's siblings are here in the container. But since we're in a snapshot, these are copies of those files. Changing them in your local filesystem will have no effect here.

A quick look at check.py shows us that it reads numbers from stdin, one per line.

for numstr in sys.stdin.readlines():
    primes.append(int(numstr))
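The excerpt above depends on names defined elsewhere in check.py. A self-contained variation on the same pattern might look like the sketch below. The function name read_numbers is ours, and unlike the excerpt it skips blank lines; io.StringIO stands in for stdin so the example runs without a pipe.

```python
import io


def read_numbers(stream):
    """Parse one integer per line from a text stream, skipping blank lines."""
    numbers = []
    for line in stream:
        line = line.strip()
        if line:
            numbers.append(int(line))
    return numbers


fake_stdin = io.StringIO("2\n3\n5\n")
print(read_numbers(fake_stdin))  # [2, 3, 5]
```

Passing sys.stdin instead of the StringIO object would read whatever the previous command in a shell pipe prints.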

In our debug session, we can string it together with sieve.py.

root@838a92506168:/mnt/conducto# ./sieve.py 100 | ./check.py 100
Nagura's Theorem passes for n = 100

Ok, so it also passes with primes less than 100--but that's no surprise. Now let's remove a prime from the list and see if our check catches it.

root@838a92506168:/mnt/conducto# ./sieve.py 100 | grep -v 37 | ./check.py 100
    Nagura's Theorem fails for n = 100

root@838a92506168:/mnt/conducto# echo $?
    2

Ok, so the test isn't useless: It failed when a prime went missing. What if the input is too dense, rather than too sparse?

root@838a92506168:/mnt/conducto# seq 100 | ./check.py 100
Nagura's Theorem passes for n = 100

It looks like Nagura is less useful for catching false positives. If we're worried that sieve.py might produce too much data, it might be a good idea to add an additional check to this pipeline.
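Nagura's theorem says that for every n >= 25 there is a prime p with n < p < 6n/5, so a check built on it can only notice gaps. The body of check.py isn't reproduced on this page, so the sketch below is not its implementation; it's our guess at how a gap-based check could behave, paired with one possible additional check. All function names here are ours.

```python
def is_prime(n):
    """Trial division; fine for the small inputs used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def gap_check(numbers, limit):
    """True if every interval (m, 6m/5) with m >= 25 and 6m/5 <= limit
    contains at least one of the listed numbers."""
    listed = set(numbers)
    m = 25
    while 6 * m <= 5 * limit:
        # candidates are the integers p with m < p and 5p < 6m
        if not any(p in listed for p in range(m + 1, limit + 1) if 5 * p < 6 * m):
            return False
        m += 1
    return True


def non_primes_in(numbers):
    """Return the listed numbers that are not prime."""
    return [n for n in numbers if not is_prime(n)]


primes = [n for n in range(100) if is_prime(n)]
print(gap_check(primes, 100))                            # True
print(gap_check([p for p in primes if p != 37], 100))    # False
print(gap_check(range(1, 101), 100))                     # True
print(non_primes_in(range(1, 101))[:5])                  # [1, 4, 6, 8, 9]
```

A dense list can never have a gap, which is why the seq 100 input passes. A check like non_primes_in covers the opposite failure: it flags every entry that shouldn't be in the list at all.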

Since we had no intention of modifying code, debugging a snapshot was the simpler way to come to this conclusion.

Conclusion

A well-made pipeline makes it easy to reason about many custom-sized chunks of computation. But pipelines only succeed if their nodes succeed, and they're not very helpful for understanding the inner workings of a single node.

To address this, Conducto provides commands that will create containers that mimic a node's behavior. These commands will put you in an interactive shell.

If you want to tinker, ask Conducto for live code and use your favorite tools to get to the bottom of things. If you want to study the bug in its native habitat, ask Conducto for a snapshot--the container you get will be nearly identical to the ones it creates for running your commands.

We hope that these features help you understand your pipeline better or find the bug that you're chasing. Happy hunting.
