Does the Darjeeling/BugZoo infrastructure support more than one simultaneous process? #10

#1

Yesterday, on my host machine, I launched two Darjeeling repairs, each with its own BugZoo container and Darjeeling directory. Looking at the results and the internal debug logs I enabled, it appears the BugZoo manager shut down the daemon while another Darjeeling/Kaskara process was still running. I've run Darjeeling on a single test multiple times and never saw the error below. I'll rerun the simultaneous scenario specifically and see what happens.

bugzoo internal log:

2019-06-25 14:35:51:bugzoo.mgr.source:INFO: refreshed sources
2019-06-25 14:35:51:bugzoo.server:INFO: launched BugZoo daemon
2019-06-25 14:35:51:bugzoo.server:INFO: resource limits:
  * CPU time (seconds): (-1, -1)
  * Heap size (bytes): (-1, -1)
  * Num. process: (128118, 128118)
  * Num. files: (1024, 4096)
  * Address space: (-1, -1)
  * Locked address space: (16777216, 16777216)
2019-06-25 14:35:51:bugzoo.server:INFO: system resources:
  * CPU cores: 6 physical, 12 logical
  * CPU frequency: 3.80 GHz
  * virtual memory: 31.34 GB
  * swap memory: 2.00 GB (2.00 GB free)
  * disk space: 329.09 GB (206.89 GB free)
2019-06-25 14:35:51:bugzoo.manager:INFO: Shutting down daemon...
2019-06-25 14:35:51:bugzoo.manager:INFO: Shut down daemon

kaskara internal log:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/bss-lab-1/Darjeeling/djling_venv/lib/python3.6/site-packages/darjeeling/evaluator.py", line 285, in evaluate
    outcome = self._evaluate(candidate)
  File "/home/bss-lab-1/Darjeeling/djling_venv/lib/python3.6/site-packages/darjeeling/evaluator.py", line 274, in _evaluate
    del bz.containers[container.id]
  File "/home/bss-lab-1/Darjeeling/djling_venv/lib/python3.6/site-packages/bugzoo/client/container.py", line 61, in __delitem__
    r = self.__api.delete('containers/{}'.format(uid))
  File "/home/bss-lab-1/Darjeeling/djling_venv/lib/python3.6/site-packages/bugzoo/client/api.py", line 126, in delete
    return requests.delete(url, **kwargs)
  File "/home/bss-lab-1/Darjeeling/djling_venv/lib/python3.6/site-packages/requests/api.py", line 158, in delete
    return request('delete', url, **kwargs)
  File "/home/bss-lab-1/Darjeeling/djling_venv/lib/python3.6/site-packages/requests/api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/bss-lab-1/Darjeeling/djling_venv/lib/python3.6/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/bss-lab-1/Darjeeling/djling_venv/lib/python3.6/site-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/home/bss-lab-1/Darjeeling/djling_venv/lib/python3.6/site-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=6060): Max retries exceeded with url: /containers/4eabf7ed-b89b-4555-a71a-ba6973bbc69a (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f0ca469fb70>: Failed to establish a new connection: [Errno 111] Connection refused',))
#2

I haven't tried to do this before, but as long as the Darjeeling instances are connected to different BugZoo servers, this shouldn't be an issue. How did you launch the Darjeeling and BugZoo processes?

I had two xterms open in two different directories, each running Darjeeling on a different BugZoo program.
One Darjeeling invocation died with the errors in the logs I shared, while the other found a repair.
So far I have only seen that error once; I haven't tried to reproduce it, since I was making progress on another front.
I didn't invoke any BugZoo process other than bugzoo bug build for those two programs; that is, I didn't launch a separate BugZoo server. I did prune Docker images before launching, but that shouldn't affect this scenario.

#4

So, by default, Darjeeling spins up a temporary BugZoo server on a predetermined port. When a second Darjeeling instance is launched, it tries to launch its own BugZoo server on the same port already used by the first instance, and failure ensues.
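
A minimal, stand-alone illustration of why the second server fails, using plain Python sockets rather than BugZoo's own code: once one process has bound a port, a second attempt to bind the same port is rejected with EADDRINUSE. The helper names here are hypothetical.

```python
import errno
import socket


def try_bind(host: str, port: int) -> socket.socket:
    """Bind and listen on (host, port); raises OSError if the port is taken."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen()
    except OSError:
        sock.close()
        raise
    return sock


def demo_port_clash(host: str = "127.0.0.1") -> bool:
    """Return True if a second bind on an already-bound port is rejected."""
    # Stand-in for the first instance's ephemeral server. Port 0 asks the OS
    # for any free port, so this demo never collides with real services.
    first = try_bind(host, 0)
    port = first.getsockname()[1]
    try:
        # Stand-in for the second instance reusing the same fixed port.
        second = try_bind(host, port)
    except OSError as err:
        return err.errno == errno.EADDRINUSE
    else:
        second.close()
        return False
    finally:
        first.close()


if __name__ == "__main__":
    print(demo_port_clash())
```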

Looking at the source code, it seems that Darjeeling doesn't allow the user to specify the port used by the ephemeral BugZoo server: https://github.com/squaresLab/Darjeeling/blob/master/src/darjeeling/cli/init.py#L304
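
For illustration only, a user-specified port (or the URL of an existing server) could be exposed roughly as below. This is a hypothetical argparse sketch, not Darjeeling's actual CLI: the flag names mirror the proposals in this thread, and the 6060 default is simply the port that appears in the traceback above.

```python
import argparse
from typing import List

# Assumed default: 6060 is the port seen in the traceback above.
DEFAULT_BUGZOO_PORT = 6060


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing the two proposed BugZoo options."""
    parser = argparse.ArgumentParser(prog="darjeeling-sketch")
    # Mutually exclusive: either launch an ephemeral server on a chosen
    # port, or connect to an already-running (possibly shared) server.
    group = parser.add_mutually_exclusive_group()
    group.add_argument(
        "--bugzoo-port", type=int, default=DEFAULT_BUGZOO_PORT,
        help="port for the ephemeral BugZoo server (hypothetical flag)")
    group.add_argument(
        "--bugzoo-url", default=None,
        help="URL of an already-running BugZoo server (hypothetical flag)")
    return parser


def resolve_bugzoo_url(argv: List[str]) -> str:
    """Return the BugZoo URL a (hypothetical) Darjeeling run would use."""
    args = build_parser().parse_args(argv)
    if args.bugzoo_url is not None:
        return args.bugzoo_url  # reuse a shared server
    return "http://127.0.0.1:{}".format(args.bugzoo_port)  # ephemeral server
```

With something like this, the two xterms could each pass a different --bugzoo-port (say 6060 and 6061) instead of colliding on the same one.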

There are a few solutions to this:

  1. We add a --bugzoo-port option to the Darjeeling binary that allows the user to specify which port should be used by the ephemeral BugZoo server.
  2. We add a --bugzoo-url option to the binary to allow Darjeeling to connect to an already-running BugZoo server (that may be shared by multiple Darjeeling instances).
  3. We allow Darjeeling to use a range of ports (rather than a single port) to launch a BugZoo server. If a given port in the range is taken, then the next one will be tried.
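
Option 3 could be sketched as a simple scan over a port range. This is a minimal sketch built around a hypothetical first_free_port helper; Darjeeling does not necessarily work this way.

```python
import socket
from typing import Iterable


def first_free_port(ports: Iterable[int], host: str = "127.0.0.1") -> int:
    """Return the first port in `ports` that can currently be bound on `host`.

    Raises RuntimeError if every port in the range is already taken.
    """
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # taken (or not permitted): try the next port
            return port  # the probe socket is closed on leaving the with-block
    raise RuntimeError("no free port in the requested range")


if __name__ == "__main__":
    # Hypothetical usage: scan ten ports starting at 6060 (the port in the traceback).
    print(first_free_port(range(6060, 6070)))
```

One caveat with this design: the probe socket is closed before the real server starts, so another process could still grab the port in between. Having the server bind directly inside the loop (or retrying on a bind failure) closes that window.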

Regardless of which option we pick, checking whether the port is already in use by another service (and whether that service is a BugZoo server) is probably good error checking to have.
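
That check could be split into two steps: is anything listening on the port, and does it answer like a BugZoo server? A rough sketch follows. The probe path /status is an assumption, not a documented BugZoo endpoint, so substitute one the server is known to expose.

```python
import http.client
import socket
import urllib.error
import urllib.request


def port_in_use(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising.
        return sock.connect_ex((host, port)) == 0


def looks_like_bugzoo(base_url: str, probe_path: str = "/status",
                      timeout: float = 0.5) -> bool:
    """Heuristic: does base_url answer the (assumed) probe path with a 2xx?

    ASSUMPTION: "/status" is a placeholder probe path, not a documented
    BugZoo endpoint; replace it with one the server is known to expose.
    """
    url = base_url.rstrip("/") + probe_path
    # Bypass any configured HTTP proxy, since we are probing a local port.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return 200 <= resp.getcode() < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Refused, timed out, non-HTTP reply, or an HTTP error status.
        return False


def classify_port(host: str, port: int) -> str:
    """Return "free", "bugzoo", or "other" for the given port."""
    if not port_in_use(host, port):
        return "free"
    if looks_like_bugzoo("http://{}:{}".format(host, port)):
        return "bugzoo"
    return "other"
```

A launcher could then start a new server only on "free", connect on "bugzoo", and fail with a clear message on "other".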
When I saw this behavior and looked at how the ephemeral server is invoked, my first thought was solution #2.
But #3 is the simplest workaround, and it doesn't require any behavioral change on the user's end. However, the BugZoo server overhead could potentially cause a slowdown (I'm not sure).