Improve Heat implementation error handling

Using Heat has increased the number of robot builders we can spawn by
33%, but we are now hitting some issues with Rackspace, so we need to
improve the error handling of our scripts for when Rackspace returns a
status such as "Error: None", or CREATE_IN_PROGRESS because a stack is
taking longer than expected to spawn. Rackspace has told us that we
cannot delete stacks that are in the CREATE_IN_PROGRESS or
DELETE_IN_PROGRESS states and that we should exit our code carefully.

This patch makes the following changes:

* Wait and query the stack create status every minute for 15 minutes
* Keep waiting while the status is CREATE_IN_PROGRESS and the timeout
  has not expired
* Continue with the job once stack create returns CREATE_COMPLETE
* Fail the job on CREATE_FAILED and clean up the stack
* Notify the publisher not to delete the stack when it is in
  CREATE_IN_PROGRESS or DELETE_IN_PROGRESS
* Improve delete-stale-stacks to search for inactive stacks that are
  not being used by either the releng or sandbox silos and remove them
* Delete stacks job will now run every hour to clean up orphaned systems
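
The create-status handling listed above can be sketched roughly as
follows. This is an illustration only, not the code in this patch: the
names wait_for_stack, get_status and delete_stack are hypothetical, and
treating an unexpected status such as "Error: None" as "keep polling"
is an assumption.

```python
import time

POLL_INTERVAL_SECONDS = 60  # query once a minute
MAX_POLLS = 15              # give up after roughly 15 minutes


def wait_for_stack(get_status, delete_stack, sleep=time.sleep,
                   polls=MAX_POLLS, interval=POLL_INTERVAL_SECONDS):
    """Return True once the stack reports CREATE_COMPLETE, else False.

    get_status and delete_stack are hypothetical callables standing in
    for whatever the real scripts use to talk to Heat.
    """
    for _ in range(polls):
        status = get_status()
        if status == "CREATE_COMPLETE":
            return True
        if status == "CREATE_FAILED":
            # A failed stack is safe to remove, so clean it up here.
            delete_stack()
            return False
        # CREATE_IN_PROGRESS, or an unexpected status (assumption):
        # wait and query again.
        sleep(interval)
    # Timed out while the stack may still be CREATE_IN_PROGRESS. It
    # must not be deleted in that state, so leave it for the hourly
    # cleanup job and fail the build.
    return False
```

Passing the sleep function in as a parameter keeps the sketch easy to
exercise without actually waiting 15 minutes.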

Change-Id: Ifdc927f601c07e519cdc502a2fb56fca138c659e
Also-by: Thanh Ha <thanh.ha@linuxfoundation.org>
Signed-off-by: Anil Belur <abelur@linuxfoundation.org>
Signed-off-by: Thanh Ha <thanh.ha@linuxfoundation.org>