Closed Bug 969707 Opened 10 years ago Closed 10 years ago

High pending for try linux-hp builds. Are we unintentionally ignoring builds with aws_watch_pending?

Categories

(Release Engineering :: General, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jlund, Assigned: jlund)

Details

Attachments

(3 files)

Report Pending is showing a consistently high number of pending builds for 'linux-hp'. I am assuming this is our pool of 11 bld-centos-hp slaves.

This high pending count (300-500) throughout the day seems abnormal. I wonder if aws_watch_pending is failing to spin up ec2 instances that could run some of the pending build variants.

I have not narrowed down which builder names are actually pending. My understanding is that some are only meant to run on our 'in-house' hp machines, but with so little pending on ec2 machines throughout the day, I'm questioning whether some builds are being unintentionally ignored by cloud tools.
Here we can see various spikes that seem to be expected on a Friday. But our linux-hp machines have been consistently staying at 300+.
logging chatter from irc:

[[17:30:36]] <catlee-away> | so linux-hp is poorly named
[[17:30:40]] <catlee-away> | it's not just the hp machines
[[17:31:04]] <jlund|build> | ahh

so correction to above: this is not just our 11 bld-centos-hp machines.

[[17:34:18]] <catlee-away> | is it try or non-try?
[[17:34:43]] <catlee-away> | watch pending uses http://hg.mozilla.org/build/cloud-tools/file/default/aws/configs/watch_pending.cfg to figure out what ec2 machines to start
[[17:35:05]] <catlee-away> | if a job doesn't match one of those, it won't get a machine started for it
[[17:36:39]] <        aki> | ah, http://hg.mozilla.org/build/cloud-tools/file/default/aws/configs/watch_pending.cfg#l15 doesn't have an up to date device list
[[17:37:56]] <catlee-away> | this is one of the reasons I want to have static dumps of build master configs that other tools can consume
[[17:37:59]] <        aki> | s,unagi|panda|otoro,emulator-jb|wasabi|nexus-4, and then allow for -debug maybe
[[17:39:05]] <        aki> | oh, and all the _eng'es
[[17:39:16]] <        aki> | yeah
[[17:39:45]] <        aki> | maybe renaming everything to a common format and having those dumps will be a faster route to success
[[17:44:33]] <        aki> | maybe step 1 is for watch_pending to list the builds it's ignoring

so it seems we might be missing all the recently added b2g builders. ec2 should be able to run these, so aws_watch_pending should pick them up.
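To illustrate the behaviour catlee describes, here is a minimal sketch of how a pattern map decides which pending builders get a machine and which are silently skipped. This is not the actual aws_watch_pending.py code: the function name `split_pending`, the use of `re.search`, and the sample builder names are assumptions for illustration; only the three patterns are taken from watch_pending.cfg as quoted in this bug.

```python
import re

# Assumed excerpt of a pattern -> instance-type map. The three patterns
# are the ones quoted from watch_pending.cfg in this bug.
BUILDER_MAP = {
    r"^Linux.* nightly": "bld-linux64",
    r"^Linux.* valgrind": "bld-linux64",
    r"^Linux.* try.*build": "try-linux64",
}


def split_pending(pending, builder_map):
    """Return (matched, ignored) for a list of pending builder names.

    matched maps a builder name to the instance type of the first
    pattern it matches; ignored lists the names that match no pattern.
    """
    matched = {}
    ignored = []
    for name in pending:
        for pattern, instance_type in builder_map.items():
            if re.search(pattern, name):
                matched[name] = instance_type
                break
        else:
            # No pattern matched: nothing would be started for this job.
            ignored.append(name)
    return matched, ignored


# Hypothetical builder names, for illustration only.
pending = [
    "Linux x86-64 mozilla-central nightly",
    "Linux try build",
    "b2g_mozilla-central_wasabi_dep",
]
matched, ignored = split_pending(pending, BUILDER_MAP)
print("matched:", matched)
print("ignored:", ignored)
```

With a map like this, any b2g builder whose device is missing from the config ends up in `ignored`, which is the list aki suggests watch_pending should report.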

catlee also made a point that pending graph may be mis-categorizing data so my graph in comment 1 may not give an accurate representation of what is going on.
Attached file aws_watch_logs3
So it looks like we already log the names of pending builders that do not match anything in BuilderMap: http://hg.mozilla.org/build/cloud-tools/file/44c2d285e62e/aws/aws_watch_pending.py#l463

Grepping for only those lines in /home/buildduty/logs/aws/aws_watch_pending.log
and concentrating only on Android, Linux, b2g, and Ubuntu (minus the duplicates, tests, and talos), I am left with the attached list.

I will extend what we already have on the b2g lines with what aki commented on in comment 2: http://hg.mozilla.org/build/cloud-tools/file/default/aws/configs/watch_pending.cfg#l15

But are there other Builders that jump out for anybody that should be handled by ec2 machines?

Flags: needinfo?(rail)
(In reply to Jordan Lund (:jlund) from comment #2)
> catlee also made a point that pending graph may be mis-categorizing data so
> my graph in comment 1 may not give an accurate representation of what is
> going on.

That's correct. My graphing hasn't kept up with recent builders either. I have some code which dumps out builder names and other assorted things from a master instance. I'll post that somewhere and let people know.
This patch:

- adds missing builders I found here: https://bug969707.bugzilla.mozilla.org/attachment.cgi?id=8374342

- removes device builds from try

- removes "\\S+" from "^b2g_try\\S+_linux(32|64)_gecko(-debug)?" pattern. I think we only need that if we don't know a given branch but here we know it is 'try'. By removing it, we pick up builder names like: "b2g_try_linux32_gecko"
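A quick check of the last point, as a sketch: in the JSON config the pattern is written with `\\S+`, which the regex engine sees as `\S+` (one or more non-whitespace characters). Because `\S+` needs at least one character between `b2g_try` and `_linux`, the old pattern cannot match a name where the two are adjacent. The variable names below are illustrative.

```python
import re

# Old pattern: requires at least one extra character after "b2g_try".
old = re.compile(r"^b2g_try\S+_linux(32|64)_gecko(-debug)?")
# New pattern: the branch is known to be "try", so no wildcard is needed.
new = re.compile(r"^b2g_try_linux(32|64)_gecko(-debug)?")

name = "b2g_try_linux32_gecko"
old_hit = old.match(name) is not None
new_hit = new.match(name) is not None

print("old pattern matches:", old_hit)
print("new pattern matches:", new_hit)
```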


Rail, how does this look?
Attachment #8374386 - Flags: review?(rail)
Assignee: nobody → jlund
Comment on attachment 8374386 [details] [diff] [review]
969707_watch_pending_ignoring_builds-cloud-tools-021114.diff

Review of attachment 8374386 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for the patch!

::: aws/configs/watch_pending.cfg
@@ +12,4 @@
>          "^Linux.* nightly": "bld-linux64",
>          "^Linux.* valgrind": "bld-linux64",
>          "^Linux.* try.*build": "try-linux64",
> +        "^b2g_(?!try)\\S+_(unagi|panda|otoro|leo|inari|inari_eng|hamachi|hamachi_eng|emulator|emulator-debug|emulator-jb|emulator-jb-debug|helix|helix_eng|leo|leo_eng|nexus-4|wasabi)_(dep|nightly)": "bld-linux64",

A nit, can you add _eng to all variations and also include periodic builds?
Something like
"^b2g_(?!try)\\S+_(unagi|panda|otoro|leo|inari|hamachi|emulator|helix|leo)(_eng)?_(dep|nightly|periodic)"
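As a sketch of how the suggested pattern behaves, the snippet below runs it against a few builder names. The builder names are hypothetical and chosen only to exercise the optional `_eng` suffix, the added `periodic` build type, and the `(?!try)` negative lookahead.

```python
import re

# The pattern from the review comment, unescaped from its JSON form.
suggested = re.compile(
    r"^b2g_(?!try)\S+_(unagi|panda|otoro|leo|inari|hamachi|emulator|helix|leo)"
    r"(_eng)?_(dep|nightly|periodic)"
)

# Hypothetical builder names, for illustration only.
names = [
    "b2g_mozilla-central_unagi_dep",
    "b2g_mozilla-central_hamachi_eng_periodic",
    "b2g_try_unagi_dep",
    "b2g_mozilla-central_emulator-debug_dep",
]

results = {name: suggested.match(name) is not None for name in names}
for name, hit in results.items():
    print(name, "->", hit)
```

The try builder is rejected by the lookahead. The `emulator-debug` name is also rejected, because the abbreviated device list in this "something like" pattern has no `-debug`, `-jb`, `nexus-4`, or `wasabi` alternatives; those would need to be carried over from the patch for the pattern to cover them.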
Attachment #8374386 - Flags: review?(rail) → review+
I think the r+ above addresses the needinfo flag as well.
Flags: needinfo?(rail)
This has been pulled into cruncher (where the watch_pending proc lives). I no longer see aws_watch_pending.log output about ignoring builders that we do not want ignored. The pending graph also shows a decline for linux scl3 machines.

I'm calling this resolved.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Component: Tools → General