Closed
Bug 969707
Opened 10 years ago
Closed 10 years ago
High pending for try linux-hp builds: are we unintentionally ignoring builds in aws_watch_pending?
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jlund, Assigned: jlund)
Details
Attachments
(3 files)
Report Pending is showing a consistently high number of pending builds for 'linux-hp'. I am assuming this is our pool of 11 bld-centos-hp slaves. This high pending count (300-500) throughout the day seems abnormal. I wonder if aws_watch_pending is not spinning up EC2 instances that can run some of these pending build variants. I have not yet narrowed down which builder names are actually pending. My understanding is that some are only meant to run on our 'in-house' HP machines, but with so little pending on EC2 machines throughout the day, I'm questioning whether some builds are unintentionally being ignored by cloud-tools.
Assignee
Comment 1•10 years ago
Here we can see various spikes, which seem to be expected on a Friday, but our linux-hp machines have consistently stayed above 300.
Assignee
Comment 2•10 years ago
Logging chatter from IRC:

[17:30:36] <catlee-away> | so linux-hp is poorly named
[17:30:40] <catlee-away> | it's not just the hp machines
[17:31:04] <jlund|build> | ahh so correction to above: this is not just our 11 bld-centos-hp machines.
[17:34:18] <catlee-away> | is it try or non-try?
[17:34:43] <catlee-away> | watch pending uses http://hg.mozilla.org/build/cloud-tools/file/default/aws/configs/watch_pending.cfg to figure out what ec2 machines to start
[17:35:05] <catlee-away> | if a job doesn't match one of those, it won't get a machine started for it
[17:36:39] <aki> | ah, http://hg.mozilla.org/build/cloud-tools/file/default/aws/configs/watch_pending.cfg#l15 doesn't have an up to date device list
[17:37:56] <catlee-away> | this is one of the reasons I want to have static dumps of build master configs that other tools can consume
[17:37:59] <aki> | s,unagi|panda|otoro,emulator-jb|wasabi|nexus-4, and then allow for -debug maybe
[17:39:05] <aki> | oh, and all the _eng'es
[17:39:16] <aki> | yeah
[17:39:45] <aki> | maybe renaming everything to a common format and having those dumps will be a faster route to success
[17:44:33] <aki> | maybe step 1 is for watch_pending to list the builds it's ignoring

So it seems we might be missing all of the recently added b2g builders. EC2 should be able to run these, so aws_watch_pending should pick them up. catlee also made a point that pending graph may be mis-categorizing data so my graph in comment 1 may not give an accurate representation of what is going on.
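catlee's description above (watch_pending only starts machines for jobs that match a pattern in watch_pending.cfg) and aki's "step 1" suggestion can be sketched roughly as follows. This is a minimal illustration, not the actual aws_watch_pending.py logic: it assumes the config maps regex patterns to pool names, that patterns are tried in order with the first match winning, and that `re.search` is used. The three patterns are the context lines quoted from watch_pending.cfg in comment 6; `assign_pools`, the printed message, and the sample builder names (other than b2g_try_linux32_gecko, from comment 5) are hypothetical.

```python
import re

# Regex pattern -> EC2 pool. These three entries are the context lines of the
# watch_pending.cfg hunk quoted in comment 6; the rest of the real config is omitted.
# Assumption: patterns are tried in order and the first match wins.
BUILDER_MAP = {
    r"^Linux.* nightly": "bld-linux64",
    r"^Linux.* valgrind": "bld-linux64",
    r"^Linux.* try.*build": "try-linux64",
}


def assign_pools(pending_builders, builder_map):
    """Group pending builder names by pool; return (pools, unmatched names)."""
    compiled = [(re.compile(pattern), pool) for pattern, pool in builder_map.items()]
    pools = {}
    unmatched = []
    for name in pending_builders:
        for regex, pool in compiled:
            if regex.search(name):
                pools.setdefault(pool, []).append(name)
                break
        else:
            # No pattern matched, so no EC2 instance would be started for this job.
            unmatched.append(name)
    return pools, unmatched


pending = [
    "Linux x86-64 try build",         # hypothetical builder name
    "Linux mozilla-central nightly",  # hypothetical builder name
    "b2g_try_linux32_gecko",          # builder name from comment 5
]
pools, ignored = assign_pools(pending, BUILDER_MAP)
for name in ignored:
    # Hypothetical message; the real log wording may differ.
    print("ignoring pending builder:", name)
```

Every name that falls through to the `else` branch is a job that would sit in the pending queue with no EC2 instance started for it, which is the list aki suggests logging.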
Assignee
Comment 3•10 years ago
So it looks like we already log pending builders that do not match anything in BuilderMap: http://hg.mozilla.org/build/cloud-tools/file/44c2d285e62e/aws/aws_watch_pending.py#l463

So, grepping for only those lines in /home/buildduty/logs/aws/aws_watch_pending.log and concentrating on Android, Linux, b2g, and Ubuntu (minus the duplicates, tests, and talos), I am left with the following attachment.

I will extend what we already have on the b2g lines with what aki suggested in comment 2: http://hg.mozilla.org/build/cloud-tools/file/default/aws/configs/watch_pending.cfg#l15

But do any other builders jump out at anybody as ones that should be handled by EC2 machines?
Flags: needinfo?(rail)
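The filtering described in comment 3 (keep Android, Linux, b2g, and Ubuntu builders; drop duplicates, tests, and talos) might look like the sketch below. It starts from builder names already pulled out of aws_watch_pending.log, since the log line format is not reproduced here. Treating the platforms as name prefixes and spotting test jobs by the words "test" or "talos" are assumptions; `interesting_unmatched` and the sample names (other than b2g_try_linux32_gecko) are hypothetical.

```python
import re

# Platform prefixes comment 3 concentrates on (prefix matching is an assumption).
PLATFORM_PREFIXES = ("Android", "Linux", "b2g", "Ubuntu")
# Assumption: test and talos jobs contain the word "test" or "talos" in their name.
TEST_WORDS = re.compile(r"\b(test|talos)\b", re.IGNORECASE)


def interesting_unmatched(builder_names):
    """Drop duplicates, keep the platforms of interest, and skip test/talos jobs."""
    seen = set()
    result = []
    for name in builder_names:
        if name in seen:
            continue
        seen.add(name)
        if name.startswith(PLATFORM_PREFIXES) and not TEST_WORDS.search(name):
            result.append(name)
    return result


# Hypothetical names, as if already extracted from aws_watch_pending.log.
unmatched = [
    "b2g_try_linux32_gecko",
    "Android 2.2 try build",
    "b2g_try_linux32_gecko",
    "Ubuntu VM 12.04 try opt test mochitest-1",
    "Linux try talos chromez",
    "WINNT 5.2 try build",
]
print(interesting_unmatched(unmatched))
# ['b2g_try_linux32_gecko', 'Android 2.2 try build']
```

Keeping the first occurrence order (rather than sorting a set) makes the result easy to compare against the raw log.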
Comment 4•10 years ago
(In reply to Jordan Lund (:jlund) from comment #2)
> catlee also made a point that pending graph may be mis-categorizing data so
> my graph in comment 1 may not give an accurate representation of what is
> going on.

That's correct. My graphing hasn't kept up with recent builders either. I have some code which dumps out builder names and other assorted things from a master instance. I'll post that somewhere and let people know.
Assignee
Comment 5•10 years ago
This patch:
- adds missing builders I found here: https://bug969707.bugzilla.mozilla.org/attachment.cgi?id=8374342
- removes device builds from try
- removes "\\S+" from the "^b2g_try\\S+_linux(32|64)_gecko(-debug)?" pattern. I think we only need that if we don't know a given branch, but here we know it is 'try'. By removing it, we pick up builder names like "b2g_try_linux32_gecko".

Rail, how does this look?
Attachment #8374386 -
Flags: review?(rail)
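The effect of dropping \S+ from the try pattern can be checked directly with Python's `re` module. In watch_pending.cfg the backslash is doubled ("\\S+"); in a Python raw string a single backslash is enough. Both patterns are anchored with ^, so `re.search` and `re.match` behave the same here; which one aws_watch_pending uses is not stated in this bug. The -debug builder name below is illustrative only.

```python
import re

# Try pattern from comment 5, before and after removing \S+.
OLD_TRY_GECKO = re.compile(r"^b2g_try\S+_linux(32|64)_gecko(-debug)?")
NEW_TRY_GECKO = re.compile(r"^b2g_try_linux(32|64)_gecko(-debug)?")

name = "b2g_try_linux32_gecko"  # builder name quoted in comment 5

# \S+ must consume at least one character after "try" and still leave "_linux"
# to match, but the only "_linux" in this name begins right after "try".
print(OLD_TRY_GECKO.search(name) is not None)  # False
print(NEW_TRY_GECKO.search(name) is not None)  # True
print(NEW_TRY_GECKO.search("b2g_try_linux64_gecko-debug") is not None)  # True
```

Because \S+ requires one or more characters, the old pattern could only match names with something extra between "try" and "_linux", which the plain try builder names do not have.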
Assignee
Updated•10 years ago
Assignee: nobody → jlund
Comment 6•10 years ago
Comment on attachment 8374386 [details] [diff] [review]
969707_watch_pending_ignoring_builds-cloud-tools-021114.diff

Review of attachment 8374386 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for the patch!

::: aws/configs/watch_pending.cfg
@@ +12,4 @@
>      "^Linux.* nightly": "bld-linux64",
>      "^Linux.* valgrind": "bld-linux64",
>      "^Linux.* try.*build": "try-linux64",
> +    "^b2g_(?!try)\\S+_(unagi|panda|otoro|leo|inari|inari_eng|hamachi|hamachi_eng|emulator|emulator-debug|emulator-jb|emulator-jb-debug|helix|helix_eng|leo|leo_eng|nexus-4|wasabi)_(dep|nightly)": "bld-linux64",

A nit, can you add _eng to all variations and also include periodic builds? Something like "^b2g_(?!try)\\S+_(unagi|panda|otoro|leo|inari|hamachi|emulator|helix|leo)(_eng)?_(dep|nightly|periodic)"
Attachment #8374386 -
Flags: review?(rail) → review+
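One way to sanity-check the pattern suggested in the review is to run some builder names through it. The names below are hypothetical, built from the b2g_<branch>_<device>[_eng]_<type> shape the pattern implies, and `device_for` is a made-up helper. The try name is rejected by the (?!try) lookahead, and the emulator-jb name does not match because a device must appear in the alternation exactly and be followed directly by an optional _eng and then _dep, _nightly, or _periodic.

```python
import re

# Pattern suggested in comment 6, written as a Python raw string
# (watch_pending.cfg doubles the backslash: "\\S+").
SUGGESTED = re.compile(
    r"^b2g_(?!try)\S+_(unagi|panda|otoro|leo|inari|hamachi|emulator|helix|leo)"
    r"(_eng)?_(dep|nightly|periodic)"
)


def device_for(name):
    """Return the device captured by SUGGESTED, or None if the name does not match."""
    match = SUGGESTED.search(name)
    return match.group(1) if match else None


# Hypothetical builder names, not taken from the real build masters.
for name in [
    "b2g_mozilla-central_hamachi_eng_nightly",  # -> hamachi
    "b2g_mozilla-inbound_emulator_dep",         # -> emulator
    "b2g_mozilla-central_helix_periodic",       # -> helix
    "b2g_try_hamachi_eng_nightly",              # -> None: rejected by (?!try)
    "b2g_mozilla-central_emulator-jb_dep",      # -> None: "-jb" is not in the pattern
]:
    print(name, "->", device_for(name))
```

Making _eng an optional group, as the review suggests, avoids listing every device twice in the alternation.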
Comment 7•10 years ago
I think the r+ above addresses the needinfo flag as well.
Flags: needinfo?(rail)
Assignee
Comment 8•10 years ago
pushed to default: https://hg.mozilla.org/build/cloud-tools/rev/1be29c348f12
Assignee
Comment 9•10 years ago
This has been pulled into cruncher (where the watch_pending process lives). I no longer see aws_watch_pending.log output about ignoring builders that we don't want ignored. The pending graph also shows a decline in pending for the linux scl3 machines. I'm calling this resolved.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Updated•7 years ago
Component: Tools → General