|
allenday |
is it possible to decouple mapper and reducer slot allocation for jobs? |
|
allenday |
i mean, if a job is #1 in the MR queue, but it is not yet ready to reduce, can it be prevented from consuming reducer slots? |
|
|<– |
Smokinn has left irc.freenode.net () |
|
|<– |
savage- has left irc.freenode.net (Read error: 110 (Connection timed out)) |
|
–>| |
overlast (n=overlast@19.181.210.220.dy.bbexcite.jp) has joined #hadoop |
|
|<– |
overlast has left irc.freenode.net (”Leaving…”) |
|
nathanmarz |
allenday: i think that would be hard… reducing starts while the mapping is happening (copy stage) |
|
allenday |
nathanmarz, i frequently find that while the reduce has “started”, it can just sit there for a long time doing nothing |
|
allenday |
this is most common with nutch |
|
allenday |
so there could be a bunch of other jobs further back in the queue that get starved for reduces b/c the head of the queue is squatting on the slots |
|
nathanmarz |
it just sits there in the reduce phase? |
|
allenday |
for sure nutch does, yeah |
|
allenday |
during fetch, when it is crawling |
|
|<– |
cutting has left irc.freenode.net (”Leaving.”) |
|
nathanmarz |
i see |
|
nathanmarz |
i don’t have that much familiarity with nutch |
|
nathanmarz |
is it possible to increase the number of reducers? |
|
allenday |
yep, but then you can get into i/o trouble later |
|
nathanmarz |
for the job i mean, not the cluster |
|
allenday |
oh |
|
allenday |
it sounds like you propose having these squatters consume minimal # of reducers (e.g. only 1) |
|
nathanmarz |
actually, the opposite |
|
nathanmarz |
let’s say you have 16 reduce slots |
|
nathanmarz |
and the job is set to use 16 reducers |
|
nathanmarz |
each one of those reducers potentially has to go over a lot of data |
|
nathanmarz |
if the job is instead set to use a lot more reducers, like 100 or something |
|
nathanmarz |
then an individual reducer will go a lot faster |
|
nathanmarz |
and potentially, those freed reduce slots will go to jobs with higher priority |
|
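A rough sketch of the arithmetic behind the 16-vs-100 reducer suggestion. It assumes map output is partitioned evenly across reducers, and the 160 GiB total is a made-up figure for illustration only. `reduce_plan` is a hypothetical helper, not a Hadoop API.

```python
import math


def reduce_plan(total_bytes, num_reducers, reduce_slots):
    """Return (bytes per reducer, waves needed to run all reducers).

    Assumes an even partition of the map output, which real jobs
    rarely achieve exactly.
    """
    per_reducer = total_bytes / num_reducers
    waves = math.ceil(num_reducers / reduce_slots)
    return per_reducer, waves


GIB = 1024 ** 3
total = 160 * GIB  # hypothetical amount of map output

for n in (16, 100):
    per, waves = reduce_plan(total, n, reduce_slots=16)
    print(f"{n} reducers: {per / GIB:.1f} GiB each, {waves} wave(s)")
```

With more, smaller reducers each slot is released sooner, which gives the scheduler more chances to hand a freed slot to another job.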
allenday |
ok, so you introduce priority to bump the jobs further back ahead in the queue |
|
nathanmarz |
yea |
|
allenday |
is that settable in jobconf? |
|
nathanmarz |
you can set num reducers |
|
–>| |
tobias_au (n=opera@CPE-121-50-201-65.dsl.OntheNet.net) has joined #hadoop |
|
allenday |
so let’s suppose the job that squats on reduce slots gets to the head of the queue. regardless of if it has 16 or 100 reducers configured |
|
nathanmarz |
JobConf#setNumReduceTasks |
|
allenday |
and that it is still in map phase only. has not begun reducing yet |
|
allenday |
until one of those reduces finishes (i.e. the map has finished) all slots are still filled |
|
allenday |
it’s only when the first reduce finishes that the job at #2 can take over a reduce slot |
|
nathanmarz |
right |
|
nathanmarz |
yea that’s true |
|
allenday |
that’s bad |
|
nathanmarz |
this scheme doesn’t help until mappers finish |
|
allenday |
you really want this #1 job |
|
allenday |
when it is allocating reducers |
|
allenday |
to have low priority in acquiring the slots |
|
nathanmarz |
right |
|
nathanmarz |
well you don’t want it to acquire any slots until mappers finish |
|
allenday |
so you give reduce slots to #2, #3, #4, etc. until everyone who wants slots has them. then you assign to #1 |
|
allenday |
or until #1 is ready… |
|
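The allocation order proposed here can be sketched as a toy model. This is not how Hadoop's scheduler is implemented; `assign_reduce_slots` and its `skip_unready` option are hypothetical names, and the job sizes are invented.

```python
def assign_reduce_slots(jobs, total_slots, skip_unready=False):
    """Hand out reduce slots in queue order.

    jobs: list of (name, reducers_wanted, maps_done) in FIFO order.
    skip_unready: hypothetical policy that serves jobs whose maps are
    still running only after every ready job has been served.
    """
    order = jobs
    if skip_unready:
        # sorted() is stable, so FIFO order is kept within each group.
        order = sorted(jobs, key=lambda job: not job[2])
    free = total_slots
    granted = {}
    for name, wanted, _maps_done in order:
        take = min(wanted, free)
        granted[name] = take
        free -= take
    return granted


queue = [("job1", 16, False), ("job2", 4, True), ("job3", 4, True)]
print(assign_reduce_slots(queue, 16))
print(assign_reduce_slots(queue, 16, skip_unready=True))
```

In the plain FIFO case job1 takes all 16 slots even though its maps are still running; with the hypothetical policy job2 and job3 are served first and job1 gets what is left.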
allenday |
is it just me or does the queueing system in hadoop kind of suck? |
|
allenday |
i am coming here from sun grid which puts a lot of emphasis on this aspect |
|
nathanmarz |
well, the priority system will work if you start job #1 after the other jobs |
|
nathanmarz |
if you start the other jobs after #1 then they will get starved of reducers |
|
allenday |
heh, but the whole reason it is in #1 is because it was submitted first, right? isn’t hadoop FIFO wrt jobs? |
|
nathanmarz |
if they’re the same priority |
|
nathanmarz |
so maybe decreasing the reducers job #1 uses is the way to go |
|
nathanmarz |
set it so it doesn’t use all the reduce slots on the cluster |
|
allenday |
i need to do some research to see if there are jira issues open for improving the scheduler. or if there are some commercial plugins to improve the scheduling |
|
nathanmarz |
definitely room for improvement, agreed |
|
allenday |
yeah, that was what i thought you meant initially. it’s a hack too though, and breaks down when the number of jobs gets large |
|
allenday |
i’m surprised they are coupled. do you understand how it works when the mapper hands off to the reducer? |
|
allenday |
b/c i don’t and i need to |
|
nathanmarz |
yes |
|
allenday |
can i get the 2min version? |
|
nathanmarz |
the reason the reducers start while the mappers are running is because there’s some work they can do without all the map data |
|
nathanmarz |
each reducer needs to copy the relevant outputs from all the mappers to its machine |
|
nathanmarz |
this is called the “copy” phase and can occur in parallel with mapping |
|
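A minimal sketch of the copy phase as described above, assuming each finished map exposes its output already split by reducer partition. The data layout and the names `copy_phase` and `finished_maps` are illustrative assumptions, not Hadoop internals.

```python
def copy_phase(map_outputs, reducer_id):
    """Collect this reducer's partition from every finished map.

    map_outputs: iterable of dicts {partition_id: [records]}, one per
    map task, yielded as each map completes.
    """
    fetched = []
    for output in map_outputs:
        fetched.extend(output.get(reducer_id, []))
    return fetched


def finished_maps():
    # A generator stands in for maps finishing one after another.
    yield {0: [("a", 1)], 1: [("b", 1)]}
    yield {0: [("a", 2)], 1: [("c", 1)]}


print(copy_phase(finished_maps(), reducer_id=0))
```

Because the reducer pulls from each map as it finishes, it can hold its slot for the whole map phase while mostly waiting for the next output to appear.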
allenday |
ok, i’ve seen that |
|
allenday |
so what we need is a flag that indicates there will be no data to copy until maps all finish |
|
nathanmarz |
yea |
|
nathanmarz |
a flag that says not to pipeline the process |
|
allenday |
default behavior is to have the flag off and copy greedily |
|
allenday |
which is like it does now |
|
allenday |
turning the flag on says to wait until upstream map finishes before grabbing a reduce slot and kicking off the copy |
|
allenday |
**all upstream maps |
|
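The flag being proposed could look something like this. `wait_for_all_maps` is a hypothetical name for the flag from the discussion, not an existing Hadoop parameter, and the greedy default is a simplification of the current behavior.

```python
def may_start_reducers(maps_done, maps_total, wait_for_all_maps=False):
    """Decide whether a job may claim reduce slots yet.

    wait_for_all_maps (hypothetical flag):
      off (default): copy greedily, so reducers may start right away.
      on: hold off until all upstream maps have finished.
    """
    if wait_for_all_maps:
        return maps_done == maps_total
    return True


print(may_start_reducers(3, 10))
print(may_start_reducers(3, 10, wait_for_all_maps=True))
print(may_start_reducers(10, 10, wait_for_all_maps=True))
```

A scheduler would call a check like this before granting a reduce slot, so a job with the flag on never squats on slots during its map phase.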
nathanmarz |
http://hadoop.apache.org/core/docs/current/hadoop-default.html |
|
nathanmarz |
those are all the hadoop config parameters |
|
nathanmarz |
you might be able to find something in there |
|
allenday |
yeah, i find goodies in there every time i read that page |
|
allenday |
i am only ~1mo into hadoop |
|
allenday |
here’s another scheduling related question/issue i’m having |
|
allenday |
i find that job i/o and cpu usage tend to synchronize after a while |
|
allenday |
b/c if there is a slow moving job in the queue, all the others tend to get jammed behind it |
|
allenday |
have you seen this? |
|
nathanmarz |
no, i haven’t |
|
nathanmarz |
but that’s interesting |
|
allenday |
it comes back to resource (mis)allocation by the scheduler |
|
nathanmarz |
how are you measuring that? |
|
allenday |
it’s this same issue where jobs will consume all the slots |
|
allenday |
so if you have a slow moving thing blocking all the resources |
|
allenday |
no one else can get past |
|
allenday |
then when the slow moving job finishes, the others all start getting processed very quickly (high cpu load during map), then as they begin to finish there is a flurry of i/o |
|
allenday |
it’s like congestion on the freeway where one car slams on the brakes and it sends this wave of traffic jam behind it |
|
allenday |
assuming the freeway is already close to capacity (not sparse) |