The reduce phase has three steps: shuffle, sort, and reduce. In the shuffle, each reducer copies its portion of the map output from every mapper.
This can happen while other mappers are still running, since it is only a data transfer.
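Sort and reduce, on the other hand, can only start once all the mappers are done, because an unfinished mapper could still produce values for any key the reducer is responsible for.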
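To make the three steps concrete, here is a toy, in-memory word-count sketch in Python. It is not Hadoop code: the function names are hypothetical, and zlib.crc32 stands in for a hash partitioner (an assumption made only so the example is deterministic across runs).

```python
import itertools
import zlib


def partition(key, num_reducers):
    # Hypothetical stand-in for a hash partitioner: decides which reducer owns a key.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(mapper_outputs, reducer_id, num_reducers):
    # Shuffle: copy this reducer's share of (key, value) pairs from every mapper.
    # Pure data transfer, so it could overlap with mappers that are still running.
    return [(k, v) for output in mapper_outputs for (k, v) in output
            if partition(k, num_reducers) == reducer_id]


def sort_and_reduce(fetched, reduce_fn):
    # Sort + reduce: only correct once every mapper's output has been fetched,
    # because an unfinished mapper could still emit values for any key.
    fetched = sorted(fetched, key=lambda kv: kv[0])
    return {key: reduce_fn([v for _, v in group])
            for key, group in itertools.groupby(fetched, key=lambda kv: kv[0])}


def run_job(mapper_outputs, num_reducers, reduce_fn):
    # Each key belongs to exactly one reducer, so the per-reducer results are disjoint.
    results = {}
    for r in range(num_reducers):
        results.update(sort_and_reduce(shuffle(mapper_outputs, r, num_reducers), reduce_fn))
    return results


if __name__ == "__main__":
    mapper_outputs = [
        [("cat", 1), ("dog", 1)],  # output of mapper 0
        [("cat", 1), ("emu", 1)],  # output of mapper 1
    ]
    print(run_job(mapper_outputs, num_reducers=2, reduce_fn=sum))
```

Running the reduce step with only mapper 0's output would report a count of 1 for "cat", which is why sort and reduce have to wait for the whole map phase.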
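Reducers are started once a threshold fraction of the mappers has finished. In Hadoop this is the mapreduce.job.reduce.slowstart.completedmaps property (mapred.reduce.slowstart.completed.maps in older releases), which defaults to 0.05; lower it to start reducers sooner or raise it to start them later.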
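A simplified Python model of that threshold check is below. The function names are hypothetical and this is not Hadoop's scheduler code, whose actual logic is more involved; it only shows how a fraction translates into a number of finished mappers.

```python
def reducers_may_start(completed_maps, total_maps, slowstart=0.05):
    # Simplified model: reducers become eligible once the finished fraction of
    # mappers reaches the slowstart threshold (0.05 mirrors Hadoop's default).
    if total_maps == 0:
        return True
    return completed_maps / total_maps >= slowstart


def maps_needed_before_reducers(total_maps, slowstart):
    # Smallest number of finished mappers at which reducers may be scheduled.
    for done in range(total_maps + 1):
        if reducers_may_start(done, total_maps, slowstart):
            return done
    return total_maps


if __name__ == "__main__":
    for threshold in (0.05, 0.5, 1.0):
        print(threshold, maps_needed_before_reducers(200, threshold))
```

For a job with 200 mappers this prints 10, 100 and 200 finished mappers for the three thresholds.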
Why is starting the reducers early a good thing?
Because it spreads the data transfer from the mappers to the reducers over time, instead of moving everything in one burst after the last mapper finishes. That helps if your network is the bottleneck.
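A toy Python model of that effect, under loudly simplifying assumptions: time is counted in abstract ticks, every mapper produces the same amount of output for a single reducer, and a mapper's output is copied in the tick it becomes available to a running reducer. The function name is hypothetical. It reports how much data has to cross the network in each tick.

```python
from collections import Counter


def copy_demand_per_tick(map_finish_ticks, output_mb, reducer_start_tick):
    # A finished mapper's output can be copied only once a reducer is running,
    # i.e. at max(mapper finish tick, reducer start tick).
    demand = Counter()
    for finish in map_finish_ticks:
        demand[max(finish, reducer_start_tick)] += output_mb
    return dict(sorted(demand.items()))


if __name__ == "__main__":
    finishes = [1, 2, 3, 4]  # four mappers finishing one tick apart
    early = copy_demand_per_tick(finishes, output_mb=100, reducer_start_tick=0)
    late = copy_demand_per_tick(finishes, output_mb=100, reducer_start_tick=4)
    print("early start:", early)  # spread out: 100 MB per tick
    print("late start:", late)    # one 400 MB burst
```

In the late-start case all 400 MB compete for the network in a single tick; starting early caps the per-tick demand at one mapper's output.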
Why is starting the reducers early a bad thing? Because they "hog" reduce slots while only copying data.
A job that starts later and would actually use those reduce slots cannot get them.
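The cost side can be put into rough numbers with a similarly simplified Python sketch. The function name is hypothetical, and it assumes every reducer of the job is scheduled at the same tick and that none of them can finish before the last mapper does.

```python
def copy_only_slot_ticks(num_reducers, reducer_start_tick, last_map_finish_tick):
    # Each early reducer occupies a reduce slot from its start until the last
    # mapper finishes; during that window it can copy data but not sort or reduce.
    waiting = max(0, last_map_finish_tick - reducer_start_tick)
    return num_reducers * waiting


if __name__ == "__main__":
    # Hypothetical job: 20 reducers, last mapper finishes at tick 100.
    for start in (5, 50, 100):
        print(start, copy_only_slot_ticks(20, start, 100))
```

Starting at tick 5 ties up 1900 slot-ticks in copy-only work, versus 1000 at tick 50 and none at tick 100. That is capacity another job could have used; whether it matters depends on how busy the cluster's reduce slots are.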