Some experiments have gone pretty much to the extent you suggest, with entirely featureless environments and no interaction at all with the experimenters. More commonly, as you suggest, statistics are used to eliminate the effect of extraneous variables and isolate the effect being studied. It is assumed that variables which cannot be controlled vary at random and so affect all the experimental subjects equally on average; over a large number of observations their effects should effectively cancel out.
Let's imagine that we wanted to test the theory that rats living in a blue environment had faster reactions than those living in a green one. We might measure this by seeing how long it took for a rat to press a lever after a bell rang. Now if we just took one rat in a blue box and one in a green box then we wouldn't really be able to draw any conclusion; presumably rats differ somewhat in reaction time anyway. If we take a thousand rats in each environment then we would expect to see a range of reaction times (and the times might well follow the 'normal distribution'). By comparing the responses of the two groups we can then see whether the change in environment actually has an effect.
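To make that concrete, here is a rough simulation of the thought experiment in Python. Everything numeric in it is an assumption made up for illustration (the 300 ms and 320 ms group means, the 30 ms spread, and the helper name simulate_reaction_times); it is not data from any real study.

```python
import random
import statistics


def simulate_reaction_times(mean_ms, sd_ms, n_rats, rng):
    """Draw n_rats reaction times (in ms) from a normal distribution."""
    return [rng.gauss(mean_ms, sd_ms) for _ in range(n_rats)]


rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical values, chosen only for illustration
BLUE_MEAN_MS = 300.0
GREEN_MEAN_MS = 320.0
SD_MS = 30.0

# One rat per box: individual variation can easily swamp the true difference
one_blue = simulate_reaction_times(BLUE_MEAN_MS, SD_MS, 1, rng)[0]
one_green = simulate_reaction_times(GREEN_MEAN_MS, SD_MS, 1, rng)[0]
print(f"single rats: blue={one_blue:.0f} ms, green={one_green:.0f} ms")

# A thousand rats per box: the group averages are far more stable
blue = simulate_reaction_times(BLUE_MEAN_MS, SD_MS, 1000, rng)
green = simulate_reaction_times(GREEN_MEAN_MS, SD_MS, 1000, rng)
print(f"blue group:  mean={statistics.mean(blue):.1f} ms, "
      f"sd={statistics.stdev(blue):.1f} ms")
print(f"green group: mean={statistics.mean(green):.1f} ms, "
      f"sd={statistics.stdev(green):.1f} ms")
```

If you change the seed, the single-rat comparison can come out in either direction, while the thousand-rat averages stay close to the values they were generated from.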
In this case there would seem to be a clear difference. In practice most cases will be less clear-cut, which is where statistics comes into the equation (so to speak). The intention is to give an indication of how likely it is that an observed effect arises from the experimental condition being varied as opposed to random chance.
Simplifying tremendously, experiments work on the basis of rejecting the null hypothesis. Basically, you don't prove that blue rooms improve reaction times but rather try to rule out the contention that they have no effect (the 'null hypothesis'). Since this is all about probabilities rather than black/white observations, the researcher must decide at what level a result will be considered significant.
In psychology an effect is usually considered significant if a result at least as extreme as the one observed would turn up less than 5% of the time were the null hypothesis true (this is expressed as p < 0.05). In other words, if chance alone would produce such an observation fewer than 5 times per hundred attempts, the outcome is taken not to be due to chance.
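One way to see what the 5% threshold means in practice is a permutation test: if the environment made no difference, the 'blue' and 'green' labels would be interchangeable, so we can shuffle them many times and count how often chance alone gives a gap between the group averages at least as big as the one observed. The sketch below is only an illustration, not the method your course materials use; the function name permutation_p_value, the number of shuffles, and the simulated reaction times are all assumptions.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, n_shuffles=2000, seed=0):
    """Estimate a two-sided p-value for the difference in group means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        # Reassign the group labels at random, as the null hypothesis allows
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a])
                   - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Count the observed labelling as one arrangement so p is never exactly 0
    return (at_least_as_extreme + 1) / (n_shuffles + 1)


# Hypothetical reaction times in ms, simulated purely for illustration
data_rng = random.Random(1)
blue = [data_rng.gauss(300, 30) for _ in range(100)]
green = [data_rng.gauss(320, 30) for _ in range(100)]

p = permutation_p_value(blue, green)
print(f"estimated p = {p:.4f}; significant at the 0.05 level: {p < 0.05}")
```

The returned value is the proportion of shuffles in which chance did at least as well as the real grouping, which is the quantity compared against 0.05.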
This was covered briefly in your reading in Unit 1.2.3. Page 45 onwards looks at the question of significance. It will become clearer if you go on to study MA121 Intro to Statistics, particularly Unit 5.
To summarise: Ideally, all variables other than those being studied are controlled. In practice, some random variation will always occur. Experimenters therefore need to sample across multiple instances to determine whether an effect is significant or due to chance.