Data leakage occurs when information from outside the training data is used in the model-building process. This can introduce unintended and undetected bias into the model, and the problem may not be discovered until the model fails to perform as intended in production. The best safeguard against data leakage is a robust validation procedure that ensures no portion of the validation data is used anywhere in the training process, including in preprocessing steps fitted before the model itself.
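As a minimal sketch of what such a procedure guards against, the example below standardizes a toy feature in two ways: once with the scaling parameters computed after the split (training rows only) and once with them computed before the split (all rows). The data values and the function names `fit_scaler`, `apply_scaler`, `split_then_scale`, and `scale_then_split` are hypothetical and chosen only for illustration. It shows one common form of leakage, not the only one.

```python
import statistics


def fit_scaler(values):
    """Compute standardization parameters from the given values only."""
    return statistics.mean(values), statistics.pstdev(values)


def apply_scaler(values, mean, std):
    """Standardize values with previously fitted parameters."""
    return [(v - mean) / std for v in values]


def split_then_scale(values, n_train):
    """Leak-free: split first, then fit the scaler on the training rows only."""
    train, valid = values[:n_train], values[n_train:]
    mean, std = fit_scaler(train)
    return apply_scaler(train, mean, std), apply_scaler(valid, mean, std), (mean, std)


def scale_then_split(values, n_train):
    """Leaky: the scaler is fitted on all rows, so validation values
    influence the statistics used to transform the training rows."""
    mean, std = fit_scaler(values)
    scaled = apply_scaler(values, mean, std)
    return scaled[:n_train], scaled[n_train:], (mean, std)


# Toy feature (assumed for illustration): four training rows, one validation row.
data = [1.0, 2.0, 3.0, 4.0, 100.0]
N_TRAIN = 4

_, _, safe_params = split_then_scale(data, N_TRAIN)
_, _, leaky_params = scale_then_split(data, N_TRAIN)

print("mean fitted on training rows only:", safe_params[0])  # 2.5
print("mean fitted on all rows (leaky):  ", leaky_params[0])  # 22.0

# Changing only the validation row leaves the leak-free parameters untouched,
# but shifts the leaky ones.
altered = data[:N_TRAIN] + [-100.0]
_, _, safe_params_altered = split_then_scale(altered, N_TRAIN)
_, _, leaky_params_altered = scale_then_split(altered, N_TRAIN)
```

In the leak-free version the fitted parameters depend only on the training rows, which is the property a robust validation procedure is meant to guarantee. The same check (does anything fitted during training change when the validation data changes?) applies to imputation, feature selection, and any other fitted preprocessing step.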