A representative definition of Machine Learning (ML) is "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E" (Tom M. Mitchell). A more recent specification is shown in this figure (from a book by Peter Flach): a (meta-level) machine learning algorithm takes the given training data as input and produces a model as output, and the model can then serve as an (object-level) algorithm that solves the domain problem. Since the learning process replaces programming by human programmers, the application scope of computers is greatly extended.
An early attempt (under the name of "scientific discovery") was exemplified by the BACON system, which guesses the expression relating the variables by applying simple heuristics (say, if X and Y are positively correlated, calculate X/Y; if they are negatively correlated, calculate X*Y) and checking whether the resulting term is constant. The program "re-discovered" several laws in physics and chemistry, though its explanation of scientific discovery has been questioned (cartoon).
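The heuristic described above can be illustrated with a toy sketch. This is not the original BACON program, which was considerably more elaborate; the function names (is_constant, bacon_step) and the sample data are hypothetical, chosen only to show one step of the idea on values that obey Boyle's law (pressure times volume is constant).

```python
def is_constant(values, tol=1e-6):
    """Return True if all values agree within a relative tolerance."""
    mean = sum(values) / len(values)
    return all(abs(v - mean) <= tol * abs(mean) for v in values)


def bacon_step(xs, ys):
    """Apply one BACON-style heuristic to paired observations.

    If Y rises as X rises, form the ratio X/Y; if Y falls as X rises,
    form the product X*Y. Returns (term_name, term_values).
    """
    pairs = sorted(zip(xs, ys))  # order observations by X
    ys_sorted = [y for _, y in pairs]
    increasing = all(a < b for a, b in zip(ys_sorted, ys_sorted[1:]))
    decreasing = all(a > b for a, b in zip(ys_sorted, ys_sorted[1:]))
    if increasing:
        return "X/Y", [x / y for x, y in pairs]
    if decreasing:
        return "X*Y", [x * y for x, y in pairs]
    return None, []


# Made-up measurements: volume (X) and pressure (Y) of a gas.
volumes = [1.0, 2.0, 4.0, 8.0]
pressures = [8.0, 4.0, 2.0, 1.0]

term, values = bacon_step(volumes, pressures)
law_found = term is not None and is_constant(values)
```

Here the two variables move in opposite directions, so the sketch forms their product and finds it constant, which corresponds to "re-discovering" the law from the data. When the new term is not constant, a fuller system would treat it as a new variable and repeat the process.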
A different approach is to approximate the target function statistically. The simplest case is linear regression, where the linear function that best fits the training data (typically in the least-squares sense) is found to summarize the data. Since the type of the expression (or call it "model") has been determined in the design stage, the only things to be learned are the parameters in the expression.
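As a minimal sketch of the point that only the parameters are learned, the following fits a line y = slope * x + intercept by ordinary least squares. The model form is fixed in advance; the function name fit_line and the sample data are illustrative assumptions, not taken from any particular library.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept to paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training data lying exactly on y = 2x + 1.
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [1.0, 3.0, 5.0, 7.0]

slope, intercept = fit_line(train_x, train_y)


def model(x):
    """The learned model: an ordinary function usable on new inputs."""
    return slope * x + intercept
```

The learning step produces two numbers, and the resulting model is then used like any hand-written function, for example model(10.0) to predict the output for an unseen input.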
More complicated approaches have been used to approximate various functions in multidimensional spaces. For example, a Support Vector Machine (SVM) finds the hyperplane that best separates the data into classes by maximizing the margin between the hyperplane and the closest points (support vectors) of each class. For regression tasks (called Support Vector Regression), the SVM tries to fit a function such that most of the data points lie within a threshold (epsilon) of it. Using given kernel functions, points in the input space can be implicitly mapped into a feature space where a separating hyperplane can be found more easily. When the number of dimensions is large, "features" of the input are used as an intermediate abstraction between the input and the output.
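The role of the feature space can be shown with a small sketch. It does not train an SVM: the hyperplane below is picked by hand for illustration, and the names feature_map, kernel, W, and B are assumptions introduced here. The one-dimensional data cannot be separated by any single threshold on x, but after mapping x to (x, x^2) a hyperplane separates the two classes.

```python
def feature_map(x):
    """Map a 1-D input into a 2-D feature space: x -> (x, x^2)."""
    return (x, x * x)


def kernel(x, z):
    """Inner product in feature space, computed from the mapped points.

    Algebraically this equals x*z + (x*z)**2, so it could be evaluated
    without ever constructing the feature vectors.
    """
    px, pz = feature_map(x), feature_map(z)
    return px[0] * pz[0] + px[1] * pz[1]


# Hand-picked hyperplane w . phi(x) + b = 0 in feature space (not learned).
W = (0.0, 1.0)
B = -2.5


def classify(x):
    """Label a point by which side of the hyperplane its image falls on."""
    phi = feature_map(x)
    score = W[0] * phi[0] + W[1] * phi[1] + B
    return 1 if score > 0 else -1


# Made-up data: the outer points are class +1, the inner points class -1.
points = [-2.0, -1.0, 1.0, 2.0]
labels = [1, -1, -1, 1]

predictions = [classify(x) for x in points]
```

In feature space the positive points have second coordinate 4 and the negative points have second coordinate 1, so the hyperplane at x^2 = 2.5 lies midway between the classes. An actual SVM would find such a hyperplane by solving an optimization problem over the training data.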