Infosys’ blog on industry solutions, trends, business process transformation and global implementation in Oracle.


Machine Learning - Introduction

Machine learning is a sub-branch of Artificial Intelligence (AI) in which we write code that uses existing, pre-defined algorithms to learn the underlying patterns or information in raw data. In essence, machine learning means giving a machine the intelligence to perform tasks as humans do. For example, nowadays we hear a lot about Google's self-driving cars, where we set a destination and relax in the car. The car reaches the destination without any further human intervention, in spite of traffic and obstacles, because it is trained to identify obstacles and traffic signals and act accordingly. Other example applications include spam filtering, credit card fraud detection, optical character recognition (OCR), face detection in images, search engines and handwriting recognition.

Types of Learning:

Machine learning tasks are mainly classified into two categories, depending on the nature of the learning "signal" or "feedback" available to the learning system.

These are:

Supervised learning

Unsupervised learning

Supervised learning:

Supervised learning is simply a formalization of the idea of learning from labeled (pre-supervised) examples. The computer is presented with example inputs and their desired outputs, given by a "teacher", and the goal is to learn a general rule that maps inputs to outputs. Here we observe the features X1, X2, . . . , Xp and we need to predict an associated response variable Y. The data is usually split into training, validation and test sets. The model is trained on the training set, predictions are then made on the test set and compared with the actual outputs, and the validation set is used to check that the model is not over-fitted.

The two essential sets are the training set and the test set. The idea is for the learner to "learn" from a set of labeled examples in the training set so that it can identify unlabeled examples in the test set with the highest possible accuracy.
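The split described above can be sketched in a few lines of Python. This is only a minimal illustration: the function name `split_dataset`, the 60/20/20 proportions and the toy data are assumptions made for the example, not something prescribed by any particular library.

```python
import random


def split_dataset(examples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle the examples and cut them into training, validation and test lists."""
    shuffled = list(examples)
    # A fixed seed keeps the (hypothetical) split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever is left over
    return train, validation, test


# Toy labeled data: (feature, label) pairs, invented for illustration.
data = [(x, x % 2) for x in range(10)]
train, validation, test = split_dataset(data)
print(len(train), len(validation), len(test))
```

Shuffling before splitting matters when the data is stored in some order (for example, sorted by label), because otherwise the test set may not contain the same classes as the training set.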

For example, a training set might consist of images of different types of gadgets (say, laptops, mobiles, etc.), where the identity of the gadget in each image is given to the learner. The test set would then consist of more, unidentified gadget images, but from the same classes. The goal is for the learner to develop a rule that can identify the elements in the test set.


Supervised methods can be used in many domains such as finance, manufacturing and marketing.

Supervised learning is further divided into two types.

1.) Regression

2.) Classification


In regression problems, we map the given data to a real-valued domain. For example, a regressor can predict the price of a house given its characteristics (location, area, accessibility to roads, supermarkets, bus stops, etc.). Some of the basic algorithms used in regression problems are Linear Regression, Ridge Regression, Lasso Regression, Elastic Net, K-Nearest Neighbor Regression, Decision Trees, Support Vector Machines, Artificial Neural Networks, etc.
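To make the house-price example concrete, here is a minimal sketch of simple linear regression with a single feature, fitted with the ordinary least-squares formulas. The helper `fit_line` and the area and price figures are invented for illustration; a real model would use many more features and real data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both without the 1/n factor).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: house area and price (price = 3 * area).
areas = [50.0, 80.0, 100.0, 120.0]
prices = [150.0, 240.0, 300.0, 360.0]

slope, intercept = fit_line(areas, prices)
# Predict the price of an unseen house with area 90.
predicted = slope * 90.0 + intercept
print(slope, intercept, predicted)
```

The output of the regressor is a real number (a price), which is what distinguishes regression from classification.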

Classifiers map the given data into pre-defined classes. For example, a classifier can be used to decide whether a loan can be given to a customer based on their credit score, or whether a received email is genuine or spam. Some of the basic algorithms used are Logistic Regression, Linear Discriminant Analysis, the Naive Bayes classifier, the K-Nearest Neighbour Classifier, Decision Trees, Support Vector Machines, etc.
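As an illustration of one of these algorithms, below is a minimal sketch of a K-Nearest Neighbour classifier applied to the loan example. The function `knn_predict`, the two features (credit score and income) and all the numbers are assumptions made up for this sketch. In practice the features would normally be scaled first so that one of them does not dominate the distance.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """train is a list of (features, label); return the majority label of the k nearest."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled customers: (credit score, income in thousands) -> decision.
training = [
    ((300.0, 20.0), "reject"),
    ((350.0, 25.0), "reject"),
    ((400.0, 30.0), "reject"),
    ((700.0, 60.0), "approve"),
    ((750.0, 65.0), "approve"),
    ((800.0, 80.0), "approve"),
]

decision_good = knn_predict(training, (720.0, 62.0))
decision_poor = knn_predict(training, (320.0, 22.0))
print(decision_good, decision_poor)
```

Here the output is one of a fixed set of classes ("approve" or "reject") rather than a real number.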


Unsupervised learning:

Unsupervised learning involves no target values. It tries to auto-associate information from the inputs, often with an intrinsic reduction of the data's dimensionality. Here we observe only the features X1, X2, . . . , Xp, and we are not interested in prediction because we do not have an associated response variable Y. Unsupervised learning is based on the relations among the data, and is used to find the significant patterns or features in the input data without the help of a teacher.

For example, a dataset might consist of images of different types of gadgets (say, laptops, mobiles, etc.), where the identity of the gadget in each image is not given to the learner. We need to design an algorithm so that the computer groups the gadgets based on their size, shape, weight, etc.

When unsupervised learning is done with a neural network, a criterion is needed to terminate the learning process. Without a stopping criterion, the learning process continues even when a pattern that does not belong to the training pattern set is presented to the network, so the network keeps adapting to a constantly changing environment. Hebbian learning, competitive learning and self-organizing maps (SOM) are three well-known unsupervised learning approaches. Generally speaking, unsupervised learning is slow to settle into stable conditions.

Unsupervised learning is further divided into two types.

1.) Clustering

2.) Association

Clustering means grouping data points based on the similarities between them. For example, let us assume that we have a library and we are given a lot of books as a donation. How can we arrange them in racks? We need to group the books into categories and then arrange each category in its rack. Clustering algorithms work in a similar way: they take the data, find the similarities and dissimilarities between the data points, and group similar points together.


Clustering is normally distance based. The data is converted to a set of points in space and, using a distance measure, we group the points into a number of clusters such that the intra-cluster (within-cluster) distances are as small as possible and the inter-cluster (between-cluster) distances are as large as possible.
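One common distance-based method is k-means, which we sketch below as an example. The function `k_means`, the fixed number of iterations, the starting centroids and the six points are all assumptions chosen to keep the illustration small. Real implementations usually pick the starting centroids more carefully and stop when the assignments no longer change.

```python
import math


def k_means(points, centroids, iterations=10):
    """Assign each point to its nearest centroid, then move each centroid to the mean."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = coordinate-wise mean; keep the old one if a cluster is empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical 2-D points forming two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centroids, clusters = k_means(points, centroids=[(1.0, 1.0), (9.0, 9.0)])
print(centroids)
print(clusters)
```

Each point ends up close to the centre of its own cluster and far from the centre of the other, which is exactly the small within-cluster and large between-cluster distance we are aiming for.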

Association rule mining is used to discover interesting patterns in data. The items are stored in the form of transactions, which can be generated by an external process or extracted from relational databases or data warehouses. Market-basket analysis, one of the most intuitive applications of association rules, strives to analyze customer buying patterns by finding associations between the items that customers put into their baskets. For example, you could mine the transaction data of a grocery store for frequent patterns and association rules. An example rule could be {milk, bread} -> {eggs}. This rule tells you that if someone purchases milk and bread together, they are also likely to purchase eggs. Such information can be used as the basis for marketing decisions such as promotional pricing or product placement.
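How strong a rule such as {milk, bread} -> {eggs} is can be measured with two standard quantities: support (the fraction of transactions containing an itemset) and confidence (of the transactions containing the left-hand side, the fraction that also contain the right-hand side). The sketch below computes both. The helper names and the five grocery transactions are invented for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


# Hypothetical grocery baskets.
transactions = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"milk", "bread", "eggs", "butter"},
    {"bread", "butter"},
    {"milk", "eggs"},
]

rule_support = support(transactions, {"milk", "bread", "eggs"})
rule_confidence = confidence(transactions, {"milk", "bread"}, {"eggs"})
print(rule_support, rule_confidence)
```

In this toy data, three of the five baskets contain milk and bread, and two of those three also contain eggs. Association rule mining algorithms search for all rules whose support and confidence exceed chosen thresholds.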
