Infosys’ blog on industry solutions, trends, business process transformation and global implementation in Oracle.


Working Model of Stock Price Prediction using Natural Language Processing

Natural language processing, widely known as NLP, is a subfield of artificial intelligence that links human language and machines. NLP helps a machine understand human language by training it on rules or data.

NLP is mainly used in speech technology, OCR and machine learning. Common examples of NLP are email and text filters, predictive text in email or messages, digital assistants and data analysis.

Researchers have now taken NLP to the next level, where a machine is trained to understand the ups and downs of financial markets. Using its data analysis capability, a machine can estimate how a specific stock price is likely to behave in the future.


It is important to understand why we need an AI-powered system that tells us which stock to buy. A human can do this analysis too, but the machine wins on computational power: it can analyse large volumes of historical and current financial data, along with data from social media and news, and then provide a prediction.

Investment decisions are among the most difficult to make, and may result in a large profit or loss depending on the investor's analysis. Reducing human error in these high-pressure situations is crucial to maximising profit. Technical analysts believe that future prices can be forecast from past price movements.

Sentiment analysis uses text mining, natural language processing and computational techniques to automatically extract sentiment from text. It aims to classify the polarity of a given text, at the sentence level or class level, as positive, negative or neutral. In the stock market prediction task, two kinds of input are used: text from social media or online financial news articles, and historical stock prices. Sentiment analysis reduces risk by informing investors about the intricacies of the decision they are about to make. The closing price of a stock on a future date can be predicted by training machine learning models on the prices from previous dates. When sentiment analysis is then applied to news about that stock, capturing public sentiment or opinion, it becomes clearer whether or not to invest in it.
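As a rough illustration of polarity classification, here is a minimal lexicon-based sketch. The word lists, the polarity function and the sample headlines are all hypothetical and are not taken from the programs used in this article; a real system would use a much larger lexicon or a trained classifier.

```python
import re

# Hypothetical mini-lexicons, for illustration only.
POSITIVE = {"beats", "gain", "gains", "growth", "profit", "record", "surge", "upgrade"}
NEGATIVE = {"decline", "downgrade", "fall", "falls", "lawsuit", "loss", "miss", "slump"}


def polarity(text: str) -> str:
    """Classify a piece of text as 'positive', 'negative' or 'neutral'."""
    words = re.findall(r"[a-z']+", text.lower())
    # Score = number of positive words minus number of negative words.
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Made-up headlines, one per polarity class.
headlines = [
    "Company reports record profit and strong growth",
    "Shares slump after earnings miss",
    "Company to hold annual meeting next week",
]
labels = [polarity(h) for h in headlines]
print(list(zip(headlines, labels)))
```

The resulting labels could then be aggregated per stock and per day and used as an extra input alongside the historical prices.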

Block diagram representation:


The diagram above shows how data is fed to a machine and how rules are then applied to that data to make the prediction. The more data the machine consumes during training, the more accurate the result tends to be.

To show how this prediction model works, I have created several case studies and tested them with Amazon, Facebook and Netflix stock prices.

Note: The code is written in Python using Google Colab.

The programs created for this article are very simple and show how to train the machine with a dataset and predict future stock prices. I have used SVM, LR and Decision Tree models. The programs use stock files downloaded from a financial site (Yahoo Finance) as input data. They could be enhanced further to read from all types of social media, news and multiple financial files.

SVM model:

Support Vector Machine is a supervised machine learning algorithm that can be used for both classification and regression problems. However, it is mostly used for classification.

LR model:

Linear regression was introduced in statistics as a model for understanding the relationship between numerical input and output variables. It was later adopted in machine learning, including natural language processing work. It is both a statistical algorithm and a machine learning algorithm.

Below are the steps used for the Decision Tree Classifier with the Amazon, Facebook and Netflix stock files.

Case Study 1: With Facebook Stock

First, we load the stock file.


The next step is to make the 'Date' field the index.
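The loading and indexing steps could look roughly like the sketch below. The rows are made-up sample values in the column layout of a Yahoo Finance CSV export, and the variable names are my own; with a real download you would pass the file path to read_csv instead of the in-memory text.

```python
import io

import pandas as pd

# Made-up sample rows (not real prices) in Yahoo Finance's CSV column layout.
csv_text = """Date,Open,High,Low,Close,Adj Close,Volume
2020-01-02,100.0,102.0,99.5,101.5,101.5,1200000
2020-01-03,101.5,103.0,100.5,100.8,100.8,950000
2020-01-06,100.8,104.0,100.2,103.6,103.6,1430000
"""

# Load the stock file and parse the 'Date' column as dates.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# Make 'Date' the index so rows can be selected and ordered by date.
df = df.set_index("Date")
print(df.head())
```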


Here we use an identifier of 1 or 0 to mark whether the stock price went up or down, based on the 'Close' field. If the price is higher the next day the value is 1, and if it is lower the value is 0. Please refer to the 'Price_Up' column.
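One way to build such a column is to compare each close with the next day's close. This is a sketch on made-up prices, not the exact code behind the screenshots. Note that in this version an unchanged price and the last row (which has no next day) both get 0.

```python
import numpy as np
import pandas as pd

# Made-up closing prices for five trading days.
df = pd.DataFrame(
    {"Close": [101.5, 100.8, 103.6, 103.6, 105.0]},
    index=pd.to_datetime(
        ["2020-01-02", "2020-01-03", "2020-01-06", "2020-01-07", "2020-01-08"]
    ),
)

# shift(-1) aligns each row with the NEXT day's close.
# 1 = next close is higher, 0 = lower, unchanged, or unknown (last row).
df["Price_Up"] = np.where(df["Close"].shift(-1) > df["Close"], 1, 0)
print(df)
```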


The next step is to prepare this dataset for the later steps:



Now we create the training model; 80% of the total stock data is used as training data and 20% as testing data.
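A minimal sketch of the 80/20 split and the Decision Tree Classifier with scikit-learn is shown below. It runs on a synthetic random-walk series rather than a real stock file, and the feature columns, the random seeds and the choice of shuffle=False (to keep the rows in date order) are my assumptions. On a random walk the score should hover around chance.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a stock file: a random-walk close plus random volume.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame(
    {
        "Close": 100 + np.cumsum(rng.normal(0, 1, n)),
        "Volume": rng.integers(100_000, 1_000_000, n),
    }
)

# Target: 1 if the next day's close is higher, else 0.
df["Price_Up"] = np.where(df["Close"].shift(-1) > df["Close"], 1, 0)
df = df.iloc[:-1]  # the last row has no next-day close to compare with

X = df[["Close", "Volume"]]
y = df["Price_Up"]

# 80% training data, 20% testing data; shuffle=False keeps date order.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
score = model.score(X_test, y_test)  # accuracy on the held-out 20%

# Side-by-side comparison of actual and predicted labels.
comparison = pd.DataFrame({"Actual": y_test, "Predicted": model.predict(X_test)})
print(f"Decision tree accuracy: {score:.2f}")
print(comparison.head())
```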


The score above is produced by the Decision Tree Classifier.

Below is the comparison of actual and predicted data:


Case Study 2: With Amazon Stock

We use the same program here with a different stock file.



The prediction is as below:



So far we have seen the Decision Tree model, and its prediction score is not high enough. This is where the Support Vector Machine model or the Linear Regression model comes into the picture. The studies below use both SVM and LR.

Case Study 3: With Wiki data and the Facebook stock price, available in Quandl

First, we install the required packages as below. This is a new program, separate from the earlier ones. Please follow the series of steps below.
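The general shape of such a program can be sketched as follows. This is not the original code: it uses a synthetic price series instead of a Quandl download, and the forecast_out horizon of 30 days and the SVR settings (kernel, C, gamma) are assumed values chosen only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic stand-in for a downloaded closing-price series.
rng = np.random.default_rng(1)
n = 250
df = pd.DataFrame({"Close": 150 + np.cumsum(rng.normal(0.1, 1.0, n))})

# Target: the closing price 'forecast_out' days ahead (assumed horizon).
forecast_out = 30
df["Prediction"] = df["Close"].shift(-forecast_out)

# Rows with a known future price are used for training and testing;
# the last 'forecast_out' rows have no target yet and are forecast later.
X = df[["Close"]].to_numpy()[:-forecast_out]
y = df["Prediction"].to_numpy()[:-forecast_out]
X_future = df[["Close"]].to_numpy()[-forecast_out:]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit both models on the same training data.
svr = SVR(kernel="rbf", C=1e3, gamma=0.1).fit(X_train, y_train)
lr = LinearRegression().fit(X_train, y_train)

# For regressors, score() returns the R^2 coefficient on the test set.
svr_score = svr.score(X_test, y_test)
lr_score = lr.score(X_test, y_test)
print(f"SVM score: {svr_score:.3f}")
print(f"LR score:  {lr_score:.3f}")

# Forecast the next 'forecast_out' days with each model.
svr_forecast = svr.predict(X_future)
lr_forecast = lr.predict(X_future)
```

Printing svr_forecast and lr_forecast side by side gives the kind of LR vs SVM comparison shown at the end of this case study.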












SVM model score:


Linear Regression model score:





LR vs SVM prediction:



The case studies above help us understand how an AI-based prediction system can be built. The prediction score depends on multiple factors, such as the complexity of the logic and the training dataset. The process is not simply an attempt to predict a value; it also draws on stock-related sentiment and risk analysis.

Now we will perform another case study that shows the prediction graphically.

Case Study 4: Graphical representation of Amazon stock price prediction

To showcase this, the steps below were built.
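A compact sketch of how such charts can be produced with matplotlib follows. It again uses a synthetic series in place of the Amazon stock file, and the 25-day horizon (future_days), the hold-out scheme and the regressor settings are my assumptions rather than the original code.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for a closing-price series.
rng = np.random.default_rng(2)
close = 1800 + np.cumsum(rng.normal(0.5, 10.0, 200))

future_days = 25  # assumed horizon
# Feature: today's close. Target: the close 'future_days' later.
X = close[:-future_days].reshape(-1, 1)
y = close[future_days:]

# Hold out the last 'future_days' pairs so the models never see them.
X_train, y_train = X[:-future_days], y[:-future_days]
X_hold = X[-future_days:]
actual = y[-future_days:]  # the final 25 closes of the series

tree_pred = DecisionTreeRegressor(random_state=0).fit(X_train, y_train).predict(X_hold)
lr_pred = LinearRegression().fit(X_train, y_train).predict(X_hold)

# Plot the original series with each model's predictions over the held-out days.
days = np.arange(len(close) - future_days, len(close))
fig, axes = plt.subplots(1, 2, figsize=(12, 4), sharey=True)
for ax, pred, title in zip(
    axes, (tree_pred, lr_pred), ("Tree model", "LR model")
):
    ax.plot(close, label="Original")
    ax.plot(days, pred, label="Predicted")
    ax.set_title(title)
    ax.set_xlabel("Day")
    ax.set_ylabel("Close price (USD)")
    ax.legend()
fig.tight_layout()
```

In Colab, the figure is displayed inline once the matplotlib.use("Agg") line is removed.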




















Graphical representation of original and predicted values in the Tree model:




Graphical representation of original and predicted values in the LR model:




As mentioned earlier, this machine can perform more accurately if it is built to handle a massive training load and is equipped with better hardware. It can also be interfaced with feeds in any number of languages: the machine translates each feed to a common machine language and then performs its analysis.




