Social media and Twitter in particular are alternative data sources that are being used extensively to take the pulse of market sentiment.
In this post we will review the Tweepy library for fetching real-time and historical data from Twitter. We will go through the following topics:
- Twitter and Sentiment Analysis
- The impact of social networks on market trends
- What is Twitter?
- A Python Twitter API, Tweepy
- How to install and set up Tweepy?
- Authentication on Twitter API
- How to use Tweepy to get tweets
- Tweets pagination with cursors
- Building a naive sentiment indicator
Twitter and Sentiment Analysis
Social networks and Twitter in particular are alternative data sources that are being used extensively as a market sentiment indicator.
Sentiment analysis of news and social networks is a comprehensive area of study where natural language processing is of vital importance to extract quantitative information from unstructured information sources.
We would need a book, if not an encyclopedia, to describe the whole process of sentiment analysis and its integration into a trading system.
So, in this post we are going to start with the first step, extracting information from Twitter to feed our Natural Language Processing algorithms.
The impact of social networks on market trends
Social networks have emerged as mainstream communication media where users can be both consumers and creators of information.
Users continuously generate data, either passively, as mere receivers of information whose interests, reading time, etc. are recorded, or actively, by creating content, expressing interest in other content or even sharing it.
In addition, sufficiently interesting or shocking news can literally travel the world and become a global trend. Users with millions of followers can also create trends or impacts on specific topics.
The analysis of this type of data has been studied for several years and has generated abundant academic material on modelling the flow of information, its interpretation and the extraction of mood, trends and forecasts.
What is Twitter?
Twitter is a microblogging social network where users interact by posting information, sharing through retweets or showing their interest by marking information as a favourite.
It also has a hashtag (#) mechanism for tagging content, which lets a message reach many more people than the author's followers.
According to Internet Live Stats, Twitter has more than 372 million active users, a number that keeps growing, and more than 700 million tweets are posted every day.
On the other hand, hundreds of users can generate heated debates about a company, with thousands if not millions of passive and active viewers, and these debates create trends that reflect the mood of investors.
Twitter has therefore become a relevant source of alternative information for analysts who believe that social media sentiment can increase their edge in the markets.
A Python Twitter API, Tweepy
Tweepy is an easy-to-use Python library for accessing the Twitter API, as its website claims. It is not the only library for connecting to the Twitter API, but it is one of the best known and is under very active development on GitHub.
As can be easily deduced from the previous paragraph, Tweepy allows us to connect to Twitter information to extract historical and real-time data. This means that we have tons of information available for analysis through this library.
The information from social networks, and Twitter in particular, is unstructured data and uses a lot of jargon, slang, emoticons, etc., which makes it especially complicated to analyze.
Natural Language Processing or NLP is an area specialized in extracting quantitative and qualitative information from all this unstructured information. When it is used in conjunction with classical statistical tools or machine learning, we are able to do sentiment analysis of social networks and unstructured data in general.
A single post does not have enough space for complete coverage of sentiment analysis in social networks, so here we focus on the first, but by no means least important, step of sentiment analysis: the extraction, transformation and loading of Twitter information into the analysis algorithms.
How to install and set up Tweepy?
Install the Tweepy Python library as usual through the pip installer:
pip install tweepy
Or clone the source code from GitHub and install it in development mode on your machine if you plan to modify the code and contribute to the community.
git clone https://github.com/tweepy/tweepy.git
cd tweepy
python setup.py develop
In addition to installing the Tweepy library in Python, it is required to create a developer account on Twitter in order to obtain the keys that will allow us to connect with the Twitter servers to extract data.
Once the library is installed and the developer keys are obtained, we are ready to start working with information from Twitter.
Authentication on Twitter API
Start a terminal (Anaconda prompt if you want) and launch an interactive Python session with the ipython command. Then import the tweepy library and assign the keys to variables.
Next, we create an authentication handler object and, finally, a Tweepy API object built from that handler.
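A minimal sketch of these steps is shown here. It assumes the four keys from the developer account are stored in environment variables; the variable names and the helper functions read_keys and build_api are hypothetical and chosen only for illustration. OAuthHandler, set_access_token and API are Tweepy calls (in Tweepy 4.x, OAuthHandler is kept as an alias of OAuth1UserHandler).

```python
import os

# Hypothetical environment variable names holding the four developer keys
KEY_NAMES = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def read_keys(env=None):
    """Return the four keys as a tuple, failing early if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in KEY_NAMES if not env.get(name)]
    if missing:
        raise KeyError("missing credentials: " + ", ".join(missing))
    return tuple(env[name] for name in KEY_NAMES)


def build_api(consumer_key, consumer_secret, access_token, access_token_secret):
    """Create the authentication handler and the API object built on it."""
    import tweepy  # imported here so the helpers above work without Tweepy

    # Authentication handler built from the application (consumer) keys
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    # Attach the user-level access token and secret
    auth.set_access_token(access_token, access_token_secret)
    # wait_on_rate_limit makes Tweepy sleep instead of failing on rate limits
    return tweepy.API(auth, wait_on_rate_limit=True)


# Usage, once the environment variables are set:
# api = build_api(*read_keys())
```

Reading the keys from the environment rather than typing them into the session keeps them out of notebooks and shell history; assigning them to plain variables, as described above, works just as well for a quick test.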
Now we have an object called api, a client connected to the Twitter servers that enables us to extract tweets from specific users, extract tweets related to a word or set of words, or even manage our own Twitter account.
How to use Tweepy to get tweets
The api object we have created allows us to use 105 RESTful methods from the Twitter API, although some of them are only available for the premium API.
We can list all the available methods:
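One way to produce such a list is sketched here with Python's built-in dir function. The helper name public_methods is ours, not part of Tweepy, and the names and the count you see depend on the installed Tweepy version.

```python
def public_methods(obj):
    """Return the sorted names of the public callables exposed by obj."""
    return sorted(
        name
        for name in dir(obj)
        # Skip private and dunder attributes, keep only callables
        if not name.startswith("_") and callable(getattr(obj, name))
    )


# Usage with the authenticated object created earlier:
# methods = public_methods(api)
# print(len(methods))
# print(methods)
```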
The list is not complete, although we have seen that we have 105 methods available. If we take a look, we will see that we can perform more functions than with the App itself and we can automate our own Twitter account and manage our publications or even create a completely autonomous bot to manage an account.
Some of the most used methods to get data are the following:
- search to search for tweets containing certain words
- user_timeline to search for tweets from specific users
Let’s get some tweets by looking for a word. We call the search method of the api object with the input parameter q for query.
Checking the results, we see that we get a model (another object) with the tweets for the requested query: 15 tweets for $JNJ, the ticker of Johnson & Johnson.
Although 15 is the default, the count input parameter lets us increase it to 100, the maximum per request for a free Twitter API developer account.
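The query and the count parameter can be sketched as follows. The wrapper search_tweets is a hypothetical helper; it assumes an authenticated api object and falls back between the two names Tweepy has used for this method (API.search in Tweepy 3.x, API.search_tweets in Tweepy 4.x). Whether the call succeeds also depends on the access level of your developer account.

```python
def search_tweets(api, query, count=15):
    """Search for tweets matching query and return the resulting Status objects."""
    # Tweepy 3.x exposes API.search; Tweepy 4.x renamed it to API.search_tweets
    search = getattr(api, "search_tweets", None) or getattr(api, "search")
    # Cap the request at 100, the per-request maximum mentioned above
    return search(q=query, count=min(count, 100))


# Usage with an authenticated api object:
# tweets = search_tweets(api, "$JNJ")             # default of 15 tweets
# tweets = search_tweets(api, "$JNJ", count=100)  # maximum per request
# print(len(tweets))
```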
The amount of meta-information that comes with a simple tweet of up to 280 characters is incredible. I have only included the first part of the result because showing it completely would require several pages.
So let’s take a look at more specific information such as the date and time of the tweet, the user and the content posted.
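As a rough sketch, these three fields can be read from each Status object through its created_at, user and text attributes. The helper summarize and the stand-in tweet are made up for illustration, so the snippet can be tried without credentials; with real data, pass it the result of a search instead.

```python
from datetime import datetime
from types import SimpleNamespace


def summarize(tweets):
    """Return (created_at, screen_name, text) tuples, one per tweet."""
    rows = []
    for tweet in tweets:
        # full_text is only present when the request used tweet_mode="extended"
        text = getattr(tweet, "full_text", None) or tweet.text
        rows.append((tweet.created_at, tweet.user.screen_name, text))
    return rows


# Stand-in for a Status object, with invented values
sample = SimpleNamespace(
    created_at=datetime(2021, 1, 1, 12, 0),
    user=SimpleNamespace(screen_name="example_user"),
    text="Made-up tweet mentioning $JNJ",
)

for created_at, screen_name, text in summarize([sample]):
    print(created_at, screen_name, text)
```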
Again, for the user it returns a large amount of metadata associated with the user, such as name, alias, number of followers, etc.
To see a little more detail, we again need to use the methods available in the search object.
Remember that what Twitter returns to us is an object model and, as such, we have countless methods at our disposal to work with those objects.
Each tweet is an object of the Status class and we can see its methods here.
For the User model we have 52 methods available for working with users.
Please check the documentation to learn about the large number of parameters that can be used in searches.
Another of the most used methods of the Twitter API is the user_timeline that allows us to analyze all the publications of a particular user.
Elon Musk’s account is hot: he has millions of followers and some of his posts cause seismic movements. Let’s pull down some of his tweets.
Again, what we get is an object of class ResultSet or, properly speaking, a model of objects that provides us with a multitude of methods to work with the information.
Note that the maximum number of tweets returned is 20 by default, although we can extend it up to 200 with the count parameter.
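A minimal sketch of such a call, assuming an authenticated api object. The wrapper fetch_timeline is a hypothetical helper; user_timeline and its screen_name and count parameters are part of Tweepy.

```python
def fetch_timeline(api, screen_name, count=20):
    """Return the most recent tweets posted by the given account."""
    # Cap the request at 200, the per-request maximum mentioned above
    return api.user_timeline(screen_name=screen_name, count=min(count, 200))


# Usage with an authenticated api object:
# tweets = fetch_timeline(api, "elonmusk", count=200)
# for tweet in tweets:
#     print(tweet.created_at, tweet.text)
```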
It is not possible to describe every parameter of the user_timeline function here, so we strongly recommend visiting the Tweepy documentation.
Visit QuantInsti for additional insight on this topic: https://blog.quantinsti.com/tweepy/.
This material is from QuantInsti and is being posted with its permission.