How to Extract Tweets Data from Twitter Using Python

SARVESH AMRUTE
4 min read · Sep 18, 2021

In this tutorial we'll look at different ways to extract data from Twitter using the Twitter REST API. Many people tweet their opinion on, or experience of, an event and tag or mention other users or use hashtags, so that others can find content on that topic. This data can be used for research as well as to obtain demographic insights.

Tweepy, a Python library for the Twitter API, lets us run complex queries, such as extracting every tweet about a certain topic posted in or after a certain period, or extracting tweets with a particular hashtag.

In this tutorial we will study four different ways of filtering and extracting tweets.

1] First we will extract tweets that mention a username, i.e. tweets that contain the given @username.

For example, a tweet by SpaceX that mentions the @inspiration4x account is one such tweet.

2] Then we’ll extract tweets containing hashtags.

3] Then we'll extract tweets posted after a certain date.

4] Lastly, we'll extract the tweets posted by a particular user and store them in a pandas DataFrame.

Before we get started with tweepy, make sure that:

1] You have a Twitter account with developer access. (If you don't have a developer account, please go through the Twitter Developer Account section below.)

2] The tweepy package is installed on your system (pip install tweepy), along with pandas (pip install pandas), which we use at the end.

Twitter Developer Account

1] Log in to (or sign up for) a Twitter account at https://apps.twitter.com/

2] Click Create a new app.

3] Fill in the requested details, such as the app name.

4] Once the app has been created, open Keys and tokens. Here you will find the API key and API secret key; scrolling down, you'll see the option to generate the access token and access token secret.

Keep these credentials safe, as we will need them later.

Now let's get started with the coding!

Let's start by importing the tweepy package.

import tweepy

Then we’ll save those credentials in these 4 variables.

consumer_key = "your_consumer_key"  # same as the API key
consumer_secret = "your_consumer_secret_key"  # same as the API secret key
access_key = "your_access_key"  # the access token
access_secret = "your_access_secret_key"  # the access token secret
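Hard-coding keys in a script makes them easy to leak, for example by committing the file to GitHub. A minimal alternative sketch, assuming you have exported the four credentials as environment variables; the variable names (TWITTER_CONSUMER_KEY and so on) and the load_credentials helper are hypothetical choices, not something tweepy requires:

```python
import os

# Hypothetical environment variable names; choose whatever names you prefer.
CREDENTIAL_VARS = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_key": "TWITTER_ACCESS_KEY",
    "access_secret": "TWITTER_ACCESS_SECRET",
}


def load_credentials(env=os.environ):
    """Read the four Twitter credentials from environment variables (hypothetical helper)."""
    missing = [var for var in CREDENTIAL_VARS.values() if not env.get(var)]
    if missing:
        # Fail early with a clear message instead of a confusing auth error later.
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[var] for name, var in CREDENTIAL_VARS.items()}
```

You would then call creds = load_credentials() and pass creds["consumer_key"], creds["consumer_secret"] and so on to the authentication code, exactly where the four variables above are used.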

We now use these credentials to authenticate as an authorized developer and then create an API object, which we will use to fetch data.

# Twitter authentication
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)

# Creating an API object
api = tweepy.API(auth)

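Now we will use tweepy's Cursor to extract tweets; it takes care of paging through the results for us.

We pass the Cursor the api.search method for searching tweets, the query q (a username, hashtag, etc.) and tweet_mode='extended' to get the full text of each tweet; otherwise, longer tweets are truncated to 140 characters by default. (This tutorial uses the Tweepy 3.x method names; in Tweepy 4.0 and later, api.search is called api.search_tweets.)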
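With tweet_mode='extended', the text is stored in the full_text field of each tweet's raw JSON (tweet._json), while the default mode only has a possibly truncated text field. One caveat: for retweets, full_text can still be cut off, and the complete text sits in the nested retweeted_status object. A small sketch that covers these cases, assuming payloads shaped like the v1.1 tweet JSON; get_full_text is a hypothetical helper:

```python
def get_full_text(tweet_json):
    """Return the untruncated text of a tweet from its v1.1 JSON dict (hypothetical helper)."""
    # Retweets: take the complete text from the original tweet
    # (this drops the "RT @user:" prefix of the retweet itself).
    original = tweet_json.get("retweeted_status")
    if original is not None:
        return get_full_text(original)
    # Extended mode uses "full_text"; the default mode only has "text".
    return tweet_json.get("full_text", tweet_json.get("text", ""))
```

Inside any of the loops below, print(get_full_text(tweet._json)) can then replace the direct lookup of tweet._json["full_text"].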

I recommend running these steps yourself as you follow this tutorial; you'll learn faster!

1. For a particular user mention

username_tweets = tweepy.Cursor(api.search, q="@elonmusk", tweet_mode='extended').items(5)

Here, .items(5) limits the extraction to 5 tweets.
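Behind the scenes, the Cursor requests results page by page, and .items(n) stops once n tweets have been yielded, so only as many pages as needed are fetched. The following is a simplified offline model of that behaviour, not tweepy's actual implementation; fetch_page is a hypothetical stand-in for one API request:

```python
from itertools import islice


def paginate(fetch_page):
    """Yield items page by page until fetch_page returns an empty page."""
    page_number = 0
    while True:
        page = fetch_page(page_number)
        if not page:
            return
        yield from page
        page_number += 1


def items(fetch_page, limit):
    """Mimic Cursor(...).items(limit): stop after `limit` items, fetching pages lazily."""
    return islice(paginate(fetch_page), limit)
```

Because the generator is lazy, asking for 5 items from pages of 3 only triggers two page requests.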

tweepy.Cursor(...).items() returns an iterator over the collected tweets. Each item is a Status object whose attributes give us particular information about the tweet, such as:

1] the text of the tweet

2] the sender of the tweet

3] the date on which the tweet was sent

and many more.
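Each tweet object also keeps the raw API payload in tweet._json. As a sketch of pulling the three attributes listed above out of that dictionary, assuming the v1.1 field names (full_text, user.screen_name, created_at) and its created_at format such as "Wed Oct 10 20:19:24 +0000 2018"; summarize_tweet is a hypothetical helper. Tweepy already exposes tweet.created_at as a datetime, so the parsing step only matters when you work with raw JSON, for example tweets saved to disk:

```python
from datetime import datetime

# created_at format of the v1.1 API, e.g. "Wed Oct 10 20:19:24 +0000 2018".
# Note: %a and %b expect English day/month names (the default C locale).
TWITTER_DATE_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def summarize_tweet(tweet_json):
    """Pick the text, sender and timestamp out of a v1.1 tweet dict (hypothetical helper)."""
    return {
        "text": tweet_json.get("full_text", tweet_json.get("text", "")),
        "sender": tweet_json["user"]["screen_name"],
        "created_at": datetime.strptime(tweet_json["created_at"], TWITTER_DATE_FORMAT),
    }
```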

Now we'll iterate over the tweets and print them one by one:

for tweet in username_tweets:
    text = tweet._json["full_text"]
    print(text)
    # using different attributes
    print(tweet.favorite_count)
    print(tweet.retweet_count)
    print(tweet.created_at)

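Alternatively, we can use a list comprehension to collect all the tweets in a list.

# the loop above has already consumed username_tweets, so create a fresh cursor
username_tweets = tweepy.Cursor(api.search, q="@elonmusk", tweet_mode='extended').items(5)
# in extended mode the text is in full_text (tweet.text does not exist)
username_tweets_list = [tweet.full_text for tweet in username_tweets]
print(username_tweets_list[0])  # printing the first returned tweet mentioning elonmusk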
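One thing to watch: .items() returns a one-shot iterator, so once a for loop has consumed it, a second pass over the same username_tweets (such as a list comprehension) gets nothing. Create a new Cursor for each pass. Plain Python iterators behave the same way, as this offline demo shows, with a generator standing in for the cursor:

```python
def fake_cursor_items():
    """Generator standing in for tweepy.Cursor(...).items(3) (offline stand-in)."""
    yield from ["tweet 1", "tweet 2", "tweet 3"]


tweets = fake_cursor_items()
first_pass = [t for t in tweets]   # consumes the iterator
second_pass = [t for t in tweets]  # already exhausted, so this is empty

fresh = [t for t in fake_cursor_items()]  # a new iterator starts over
```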

2. For a particular hashtag

Similar to the above example, we will now extract the tweets on a particular hashtag.

hashtag_tweets = tweepy.Cursor(api.search, q="#VaccinationDrive", tweet_mode='extended').items(5)

Now we’ll iterate.

for tweet in hashtag_tweets:
    text = tweet._json["full_text"]
    print(text)

Everything else, such as accessing attributes or building a list, works the same way as in the previous example.
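Searching for #VaccinationDrive matches the hashtag, but you may also want to know which other hashtags each tweet carries. In the v1.1 JSON they are listed under entities.hashtags, each with a text field (without the #). A sketch of counting them across the fetched tweets, assuming that payload shape; count_hashtags is a hypothetical helper:

```python
from collections import Counter


def count_hashtags(tweet_jsons):
    """Count hashtags (case-insensitively) across an iterable of v1.1 tweet dicts."""
    counts = Counter()
    for tweet_json in tweet_jsons:
        for tag in tweet_json.get("entities", {}).get("hashtags", []):
            counts[tag["text"].lower()] += 1
    return counts
```

For example, count_hashtags(tweet._json for tweet in hashtag_tweets).most_common(5) would list the five most frequent hashtags among the fetched tweets (using a fresh cursor, since a consumed one yields nothing).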

3. Tweets after a mentioned date

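Here we'll extract tweets posted after a given date. The search method has no since parameter; instead, the date goes into the query itself with the since: operator, in YYYY-MM-DD format. Keep in mind that the standard search only covers roughly the past seven days, so an older date does not reach further back.

date_tweets = tweepy.Cursor(api.search, q="@elonmusk since:2020-05-31", tweet_mode='extended').items(5)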
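Because the standard search only reaches back about a week, you may prefer to apply a date cut-off yourself, for instance to tweets you have already collected. A minimal client-side sketch, assuming each tweet is a v1.1 JSON dict with its created_at string; tweets_since is a hypothetical helper, and it filters after download rather than narrowing the API query:

```python
from datetime import datetime, timezone

# created_at format of the v1.1 API, e.g. "Mon May 31 10:00:00 +0000 2021".
TWITTER_DATE_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def tweets_since(tweet_jsons, since):
    """Keep tweets created at or after `since` (a timezone-aware datetime)."""
    return [
        t for t in tweet_jsons
        if datetime.strptime(t["created_at"], TWITTER_DATE_FORMAT) >= since
    ]
```

For example, tweets_since(saved_tweets, datetime(2021, 5, 31, tzinfo=timezone.utc)) keeps only tweets from May 31, 2021 onwards, where saved_tweets is a hypothetical list of stored tweet dicts.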

4. Tweets posted by a particular user account

So far we have extracted tweets containing a username or hashtag, and filtered them by date.

Now we will extract the tweets posted by a particular user account.

For this we'll use the api.user_timeline method and pass the account's handle in the screen_name parameter. Note that this endpoint returns at most a user's 3,200 most recent tweets, not their entire history.

Let's extract Elon Musk's 10 most recent tweets!

new_tweets = tweepy.Cursor(api.user_timeline, screen_name="elonmusk", tweet_mode='extended').items(10)

Now we'll iterate through new_tweets and keep the attributes we need from each tweet in a dictionary.

tweets_list = []  # avoid shadowing Python's built-in list
for tweet in new_tweets:
    text = tweet._json["full_text"]

    refined_tweet = {'text': text,
                     'favorite_count': tweet.favorite_count,
                     'retweet_count': tweet.retweet_count,
                     'created_at': tweet.created_at}

    tweets_list.append(refined_tweet)

Now the last part is to make a DataFrame of the tweets, with columns representing the different attributes of each tweet. Let's see!

import pandas as pd

df = pd.DataFrame(tweets_list)
df.to_csv('refined_tweets.csv')

Now let's look at the DataFrame (for example with print(df)).

The text, favorite count, retweet count and created_at attributes appear as columns: each tweet is stored as a dictionary, and when pandas builds a DataFrame from a list of dictionaries, the keys become the column names.
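If you only need the CSV file, Python's standard csv module can write the same list of dictionaries without pandas. A sketch using csv.DictWriter, with the column names taken from the dictionaries built above; write_tweets_csv is a hypothetical helper:

```python
import csv

# Same keys as the per-tweet dictionaries built above.
FIELDNAMES = ["text", "favorite_count", "retweet_count", "created_at"]


def write_tweets_csv(rows, file_obj):
    """Write a list of tweet dicts to an open text file as CSV, header first."""
    writer = csv.DictWriter(file_obj, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)  # commas inside tweet text are quoted automatically
```

Usage: open the file with open('refined_tweets.csv', 'w', newline='', encoding='utf-8') and pass it, together with the list of tweet dictionaries, to write_tweets_csv. Opening with newline='' is what the csv module documentation recommends, to avoid blank lines on Windows.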

Conclusion

We have seen how easy it is to extract tweets from Twitter using different criteria. We learned to extract tweets containing usernames or hashtags, tweets posted after a given date, and the tweets posted by any user.

The whole procedure consists of three tasks:
1] Opening a developer account at https://apps.twitter.com/ and getting the credentials mentioned at the start.
2] Installing the tweepy and pandas libraries.
3] Writing the code.

That's it for this tutorial. Data science fans, stay tuned for more articles and tutorials.

We can connect on Twitter, LinkedIn and GitHub.
