Everything I know about Python...

A Python App to See What People Are Saying About You

I've been taking stock of the digital services I use (and pay for) but am unhappy with. Digital goods sales (for my book) have already been taken care of by bull. Next on my list is tracking mentions of my site across the Internet. In this article, we'll build a simple (but fully functional) web application that searches for and displays mentions of a particular keyword (in my case, "jeffknupp.com").

I should mention that I use a service to do this already: mention. It's OK, but I'm reaching their quota for the free service, and I can't stand their mobile app, so I'd rather have something tailored for myself. And, as I've recently discovered with bull, writing a service like this from scratch can be done quite quickly. If you know of a better application for tracking mentions, by the way, please let me know!

Twitter

I'll focus initially on Twitter, as much/most of the commentary on my site likely occurs there (as opposed to blogs or newsgroups). I wanted to try out a new Python Twitter client anyway (birdy), so I decided to use birdy for my Twitter interactions.

At the very least, I need to be able to persist mentions of my site in a database. Any problem with the word "database" in it can usually be answered with "SQLAlchemy," and this is no exception. Let's create some SQLAlchemy models for our database:

"""Database models for the eavesdropper application."""

import datetime

from sqlalchemy import Column, Integer, String, ForeignKey, DateTime, Boolean
from sqlalchemy.orm import relationship, backref
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Source(Base):

    __tablename__ = 'source'
    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String)

class Mention(Base):
    """A Mention from a particular source."""

    __tablename__ = 'mention'
    id = Column(Integer, primary_key=True, autoincrement=True)
    domain_id = Column(String)
    source_id = Column(Integer, ForeignKey('source.id'))
    source = relationship(Source)
    text = Column(String)
    associated_user = Column(String)
    seen = Column(Boolean, default=False)
    recorded_at = Column(DateTime, default=datetime.datetime.now)
    occurred_at = Column(DateTime, default=datetime.datetime.now)

    def __str__(self):
        """Return the string representation of a mention."""
        return self.text

    def to_json(self):
        return {
                'id': self.id,
                'domain_id': self.domain_id,
                'source': self.source.name,
                'text': self.text,
                'associated_user': self.associated_user,
                'seen': self.seen,
                'recorded_at': str(self.recorded_at),
                'occurred_at': str(self.occurred_at)}

Nothing very surprising here. I create two models, one to represent a data source (like "Twitter"), and another to model the actual mention of the keyword I'm interested in. The only interesting thing is the to_json function. Since I know that I'll be creating a web application with a dynamic front-end, I imagine I'll be sending this data as JSON quite often. Hence the existence of to_json.
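
To make the role of to_json concrete, here is a minimal, self-contained sketch. It uses a plain stand-in class (FakeMention, a hypothetical name that is not part of the application) instead of the SQLAlchemy model, and it shows why the datetime fields are passed through str(): the json module cannot serialize datetime objects on its own.

```python
import datetime
import json


class FakeMention:
    """Hypothetical stand-in for the Mention model (no database needed)."""

    def __init__(self, id, text, seen, occurred_at):
        self.id = id
        self.text = text
        self.seen = seen
        self.occurred_at = occurred_at

    def to_json(self):
        # Same idea as Mention.to_json: return only JSON-friendly types.
        return {
            'id': self.id,
            'text': self.text,
            'seen': self.seen,
            'occurred_at': str(self.occurred_at),
        }


mention = FakeMention(
    id=1,
    text='jeffknupp.com is the best website ever!',
    seen=False,
    occurred_at=datetime.datetime(2014, 1, 1, 12, 30, 0),
)

# A raw datetime is rejected by the json module...
try:
    json.dumps({'occurred_at': mention.occurred_at})
    raw_datetime_ok = True
except TypeError:
    raw_datetime_ok = False

# ...while the dictionary built by to_json serializes cleanly.
payload = json.dumps(mention.to_json())
```

Despite its name, to_json returns a dictionary rather than a JSON string; the actual encoding happens later, when the web layer calls json.dumps on it.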

After creating a models.py file, I usually follow up with a populate_db.py file to insert initial data into the database. Here are the contents of that file:

from sqlalchemy import create_engine
from models import Source, Mention, Base
from sqlalchemy.orm import sessionmaker

engine = create_engine('sqlite:///sqlite.db')
Session = sessionmaker(bind=engine)
Base.metadata.create_all(engine) 
session = Session()

s = Source(id=1, name='Twitter')
m = Mention(id=1, source=s, text='jeffknupp.com is the best website ever!')
session.add(s)
session.add(m)
session.commit()

Again, nothing crazy here. It simply creates a single Source and Mention object and inserts them into the database.

Tweet collection, the Python way

We're now ready to begin the application proper. I'll begin with a skeleton of the application, filling in the docstrings for classes and functions but nothing else. Here is the skeleton:

"""Find and record references to a person or brand on the Internet."""

import sys
from flask import Flask

app = Flask(__name__)

@app.route('/')
def index():
    """Return the main view for mentions."""

@app.route('/update/<source>', methods=['POST'])
def get_updates_for_source(source):
    """Return the number of updates found after getting new data from
    *source*."""

@app.route('/read/<id>', methods=['POST'])
def read(id):
    """Mark a particular mention as having been read."""

@app.route('/mentions')
def show_mentions():
    """Return a list of all mentions in JSON."""

def main():
    """Main entry point for script."""
    app.run()

if __name__ == '__main__':
    sys.exit(main())

Since I'm going to be adding more sources in the near future, I decided that the source-specific retrieval code should live in a separate file. Here's the skeleton for twitter.py, where most of the heavy lifting is done:

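import datetime

from birdy.twitter import AppClient
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

from models import Source, Mention

# Session factory bound to the same SQLite file that populate_db.py creates.
Session = sessionmaker(bind=create_engine('sqlite:///sqlite.db'))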

CONSUMER_KEY = 'xxxx'
CONSUMER_SECRET = 'xxxx'
client = AppClient(CONSUMER_KEY, CONSUMER_SECRET)
access_token = client.get_access_token()
QUERIES = ['jeffknupp.com', 'jeffknupp']

def get_twitter_mentions():
    """Return the number of new mentions found on Twitter."""

Let's implement that single function, get_twitter_mentions, now. First, we'll need a list to keep track of all mentions across all queries (since multiple query terms are supported).

statuses = []
for query in QUERIES:
    response = client.api.search.tweets.get(q=query, count=100)
    statuses += response.data.statuses

I'm happy with how easy birdy is to use, although this is an admittedly simple use. Anyway, now that we have all the status updates containing our queries, let's prepare to insert only the new ones into the database:

session = Session()
twitter = session.query(Source).get(1)
for status in statuses:

We need to iterate over each status object, which birdy returns as a JSONObject (basically a dictionary whose keys are also available as attributes). We want the get_twitter_mentions function to be (logically) idempotent. That is, no matter how many times we execute the function, our database never contains duplicate results. To achieve this, we check for any existing Mention objects that have the same domain_id, which is the unique identifying ID in the source system (i.e. the unique ID Twitter assigned to the tweet).

    if not session.query(Mention).filter(Mention.domain_id==status.id_str).count():

Easy enough. Now let's actually create the Mention object we're going to insert:

        created_at = datetime.datetime.strptime(status.created_at, r"%a %b %d %H:%M:%S +0000 %Y")
        m = Mention(text=status.text,
                associated_user='{} ({})'.format(status.user.screen_name,
                    status.user.followers_count),
                    recorded_at=datetime.datetime.now(),
                    occurred_at=created_at,
                    source=twitter,
                    domain_id=status.id_str)
        session.add(m)
session.commit()
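
The date handling above is easy to try in isolation. The following sketch parses a sample timestamp in the format Twitter uses for created_at (the sample value is made up for illustration). It also shows a variant using %z, which is an assumption of mine and not what the application does, for cases where a timezone-aware value is preferred.

```python
import datetime

# Sample timestamp in the format Twitter uses for created_at.
raw = 'Mon Sep 24 03:35:21 +0000 2012'

# Format from the article: the "+0000" offset is matched literally,
# so the result is a naive datetime (no tzinfo attached).
naive = datetime.datetime.strptime(raw, '%a %b %d %H:%M:%S +0000 %Y')

# Variant (an assumption, not what the article uses): parse the offset
# with %z to get a timezone-aware datetime instead.
aware = datetime.datetime.strptime(raw, '%a %b %d %H:%M:%S %z %Y')
```

One thing to keep in mind: occurred_at parsed this way is in UTC, while recorded_at comes from datetime.datetime.now(), which is local time, so the two columns are only directly comparable on a machine whose clock is set to UTC.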

After going back and adding a simple count of the new Mention objects, here's the completed function in its entirety:

def get_twitter_mentions():
    """Return the number of new mentions found on Twitter."""
    statuses = []
    for query in QUERIES:
        response = client.api.search.tweets.get(q=query, count=100)
        statuses += response.data.statuses
    session = Session()
    twitter = session.query(Source).get(1)
    new_mentions = 0
    for status in statuses:
        if not session.query(Mention).filter(Mention.domain_id==status.id_str).count():
            created_at = datetime.datetime.strptime(status.created_at, r"%a %b %d %H:%M:%S +0000 %Y")
            m = Mention(text=status.text,
                    associated_user='{} ({})'.format(status.user.screen_name,
                        status.user.followers_count),
                        recorded_at=datetime.datetime.now(),
                        occurred_at=created_at,
                        source=twitter,
                        domain_id=status.id_str)
            new_mentions += 1
            session.add(m)
    session.commit()
    return new_mentions
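
To see the idempotency property without a Twitter account, the same check-then-insert logic can be sketched against an in-memory database. This version swaps SQLAlchemy for the standard library's sqlite3 module and uses plain dictionaries in place of birdy's status objects; record_mentions and the sample statuses are hypothetical names and data, not part of the application.

```python
import sqlite3


def record_mentions(conn, statuses):
    """Insert statuses not already stored; return how many were new."""
    new_mentions = 0
    for status in statuses:
        # Same test as the article: is this domain_id already recorded?
        already_stored = conn.execute(
            'SELECT COUNT(*) FROM mention WHERE domain_id = ?',
            (status['id_str'],),
        ).fetchone()[0]
        if not already_stored:
            conn.execute(
                'INSERT INTO mention (domain_id, text) VALUES (?, ?)',
                (status['id_str'], status['text']),
            )
            new_mentions += 1
    conn.commit()
    return new_mentions


conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE mention ('
    'id INTEGER PRIMARY KEY AUTOINCREMENT, domain_id TEXT, text TEXT)'
)

# Hypothetical search results; the second query repeats tweet "101".
statuses = [
    {'id_str': '101', 'text': 'reading jeffknupp.com'},
    {'id_str': '102', 'text': 'jeffknupp wrote a book'},
    {'id_str': '101', 'text': 'reading jeffknupp.com'},
]

first_run = record_mentions(conn, statuses)
second_run = record_mentions(conn, statuses)
total_rows = conn.execute('SELECT COUNT(*) FROM mention').fetchone()[0]
```

The first call stores two rows even though three statuses were passed in, because the repeated tweet is caught by the lookup. The second call stores nothing, which is exactly the behavior we want from repeated polling.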

Back to the app

Now it's time to implement the main application logic. Let's return to app.py, the file in which we created our skeleton. I know that the index function is just going to return a rendered template, since the querying for Mentions will happen on the client side. Thus, index is trivial:

@app.route('/')
def index():
    """Return the main view for mentions."""
    return render_template('index.html')

The code to return all Mention objects as JSON seems simple, so let's implement that next:

@app.route('/mentions')
def show_mentions():
    """Return a list of all mentions in JSON."""
    session = db.session()
    mentions = session.query(Mention).all()
    values = [mention.to_json() for mention in mentions]
    response = make_response()
    response.data = json.dumps(values)
    return response

Again, nothing too crazy. Hitting the /mentions endpoint will return a JSON list of all Mention objects in the database.

Since the purpose is similar, let's implement the read function next:

@app.route('/read/<id>', methods=['POST'])
def read(id):
    """Mark a particular mention as having been read."""
    session = db.session()
    mention = session.query(Mention).get(id)
    mention.seen = True
    session.add(mention)
    session.commit()
    return jsonify({'success': True})

We simply use the <id> parameter passed in via the URL as the primary key in our database lookup. Then we change seen to True and save the object back to the database. We return a token response that's not of much interest (really, an HTTP 204 would have been more appropriate, but I was lazy).

The rest is just mop-up. Here's the implementation for get_updates_for_source (which allows us to request updates via an HTTP request):

@app.route('/update/<source>', methods=['POST'])
def get_updates_for_source(source):
    """Return the number of updates found after getting new data from
    *source*."""
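    if source == 'twitter':
        updates = get_twitter_mentions()
        return jsonify({'updates': updates})
    # Unknown source: answer with a 404 instead of returning None,
    # which Flask would turn into a server error.
    return jsonify({'error': 'unknown source'}), 404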
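
Once more sources exist, the if statement in this view could give way to a lookup table. Here is a minimal sketch of that idea, with Flask left out so it runs on its own; FETCHERS, fetch_twitter_stub and updates_for are all hypothetical names, and the stub simply pretends three new mentions were found.

```python
def fetch_twitter_stub():
    """Hypothetical stand-in for get_twitter_mentions()."""
    return 3


# Map each source name from the URL to the function that fetches it.
FETCHERS = {
    'twitter': fetch_twitter_stub,
}


def updates_for(source):
    """Return a (body, status_code) pair for the given source name."""
    fetcher = FETCHERS.get(source)
    if fetcher is None:
        # No fetcher registered under this name.
        return {'error': 'unknown source: {}'.format(source)}, 404
    return {'updates': fetcher()}, 200
```

With this shape, adding a source means writing one fetch function and registering it in the dictionary; the view itself would not need to change.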

And that's the last part of the file. To recap, here's what the completed file looks like:

"""Find and record references to a person or brand on the Internet."""

import sys
import json
import pprint
import argparse

from flask import Flask, make_response, render_template, jsonify, send_from_directory
from flask_sqlalchemy import SQLAlchemy  # the flask.ext.* import path no longer exists in current Flask
from birdy.twitter import AppClient

from models import Source, Mention, Base
from twitter import get_twitter_mentions

app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite+pysqlite:///sqlite.db'
db = SQLAlchemy(app)

@app.route('/')
def index():
    """Return the main view for mentions."""
    return render_template('index.html')

@app.route('/update/<source>', methods=['POST'])
def get_updates_for_source(source):
    """Return the number of updates found after getting new data from
    *source*."""
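    if source == 'twitter':
        updates = get_twitter_mentions()
        return jsonify({'updates': updates})
    # Unknown source: answer with a 404 instead of returning None,
    # which Flask would turn into a server error.
    return jsonify({'error': 'unknown source'}), 404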


@app.route('/read/<id>', methods=['POST'])
def read(id):
    """Mark a particular mention as having been read."""
    session = db.session()
    mention = session.query(Mention).get(id)
    mention.seen = True
    session.add(mention)
    session.commit()
    return jsonify({'success': True})

@app.route('/mentions')
def show_mentions():
    """Return a list of all mentions in JSON."""
    session = db.session()
    mentions = session.query(Mention).all()
    values = [mention.to_json() for mention in mentions]
    response = make_response()
    response.data = json.dumps(values)
    return response

def main():
    """Main entry point for script."""
    app.run()

if __name__ == '__main__':
    sys.exit(main())

Client-side rendering with React.js

I've been looking for an excuse to learn Facebook's React.js framework, and this is the perfect opportunity. I won't go into detail about the implementation because a) I'm sure there's a better way to do it and b) I'm not an authority (by any means) on the subject.

Regardless, using React, I was able to create a page that displays all mentions. Unread mentions are presented in a well. Once clicked, they asynchronously send a /read request to the server and change their appearance (by changing their CSS class). So basically there's a visual difference between read and unread items, and it's updated dynamically.

Here are the contents of index.html (which you may notice is very similar to the React tutorial code...):

<!doctype html>
<html>
<head>
    <script src="http://fb.me/react-0.8.0.js"></script>
    <script src="http://fb.me/JSXTransformer-0.8.0.js"></script>
    <script src="http://code.jquery.com/jquery-1.10.0.min.js"></script>
    <link href="//netdna.bootstrapcdn.com/bootstrap/3.0.3/css/bootstrap.min.css" rel="stylesheet">
    <link rel="stylesheet" href="/static/css/style.css">
</head>
<body>
<div class="container">
    <div class="row">
        <div class="col-md-6 col-md-offset-3" >
            <div id="content">
            </div>
            <script type="text/jsx">
            /**
            * @jsx React.DOM
            */
            var MentionBox = React.createClass({
                getInitialState: function() {
                    return {data: []}
                },
                loadCommentsFromServer: function() {
                    $.ajax({
                        url: '/mentions',
                        dataType: 'json',
                        success: function (data) {
                            this.setState({data: data});
                        }.bind(this),
                        error: function (xhr, status, err) {
                            console.error("/mentions", status, err.toString());
                        }.bind(this)
                    });
                },
                componentWillMount: function() {
                    this.loadCommentsFromServer();
                    setInterval(this.loadCommentsFromServer, this.props.pollInterval);
                },
                render: function() {
                    return (
                        <div className="mentionBox">
                        <h1>Mentions</h1>
                        <MentionList data={this.state.data}/>
                        </div>
                        );
                }
            });

            var MentionList = React.createClass({
                render: function() {
                    var mentions = this.props.data.map( function(mention) {
                        return <Mention mention={mention}></Mention>;
                    });

                    return (
                        <div className="mentionList">
                            {mentions}
                        </div>
                    );
                }
            });

            var Mention = React.createClass({
                getInitialState: function() {
                    return {seen: this.props.mention.seen}
                },
                markRead: function() {
                    $.ajax({
                        type: 'POST',
                        url: '/read/' + this.props.mention.id,
                        dataType: 'json',
                        success: function (data) {
                            this.setState({seen: true});
                        }.bind(this),
                        error: function (xhr, status, err) {
                            console.error("/read", status, err.toString());
                        }.bind(this)
                    });
                },
                render: function() {
                    return (
                            <div className={this.state.seen ? true : "well" } onClick={this.markRead}>
                                <div className="pull-right">
                                    <h6>
                                    {this.props.mention.source}
                                    </h6>
                                </div>
                                <h4>{this.props.mention.associated_user} <small> &nbsp; at {this.props.mention.occurred_at}</small></h4>
                                    <p>{this.props.mention.text}</p>
                            </div>
                    );
                }
            });

            React.renderComponent(
                    <MentionBox url="/mentions" pollInterval={20000} />,
                    document.getElementById('content')
            );
            </script>
        </div>
    </div>
</div>
</body>
</html>

That gives me this:

I decided to call the project "eavesdropper" as it's constantly listening to what others are saying :). In the next post about this project, I'll show you how to extend the project to pull from multiple sources. Until then, thanks for reading!
