I've been taking stock of the digital services I use (and pay for) but am
unhappy with. Digital goods sales (for my book) have already been taken care of
by bull. Next on my list is tracking
mentions of my site across the Internet. In this article, we'll build a simple
(but fully functional) web application that searches for and displays mentions
of a particular keyword (in my case, "jeffknupp.com").
I should mention that I use a service to do this already: mention.
It's OK, but I'm reaching their quota for the free service, and I can't stand
their mobile app, so I'd rather have something tailored for myself. And, as I've
recently discovered with bull, writing a service like this from scratch can be
done quite quickly. If you know of a better application for tracking mentions,
by the way, please let me know!
Twitter
I'll focus initially on Twitter, as much/most of the commentary on my site
likely occurs there (as opposed to blogs or newsgroups). I wanted to try out a
new Python Twitter client anyway (birdy), so
I decided to use birdy for my Twitter interactions.
At the very least, I need to be able to persist mentions of my site in a database.
Any problem with the word "database" in it can usually be answered with
"SQLAlchemy," and this is no exception. Let's create some SQLAlchemy models for
our database:
"""Database models for the eavesdropper application."""importdatetimefromsqlalchemyimportColumn,Integer,String,ForeignKey,DateTime,Booleanfromsqlalchemy.ormimportrelationship,backreffromsqlalchemy.ext.declarativeimportdeclarative_baseBase=declarative_base()classSource(Base):__tablename__='source'id=Column(Integer,primary_key=True,autoincrement=True)name=Column(String)classMention(Base):"""A Mention from a particular source."""__tablename__='mention'id=Column(Integer,primary_key=True,autoincrement=True)domain_id=Column(String)source_id=Column(Integer,ForeignKey('source.id'))source=relationship(Source)text=Column(String)associated_user=Column(String)seen=Column(Boolean,default=False)recorded_at=Column(DateTime,default=datetime.datetime.now)occurred_at=Column(DateTime,default=datetime.datetime.now)def__str__(self):"""Return the string representation of a mention."""returnself.textdefto_json(self):return{'id':self.id,'domain_id':self.domain_id,'source':self.source.name,'text':self.text,'associated_user':self.associated_user,'seen':self.seen,'recorded_at':str(self.recorded_at),'occurred_at':str(self.occurred_at)}
Nothing very surprising here. I create two models, one to represent a data
source (like "Twitter"), and another to model the actual mention of the keyword
I'm interested in. The only interesting thing is the to_json method. Since I
know that I'll be creating a web application with a dynamic front-end, I imagine
I'll be sending this data as JSON quite often. Hence the existence of to_json.
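A quick illustration of why to_json stringifies the datetime columns (the dictionary below is a hand-built stand-in for a Mention row, with illustrative values):

```python
import datetime
import json

# A row roughly as it comes back from the Mention model (illustrative values)
mention = {
    'text': 'jeffknupp.com is the best website ever!',
    'recorded_at': datetime.datetime(2014, 4, 1, 12, 0, 0),
}

# json.dumps chokes on raw datetime objects...
try:
    json.dumps(mention)
    serializable = True
except TypeError:
    serializable = False

# ...so to_json converts the timestamp columns to strings first
mention['recorded_at'] = str(mention['recorded_at'])
payload = json.dumps(mention)
```

Stringifying in the model keeps every endpoint that serializes mentions from having to deal with datetime objects itself.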
After creating a models.py file, I usually follow up with a
populate_db.py file to insert initial data into the database. Here are the
contents of that file:
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

from models import Source, Mention, Base

engine = create_engine('sqlite:///sqlite.db')
Session = sessionmaker(bind=engine)
Base.metadata.create_all(engine)

session = Session()
s = Source(id=1, name='Twitter')
m = Mention(id=1, source=s, text='jeffknupp.com is the best website ever!')
session.add(s)
session.add(m)
session.commit()
Again, nothing crazy here. It simply creates a single Source and Mention object
and inserts them into the database.
Tweet collection, the Python way
We're now ready to begin the application proper. I'll begin with a skeleton of
the application, filling in the docstrings for classes and functions but nothing
else. Here is the skeleton:
"""Find and record references to a person or brand on the Internet."""
importsysfromflaskimportFlaskapp=Flask(__name__)@app.route('/')defindex():"""Return the main view for mentions."""@app.route('/update/<source>',methods=['POST'])defget_updates_for_source(source):"""Return the number of updates found after getting new data from *source*."""@app.route('/read/<id>',methods=['POST'])defread(id):"""Mark a particular mention as having been read."""@app.route('/mentions')defshow_mentions():"""Return a list of all mentions in JSON."""defmain():"""Main entry point for script.""""app.run()if__name__=='__main__':sys.exit(main())
Since I'm going to be adding more sources in the near future, I decided that the
source-specific retrieval code should live in a separate file. Here's the
skeleton for twitter.py, where most of the heavy lifting is done:
import datetime

from birdy.twitter import AppClient
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

from models import Source, Mention

CONSUMER_KEY = 'xxxx'
CONSUMER_SECRET = 'xxxx'

client = AppClient(CONSUMER_KEY, CONSUMER_SECRET)
access_token = client.get_access_token()

# Session factory bound to the same SQLite database the rest of the app uses
engine = create_engine('sqlite:///sqlite.db')
Session = sessionmaker(bind=engine)

QUERIES = ['jeffknupp.com', 'jeffknupp']


def get_twitter_mentions():
    """Return the number of new mentions found on Twitter."""
Let's implement that single function, get_twitter_mentions, now.
First, we'll need a list to keep track of all mentions across all queries (since
multiple query terms are supported).
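The accumulation step is just a list extended once per query. With a hypothetical fake_search standing in for birdy's client.api.search.tweets.get (so the sketch runs without credentials), the pattern looks like:

```python
# fake_search is a hypothetical stand-in for client.api.search.tweets.get,
# returning canned status dictionaries instead of live Twitter results
def fake_search(q, count=100):
    return [{'id_str': '123', 'text': 'check out ' + q}]

QUERIES = ['jeffknupp.com', 'jeffknupp']

# One flat list of statuses across all query terms
statuses = []
for query in QUERIES:
    statuses += fake_search(q=query)
```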
I'm happy with how easy birdy is to use, though this is admittedly a simple use
case. Now that we have all the status updates containing our queries, let's
prepare to insert only the new ones into the database. We need to iterate over
each status object, which birdy returns as a JSONObject (basically a dictionary
whose keys are available as attributes).
We want the get_twitter_mentions function to be (logically) idempotent. That
is, if we execute the function multiple times, our database does not contain
duplicate results. To achieve this, we check for any existing Mention with the
same domain_id, the unique identifier assigned in the source system (i.e., the
ID Twitter assigned the tweet).
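The dedup-by-domain_id idea, reduced to a runnable sketch with a plain set standing in for the database query (record_mentions is a hypothetical helper, not part of the app):

```python
def record_mentions(statuses, seen_ids):
    """Record only statuses whose id_str is new; return the number recorded."""
    new_mentions = 0
    for status in statuses:
        if status['id_str'] not in seen_ids:
            seen_ids.add(status['id_str'])
            new_mentions += 1
    return new_mentions

statuses = [{'id_str': '1'}, {'id_str': '2'}, {'id_str': '1'}]
db = set()
first_run = record_mentions(statuses, db)   # records the two unique statuses
second_run = record_mentions(statuses, db)  # records nothing: idempotent
```

Running the function twice over the same statuses adds nothing the second time, which is exactly the property we want from get_twitter_mentions.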
After going back and adding a simple count of the new Mention objects, here's
the completed function in its entirety:
def get_twitter_mentions():
    """Return the number of new mentions found on Twitter."""
    statuses = []
    for query in QUERIES:
        response = client.api.search.tweets.get(q=query, count=100)
        statuses += response.data.statuses
    session = Session()
    twitter = session.query(Source).get(1)
    new_mentions = 0
    for status in statuses:
        if not session.query(Mention).filter(
                Mention.domain_id == status.id_str).count():
            created_at = datetime.datetime.strptime(
                status.created_at, r"%a %b %d %H:%M:%S +0000 %Y")
            m = Mention(
                text=status.text,
                associated_user='{} ({})'.format(
                    status.user.screen_name, status.user.followers_count),
                recorded_at=datetime.datetime.now(),
                occurred_at=created_at,
                source=twitter,
                domain_id=status.id_str)
            new_mentions += 1
            session.add(m)
    session.commit()
    return new_mentions
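One detail worth calling out: Twitter's created_at strings come back in a fixed format, and strptime can consume the constant "+0000" as literal text in the format string (the sample timestamp below is made up):

```python
import datetime

# Twitter's created_at looks like "Mon Sep 01 12:34:56 +0000 2014";
# the literal "+0000" is matched verbatim by the format string
created_at = datetime.datetime.strptime(
    'Mon Sep 01 12:34:56 +0000 2014', r"%a %b %d %H:%M:%S +0000 %Y")
```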
Back to the app
Now it's time to implement the main application logic. Let's return to app.py,
the file in which we created our skeleton. I know that the index function is
just going to return a rendered template, since the querying for Mentions
will happen on the client side. Thus, index is trivial:
@app.route('/')
def index():
    """Return the main view for mentions."""
    return render_template('index.html')
The code to return all Mention objects as JSON seems simple, so let's
implement that next:
@app.route('/mentions')
def show_mentions():
    """Return a list of all mentions in JSON."""
    session = db.session()
    mentions = session.query(Mention).all()
    values = [mention.to_json() for mention in mentions]
    response = make_response()
    response.data = json.dumps(values)
    return response
Again, nothing too crazy. Hitting the /mentions endpoint will return a JSON
list of all Mention objects in the database.
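To make the shape of that response concrete: the endpoint serves a JSON array of mention objects mirroring Mention.to_json, which a client can parse straight back into a list (the record below is illustrative):

```python
import json

# A sample of what /mentions serves: a JSON *array* of mention objects
values = [
    {'id': 1, 'source': 'Twitter',
     'text': 'jeffknupp.com is the best website ever!', 'seen': False},
]
body = json.dumps(values)

# A client parses the body straight back into a list of dictionaries
mentions = json.loads(body)
```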
Since the purpose is similar, let's implement the read function next:
@app.route('/read/<id>', methods=['POST'])
def read(id):
    """Mark a particular mention as having been read."""
    session = db.session()
    mention = session.query(Mention).get(id)
    mention.seen = True
    session.add(mention)
    session.commit()
    return jsonify({'success': True})
We use the <id> parameter passed in via the URL as the primary key in our
database lookup, change seen to True, and save the object back to the
database. We return a token response that's not of much interest (really, an
HTTP 204 would have been more appropriate, but I was lazy).
The rest is just mop-up. Here's the implementation for get_updates_for_source
(which allows us to request updates via an HTTP request):
@app.route('/update/<source>', methods=['POST'])
def get_updates_for_source(source):
    """Return the number of updates found after getting new data from *source*."""
    updates = 0  # unrecognized sources report zero updates
    if source == 'twitter':
        updates = get_twitter_mentions()
    return jsonify({'updates': updates})
And that's the last part of the file. To recap, here's what the completed file
looks like:
"""Find and record references to a person or brand on the Internet."""importsysimportjsonimportpprintimportargparsefromflaskimportFlask,make_response,render_template,jsonify,send_from_directoryfromflask.ext.sqlalchemyimportSQLAlchemyfrombirdy.twitterimportAppClientfrommodelsimportSource,Mention,Basefromtwitterimportget_twitter_mentionsapp=Flask(__name__)app.config['SQLALCHEMY_DATABASE_URI']='sqlite+pysqlite:///sqlite.db'db=SQLAlchemy(app)@app.route('/')defindex():"""Return the main view for mentions."""returnrender_template('index.html')@app.route('/update/<source>',methods=['POST'])defget_updates_for_source(source):"""Return the number of updates found after getting new data from *source*."""ifsource=='twitter':updates=get_twitter_mentions()returnjsonify({'updates':updates})@app.route('/read/<id>',methods=['POST'])defread(id):"""Mark a particular mention as having been read."""session=db.session()mention=session.query(Mention).get(id)mention.seen=Truesession.add(mention)session.commit()returnjsonify({})@app.route('/mentions')defshow_mentions():"""Return a list of all mentions in JSON."""session=db.session()mentions=session.query(Mention).all()values=[mention.to_json()formentioninmentions]response=make_response()response.data=json.dumps(values)returnresponsedefmain():"""Main entry point for script.""""app.run()if__name__=='__main__':sys.exit(main())
Client-side rendering with React.js
I've been looking for an excuse to learn Facebook's React.js
framework, and this is the perfect opportunity. I won't go into detail about the
implementation because a) I'm sure there's a better way to do it and b) I'm not
an authority (by any means) on the subject.
Regardless, using React, I was able to create a page that displays all mentions.
Unread mentions are presented in a well. Once clicked, they asynchronously send
a /read request to the server and change their appearance (by changing their
CSS class). So there's a visual difference between read and unread items, and
it's updated dynamically.
Here's the contents of index.html (which you may notice is very similar to the
React tutorial code...):
I decided to call the project "eavesdropper" as it's constantly listening to
what others are saying :). In the next post about this project, I'll show you
how to extend the project to pull from multiple sources. Until then, thanks for
reading!