# twitter\_ebooks
[![Gem Version](https://badge.fury.io/rb/twitter_ebooks.svg)](http://badge.fury.io/rb/twitter_ebooks)
[![Build Status](https://travis-ci.org/mispy/twitter_ebooks.svg)](https://travis-ci.org/mispy/twitter_ebooks)
[![Dependency Status](https://gemnasium.com/mispy/twitter_ebooks.svg)](https://gemnasium.com/mispy/twitter_ebooks)

A framework for building interactive twitterbots which respond to mentions/DMs. twitter_ebooks tries to be a good friendly bot citizen by avoiding infinite conversations and spamming people, so you only have to write the interesting parts.
## New in 3.0
- Bots run in their own threads (no eventmachine), and startup is parallelized
- Bots start with `ebooks start`, and no longer die on unhandled exceptions
- `ebooks auth` command will create new access tokens, for running multiple bots
- `ebooks console` starts a Ruby interpreter with bots loaded (see `Ebooks::Bot.all`)
- Replies are slightly rate-limited to prevent infinite bot convos
- Non-participating users in a mention chain will be dropped after a few tweets
## Installation
Requires Ruby 2.0+
```bash
gem install twitter_ebooks
```
## Setting up a bot
Run `ebooks new <reponame>` to generate a new repository containing a sample bots.rb file, which looks like this:
``` ruby
# This is an example bot definition with event handlers commented out
# You can define and instantiate as many bots as you like

class MyBot < Ebooks::Bot
  # Configuration here applies to all MyBots
  def configure
    # Consumer details come from registering an app at https://dev.twitter.com/
    # Once you have consumer details, use "ebooks auth" for new access tokens
    self.consumer_key = '' # Your app consumer key
    self.consumer_secret = '' # Your app consumer secret

    # Users to block instead of interacting with
    self.blacklist = ['tnietzschequote']

    # Range in seconds to randomize delay when bot.delay is called
    self.delay_range = 1..6
  end

  def on_startup
    scheduler.every '24h' do
      # Tweet something every 24 hours
      # See https://github.com/jmettraux/rufus-scheduler
      # bot.tweet("hi")
      # bot.pictweet("hi", "cuteselfie.jpg")
    end
  end

  def on_message(dm)
    # Reply to a DM
    # bot.reply(dm, "secret secrets")
  end

  def on_follow(user)
    # Follow a user back
    # bot.follow(user[:screen_name])
  end

  def on_mention(tweet)
    # Reply to a mention
    # bot.reply(tweet, meta(tweet)[:reply_prefix] + "oh hullo")
  end

  def on_timeline(tweet)
    # Reply to a tweet in the bot's timeline
    # bot.reply(tweet, meta(tweet)[:reply_prefix] + "nice tweet")
  end
end

# Make a MyBot and attach it to an account
MyBot.new("{{BOT_NAME}}") do |bot|
  bot.access_token = "" # Token connecting the app to this account
  bot.access_token_secret = "" # Secret connecting the app to this account
end
```
`ebooks start` will run all defined bots in their own threads. The easiest way to run bots in a semi-permanent fashion is with [Heroku](https://www.heroku.com): make an app, push the bot repository to it, and enable a worker process in the web interface. It ought to chug along merrily forever.

The underlying streaming and REST clients from the [twitter gem](https://github.com/sferik/twitter) can be accessed at `bot.stream` and `bot.twitter` respectively.
## Archiving accounts
twitter\_ebooks comes with a syncing tool to download and then incrementally update a local json archive of a user's tweets.
``` zsh
➜ ebooks archive 0xabad1dea corpus/0xabad1dea.json
Currently 20209 tweets for 0xabad1dea
Received 67 new tweets
```
The first time you run this, it will ask for auth details to connect with. Due to API limitations, it may not be possible to fetch the entire history of users with a large number of tweets in the initial download. However, so long as you run it frequently enough, you can maintain a complete copy from that point onward.
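One way to picture the incremental update is as a merge of newly fetched tweets into the existing JSON file, keyed by tweet id. The sketch below is only an illustration, not the gem's implementation: `merge_archive` and the sample tweet hashes are hypothetical, and it assumes every archived tweet carries a numeric `id` field.

```ruby
require 'json'

# Hypothetical helper, not part of twitter_ebooks: merge freshly fetched
# tweets into an existing archive, skipping ids that are already stored.
# Assumes every tweet is a Hash with a numeric "id" key.
def merge_archive(existing, fetched)
  known_ids = existing.map { |t| t["id"] }
  new_tweets = fetched.reject { |t| known_ids.include?(t["id"]) }
  # Keep the newest tweets first so the highest stored id is easy to find
  merged = (new_tweets + existing).sort_by { |t| -t["id"] }
  [merged, new_tweets.size]
end

archive = JSON.parse('[{"id": 2, "text": "second"}, {"id": 1, "text": "first"}]')
fetched = JSON.parse('[{"id": 3, "text": "third"}, {"id": 2, "text": "second"}]')

merged, added = merge_archive(archive, fetched)
puts "Currently #{archive.size} tweets"
puts "Received #{added} new tweets"
puts JSON.generate(merged.map { |t| t["id"] })
```

Because tweets already in the archive are skipped by id, running the sync repeatedly is harmless, which is what makes frequent runs a practical way to keep the archive complete.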
## Text models
In order to use the included text modeling, you'll first need to preprocess your archive into a more efficient form:
``` zsh
➜ ebooks consume corpus/0xabad1dea.json
Reading json corpus from corpus/0xabad1dea.json
Removing commented lines and sorting mentions
Segmenting text into sentences
Tokenizing 7075 statements and 17947 mentions
Ranking keywords
Corpus consumed to model/0xabad1dea.model
```
Notably, this works with both json tweet archives and plaintext files (based on file extension), so you can make a model out of any kind of text.

Text files use newlines and full stops to separate statements.

Once you have a model, the primary use is to produce statements and related responses to input, using a pseudo-Markov generator:
``` ruby
> model = Ebooks::Model.load("model/0xabad1dea.model")
> model.make_statement(140)
=> "My Terrible Netbook may be the kind of person who buys Starbucks, but this Rackspace vuln is pretty straight up a backdoor"
> model.make_response("The NSA is coming!", 130)
=> "Hey - someone who claims to be an NSA conspiracy"
```
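For readers unfamiliar with Markov-style generation, here is a minimal sketch of the general idea: a plain first-order (bigram) word chain. It is not the generator twitter\_ebooks actually uses; `ToyMarkov` and its methods are hypothetical names made up for this illustration.

```ruby
# Illustrative only: a plain first-order (bigram) Markov chain over words.
# This is NOT the algorithm inside Ebooks::Model; ToyMarkov is hypothetical.
class ToyMarkov
  def initialize(sentences)
    @starts = []
    @transitions = Hash.new { |hash, word| hash[word] = [] }
    sentences.each do |sentence|
      words = sentence.split
      next if words.empty?
      @starts << words.first
      # Record every observed "word -> next word" pair
      words.each_cons(2) { |a, b| @transitions[a] << b }
    end
  end

  # Walk the chain from a random sentence start until there is no successor
  # or the next word would push the statement past `limit` characters.
  def make_statement(limit = 140, rng: Random.new)
    return "" if @starts.empty?
    word = @starts.sample(random: rng)
    statement = word[0, limit]
    loop do
      followers = @transitions.fetch(word, [])
      break if followers.empty?
      word = followers.sample(random: rng)
      break if statement.length + 1 + word.length > limit
      statement << " " << word
    end
    statement
  end
end

corpus = ["the cat sat on the mat", "the dog sat on the log"]
toy = ToyMarkov.new(corpus)
puts toy.make_statement(40)
```

Each run recombines word pairs seen in the corpus, so the output can mix the two source sentences while staying under the length limit.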
The secondary function is the "interesting keywords" list. For example, I use this to determine whether a bot wants to fav/retweet/reply to something in its timeline:
``` ruby
top100 = model.keywords.take(100)
tokens = Ebooks::NLP.tokenize(tweet[:text])

if tokens.find { |t| top100.include?(t) }
  bot.favorite(tweet[:id])
end
```
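The same idea can be stretched to pick between fav, retweet and reply, for instance by counting how many distinct top keywords a tweet contains. The following is a rough sketch under stated assumptions: `interest_action` is a hypothetical helper, the thresholds are arbitrary, and the regex tokenizer is a naive stand-in so the example runs without the gem. In a real bot the tokens would come from `Ebooks::NLP.tokenize` and the keywords from `model.keywords`.

```ruby
# Hypothetical helper: choose an action from the number of distinct top
# keywords found in a tweet. The thresholds are arbitrary.
def interest_action(text, keywords)
  # Naive stand-in for Ebooks::NLP.tokenize: lowercase word-like runs only
  tokens = text.downcase.scan(/[a-z0-9']+/)
  hits = (tokens & keywords).size
  case hits
  when 0 then :ignore
  when 1 then :favorite
  when 2 then :retweet
  else :reply
  end
end

keywords = %w[backdoor netbook vuln rackspace]
puts interest_action("nice weather today", keywords)
puts interest_action("That Rackspace vuln looks like a backdoor", keywords)
```

The first call prints `ignore` (no keyword matches) and the second prints `reply` (three distinct matches).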
## Bot niceness
## Other notes
If you're using Heroku, which has no persistent filesystem, automating the process of archiving, consuming and updating can be tricky. My current solution is just a daily cron job which commits and pushes for me, which is pretty hacky.