Merge branch '3.0'

Jaiden Mispy 2014-12-05 22:57:41 +11:00
commit 56aadea555
20 changed files with 738 additions and 15203 deletions

.gitignore vendored

@@ -1,3 +1,5 @@
.*.swp
Gemfile.lock
pkg
.yardoc
doc

README.md

@@ -4,8 +4,16 @@
[![Build Status](https://travis-ci.org/mispy/twitter_ebooks.svg)](https://travis-ci.org/mispy/twitter_ebooks)
[![Dependency Status](https://gemnasium.com/mispy/twitter_ebooks.svg)](https://gemnasium.com/mispy/twitter_ebooks)
A framework for building interactive twitterbots which respond to mentions/DMs. twitter_ebooks tries to be a good friendly bot citizen by avoiding infinite conversations and spamming people, so you only have to write the interesting parts.
Rewrite of my twitter\_ebooks code. While the original was solely a tweeting Markov generator, this framework helps you build any kind of interactive twitterbot which responds to mentions/DMs. See [ebooks\_example](https://github.com/mispy/ebooks_example) for an example of a full bot.
## New in 3.0
- Bots run in their own threads (no eventmachine), and startup is parallelized
- Bots start with `ebooks start`, and no longer die on unhandled exceptions
- `ebooks auth` command will create new access tokens, for running multiple bots
- `ebooks console` starts a ruby interpreter with bots loaded (see Ebooks::Bot.all)
- Replies are slightly rate-limited to prevent infinite bot convos
- Non-participating users in a mention chain will be dropped after a few tweets
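The last point, dropping non-participating users, can be sketched as follows. `Conversation` and `can_include?` loosely mirror the 3.0 internals, but this standalone version is illustrative only (it uses a bare `Tweet` struct rather than the twitter gem's objects):

``` ruby
# Minimal sketch of the "drop non-participating users" heuristic.
# A user stays in the reply prefix only while the conversation is
# young or they appear in the last four tweets.
Tweet = Struct.new(:screen_name)

class Conversation
  def initialize
    @tweets = []
  end

  def add(tweet)
    @tweets << tweet
  end

  def can_include?(screen_name)
    @tweets.length <= 4 ||
      @tweets[-4..-1].any? { |t| t.screen_name == screen_name }
  end
end

conv = Conversation.new
5.times { conv.add(Tweet.new("alice")) }
conv.can_include?("alice") # => true
conv.can_include?("bob")   # => false; bob hasn't tweeted recently
```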
## Installation
@@ -21,53 +29,63 @@ Run `ebooks new <reponame>` to generate a new repository containing a sample bot
``` ruby
# This is an example bot definition with event handlers commented out
# You can define as many of these as you like; they will run simultaneously
# You can define and instantiate as many bots as you like
Ebooks::Bot.new("abby_ebooks") do |bot|
# Consumer details come from registering an app at https://dev.twitter.com/
# OAuth details can be fetched with https://github.com/marcel/twurl
bot.consumer_key = "" # Your app consumer key
bot.consumer_secret = "" # Your app consumer secret
bot.oauth_token = "" # Token connecting the app to this account
bot.oauth_token_secret = "" # Secret connecting the app to this account
class MyBot < Ebooks::Bot
# Configuration here applies to all MyBots
def configure
# Consumer details come from registering an app at https://dev.twitter.com/
# Once you have consumer details, use "ebooks auth" for new access tokens
self.consumer_key = '' # Your app consumer key
self.consumer_secret = '' # Your app consumer secret
bot.on_startup do
# Run some startup task
# puts "I'm ready!"
# Users to block instead of interacting with
self.blacklist = ['tnietzschequote']
# Range in seconds to randomize delay when bot.delay is called
self.delay_range = 1..6
end
bot.on_message do |dm|
def on_startup
scheduler.every '24h' do
# Tweet something every 24 hours
# See https://github.com/jmettraux/rufus-scheduler
# bot.tweet("hi")
# bot.pictweet("hi", "cuteselfie.jpg")
end
end
def on_message(dm)
# Reply to a DM
# bot.reply(dm, "secret secrets")
end
bot.on_follow do |user|
def on_follow(user)
# Follow a user back
# bot.follow(user[:screen_name])
end
bot.on_mention do |tweet, meta|
def on_mention(tweet)
# Reply to a mention
# bot.reply(tweet, meta[:reply_prefix] + "oh hullo")
# bot.reply(tweet, meta(tweet)[:reply_prefix] + "oh hullo")
end
bot.on_timeline do |tweet, meta|
def on_timeline(tweet)
# Reply to a tweet in the bot's timeline
# bot.reply(tweet, meta[:reply_prefix] + "nice tweet")
# bot.reply(tweet, meta(tweet)[:reply_prefix] + "nice tweet")
end
end
bot.scheduler.every '24h' do
# Tweet something every 24 hours
# See https://github.com/jmettraux/rufus-scheduler
# bot.tweet("hi")
# bot.pictweet("hi", "cuteselfie.jpg", ":possibly_sensitive => true")
end
# Make a MyBot and attach it to an account
MyBot.new("{{BOT_NAME}}") do |bot|
bot.access_token = "" # Token connecting the app to this account
bot.access_token_secret = "" # Secret connecting the app to this account
end
```
Bots defined like this can be spawned by executing `run.rb` in the same directory, and will operate together in a single eventmachine loop. The easiest way to run bots in a semi-permanent fashion is with [Heroku](https://www.heroku.com); just make an app, push the bot repository to it, enable a worker process in the web interface and it ought to chug along merrily forever.
`ebooks start` will run all defined bots in their own threads. The easiest way to run bots in a semi-permanent fashion is with [Heroku](https://www.heroku.com); just make an app, push the bot repository to it, enable a worker process in the web interface and it ought to chug along merrily forever.
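On Heroku, that worker process is declared in a `Procfile` at the repository root; an illustrative one-liner (the exact invocation may differ if you wrap `ebooks start` in your own script):

```
worker: bundle exec ebooks start
```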
The underlying [tweetstream](https://github.com/tweetstream/tweetstream) and [twitter gem](https://github.com/sferik/twitter) client objects can be accessed at `bot.stream` and `bot.twitter` respectively.
The underlying streaming and REST clients from the [twitter gem](https://github.com/sferik/twitter) can be accessed at `bot.stream` and `bot.twitter` respectively.
## Archiving accounts
@@ -102,7 +120,6 @@ Text files use newlines and full stops to separate statements.
Once you have a model, the primary use is to produce statements and related responses to input, using a pseudo-Markov generator:
``` ruby
> require 'twitter_ebooks'
> model = Ebooks::Model.load("model/0xabad1dea.model")
> model.make_statement(140)
=> "My Terrible Netbook may be the kind of person who buys Starbucks, but this Rackspace vuln is pretty straight up a backdoor"
@@ -113,14 +130,18 @@ Once you have a model, the primary use is to produce statements and related resp
The secondary function is the "interesting keywords" list. For example, I use this to determine whether a bot wants to fav/retweet/reply to something in its timeline:
``` ruby
top100 = model.keywords.top(100)
top100 = model.keywords.take(100)
tokens = Ebooks::NLP.tokenize(tweet[:text])
if tokens.find { |t| top100.include?(t) }
bot.twitter.favorite(tweet[:id])
bot.favorite(tweet[:id])
end
```
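Stripped of the Twitter objects, the check above is just a token/keyword intersection; here is a self-contained sketch with a stubbed keyword list and naive whitespace tokenization standing in for `Ebooks::NLP.tokenize`:

``` ruby
# Stubbed top-keyword list; a real one comes from model.keywords.take(100)
top100 = ["rackspace", "netbook", "backdoor"]

tweet_text = "this rackspace vuln is pretty bad"
tokens = tweet_text.downcase.split

# The bot considers the tweet interesting if any token is a top keyword
interesting = tokens.any? { |t| top100.include?(t) }
# interesting is true here, so the bot would fav/retweet/reply
```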
## Bot niceness
## Other notes
If you're using Heroku, which has no persistent filesystem, automating the process of archiving, consuming and updating can be tricky. My current solution is just a daily cron job which commits and pushes for me, which is pretty hacky.
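One possible shape for that daily job, as a crontab entry (the path, username, and remote name are placeholders; assumes git credentials are configured for non-interactive pushes):

```
# Daily at 04:00: refresh the archive, rebuild the model, then push so
# the Heroku worker restarts with the new corpus (names illustrative)
0 4 * * * cd /home/bots/my_ebooks && ebooks archive example_user corpus/example_user.json && ebooks consume corpus/example_user.json && git commit -am "daily model update" && git push heroku master
```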

bin/ebooks

@@ -2,54 +2,85 @@
# encoding: utf-8
require 'twitter_ebooks'
require 'csv'
require 'ostruct'
$debug = true
module Ebooks::Util
def pretty_exception(e)
module Ebooks
end
end
module Ebooks::CLI
APP_PATH = Dir.pwd # XXX do some recursive thing instead
HELP = OpenStruct.new
def self.new(reponame)
usage = <<STR
Usage: ebooks new <reponame>
HELP.default = <<STR
Usage:
ebooks help <command>
Creates a new skeleton repository defining a template bot in
the current working directory specified by <reponame>.
ebooks new <reponame>
ebooks auth
ebooks consume <corpus_path> [corpus_path2] [...]
ebooks consume-all <corpus_path> [corpus_path2] [...]
ebooks gen <model_path> [input]
ebooks archive <username> [path]
ebooks tweet <model_path> <botname>
STR
def self.help(command=nil)
if command.nil?
log HELP.default
else
log HELP[command].gsub(/^ {4}/, '')
end
end
HELP.new = <<-STR
Usage: ebooks new <reponame>
Creates a new skeleton repository defining a template bot in
the current working directory specified by <reponame>.
STR
def self.new(reponame)
if reponame.nil?
log usage
exit
help :new
exit 1
end
path = "./#{reponame}"
if File.exists?(path)
log "#{path} already exists. Please remove if you want to recreate."
exit
exit 1
end
FileUtils.cp_r(SKELETON_PATH, path)
FileUtils.cp_r(Ebooks::SKELETON_PATH, path)
File.open(File.join(path, 'bots.rb'), 'w') do |f|
template = File.read(File.join(SKELETON_PATH, 'bots.rb'))
template = File.read(File.join(Ebooks::SKELETON_PATH, 'bots.rb'))
f.write(template.gsub("{{BOT_NAME}}", reponame))
end
File.open(File.join(path, 'Gemfile'), 'w') do |f|
template = File.read(File.join(Ebooks::SKELETON_PATH, 'Gemfile'))
f.write(template.gsub("{{RUBY_VERSION}}", RUBY_VERSION))
end
log "New twitter_ebooks app created at #{reponame}"
end
HELP.consume = <<-STR
Usage: ebooks consume <corpus_path> [corpus_path2] [...]
Processes some number of text files or json tweet corpuses
into usable models. These will be output at model/<name>.model
STR
def self.consume(pathes)
usage = <<STR
Usage: ebooks consume <corpus_path> [corpus_path2] [...]
Processes some number of text files or json tweet corpuses
into usable models. These will be output at model/<name>.model
STR
if pathes.empty?
log usage
exit
help :consume
exit 1
end
pathes.each do |path|
@@ -57,50 +88,43 @@ STR
shortname = filename.split('.')[0..-2].join('.')
outpath = File.join(APP_PATH, 'model', "#{shortname}.model")
Model.consume(path).save(outpath)
Ebooks::Model.consume(path).save(outpath)
log "Corpus consumed to #{outpath}"
end
end
HELP.consume_all = <<-STR
Usage: ebooks consume-all <name> <corpus_path> [corpus_path2] [...]
Processes some number of text files or json tweet corpuses
into one usable model. It will be output at model/<name>.model
STR
def self.consume_all(name, paths)
usage = <<STR
Usage: ebooks consume-all <name> <corpus_path> [corpus_path2] [...]
Processes some number of text files or json tweet corpuses
into one usable model. It will be output at model/<name>.model
STR
if paths.empty?
log usage
exit
help :consume_all
exit 1
end
outpath = File.join(APP_PATH, 'model', "#{name}.model")
#pathes.each do |path|
# filename = File.basename(path)
# shortname = filename.split('.')[0..-2].join('.')
#
# outpath = File.join(APP_PATH, 'model', "#{shortname}.model")
# Model.consume(path).save(outpath)
# log "Corpus consumed to #{outpath}"
#end
Model.consume_all(paths).save(outpath)
Ebooks::Model.consume_all(paths).save(outpath)
log "Corpuses consumed to #{outpath}"
end
def self.gen(model_path, input)
usage = <<STR
Usage: ebooks gen <model_path> [input]
HELP.gen = <<-STR
Usage: ebooks gen <model_path> [input]
Make a test tweet from the processed model at <model_path>.
Will respond to input if provided.
STR
Make a test tweet from the processed model at <model_path>.
Will respond to input if provided.
STR
def self.gen(model_path, input)
if model_path.nil?
log usage
exit
help :gen
exit 1
end
model = Model.load(model_path)
model = Ebooks::Model.load(model_path)
if input && !input.empty?
puts "@cmd " + model.make_response(input, 135)
else
@@ -108,81 +132,186 @@ STR
end
end
def self.score(model_path, input)
usage = <<STR
Usage: ebooks score <model_path> <input>
HELP.archive = <<-STR
Usage: ebooks archive <username> [outpath]
Scores "interest" in some text input according to how
well unique keywords match the model.
STR
if model_path.nil? || input.nil?
log usage
exit
Downloads a json corpus of the <username>'s tweets.
Output defaults to corpus/<username>.json
Due to API limitations, this can only retrieve up to ~3000 tweets
into the past.
STR
def self.archive(username, outpath=nil)
if username.nil?
help :archive
exit 1
end
model = Model.load(model_path)
model.score_interest(input)
Ebooks::Archive.new(username, outpath).sync
end
def self.archive(username, outpath)
usage = <<STR
Usage: ebooks archive <username> <outpath>
HELP.tweet = <<-STR
Usage: ebooks tweet <model_path> <botname>
Downloads a json corpus of the <username>'s tweets to <outpath>.
Due to API limitations, this can only retrieve up to ~3000 tweets
into the past.
STR
if username.nil? || outpath.nil?
log usage
exit
end
Archive.new(username, outpath).sync
end
Sends a public tweet from the specified bot using text
from the processed model at <model_path>.
STR
def self.tweet(modelpath, botname)
usage = <<STR
Usage: ebooks tweet <model_path> <botname>
Sends a public tweet from the specified bot using text
from the processed model at <model_path>.
STR
if modelpath.nil? || botname.nil?
log usage
exit
help :tweet
exit 1
end
load File.join(APP_PATH, 'bots.rb')
model = Model.load(modelpath)
model = Ebooks::Model.load(modelpath)
statement = model.make_statement
log "@#{botname}: #{statement}"
bot = Bot.get(botname)
bot = Ebooks::Bot.get(botname)
bot.configure
bot.tweet(statement)
end
def self.c
HELP.auth = <<-STR
Usage: ebooks auth
Authenticates your Twitter app for any account. By default, will
use the consumer key and secret from the first defined bot. You
can specify another by setting the CONSUMER_KEY and CONSUMER_SECRET
environment variables.
STR
def self.auth
consumer_key, consumer_secret = find_consumer
require 'oauth'
consumer = OAuth::Consumer.new(
consumer_key,
consumer_secret,
site: 'https://twitter.com/',
scheme: :header
)
request_token = consumer.get_request_token
auth_url = request_token.authorize_url()
pin = nil
loop do
log auth_url
log "Go to the above url and follow the prompts, then enter the PIN code here."
print "> "
pin = STDIN.gets.chomp
break unless pin.empty?
end
access_token = request_token.get_access_token(oauth_verifier: pin)
log "Account authorized successfully. Make sure to put these in your bots.rb!\n" +
" access token: #{access_token.token}\n" +
" access token secret: #{access_token.secret}"
end
HELP.console = <<-STR
Usage: ebooks c[onsole]
Starts an interactive ruby session with your bots loaded
and configured.
STR
def self.console
load_bots
require 'pry'; Ebooks.module_exec { pry }
end
HELP.start = <<-STR
Usage: ebooks s[tart] [botname]
Starts running bots. If botname is provided, only runs that bot.
STR
def self.start(botname=nil)
load_bots
if botname.nil?
bots = Ebooks::Bot.all
else
bots = Ebooks::Bot.all.select { |bot| bot.username == botname }
if bots.empty?
log "Couldn't find a defined bot for @#{botname}!"
exit 1
end
end
threads = []
bots.each do |bot|
threads << Thread.new { bot.prepare }
end
threads.each(&:join)
threads = []
bots.each do |bot|
threads << Thread.new do
loop do
begin
bot.start
rescue Exception => e
bot.log e.inspect
puts e.backtrace.map { |s| "\t"+s }.join("\n")
end
bot.log "Sleeping before reconnect"
sleep 5
end
end
end
threads.each(&:join)
end
# Non-command methods
def self.find_consumer
if ENV['CONSUMER_KEY'] && ENV['CONSUMER_SECRET']
log "Using consumer details from environment variables:\n" +
" consumer key: #{ENV['CONSUMER_KEY']}\n" +
" consumer secret: #{ENV['CONSUMER_SECRET']}"
return [ENV['CONSUMER_KEY'], ENV['CONSUMER_SECRET']]
end
load_bots
consumer_key = nil
consumer_secret = nil
Ebooks::Bot.all.each do |bot|
if bot.consumer_key && bot.consumer_secret
consumer_key = bot.consumer_key
consumer_secret = bot.consumer_secret
log "Using consumer details from @#{bot.username}:\n" +
" consumer key: #{bot.consumer_key}\n" +
" consumer secret: #{bot.consumer_secret}\n"
return consumer_key, consumer_secret
end
end
if consumer_key.nil? || consumer_secret.nil?
log "Couldn't find any consumer details to auth an account with.\n" +
"Please either configure a bot with consumer_key and consumer_secret\n" +
"or provide the CONSUMER_KEY and CONSUMER_SECRET environment variables."
exit 1
end
end
def self.load_bots
load 'bots.rb'
require 'pry'; pry
if Ebooks::Bot.all.empty?
puts "Couldn't find any bots! Please make sure bots.rb instantiates at least one bot."
end
end
def self.command(args)
usage = <<STR
Usage:
ebooks new <reponame>
ebooks consume <corpus_path> [corpus_path2] [...]
ebooks consume-all <corpus_path> [corpus_path2] [...]
ebooks gen <model_path> [input]
ebooks score <model_path> <input>
ebooks archive <@user> <outpath>
ebooks tweet <model_path> <botname>
STR
if args.length == 0
log usage
exit
help
exit 1
end
case args[0]
@@ -190,16 +319,21 @@ STR
when "consume" then consume(args[1..-1])
when "consume-all" then consume_all(args[1], args[2..-1])
when "gen" then gen(args[1], args[2..-1].join(' '))
when "score" then score(args[1], args[2..-1].join(' '))
when "archive" then archive(args[1], args[2])
when "tweet" then tweet(args[1], args[2])
when "jsonify" then jsonify(args[1..-1])
when "c" then c
when "auth" then auth
when "console" then console
when "c" then console
when "start" then start(args[1])
when "s" then start(args[1])
when "help" then help(args[1])
else
log usage
log "No such command '#{args[0]}'"
help
exit 1
end
end
end
Ebooks.command(ARGV)
Ebooks::CLI.command(ARGV)

lib/twitter_ebooks.rb

@@ -11,11 +11,11 @@ module Ebooks
SKELETON_PATH = File.join(GEM_PATH, 'skeleton')
TEST_PATH = File.join(GEM_PATH, 'test')
TEST_CORPUS_PATH = File.join(TEST_PATH, 'corpus/0xabad1dea.tweets')
INTERIM = :interim
end
require 'twitter_ebooks/nlp'
require 'twitter_ebooks/archive'
require 'twitter_ebooks/markov'
require 'twitter_ebooks/suffix'
require 'twitter_ebooks/model'
require 'twitter_ebooks/bot'

lib/twitter_ebooks/archive.rb

@@ -39,9 +39,14 @@ module Ebooks
end
end
def initialize(username, path, client=nil)
def initialize(username, path=nil, client=nil)
@username = username
@path = path || "#{username}.json"
@path = path || "corpus/#{username}.json"
if File.directory?(@path)
@path = File.join(@path, "#{username}.json")
end
@client = client || make_client
if File.exists?(@path)

lib/twitter_ebooks/bot.rb Executable file → Normal file

@@ -6,143 +6,91 @@ module Ebooks
class ConfigurationError < Exception
end
# We track how many unprompted interactions the bot has had with
# each user and start dropping them from mentions after two in a row
class UserInfo
attr_reader :username
attr_accessor :pesters_left
# Represents a single reply tree of tweets
class Conversation
attr_reader :last_update
def initialize(username)
@username = username
@pesters_left = 1
end
def can_pester?
@pesters_left > 0
end
end
# Represents a current "interaction state" with another user
class Interaction
attr_reader :userinfo, :received, :last_update
def initialize(userinfo)
@userinfo = userinfo
@received = []
# @param bot [Ebooks::Bot]
def initialize(bot)
@bot = bot
@tweets = []
@last_update = Time.now
end
def receive(tweet)
@received << tweet
# @param tweet [Twitter::Tweet] tweet to add
def add(tweet)
@tweets << tweet
@last_update = Time.now
@userinfo.pesters_left += 2
end
# Make an informed guess as to whether this user is a bot
# based on its username and reply speed
def is_bot?
if @received.length > 2
if (@received[-1].created_at - @received[-3].created_at) < 30
# Make an informed guess as to whether a user is a bot based
# on their behavior in this conversation
def is_bot?(username)
usertweets = @tweets.select { |t| t.user.screen_name == username }
if usertweets.length > 2
if (usertweets[-1].created_at - usertweets[-3].created_at) < 30
return true
end
end
@userinfo.username.include?("ebooks")
username.include?("ebooks")
end
def continue?
if is_bot?
true if @received.length < 2
else
true
end
# Figure out whether to keep this user in the reply prefix
# We want to avoid spamming non-participating users
def can_include?(username)
@tweets.length <= 4 ||
!@tweets[-4..-1].select { |t| t.user.screen_name == username }.empty?
end
end
class Bot
attr_accessor :consumer_key, :consumer_secret,
:access_token, :access_token_secret
# Meta information about a tweet that we calculate for ourselves
class TweetMeta
# @return [Array<String>] usernames mentioned in tweet
attr_accessor :mentions
# @return [String] text of tweets with mentions removed
attr_accessor :mentionless
# @return [Array<String>] usernames to include in a reply
attr_accessor :reply_mentions
# @return [String] mentions to start reply with
attr_accessor :reply_prefix
# @return [Integer] available chars for reply
attr_accessor :limit
attr_reader :twitter, :stream, :thread
# Configuration
attr_accessor :username, :delay_range, :blacklist
@@all = [] # List of all defined bots
def self.all; @@all; end
def self.get(name)
all.find { |bot| bot.username == name }
end
def log(*args)
STDOUT.print "@#{@username}: " + args.map(&:to_s).join(' ') + "\n"
STDOUT.flush
end
def initialize(*args, &b)
@username ||= nil
@blacklist ||= []
@delay_range ||= 0
@users ||= {}
@interactions ||= {}
configure(*args, &b)
# Tweet ids we've already observed, to avoid duplication
@seen_tweets ||= {}
end
def userinfo(username)
@users[username] ||= UserInfo.new(username)
end
def interaction(username)
if @interactions[username] &&
Time.now - @interactions[username].last_update < 600
@interactions[username]
else
@interactions[username] = Interaction.new(userinfo(username))
end
end
def twitter
@twitter ||= Twitter::REST::Client.new do |config|
config.consumer_key = @consumer_key
config.consumer_secret = @consumer_secret
config.access_token = @access_token
config.access_token_secret = @access_token_secret
end
end
def stream
@stream ||= Twitter::Streaming::Client.new do |config|
config.consumer_key = @consumer_key
config.consumer_secret = @consumer_secret
config.access_token = @access_token
config.access_token_secret = @access_token_secret
end
end
# Calculate some meta information about a tweet relevant for replying
def calc_meta(ev)
meta = {}
meta[:mentions] = ev.attrs[:entities][:user_mentions].map { |x| x[:screen_name] }
# @return [Ebooks::Bot] associated bot
attr_accessor :bot
# @return [Twitter::Tweet] associated tweet
attr_accessor :tweet
# Check whether this tweet mentions our bot
# @return [Boolean]
def mentions_bot?
# To check if this is someone talking to us, ensure:
# - The tweet mentions list contains our username
# - The tweet is not being retweeted by somebody else
# - Or soft-retweeted by somebody else
meta[:mentions_bot] = meta[:mentions].map(&:downcase).include?(@username.downcase) && !ev.retweeted_status? && !ev.text.start_with?('RT ')
@mentions.map(&:downcase).include?(@bot.username.downcase) && !@tweet.retweeted_status? && !@tweet.text.start_with?('RT ')
end
# @param bot [Ebooks::Bot]
# @param ev [Twitter::Tweet]
def initialize(bot, ev)
@bot = bot
@tweet = ev
@mentions = ev.attrs[:entities][:user_mentions].map { |x| x[:screen_name] }
# Process mentions to figure out who to reply to
reply_mentions = meta[:mentions].reject { |m| m.downcase == @username.downcase }
reply_mentions = reply_mentions.select { |username| userinfo(username).can_pester? }
meta[:reply_mentions] = [ev.user.screen_name] + reply_mentions
# i.e. not self and nobody who has seen too many secondary mentions
reply_mentions = @mentions.reject do |m|
username = m.downcase
username == @bot.username || !@bot.conversation(ev).can_include?(username)
end
@reply_mentions = ([ev.user.screen_name] + reply_mentions).uniq
meta[:reply_prefix] = meta[:reply_mentions].uniq.map { |m| '@'+m }.join(' ') + ' '
meta[:limit] = 140 - meta[:reply_prefix].length
@reply_prefix = @reply_mentions.map { |m| '@'+m }.join(' ') + ' '
@limit = 140 - @reply_prefix.length
mless = ev.text
begin
@@ -155,12 +103,116 @@ module Ebooks
p ev.text
raise
end
meta[:mentionless] = mless
@mentionless = mless
end
end
meta
class Bot
# @return [String] OAuth consumer key for a Twitter app
attr_accessor :consumer_key
# @return [String] OAuth consumer secret for a Twitter app
attr_accessor :consumer_secret
# @return [String] OAuth access token from `ebooks auth`
attr_accessor :access_token
# @return [String] OAuth access secret from `ebooks auth`
attr_accessor :access_token_secret
# @return [String] Twitter username of bot
attr_accessor :username
# @return [Array<String>] list of usernames to block on contact
attr_accessor :blacklist
# @return [Hash{String => Ebooks::Conversation}] maps tweet ids to their conversation contexts
attr_accessor :conversations
# @return [Range, Integer] range of seconds to delay in delay method
attr_accessor :delay_range
# @return [Array] list of all defined bots
def self.all; @@all ||= []; end
# Fetches a bot by username
# @param username [String]
# @return [Ebooks::Bot]
def self.get(username)
all.find { |bot| bot.username == username }
end
# Logs info to stdout in the context of this bot
def log(*args)
STDOUT.print "@#{@username}: " + args.map(&:to_s).join(' ') + "\n"
STDOUT.flush
end
# Initializes and configures bot
# @param args Arguments passed to configure method
# @param b Block to call with new bot
def initialize(username, &b)
@blacklist ||= []
@conversations ||= {}
# Tweet ids we've already observed, to avoid duplication
@seen_tweets ||= {}
@username = username
configure
b.call(self) unless b.nil?
Bot.all << self
end
# Find or create the conversation context for this tweet
# @param tweet [Twitter::Tweet]
# @return [Ebooks::Conversation]
def conversation(tweet)
conv = if tweet.in_reply_to_status_id?
@conversations[tweet.in_reply_to_status_id]
end
if conv.nil?
conv = @conversations[tweet.id] || Conversation.new(self)
end
if tweet.in_reply_to_status_id?
@conversations[tweet.in_reply_to_status_id] = conv
end
@conversations[tweet.id] = conv
# Expire any old conversations to prevent memory growth
@conversations.each do |k,v|
if v != conv && Time.now - v.last_update > 3600
@conversations.delete(k)
end
end
conv
end
# @return [Twitter::REST::Client] underlying REST client from twitter gem
def twitter
@twitter ||= Twitter::REST::Client.new do |config|
config.consumer_key = @consumer_key
config.consumer_secret = @consumer_secret
config.access_token = @access_token
config.access_token_secret = @access_token_secret
end
end
# @return [Twitter::Streaming::Client] underlying streaming client from twitter gem
def stream
@stream ||= Twitter::Streaming::Client.new do |config|
config.consumer_key = @consumer_key
config.consumer_secret = @consumer_secret
config.access_token = @access_token
config.access_token_secret = @access_token_secret
end
end
# Calculate some meta information about a tweet relevant for replying
# @param ev [Twitter::Tweet]
# @return [Ebooks::TweetMeta]
def meta(ev)
TweetMeta.new(self, ev)
end
# Receive an event from the twitter stream
# @param ev [Object] Twitter streaming event
def receive_event(ev)
if ev.is_a? Array # Initial array sent on first connection
log "Online!"
@@ -181,7 +233,7 @@ module Ebooks
return unless ev.text # If it's not a text-containing tweet, ignore it
return if ev.user.screen_name == @username # Ignore our own tweets
meta = calc_meta(ev)
meta = meta(ev)
if blacklisted?(ev.user.screen_name)
log "Blocking blacklisted user @#{ev.user.screen_name}"
@@ -190,17 +242,18 @@ module Ebooks
# Avoid responding to duplicate tweets
if @seen_tweets[ev.id]
log "Not firing event for duplicate tweet #{ev.id}"
return
else
@seen_tweets[ev.id] = true
end
if meta[:mentions_bot]
if meta.mentions_bot?
log "Mention from @#{ev.user.screen_name}: #{ev.text}"
interaction(ev.user.screen_name).receive(ev)
fire(:mention, ev, meta)
conversation(ev).add(ev)
fire(:mention, ev)
else
fire(:timeline, ev, meta)
fire(:timeline, ev)
end
elsif ev.is_a?(Twitter::Streaming::DeletedTweet) ||
@@ -211,7 +264,31 @@ module Ebooks
end
end
def start_stream
# Configures client and fires startup event
def prepare
# Sanity check
if @username.nil?
raise ConfigurationError, "bot username cannot be nil"
end
if @consumer_key.nil? || @consumer_key.empty? ||
@consumer_secret.nil? || @consumer_secret.empty?
log "Missing consumer_key or consumer_secret. These details can be acquired by registering a Twitter app at https://apps.twitter.com/"
exit 1
end
if @access_token.nil? || @access_token.empty? ||
@access_token_secret.nil? || @access_token_secret.empty?
log "Missing access_token or access_token_secret. Please run `ebooks auth`."
exit 1
end
twitter
fire(:startup)
end
# Start running user event stream
def start
log "starting tweet stream"
stream.user do |ev|
@@ -219,22 +296,9 @@ module Ebooks
end
end
def prepare
# Sanity check
if @username.nil?
raise ConfigurationError, "bot.username cannot be nil"
end
twitter
fire(:startup)
end
# Connects to tweetstream and opens event handlers for this bot
def start
start_stream
end
# Fire an event
# @param event [Symbol] event to fire
# @param args arguments for event handler
def fire(event, *args)
handler = "on_#{event}".to_sym
if respond_to? handler
@@ -242,11 +306,17 @@ module Ebooks
end
end
def delay(&b)
time = @delay.to_a.sample unless @delay.is_a? Integer
# Delay an action for a variable period of time
# @param range [Range, Integer] range of seconds to choose for delay
def delay(range=@delay_range, &b)
time = range.is_a?(Integer) ? range : range.to_a.sample
sleep time
b.call
end
# Check if a username is blacklisted
# @param username [String]
# @return [Boolean]
def blacklisted?(username)
if @blacklist.include?(username)
true
@@ -256,46 +326,37 @@ module Ebooks
end
# Reply to a tweet or a DM.
# @param ev [Twitter::Tweet, Twitter::DirectMessage]
# @param text [String] contents of reply excluding reply_prefix
# @param opts [Hash] additional params to pass to twitter gem
def reply(ev, text, opts={})
opts = opts.clone
if ev.is_a? Twitter::DirectMessage
return if blacklisted?(ev.sender.screen_name)
log "Sending DM to @#{ev.sender.screen_name}: #{text}"
twitter.create_direct_message(ev.sender.screen_name, text, opts)
elsif ev.is_a? Twitter::Tweet
meta = calc_meta(ev)
meta = meta(ev)
if !interaction(ev.user.screen_name).continue?
if conversation(ev).is_bot?(ev.user.screen_name)
log "Not replying to suspected bot @#{ev.user.screen_name}"
return
return false
end
if !meta[:mentions_bot]
if !userinfo(ev.user.screen_name).can_pester?
log "Not replying: leaving @#{ev.user.screen_name} alone"
return
else
userinfo(ev.user.screen_name).pesters_left -= 1
end
end
log "Replying to @#{ev.user.screen_name} with: #{meta[:reply_prefix] + text}"
twitter.update(meta[:reply_prefix] + text, in_reply_to_status_id: ev.id)
log "Replying to @#{ev.user.screen_name} with: #{meta.reply_prefix + text}"
tweet = twitter.update(meta.reply_prefix + text, in_reply_to_status_id: ev.id)
conversation(tweet).add(tweet)
tweet
else
raise Exception.new("Don't know how to reply to a #{ev.class}")
end
end
# Favorite a tweet
# @param tweet [Twitter::Tweet]
def favorite(tweet)
return if blacklisted?(tweet.user.screen_name)
log "Favoriting @#{tweet.user.screen_name}: #{tweet.text}"
meta = calc_meta(tweet)
if !meta[:mentions_bot] && !userinfo(tweet.user.screen_name).can_pester?
log "Not favoriting: leaving @#{tweet.user.screen_name} alone"
end
begin
twitter.favorite(tweet.id)
rescue Twitter::Error::Forbidden
@@ -303,8 +364,9 @@ module Ebooks
end
end
# Retweet a tweet
# @param tweet [Twitter::Tweet]
def retweet(tweet)
return if blacklisted?(tweet.user.screen_name)
log "Retweeting @#{tweet.user.screen_name}: #{tweet.text}"
begin
@@ -314,21 +376,36 @@ module Ebooks
end
end
def follow(*args)
log "Following #{args}"
twitter.follow(*args)
# Follow a user
# @param user [String] username or user id
def follow(user, *args)
log "Following #{user}"
twitter.follow(user, *args)
end
def tweet(*args)
log "Tweeting #{args.inspect}"
twitter.update(*args)
# Unfollow a user
# @param user [String] username or user id
def unfollow(user, *args)
log "Unfollowing #{user}"
twitter.unfollow(user, *args)
end
# Tweet something
# @param text [String]
def tweet(text, *args)
log "Tweeting '#{text}'"
twitter.update(text, *args)
end
# Get a scheduler for this bot
# @return [Rufus::Scheduler]
def scheduler
@scheduler ||= Rufus::Scheduler.new
end
# could easily just be *args however the separation keeps it clean.
# Tweet some text with an image
# @param txt [String]
# @param pic [String] filename
def pictweet(txt, pic, *args)
log "Tweeting #{txt.inspect} - #{pic} #{args}"
twitter.update_with_media(txt, File.new(pic), *args)

lib/twitter_ebooks/markov.rb

@@ -1,82 +0,0 @@
module Ebooks
# Special INTERIM token represents sentence boundaries
# This is so we can include start and end of statements in model
# Due to the way the sentence tokenizer works, can correspond
# to multiple actual parts of text (such as ^, $, \n and .?!)
INTERIM = :interim
# This is an ngram-based Markov model optimized to build from a
# tokenized sentence list without requiring too much transformation
class MarkovModel
def self.build(sentences)
MarkovModel.new.consume(sentences)
end
def consume(sentences)
# These models are of the form ngram => [[sentence_pos, token_pos] || INTERIM, ...]
# We map by both bigrams and unigrams so we can fall back to the latter in
# cases where an input bigram is unavailable, such as starting a sentence
@sentences = sentences
@unigrams = {}
@bigrams = {}
sentences.each_with_index do |tokens, i|
last_token = INTERIM
tokens.each_with_index do |token, j|
@unigrams[last_token] ||= []
@unigrams[last_token] << [i, j]
@bigrams[last_token] ||= {}
@bigrams[last_token][token] ||= []
if j == tokens.length-1 # Mark sentence endings
@unigrams[token] ||= []
@unigrams[token] << INTERIM
@bigrams[last_token][token] << INTERIM
else
@bigrams[last_token][token] << [i, j+1]
end
last_token = token
end
end
self
end
def find_token(index)
if index == INTERIM
INTERIM
else
@sentences[index[0]][index[1]]
end
end
def chain(tokens)
if tokens.length == 1
matches = @unigrams[tokens[-1]]
else
matches = @bigrams[tokens[-2]][tokens[-1]]
matches = @unigrams[tokens[-1]] if matches.length < 2
end
if matches.empty?
# This should never happen unless a strange token is
# supplied from outside the dataset
raise ArgumentError, "Unable to continue chain for: #{tokens.inspect}"
end
next_token = find_token(matches.sample)
if next_token == INTERIM # We chose to end the sentence
return tokens
else
return chain(tokens + [next_token])
end
end
def generate
NLP.reconstruct(chain([INTERIM]))
end
end
end
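The removed MarkovModel above walks bigrams starting from INTERIM and stops when it draws a sentence boundary again. A minimal standalone sketch of that chaining idea (illustrative names only, not the gem's API):

``` ruby
# Sketch of the bigram-chaining idea; INTERIM marks sentence boundaries.
INTERIM = :interim

def build_bigrams(sentences)
  bigrams = Hash.new { |h, k| h[k] = [] }
  sentences.each do |tokens|
    last = INTERIM
    tokens.each do |token|
      bigrams[last] << token # record "token follows last"
      last = token
    end
    bigrams[last] << INTERIM # mark the sentence ending
  end
  bigrams
end

def chain(bigrams)
  tokens = []
  current = INTERIM
  while (nxt = bigrams[current].sample) != INTERIM
    tokens << nxt
    current = nxt
  end
  tokens
end

bigrams = build_bigrams([%w[the cat sat], %w[the dog ran]])
sentence = chain(bigrams)
```

With this tiny corpus every generated sentence begins with "the" and ends at a recorded sentence boundary; the interesting behavior only emerges with a larger corpus, where shared bigrams let sentences cross over.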

View file

@@ -8,16 +8,41 @@ require 'csv'
module Ebooks
class Model
attr_accessor :hash, :tokens, :sentences, :mentions, :keywords
# @return [Array<String>]
# An array of unique tokens. This is the main source of actual strings
# in the model. Manipulation of a token is done using its index
# in this array, which we call a "tiki"
attr_accessor :tokens
def self.consume(txtpath)
Model.new.consume(txtpath)
# @return [Array<Array<Integer>>]
# Sentences represented by arrays of tikis
attr_accessor :sentences
# @return [Array<Array<Integer>>]
# Sentences derived from Twitter mentions
attr_accessor :mentions
# @return [Array<String>]
# The top 200 most important keywords, in descending order
attr_accessor :keywords
# Generate a new model from a corpus file
# @param path [String]
# @return [Ebooks::Model]
def self.consume(path)
Model.new.consume(path)
end
# Generate a new model from multiple corpus files
# @param paths [Array<String>]
# @return [Ebooks::Model]
def self.consume_all(paths)
Model.new.consume_all(paths)
end
# Load a saved model
# @param path [String]
# @return [Ebooks::Model]
def self.load(path)
model = Model.new
model.instance_eval do
@@ -30,6 +55,8 @@ module Ebooks
model
end
# Save model to a file
# @param path [String]
def save(path)
File.open(path, 'wb') do |f|
f.write(Marshal.dump({
@@ -43,19 +70,22 @@
end
def initialize
# This is the only source of actual strings in the model. It is
# an array of unique tokens. Manipulation of a token is mostly done
# using its index in this array, which we call a "tiki"
@tokens = []
# Reverse lookup tiki by token, for faster generation
@tikis = {}
end
# Reverse lookup a token index from a token
# @param token [String]
# @return [Integer]
def tikify(token)
@tikis[token] or (@tokens << token and @tikis[token] = @tokens.length-1)
end
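The tikify lookup above either returns a token's existing index or appends the token and records its new index. A standalone illustration of the same idea (a sketch, not the Model class itself):

``` ruby
# Tokens live in one array; a "tiki" is a token's index in that array.
tokens = []
tikis  = {}
tikify = lambda do |token|
  tikis[token] || (tokens << token; tikis[token] = tokens.length - 1)
end

a = tikify.call("hello") # new token, gets tiki 0
b = tikify.call("world") # new token, gets tiki 1
c = tikify.call("hello") # seen before, reuses tiki 0
```

This is why sentences can be stored as small integer arrays: every repeated word costs one integer, not another copy of the string.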
# Convert a body of text into arrays of tikis
# @param text [String]
# @return [Array<Array<Integer>>]
def mass_tikify(text)
sentences = NLP.sentences(text)
@@ -69,9 +99,10 @@
end
end
# Consume a corpus into this model
# @param path [String]
def consume(path)
content = File.read(path, :encoding => 'utf-8')
@hash = Digest::MD5.hexdigest(content)
if path.split('.')[-1] == "json"
log "Reading json corpus from #{path}"
@@ -94,6 +125,8 @@
consume_lines(lines)
end
# Consume a sequence of lines
# @param lines [Array<String>]
def consume_lines(lines)
log "Removing commented lines and sorting mentions"
@@ -126,11 +159,12 @@
self
end
# Consume multiple corpuses into this model
# @param paths [Array<String>]
def consume_all(paths)
lines = []
paths.each do |path|
content = File.read(path, :encoding => 'utf-8')
@hash = Digest::MD5.hexdigest(content)
if path.split('.')[-1] == "json"
log "Reading json corpus from #{path}"
@@ -156,25 +190,26 @@
consume_lines(lines)
end
def fix(tweet)
# This seems to require an external api call
#begin
# fixer = NLP.gingerice.parse(tweet)
# log fixer if fixer['corrections']
# tweet = fixer['result']
#rescue Exception => e
# log e.message
# log e.backtrace
#end
NLP.htmlentities.decode tweet
# Correct encoding issues in generated text
# @param text [String]
# @return [String]
def fix(text)
NLP.htmlentities.decode text
end
# Check if an array of tikis comprises a valid tweet
# @param tikis [Array<Integer>]
# @param limit [Integer] how many chars we have left
def valid_tweet?(tikis, limit)
tweet = NLP.reconstruct(tikis, @tokens)
tweet.length <= limit && !NLP.unmatched_enclosers?(tweet)
end
# Generate some text
# @param limit [Integer] available characters
# @param generator [SuffixGenerator, nil]
# @param retry_limit [Integer] how many times to retry on duplicates
# @return [String]
def make_statement(limit=140, generator=nil, retry_limit=10)
responding = !generator.nil?
generator ||= SuffixGenerator.build(@sentences)
@@ -209,12 +244,17 @@
end
# Test if a sentence has been copied verbatim from original
def verbatim?(tokens)
@sentences.include?(tokens) || @mentions.include?(tokens)
# @param tikis [Array<Integer>]
# @return [Boolean]
def verbatim?(tikis)
@sentences.include?(tikis) || @mentions.include?(tikis)
end
# Finds all relevant tokenized sentences to given input by
# Finds relevant and slightly relevant tokenized sentences to input by
# comparing non-stopword token overlaps
# @param sentences [Array<Array<Integer>>]
# @param input [String]
# @return [Array<Array<Array<Integer>>, Array<Array<Integer>>>]
def find_relevant(sentences, input)
relevant = []
slightly_relevant = []
@@ -235,6 +275,10 @@
# Generates a response by looking for related sentences
# in the corpus and building a smaller generator from these
# @param input [String]
# @param limit [Integer] characters available for response
# @param sentences [Array<Array<Integer>>]
# @return [String]
def make_response(input, limit=140, sentences=@mentions)
# Prefer mentions
relevant, slightly_relevant = find_relevant(sentences, input)


@@ -12,31 +12,35 @@ module Ebooks
# Some of this stuff is pretty heavy and we don't necessarily need
# to be using it all of the time
# Lazily loads an array of stopwords
# Stopwords are common English words that should often be ignored
# @return [Array<String>]
def self.stopwords
@stopwords ||= File.read(File.join(DATA_PATH, 'stopwords.txt')).split
end
# Lazily loads an array of known English nouns
# @return [Array<String>]
def self.nouns
@nouns ||= File.read(File.join(DATA_PATH, 'nouns.txt')).split
end
# Lazily loads an array of known English adjectives
# @return [Array<String>]
def self.adjectives
@adjectives ||= File.read(File.join(DATA_PATH, 'adjectives.txt')).split
end
# POS tagger
# Lazily load part-of-speech tagging library
# This can determine whether a word is being used as a noun/adjective/verb
# @return [EngTagger]
def self.tagger
require 'engtagger'
@tagger ||= EngTagger.new
end
# Gingerice text correction service
def self.gingerice
require 'gingerice'
Gingerice::Parser.new # No caching for this one
end
# For decoding html entities
# Lazily load HTML entity decoder
# @return [HTMLEntities]
def self.htmlentities
require 'htmlentities'
@htmlentities ||= HTMLEntities.new
@@ -44,7 +48,9 @@
### Utility functions
# We don't really want to deal with all this weird unicode punctuation
# Normalize some strange unicode punctuation variants
# @param text [String]
# @return [String]
def self.normalize(text)
htmlentities.decode text.gsub('“', '"').gsub('”', '"').gsub('’', "'").gsub('…', '...')
end
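The substitution chain in normalize is easy to check in isolation; the sketch below covers only the punctuation replacements, skipping the htmlentities decoding step since that needs a gem:

``` ruby
# Map curly quotes and the ellipsis character to their ASCII equivalents.
normalized = "“Hello” … it’s fine"
  .gsub('“', '"').gsub('”', '"')
  .gsub('’', "'").gsub('…', '...')
# normalized is now plain-ASCII punctuation throughout
```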
@@ -53,6 +59,8 @@
# We use ad hoc approach because fancy libraries do not deal
# especially well with tweet formatting, and we can fake solving
# the quote problem during generation
# @param text [String]
# @return [Array<String>]
def self.sentences(text)
text.split(/\n+|(?<=[.?!])\s+/)
end
@@ -60,15 +68,23 @@
# Split a sentence into word-level tokens
# As above, this is ad hoc because tokenization libraries
# do not behave well wrt. things like emoticons and timestamps
# @param sentence [String]
# @return [Array<String>]
def self.tokenize(sentence)
regex = /\s+|(?<=[#{PUNCTUATION}]\s)(?=[a-zA-Z])|(?<=[a-zA-Z])(?=[#{PUNCTUATION}]+\s)/
sentence.split(regex)
end
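The two splitting heuristics above can be demonstrated standalone. The punctuation set below is a stand-in for the gem's PUNCTUATION constant, which is defined elsewhere:

``` ruby
punct = '.?!,'

# Sentence boundaries: newlines, or whitespace preceded by .?!
sentences = "Nice weather today. Quite nice!".split(/\n+|(?<=[.?!])\s+/)

# Word tokens: whitespace, plus zero-width splits that peel trailing
# punctuation off a word without touching things like "3:00pm"
tokens = "hello there, friend".split(
  /\s+|(?<=[#{punct}]\s)(?=[a-zA-Z])|(?<=[a-zA-Z])(?=[#{punct}]+\s)/
)
```

Note the comma comes out as its own token, which is what lets reconstruct decide later whether a space belongs before or after it.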
# Get the 'stem' form of a word e.g. 'cats' -> 'cat'
# @param word [String]
# @return [String]
def self.stem(word)
Stemmer::stem_word(word.downcase)
end
# Use highscore gem to find interesting keywords in a corpus
# @param text [String]
# @return [Highscore::Keywords]
def self.keywords(text)
# Preprocess to remove stopwords (highscore's blacklist is v. slow)
text = NLP.tokenize(text).reject { |t| stopword?(t) }.join(' ')
@@ -90,7 +106,10 @@
text.keywords
end
# Takes a list of tokens and builds a nice-looking sentence
# Builds a proper sentence from a list of tikis
# @param tikis [Array<Integer>]
# @param tokens [Array<String>]
# @return [String]
def self.reconstruct(tikis, tokens)
text = ""
last_token = nil
@@ -105,6 +124,9 @@
end
# Determine if we need to insert a space between two tokens
# @param token1 [String]
# @param token2 [String]
# @return [Boolean]
def self.space_between?(token1, token2)
p1 = self.punctuation?(token1)
p2 = self.punctuation?(token2)
@@ -119,10 +141,16 @@
end
end
# Is this token comprised of punctuation?
# @param token [String]
# @return [Boolean]
def self.punctuation?(token)
(token.chars.to_set - PUNCTUATION.chars.to_set).empty?
end
# Is this token a stopword?
# @param token [String]
# @return [Boolean]
def self.stopword?(token)
@stopword_set ||= stopwords.map(&:downcase).to_set
@stopword_set.include?(token.downcase)
@@ -130,7 +158,9 @@
# Determine if a sample of text contains unmatched brackets or quotes
# This is one of the more frequent and noticeable failure modes for
# the markov generator; we can just tell it to retry
# the generator; we can just tell it to retry
# @param text [String]
# @return [Boolean]
def self.unmatched_enclosers?(text)
enclosers = ['**', '""', '()', '[]', '``', "''"]
enclosers.each do |pair|
@@ -153,10 +183,13 @@
end
# Determine if a2 is a subsequence of a1
# @param a1 [Array]
# @param a2 [Array]
# @return [Boolean]
def self.subseq?(a1, a2)
a1.each_index.find do |i|
!a1.each_index.find do |i|
a1[i...i+a2.length] == a2
end
end.nil?
end
end
end
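The corrected subseq? now wraps find in `!….nil?` so it returns a proper Boolean instead of an index. The same logic as a standalone method:

``` ruby
# True if a2 appears as a contiguous subsequence of a1.
def subseq?(a1, a2)
  !a1.each_index.find { |i| a1[i...i + a2.length] == a2 }.nil?
end
```

The old version returned the matching index (or nil), which was truthy-compatible but misleading for a `?` method.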


@@ -1,11 +1,14 @@
# encoding: utf-8
module Ebooks
# This generator uses data identical to the markov model, but
# This generator uses data identical to a markov model, but
# instead of making a chain by looking up bigrams it uses the
# positions to randomly replace suffixes in one sentence with
# matching suffixes in another
class SuffixGenerator
# Build a generator from a corpus of tikified sentences
# @param sentences [Array<Array<Integer>>]
# @return [SuffixGenerator]
def self.build(sentences)
SuffixGenerator.new(sentences)
end
@@ -39,6 +42,11 @@
self
end
# Generate a recombined sequence of tikis
# @param passes [Integer] number of times to recombine
# @param n [Symbol] :unigrams or :bigrams (affects how conservative the model is)
# @return [Array<Integer>]
def generate(passes=5, n=:unigrams)
index = rand(@sentences.length)
tikis = @sentences[index]


@@ -1,3 +1,3 @@
module Ebooks
VERSION = "2.3.2"
VERSION = "3.0.0"
end


@@ -1,4 +1,4 @@
source 'http://rubygems.org'
ruby '1.9.3'
ruby '{{RUBY_VERSION}}'
gem 'twitter_ebooks'


@@ -1 +1 @@
worker: ruby run.rb start
worker: ebooks start

skeleton/bots.rb Executable file → Normal file

@@ -1,42 +1,55 @@
#!/usr/bin/env ruby
require 'twitter_ebooks'
# This is an example bot definition with event handlers commented out
# You can define as many of these as you like; they will run simultaneously
# You can define and instantiate as many bots as you like
Ebooks::Bot.new("{{BOT_NAME}}") do |bot|
# Consumer details come from registering an app at https://dev.twitter.com/
# OAuth details can be fetched with https://github.com/marcel/twurl
bot.consumer_key = "" # Your app consumer key
bot.consumer_secret = "" # Your app consumer secret
bot.oauth_token = "" # Token connecting the app to this account
bot.oauth_token_secret = "" # Secret connecting the app to this account
class MyBot < Ebooks::Bot
# Configuration here applies to all MyBots
def configure
# Consumer details come from registering an app at https://dev.twitter.com/
# Once you have consumer details, use "ebooks auth" for new access tokens
self.consumer_key = '' # Your app consumer key
self.consumer_secret = '' # Your app consumer secret
bot.on_message do |dm|
# Users to block instead of interacting with
self.blacklist = ['tnietzschequote']
# Range in seconds to randomize delay when bot.delay is called
self.delay_range = 1..6
end
def on_startup
scheduler.every '24h' do
# Tweet something every 24 hours
# See https://github.com/jmettraux/rufus-scheduler
# bot.tweet("hi")
# bot.pictweet("hi", "cuteselfie.jpg")
end
end
def on_message(dm)
# Reply to a DM
# bot.reply(dm, "secret secrets")
end
bot.on_follow do |user|
def on_follow(user)
# Follow a user back
# bot.follow(user[:screen_name])
end
bot.on_mention do |tweet, meta|
def on_mention(tweet)
# Reply to a mention
# bot.reply(tweet, meta[:reply_prefix] + "oh hullo")
# bot.reply(tweet, meta(tweet)[:reply_prefix] + "oh hullo")
end
bot.on_timeline do |tweet, meta|
def on_timeline(tweet)
# Reply to a tweet in the bot's timeline
# bot.reply(tweet, meta[:reply_prefix] + "nice tweet")
end
bot.scheduler.every '24h' do
# Tweet something every 24 hours
# See https://github.com/jmettraux/rufus-scheduler
# bot.tweet("hi")
# bot.pictweet("hi", "cuteselfie.jpg", ":possibly_sensitive => true")
# bot.reply(tweet, meta(tweet)[:reply_prefix] + "nice tweet")
end
end
# Make a MyBot and attach it to an account
MyBot.new("{{BOT_NAME}}") do |bot|
bot.access_token = "" # Token connecting the app to this account
bot.access_token_secret = "" # Secret connecting the app to this account
end


@@ -1,9 +0,0 @@
#!/usr/bin/env ruby
require_relative 'bots'
EM.run do
Ebooks::Bot.all.each do |bot|
bot.start
end
end


@@ -3,13 +3,10 @@ require 'memory_profiler'
require 'tempfile'
require 'timecop'
def Process.rss; `ps -o rss= -p #{Process.pid}`.chomp.to_i; end
class TestBot < Ebooks::Bot
attr_accessor :twitter
def configure
self.username = "test_ebooks"
end
def on_direct_message(dm)
@@ -17,7 +14,7 @@ class TestBot < Ebooks::Bot
end
def on_mention(tweet, meta)
reply tweet, "echo: #{meta[:mentionless]}"
reply tweet, "echo: #{meta.mentionless}"
end
def on_timeline(tweet, meta)
@@ -43,10 +40,11 @@ module Ebooks::Test
# Creates a mock tweet
# @param username User sending the tweet
# @param text Tweet content
def mock_tweet(username, text)
def mock_tweet(username, text, extra={})
mentions = text.split.find_all { |x| x.start_with?('@') }
Twitter::Tweet.new(
tweet = Twitter::Tweet.new({
id: twitter_id,
in_reply_to_status_id: 'mock-link',
user: { id: twitter_id, screen_name: username },
text: text,
created_at: Time.now.to_s,
@@ -56,29 +54,36 @@
indices: [text.index(m), text.index(m)+m.length] }
}
}
)
}.merge!(extra))
tweet
end
def twitter_spy(bot)
twitter = spy("twitter")
allow(twitter).to receive(:update).and_return(mock_tweet(bot.username, "test tweet"))
twitter
end
def simulate(bot, &b)
bot.twitter = spy("twitter")
bot.twitter = twitter_spy(bot)
b.call
end
def expect_direct_message(bot, content)
expect(bot.twitter).to have_received(:create_direct_message).with(anything(), content, {})
bot.twitter = spy("twitter")
bot.twitter = twitter_spy(bot)
end
def expect_tweet(bot, content)
expect(bot.twitter).to have_received(:update).with(content, anything())
bot.twitter = spy("twitter")
bot.twitter = twitter_spy(bot)
end
end
describe Ebooks::Bot do
include Ebooks::Test
let(:bot) { TestBot.new }
let(:bot) { TestBot.new('test_ebooks') }
before { Timecop.freeze }
after { Timecop.return }
@@ -104,6 +109,20 @@ describe Ebooks::Bot do
end
end
it "links tweets to conversations correctly" do
tweet1 = mock_tweet("m1sp", "tweet 1", id: 1, in_reply_to_status_id: nil)
tweet2 = mock_tweet("m1sp", "tweet 2", id: 2, in_reply_to_status_id: 1)
tweet3 = mock_tweet("m1sp", "tweet 3", id: 3, in_reply_to_status_id: nil)
bot.conversation(tweet1).add(tweet1)
expect(bot.conversation(tweet2)).to eq(bot.conversation(tweet1))
bot.conversation(tweet2).add(tweet2)
expect(bot.conversation(tweet3)).to_not eq(bot.conversation(tweet2))
end
it "stops mentioning people after a certain limit" do
simulate(bot) do
bot.receive_event(mock_tweet("spammer", "@test_ebooks @m1sp 1"))

File diff suppressed because it is too large


@@ -1,18 +0,0 @@
#!/usr/bin/env ruby
# encoding: utf-8
require 'twitter_ebooks'
require 'minitest/autorun'
require 'benchmark'
module Ebooks
class TestKeywords < Minitest::Test
corpus = NLP.normalize(File.read(ARGV[0]))
puts "Finding and ranking keywords"
puts Benchmark.measure {
NLP.keywords(corpus).top(50).each do |keyword|
puts "#{keyword.text} #{keyword.weight}"
end
}
end
end


@@ -1,18 +0,0 @@
#!/usr/bin/env ruby
# encoding: utf-8
require 'twitter_ebooks'
require 'minitest/autorun'
module Ebooks
class TestTokenize < Minitest::Test
corpus = NLP.normalize(File.read(TEST_CORPUS_PATH))
sents = NLP.sentences(corpus).sample(10)
NLP.sentences(corpus).sample(10).each do |sent|
p sent
p NLP.tokenize(sent)
puts
end
end
end


@@ -18,8 +18,9 @@ Gem::Specification.new do |gem|
gem.add_development_dependency 'rspec'
gem.add_development_dependency 'rspec-mocks'
gem.add_development_dependency 'memory_profiler'
gem.add_development_dependency 'pry-byebug'
gem.add_development_dependency 'timecop'
gem.add_development_dependency 'pry-byebug'
gem.add_development_dependency 'yard'
gem.add_runtime_dependency 'twitter', '~> 5.0'
gem.add_runtime_dependency 'simple_oauth'
@@ -30,4 +31,5 @@
gem.add_runtime_dependency 'engtagger'
gem.add_runtime_dependency 'fast-stemmer'
gem.add_runtime_dependency 'highscore'
gem.add_runtime_dependency 'pry'
end