Merge branch '3.0'

This commit is contained in:
Jaiden Mispy 2014-12-05 22:57:41 +11:00
commit 56aadea555
20 changed files with 738 additions and 15203 deletions

.gitignore

@@ -1,3 +1,5 @@
.*.swp
Gemfile.lock
pkg
.yardoc
doc

README.md
@@ -4,8 +4,16 @@
[![Build Status](https://travis-ci.org/mispy/twitter_ebooks.svg)](https://travis-ci.org/mispy/twitter_ebooks)
[![Dependency Status](https://gemnasium.com/mispy/twitter_ebooks.svg)](https://gemnasium.com/mispy/twitter_ebooks)

A framework for building interactive twitterbots which respond to mentions/DMs. twitter_ebooks tries to be a good friendly bot citizen by avoiding infinite conversations and spamming people, so you only have to write the interesting parts.

## New in 3.0
- Bots run in their own threads (no eventmachine), and startup is parallelized
- Bots start with `ebooks start`, and no longer die on unhandled exceptions
- `ebooks auth` command will create new access tokens, for running multiple bots
- `ebooks console` starts a ruby interpreter with bots loaded (see Ebooks::Bot.all)
- Replies are slightly rate-limited to prevent infinite bot convos
- Non-participating users in a mention chain will be dropped after a few tweets
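The mention-dropping behavior in that last point can be illustrated with a small self-contained sketch (class and method names here are illustrative, not the gem's API; in the gem this logic lives in `Ebooks::Conversation`): a user stays in the reply prefix only while the conversation is still short or they have tweeted recently in it.

```ruby
# Simplified sketch of the "drop non-participating users" heuristic.
# Each conversation tracks its tweets; a user is kept in the reply
# chain only if the thread is short or they spoke in the last four tweets.
class ConversationSketch
  def initialize
    @tweets = [] # each entry: { user: String }
  end

  def add(username)
    @tweets << { user: username }
  end

  # Keep a user in the reply prefix if the conversation is still short,
  # or if they appear among the last four tweets.
  def can_include?(username)
    @tweets.length <= 4 ||
      @tweets[-4..-1].any? { |t| t[:user] == username }
  end
end

conv = ConversationSketch.new
%w[alice bot alice bot carol bot].each { |u| conv.add(u) }
puts conv.can_include?("alice") # => true
```

Here `alice` remains includable because she appears in the last four tweets, while a user who has gone quiet in a long thread would be dropped from subsequent replies.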
## Installation
@@ -21,53 +29,63 @@ Run `ebooks new <reponame>` to generate a new repository containing a sample bot
``` ruby
# This is an example bot definition with event handlers commented out
# You can define and instantiate as many bots as you like

class MyBot < Ebooks::Bot
  # Configuration here applies to all MyBots
  def configure
    # Consumer details come from registering an app at https://dev.twitter.com/
    # Once you have consumer details, use "ebooks auth" for new access tokens
    self.consumer_key = '' # Your app consumer key
    self.consumer_secret = '' # Your app consumer secret

    # Users to block instead of interacting with
    self.blacklist = ['tnietzschequote']

    # Range in seconds to randomize delay when bot.delay is called
    self.delay_range = 1..6
  end

  def on_startup
    scheduler.every '24h' do
      # Tweet something every 24 hours
      # See https://github.com/jmettraux/rufus-scheduler
      # bot.tweet("hi")
      # bot.pictweet("hi", "cuteselfie.jpg")
    end
  end

  def on_message(dm)
    # Reply to a DM
    # bot.reply(dm, "secret secrets")
  end

  def on_follow(user)
    # Follow a user back
    # bot.follow(user[:screen_name])
  end

  def on_mention(tweet)
    # Reply to a mention
    # bot.reply(tweet, meta(tweet)[:reply_prefix] + "oh hullo")
  end

  def on_timeline(tweet)
    # Reply to a tweet in the bot's timeline
    # bot.reply(tweet, meta(tweet)[:reply_prefix] + "nice tweet")
  end
end

# Make a MyBot and attach it to an account
MyBot.new("{{BOT_NAME}}") do |bot|
  bot.access_token = "" # Token connecting the app to this account
  bot.access_token_secret = "" # Secret connecting the app to this account
end
```
`ebooks start` will run all defined bots in their own threads. The easiest way to run bots in a semi-permanent fashion is with [Heroku](https://www.heroku.com); just make an app, push the bot repository to it, enable a worker process in the web interface and it ought to chug along merrily forever.
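On Heroku, the worker process is declared in the repository's Procfile; a minimal sketch (assuming the skeleton's layout and that the `ebooks` binary is available through Bundler — check your generated repository before relying on this) might look like:

```
worker: ebooks start
```

Scaling the worker to one dyno (`heroku ps:scale worker=1`) then keeps the bots running.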
The underlying streaming and REST clients from the [twitter gem](https://github.com/sferik/twitter) can be accessed at `bot.stream` and `bot.twitter` respectively.
## Archiving accounts
@@ -102,7 +120,6 @@ Text files use newlines and full stops to separate statements.
Once you have a model, the primary use is to produce statements and related responses to input, using a pseudo-Markov generator:
``` ruby
> model = Ebooks::Model.load("model/0xabad1dea.model")
> model.make_statement(140)
=> "My Terrible Netbook may be the kind of person who buys Starbucks, but this Rackspace vuln is pretty straight up a backdoor"
```
@@ -113,14 +130,18 @@ Once you have a model, the primary use is to produce statements and related responses
The secondary function is the "interesting keywords" list. For example, I use this to determine whether a bot wants to fav/retweet/reply to something in its timeline:
``` ruby
top100 = model.keywords.take(100)
tokens = Ebooks::NLP.tokenize(tweet[:text])

if tokens.find { |t| top100.include?(t) }
  bot.favorite(tweet[:id])
end
```
## Bot niceness
## Other notes
If you're using Heroku, which has no persistent filesystem, automating the process of archiving, consuming and updating can be tricky. My current solution is just a daily cron job which commits and pushes for me, which is pretty hacky.
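That daily job could be sketched as a crontab entry along these lines (the path, bot name, and git remote are placeholders, not part of the gem):

```
# Refresh the corpus and model each morning, then push so Heroku redeploys
0 4 * * * cd /path/to/mybot && ebooks archive mybot corpus/mybot.json && ebooks consume corpus/mybot.json && git commit -am "daily corpus update" && git push heroku master
```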

bin/ebooks

@@ -2,54 +2,85 @@
# encoding: utf-8
require 'twitter_ebooks'
require 'ostruct'

module Ebooks::Util
  def pretty_exception(e)
  end
end

module Ebooks::CLI
  APP_PATH = Dir.pwd # XXX do some recursive thing instead

  HELP = OpenStruct.new
  HELP.default = <<STR
Usage:
  ebooks help <command>

  ebooks new <reponame>
  ebooks auth
  ebooks consume <corpus_path> [corpus_path2] [...]
  ebooks consume-all <name> <corpus_path> [corpus_path2] [...]
  ebooks gen <model_path> [input]
  ebooks archive <username> [path]
  ebooks tweet <model_path> <botname>
STR

  def self.help(command=nil)
    if command.nil?
      log HELP.default
    else
      log HELP[command].gsub(/^ {4}/, '')
    end
  end

  HELP.new = <<-STR
    Usage: ebooks new <reponame>

    Creates a new skeleton repository defining a template bot in
    the current working directory specified by <reponame>.
  STR

  def self.new(reponame)
    if reponame.nil?
      help :new
      exit 1
    end

    path = "./#{reponame}"

    if File.exists?(path)
      log "#{path} already exists. Please remove if you want to recreate."
      exit 1
    end

    FileUtils.cp_r(Ebooks::SKELETON_PATH, path)

    File.open(File.join(path, 'bots.rb'), 'w') do |f|
      template = File.read(File.join(Ebooks::SKELETON_PATH, 'bots.rb'))
      f.write(template.gsub("{{BOT_NAME}}", reponame))
    end

    File.open(File.join(path, 'Gemfile'), 'w') do |f|
      template = File.read(File.join(Ebooks::SKELETON_PATH, 'Gemfile'))
      f.write(template.gsub("{{RUBY_VERSION}}", RUBY_VERSION))
    end

    log "New twitter_ebooks app created at #{reponame}"
  end

  HELP.consume = <<-STR
    Usage: ebooks consume <corpus_path> [corpus_path2] [...]

    Processes some number of text files or json tweet corpuses
    into usable models. These will be output at model/<name>.model
  STR

  def self.consume(pathes)
    if pathes.empty?
      help :consume
      exit 1
    end

    pathes.each do |path|
@@ -57,50 +88,43 @@
      shortname = filename.split('.')[0..-2].join('.')

      outpath = File.join(APP_PATH, 'model', "#{shortname}.model")
      Ebooks::Model.consume(path).save(outpath)
      log "Corpus consumed to #{outpath}"
    end
  end

  HELP.consume_all = <<-STR
    Usage: ebooks consume-all <name> <corpus_path> [corpus_path2] [...]

    Processes some number of text files or json tweet corpuses
    into one usable model. It will be output at model/<name>.model
  STR

  def self.consume_all(name, paths)
    if paths.empty?
      help :consume_all
      exit 1
    end

    outpath = File.join(APP_PATH, 'model', "#{name}.model")
    Ebooks::Model.consume_all(paths).save(outpath)
    log "Corpuses consumed to #{outpath}"
  end

  HELP.gen = <<-STR
    Usage: ebooks gen <model_path> [input]

    Make a test tweet from the processed model at <model_path>.
    Will respond to input if provided.
  STR

  def self.gen(model_path, input)
    if model_path.nil?
      help :gen
      exit 1
    end

    model = Ebooks::Model.load(model_path)
    if input && !input.empty?
      puts "@cmd " + model.make_response(input, 135)
    else
@@ -108,81 +132,186 @@
    end
  end

  HELP.archive = <<-STR
    Usage: ebooks archive <username> [outpath]

    Downloads a json corpus of the <username>'s tweets.
    Output defaults to corpus/<username>.json
    Due to API limitations, this can only receive up to ~3000 tweets
    into the past.
  STR

  def self.archive(username, outpath=nil)
    if username.nil?
      help :archive
      exit 1
    end

    Ebooks::Archive.new(username, outpath).sync
  end

  HELP.tweet = <<-STR
    Usage: ebooks tweet <model_path> <botname>

    Sends a public tweet from the specified bot using text
    from the processed model at <model_path>.
  STR

  def self.tweet(modelpath, botname)
    if modelpath.nil? || botname.nil?
      help :tweet
      exit 1
    end

    load File.join(APP_PATH, 'bots.rb')
    model = Ebooks::Model.load(modelpath)
    statement = model.make_statement
    bot = Ebooks::Bot.get(botname)
    bot.configure
    bot.tweet(statement)
  end

  HELP.auth = <<-STR
    Usage: ebooks auth

    Authenticates your Twitter app for any account. By default, will
    use the consumer key and secret from the first defined bot. You
    can specify another by setting the CONSUMER_KEY and CONSUMER_SECRET
    environment variables.
  STR

  def self.auth
    consumer_key, consumer_secret = find_consumer
    require 'oauth'

    consumer = OAuth::Consumer.new(
      consumer_key,
      consumer_secret,
      site: 'https://twitter.com/',
      scheme: :header
    )

    request_token = consumer.get_request_token
    auth_url = request_token.authorize_url()

    pin = nil
    loop do
      log auth_url
      log "Go to the above url and follow the prompts, then enter the PIN code here."
      print "> "

      pin = STDIN.gets.chomp
      break unless pin.empty?
    end

    access_token = request_token.get_access_token(oauth_verifier: pin)

    log "Account authorized successfully. Make sure to put these in your bots.rb!\n" +
        "  access token: #{access_token.token}\n" +
        "  access token secret: #{access_token.secret}"
  end

  HELP.console = <<-STR
    Usage: ebooks c[onsole]

    Starts an interactive ruby session with your bots loaded
    and configured.
  STR

  def self.console
    load_bots
    require 'pry'; Ebooks.module_exec { pry }
  end

  HELP.start = <<-STR
    Usage: ebooks s[tart] [botname]

    Starts running bots. If botname is provided, only runs that bot.
  STR

  def self.start(botname=nil)
    load_bots

    if botname.nil?
      bots = Ebooks::Bot.all
    else
      bots = Ebooks::Bot.all.select { |bot| bot.username == botname }
      if bots.empty?
        log "Couldn't find a defined bot for @#{botname}!"
        exit 1
      end
    end

    threads = []
    bots.each do |bot|
      threads << Thread.new { bot.prepare }
    end
    threads.each(&:join)

    threads = []
    bots.each do |bot|
      threads << Thread.new do
        loop do
          begin
            bot.start
          rescue Exception => e
            bot.log e.inspect
            puts e.backtrace.map { |s| "\t" + s }.join("\n")
          end
          bot.log "Sleeping before reconnect"
          sleep 5
        end
      end
    end
    threads.each(&:join)
  end

  # Non-command methods

  def self.find_consumer
    if ENV['CONSUMER_KEY'] && ENV['CONSUMER_SECRET']
      log "Using consumer details from environment variables:\n" +
          "  consumer key: #{ENV['CONSUMER_KEY']}\n" +
          "  consumer secret: #{ENV['CONSUMER_SECRET']}"
      return [ENV['CONSUMER_KEY'], ENV['CONSUMER_SECRET']]
    end

    load_bots

    Ebooks::Bot.all.each do |bot|
      if bot.consumer_key && bot.consumer_secret
        log "Using consumer details from @#{bot.username}:\n" +
            "  consumer key: #{bot.consumer_key}\n" +
            "  consumer secret: #{bot.consumer_secret}\n"
        return [bot.consumer_key, bot.consumer_secret]
      end
    end

    log "Couldn't find any consumer details to auth an account with.\n" +
        "Please either configure a bot with consumer_key and consumer_secret\n" +
        "or provide the CONSUMER_KEY and CONSUMER_SECRET environment variables."
    exit 1
  end

  def self.load_bots
    load 'bots.rb'

    if Ebooks::Bot.all.empty?
      puts "Couldn't find any bots! Please make sure bots.rb instantiates at least one bot."
    end
  end

  def self.command(args)
    if args.length == 0
      help
      exit 1
    end

    case args[0]
@@ -190,16 +319,21 @@
    when "consume" then consume(args[1..-1])
    when "consume-all" then consume_all(args[1], args[2..-1])
    when "gen" then gen(args[1], args[2..-1].join(' '))
    when "archive" then archive(args[1], args[2])
    when "tweet" then tweet(args[1], args[2])
    when "jsonify" then jsonify(args[1..-1])
    when "auth" then auth
    when "console" then console
    when "c" then console
    when "start" then start(args[1])
    when "s" then start(args[1])
    when "help" then help(args[1])
    else
      log "No such command '#{args[0]}'"
      help
      exit 1
    end
  end
end

Ebooks::CLI.command(ARGV)

lib/twitter_ebooks.rb

@@ -11,11 +11,11 @@ module Ebooks
  SKELETON_PATH = File.join(GEM_PATH, 'skeleton')
  TEST_PATH = File.join(GEM_PATH, 'test')
  TEST_CORPUS_PATH = File.join(TEST_PATH, 'corpus/0xabad1dea.tweets')
  INTERIM = :interim
end

require 'twitter_ebooks/nlp'
require 'twitter_ebooks/archive'
require 'twitter_ebooks/suffix'
require 'twitter_ebooks/model'
require 'twitter_ebooks/bot'

lib/twitter_ebooks/archive.rb

@@ -39,9 +39,14 @@ module Ebooks
    end
  end

  def initialize(username, path=nil, client=nil)
    @username = username
    @path = path || "corpus/#{username}.json"

    if File.directory?(@path)
      @path = File.join(@path, "#{username}.json")
    end

    @client = client || make_client

    if File.exists?(@path)

lib/twitter_ebooks/bot.rb (executable file → normal file)

@@ -6,143 +6,91 @@ module Ebooks
  class ConfigurationError < Exception
  end

  # Represents a single reply tree of tweets
  class Conversation
    attr_reader :last_update

    # @param bot [Ebooks::Bot]
    def initialize(bot)
      @bot = bot
      @tweets = []
      @last_update = Time.now
    end

    # @param tweet [Twitter::Tweet] tweet to add
    def add(tweet)
      @tweets << tweet
      @last_update = Time.now
    end

    # Make an informed guess as to whether a user is a bot based
    # on their behavior in this conversation
    def is_bot?(username)
      usertweets = @tweets.select { |t| t.user.screen_name == username }

      if usertweets.length > 2
        if (usertweets[-1].created_at - usertweets[-3].created_at) < 30
          return true
        end
      end

      username.include?("ebooks")
    end

    # Figure out whether to keep this user in the reply prefix
    # We want to avoid spamming non-participating users
    def can_include?(username)
      @tweets.length <= 4 ||
        !@tweets[-4..-1].select { |t| t.user.screen_name == username }.empty?
    end
  end
  # Meta information about a tweet that we calculate for ourselves
  class TweetMeta
    # @return [Array<String>] usernames mentioned in tweet
    attr_accessor :mentions
    # @return [String] text of tweet with mentions removed
    attr_accessor :mentionless
    # @return [Array<String>] usernames to include in a reply
    attr_accessor :reply_mentions
    # @return [String] mentions to start reply with
    attr_accessor :reply_prefix
    # @return [Integer] available chars for reply
    attr_accessor :limit

    # @return [Ebooks::Bot] associated bot
    attr_accessor :bot
    # @return [Twitter::Tweet] associated tweet
    attr_accessor :tweet

    # Check whether this tweet mentions our bot
    # @return [Boolean]
    def mentions_bot?
      # To check if this is someone talking to us, ensure:
      # - The tweet mentions list contains our username
      # - The tweet is not being retweeted by somebody else
      # - Or soft-retweeted by somebody else
      @mentions.map(&:downcase).include?(@bot.username.downcase) && !@tweet.retweeted_status? && !@tweet.text.start_with?('RT ')
    end

    # @param bot [Ebooks::Bot]
    # @param ev [Twitter::Tweet]
    def initialize(bot, ev)
      @bot = bot
      @tweet = ev

      @mentions = ev.attrs[:entities][:user_mentions].map { |x| x[:screen_name] }

      # Process mentions to figure out who to reply to
      # i.e. not self and nobody who has seen too many secondary mentions
      reply_mentions = @mentions.reject do |m|
        username = m.downcase
        username == @bot.username || !@bot.conversation(ev).can_include?(username)
      end
      @reply_mentions = ([ev.user.screen_name] + reply_mentions).uniq

      @reply_prefix = @reply_mentions.map { |m| '@' + m }.join(' ') + ' '
      @limit = 140 - @reply_prefix.length

      mless = ev.text
      begin
@@ -155,12 +103,116 @@ module Ebooks
        p ev.text
        raise
      end
      @mentionless = mless
    end
  end
  class Bot
    # @return [String] OAuth consumer key for a Twitter app
    attr_accessor :consumer_key
    # @return [String] OAuth consumer secret for a Twitter app
    attr_accessor :consumer_secret
    # @return [String] OAuth access token from `ebooks auth`
    attr_accessor :access_token
    # @return [String] OAuth access secret from `ebooks auth`
    attr_accessor :access_token_secret
    # @return [String] Twitter username of bot
    attr_accessor :username
    # @return [Array<String>] list of usernames to block on contact
    attr_accessor :blacklist
    # @return [Hash{String => Ebooks::Conversation}] maps tweet ids to their conversation contexts
    attr_accessor :conversations
    # @return [Range, Integer] range of seconds to delay in delay method
    attr_accessor :delay_range

    # @return [Array] list of all defined bots
    def self.all; @@all ||= []; end

    # Fetches a bot by username
    # @param username [String]
    # @return [Ebooks::Bot]
    def self.get(username)
      all.find { |bot| bot.username == username }
    end

    # Logs info to stdout in the context of this bot
    def log(*args)
      STDOUT.print "@#{@username}: " + args.map(&:to_s).join(' ') + "\n"
      STDOUT.flush
    end

    # Initializes and configures bot
    # @param username [String] Twitter username of bot
    # @param b Block to call with new bot
    def initialize(username, &b)
      @blacklist ||= []
      @conversations ||= {}
      # Tweet ids we've already observed, to avoid duplication
      @seen_tweets ||= {}

      @username = username
      configure

      b.call(self) unless b.nil?
      Bot.all << self
    end

    # Find or create the conversation context for this tweet
    # @param tweet [Twitter::Tweet]
    # @return [Ebooks::Conversation]
    def conversation(tweet)
      conv = if tweet.in_reply_to_status_id?
        @conversations[tweet.in_reply_to_status_id]
      end

      if conv.nil?
        conv = @conversations[tweet.id] || Conversation.new(self)
      end

      if tweet.in_reply_to_status_id?
        @conversations[tweet.in_reply_to_status_id] = conv
      end
      @conversations[tweet.id] = conv

      # Expire any old conversations to prevent memory growth
      @conversations.each do |k, v|
        if v != conv && Time.now - v.last_update > 3600
          @conversations.delete(k)
        end
      end

      conv
    end

    # @return [Twitter::REST::Client] underlying REST client from twitter gem
    def twitter
      @twitter ||= Twitter::REST::Client.new do |config|
        config.consumer_key = @consumer_key
        config.consumer_secret = @consumer_secret
        config.access_token = @access_token
        config.access_token_secret = @access_token_secret
      end
    end

    # @return [Twitter::Streaming::Client] underlying streaming client from twitter gem
    def stream
      @stream ||= Twitter::Streaming::Client.new do |config|
        config.consumer_key = @consumer_key
        config.consumer_secret = @consumer_secret
        config.access_token = @access_token
        config.access_token_secret = @access_token_secret
      end
    end

    # Calculate some meta information about a tweet relevant for replying
    # @param ev [Twitter::Tweet]
    # @return [Ebooks::TweetMeta]
    def meta(ev)
      TweetMeta.new(self, ev)
    end
# Receive an event from the twitter stream # Receive an event from the twitter stream
# @param ev [Object] Twitter streaming event
def receive_event(ev) def receive_event(ev)
if ev.is_a? Array # Initial array sent on first connection if ev.is_a? Array # Initial array sent on first connection
log "Online!" log "Online!"
@ -181,7 +233,7 @@ module Ebooks
return unless ev.text # If it's not a text-containing tweet, ignore it return unless ev.text # If it's not a text-containing tweet, ignore it
return if ev.user.screen_name == @username # Ignore our own tweets return if ev.user.screen_name == @username # Ignore our own tweets
meta = calc_meta(ev) meta = meta(ev)
if blacklisted?(ev.user.screen_name) if blacklisted?(ev.user.screen_name)
log "Blocking blacklisted user @#{ev.user.screen_name}" log "Blocking blacklisted user @#{ev.user.screen_name}"
@ -190,17 +242,18 @@ module Ebooks
# Avoid responding to duplicate tweets # Avoid responding to duplicate tweets
if @seen_tweets[ev.id] if @seen_tweets[ev.id]
log "Not firing event for duplicate tweet #{ev.id}"
return return
else else
@seen_tweets[ev.id] = true @seen_tweets[ev.id] = true
end end
if meta[:mentions_bot] if meta.mentions_bot?
log "Mention from @#{ev.user.screen_name}: #{ev.text}" log "Mention from @#{ev.user.screen_name}: #{ev.text}"
interaction(ev.user.screen_name).receive(ev) conversation(ev).add(ev)
fire(:mention, ev, meta) fire(:mention, ev)
else else
fire(:timeline, ev, meta) fire(:timeline, ev)
end end
elsif ev.is_a?(Twitter::Streaming::DeletedTweet) || elsif ev.is_a?(Twitter::Streaming::DeletedTweet) ||
@ -211,7 +264,31 @@ module Ebooks
end end
end end
def start_stream # Configures client and fires startup event
def prepare
# Sanity check
if @username.nil?
raise ConfigurationError, "bot username cannot be nil"
end
if @consumer_key.nil? || @consumer_key.empty? ||
@consumer_secret.nil? || @consumer_key.empty?
log "Missing consumer_key or consumer_secret. These details can be acquired by registering a Twitter app at https://apps.twitter.com/"
exit 1
end
if @access_token.nil? || @access_token.empty? ||
@access_token_secret.nil? || @access_token_secret.empty?
log "Missing access_token or access_token_secret. Please run `ebooks auth`."
exit 1
end
twitter
fire(:startup)
end
# Start running user event stream
def start
log "starting tweet stream" log "starting tweet stream"
stream.user do |ev| stream.user do |ev|
@ -219,22 +296,9 @@ module Ebooks
end end
end end
def prepare
# Sanity check
if @username.nil?
raise ConfigurationError, "bot.username cannot be nil"
end
twitter
fire(:startup)
end
# Connects to tweetstream and opens event handlers for this bot
def start
start_stream
end
# Fire an event # Fire an event
# @param event [Symbol] event to fire
# @param args arguments for event handler
def fire(event, *args) def fire(event, *args)
handler = "on_#{event}".to_sym handler = "on_#{event}".to_sym
if respond_to? handler if respond_to? handler
@ -242,11 +306,17 @@ module Ebooks
end end
end end
def delay(&b) # Delay an action for a variable period of time
time = @delay.to_a.sample unless @delay.is_a? Integer # @param range [Range, Integer] range of seconds to choose for delay
def delay(range=@delay_range, &b)
time = range.is_a?(Integer) ? range : range.to_a.sample
sleep time sleep time
b.call
end end
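The delay helper above can be tried standalone. This is a simplified sketch, not the gem's exact code: an Integer argument sleeps that many seconds, a Range samples a duration from it.

```ruby
# Sketch of the delay helper: sample a pause from a Range (or use an
# Integer directly), sleep, then run the block
def delay(range = 1..6, &b)
  time = range.is_a?(Integer) ? range : range.to_a.sample
  sleep time
  b.call
end

delay(0) { puts "replying now" }
```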
# Check if a username is blacklisted
# @param username [String]
# @return [Boolean]
def blacklisted?(username) def blacklisted?(username)
if @blacklist.include?(username) if @blacklist.include?(username)
true true
@ -256,46 +326,37 @@ module Ebooks
end end
# Reply to a tweet or a DM. # Reply to a tweet or a DM.
# @param ev [Twitter::Tweet, Twitter::DirectMessage]
# @param text [String] contents of reply excluding reply_prefix
# @param opts [Hash] additional params to pass to twitter gem
def reply(ev, text, opts={}) def reply(ev, text, opts={})
opts = opts.clone opts = opts.clone
if ev.is_a? Twitter::DirectMessage if ev.is_a? Twitter::DirectMessage
return if blacklisted?(ev.sender.screen_name)
log "Sending DM to @#{ev.sender.screen_name}: #{text}" log "Sending DM to @#{ev.sender.screen_name}: #{text}"
twitter.create_direct_message(ev.sender.screen_name, text, opts) twitter.create_direct_message(ev.sender.screen_name, text, opts)
elsif ev.is_a? Twitter::Tweet elsif ev.is_a? Twitter::Tweet
meta = calc_meta(ev) meta = meta(ev)
if !interaction(ev.user.screen_name).continue? if conversation(ev).is_bot?(ev.user.screen_name)
log "Not replying to suspected bot @#{ev.user.screen_name}" log "Not replying to suspected bot @#{ev.user.screen_name}"
return return false
end end
if !meta[:mentions_bot] log "Replying to @#{ev.user.screen_name} with: #{meta.reply_prefix + text}"
if !userinfo(ev.user.screen_name).can_pester? tweet = twitter.update(meta.reply_prefix + text, in_reply_to_status_id: ev.id)
log "Not replying: leaving @#{ev.user.screen_name} alone" conversation(tweet).add(tweet)
return tweet
else
userinfo(ev.user.screen_name).pesters_left -= 1
end
end
log "Replying to @#{ev.user.screen_name} with: #{meta[:reply_prefix] + text}"
twitter.update(meta[:reply_prefix] + text, in_reply_to_status_id: ev.id)
else else
raise ArgumentError, "Don't know how to reply to a #{ev.class}" raise ArgumentError, "Don't know how to reply to a #{ev.class}"
end end
end end
# Favorite a tweet
# @param tweet [Twitter::Tweet]
def favorite(tweet) def favorite(tweet)
return if blacklisted?(tweet.user.screen_name)
log "Favoriting @#{tweet.user.screen_name}: #{tweet.text}" log "Favoriting @#{tweet.user.screen_name}: #{tweet.text}"
meta = calc_meta(tweet)
if !meta[:mentions_bot] && !userinfo(ev.user.screen_name).can_pester?
log "Not favoriting: leaving @#{ev.user.screen_name} alone"
end
begin begin
twitter.favorite(tweet.id) twitter.favorite(tweet.id)
rescue Twitter::Error::Forbidden rescue Twitter::Error::Forbidden
@ -303,8 +364,9 @@ module Ebooks
end end
end end
# Retweet a tweet
# @param tweet [Twitter::Tweet]
def retweet(tweet) def retweet(tweet)
return if blacklisted?(tweet.user.screen_name)
log "Retweeting @#{tweet.user.screen_name}: #{tweet.text}" log "Retweeting @#{tweet.user.screen_name}: #{tweet.text}"
begin begin
@ -314,21 +376,36 @@ module Ebooks
end end
end end
def follow(*args) # Follow a user
log "Following #{args}" # @param user [String] username or user id
twitter.follow(*args) def follow(user, *args)
log "Following #{user}"
twitter.follow(user, *args)
end end
def tweet(*args) # Unfollow a user
log "Tweeting #{args.inspect}" # @param user [String] username or user id
twitter.update(*args) def unfollow(user, *args)
log "Unfollowing #{user}"
twitter.unfollow(user, *args)
end end
# Tweet something
# @param text [String]
def tweet(text, *args)
log "Tweeting '#{text}'"
twitter.update(text, *args)
end
# Get a scheduler for this bot
# @return [Rufus::Scheduler]
def scheduler def scheduler
@scheduler ||= Rufus::Scheduler.new @scheduler ||= Rufus::Scheduler.new
end end
# could easily just be *args however the separation keeps it clean. # Tweet some text with an image
# @param txt [String]
# @param pic [String] filename
def pictweet(txt, pic, *args) def pictweet(txt, pic, *args)
log "Tweeting #{txt.inspect} - #{pic} #{args}" log "Tweeting #{txt.inspect} - #{pic} #{args}"
twitter.update_with_media(txt, File.new(pic), *args) twitter.update_with_media(txt, File.new(pic), *args)


@ -1,82 +0,0 @@
module Ebooks
# Special INTERIM token represents sentence boundaries
# This is so we can include start and end of statements in model
# Due to the way the sentence tokenizer works, can correspond
# to multiple actual parts of text (such as ^, $, \n and .?!)
INTERIM = :interim
# This is an ngram-based Markov model optimized to build from a
# tokenized sentence list without requiring too much transformation
class MarkovModel
def self.build(sentences)
MarkovModel.new.consume(sentences)
end
def consume(sentences)
# These models are of the form ngram => [[sentence_pos, token_pos] || INTERIM, ...]
# We map by both bigrams and unigrams so we can fall back to the latter in
# cases where an input bigram is unavailable, such as starting a sentence
@sentences = sentences
@unigrams = {}
@bigrams = {}
sentences.each_with_index do |tokens, i|
last_token = INTERIM
tokens.each_with_index do |token, j|
@unigrams[last_token] ||= []
@unigrams[last_token] << [i, j]
@bigrams[last_token] ||= {}
@bigrams[last_token][token] ||= []
if j == tokens.length-1 # Mark sentence endings
@unigrams[token] ||= []
@unigrams[token] << INTERIM
@bigrams[last_token][token] << INTERIM
else
@bigrams[last_token][token] << [i, j+1]
end
last_token = token
end
end
self
end
def find_token(index)
if index == INTERIM
INTERIM
else
@sentences[index[0]][index[1]]
end
end
def chain(tokens)
if tokens.length == 1
matches = @unigrams[tokens[-1]]
else
matches = @bigrams[tokens[-2]][tokens[-1]]
matches = @unigrams[tokens[-1]] if matches.length < 2
end
if matches.empty?
# This should never happen unless a strange token is
# supplied from outside the dataset
raise ArgumentError, "Unable to continue chain for: #{tokens.inspect}"
end
next_token = find_token(matches.sample)
if next_token == INTERIM # We chose to end the sentence
return tokens
else
return chain(tokens + [next_token])
end
end
def generate
NLP.reconstruct(chain([INTERIM]))
end
end
end
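The bigram-chain idea the removed MarkovModel implemented can be sketched in a few lines. This is a simplified sketch over plain token arrays, not the index-based `[sentence_pos, token_pos]` structure above.

```ruby
# Simplified bigram Markov chain: map each token to the tokens that can
# follow it, then walk the map from a start token until a sentence ends
def build_bigrams(sentences)
  bigrams = Hash.new { |h, k| h[k] = [] }
  sentences.each do |tokens|
    tokens.each_cons(2) { |a, b| bigrams[a] << b }
    bigrams[tokens[-1]] << nil # nil marks a sentence ending
  end
  bigrams
end

def chain(bigrams, start)
  out = [start]
  while (nxt = bigrams[out[-1]].sample)
    out << nxt
  end
  out
end

bigrams = build_bigrams([%w[the cat sat], %w[the dog ran]])
chain(bigrams, "the") # e.g. ["the", "cat", "sat"]
```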


@ -8,16 +8,41 @@ require 'csv'
module Ebooks module Ebooks
class Model class Model
attr_accessor :hash, :tokens, :sentences, :mentions, :keywords # @return [Array<String>]
# An array of unique tokens. This is the main source of actual strings
# in the model. Manipulation of a token is done using its index
# in this array, which we call a "tiki"
attr_accessor :tokens
def self.consume(txtpath) # @return [Array<Array<Integer>>]
Model.new.consume(txtpath) # Sentences represented by arrays of tikis
attr_accessor :sentences
# @return [Array<Array<Integer>>]
# Sentences derived from Twitter mentions
attr_accessor :mentions
# @return [Array<String>]
# The top 200 most important keywords, in descending order
attr_accessor :keywords
# Generate a new model from a corpus file
# @param path [String]
# @return [Ebooks::Model]
def self.consume(path)
Model.new.consume(path)
end end
# Generate a new model from multiple corpus files
# @param paths [Array<String>]
# @return [Ebooks::Model]
def self.consume_all(paths) def self.consume_all(paths)
Model.new.consume_all(paths) Model.new.consume_all(paths)
end end
# Load a saved model
# @param path [String]
# @return [Ebooks::Model]
def self.load(path) def self.load(path)
model = Model.new model = Model.new
model.instance_eval do model.instance_eval do
@ -30,6 +55,8 @@ module Ebooks
model model
end end
# Save model to a file
# @param path [String]
def save(path) def save(path)
File.open(path, 'wb') do |f| File.open(path, 'wb') do |f|
f.write(Marshal.dump({ f.write(Marshal.dump({
@ -43,19 +70,22 @@ module Ebooks
end end
def initialize def initialize
# This is the only source of actual strings in the model. It is
# an array of unique tokens. Manipulation of a token is mostly done
# using its index in this array, which we call a "tiki"
@tokens = [] @tokens = []
# Reverse lookup tiki by token, for faster generation # Reverse lookup tiki by token, for faster generation
@tikis = {} @tikis = {}
end end
# Reverse lookup a token index from a token
# @param token [String]
# @return [Integer]
def tikify(token) def tikify(token)
@tikis[token] or (@tokens << token and @tikis[token] = @tokens.length-1) @tikis[token] or (@tokens << token and @tikis[token] = @tokens.length-1)
end end
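The token-interning ("tiki") scheme described above can be sketched independently of the gem:

```ruby
# Interning tokens: each unique string gets a stable integer index
# ("tiki"), so sentences can be stored as arrays of small integers
tokens = []
tikis  = {}

tikify = lambda do |token|
  tikis[token] || (tokens << token; tikis[token] = tokens.length - 1)
end

a = tikify.call("hello")
b = tikify.call("world")
c = tikify.call("hello")
# a == 0, b == 1, and c == a: a repeated token reuses its index
```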
# Convert a body of text into arrays of tikis
# @param text [String]
# @return [Array<Array<Integer>>]
def mass_tikify(text) def mass_tikify(text)
sentences = NLP.sentences(text) sentences = NLP.sentences(text)
@ -69,9 +99,10 @@ module Ebooks
end end
end end
# Consume a corpus into this model
# @param path [String]
def consume(path) def consume(path)
content = File.read(path, :encoding => 'utf-8') content = File.read(path, :encoding => 'utf-8')
@hash = Digest::MD5.hexdigest(content)
if path.split('.')[-1] == "json" if path.split('.')[-1] == "json"
log "Reading json corpus from #{path}" log "Reading json corpus from #{path}"
@ -94,6 +125,8 @@ module Ebooks
consume_lines(lines) consume_lines(lines)
end end
# Consume a sequence of lines
# @param lines [Array<String>]
def consume_lines(lines) def consume_lines(lines)
log "Removing commented lines and sorting mentions" log "Removing commented lines and sorting mentions"
@ -126,11 +159,12 @@ module Ebooks
self self
end end
# Consume multiple corpuses into this model
# @param paths [Array<String>]
def consume_all(paths) def consume_all(paths)
lines = [] lines = []
paths.each do |path| paths.each do |path|
content = File.read(path, :encoding => 'utf-8') content = File.read(path, :encoding => 'utf-8')
@hash = Digest::MD5.hexdigest(content)
if path.split('.')[-1] == "json" if path.split('.')[-1] == "json"
log "Reading json corpus from #{path}" log "Reading json corpus from #{path}"
@ -156,25 +190,26 @@ module Ebooks
consume_lines(lines) consume_lines(lines)
end end
def fix(tweet) # Correct encoding issues in generated text
# This seems to require an external api call # @param text [String]
#begin # @return [String]
# fixer = NLP.gingerice.parse(tweet) def fix(text)
# log fixer if fixer['corrections'] NLP.htmlentities.decode text
# tweet = fixer['result']
#rescue Exception => e
# log e.message
# log e.backtrace
#end
NLP.htmlentities.decode tweet
end end
# Check if an array of tikis comprises a valid tweet
# @param tikis [Array<Integer>]
# @param limit [Integer] how many chars we have left
def valid_tweet?(tikis, limit) def valid_tweet?(tikis, limit)
tweet = NLP.reconstruct(tikis, @tokens) tweet = NLP.reconstruct(tikis, @tokens)
tweet.length <= limit && !NLP.unmatched_enclosers?(tweet) tweet.length <= limit && !NLP.unmatched_enclosers?(tweet)
end end
# Generate some text
# @param limit [Integer] available characters
# @param generator [SuffixGenerator, nil]
# @param retry_limit [Integer] how many times to retry on duplicates
# @return [String]
def make_statement(limit=140, generator=nil, retry_limit=10) def make_statement(limit=140, generator=nil, retry_limit=10)
responding = !generator.nil? responding = !generator.nil?
generator ||= SuffixGenerator.build(@sentences) generator ||= SuffixGenerator.build(@sentences)
@ -209,12 +244,17 @@ module Ebooks
end end
# Test if a sentence has been copied verbatim from original # Test if a sentence has been copied verbatim from original
def verbatim?(tokens) # @param tikis [Array<Integer>]
@sentences.include?(tokens) || @mentions.include?(tokens) # @return [Boolean]
def verbatim?(tikis)
@sentences.include?(tikis) || @mentions.include?(tikis)
end end
# Finds all relevant tokenized sentences to given input by # Finds relevant and slightly relevant tokenized sentences to input
# comparing non-stopword token overlaps # comparing non-stopword token overlaps
# @param sentences [Array<Array<Integer>>]
# @param input [String]
# @return [Array<Array<Array<Integer>>, Array<Array<Integer>>>]
def find_relevant(sentences, input) def find_relevant(sentences, input)
relevant = [] relevant = []
slightly_relevant = [] slightly_relevant = []
@ -235,6 +275,10 @@ module Ebooks
# Generates a response by looking for related sentences # Generates a response by looking for related sentences
# in the corpus and building a smaller generator from these # in the corpus and building a smaller generator from these
# @param input [String]
# @param limit [Integer] characters available for response
# @param sentences [Array<Array<Integer>>]
# @return [String]
def make_response(input, limit=140, sentences=@mentions) def make_response(input, limit=140, sentences=@mentions)
# Prefer mentions # Prefer mentions
relevant, slightly_relevant = find_relevant(sentences, input) relevant, slightly_relevant = find_relevant(sentences, input)


@ -12,31 +12,35 @@ module Ebooks
# Some of this stuff is pretty heavy and we don't necessarily need # Some of this stuff is pretty heavy and we don't necessarily need
# to be using it all of the time # to be using it all of the time
# Lazily loads an array of stopwords
# Stopwords are common English words that should often be ignored
# @return [Array<String>]
def self.stopwords def self.stopwords
@stopwords ||= File.read(File.join(DATA_PATH, 'stopwords.txt')).split @stopwords ||= File.read(File.join(DATA_PATH, 'stopwords.txt')).split
end end
# Lazily loads an array of known English nouns
# @return [Array<String>]
def self.nouns def self.nouns
@nouns ||= File.read(File.join(DATA_PATH, 'nouns.txt')).split @nouns ||= File.read(File.join(DATA_PATH, 'nouns.txt')).split
end end
# Lazily loads an array of known English adjectives
# @return [Array<String>]
def self.adjectives def self.adjectives
@adjectives ||= File.read(File.join(DATA_PATH, 'adjectives.txt')).split @adjectives ||= File.read(File.join(DATA_PATH, 'adjectives.txt')).split
end end
# POS tagger # Lazily load part-of-speech tagging library
# This can determine whether a word is being used as a noun/adjective/verb
# @return [EngTagger]
def self.tagger def self.tagger
require 'engtagger' require 'engtagger'
@tagger ||= EngTagger.new @tagger ||= EngTagger.new
end end
# Gingerice text correction service # Lazily load HTML entity decoder
def self.gingerice # @return [HTMLEntities]
require 'gingerice'
Gingerice::Parser.new # No caching for this one
end
# For decoding html entities
def self.htmlentities def self.htmlentities
require 'htmlentities' require 'htmlentities'
@htmlentities ||= HTMLEntities.new @htmlentities ||= HTMLEntities.new
@ -44,7 +48,9 @@ module Ebooks
### Utility functions ### Utility functions
# We don't really want to deal with all this weird unicode punctuation # Normalize some strange unicode punctuation variants
# @param text [String]
# @return [String]
def self.normalize(text) def self.normalize(text)
htmlentities.decode text.gsub('“', '"').gsub('”', '"').gsub('', "'").gsub('…', '...') htmlentities.decode text.gsub('“', '"').gsub('”', '"').gsub('', "'").gsub('…', '...')
end end
@ -53,6 +59,8 @@ module Ebooks
# We use ad hoc approach because fancy libraries do not deal # We use ad hoc approach because fancy libraries do not deal
# especially well with tweet formatting, and we can fake solving # especially well with tweet formatting, and we can fake solving
# the quote problem during generation # the quote problem during generation
# @param text [String]
# @return [Array<String>]
def self.sentences(text) def self.sentences(text)
text.split(/\n+|(?<=[.?!])\s+/) text.split(/\n+|(?<=[.?!])\s+/)
end end
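The ad hoc splitter above just breaks on newlines, or on whitespace that follows sentence-ending punctuation:

```ruby
# Split on runs of newlines, or on whitespace preceded by ., ? or !
def sentences(text)
  text.split(/\n+|(?<=[.?!])\s+/)
end

sentences("First one. Second one!\nThird line")
# => ["First one.", "Second one!", "Third line"]
```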
@ -60,15 +68,23 @@ module Ebooks
# Split a sentence into word-level tokens # Split a sentence into word-level tokens
# As above, this is ad hoc because tokenization libraries # As above, this is ad hoc because tokenization libraries
# do not behave well wrt. things like emoticons and timestamps # do not behave well wrt. things like emoticons and timestamps
# @param sentence [String]
# @return [Array<String>]
def self.tokenize(sentence) def self.tokenize(sentence)
regex = /\s+|(?<=[#{PUNCTUATION}]\s)(?=[a-zA-Z])|(?<=[a-zA-Z])(?=[#{PUNCTUATION}]+\s)/ regex = /\s+|(?<=[#{PUNCTUATION}]\s)(?=[a-zA-Z])|(?<=[a-zA-Z])(?=[#{PUNCTUATION}]+\s)/
sentence.split(regex) sentence.split(regex)
end end
# Get the 'stem' form of a word e.g. 'cats' -> 'cat'
# @param word [String]
# @return [String]
def self.stem(word) def self.stem(word)
Stemmer::stem_word(word.downcase) Stemmer::stem_word(word.downcase)
end end
# Use highscore gem to find interesting keywords in a corpus
# @param text [String]
# @return [Highscore::Keywords]
def self.keywords(text) def self.keywords(text)
# Preprocess to remove stopwords (highscore's blacklist is v. slow) # Preprocess to remove stopwords (highscore's blacklist is v. slow)
text = NLP.tokenize(text).reject { |t| stopword?(t) }.join(' ') text = NLP.tokenize(text).reject { |t| stopword?(t) }.join(' ')
@ -90,7 +106,10 @@ module Ebooks
text.keywords text.keywords
end end
# Takes a list of tokens and builds a nice-looking sentence # Builds a proper sentence from a list of tikis
# @param tikis [Array<Integer>]
# @param tokens [Array<String>]
# @return [String]
def self.reconstruct(tikis, tokens) def self.reconstruct(tikis, tokens)
text = "" text = ""
last_token = nil last_token = nil
@ -105,6 +124,9 @@ module Ebooks
end end
# Determine if we need to insert a space between two tokens # Determine if we need to insert a space between two tokens
# @param token1 [String]
# @param token2 [String]
# @return [Boolean]
def self.space_between?(token1, token2) def self.space_between?(token1, token2)
p1 = self.punctuation?(token1) p1 = self.punctuation?(token1)
p2 = self.punctuation?(token2) p2 = self.punctuation?(token2)
@ -119,10 +141,16 @@ module Ebooks
end end
end end
# Is this token comprised of punctuation?
# @param token [String]
# @return [Boolean]
def self.punctuation?(token) def self.punctuation?(token)
(token.chars.to_set - PUNCTUATION.chars.to_set).empty? (token.chars.to_set - PUNCTUATION.chars.to_set).empty?
end end
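The set-difference trick above can be tried standalone. `PUNCTUATION` here is an assumed stand-in for the gem's own constant:

```ruby
require 'set'

PUNCTUATION = ".?!,;:'\"" # assumed stand-in for the gem's constant

# A token counts as punctuation when every one of its characters
# belongs to the punctuation set
def punctuation?(token)
  (token.chars.to_set - PUNCTUATION.chars.to_set).empty?
end

punctuation?("?!")     # => true
punctuation?("don't")  # => false
```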
# Is this token a stopword?
# @param token [String]
# @return [Boolean]
def self.stopword?(token) def self.stopword?(token)
@stopword_set ||= stopwords.map(&:downcase).to_set @stopword_set ||= stopwords.map(&:downcase).to_set
@stopword_set.include?(token.downcase) @stopword_set.include?(token.downcase)
@ -130,7 +158,9 @@ module Ebooks
# Determine if a sample of text contains unmatched brackets or quotes # Determine if a sample of text contains unmatched brackets or quotes
# This is one of the more frequent and noticeable failure modes for # This is one of the more frequent and noticeable failure modes for
# the markov generator; we can just tell it to retry # the generator; we can just tell it to retry
# @param text [String]
# @return [Boolean]
def self.unmatched_enclosers?(text) def self.unmatched_enclosers?(text)
enclosers = ['**', '""', '()', '[]', '``', "''"] enclosers = ['**', '""', '()', '[]', '``', "''"]
enclosers.each do |pair| enclosers.each do |pair|
@ -153,10 +183,13 @@ module Ebooks
end end
# Determine if a2 is a subsequence of a1 # Determine if a2 is a subsequence of a1
# @param a1 [Array]
# @param a2 [Array]
# @return [Boolean]
def self.subseq?(a1, a2) def self.subseq?(a1, a2)
a1.each_index.find do |i| !a1.each_index.find do |i|
a1[i...i+a2.length] == a2 a1[i...i+a2.length] == a2
end end.nil?
end end
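The corrected subseq? above (the old version returned the found index or nil; the new one coerces that to a Boolean) behaves like this:

```ruby
# True when a2 appears as a contiguous run inside a1
def subseq?(a1, a2)
  !a1.each_index.find { |i| a1[i...i + a2.length] == a2 }.nil?
end

subseq?([1, 2, 3, 4], [2, 3]) # => true
subseq?([1, 2, 3, 4], [3, 2]) # => false
```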
end end
end end


@ -1,11 +1,14 @@
# encoding: utf-8 # encoding: utf-8
module Ebooks module Ebooks
# This generator uses data identical to the markov model, but # This generator uses data identical to a markov model, but
# instead of making a chain by looking up bigrams it uses the # instead of making a chain by looking up bigrams it uses the
# positions to randomly replace suffixes in one sentence with # positions to randomly replace suffixes in one sentence with
# matching suffixes in another # matching suffixes in another
class SuffixGenerator class SuffixGenerator
# Build a generator from a corpus of tikified sentences
# @param sentences [Array<Array<Integer>>]
# @return [SuffixGenerator]
def self.build(sentences) def self.build(sentences)
SuffixGenerator.new(sentences) SuffixGenerator.new(sentences)
end end
@ -39,6 +42,11 @@ module Ebooks
self self
end end
# Generate a recombined sequence of tikis
# @param passes [Integer] number of times to recombine
# @param n [Symbol] :unigrams or :bigrams (affects how conservative the model is)
# @return [Array<Integer>]
def generate(passes=5, n=:unigrams) def generate(passes=5, n=:unigrams)
index = rand(@sentences.length) index = rand(@sentences.length)
tikis = @sentences[index] tikis = @sentences[index]


@ -1,3 +1,3 @@
module Ebooks module Ebooks
VERSION = "2.3.2" VERSION = "3.0.0"
end end


@ -1,4 +1,4 @@
source 'http://rubygems.org' source 'http://rubygems.org'
ruby '1.9.3' ruby '{{RUBY_VERSION}}'
gem 'twitter_ebooks' gem 'twitter_ebooks'


@ -1 +1 @@
worker: ruby run.rb start worker: ebooks start

skeleton/bots.rb Executable file → Normal file

@ -1,42 +1,55 @@
#!/usr/bin/env ruby
require 'twitter_ebooks' require 'twitter_ebooks'
# This is an example bot definition with event handlers commented out # This is an example bot definition with event handlers commented out
# You can define as many of these as you like; they will run simultaneously # You can define and instantiate as many bots as you like
Ebooks::Bot.new("{{BOT_NAME}}") do |bot| class MyBot < Ebooks::Bot
# Consumer details come from registering an app at https://dev.twitter.com/ # Configuration here applies to all MyBots
# OAuth details can be fetched with https://github.com/marcel/twurl def configure
bot.consumer_key = "" # Your app consumer key # Consumer details come from registering an app at https://dev.twitter.com/
bot.consumer_secret = "" # Your app consumer secret # Once you have consumer details, use "ebooks auth" for new access tokens
bot.oauth_token = "" # Token connecting the app to this account self.consumer_key = '' # Your app consumer key
bot.oauth_token_secret = "" # Secret connecting the app to this account self.consumer_secret = '' # Your app consumer secret
bot.on_message do |dm| # Users to block instead of interacting with
self.blacklist = ['tnietzschequote']
# Range in seconds to randomize delay when bot.delay is called
self.delay_range = 1..6
end
def on_startup
scheduler.every '24h' do
# Tweet something every 24 hours
# See https://github.com/jmettraux/rufus-scheduler
# bot.tweet("hi")
# bot.pictweet("hi", "cuteselfie.jpg")
end
end
def on_message(dm)
# Reply to a DM # Reply to a DM
# bot.reply(dm, "secret secrets") # bot.reply(dm, "secret secrets")
end end
bot.on_follow do |user| def on_follow(user)
# Follow a user back # Follow a user back
# bot.follow(user[:screen_name]) # bot.follow(user[:screen_name])
end end
bot.on_mention do |tweet, meta| def on_mention(tweet)
# Reply to a mention # Reply to a mention
# bot.reply(tweet, meta[:reply_prefix] + "oh hullo") # bot.reply(tweet, meta(tweet).reply_prefix + "oh hullo")
end end
bot.on_timeline do |tweet, meta| def on_timeline(tweet)
# Reply to a tweet in the bot's timeline # Reply to a tweet in the bot's timeline
# bot.reply(tweet, meta[:reply_prefix] + "nice tweet") # bot.reply(tweet, meta(tweet).reply_prefix + "nice tweet")
end
bot.scheduler.every '24h' do
# Tweet something every 24 hours
# See https://github.com/jmettraux/rufus-scheduler
# bot.tweet("hi")
# bot.pictweet("hi", "cuteselfie.jpg", ":possibly_sensitive => true")
end end
end end
# Make a MyBot and attach it to an account
MyBot.new("{{BOT_NAME}}") do |bot|
bot.access_token = "" # Token connecting the app to this account
bot.access_token_secret = "" # Secret connecting the app to this account
end


@ -1,9 +0,0 @@
#!/usr/bin/env ruby
require_relative 'bots'
EM.run do
Ebooks::Bot.all.each do |bot|
bot.start
end
end


@ -3,13 +3,10 @@ require 'memory_profiler'
require 'tempfile' require 'tempfile'
require 'timecop' require 'timecop'
def Process.rss; `ps -o rss= -p #{Process.pid}`.chomp.to_i; end
class TestBot < Ebooks::Bot class TestBot < Ebooks::Bot
attr_accessor :twitter attr_accessor :twitter
def configure def configure
self.username = "test_ebooks"
end end
def on_direct_message(dm) def on_direct_message(dm)
@ -17,7 +14,7 @@ class TestBot < Ebooks::Bot
end end
def on_mention(tweet, meta) def on_mention(tweet, meta)
reply tweet, "echo: #{meta[:mentionless]}" reply tweet, "echo: #{meta.mentionless}"
end end
def on_timeline(tweet, meta) def on_timeline(tweet, meta)
@ -43,10 +40,11 @@ module Ebooks::Test
# Creates a mock tweet # Creates a mock tweet
# @param username User sending the tweet # @param username User sending the tweet
# @param text Tweet content # @param text Tweet content
def mock_tweet(username, text) def mock_tweet(username, text, extra={})
mentions = text.split.find_all { |x| x.start_with?('@') } mentions = text.split.find_all { |x| x.start_with?('@') }
Twitter::Tweet.new( tweet = Twitter::Tweet.new({
id: twitter_id, id: twitter_id,
in_reply_to_status_id: 'mock-link',
user: { id: twitter_id, screen_name: username }, user: { id: twitter_id, screen_name: username },
text: text, text: text,
created_at: Time.now.to_s, created_at: Time.now.to_s,
@ -56,29 +54,36 @@ module Ebooks::Test
indices: [text.index(m), text.index(m)+m.length] } indices: [text.index(m), text.index(m)+m.length] }
} }
} }
) }.merge!(extra))
tweet
end
def twitter_spy(bot)
twitter = spy("twitter")
allow(twitter).to receive(:update).and_return(mock_tweet(bot.username, "test tweet"))
twitter
end end
def simulate(bot, &b) def simulate(bot, &b)
bot.twitter = spy("twitter") bot.twitter = twitter_spy(bot)
b.call b.call
end end
def expect_direct_message(bot, content) def expect_direct_message(bot, content)
expect(bot.twitter).to have_received(:create_direct_message).with(anything(), content, {}) expect(bot.twitter).to have_received(:create_direct_message).with(anything(), content, {})
bot.twitter = spy("twitter") bot.twitter = twitter_spy(bot)
end end
def expect_tweet(bot, content) def expect_tweet(bot, content)
expect(bot.twitter).to have_received(:update).with(content, anything()) expect(bot.twitter).to have_received(:update).with(content, anything())
bot.twitter = spy("twitter") bot.twitter = twitter_spy(bot)
end end
end end
describe Ebooks::Bot do describe Ebooks::Bot do
include Ebooks::Test include Ebooks::Test
let(:bot) { TestBot.new } let(:bot) { TestBot.new('test_ebooks') }
before { Timecop.freeze } before { Timecop.freeze }
after { Timecop.return } after { Timecop.return }
@ -104,6 +109,20 @@ describe Ebooks::Bot do
end end
end end
it "links tweets to conversations correctly" do
tweet1 = mock_tweet("m1sp", "tweet 1", id: 1, in_reply_to_status_id: nil)
tweet2 = mock_tweet("m1sp", "tweet 2", id: 2, in_reply_to_status_id: 1)
tweet3 = mock_tweet("m1sp", "tweet 3", id: 3, in_reply_to_status_id: nil)
bot.conversation(tweet1).add(tweet1)
expect(bot.conversation(tweet2)).to eq(bot.conversation(tweet1))
bot.conversation(tweet2).add(tweet2)
expect(bot.conversation(tweet3)).to_not eq(bot.conversation(tweet2))
end
it "stops mentioning people after a certain limit" do it "stops mentioning people after a certain limit" do
simulate(bot) do simulate(bot) do
bot.receive_event(mock_tweet("spammer", "@test_ebooks @m1sp 1")) bot.receive_event(mock_tweet("spammer", "@test_ebooks @m1sp 1"))

File diff suppressed because it is too large


@ -1,18 +0,0 @@
#!/usr/bin/env ruby
# encoding: utf-8
require 'twitter_ebooks'
require 'minitest/autorun'
require 'benchmark'
module Ebooks
class TestKeywords < Minitest::Test
corpus = NLP.normalize(File.read(ARGV[0]))
puts "Finding and ranking keywords"
puts Benchmark.measure {
NLP.keywords(corpus).top(50).each do |keyword|
puts "#{keyword.text} #{keyword.weight}"
end
}
end
end


@ -1,18 +0,0 @@
#!/usr/bin/env ruby
# encoding: utf-8
require 'twitter_ebooks'
require 'minitest/autorun'
module Ebooks
class TestTokenize < Minitest::Test
corpus = NLP.normalize(File.read(TEST_CORPUS_PATH))
sents = NLP.sentences(corpus).sample(10)
NLP.sentences(corpus).sample(10).each do |sent|
p sent
p NLP.tokenize(sent)
puts
end
end
end


@ -18,8 +18,9 @@ Gem::Specification.new do |gem|
gem.add_development_dependency 'rspec' gem.add_development_dependency 'rspec'
gem.add_development_dependency 'rspec-mocks' gem.add_development_dependency 'rspec-mocks'
gem.add_development_dependency 'memory_profiler' gem.add_development_dependency 'memory_profiler'
gem.add_development_dependency 'pry-byebug'
gem.add_development_dependency 'timecop' gem.add_development_dependency 'timecop'
gem.add_development_dependency 'pry-byebug'
gem.add_development_dependency 'yard'
gem.add_runtime_dependency 'twitter', '~> 5.0' gem.add_runtime_dependency 'twitter', '~> 5.0'
gem.add_runtime_dependency 'simple_oauth' gem.add_runtime_dependency 'simple_oauth'
@ -30,4 +31,5 @@ Gem::Specification.new do |gem|
gem.add_runtime_dependency 'engtagger' gem.add_runtime_dependency 'engtagger'
gem.add_runtime_dependency 'fast-stemmer' gem.add_runtime_dependency 'fast-stemmer'
gem.add_runtime_dependency 'highscore' gem.add_runtime_dependency 'highscore'
gem.add_runtime_dependency 'pry'
end end