Merge branch '3.0'

commit 56aadea555

20 changed files with 738 additions and 15203 deletions
.gitignore (vendored): 2 changes

@@ -1,3 +1,5 @@
 .*.swp
 Gemfile.lock
 pkg
+.yardoc
+doc
README.md: 79 changes

@@ -4,8 +4,16 @@
 [](https://travis-ci.org/mispy/twitter_ebooks)
 [](https://gemnasium.com/mispy/twitter_ebooks)

-Rewrite of my twitter\_ebooks code. While the original was solely a tweeting Markov generator, this framework helps you build any kind of interactive twitterbot which responds to mentions/DMs. See [ebooks\_example](https://github.com/mispy/ebooks_example) for an example of a full bot.
+A framework for building interactive twitterbots which respond to mentions/DMs. twitter_ebooks tries to be a good friendly bot citizen by avoiding infinite conversations and spamming people, so you only have to write the interesting parts.
+
+## New in 3.0
+
+- Bots run in their own threads (no eventmachine), and startup is parallelized
+- Bots start with `ebooks start`, and no longer die on unhandled exceptions
+- `ebooks auth` command will create new access tokens, for running multiple bots
+- `ebooks console` starts a ruby interpreter with bots loaded (see Ebooks::Bot.all)
+- Replies are slightly rate-limited to prevent infinite bot convos
+- Non-participating users in a mention chain will be dropped after a few tweets

 ## Installation
@@ -21,53 +29,63 @@ Run `ebooks new <reponame>` to generate a new repository containing a sample bot

 ``` ruby
 # This is an example bot definition with event handlers commented out
-# You can define as many of these as you like; they will run simultaneously
+# You can define and instantiate as many bots as you like

-Ebooks::Bot.new("abby_ebooks") do |bot|
-  # Consumer details come from registering an app at https://dev.twitter.com/
-  # OAuth details can be fetched with https://github.com/marcel/twurl
-  bot.consumer_key = "" # Your app consumer key
-  bot.consumer_secret = "" # Your app consumer secret
-  bot.oauth_token = "" # Token connecting the app to this account
-  bot.oauth_token_secret = "" # Secret connecting the app to this account
+class MyBot < Ebooks::Bot
+  # Configuration here applies to all MyBots
+  def configure
+    # Consumer details come from registering an app at https://dev.twitter.com/
+    # Once you have consumer details, use "ebooks auth" for new access tokens
+    self.consumer_key = '' # Your app consumer key
+    self.consumer_secret = '' # Your app consumer secret

-  bot.on_startup do
-    # Run some startup task
-    # puts "I'm ready!"
-  end
+    # Users to block instead of interacting with
+    self.blacklist = ['tnietzschequote']
+
+    # Range in seconds to randomize delay when bot.delay is called
+    self.delay_range = 1..6
+  end

-  bot.on_message do |dm|
+  def on_startup
+    scheduler.every '24h' do
+      # Tweet something every 24 hours
+      # See https://github.com/jmettraux/rufus-scheduler
+      # bot.tweet("hi")
+      # bot.pictweet("hi", "cuteselfie.jpg")
+    end
+  end
+
+  def on_message(dm)
     # Reply to a DM
     # bot.reply(dm, "secret secrets")
   end

-  bot.on_follow do |user|
+  def on_follow(user)
     # Follow a user back
     # bot.follow(user[:screen_name])
   end

-  bot.on_mention do |tweet, meta|
+  def on_mention(tweet)
     # Reply to a mention
-    # bot.reply(tweet, meta[:reply_prefix] + "oh hullo")
+    # bot.reply(tweet, meta(tweet)[:reply_prefix] + "oh hullo")
   end

-  bot.on_timeline do |tweet, meta|
+  def on_timeline(tweet)
     # Reply to a tweet in the bot's timeline
-    # bot.reply(tweet, meta[:reply_prefix] + "nice tweet")
+    # bot.reply(tweet, meta(tweet)[:reply_prefix] + "nice tweet")
   end
 end

-bot.scheduler.every '24h' do
-  # Tweet something every 24 hours
-  # See https://github.com/jmettraux/rufus-scheduler
-  # bot.tweet("hi")
-  # bot.pictweet("hi", "cuteselfie.jpg", ":possibly_sensitive => true")
-end
+# Make a MyBot and attach it to an account
+MyBot.new("{{BOT_NAME}}") do |bot|
+  bot.access_token = "" # Token connecting the app to this account
+  bot.access_token_secret = "" # Secret connecting the app to this account
+end
 ```

-Bots defined like this can be spawned by executing `run.rb` in the same directory, and will operate together in a single eventmachine loop. The easiest way to run bots in a semi-permanent fashion is with [Heroku](https://www.heroku.com); just make an app, push the bot repository to it, enable a worker process in the web interface and it ought to chug along merrily forever.
+`ebooks start` will run all defined bots in their own threads. The easiest way to run bots in a semi-permanent fashion is with [Heroku](https://www.heroku.com); just make an app, push the bot repository to it, enable a worker process in the web interface and it ought to chug along merrily forever.

-The underlying [tweetstream](https://github.com/tweetstream/tweetstream) and [twitter gem](https://github.com/sferik/twitter) client objects can be accessed at `bot.stream` and `bot.twitter` respectively.
+The underlying streaming and REST clients from the [twitter gem](https://github.com/sferik/twitter) can be accessed at `bot.stream` and `bot.twitter` respectively.

 ## Archiving accounts
@@ -102,7 +120,6 @@ Text files use newlines and full stops to seperate statements.
 Once you have a model, the primary use is to produce statements and related responses to input, using a pseudo-Markov generator:

 ``` ruby
 > require 'twitter_ebooks'
 > model = Ebooks::Model.load("model/0xabad1dea.model")
 > model.make_statement(140)
 => "My Terrible Netbook may be the kind of person who buys Starbucks, but this Rackspace vuln is pretty straight up a backdoor"
@@ -113,14 +130,18 @@ Once you have a model, the primary use is to produce statements and related responses to input
 The secondary function is the "interesting keywords" list. For example, I use this to determine whether a bot wants to fav/retweet/reply to something in its timeline:

 ``` ruby
-top100 = model.keywords.top(100)
+top100 = model.keywords.take(100)
 tokens = Ebooks::NLP.tokenize(tweet[:text])

 if tokens.find { |t| top100.include?(t) }
-  bot.twitter.favorite(tweet[:id])
+  bot.favorite(tweet[:id])
 end
 ```
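The `make_statement` and keyword examples above lean on `Ebooks::Model` for the heavy lifting. For intuition, the core idea of a Markov-style generator can be sketched in a few lines of plain Ruby; this toy bigram chain (the `build_chain`/`make_statement` helpers here are invented for illustration, not the gem's actual model internals) learns word-to-word transitions and walks them:

```ruby
# Toy bigram "Markov" generator: learn word-to-word transitions from a
# corpus, then walk the chain from a chosen starting word.
def build_chain(text)
  chain = Hash.new { |h, k| h[k] = [] }
  text.split.each_cons(2) { |a, b| chain[a] << b }
  chain
end

def make_statement(chain, start, limit = 10)
  out = [start]
  (limit - 1).times do
    nexts = chain[out.last]
    break if nexts.empty?  # dead end: no word ever followed this one
    out << nexts.sample
  end
  out.join(' ')
end

chain = build_chain("the cat sat on the mat")
make_statement(chain, "sat") # always begins "sat on the"
```

The real model is trained on tens of thousands of tweets, so the chain has many choices per word and the output reads as plausible nonsense rather than verbatim corpus fragments.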

 ## Bot niceness


 ## Other notes

 If you're using Heroku, which has no persistent filesystem, automating the process of archiving, consuming and updating can be tricky. My current solution is just a daily cron job which commits and pushes for me, which is pretty hacky.

bin/ebooks: 348 changes

@@ -2,54 +2,85 @@
 # encoding: utf-8

 require 'twitter_ebooks'
 require 'csv'
+require 'ostruct'

-$debug = true
-
-module Ebooks
+module Ebooks::Util
+  def pretty_exception(e)
+  end
+end

+module Ebooks::CLI
   APP_PATH = Dir.pwd # XXX do some recursive thing instead
+  HELP = OpenStruct.new

-  def self.new(reponame)
-    usage = <<STR
-Usage: ebooks new <reponame>
-
-Creates a new skeleton repository defining a template bot in
-the current working directory specified by <reponame>.
-STR
+  HELP.default = <<STR
+Usage:
+  ebooks help <command>
+
+  ebooks new <reponame>
+  ebooks auth
+  ebooks consume <corpus_path> [corpus_path2] [...]
+  ebooks consume-all <corpus_path> [corpus_path2] [...]
+  ebooks gen <model_path> [input]
+  ebooks archive <username> [path]
+  ebooks tweet <model_path> <botname>
+STR
+
+  def self.help(command=nil)
+    if command.nil?
+      log HELP.default
+    else
+      log HELP[command].gsub(/^ {4}/, '')
+    end
+  end
+
+  HELP.new = <<-STR
+    Usage: ebooks new <reponame>
+
+    Creates a new skeleton repository defining a template bot in
+    the current working directory specified by <reponame>.
+  STR
+
+  def self.new(reponame)
     if reponame.nil?
-      log usage
-      exit
+      help :new
+      exit 1
     end

     path = "./#{reponame}"

     if File.exists?(path)
       log "#{path} already exists. Please remove if you want to recreate."
-      exit
+      exit 1
     end

-    FileUtils.cp_r(SKELETON_PATH, path)
+    FileUtils.cp_r(Ebooks::SKELETON_PATH, path)

     File.open(File.join(path, 'bots.rb'), 'w') do |f|
-      template = File.read(File.join(SKELETON_PATH, 'bots.rb'))
+      template = File.read(File.join(Ebooks::SKELETON_PATH, 'bots.rb'))
       f.write(template.gsub("{{BOT_NAME}}", reponame))
     end

+    File.open(File.join(path, 'Gemfile'), 'w') do |f|
+      template = File.read(File.join(Ebooks::SKELETON_PATH, 'Gemfile'))
+      f.write(template.gsub("{{RUBY_VERSION}}", RUBY_VERSION))
+    end
+
     log "New twitter_ebooks app created at #{reponame}"
   end

+  HELP.consume = <<-STR
+    Usage: ebooks consume <corpus_path> [corpus_path2] [...]
+
+    Processes some number of text files or json tweet corpuses
+    into usable models. These will be output at model/<name>.model
+  STR
+
   def self.consume(pathes)
-    usage = <<STR
-Usage: ebooks consume <corpus_path> [corpus_path2] [...]
-
-Processes some number of text files or json tweet corpuses
-into usable models. These will be output at model/<name>.model
-STR
-
     if pathes.empty?
-      log usage
-      exit
+      help :consume
+      exit 1
     end

     pathes.each do |path|
@@ -57,50 +88,43 @@ STR
       shortname = filename.split('.')[0..-2].join('.')

       outpath = File.join(APP_PATH, 'model', "#{shortname}.model")
-      Model.consume(path).save(outpath)
+      Ebooks::Model.consume(path).save(outpath)
       log "Corpus consumed to #{outpath}"
     end
   end

+  HELP.consume_all = <<-STR
+    Usage: ebooks consume-all <name> <corpus_path> [corpus_path2] [...]
+
+    Processes some number of text files or json tweet corpuses
+    into one usable model. It will be output at model/<name>.model
+  STR
+
   def self.consume_all(name, paths)
-    usage = <<STR
-Usage: ebooks consume-all <name> <corpus_path> [corpus_path2] [...]
-
-Processes some number of text files or json tweet corpuses
-into one usable model. It will be output at model/<name>.model
-STR
-
     if paths.empty?
-      log usage
-      exit
+      help :consume_all
+      exit 1
     end

     outpath = File.join(APP_PATH, 'model', "#{name}.model")
-    #pathes.each do |path|
-    #  filename = File.basename(path)
-    #  shortname = filename.split('.')[0..-2].join('.')
-    #
-    #  outpath = File.join(APP_PATH, 'model', "#{shortname}.model")
-    #  Model.consume(path).save(outpath)
-    #  log "Corpus consumed to #{outpath}"
-    #end
-    Model.consume_all(paths).save(outpath)
+    Ebooks::Model.consume_all(paths).save(outpath)
     log "Corpuses consumed to #{outpath}"
   end

-  def self.gen(model_path, input)
-    usage = <<STR
-Usage: ebooks gen <model_path> [input]
-
-Make a test tweet from the processed model at <model_path>.
-Will respond to input if provided.
-STR
+  HELP.gen = <<-STR
+    Usage: ebooks gen <model_path> [input]
+
+    Make a test tweet from the processed model at <model_path>.
+    Will respond to input if provided.
+  STR

+  def self.gen(model_path, input)
     if model_path.nil?
-      log usage
-      exit
+      help :gen
+      exit 1
     end

-    model = Model.load(model_path)
+    model = Ebooks::Model.load(model_path)
     if input && !input.empty?
       puts "@cmd " + model.make_response(input, 135)
     else
@@ -108,81 +132,186 @@ STR
     end
   end

-  def self.score(model_path, input)
-    usage = <<STR
-Usage: ebooks score <model_path> <input>
-
-Scores "interest" in some text input according to how
-well unique keywords match the model.
-STR
-    if model_path.nil? || input.nil?
-      log usage
-      exit
-    end
-
-    model = Model.load(model_path)
-    model.score_interest(input)
-  end
-
-  def self.archive(username, outpath)
-    usage = <<STR
-Usage: ebooks archive <username> <outpath>
-
-Downloads a json corpus of the <username>'s tweets to <outpath>.
-Due to API limitations, this can only receive up to ~3000 tweets
-into the past.
-STR
-
-    if username.nil? || outpath.nil?
-      log usage
-      exit
-    end
-
-    Archive.new(username, outpath).sync
-  end
+  HELP.archive = <<-STR
+    Usage: ebooks archive <username> [outpath]
+
+    Downloads a json corpus of the <username>'s tweets.
+    Output defaults to corpus/<username>.json
+    Due to API limitations, this can only receive up to ~3000 tweets
+    into the past.
+  STR
+
+  def self.archive(username, outpath=nil)
+    if username.nil?
+      help :archive
+      exit 1
+    end
+
+    Ebooks::Archive.new(username, outpath).sync
+  end
+
+  HELP.tweet = <<-STR
+    Usage: ebooks tweet <model_path> <botname>
+
+    Sends a public tweet from the specified bot using text
+    from the processed model at <model_path>.
+  STR

   def self.tweet(modelpath, botname)
-    usage = <<STR
-Usage: ebooks tweet <model_path> <botname>
-
-Sends a public tweet from the specified bot using text
-from the processed model at <model_path>.
-STR
-
     if modelpath.nil? || botname.nil?
-      log usage
-      exit
+      help :tweet
+      exit 1
     end

     load File.join(APP_PATH, 'bots.rb')
-    model = Model.load(modelpath)
+    model = Ebooks::Model.load(modelpath)
     statement = model.make_statement
     log "@#{botname}: #{statement}"
-    bot = Bot.get(botname)
+    bot = Ebooks::Bot.get(botname)
+    bot.configure
     bot.tweet(statement)
   end

-  def self.c
+  HELP.auth = <<-STR
+    Usage: ebooks auth
+
+    Authenticates your Twitter app for any account. By default, will
+    use the consumer key and secret from the first defined bot. You
+    can specify another by setting the CONSUMER_KEY and CONSUMER_SECRET
+    environment variables.
+  STR
+
+  def self.auth
+    consumer_key, consumer_secret = find_consumer
+    require 'oauth'
+
+    consumer = OAuth::Consumer.new(
+      consumer_key,
+      consumer_secret,
+      site: 'https://twitter.com/',
+      scheme: :header
+    )
+
+    request_token = consumer.get_request_token
+    auth_url = request_token.authorize_url()
+
+    pin = nil
+    loop do
+      log auth_url
+
+      log "Go to the above url and follow the prompts, then enter the PIN code here."
+      print "> "
+
+      pin = STDIN.gets.chomp
+
+      break unless pin.empty?
+    end
+
+    access_token = request_token.get_access_token(oauth_verifier: pin)
+
+    log "Account authorized successfully. Make sure to put these in your bots.rb!\n" +
+        "  access token: #{access_token.token}\n" +
+        "  access token secret: #{access_token.secret}"
+  end
+
+  HELP.console = <<-STR
+    Usage: ebooks c[onsole]
+
+    Starts an interactive ruby session with your bots loaded
+    and configured.
+  STR
+
+  def self.console
+    load_bots
+    require 'pry'; Ebooks.module_exec { pry }
+  end
+
+  HELP.start = <<-STR
+    Usage: ebooks s[tart] [botname]
+
+    Starts running bots. If botname is provided, only runs that bot.
+  STR
+
+  def self.start(botname=nil)
+    load_bots
+
+    if botname.nil?
+      bots = Ebooks::Bot.all
+    else
+      bots = Ebooks::Bot.all.select { |bot| bot.username == botname }
+      if bots.empty?
+        log "Couldn't find a defined bot for @#{botname}!"
+        exit 1
+      end
+    end
+
+    threads = []
+    bots.each do |bot|
+      threads << Thread.new { bot.prepare }
+    end
+    threads.each(&:join)
+
+    threads = []
+    bots.each do |bot|
+      threads << Thread.new do
+        loop do
+          begin
+            bot.start
+          rescue Exception => e
+            bot.log e.inspect
+            puts e.backtrace.map { |s| "\t"+s }.join("\n")
+          end
+          bot.log "Sleeping before reconnect"
+          sleep 5
+        end
+      end
+    end
+    threads.each(&:join)
+  end
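The `start` command above prepares every bot in parallel, then wraps each `bot.start` in a rescue-and-retry loop, which is why 3.0 bots "no longer die on unhandled exceptions". The same supervision pattern can be sketched in isolation; `supervise` and its `max_runs` cap are invented here purely so the example terminates, whereas a real bot loops forever:

```ruby
# Generic supervise loop: run a job in its own thread and restart it
# after any exception, pausing between attempts.
def supervise(max_runs:, pause: 0)
  runs = 0
  errors = 0
  thread = Thread.new do
    loop do
      break if runs >= max_runs
      begin
        runs += 1
        yield runs          # the "bot.start" stand-in
      rescue StandardError
        errors += 1         # swallow the crash and go around again
      end
      sleep pause
    end
  end
  thread.join
  [runs, errors]
end

runs, errors = supervise(max_runs: 3) { |n| raise "boom" if n == 2 }
# runs == 3, errors == 1: the crash on run 2 did not stop runs 1 and 3
```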

+  # Non-command methods
+
+  def self.find_consumer
+    if ENV['CONSUMER_KEY'] && ENV['CONSUMER_SECRET']
+      log "Using consumer details from environment variables:\n" +
+          "  consumer key: #{ENV['CONSUMER_KEY']}\n" +
+          "  consumer secret: #{ENV['CONSUMER_SECRET']}"
+      return [ENV['CONSUMER_KEY'], ENV['CONSUMER_SECRET']]
+    end
+
+    load_bots
+    consumer_key = nil
+    consumer_secret = nil
+    Ebooks::Bot.all.each do |bot|
+      if bot.consumer_key && bot.consumer_secret
+        consumer_key = bot.consumer_key
+        consumer_secret = bot.consumer_secret
+        log "Using consumer details from @#{bot.username}:\n" +
+            "  consumer key: #{bot.consumer_key}\n" +
+            "  consumer secret: #{bot.consumer_secret}\n"
+        return consumer_key, consumer_secret
+      end
+    end
+
+    if consumer_key.nil? || consumer_secret.nil?
+      log "Couldn't find any consumer details to auth an account with.\n" +
+          "Please either configure a bot with consumer_key and consumer_secret\n" +
+          "or provide the CONSUMER_KEY and CONSUMER_SECRET environment variables."
+      exit 1
+    end
+  end
+
+  def self.load_bots
+    load 'bots.rb'
-    require 'pry'; pry
+
+    if Ebooks::Bot.all.empty?
+      puts "Couldn't find any bots! Please make sure bots.rb instantiates at least one bot."
+    end
+  end

   def self.command(args)
-    usage = <<STR
-Usage:
-  ebooks new <reponame>
-  ebooks consume <corpus_path> [corpus_path2] [...]
-  ebooks consume-all <corpus_path> [corpus_path2] [...]
-  ebooks gen <model_path> [input]
-  ebooks score <model_path> <input>
-  ebooks archive <@user> <outpath>
-  ebooks tweet <model_path> <botname>
-STR
-
     if args.length == 0
-      log usage
-      exit
+      help
+      exit 1
     end

     case args[0]
@@ -190,16 +319,21 @@ STR
     when "consume" then consume(args[1..-1])
     when "consume-all" then consume_all(args[1], args[2..-1])
     when "gen" then gen(args[1], args[2..-1].join(' '))
-    when "score" then score(args[1], args[2..-1].join(' '))
     when "archive" then archive(args[1], args[2])
     when "tweet" then tweet(args[1], args[2])
     when "jsonify" then jsonify(args[1..-1])
-    when "c" then c
+    when "auth" then auth
+    when "console" then console
+    when "c" then console
+    when "start" then start(args[1])
+    when "s" then start(args[1])
+    when "help" then help(args[1])
     else
-      log usage
       log "No such command '#{args[0]}'"
+      help
       exit 1
     end
   end
 end

-Ebooks.command(ARGV)
+Ebooks::CLI.command(ARGV)

@@ -11,11 +11,11 @@ module Ebooks
   SKELETON_PATH = File.join(GEM_PATH, 'skeleton')
   TEST_PATH = File.join(GEM_PATH, 'test')
   TEST_CORPUS_PATH = File.join(TEST_PATH, 'corpus/0xabad1dea.tweets')
+  INTERIM = :interim
 end

 require 'twitter_ebooks/nlp'
 require 'twitter_ebooks/archive'
 require 'twitter_ebooks/markov'
 require 'twitter_ebooks/suffix'
 require 'twitter_ebooks/model'
 require 'twitter_ebooks/bot'

@@ -39,9 +39,14 @@ module Ebooks
     end
   end

-  def initialize(username, path, client=nil)
+  def initialize(username, path=nil, client=nil)
     @username = username
-    @path = path || "#{username}.json"
+    @path = path || "corpus/#{username}.json"
+
+    if File.directory?(@path)
+      @path = File.join(@path, "#{username}.json")
+    end
+
     @client = client || make_client
+
     if File.exists?(@path)

lib/twitter_ebooks/bot.rb: 409 changes (Executable file → Normal file)

@@ -6,143 +6,91 @@ module Ebooks
   class ConfigurationError < Exception
   end

-  # We track how many unprompted interactions the bot has had with
-  # each user and start dropping them from mentions after two in a row
-  class UserInfo
-    attr_reader :username
-    attr_accessor :pesters_left
-
-    def initialize(username)
-      @username = username
-      @pesters_left = 1
-    end
-
-    def can_pester?
-      @pesters_left > 0
-    end
-  end
-
-  # Represents a current "interaction state" with another user
-  class Interaction
-    attr_reader :userinfo, :received, :last_update
-
-    def initialize(userinfo)
-      @userinfo = userinfo
-      @received = []
+  # Represents a single reply tree of tweets
+  class Conversation
+    attr_reader :last_update
+
+    # @param bot [Ebooks::Bot]
+    def initialize(bot)
+      @bot = bot
+      @tweets = []
       @last_update = Time.now
     end

-    def receive(tweet)
-      @received << tweet
+    # @param tweet [Twitter::Tweet] tweet to add
+    def add(tweet)
+      @tweets << tweet
       @last_update = Time.now
-      @userinfo.pesters_left += 2
     end

-    # Make an informed guess as to whether this user is a bot
-    # based on its username and reply speed
-    def is_bot?
-      if @received.length > 2
-        if (@received[-1].created_at - @received[-3].created_at) < 30
+    # Make an informed guess as to whether a user is a bot based
+    # on their behavior in this conversation
+    def is_bot?(username)
+      usertweets = @tweets.select { |t| t.user.screen_name == username }
+
+      if usertweets.length > 2
+        if (usertweets[-1].created_at - usertweets[-3].created_at) < 30
          return true
        end
      end

-      @userinfo.username.include?("ebooks")
+      username.include?("ebooks")
     end
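The `is_bot?` guess above combines two signals: three or more tweets from one user landing within 30 seconds, or an "ebooks" username. Reduced to a standalone predicate over plain timestamps (the `bot_like?` helper here is invented for illustration, not the gem's API):

```ruby
# A user is presumed to be a bot if their third-last and last tweets
# are under 30 seconds apart, or if their name contains "ebooks".
def bot_like?(username, timestamps)
  if timestamps.length > 2 && (timestamps[-1] - timestamps[-3]) < 30
    return true
  end
  username.include?("ebooks")
end

t = Time.now
bot_like?("abby", [t, t + 5, t + 10])   # true: 3 tweets in 10 seconds
bot_like?("abby", [t, t + 60, t + 120]) # false: slow, human-paced replies
```

This is deliberately a heuristic: a fast human typist can trip it, and a slow bot without "ebooks" in its name will evade it, which is acceptable for rate-limiting purposes.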

-    def continue?
-      if is_bot?
-        true if @received.length < 2
-      else
-        true
-      end
-    end
+    # Figure out whether to keep this user in the reply prefix
+    # We want to avoid spamming non-participating users
+    def can_include?(username)
+      @tweets.length <= 4 ||
+        !@tweets[-4..-1].select { |t| t.user.screen_name == username }.empty?
+    end
   end
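`can_include?` above is what implements the README's "non-participating users in a mention chain will be dropped" behaviour: once a conversation is longer than four tweets, a user stays in the reply prefix only if they authored one of the last four. The same predicate over plain hashes (illustrative only, not the gem's API):

```ruby
# Keep a user in the reply prefix only while the conversation is short
# or they have spoken within the last four tweets.
def can_include?(tweets, username)
  tweets.length <= 4 ||
    tweets[-4..-1].any? { |t| t[:user] == username }
end

convo = [{ user: "amy" }, { user: "bot" }, { user: "amy" },
         { user: "bot" }, { user: "amy" }, { user: "bot" }]
can_include?(convo, "amy")  # true: amy spoke within the last four tweets
can_include?(convo, "carl") # false: carl never participated, so drop him
```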

-  class Bot
-    attr_accessor :consumer_key, :consumer_secret,
-                  :access_token, :access_token_secret
-
-    attr_reader :twitter, :stream, :thread
-
-    # Configuration
-    attr_accessor :username, :delay_range, :blacklist
-
-    @@all = [] # List of all defined bots
-    def self.all; @@all; end
-
-    def self.get(name)
-      all.find { |bot| bot.username == name }
-    end
-
-    def log(*args)
-      STDOUT.print "@#{@username}: " + args.map(&:to_s).join(' ') + "\n"
-      STDOUT.flush
-    end
-
-    def initialize(*args, &b)
-      @username ||= nil
-      @blacklist ||= []
-      @delay_range ||= 0
-
-      @users ||= {}
-      @interactions ||= {}
-      configure(*args, &b)
-
-      # Tweet ids we've already observed, to avoid duplication
-      @seen_tweets ||= {}
-    end
-
-    def userinfo(username)
-      @users[username] ||= UserInfo.new(username)
-    end
-
-    def interaction(username)
-      if @interactions[username] &&
-         Time.now - @interactions[username].last_update < 600
-        @interactions[username]
-      else
-        @interactions[username] = Interaction.new(userinfo(username))
-      end
-    end
-
-    def twitter
-      @twitter ||= Twitter::REST::Client.new do |config|
-        config.consumer_key = @consumer_key
-        config.consumer_secret = @consumer_secret
-        config.access_token = @access_token
-        config.access_token_secret = @access_token_secret
-      end
-    end
-
-    def stream
-      @stream ||= Twitter::Streaming::Client.new do |config|
-        config.consumer_key = @consumer_key
-        config.consumer_secret = @consumer_secret
-        config.access_token = @access_token
-        config.access_token_secret = @access_token_secret
-      end
-    end
-
-    # Calculate some meta information about a tweet relevant for replying
-    def calc_meta(ev)
-      meta = {}
-      meta[:mentions] = ev.attrs[:entities][:user_mentions].map { |x| x[:screen_name] }
+  # Meta information about a tweet that we calculate for ourselves
+  class TweetMeta
+    # @return [Array<String>] usernames mentioned in tweet
+    attr_accessor :mentions
+    # @return [String] text of tweets with mentions removed
+    attr_accessor :mentionless
+    # @return [Array<String>] usernames to include in a reply
+    attr_accessor :reply_mentions
+    # @return [String] mentions to start reply with
+    attr_accessor :reply_prefix
+    # @return [Integer] available chars for reply
+    attr_accessor :limit
+
+    # @return [Ebooks::Bot] associated bot
+    attr_accessor :bot
+    # @return [Twitter::Tweet] associated tweet
+    attr_accessor :tweet

+    # Check whether this tweet mentions our bot
+    # @return [Boolean]
+    def mentions_bot?
       # To check if this is someone talking to us, ensure:
       # - The tweet mentions list contains our username
       # - The tweet is not being retweeted by somebody else
       # - Or soft-retweeted by somebody else
-      meta[:mentions_bot] = meta[:mentions].map(&:downcase).include?(@username.downcase) && !ev.retweeted_status? && !ev.text.start_with?('RT ')
+      @mentions.map(&:downcase).include?(@bot.username.downcase) && !@tweet.retweeted_status? && !@tweet.text.start_with?('RT ')
+    end
+
+    # @param bot [Ebooks::Bot]
+    # @param ev [Twitter::Tweet]
+    def initialize(bot, ev)
+      @bot = bot
+      @tweet = ev
+
+      @mentions = ev.attrs[:entities][:user_mentions].map { |x| x[:screen_name] }

       # Process mentions to figure out who to reply to
-      reply_mentions = meta[:mentions].reject { |m| m.downcase == @username.downcase }
-      reply_mentions = reply_mentions.select { |username| userinfo(username).can_pester? }
-      meta[:reply_mentions] = [ev.user.screen_name] + reply_mentions
+      # i.e. not self and nobody who has seen too many secondary mentions
+      reply_mentions = @mentions.reject do |m|
+        username = m.downcase
+        username == @bot.username || !@bot.conversation(ev).can_include?(username)
+      end
+      @reply_mentions = ([ev.user.screen_name] + reply_mentions).uniq

-      meta[:reply_prefix] = meta[:reply_mentions].uniq.map { |m| '@'+m }.join(' ') + ' '
-      meta[:limit] = 140 - meta[:reply_prefix].length
+      @reply_prefix = @reply_mentions.map { |m| '@'+m }.join(' ') + ' '
+      @limit = 140 - @reply_prefix.length

       mless = ev.text
       begin

@@ -155,12 +103,116 @@ module Ebooks
         p ev.text
         raise
       end
-      meta[:mentionless] = mless
+      @mentionless = mless
     end
   end

-      meta
-    end
+  class Bot
+    # @return [String] OAuth consumer key for a Twitter app
+    attr_accessor :consumer_key
+    # @return [String] OAuth consumer secret for a Twitter app
+    attr_accessor :consumer_secret
+    # @return [String] OAuth access token from `ebooks auth`
+    attr_accessor :access_token
+    # @return [String] OAuth access secret from `ebooks auth`
+    attr_accessor :access_token_secret
+    # @return [String] Twitter username of bot
+    attr_accessor :username
+    # @return [Array<String>] list of usernames to block on contact
+    attr_accessor :blacklist
+    # @return [Hash{String => Ebooks::Conversation}] maps tweet ids to their conversation contexts
+    attr_accessor :conversations
+    # @return [Range, Integer] range of seconds to delay in delay method
+    attr_accessor :delay_range
+
+    # @return [Array] list of all defined bots
+    def self.all; @@all ||= []; end
+
+    # Fetches a bot by username
+    # @param username [String]
+    # @return [Ebooks::Bot]
+    def self.get(username)
+      all.find { |bot| bot.username == username }
+    end
+
+    # Logs info to stdout in the context of this bot
+    def log(*args)
+      STDOUT.print "@#{@username}: " + args.map(&:to_s).join(' ') + "\n"
+      STDOUT.flush
+    end
+
+    # Initializes and configures bot
+    # @param args Arguments passed to configure method
+    # @param b Block to call with new bot
+    def initialize(username, &b)
+      @blacklist ||= []
+      @conversations ||= {}
+      # Tweet ids we've already observed, to avoid duplication
+      @seen_tweets ||= {}
+
+      @username = username
+      configure
+
+      b.call(self) unless b.nil?
+      Bot.all << self
+    end
+
+    # Find or create the conversation context for this tweet
+    # @param tweet [Twitter::Tweet]
+    # @return [Ebooks::Conversation]
+    def conversation(tweet)
+      conv = if tweet.in_reply_to_status_id?
+        @conversations[tweet.in_reply_to_status_id]
+      end
+
+      if conv.nil?
+        conv = @conversations[tweet.id] || Conversation.new(self)
+      end
+
+      if tweet.in_reply_to_status_id?
+        @conversations[tweet.in_reply_to_status_id] = conv
+      end
+      @conversations[tweet.id] = conv
+
+      # Expire any old conversations to prevent memory growth
+      @conversations.each do |k,v|
+        if v != conv && Time.now - v.last_update > 3600
+          @conversations.delete(k)
+        end
+      end
+
+      conv
+    end
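The expiry step above bounds memory by discarding conversations that have been idle for over an hour while protecting the currently active one. The core of it can be isolated as a small pruning helper (the `expire_old!` name and plain-hash conversations are invented for illustration):

```ruby
# Drop entries whose last_update is older than max_age seconds,
# keeping the currently active conversation alive regardless of age.
def expire_old!(conversations, active, max_age: 3600, now: Time.now)
  conversations.delete_if do |_id, conv|
    conv != active && now - conv[:last_update] > max_age
  end
end

now = Time.now
convs = {
  1 => { last_update: now - 7200 }, # idle two hours: pruned
  2 => { last_update: now - 10 },   # fresh: kept
}
expire_old!(convs, convs[2], now: now)
```

`Hash#delete_if` is used here because it is safe to delete during the iteration, unlike calling `delete` inside a plain `each`.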

+    # @return [Twitter::REST::Client] underlying REST client from twitter gem
+    def twitter
+      @twitter ||= Twitter::REST::Client.new do |config|
+        config.consumer_key = @consumer_key
+        config.consumer_secret = @consumer_secret
+        config.access_token = @access_token
+        config.access_token_secret = @access_token_secret
+      end
+    end
+
+    # @return [Twitter::Streaming::Client] underlying streaming client from twitter gem
+    def stream
+      @stream ||= Twitter::Streaming::Client.new do |config|
+        config.consumer_key = @consumer_key
+        config.consumer_secret = @consumer_secret
+        config.access_token = @access_token
+        config.access_token_secret = @access_token_secret
+      end
+    end
+
+    # Calculate some meta information about a tweet relevant for replying
+    # @param ev [Twitter::Tweet]
+    # @return [Ebooks::TweetMeta]
+    def meta(ev)
+      TweetMeta.new(self, ev)
+    end

+    # Receive an event from the twitter stream
+    # @param ev [Object] Twitter streaming event
     def receive_event(ev)
       if ev.is_a? Array # Initial array sent on first connection
         log "Online!"
@ -181,7 +233,7 @@ module Ebooks
|
|||
return unless ev.text # If it's not a text-containing tweet, ignore it
|
||||
return if ev.user.screen_name == @username # Ignore our own tweets
|
||||
|
||||
meta = calc_meta(ev)
|
||||
meta = meta(ev)
|
||||
|
||||
if blacklisted?(ev.user.screen_name)
|
||||
log "Blocking blacklisted user @#{ev.user.screen_name}"
|
||||
|
@ -190,17 +242,18 @@ module Ebooks
|
|||
|
||||
# Avoid responding to duplicate tweets
|
||||
if @seen_tweets[ev.id]
|
||||
log "Not firing event for duplicate tweet #{ev.id}"
|
||||
return
|
||||
else
|
||||
@seen_tweets[ev.id] = true
|
||||
end
|
||||
|
||||
if meta[:mentions_bot]
|
||||
if meta.mentions_bot?
|
||||
log "Mention from @#{ev.user.screen_name}: #{ev.text}"
|
||||
interaction(ev.user.screen_name).receive(ev)
|
||||
fire(:mention, ev, meta)
|
||||
conversation(ev).add(ev)
|
||||
fire(:mention, ev)
|
||||
else
|
||||
fire(:timeline, ev, meta)
|
||||
fire(:timeline, ev)
|
||||
end
|
||||
|
||||
elsif ev.is_a?(Twitter::Streaming::DeletedTweet) ||
|
||||
|
@ -211,7 +264,31 @@ module Ebooks
|
|||
end
|
||||
end
|
||||
|
||||
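The duplicate guard in `receive_event` above boils down to a first-seen check against `@seen_tweets`. A hypothetical standalone version of that idea:

```ruby
# Sketch of the duplicate-tweet guard: remember ids we've handled
# and ignore repeats (not the gem's actual class).
class SeenTweets
  def initialize
    @seen = {}
  end

  # Returns true the first time an id is offered, false on repeats
  def fresh?(id)
    return false if @seen[id]
    @seen[id] = true
  end
end
```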
    def start_stream
    # Configures client and fires startup event
    def prepare
      # Sanity check
      if @username.nil?
        raise ConfigurationError, "bot username cannot be nil"
      end

      if @consumer_key.nil? || @consumer_key.empty? ||
         @consumer_secret.nil? || @consumer_secret.empty?
        log "Missing consumer_key or consumer_secret. These details can be acquired by registering a Twitter app at https://apps.twitter.com/"
        exit 1
      end

      if @access_token.nil? || @access_token.empty? ||
         @access_token_secret.nil? || @access_token_secret.empty?
        log "Missing access_token or access_token_secret. Please run `ebooks auth`."
        exit 1
      end

      twitter
      fire(:startup)
    end
    # Start running user event stream
    def start
      log "starting tweet stream"

      stream.user do |ev|

@ -219,22 +296,9 @@ module Ebooks
      end
    end

    def prepare
      # Sanity check
      if @username.nil?
        raise ConfigurationError, "bot.username cannot be nil"
      end

      twitter
      fire(:startup)
    end

    # Connects to tweetstream and opens event handlers for this bot
    def start
      start_stream
    end

    # Fire an event
    # @param event [Symbol] event to fire
    # @param args arguments for event handler
    def fire(event, *args)
      handler = "on_#{event}".to_sym
      if respond_to? handler

@ -242,11 +306,17 @@ module Ebooks
      end
    end
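The `fire` method above implements the event-dispatch convention used throughout the bot: an event `:mention` is routed to a method named `on_mention` if one is defined. A minimal sketch of that convention (hypothetical class, simplified body since the original's is cut off by the hunk):

```ruby
# Sketch of the fire/on_<event> dispatch convention.
class Dispatcher
  # Route an event symbol to the matching on_<event> handler, if any
  def fire(event, *args)
    handler = "on_#{event}".to_sym
    send(handler, *args) if respond_to?(handler)
  end

  def on_mention(text)
    "mentioned: #{text}"
  end
end
```

Undefined handlers are simply ignored, which is why a bot subclass only has to implement the events it cares about.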
    def delay(&b)
      time = @delay.to_a.sample unless @delay.is_a? Integer
    # Delay an action for a variable period of time
    # @param range [Range, Integer] range of seconds to choose for delay
    def delay(range=@delay_range, &b)
      time = range.is_a?(Integer) ? range : range.to_a.sample
      sleep time
      b.call
    end
    # Check if a username is blacklisted
    # @param username [String]
    # @return [Boolean]
    def blacklisted?(username)
      if @blacklist.include?(username)
        true

@ -256,46 +326,37 @@ module Ebooks
    end

    # Reply to a tweet or a DM.
    # @param ev [Twitter::Tweet, Twitter::DirectMessage]
    # @param text [String] contents of reply excluding reply_prefix
    # @param opts [Hash] additional params to pass to twitter gem
    def reply(ev, text, opts={})
      opts = opts.clone

      if ev.is_a? Twitter::DirectMessage
        return if blacklisted?(ev.sender.screen_name)
        log "Sending DM to @#{ev.sender.screen_name}: #{text}"
        twitter.create_direct_message(ev.sender.screen_name, text, opts)
      elsif ev.is_a? Twitter::Tweet
        meta = calc_meta(ev)
        meta = meta(ev)

        if !interaction(ev.user.screen_name).continue?
        if conversation(ev).is_bot?(ev.user.screen_name)
          log "Not replying to suspected bot @#{ev.user.screen_name}"
          return
          return false
        end

        if !meta[:mentions_bot]
          if !userinfo(ev.user.screen_name).can_pester?
            log "Not replying: leaving @#{ev.user.screen_name} alone"
            return
          else
            userinfo(ev.user.screen_name).pesters_left -= 1
          end
        end

        log "Replying to @#{ev.user.screen_name} with: #{meta[:reply_prefix] + text}"
        twitter.update(meta[:reply_prefix] + text, in_reply_to_status_id: ev.id)
        log "Replying to @#{ev.user.screen_name} with: #{meta.reply_prefix + text}"
        tweet = twitter.update(meta.reply_prefix + text, in_reply_to_status_id: ev.id)
        conversation(tweet).add(tweet)
        tweet
      else
        raise Exception.new("Don't know how to reply to a #{ev.class}")
      end
    end
    # Favorite a tweet
    # @param tweet [Twitter::Tweet]
    def favorite(tweet)
      return if blacklisted?(tweet.user.screen_name)
      log "Favoriting @#{tweet.user.screen_name}: #{tweet.text}"

      meta = calc_meta(tweet)
      if !meta[:mentions_bot] && !userinfo(ev.user.screen_name).can_pester?
        log "Not favoriting: leaving @#{ev.user.screen_name} alone"
      end

      begin
        twitter.favorite(tweet.id)
      rescue Twitter::Error::Forbidden

@ -303,8 +364,9 @@ module Ebooks
      end
    end

    # Retweet a tweet
    # @param tweet [Twitter::Tweet]
    def retweet(tweet)
      return if blacklisted?(tweet.user.screen_name)
      log "Retweeting @#{tweet.user.screen_name}: #{tweet.text}"

      begin

@ -314,21 +376,36 @@ module Ebooks
      end
    end
    def follow(*args)
      log "Following #{args}"
      twitter.follow(*args)
    # Follow a user
    # @param user [String] username or user id
    def follow(user, *args)
      log "Following #{user}"
      twitter.follow(user, *args)
    end

    def tweet(*args)
      log "Tweeting #{args.inspect}"
      twitter.update(*args)
    # Unfollow a user
    # @param user [String] username or user id
    def unfollow(user, *args)
      log "Unfollowing #{user}"
      twitter.unfollow(user, *args)
    end

    # Tweet something
    # @param text [String]
    def tweet(text, *args)
      log "Tweeting '#{text}'"
      twitter.update(text, *args)
    end
    # Get a scheduler for this bot
    # @return [Rufus::Scheduler]
    def scheduler
      @scheduler ||= Rufus::Scheduler.new
    end

    # could easily just be *args however the separation keeps it clean.
    # Tweet some text with an image
    # @param txt [String]
    # @param pic [String] filename
    def pictweet(txt, pic, *args)
      log "Tweeting #{txt.inspect} - #{pic} #{args}"
      twitter.update_with_media(txt, File.new(pic), *args)
@ -1,82 +0,0 @@
module Ebooks
  # Special INTERIM token represents sentence boundaries
  # This is so we can include start and end of statements in model
  # Due to the way the sentence tokenizer works, can correspond
  # to multiple actual parts of text (such as ^, $, \n and .?!)
  INTERIM = :interim

  # This is an ngram-based Markov model optimized to build from a
  # tokenized sentence list without requiring too much transformation
  class MarkovModel
    def self.build(sentences)
      MarkovModel.new.consume(sentences)
    end

    def consume(sentences)
      # These models are of the form ngram => [[sentence_pos, token_pos] || INTERIM, ...]
      # We map by both bigrams and unigrams so we can fall back to the latter in
      # cases where an input bigram is unavailable, such as starting a sentence
      @sentences = sentences
      @unigrams = {}
      @bigrams = {}

      sentences.each_with_index do |tokens, i|
        last_token = INTERIM
        tokens.each_with_index do |token, j|
          @unigrams[last_token] ||= []
          @unigrams[last_token] << [i, j]

          @bigrams[last_token] ||= {}
          @bigrams[last_token][token] ||= []

          if j == tokens.length-1 # Mark sentence endings
            @unigrams[token] ||= []
            @unigrams[token] << INTERIM
            @bigrams[last_token][token] << INTERIM
          else
            @bigrams[last_token][token] << [i, j+1]
          end

          last_token = token
        end
      end

      self
    end

    def find_token(index)
      if index == INTERIM
        INTERIM
      else
        @sentences[index[0]][index[1]]
      end
    end

    def chain(tokens)
      if tokens.length == 1
        matches = @unigrams[tokens[-1]]
      else
        matches = @bigrams[tokens[-2]][tokens[-1]]
        matches = @unigrams[tokens[-1]] if matches.length < 2
      end

      if matches.empty?
        # This should never happen unless a strange token is
        # supplied from outside the dataset
        raise ArgumentError, "Unable to continue chain for: #{tokens.inspect}"
      end

      next_token = find_token(matches.sample)

      if next_token == INTERIM # We chose to end the sentence
        return tokens
      else
        return chain(tokens + [next_token])
      end
    end

    def generate
      NLP.reconstruct(chain([INTERIM]))
    end
  end
end
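The deleted `MarkovModel` above maps each token to the tokens that can follow it, with `INTERIM` marking sentence boundaries. The core idea can be sketched in a few lines (a simplified bigram-free version, not the original implementation, which also indexed by bigram and by sentence position):

```ruby
# A minimal follower-map Markov chain in the spirit of the deleted
# MarkovModel: record which token follows which, then walk the map.
class TinyMarkov
  START = :interim # sentence-boundary marker, as in the INTERIM constant above

  def initialize(sentences)
    @followers = Hash.new { |h, k| h[k] = [] }
    sentences.each do |tokens|
      last = START
      tokens.each do |token|
        @followers[last] << token
        last = token
      end
      @followers[last] << START # mark sentence ending
    end
  end

  def generate(rng = Random.new)
    tokens = []
    current = START
    loop do
      nxt = @followers[current].sample(random: rng)
      break if nxt == START # we chose to end the sentence
      tokens << nxt
      current = nxt
    end
    tokens.join(' ')
  end
end
```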
@ -8,16 +8,41 @@ require 'csv'

module Ebooks
  class Model
    attr_accessor :hash, :tokens, :sentences, :mentions, :keywords
    # @return [Array<String>]
    # An array of unique tokens. This is the main source of actual strings
    # in the model. Manipulation of a token is done using its index
    # in this array, which we call a "tiki"
    attr_accessor :tokens

    def self.consume(txtpath)
      Model.new.consume(txtpath)
    # @return [Array<Array<Integer>>]
    # Sentences represented by arrays of tikis
    attr_accessor :sentences

    # @return [Array<Array<Integer>>]
    # Sentences derived from Twitter mentions
    attr_accessor :mentions

    # @return [Array<String>]
    # The top 200 most important keywords, in descending order
    attr_accessor :keywords

    # Generate a new model from a corpus file
    # @param path [String]
    # @return [Ebooks::Model]
    def self.consume(path)
      Model.new.consume(path)
    end

    # Generate a new model from multiple corpus files
    # @param paths [Array<String>]
    # @return [Ebooks::Model]
    def self.consume_all(paths)
      Model.new.consume_all(paths)
    end

    # Load a saved model
    # @param path [String]
    # @return [Ebooks::Model]
    def self.load(path)
      model = Model.new
      model.instance_eval do

@ -30,6 +55,8 @@ module Ebooks
      model
    end

    # Save model to a file
    # @param path [String]
    def save(path)
      File.open(path, 'wb') do |f|
        f.write(Marshal.dump({

@ -43,19 +70,22 @@ module Ebooks
    end

    def initialize
      # This is the only source of actual strings in the model. It is
      # an array of unique tokens. Manipulation of a token is mostly done
      # using its index in this array, which we call a "tiki"
      @tokens = []

      # Reverse lookup tiki by token, for faster generation
      @tikis = {}
    end

    # Reverse lookup a token index from a token
    # @param token [String]
    # @return [Integer]
    def tikify(token)
      @tikis[token] or (@tokens << token and @tikis[token] = @tokens.length-1)
    end
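The "tiki" scheme documented above interns each unique token string once and represents sentences as arrays of integer indices. Isolated as a standalone class (hypothetical name, same `tikify` logic):

```ruby
# Sketch of the tiki interning scheme: each unique token string is stored
# once, and a "tiki" is its index in the pool.
class TokenPool
  attr_reader :tokens

  def initialize
    @tokens = [] # unique token strings
    @tikis = {}  # reverse lookup: token => index ("tiki")
  end

  # Return the tiki for a token, adding the token to the pool if new
  def tikify(token)
    @tikis[token] or (@tokens << token and @tikis[token] = @tokens.length - 1)
  end
end
```

Note the `or`/`and` chain works even for tiki 0, since `0` is truthy in Ruby.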
    # Convert a body of text into arrays of tikis
    # @param text [String]
    # @return [Array<Array<Integer>>]
    def mass_tikify(text)
      sentences = NLP.sentences(text)

@ -69,9 +99,10 @@ module Ebooks
      end
    end

    # Consume a corpus into this model
    # @param path [String]
    def consume(path)
      content = File.read(path, :encoding => 'utf-8')
      @hash = Digest::MD5.hexdigest(content)

      if path.split('.')[-1] == "json"
        log "Reading json corpus from #{path}"

@ -94,6 +125,8 @@ module Ebooks
      consume_lines(lines)
    end

    # Consume a sequence of lines
    # @param lines [Array<String>]
    def consume_lines(lines)
      log "Removing commented lines and sorting mentions"

@ -126,11 +159,12 @@ module Ebooks
      self
    end

    # Consume multiple corpuses into this model
    # @param paths [Array<String>]
    def consume_all(paths)
      lines = []
      paths.each do |path|
        content = File.read(path, :encoding => 'utf-8')
        @hash = Digest::MD5.hexdigest(content)

        if path.split('.')[-1] == "json"
          log "Reading json corpus from #{path}"

@ -156,25 +190,26 @@ module Ebooks
      consume_lines(lines)
    end

    def fix(tweet)
      # This seems to require an external api call
      #begin
      #  fixer = NLP.gingerice.parse(tweet)
      #  log fixer if fixer['corrections']
      #  tweet = fixer['result']
      #rescue Exception => e
      #  log e.message
      #  log e.backtrace
      #end

      NLP.htmlentities.decode tweet
    # Correct encoding issues in generated text
    # @param text [String]
    # @return [String]
    def fix(text)
      NLP.htmlentities.decode text
    end

    # Check if an array of tikis comprises a valid tweet
    # @param tikis [Array<Integer>]
    # @param limit Integer how many chars we have left
    def valid_tweet?(tikis, limit)
      tweet = NLP.reconstruct(tikis, @tokens)
      tweet.length <= limit && !NLP.unmatched_enclosers?(tweet)
    end

    # Generate some text
    # @param limit [Integer] available characters
    # @param generator [SuffixGenerator, nil]
    # @param retry_limit [Integer] how many times to retry on duplicates
    # @return [String]
    def make_statement(limit=140, generator=nil, retry_limit=10)
      responding = !generator.nil?
      generator ||= SuffixGenerator.build(@sentences)

@ -209,12 +244,17 @@ module Ebooks
    end

    # Test if a sentence has been copied verbatim from original
    def verbatim?(tokens)
      @sentences.include?(tokens) || @mentions.include?(tokens)
    # @param tikis [Array<Integer>]
    # @return [Boolean]
    def verbatim?(tikis)
      @sentences.include?(tikis) || @mentions.include?(tikis)
    end

    # Finds all relevant tokenized sentences to given input by
    # Finds relevant and slightly relevant tokenized sentences to input
    # comparing non-stopword token overlaps
    # @param sentences [Array<Array<Integer>>]
    # @param input [String]
    # @return [Array<Array<Array<Integer>>, Array<Array<Integer>>>]
    def find_relevant(sentences, input)
      relevant = []
      slightly_relevant = []

@ -235,6 +275,10 @@ module Ebooks

    # Generates a response by looking for related sentences
    # in the corpus and building a smaller generator from these
    # @param input [String]
    # @param limit [Integer] characters available for response
    # @param sentences [Array<Array<Integer>>]
    # @return [String]
    def make_response(input, limit=140, sentences=@mentions)
      # Prefer mentions
      relevant, slightly_relevant = find_relevant(sentences, input)
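The split performed by `find_relevant` above can be illustrated on plain string tokens: sentences sharing a non-stopword token with the input are "relevant", while sharing any token at all makes them "slightly relevant". This is a hypothetical standalone version (the body of the real method is cut off by the hunk, and it operates on tikis with the gem's `NLP.stopwords` list rather than the fixed list here):

```ruby
# Sketch of the relevant / slightly-relevant split, with a toy stopword list.
STOPWORDS = %w[the a an is to and]

def find_relevant(sentences, input)
  input_tokens = input.downcase.split
  keyword_tokens = input_tokens - STOPWORDS

  relevant = []
  slightly_relevant = []
  sentences.each do |tokens|
    words = tokens.map(&:downcase)
    if keyword_tokens.any? { |t| words.include?(t) }
      relevant << tokens            # shares a meaningful word with the input
    elsif input_tokens.any? { |t| words.include?(t) }
      slightly_relevant << tokens   # shares only stopwords
    end
  end
  [relevant, slightly_relevant]
end
```

`make_response` then prefers building a generator from the `relevant` pool, falling back to `slightly_relevant`.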
@ -12,31 +12,35 @@ module Ebooks
  # Some of this stuff is pretty heavy and we don't necessarily need
  # to be using it all of the time

  # Lazily loads an array of stopwords
  # Stopwords are common English words that should often be ignored
  # @return [Array<String>]
  def self.stopwords
    @stopwords ||= File.read(File.join(DATA_PATH, 'stopwords.txt')).split
  end

  # Lazily loads an array of known English nouns
  # @return [Array<String>]
  def self.nouns
    @nouns ||= File.read(File.join(DATA_PATH, 'nouns.txt')).split
  end

  # Lazily loads an array of known English adjectives
  # @return [Array<String>]
  def self.adjectives
    @adjectives ||= File.read(File.join(DATA_PATH, 'adjectives.txt')).split
  end

  # POS tagger
  # Lazily load part-of-speech tagging library
  # This can determine whether a word is being used as a noun/adjective/verb
  # @return [EngTagger]
  def self.tagger
    require 'engtagger'
    @tagger ||= EngTagger.new
  end

  # Gingerice text correction service
  def self.gingerice
    require 'gingerice'
    Gingerice::Parser.new # No caching for this one
  end

  # For decoding html entities
  # Lazily load HTML entity decoder
  # @return [HTMLEntities]
  def self.htmlentities
    require 'htmlentities'
    @htmlentities ||= HTMLEntities.new

@ -44,7 +48,9 @@ module Ebooks

  ### Utility functions

  # We don't really want to deal with all this weird unicode punctuation
  # Normalize some strange unicode punctuation variants
  # @param text [String]
  # @return [String]
  def self.normalize(text)
    htmlentities.decode text.gsub('“', '"').gsub('”', '"').gsub('’', "'").gsub('…', '...')
  end

@ -53,6 +59,8 @@ module Ebooks
  # We use ad hoc approach because fancy libraries do not deal
  # especially well with tweet formatting, and we can fake solving
  # the quote problem during generation
  # @param text [String]
  # @return [Array<String>]
  def self.sentences(text)
    text.split(/\n+|(?<=[.?!])\s+/)
  end

@ -60,15 +68,23 @@ module Ebooks
  # Split a sentence into word-level tokens
  # As above, this is ad hoc because tokenization libraries
  # do not behave well wrt. things like emoticons and timestamps
  # @param sentence [String]
  # @return [Array<String>]
  def self.tokenize(sentence)
    regex = /\s+|(?<=[#{PUNCTUATION}]\s)(?=[a-zA-Z])|(?<=[a-zA-Z])(?=[#{PUNCTUATION}]+\s)/
    sentence.split(regex)
  end
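The tokenizer above splits on whitespace and on the zero-width boundaries where punctuation meets a word edge, so punctuation marks become their own tokens. A runnable stand-in (`PUNCTUATION` is the gem's constant, not shown in this diff; the value here is an assumed subset):

```ruby
# Stand-in for NLP.tokenize with a plausible PUNCTUATION set.
PUNCTUATION = ".?!,:;"

def tokenize(sentence)
  regex = /\s+|(?<=[#{Regexp.escape(PUNCTUATION)}]\s)(?=[a-zA-Z])|(?<=[a-zA-Z])(?=[#{Regexp.escape(PUNCTUATION)}]+\s)/
  sentence.split(regex)
end
```

The two lookaround alternatives are zero-width, so they split the string without consuming the punctuation itself.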
  # Get the 'stem' form of a word e.g. 'cats' -> 'cat'
  # @param word [String]
  # @return [String]
  def self.stem(word)
    Stemmer::stem_word(word.downcase)
  end

  # Use highscore gem to find interesting keywords in a corpus
  # @param text [String]
  # @return [Highscore::Keywords]
  def self.keywords(text)
    # Preprocess to remove stopwords (highscore's blacklist is v. slow)
    text = NLP.tokenize(text).reject { |t| stopword?(t) }.join(' ')

@ -90,7 +106,10 @@ module Ebooks
    text.keywords
  end

  # Takes a list of tokens and builds a nice-looking sentence
  # Builds a proper sentence from a list of tikis
  # @param tikis [Array<Integer>]
  # @param tokens [Array<String>]
  # @return [String]
  def self.reconstruct(tikis, tokens)
    text = ""
    last_token = nil

@ -105,6 +124,9 @@ module Ebooks
  end

  # Determine if we need to insert a space between two tokens
  # @param token1 [String]
  # @param token2 [String]
  # @return [Boolean]
  def self.space_between?(token1, token2)
    p1 = self.punctuation?(token1)
    p2 = self.punctuation?(token2)

@ -119,10 +141,16 @@ module Ebooks
    end
  end

  # Is this token comprised of punctuation?
  # @param token [String]
  # @return [Boolean]
  def self.punctuation?(token)
    (token.chars.to_set - PUNCTUATION.chars.to_set).empty?
  end

  # Is this token a stopword?
  # @param token [String]
  # @return [Boolean]
  def self.stopword?(token)
    @stopword_set ||= stopwords.map(&:downcase).to_set
    @stopword_set.include?(token.downcase)

@ -130,7 +158,9 @@ module Ebooks

  # Determine if a sample of text contains unmatched brackets or quotes
  # This is one of the more frequent and noticeable failure modes for
  # the markov generator; we can just tell it to retry
  # the generator; we can just tell it to retry
  # @param text [String]
  # @return [Boolean]
  def self.unmatched_enclosers?(text)
    enclosers = ['**', '""', '()', '[]', '``', "''"]
    enclosers.each do |pair|

@ -153,10 +183,13 @@ module Ebooks
  end

  # Determine if a2 is a subsequence of a1
  # @param a1 [Array]
  # @param a2 [Array]
  # @return [Boolean]
  def self.subseq?(a1, a2)
    a1.each_index.find do |i|
    !a1.each_index.find do |i|
      a1[i...i+a2.length] == a2
    end.nil?
  end
end
end
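The corrected `subseq?` above (with the added `!`) reports whether `a2` occurs as a contiguous run inside `a1`. As a standalone method using the same logic:

```ruby
# Contiguous-subsequence check, mirroring the fixed NLP.subseq? above:
# true when some slice of a1 equals a2.
def subseq?(a1, a2)
  !a1.each_index.find do |i|
    a1[i...i + a2.length] == a2
  end.nil?
end
```

Without the leading `!`, the old version returned the opposite answer, which is the bug this hunk fixes.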
@ -1,11 +1,14 @@
# encoding: utf-8

module Ebooks
  # This generator uses data identical to the markov model, but
  # This generator uses data identical to a markov model, but
  # instead of making a chain by looking up bigrams it uses the
  # positions to randomly replace suffixes in one sentence with
  # matching suffixes in another
  class SuffixGenerator
    # Build a generator from a corpus of tikified sentences
    # @param sentences [Array<Array<Integer>>]
    # @return [SuffixGenerator]
    def self.build(sentences)
      SuffixGenerator.new(sentences)
    end

@ -39,6 +42,11 @@ module Ebooks
      self
    end

    # Generate a recombined sequence of tikis
    # @param passes [Integer] number of times to recombine
    # @param n [Symbol] :unigrams or :bigrams (affects how conservative the model is)
    # @return [Array<Integer>]
    def generate(passes=5, n=:unigrams)
      index = rand(@sentences.length)
      tikis = @sentences[index]
@ -1,3 +1,3 @@
module Ebooks
  VERSION = "2.3.2"
  VERSION = "3.0.0"
end
@ -1,4 +1,4 @@
source 'http://rubygems.org'
ruby '1.9.3'
ruby '{{RUBY_VERSION}}'

gem 'twitter_ebooks'
@ -1 +1 @@
worker: ruby run.rb start
worker: ebooks start
59
skeleton/bots.rb
Executable file → Normal file
@ -1,42 +1,55 @@
#!/usr/bin/env ruby

require 'twitter_ebooks'

# This is an example bot definition with event handlers commented out
# You can define as many of these as you like; they will run simultaneously
# You can define and instantiate as many bots as you like

Ebooks::Bot.new("{{BOT_NAME}}") do |bot|
  # Consumer details come from registering an app at https://dev.twitter.com/
  # OAuth details can be fetched with https://github.com/marcel/twurl
  bot.consumer_key = "" # Your app consumer key
  bot.consumer_secret = "" # Your app consumer secret
  bot.oauth_token = "" # Token connecting the app to this account
  bot.oauth_token_secret = "" # Secret connecting the app to this account
class MyBot < Ebooks::Bot
  # Configuration here applies to all MyBots
  def configure
    # Consumer details come from registering an app at https://dev.twitter.com/
    # Once you have consumer details, use "ebooks auth" for new access tokens
    self.consumer_key = '' # Your app consumer key
    self.consumer_secret = '' # Your app consumer secret

  bot.on_message do |dm|
    # Users to block instead of interacting with
    self.blacklist = ['tnietzschequote']

    # Range in seconds to randomize delay when bot.delay is called
    self.delay_range = 1..6
  end

  def on_startup
    scheduler.every '24h' do
      # Tweet something every 24 hours
      # See https://github.com/jmettraux/rufus-scheduler
      # bot.tweet("hi")
      # bot.pictweet("hi", "cuteselfie.jpg")
    end
  end

  def on_message(dm)
    # Reply to a DM
    # bot.reply(dm, "secret secrets")
  end

  bot.on_follow do |user|
  def on_follow(user)
    # Follow a user back
    # bot.follow(user[:screen_name])
  end

  bot.on_mention do |tweet, meta|
  def on_mention(tweet)
    # Reply to a mention
    # bot.reply(tweet, meta[:reply_prefix] + "oh hullo")
    # bot.reply(tweet, meta(tweet)[:reply_prefix] + "oh hullo")
  end

  bot.on_timeline do |tweet, meta|
  def on_timeline(tweet)
    # Reply to a tweet in the bot's timeline
    # bot.reply(tweet, meta[:reply_prefix] + "nice tweet")
  end

  bot.scheduler.every '24h' do
    # Tweet something every 24 hours
    # See https://github.com/jmettraux/rufus-scheduler
    # bot.tweet("hi")
    # bot.pictweet("hi", "cuteselfie.jpg", ":possibly_sensitive => true")
    # bot.reply(tweet, meta(tweet)[:reply_prefix] + "nice tweet")
  end
end

# Make a MyBot and attach it to an account
MyBot.new("{{BOT_NAME}}") do |bot|
  bot.access_token = "" # Token connecting the app to this account
  bot.access_token_secret = "" # Secret connecting the app to this account
end
@ -1,9 +0,0 @@
#!/usr/bin/env ruby

require_relative 'bots'

EM.run do
  Ebooks::Bot.all.each do |bot|
    bot.start
  end
end
@ -3,13 +3,10 @@ require 'memory_profiler'
require 'tempfile'
require 'timecop'

def Process.rss; `ps -o rss= -p #{Process.pid}`.chomp.to_i; end

class TestBot < Ebooks::Bot
  attr_accessor :twitter

  def configure
    self.username = "test_ebooks"
  end

  def on_direct_message(dm)

@ -17,7 +14,7 @@ class TestBot < Ebooks::Bot
  end

  def on_mention(tweet, meta)
    reply tweet, "echo: #{meta[:mentionless]}"
    reply tweet, "echo: #{meta.mentionless}"
  end

  def on_timeline(tweet, meta)

@ -43,10 +40,11 @@ module Ebooks::Test
  # Creates a mock tweet
  # @param username User sending the tweet
  # @param text Tweet content
  def mock_tweet(username, text)
  def mock_tweet(username, text, extra={})
    mentions = text.split.find_all { |x| x.start_with?('@') }
    Twitter::Tweet.new(
    tweet = Twitter::Tweet.new({
      id: twitter_id,
      in_reply_to_status_id: 'mock-link',
      user: { id: twitter_id, screen_name: username },
      text: text,
      created_at: Time.now.to_s,

@ -56,29 +54,36 @@ module Ebooks::Test
          indices: [text.index(m), text.index(m)+m.length] }
      }
    }
    )
    }.merge!(extra))
    tweet
  end

  def twitter_spy(bot)
    twitter = spy("twitter")
    allow(twitter).to receive(:update).and_return(mock_tweet(bot.username, "test tweet"))
    twitter
  end

  def simulate(bot, &b)
    bot.twitter = spy("twitter")
    bot.twitter = twitter_spy(bot)
    b.call
  end

  def expect_direct_message(bot, content)
    expect(bot.twitter).to have_received(:create_direct_message).with(anything(), content, {})
    bot.twitter = spy("twitter")
    bot.twitter = twitter_spy(bot)
  end

  def expect_tweet(bot, content)
    expect(bot.twitter).to have_received(:update).with(content, anything())
    bot.twitter = spy("twitter")
    bot.twitter = twitter_spy(bot)
  end
end


describe Ebooks::Bot do
  include Ebooks::Test
  let(:bot) { TestBot.new }
  let(:bot) { TestBot.new('test_ebooks') }

  before { Timecop.freeze }
  after { Timecop.return }

@ -104,6 +109,20 @@ describe Ebooks::Bot do
    end
  end

  it "links tweets to conversations correctly" do
    tweet1 = mock_tweet("m1sp", "tweet 1", id: 1, in_reply_to_status_id: nil)
    tweet2 = mock_tweet("m1sp", "tweet 2", id: 2, in_reply_to_status_id: 1)
    tweet3 = mock_tweet("m1sp", "tweet 3", id: 3, in_reply_to_status_id: nil)

    bot.conversation(tweet1).add(tweet1)
    expect(bot.conversation(tweet2)).to eq(bot.conversation(tweet1))

    bot.conversation(tweet2).add(tweet2)
    expect(bot.conversation(tweet3)).to_not eq(bot.conversation(tweet2))
  end

  it "stops mentioning people after a certain limit" do
    simulate(bot) do
      bot.receive_event(mock_tweet("spammer", "@test_ebooks @m1sp 1"))
File diff suppressed because it is too large
@ -1,18 +0,0 @@
#!/usr/bin/env ruby
# encoding: utf-8

require 'twitter_ebooks'
require 'minitest/autorun'
require 'benchmark'

module Ebooks
  class TestKeywords < Minitest::Test
    corpus = NLP.normalize(File.read(ARGV[0]))
    puts "Finding and ranking keywords"
    puts Benchmark.measure {
      NLP.keywords(corpus).top(50).each do |keyword|
        puts "#{keyword.text} #{keyword.weight}"
      end
    }
  end
end
@ -1,18 +0,0 @@
#!/usr/bin/env ruby
# encoding: utf-8

require 'twitter_ebooks'
require 'minitest/autorun'

module Ebooks
  class TestTokenize < Minitest::Test
    corpus = NLP.normalize(File.read(TEST_CORPUS_PATH))
    sents = NLP.sentences(corpus).sample(10)

    NLP.sentences(corpus).sample(10).each do |sent|
      p sent
      p NLP.tokenize(sent)
      puts
    end
  end
end
@ -18,8 +18,9 @@ Gem::Specification.new do |gem|
  gem.add_development_dependency 'rspec'
  gem.add_development_dependency 'rspec-mocks'
  gem.add_development_dependency 'memory_profiler'
  gem.add_development_dependency 'pry-byebug'
  gem.add_development_dependency 'timecop'
  gem.add_development_dependency 'pry-byebug'
  gem.add_development_dependency 'yard'

  gem.add_runtime_dependency 'twitter', '~> 5.0'
  gem.add_runtime_dependency 'simple_oauth'

@ -30,4 +31,5 @@ Gem::Specification.new do |gem|
  gem.add_runtime_dependency 'engtagger'
  gem.add_runtime_dependency 'fast-stemmer'
  gem.add_runtime_dependency 'highscore'
  gem.add_runtime_dependency 'pry'
end