Merge branch '3.0'

commit 56aadea555

20 changed files with 738 additions and 15203 deletions
.gitignore (vendored): 2 changes

@@ -1,3 +1,5 @@
 .*.swp
 Gemfile.lock
 pkg
+.yardoc
+doc
README.md: 79 changes

@@ -4,8 +4,16 @@
 [](https://travis-ci.org/mispy/twitter_ebooks)
 [](https://gemnasium.com/mispy/twitter_ebooks)

-Rewrite of my twitter\_ebooks code. While the original was solely a tweeting Markov generator, this framework helps you build any kind of interactive twitterbot which responds to mentions/DMs. See [ebooks\_example](https://github.com/mispy/ebooks_example) for an example of a full bot.
+A framework for building interactive twitterbots which respond to mentions/DMs. twitter_ebooks tries to be a good friendly bot citizen by avoiding infinite conversations and spamming people, so you only have to write the interesting parts.
+
+## New in 3.0
+
+- Bots run in their own threads (no eventmachine), and startup is parallelized
+- Bots start with `ebooks start`, and no longer die on unhandled exceptions
+- `ebooks auth` command will create new access tokens, for running multiple bots
+- `ebooks console` starts a ruby interpreter with bots loaded (see Ebooks::Bot.all)
+- Replies are slightly rate-limited to prevent infinite bot convos
+- Non-participating users in a mention chain will be dropped after a few tweets

 ## Installation
@@ -21,53 +29,63 @@ Run `ebooks new <reponame>` to generate a new repository containing a sample bot

 ``` ruby
 # This is an example bot definition with event handlers commented out
-# You can define as many of these as you like; they will run simultaneously
+# You can define and instantiate as many bots as you like

-Ebooks::Bot.new("abby_ebooks") do |bot|
-  # Consumer details come from registering an app at https://dev.twitter.com/
-  # OAuth details can be fetched with https://github.com/marcel/twurl
-  bot.consumer_key = "" # Your app consumer key
-  bot.consumer_secret = "" # Your app consumer secret
-  bot.oauth_token = "" # Token connecting the app to this account
-  bot.oauth_token_secret = "" # Secret connecting the app to this account
+class MyBot < Ebooks::Bot
+  # Configuration here applies to all MyBots
+  def configure
+    # Consumer details come from registering an app at https://dev.twitter.com/
+    # Once you have consumer details, use "ebooks auth" for new access tokens
+    self.consumer_key = '' # Your app consumer key
+    self.consumer_secret = '' # Your app consumer secret

-  bot.on_startup do
-    # Run some startup task
-    # puts "I'm ready!"
-  end
+    # Users to block instead of interacting with
+    self.blacklist = ['tnietzschequote']
+
+    # Range in seconds to randomize delay when bot.delay is called
+    self.delay_range = 1..6
+  end

-  bot.on_message do |dm|
+  def on_startup
+    scheduler.every '24h' do
+      # Tweet something every 24 hours
+      # See https://github.com/jmettraux/rufus-scheduler
+      # bot.tweet("hi")
+      # bot.pictweet("hi", "cuteselfie.jpg")
+    end
+  end
+
+  def on_message(dm)
     # Reply to a DM
     # bot.reply(dm, "secret secrets")
   end

-  bot.on_follow do |user|
+  def on_follow(user)
     # Follow a user back
     # bot.follow(user[:screen_name])
   end

-  bot.on_mention do |tweet, meta|
+  def on_mention(tweet)
     # Reply to a mention
-    # bot.reply(tweet, meta[:reply_prefix] + "oh hullo")
+    # bot.reply(tweet, meta(tweet)[:reply_prefix] + "oh hullo")
   end

-  bot.on_timeline do |tweet, meta|
+  def on_timeline(tweet)
     # Reply to a tweet in the bot's timeline
-    # bot.reply(tweet, meta[:reply_prefix] + "nice tweet")
+    # bot.reply(tweet, meta(tweet)[:reply_prefix] + "nice tweet")
   end
 end

-bot.scheduler.every '24h' do
-  # Tweet something every 24 hours
-  # See https://github.com/jmettraux/rufus-scheduler
-  # bot.tweet("hi")
-  # bot.pictweet("hi", "cuteselfie.jpg", ":possibly_sensitive => true")
-end
+# Make a MyBot and attach it to an account
+MyBot.new("{{BOT_NAME}}") do |bot|
+  bot.access_token = "" # Token connecting the app to this account
+  bot.access_token_secret = "" # Secret connecting the app to this account
+end
 ```

-Bots defined like this can be spawned by executing `run.rb` in the same directory, and will operate together in a single eventmachine loop. The easiest way to run bots in a semi-permanent fashion is with [Heroku](https://www.heroku.com); just make an app, push the bot repository to it, enable a worker process in the web interface and it ought to chug along merrily forever.
+`ebooks start` will run all defined bots in their own threads. The easiest way to run bots in a semi-permanent fashion is with [Heroku](https://www.heroku.com); just make an app, push the bot repository to it, enable a worker process in the web interface and it ought to chug along merrily forever.

-The underlying [tweetstream](https://github.com/tweetstream/tweetstream) and [twitter gem](https://github.com/sferik/twitter) client objects can be accessed at `bot.stream` and `bot.twitter` respectively.
+The underlying streaming and REST clients from the [twitter gem](https://github.com/sferik/twitter) can be accessed at `bot.stream` and `bot.twitter` respectively.

 ## Archiving accounts
@@ -102,7 +120,6 @@ Text files use newlines and full stops to seperate statements.
 Once you have a model, the primary use is to produce statements and related responses to input, using a pseudo-Markov generator:

 ``` ruby
 > require 'twitter_ebooks'
 > model = Ebooks::Model.load("model/0xabad1dea.model")
 > model.make_statement(140)
 => "My Terrible Netbook may be the kind of person who buys Starbucks, but this Rackspace vuln is pretty straight up a backdoor"
@@ -113,14 +130,18 @@ Once you have a model, the primary use is to produce statements and related responses to input
 The secondary function is the "interesting keywords" list. For example, I use this to determine whether a bot wants to fav/retweet/reply to something in its timeline:

 ``` ruby
-top100 = model.keywords.top(100)
+top100 = model.keywords.take(100)
 tokens = Ebooks::NLP.tokenize(tweet[:text])

 if tokens.find { |t| top100.include?(t) }
-  bot.twitter.favorite(tweet[:id])
+  bot.favorite(tweet[:id])
 end
 ```
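The `make_statement` and keyword examples above lean on `Ebooks::Model` for the heavy lifting. For intuition, the core idea of a Markov-style generator can be sketched in a few lines of plain Ruby; this toy bigram chain (the `build_chain`/`make_statement` helpers here are invented for illustration, not the gem's actual model internals) learns word-to-word transitions and walks them:

```ruby
# Toy bigram "Markov" generator: learn word-to-word transitions from a
# corpus, then walk the chain from a chosen starting word.
def build_chain(text)
  chain = Hash.new { |h, k| h[k] = [] }
  text.split.each_cons(2) { |a, b| chain[a] << b }
  chain
end

def make_statement(chain, start, limit = 10)
  out = [start]
  (limit - 1).times do
    nexts = chain[out.last]
    break if nexts.empty?  # dead end: no word ever followed this one
    out << nexts.sample
  end
  out.join(' ')
end

chain = build_chain("the cat sat on the mat")
make_statement(chain, "sat") # always begins "sat on the"
```

The real model is trained on tens of thousands of tweets, so the chain has many choices per word and the output reads as plausible nonsense rather than verbatim corpus fragments.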

 ## Bot niceness


 ## Other notes

 If you're using Heroku, which has no persistent filesystem, automating the process of archiving, consuming and updating can be tricky. My current solution is just a daily cron job which commits and pushes for me, which is pretty hacky.

bin/ebooks: 348 changes

@@ -2,54 +2,85 @@
 # encoding: utf-8

 require 'twitter_ebooks'
 require 'csv'
+require 'ostruct'

-$debug = true
-
-module Ebooks
+module Ebooks::Util
+  def pretty_exception(e)
+  end
+end

+module Ebooks::CLI
   APP_PATH = Dir.pwd # XXX do some recursive thing instead
+  HELP = OpenStruct.new

-  def self.new(reponame)
-    usage = <<STR
-Usage: ebooks new <reponame>
-
-Creates a new skeleton repository defining a template bot in
-the current working directory specified by <reponame>.
-STR
+  HELP.default = <<STR
+Usage:
+  ebooks help <command>
+
+  ebooks new <reponame>
+  ebooks auth
+  ebooks consume <corpus_path> [corpus_path2] [...]
+  ebooks consume-all <corpus_path> [corpus_path2] [...]
+  ebooks gen <model_path> [input]
+  ebooks archive <username> [path]
+  ebooks tweet <model_path> <botname>
+STR
+
+  def self.help(command=nil)
+    if command.nil?
+      log HELP.default
+    else
+      log HELP[command].gsub(/^ {4}/, '')
+    end
+  end
+
+  HELP.new = <<-STR
+    Usage: ebooks new <reponame>
+
+    Creates a new skeleton repository defining a template bot in
+    the current working directory specified by <reponame>.
+  STR
+
+  def self.new(reponame)
     if reponame.nil?
-      log usage
-      exit
+      help :new
+      exit 1
     end

     path = "./#{reponame}"

     if File.exists?(path)
       log "#{path} already exists. Please remove if you want to recreate."
-      exit
+      exit 1
     end

-    FileUtils.cp_r(SKELETON_PATH, path)
+    FileUtils.cp_r(Ebooks::SKELETON_PATH, path)

     File.open(File.join(path, 'bots.rb'), 'w') do |f|
-      template = File.read(File.join(SKELETON_PATH, 'bots.rb'))
+      template = File.read(File.join(Ebooks::SKELETON_PATH, 'bots.rb'))
       f.write(template.gsub("{{BOT_NAME}}", reponame))
     end

+    File.open(File.join(path, 'Gemfile'), 'w') do |f|
+      template = File.read(File.join(Ebooks::SKELETON_PATH, 'Gemfile'))
+      f.write(template.gsub("{{RUBY_VERSION}}", RUBY_VERSION))
+    end
+
     log "New twitter_ebooks app created at #{reponame}"
   end

+  HELP.consume = <<-STR
+    Usage: ebooks consume <corpus_path> [corpus_path2] [...]
+
+    Processes some number of text files or json tweet corpuses
+    into usable models. These will be output at model/<name>.model
+  STR
+
   def self.consume(pathes)
-    usage = <<STR
-Usage: ebooks consume <corpus_path> [corpus_path2] [...]
-
-Processes some number of text files or json tweet corpuses
-into usable models. These will be output at model/<name>.model
-STR
-
     if pathes.empty?
-      log usage
-      exit
+      help :consume
+      exit 1
     end

     pathes.each do |path|
@@ -57,50 +88,43 @@ STR
       shortname = filename.split('.')[0..-2].join('.')

       outpath = File.join(APP_PATH, 'model', "#{shortname}.model")
-      Model.consume(path).save(outpath)
+      Ebooks::Model.consume(path).save(outpath)
       log "Corpus consumed to #{outpath}"
     end
   end

+  HELP.consume_all = <<-STR
+    Usage: ebooks consume-all <name> <corpus_path> [corpus_path2] [...]
+
+    Processes some number of text files or json tweet corpuses
+    into one usable model. It will be output at model/<name>.model
+  STR
+
   def self.consume_all(name, paths)
-    usage = <<STR
-Usage: ebooks consume-all <name> <corpus_path> [corpus_path2] [...]
-
-Processes some number of text files or json tweet corpuses
-into one usable model. It will be output at model/<name>.model
-STR
-
     if paths.empty?
-      log usage
-      exit
+      help :consume_all
+      exit 1
     end

     outpath = File.join(APP_PATH, 'model', "#{name}.model")
-    #pathes.each do |path|
-    #  filename = File.basename(path)
-    #  shortname = filename.split('.')[0..-2].join('.')
-    #
-    #  outpath = File.join(APP_PATH, 'model', "#{shortname}.model")
-    #  Model.consume(path).save(outpath)
-    #  log "Corpus consumed to #{outpath}"
-    #end
-    Model.consume_all(paths).save(outpath)
+    Ebooks::Model.consume_all(paths).save(outpath)
     log "Corpuses consumed to #{outpath}"
   end

-  def self.gen(model_path, input)
-    usage = <<STR
-Usage: ebooks gen <model_path> [input]
-
-Make a test tweet from the processed model at <model_path>.
-Will respond to input if provided.
-STR
+  HELP.gen = <<-STR
+    Usage: ebooks gen <model_path> [input]
+
+    Make a test tweet from the processed model at <model_path>.
+    Will respond to input if provided.
+  STR

+  def self.gen(model_path, input)
     if model_path.nil?
-      log usage
-      exit
+      help :gen
+      exit 1
     end

-    model = Model.load(model_path)
+    model = Ebooks::Model.load(model_path)
     if input && !input.empty?
       puts "@cmd " + model.make_response(input, 135)
     else
@@ -108,81 +132,186 @@ STR
     end
   end

-  def self.score(model_path, input)
-    usage = <<STR
-Usage: ebooks score <model_path> <input>
-
-Scores "interest" in some text input according to how
-well unique keywords match the model.
-STR
-    if model_path.nil? || input.nil?
-      log usage
-      exit
-    end
-
-    model = Model.load(model_path)
-    model.score_interest(input)
-  end
-
-  def self.archive(username, outpath)
-    usage = <<STR
-Usage: ebooks archive <username> <outpath>
-
-Downloads a json corpus of the <username>'s tweets to <outpath>.
-Due to API limitations, this can only receive up to ~3000 tweets
-into the past.
-STR
-
-    if username.nil? || outpath.nil?
-      log usage
-      exit
-    end
-
-    Archive.new(username, outpath).sync
-  end
+  HELP.archive = <<-STR
+    Usage: ebooks archive <username> [outpath]
+
+    Downloads a json corpus of the <username>'s tweets.
+    Output defaults to corpus/<username>.json
+    Due to API limitations, this can only receive up to ~3000 tweets
+    into the past.
+  STR
+
+  def self.archive(username, outpath=nil)
+    if username.nil?
+      help :archive
+      exit 1
+    end
+
+    Ebooks::Archive.new(username, outpath).sync
+  end
+
+  HELP.tweet = <<-STR
+    Usage: ebooks tweet <model_path> <botname>
+
+    Sends a public tweet from the specified bot using text
+    from the processed model at <model_path>.
+  STR

   def self.tweet(modelpath, botname)
-    usage = <<STR
-Usage: ebooks tweet <model_path> <botname>
-
-Sends a public tweet from the specified bot using text
-from the processed model at <model_path>.
-STR
-
     if modelpath.nil? || botname.nil?
-      log usage
-      exit
+      help :tweet
+      exit 1
     end

     load File.join(APP_PATH, 'bots.rb')
-    model = Model.load(modelpath)
+    model = Ebooks::Model.load(modelpath)
     statement = model.make_statement
     log "@#{botname}: #{statement}"
-    bot = Bot.get(botname)
+    bot = Ebooks::Bot.get(botname)
+    bot.configure
     bot.tweet(statement)
   end

-  def self.c
+  HELP.auth = <<-STR
+    Usage: ebooks auth
+
+    Authenticates your Twitter app for any account. By default, will
+    use the consumer key and secret from the first defined bot. You
+    can specify another by setting the CONSUMER_KEY and CONSUMER_SECRET
+    environment variables.
+  STR
+
+  def self.auth
+    consumer_key, consumer_secret = find_consumer
+    require 'oauth'
+
+    consumer = OAuth::Consumer.new(
+      consumer_key,
+      consumer_secret,
+      site: 'https://twitter.com/',
+      scheme: :header
+    )
+
+    request_token = consumer.get_request_token
+    auth_url = request_token.authorize_url()
+
+    pin = nil
+    loop do
+      log auth_url
+
+      log "Go to the above url and follow the prompts, then enter the PIN code here."
+      print "> "
+
+      pin = STDIN.gets.chomp
+
+      break unless pin.empty?
+    end
+
+    access_token = request_token.get_access_token(oauth_verifier: pin)
+
+    log "Account authorized successfully. Make sure to put these in your bots.rb!\n" +
+        "  access token: #{access_token.token}\n" +
+        "  access token secret: #{access_token.secret}"
+  end
+
+  HELP.console = <<-STR
+    Usage: ebooks c[onsole]
+
+    Starts an interactive ruby session with your bots loaded
+    and configured.
+  STR
+
+  def self.console
+    load_bots
+    require 'pry'; Ebooks.module_exec { pry }
+  end
+
+  HELP.start = <<-STR
+    Usage: ebooks s[tart] [botname]
+
+    Starts running bots. If botname is provided, only runs that bot.
+  STR
+
+  def self.start(botname=nil)
+    load_bots
+
+    if botname.nil?
+      bots = Ebooks::Bot.all
+    else
+      bots = Ebooks::Bot.all.select { |bot| bot.username == botname }
+      if bots.empty?
+        log "Couldn't find a defined bot for @#{botname}!"
+        exit 1
+      end
+    end
+
+    threads = []
+    bots.each do |bot|
+      threads << Thread.new { bot.prepare }
+    end
+    threads.each(&:join)
+
+    threads = []
+    bots.each do |bot|
+      threads << Thread.new do
+        loop do
+          begin
+            bot.start
+          rescue Exception => e
+            bot.log e.inspect
+            puts e.backtrace.map { |s| "\t"+s }.join("\n")
+          end
+          bot.log "Sleeping before reconnect"
+          sleep 5
+        end
+      end
+    end
+    threads.each(&:join)
+  end
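The `start` command above prepares every bot in parallel, then wraps each `bot.start` in a rescue-and-retry loop, which is why 3.0 bots "no longer die on unhandled exceptions". The same supervision pattern can be sketched in isolation; `supervise` and its `max_runs` cap are invented here purely so the example terminates, whereas a real bot loops forever:

```ruby
# Generic supervise loop: run a job in its own thread and restart it
# after any exception, pausing between attempts.
def supervise(max_runs:, pause: 0)
  runs = 0
  errors = 0
  thread = Thread.new do
    loop do
      break if runs >= max_runs
      begin
        runs += 1
        yield runs          # the "bot.start" stand-in
      rescue StandardError
        errors += 1         # swallow the crash and go around again
      end
      sleep pause
    end
  end
  thread.join
  [runs, errors]
end

runs, errors = supervise(max_runs: 3) { |n| raise "boom" if n == 2 }
# runs == 3, errors == 1: the crash on run 2 did not stop runs 1 and 3
```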

+  # Non-command methods
+
+  def self.find_consumer
+    if ENV['CONSUMER_KEY'] && ENV['CONSUMER_SECRET']
+      log "Using consumer details from environment variables:\n" +
+          "  consumer key: #{ENV['CONSUMER_KEY']}\n" +
+          "  consumer secret: #{ENV['CONSUMER_SECRET']}"
+      return [ENV['CONSUMER_KEY'], ENV['CONSUMER_SECRET']]
+    end
+
+    load_bots
+    consumer_key = nil
+    consumer_secret = nil
+    Ebooks::Bot.all.each do |bot|
+      if bot.consumer_key && bot.consumer_secret
+        consumer_key = bot.consumer_key
+        consumer_secret = bot.consumer_secret
+        log "Using consumer details from @#{bot.username}:\n" +
+            "  consumer key: #{bot.consumer_key}\n" +
+            "  consumer secret: #{bot.consumer_secret}\n"
+        return consumer_key, consumer_secret
+      end
+    end
+
+    if consumer_key.nil? || consumer_secret.nil?
+      log "Couldn't find any consumer details to auth an account with.\n" +
+          "Please either configure a bot with consumer_key and consumer_secret\n" +
+          "or provide the CONSUMER_KEY and CONSUMER_SECRET environment variables."
+      exit 1
+    end
+  end
+
+  def self.load_bots
+    load 'bots.rb'
-    require 'pry'; pry
+
+    if Ebooks::Bot.all.empty?
+      puts "Couldn't find any bots! Please make sure bots.rb instantiates at least one bot."
+    end
+  end

   def self.command(args)
-    usage = <<STR
-Usage:
-  ebooks new <reponame>
-  ebooks consume <corpus_path> [corpus_path2] [...]
-  ebooks consume-all <corpus_path> [corpus_path2] [...]
-  ebooks gen <model_path> [input]
-  ebooks score <model_path> <input>
-  ebooks archive <@user> <outpath>
-  ebooks tweet <model_path> <botname>
-STR
-
     if args.length == 0
-      log usage
-      exit
+      help
+      exit 1
     end

     case args[0]
@@ -190,16 +319,21 @@ STR
     when "consume" then consume(args[1..-1])
     when "consume-all" then consume_all(args[1], args[2..-1])
     when "gen" then gen(args[1], args[2..-1].join(' '))
-    when "score" then score(args[1], args[2..-1].join(' '))
     when "archive" then archive(args[1], args[2])
     when "tweet" then tweet(args[1], args[2])
     when "jsonify" then jsonify(args[1..-1])
-    when "c" then c
+    when "auth" then auth
+    when "console" then console
+    when "c" then console
+    when "start" then start(args[1])
+    when "s" then start(args[1])
+    when "help" then help(args[1])
     else
-      log usage
       log "No such command '#{args[0]}'"
+      help
       exit 1
     end
   end
 end

-Ebooks.command(ARGV)
+Ebooks::CLI.command(ARGV)

@@ -11,11 +11,11 @@ module Ebooks
   SKELETON_PATH = File.join(GEM_PATH, 'skeleton')
   TEST_PATH = File.join(GEM_PATH, 'test')
   TEST_CORPUS_PATH = File.join(TEST_PATH, 'corpus/0xabad1dea.tweets')
+  INTERIM = :interim
 end

 require 'twitter_ebooks/nlp'
 require 'twitter_ebooks/archive'
 require 'twitter_ebooks/markov'
 require 'twitter_ebooks/suffix'
 require 'twitter_ebooks/model'
 require 'twitter_ebooks/bot'

@@ -39,9 +39,14 @@ module Ebooks
     end
   end

-  def initialize(username, path, client=nil)
+  def initialize(username, path=nil, client=nil)
     @username = username
-    @path = path || "#{username}.json"
+    @path = path || "corpus/#{username}.json"
+
+    if File.directory?(@path)
+      @path = File.join(@path, "#{username}.json")
+    end
+
     @client = client || make_client
+
     if File.exists?(@path)

lib/twitter_ebooks/bot.rb: 409 changes (Executable file → Normal file)

@@ -6,143 +6,91 @@ module Ebooks
   class ConfigurationError < Exception
   end

-  # We track how many unprompted interactions the bot has had with
-  # each user and start dropping them from mentions after two in a row
-  class UserInfo
-    attr_reader :username
-    attr_accessor :pesters_left
-
-    def initialize(username)
-      @username = username
-      @pesters_left = 1
-    end
-
-    def can_pester?
-      @pesters_left > 0
-    end
-  end
-
-  # Represents a current "interaction state" with another user
-  class Interaction
-    attr_reader :userinfo, :received, :last_update
-
-    def initialize(userinfo)
-      @userinfo = userinfo
-      @received = []
+  # Represents a single reply tree of tweets
+  class Conversation
+    attr_reader :last_update
+
+    # @param bot [Ebooks::Bot]
+    def initialize(bot)
+      @bot = bot
+      @tweets = []
       @last_update = Time.now
     end

-    def receive(tweet)
-      @received << tweet
+    # @param tweet [Twitter::Tweet] tweet to add
+    def add(tweet)
+      @tweets << tweet
       @last_update = Time.now
-      @userinfo.pesters_left += 2
     end

-    # Make an informed guess as to whether this user is a bot
-    # based on its username and reply speed
-    def is_bot?
-      if @received.length > 2
-        if (@received[-1].created_at - @received[-3].created_at) < 30
+    # Make an informed guess as to whether a user is a bot based
+    # on their behavior in this conversation
+    def is_bot?(username)
+      usertweets = @tweets.select { |t| t.user.screen_name == username }
+
+      if usertweets.length > 2
+        if (usertweets[-1].created_at - usertweets[-3].created_at) < 30
          return true
        end
      end

-      @userinfo.username.include?("ebooks")
+      username.include?("ebooks")
     end
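The `is_bot?` guess above combines two signals: three or more tweets from one user landing within 30 seconds, or an "ebooks" username. Reduced to a standalone predicate over plain timestamps (the `bot_like?` helper here is invented for illustration, not the gem's API):

```ruby
# A user is presumed to be a bot if their third-last and last tweets
# are under 30 seconds apart, or if their name contains "ebooks".
def bot_like?(username, timestamps)
  if timestamps.length > 2 && (timestamps[-1] - timestamps[-3]) < 30
    return true
  end
  username.include?("ebooks")
end

t = Time.now
bot_like?("abby", [t, t + 5, t + 10])   # true: 3 tweets in 10 seconds
bot_like?("abby", [t, t + 60, t + 120]) # false: slow, human-paced replies
```

This is deliberately a heuristic: a fast human typist can trip it, and a slow bot without "ebooks" in its name will evade it, which is acceptable for rate-limiting purposes.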

-    def continue?
-      if is_bot?
-        true if @received.length < 2
-      else
-        true
-      end
-    end
+    # Figure out whether to keep this user in the reply prefix
+    # We want to avoid spamming non-participating users
+    def can_include?(username)
+      @tweets.length <= 4 ||
+        !@tweets[-4..-1].select { |t| t.user.screen_name == username }.empty?
+    end
   end
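`can_include?` above is what implements the README's "non-participating users in a mention chain will be dropped" behaviour: once a conversation is longer than four tweets, a user stays in the reply prefix only if they authored one of the last four. The same predicate over plain hashes (illustrative only, not the gem's API):

```ruby
# Keep a user in the reply prefix only while the conversation is short
# or they have spoken within the last four tweets.
def can_include?(tweets, username)
  tweets.length <= 4 ||
    tweets[-4..-1].any? { |t| t[:user] == username }
end

convo = [{ user: "amy" }, { user: "bot" }, { user: "amy" },
         { user: "bot" }, { user: "amy" }, { user: "bot" }]
can_include?(convo, "amy")  # true: amy spoke within the last four tweets
can_include?(convo, "carl") # false: carl never participated, so drop him
```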

-  class Bot
-    attr_accessor :consumer_key, :consumer_secret,
-                  :access_token, :access_token_secret
-
-    attr_reader :twitter, :stream, :thread
-
-    # Configuration
-    attr_accessor :username, :delay_range, :blacklist
-
-    @@all = [] # List of all defined bots
-    def self.all; @@all; end
-
-    def self.get(name)
-      all.find { |bot| bot.username == name }
-    end
-
-    def log(*args)
-      STDOUT.print "@#{@username}: " + args.map(&:to_s).join(' ') + "\n"
-      STDOUT.flush
-    end
-
-    def initialize(*args, &b)
-      @username ||= nil
-      @blacklist ||= []
-      @delay_range ||= 0
-
-      @users ||= {}
-      @interactions ||= {}
-      configure(*args, &b)
-
-      # Tweet ids we've already observed, to avoid duplication
-      @seen_tweets ||= {}
-    end
-
-    def userinfo(username)
-      @users[username] ||= UserInfo.new(username)
-    end
-
-    def interaction(username)
-      if @interactions[username] &&
-         Time.now - @interactions[username].last_update < 600
-        @interactions[username]
-      else
-        @interactions[username] = Interaction.new(userinfo(username))
-      end
-    end
-
-    def twitter
-      @twitter ||= Twitter::REST::Client.new do |config|
-        config.consumer_key = @consumer_key
-        config.consumer_secret = @consumer_secret
-        config.access_token = @access_token
-        config.access_token_secret = @access_token_secret
-      end
-    end
-
-    def stream
-      @stream ||= Twitter::Streaming::Client.new do |config|
-        config.consumer_key = @consumer_key
-        config.consumer_secret = @consumer_secret
-        config.access_token = @access_token
-        config.access_token_secret = @access_token_secret
-      end
-    end
-
-    # Calculate some meta information about a tweet relevant for replying
-    def calc_meta(ev)
-      meta = {}
-      meta[:mentions] = ev.attrs[:entities][:user_mentions].map { |x| x[:screen_name] }
+  # Meta information about a tweet that we calculate for ourselves
+  class TweetMeta
+    # @return [Array<String>] usernames mentioned in tweet
+    attr_accessor :mentions
+    # @return [String] text of tweets with mentions removed
+    attr_accessor :mentionless
+    # @return [Array<String>] usernames to include in a reply
+    attr_accessor :reply_mentions
+    # @return [String] mentions to start reply with
+    attr_accessor :reply_prefix
+    # @return [Integer] available chars for reply
+    attr_accessor :limit
+
+    # @return [Ebooks::Bot] associated bot
+    attr_accessor :bot
+    # @return [Twitter::Tweet] associated tweet
+    attr_accessor :tweet

+    # Check whether this tweet mentions our bot
+    # @return [Boolean]
+    def mentions_bot?
       # To check if this is someone talking to us, ensure:
       # - The tweet mentions list contains our username
       # - The tweet is not being retweeted by somebody else
       # - Or soft-retweeted by somebody else
-      meta[:mentions_bot] = meta[:mentions].map(&:downcase).include?(@username.downcase) && !ev.retweeted_status? && !ev.text.start_with?('RT ')
+      @mentions.map(&:downcase).include?(@bot.username.downcase) && !@tweet.retweeted_status? && !@tweet.text.start_with?('RT ')
+    end
+
+    # @param bot [Ebooks::Bot]
+    # @param ev [Twitter::Tweet]
+    def initialize(bot, ev)
+      @bot = bot
+      @tweet = ev
+
+      @mentions = ev.attrs[:entities][:user_mentions].map { |x| x[:screen_name] }

       # Process mentions to figure out who to reply to
-      reply_mentions = meta[:mentions].reject { |m| m.downcase == @username.downcase }
-      reply_mentions = reply_mentions.select { |username| userinfo(username).can_pester? }
-      meta[:reply_mentions] = [ev.user.screen_name] + reply_mentions
+      # i.e. not self and nobody who has seen too many secondary mentions
+      reply_mentions = @mentions.reject do |m|
+        username = m.downcase
+        username == @bot.username || !@bot.conversation(ev).can_include?(username)
+      end
+      @reply_mentions = ([ev.user.screen_name] + reply_mentions).uniq

-      meta[:reply_prefix] = meta[:reply_mentions].uniq.map { |m| '@'+m }.join(' ') + ' '
-      meta[:limit] = 140 - meta[:reply_prefix].length
+      @reply_prefix = @reply_mentions.map { |m| '@'+m }.join(' ') + ' '
+      @limit = 140 - @reply_prefix.length

       mless = ev.text
       begin

@@ -155,12 +103,116 @@ module Ebooks
         p ev.text
         raise
       end
-      meta[:mentionless] = mless
+      @mentionless = mless
     end
   end

-      meta
-    end
+  class Bot
+    # @return [String] OAuth consumer key for a Twitter app
+    attr_accessor :consumer_key
+    # @return [String] OAuth consumer secret for a Twitter app
+    attr_accessor :consumer_secret
+    # @return [String] OAuth access token from `ebooks auth`
+    attr_accessor :access_token
+    # @return [String] OAuth access secret from `ebooks auth`
+    attr_accessor :access_token_secret
+    # @return [String] Twitter username of bot
+    attr_accessor :username
+    # @return [Array<String>] list of usernames to block on contact
+    attr_accessor :blacklist
+    # @return [Hash{String => Ebooks::Conversation}] maps tweet ids to their conversation contexts
+    attr_accessor :conversations
+    # @return [Range, Integer] range of seconds to delay in delay method
+    attr_accessor :delay_range
+
+    # @return [Array] list of all defined bots
+    def self.all; @@all ||= []; end
+
+    # Fetches a bot by username
+    # @param username [String]
+    # @return [Ebooks::Bot]
+    def self.get(username)
+      all.find { |bot| bot.username == username }
+    end
+
+    # Logs info to stdout in the context of this bot
+    def log(*args)
+      STDOUT.print "@#{@username}: " + args.map(&:to_s).join(' ') + "\n"
+      STDOUT.flush
+    end
+
+    # Initializes and configures bot
+    # @param args Arguments passed to configure method
+    # @param b Block to call with new bot
+    def initialize(username, &b)
+      @blacklist ||= []
+      @conversations ||= {}
+      # Tweet ids we've already observed, to avoid duplication
+      @seen_tweets ||= {}
+
+      @username = username
+      configure
+
+      b.call(self) unless b.nil?
+      Bot.all << self
+    end
+
+    # Find or create the conversation context for this tweet
+    # @param tweet [Twitter::Tweet]
+    # @return [Ebooks::Conversation]
+    def conversation(tweet)
+      conv = if tweet.in_reply_to_status_id?
+        @conversations[tweet.in_reply_to_status_id]
+      end
+
+      if conv.nil?
+        conv = @conversations[tweet.id] || Conversation.new(self)
+      end
+
+      if tweet.in_reply_to_status_id?
+        @conversations[tweet.in_reply_to_status_id] = conv
+      end
+      @conversations[tweet.id] = conv
+
+      # Expire any old conversations to prevent memory growth
+      @conversations.each do |k,v|
+        if v != conv && Time.now - v.last_update > 3600
+          @conversations.delete(k)
+        end
+      end
+
+      conv
+    end
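The expiry step above bounds memory by discarding conversations that have been idle for over an hour while protecting the currently active one. The core of it can be isolated as a small pruning helper (the `expire_old!` name and plain-hash conversations are invented for illustration):

```ruby
# Drop entries whose last_update is older than max_age seconds,
# keeping the currently active conversation alive regardless of age.
def expire_old!(conversations, active, max_age: 3600, now: Time.now)
  conversations.delete_if do |_id, conv|
    conv != active && now - conv[:last_update] > max_age
  end
end

now = Time.now
convs = {
  1 => { last_update: now - 7200 }, # idle two hours: pruned
  2 => { last_update: now - 10 },   # fresh: kept
}
expire_old!(convs, convs[2], now: now)
```

`Hash#delete_if` is used here because it is safe to delete during the iteration, unlike calling `delete` inside a plain `each`.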

+    # @return [Twitter::REST::Client] underlying REST client from twitter gem
+    def twitter
+      @twitter ||= Twitter::REST::Client.new do |config|
+        config.consumer_key = @consumer_key
+        config.consumer_secret = @consumer_secret
+        config.access_token = @access_token
+        config.access_token_secret = @access_token_secret
+      end
+    end
+
+    # @return [Twitter::Streaming::Client] underlying streaming client from twitter gem
+    def stream
+      @stream ||= Twitter::Streaming::Client.new do |config|
+        config.consumer_key = @consumer_key
+        config.consumer_secret = @consumer_secret
+        config.access_token = @access_token
+        config.access_token_secret = @access_token_secret
+      end
+    end
+
+    # Calculate some meta information about a tweet relevant for replying
+    # @param ev [Twitter::Tweet]
+    # @return [Ebooks::TweetMeta]
+    def meta(ev)
+      TweetMeta.new(self, ev)
+    end

+    # Receive an event from the twitter stream
+    # @param ev [Object] Twitter streaming event
     def receive_event(ev)
       if ev.is_a? Array # Initial array sent on first connection
         log "Online!"
@ -181,7 +233,7 @@ module Ebooks
|
|||
return unless ev.text # If it's not a text-containing tweet, ignore it
|
||||
return if ev.user.screen_name == @username # Ignore our own tweets
|
||||
|
||||
meta = calc_meta(ev)
|
||||
meta = meta(ev)
|
||||
|
||||
if blacklisted?(ev.user.screen_name)
|
||||
log "Blocking blacklisted user @#{ev.user.screen_name}"
|
||||
|
@ -190,17 +242,18 @@ module Ebooks
|
|||
|
||||
# Avoid responding to duplicate tweets
|
||||
if @seen_tweets[ev.id]
|
||||
log "Not firing event for duplicate tweet #{ev.id}"
|
||||
return
|
||||
else
|
||||
@seen_tweets[ev.id] = true
|
||||
end
|
||||
|
||||
if meta[:mentions_bot]
|
||||
if meta.mentions_bot?
|
||||
log "Mention from @#{ev.user.screen_name}: #{ev.text}"
|
||||
interaction(ev.user.screen_name).receive(ev)
|
||||
fire(:mention, ev, meta)
|
||||
conversation(ev).add(ev)
|
||||
fire(:mention, ev)
|
||||
else
|
||||
fire(:timeline, ev, meta)
|
||||
fire(:timeline, ev)
|
||||
end
|
||||
|
||||
elsif ev.is_a?(Twitter::Streaming::DeletedTweet) ||
|
||||
|
@ -211,7 +264,31 @@ module Ebooks
|
|||
end
|
||||
end
|
||||
|
||||
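The duplicate guard in `receive_event` above boils down to a first-seen check against `@seen_tweets`. A hypothetical standalone version of that idea:

```ruby
# Sketch of the duplicate-tweet guard: remember ids we've handled
# and ignore repeats (not the gem's actual class).
class SeenTweets
  def initialize
    @seen = {}
  end

  # Returns true the first time an id is offered, false on repeats
  def fresh?(id)
    return false if @seen[id]
    @seen[id] = true
  end
end
```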
    def start_stream
    # Configures client and fires startup event
    def prepare
      # Sanity check
      if @username.nil?
        raise ConfigurationError, "bot username cannot be nil"
      end

      if @consumer_key.nil? || @consumer_key.empty? ||
         @consumer_secret.nil? || @consumer_secret.empty?
        log "Missing consumer_key or consumer_secret. These details can be acquired by registering a Twitter app at https://apps.twitter.com/"
        exit 1
      end

      if @access_token.nil? || @access_token.empty? ||
         @access_token_secret.nil? || @access_token_secret.empty?
        log "Missing access_token or access_token_secret. Please run `ebooks auth`."
        exit 1
      end

      twitter
      fire(:startup)
    end
    # Start running user event stream
    def start
      log "starting tweet stream"

      stream.user do |ev|

@ -219,22 +296,9 @@ module Ebooks
      end
    end

    def prepare
      # Sanity check
      if @username.nil?
        raise ConfigurationError, "bot.username cannot be nil"
      end

      twitter
      fire(:startup)
    end

    # Connects to tweetstream and opens event handlers for this bot
    def start
      start_stream
    end

    # Fire an event
    # @param event [Symbol] event to fire
    # @param args arguments for event handler
    def fire(event, *args)
      handler = "on_#{event}".to_sym
      if respond_to? handler

@ -242,11 +306,17 @@ module Ebooks
      end
    end
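The `fire` method above implements the event-dispatch convention used throughout the bot: an event `:mention` is routed to a method named `on_mention` if one is defined. A minimal sketch of that convention (hypothetical class, simplified body since the original's is cut off by the hunk):

```ruby
# Sketch of the fire/on_<event> dispatch convention.
class Dispatcher
  # Route an event symbol to the matching on_<event> handler, if any
  def fire(event, *args)
    handler = "on_#{event}".to_sym
    send(handler, *args) if respond_to?(handler)
  end

  def on_mention(text)
    "mentioned: #{text}"
  end
end
```

Undefined handlers are simply ignored, which is why a bot subclass only has to implement the events it cares about.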
    def delay(&b)
      time = @delay.to_a.sample unless @delay.is_a? Integer
    # Delay an action for a variable period of time
    # @param range [Range, Integer] range of seconds to choose for delay
    def delay(range=@delay_range, &b)
      time = range.is_a?(Integer) ? range : range.to_a.sample
      sleep time
      b.call
    end
    # Check if a username is blacklisted
    # @param username [String]
    # @return [Boolean]
    def blacklisted?(username)
      if @blacklist.include?(username)
        true

@ -256,46 +326,37 @@ module Ebooks
    end

    # Reply to a tweet or a DM.
    # @param ev [Twitter::Tweet, Twitter::DirectMessage]
    # @param text [String] contents of reply excluding reply_prefix
    # @param opts [Hash] additional params to pass to twitter gem
    def reply(ev, text, opts={})
      opts = opts.clone

      if ev.is_a? Twitter::DirectMessage
        return if blacklisted?(ev.sender.screen_name)
        log "Sending DM to @#{ev.sender.screen_name}: #{text}"
        twitter.create_direct_message(ev.sender.screen_name, text, opts)
      elsif ev.is_a? Twitter::Tweet
        meta = calc_meta(ev)
        meta = meta(ev)

        if !interaction(ev.user.screen_name).continue?
        if conversation(ev).is_bot?(ev.user.screen_name)
          log "Not replying to suspected bot @#{ev.user.screen_name}"
          return
          return false
        end

        if !meta[:mentions_bot]
          if !userinfo(ev.user.screen_name).can_pester?
            log "Not replying: leaving @#{ev.user.screen_name} alone"
            return
          else
            userinfo(ev.user.screen_name).pesters_left -= 1
          end
        end

        log "Replying to @#{ev.user.screen_name} with: #{meta[:reply_prefix] + text}"
        twitter.update(meta[:reply_prefix] + text, in_reply_to_status_id: ev.id)
        log "Replying to @#{ev.user.screen_name} with: #{meta.reply_prefix + text}"
        tweet = twitter.update(meta.reply_prefix + text, in_reply_to_status_id: ev.id)
        conversation(tweet).add(tweet)
        tweet
      else
        raise Exception.new("Don't know how to reply to a #{ev.class}")
      end
    end
    # Favorite a tweet
    # @param tweet [Twitter::Tweet]
    def favorite(tweet)
      return if blacklisted?(tweet.user.screen_name)
      log "Favoriting @#{tweet.user.screen_name}: #{tweet.text}"

      meta = calc_meta(tweet)
      if !meta[:mentions_bot] && !userinfo(ev.user.screen_name).can_pester?
        log "Not favoriting: leaving @#{ev.user.screen_name} alone"
      end

      begin
        twitter.favorite(tweet.id)
      rescue Twitter::Error::Forbidden

@ -303,8 +364,9 @@ module Ebooks
      end
    end

    # Retweet a tweet
    # @param tweet [Twitter::Tweet]
    def retweet(tweet)
      return if blacklisted?(tweet.user.screen_name)
      log "Retweeting @#{tweet.user.screen_name}: #{tweet.text}"

      begin

@ -314,21 +376,36 @@ module Ebooks
      end
    end
    def follow(*args)
      log "Following #{args}"
      twitter.follow(*args)
    # Follow a user
    # @param user [String] username or user id
    def follow(user, *args)
      log "Following #{user}"
      twitter.follow(user, *args)
    end

    def tweet(*args)
      log "Tweeting #{args.inspect}"
      twitter.update(*args)
    # Unfollow a user
    # @param user [String] username or user id
    def unfollow(user, *args)
      log "Unfollowing #{user}"
      twitter.unfollow(user, *args)
    end

    # Tweet something
    # @param text [String]
    def tweet(text, *args)
      log "Tweeting '#{text}'"
      twitter.update(text, *args)
    end
    # Get a scheduler for this bot
    # @return [Rufus::Scheduler]
    def scheduler
      @scheduler ||= Rufus::Scheduler.new
    end

    # could easily just be *args however the separation keeps it clean.
    # Tweet some text with an image
    # @param txt [String]
    # @param pic [String] filename
    def pictweet(txt, pic, *args)
      log "Tweeting #{txt.inspect} - #{pic} #{args}"
      twitter.update_with_media(txt, File.new(pic), *args)
@ -1,82 +0,0 @@
module Ebooks
  # Special INTERIM token represents sentence boundaries
  # This is so we can include start and end of statements in model
  # Due to the way the sentence tokenizer works, can correspond
  # to multiple actual parts of text (such as ^, $, \n and .?!)
  INTERIM = :interim

  # This is an ngram-based Markov model optimized to build from a
  # tokenized sentence list without requiring too much transformation
  class MarkovModel
    def self.build(sentences)
      MarkovModel.new.consume(sentences)
    end

    def consume(sentences)
      # These models are of the form ngram => [[sentence_pos, token_pos] || INTERIM, ...]
      # We map by both bigrams and unigrams so we can fall back to the latter in
      # cases where an input bigram is unavailable, such as starting a sentence
      @sentences = sentences
      @unigrams = {}
      @bigrams = {}

      sentences.each_with_index do |tokens, i|
        last_token = INTERIM
        tokens.each_with_index do |token, j|
          @unigrams[last_token] ||= []
          @unigrams[last_token] << [i, j]

          @bigrams[last_token] ||= {}
          @bigrams[last_token][token] ||= []

          if j == tokens.length-1 # Mark sentence endings
            @unigrams[token] ||= []
            @unigrams[token] << INTERIM
            @bigrams[last_token][token] << INTERIM
          else
            @bigrams[last_token][token] << [i, j+1]
          end

          last_token = token
        end
      end

      self
    end

    def find_token(index)
      if index == INTERIM
        INTERIM
      else
        @sentences[index[0]][index[1]]
      end
    end

    def chain(tokens)
      if tokens.length == 1
        matches = @unigrams[tokens[-1]]
      else
        matches = @bigrams[tokens[-2]][tokens[-1]]
        matches = @unigrams[tokens[-1]] if matches.length < 2
      end

      if matches.empty?
        # This should never happen unless a strange token is
        # supplied from outside the dataset
        raise ArgumentError, "Unable to continue chain for: #{tokens.inspect}"
      end

      next_token = find_token(matches.sample)

      if next_token == INTERIM # We chose to end the sentence
        return tokens
      else
        return chain(tokens + [next_token])
      end
    end

    def generate
      NLP.reconstruct(chain([INTERIM]))
    end
  end
end
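The deleted `MarkovModel` above maps each token to the tokens that can follow it, with `INTERIM` marking sentence boundaries. The core idea can be sketched in a few lines (a simplified bigram-free version, not the original implementation, which also indexed by bigram and by sentence position):

```ruby
# A minimal follower-map Markov chain in the spirit of the deleted
# MarkovModel: record which token follows which, then walk the map.
class TinyMarkov
  START = :interim # sentence-boundary marker, as in the INTERIM constant above

  def initialize(sentences)
    @followers = Hash.new { |h, k| h[k] = [] }
    sentences.each do |tokens|
      last = START
      tokens.each do |token|
        @followers[last] << token
        last = token
      end
      @followers[last] << START # mark sentence ending
    end
  end

  def generate(rng = Random.new)
    tokens = []
    current = START
    loop do
      nxt = @followers[current].sample(random: rng)
      break if nxt == START # we chose to end the sentence
      tokens << nxt
      current = nxt
    end
    tokens.join(' ')
  end
end
```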
@ -8,16 +8,41 @@ require 'csv'

module Ebooks
  class Model
    attr_accessor :hash, :tokens, :sentences, :mentions, :keywords
    # @return [Array<String>]
    # An array of unique tokens. This is the main source of actual strings
    # in the model. Manipulation of a token is done using its index
    # in this array, which we call a "tiki"
    attr_accessor :tokens

    def self.consume(txtpath)
      Model.new.consume(txtpath)
    # @return [Array<Array<Integer>>]
    # Sentences represented by arrays of tikis
    attr_accessor :sentences

    # @return [Array<Array<Integer>>]
    # Sentences derived from Twitter mentions
    attr_accessor :mentions

    # @return [Array<String>]
    # The top 200 most important keywords, in descending order
    attr_accessor :keywords

    # Generate a new model from a corpus file
    # @param path [String]
    # @return [Ebooks::Model]
    def self.consume(path)
      Model.new.consume(path)
    end

    # Generate a new model from multiple corpus files
    # @param paths [Array<String>]
    # @return [Ebooks::Model]
    def self.consume_all(paths)
      Model.new.consume_all(paths)
    end

    # Load a saved model
    # @param path [String]
    # @return [Ebooks::Model]
    def self.load(path)
      model = Model.new
      model.instance_eval do

@ -30,6 +55,8 @@ module Ebooks
      model
    end

    # Save model to a file
    # @param path [String]
    def save(path)
      File.open(path, 'wb') do |f|
        f.write(Marshal.dump({

@ -43,19 +70,22 @@ module Ebooks
    end

    def initialize
      # This is the only source of actual strings in the model. It is
      # an array of unique tokens. Manipulation of a token is mostly done
      # using its index in this array, which we call a "tiki"
      @tokens = []

      # Reverse lookup tiki by token, for faster generation
      @tikis = {}
    end

    # Reverse lookup a token index from a token
    # @param token [String]
    # @return [Integer]
    def tikify(token)
      @tikis[token] or (@tokens << token and @tikis[token] = @tokens.length-1)
    end
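The "tiki" scheme documented above interns each unique token string once and represents sentences as arrays of integer indices. Isolated as a standalone class (hypothetical name, same `tikify` logic):

```ruby
# Sketch of the tiki interning scheme: each unique token string is stored
# once, and a "tiki" is its index in the pool.
class TokenPool
  attr_reader :tokens

  def initialize
    @tokens = [] # unique token strings
    @tikis = {}  # reverse lookup: token => index ("tiki")
  end

  # Return the tiki for a token, adding the token to the pool if new
  def tikify(token)
    @tikis[token] or (@tokens << token and @tikis[token] = @tokens.length - 1)
  end
end
```

Note the `or`/`and` chain works even for tiki 0, since `0` is truthy in Ruby.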
    # Convert a body of text into arrays of tikis
    # @param text [String]
    # @return [Array<Array<Integer>>]
    def mass_tikify(text)
      sentences = NLP.sentences(text)

@ -69,9 +99,10 @@ module Ebooks
      end
    end

    # Consume a corpus into this model
    # @param path [String]
    def consume(path)
      content = File.read(path, :encoding => 'utf-8')
      @hash = Digest::MD5.hexdigest(content)

      if path.split('.')[-1] == "json"
        log "Reading json corpus from #{path}"

@ -94,6 +125,8 @@ module Ebooks
      consume_lines(lines)
    end

    # Consume a sequence of lines
    # @param lines [Array<String>]
    def consume_lines(lines)
      log "Removing commented lines and sorting mentions"

@ -126,11 +159,12 @@ module Ebooks
      self
    end

    # Consume multiple corpuses into this model
    # @param paths [Array<String>]
    def consume_all(paths)
      lines = []
      paths.each do |path|
        content = File.read(path, :encoding => 'utf-8')
        @hash = Digest::MD5.hexdigest(content)

        if path.split('.')[-1] == "json"
          log "Reading json corpus from #{path}"

@ -156,25 +190,26 @@ module Ebooks
      consume_lines(lines)
    end

    def fix(tweet)
      # This seems to require an external api call
      #begin
      #  fixer = NLP.gingerice.parse(tweet)
      #  log fixer if fixer['corrections']
      #  tweet = fixer['result']
      #rescue Exception => e
      #  log e.message
      #  log e.backtrace
      #end

      NLP.htmlentities.decode tweet
    # Correct encoding issues in generated text
    # @param text [String]
    # @return [String]
    def fix(text)
      NLP.htmlentities.decode text
    end

    # Check if an array of tikis comprises a valid tweet
    # @param tikis [Array<Integer>]
    # @param limit Integer how many chars we have left
    def valid_tweet?(tikis, limit)
      tweet = NLP.reconstruct(tikis, @tokens)
      tweet.length <= limit && !NLP.unmatched_enclosers?(tweet)
    end

    # Generate some text
    # @param limit [Integer] available characters
    # @param generator [SuffixGenerator, nil]
    # @param retry_limit [Integer] how many times to retry on duplicates
    # @return [String]
    def make_statement(limit=140, generator=nil, retry_limit=10)
      responding = !generator.nil?
      generator ||= SuffixGenerator.build(@sentences)

@ -209,12 +244,17 @@ module Ebooks
    end

    # Test if a sentence has been copied verbatim from original
    def verbatim?(tokens)
      @sentences.include?(tokens) || @mentions.include?(tokens)
    # @param tikis [Array<Integer>]
    # @return [Boolean]
    def verbatim?(tikis)
      @sentences.include?(tikis) || @mentions.include?(tikis)
    end

    # Finds all relevant tokenized sentences to given input by
    # Finds relevant and slightly relevant tokenized sentences to input
    # comparing non-stopword token overlaps
    # @param sentences [Array<Array<Integer>>]
    # @param input [String]
    # @return [Array<Array<Array<Integer>>, Array<Array<Integer>>>]
    def find_relevant(sentences, input)
      relevant = []
      slightly_relevant = []

@ -235,6 +275,10 @@ module Ebooks

    # Generates a response by looking for related sentences
    # in the corpus and building a smaller generator from these
    # @param input [String]
    # @param limit [Integer] characters available for response
    # @param sentences [Array<Array<Integer>>]
    # @return [String]
    def make_response(input, limit=140, sentences=@mentions)
      # Prefer mentions
      relevant, slightly_relevant = find_relevant(sentences, input)
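The split performed by `find_relevant` above can be illustrated on plain string tokens: sentences sharing a non-stopword token with the input are "relevant", while sharing any token at all makes them "slightly relevant". This is a hypothetical standalone version (the body of the real method is cut off by the hunk, and it operates on tikis with the gem's `NLP.stopwords` list rather than the fixed list here):

```ruby
# Sketch of the relevant / slightly-relevant split, with a toy stopword list.
STOPWORDS = %w[the a an is to and]

def find_relevant(sentences, input)
  input_tokens = input.downcase.split
  keyword_tokens = input_tokens - STOPWORDS

  relevant = []
  slightly_relevant = []
  sentences.each do |tokens|
    words = tokens.map(&:downcase)
    if keyword_tokens.any? { |t| words.include?(t) }
      relevant << tokens            # shares a meaningful word with the input
    elsif input_tokens.any? { |t| words.include?(t) }
      slightly_relevant << tokens   # shares only stopwords
    end
  end
  [relevant, slightly_relevant]
end
```

`make_response` then prefers building a generator from the `relevant` pool, falling back to `slightly_relevant`.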
@ -12,31 +12,35 @@ module Ebooks
  # Some of this stuff is pretty heavy and we don't necessarily need
  # to be using it all of the time

  # Lazily loads an array of stopwords
  # Stopwords are common English words that should often be ignored
  # @return [Array<String>]
  def self.stopwords
    @stopwords ||= File.read(File.join(DATA_PATH, 'stopwords.txt')).split
  end

  # Lazily loads an array of known English nouns
  # @return [Array<String>]
  def self.nouns
    @nouns ||= File.read(File.join(DATA_PATH, 'nouns.txt')).split
  end

  # Lazily loads an array of known English adjectives
  # @return [Array<String>]
  def self.adjectives
    @adjectives ||= File.read(File.join(DATA_PATH, 'adjectives.txt')).split
  end

  # POS tagger
  # Lazily load part-of-speech tagging library
  # This can determine whether a word is being used as a noun/adjective/verb
  # @return [EngTagger]
  def self.tagger
    require 'engtagger'
    @tagger ||= EngTagger.new
  end

  # Gingerice text correction service
  def self.gingerice
    require 'gingerice'
    Gingerice::Parser.new # No caching for this one
  end

  # For decoding html entities
  # Lazily load HTML entity decoder
  # @return [HTMLEntities]
  def self.htmlentities
    require 'htmlentities'
    @htmlentities ||= HTMLEntities.new

@ -44,7 +48,9 @@ module Ebooks

  ### Utility functions

  # We don't really want to deal with all this weird unicode punctuation
  # Normalize some strange unicode punctuation variants
  # @param text [String]
  # @return [String]
  def self.normalize(text)
    htmlentities.decode text.gsub('“', '"').gsub('”', '"').gsub('’', "'").gsub('…', '...')
  end

@ -53,6 +59,8 @@ module Ebooks
  # We use ad hoc approach because fancy libraries do not deal
  # especially well with tweet formatting, and we can fake solving
  # the quote problem during generation
  # @param text [String]
  # @return [Array<String>]
  def self.sentences(text)
    text.split(/\n+|(?<=[.?!])\s+/)
  end

@ -60,15 +68,23 @@ module Ebooks
  # Split a sentence into word-level tokens
  # As above, this is ad hoc because tokenization libraries
  # do not behave well wrt. things like emoticons and timestamps
  # @param sentence [String]
  # @return [Array<String>]
  def self.tokenize(sentence)
    regex = /\s+|(?<=[#{PUNCTUATION}]\s)(?=[a-zA-Z])|(?<=[a-zA-Z])(?=[#{PUNCTUATION}]+\s)/
    sentence.split(regex)
  end
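The tokenizer above splits on whitespace and on the zero-width boundaries where punctuation meets a word edge, so punctuation marks become their own tokens. A runnable stand-in (`PUNCTUATION` is the gem's constant, not shown in this diff; the value here is an assumed subset):

```ruby
# Stand-in for NLP.tokenize with a plausible PUNCTUATION set.
PUNCTUATION = ".?!,:;"

def tokenize(sentence)
  regex = /\s+|(?<=[#{Regexp.escape(PUNCTUATION)}]\s)(?=[a-zA-Z])|(?<=[a-zA-Z])(?=[#{Regexp.escape(PUNCTUATION)}]+\s)/
  sentence.split(regex)
end
```

The two lookaround alternatives are zero-width, so they split the string without consuming the punctuation itself.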
  # Get the 'stem' form of a word e.g. 'cats' -> 'cat'
  # @param word [String]
  # @return [String]
  def self.stem(word)
    Stemmer::stem_word(word.downcase)
  end

  # Use highscore gem to find interesting keywords in a corpus
  # @param text [String]
  # @return [Highscore::Keywords]
  def self.keywords(text)
    # Preprocess to remove stopwords (highscore's blacklist is v. slow)
    text = NLP.tokenize(text).reject { |t| stopword?(t) }.join(' ')

@ -90,7 +106,10 @@ module Ebooks
    text.keywords
  end

  # Takes a list of tokens and builds a nice-looking sentence
  # Builds a proper sentence from a list of tikis
  # @param tikis [Array<Integer>]
  # @param tokens [Array<String>]
  # @return [String]
  def self.reconstruct(tikis, tokens)
    text = ""
    last_token = nil

@ -105,6 +124,9 @@ module Ebooks
  end

  # Determine if we need to insert a space between two tokens
  # @param token1 [String]
  # @param token2 [String]
  # @return [Boolean]
  def self.space_between?(token1, token2)
    p1 = self.punctuation?(token1)
    p2 = self.punctuation?(token2)

@ -119,10 +141,16 @@ module Ebooks
    end
  end

  # Is this token comprised of punctuation?
  # @param token [String]
  # @return [Boolean]
  def self.punctuation?(token)
    (token.chars.to_set - PUNCTUATION.chars.to_set).empty?
  end

  # Is this token a stopword?
  # @param token [String]
  # @return [Boolean]
  def self.stopword?(token)
    @stopword_set ||= stopwords.map(&:downcase).to_set
    @stopword_set.include?(token.downcase)

@ -130,7 +158,9 @@ module Ebooks

  # Determine if a sample of text contains unmatched brackets or quotes
  # This is one of the more frequent and noticeable failure modes for
  # the markov generator; we can just tell it to retry
  # the generator; we can just tell it to retry
  # @param text [String]
  # @return [Boolean]
  def self.unmatched_enclosers?(text)
    enclosers = ['**', '""', '()', '[]', '``', "''"]
    enclosers.each do |pair|

@ -153,10 +183,13 @@ module Ebooks
  end

  # Determine if a2 is a subsequence of a1
  # @param a1 [Array]
  # @param a2 [Array]
  # @return [Boolean]
  def self.subseq?(a1, a2)
    a1.each_index.find do |i|
    !a1.each_index.find do |i|
      a1[i...i+a2.length] == a2
    end.nil?
  end
end
end
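The corrected `subseq?` above (with the added `!`) reports whether `a2` occurs as a contiguous run inside `a1`. As a standalone method using the same logic:

```ruby
# Contiguous-subsequence check, mirroring the fixed NLP.subseq? above:
# true when some slice of a1 equals a2.
def subseq?(a1, a2)
  !a1.each_index.find do |i|
    a1[i...i + a2.length] == a2
  end.nil?
end
```

Without the leading `!`, the old version returned the opposite answer, which is the bug this hunk fixes.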
@ -1,11 +1,14 @@
# encoding: utf-8

module Ebooks
  # This generator uses data identical to the markov model, but
  # This generator uses data identical to a markov model, but
  # instead of making a chain by looking up bigrams it uses the
  # positions to randomly replace suffixes in one sentence with
  # matching suffixes in another
  class SuffixGenerator
    # Build a generator from a corpus of tikified sentences
    # @param sentences [Array<Array<Integer>>]
    # @return [SuffixGenerator]
    def self.build(sentences)
      SuffixGenerator.new(sentences)
    end

@ -39,6 +42,11 @@ module Ebooks
      self
    end

    # Generate a recombined sequence of tikis
    # @param passes [Integer] number of times to recombine
    # @param n [Symbol] :unigrams or :bigrams (affects how conservative the model is)
    # @return [Array<Integer>]
    def generate(passes=5, n=:unigrams)
      index = rand(@sentences.length)
      tikis = @sentences[index]
@ -1,3 +1,3 @@
module Ebooks
  VERSION = "2.3.2"
  VERSION = "3.0.0"
end
@ -1,4 +1,4 @@
source 'http://rubygems.org'
ruby '1.9.3'
ruby '{{RUBY_VERSION}}'

gem 'twitter_ebooks'
@ -1 +1 @@
worker: ruby run.rb start
worker: ebooks start
59
skeleton/bots.rb
Executable file → Normal file
@ -1,42 +1,55 @@
#!/usr/bin/env ruby

require 'twitter_ebooks'

# This is an example bot definition with event handlers commented out
# You can define as many of these as you like; they will run simultaneously
# You can define and instantiate as many bots as you like

Ebooks::Bot.new("{{BOT_NAME}}") do |bot|
  # Consumer details come from registering an app at https://dev.twitter.com/
  # OAuth details can be fetched with https://github.com/marcel/twurl
  bot.consumer_key = "" # Your app consumer key
  bot.consumer_secret = "" # Your app consumer secret
  bot.oauth_token = "" # Token connecting the app to this account
  bot.oauth_token_secret = "" # Secret connecting the app to this account
class MyBot < Ebooks::Bot
  # Configuration here applies to all MyBots
  def configure
    # Consumer details come from registering an app at https://dev.twitter.com/
    # Once you have consumer details, use "ebooks auth" for new access tokens
    self.consumer_key = '' # Your app consumer key
    self.consumer_secret = '' # Your app consumer secret

  bot.on_message do |dm|
    # Users to block instead of interacting with
    self.blacklist = ['tnietzschequote']

    # Range in seconds to randomize delay when bot.delay is called
    self.delay_range = 1..6
  end

  def on_startup
    scheduler.every '24h' do
      # Tweet something every 24 hours
      # See https://github.com/jmettraux/rufus-scheduler
      # bot.tweet("hi")
      # bot.pictweet("hi", "cuteselfie.jpg")
    end
  end

  def on_message(dm)
    # Reply to a DM
    # bot.reply(dm, "secret secrets")
  end

  bot.on_follow do |user|
  def on_follow(user)
    # Follow a user back
    # bot.follow(user[:screen_name])
  end

  bot.on_mention do |tweet, meta|
  def on_mention(tweet)
    # Reply to a mention
    # bot.reply(tweet, meta[:reply_prefix] + "oh hullo")
    # bot.reply(tweet, meta(tweet)[:reply_prefix] + "oh hullo")
  end

  bot.on_timeline do |tweet, meta|
  def on_timeline(tweet)
    # Reply to a tweet in the bot's timeline
    # bot.reply(tweet, meta[:reply_prefix] + "nice tweet")
  end

  bot.scheduler.every '24h' do
    # Tweet something every 24 hours
    # See https://github.com/jmettraux/rufus-scheduler
    # bot.tweet("hi")
    # bot.pictweet("hi", "cuteselfie.jpg", ":possibly_sensitive => true")
    # bot.reply(tweet, meta(tweet)[:reply_prefix] + "nice tweet")
  end
end

# Make a MyBot and attach it to an account
MyBot.new("{{BOT_NAME}}") do |bot|
  bot.access_token = "" # Token connecting the app to this account
  bot.access_token_secret = "" # Secret connecting the app to this account
end
@ -1,9 +0,0 @@
#!/usr/bin/env ruby

require_relative 'bots'

EM.run do
  Ebooks::Bot.all.each do |bot|
    bot.start
  end
end
@ -3,13 +3,10 @@ require 'memory_profiler'
require 'tempfile'
require 'timecop'

def Process.rss; `ps -o rss= -p #{Process.pid}`.chomp.to_i; end

class TestBot < Ebooks::Bot
  attr_accessor :twitter

  def configure
    self.username = "test_ebooks"
  end

  def on_direct_message(dm)

@ -17,7 +14,7 @@ class TestBot < Ebooks::Bot
  end

  def on_mention(tweet, meta)
    reply tweet, "echo: #{meta[:mentionless]}"
    reply tweet, "echo: #{meta.mentionless}"
  end

  def on_timeline(tweet, meta)

@ -43,10 +40,11 @@ module Ebooks::Test
  # Creates a mock tweet
  # @param username User sending the tweet
  # @param text Tweet content
  def mock_tweet(username, text)
  def mock_tweet(username, text, extra={})
    mentions = text.split.find_all { |x| x.start_with?('@') }
    Twitter::Tweet.new(
    tweet = Twitter::Tweet.new({
      id: twitter_id,
      in_reply_to_status_id: 'mock-link',
      user: { id: twitter_id, screen_name: username },
      text: text,
      created_at: Time.now.to_s,

@ -56,29 +54,36 @@ module Ebooks::Test
          indices: [text.index(m), text.index(m)+m.length] }
      }
    }
    )
    }.merge!(extra))
    tweet
  end

  def twitter_spy(bot)
    twitter = spy("twitter")
    allow(twitter).to receive(:update).and_return(mock_tweet(bot.username, "test tweet"))
    twitter
  end

  def simulate(bot, &b)
    bot.twitter = spy("twitter")
    bot.twitter = twitter_spy(bot)
    b.call
  end

  def expect_direct_message(bot, content)
    expect(bot.twitter).to have_received(:create_direct_message).with(anything(), content, {})
    bot.twitter = spy("twitter")
    bot.twitter = twitter_spy(bot)
  end

  def expect_tweet(bot, content)
    expect(bot.twitter).to have_received(:update).with(content, anything())
    bot.twitter = spy("twitter")
    bot.twitter = twitter_spy(bot)
  end
end


describe Ebooks::Bot do
  include Ebooks::Test
  let(:bot) { TestBot.new }
  let(:bot) { TestBot.new('test_ebooks') }

  before { Timecop.freeze }
  after { Timecop.return }

@ -104,6 +109,20 @@ describe Ebooks::Bot do
    end
  end

  it "links tweets to conversations correctly" do
    tweet1 = mock_tweet("m1sp", "tweet 1", id: 1, in_reply_to_status_id: nil)
    tweet2 = mock_tweet("m1sp", "tweet 2", id: 2, in_reply_to_status_id: 1)
    tweet3 = mock_tweet("m1sp", "tweet 3", id: 3, in_reply_to_status_id: nil)

    bot.conversation(tweet1).add(tweet1)
    expect(bot.conversation(tweet2)).to eq(bot.conversation(tweet1))

    bot.conversation(tweet2).add(tweet2)
    expect(bot.conversation(tweet3)).to_not eq(bot.conversation(tweet2))
  end

  it "stops mentioning people after a certain limit" do
    simulate(bot) do
      bot.receive_event(mock_tweet("spammer", "@test_ebooks @m1sp 1"))
File diff suppressed because it is too large
@ -1,18 +0,0 @@
#!/usr/bin/env ruby
# encoding: utf-8

require 'twitter_ebooks'
require 'minitest/autorun'
require 'benchmark'

module Ebooks
  class TestKeywords < Minitest::Test
    corpus = NLP.normalize(File.read(ARGV[0]))
    puts "Finding and ranking keywords"
    puts Benchmark.measure {
      NLP.keywords(corpus).top(50).each do |keyword|
        puts "#{keyword.text} #{keyword.weight}"
      end
    }
  end
end
@ -1,18 +0,0 @@
#!/usr/bin/env ruby
# encoding: utf-8

require 'twitter_ebooks'
require 'minitest/autorun'

module Ebooks
  class TestTokenize < Minitest::Test
    corpus = NLP.normalize(File.read(TEST_CORPUS_PATH))
    sents = NLP.sentences(corpus).sample(10)

    NLP.sentences(corpus).sample(10).each do |sent|
      p sent
      p NLP.tokenize(sent)
      puts
    end
  end
end
@ -18,8 +18,9 @@ Gem::Specification.new do |gem|
  gem.add_development_dependency 'rspec'
  gem.add_development_dependency 'rspec-mocks'
  gem.add_development_dependency 'memory_profiler'
  gem.add_development_dependency 'pry-byebug'
  gem.add_development_dependency 'timecop'
  gem.add_development_dependency 'pry-byebug'
  gem.add_development_dependency 'yard'

  gem.add_runtime_dependency 'twitter', '~> 5.0'
  gem.add_runtime_dependency 'simple_oauth'

@ -30,4 +31,5 @@ Gem::Specification.new do |gem|
  gem.add_runtime_dependency 'engtagger'
  gem.add_runtime_dependency 'fast-stemmer'
  gem.add_runtime_dependency 'highscore'
  gem.add_runtime_dependency 'pry'
end