Merge remote-tracking branch 'mispy/master'
commit 5888d771e8
31 changed files with 216797 additions and 4 deletions
6  .gitignore (vendored)
@@ -1 +1,5 @@
corpus/
.*.swp
Gemfile.lock
pkg
.yardoc
doc

1  .rspec (new file)
@@ -0,0 +1 @@
--color

7  .travis.yml (new file)
@@ -0,0 +1,7 @@
rvm:
  - 2.1.4
script:
  - rspec spec
notifications:
  email:
    - ebooks@mispy.me

6  Gemfile
@@ -1,4 +1,4 @@
source 'http://rubygems.org'
ruby '2.2.0'
source 'https://rubygems.org'

gem 'twitter_ebooks'
# Specify your gem's dependencies in twitter_ebooks.gemspec
gemspec

22  LICENSE (new file)
@@ -0,0 +1,22 @@
Copyright (c) 2013 Jaiden Mispy

MIT License

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

147  README.md (new file)
@@ -0,0 +1,147 @@
# twitter\_ebooks

[Gem Version](http://badge.fury.io/rb/twitter_ebooks)
[Build Status](https://travis-ci.org/mispy/twitter_ebooks)
[Dependency Status](https://gemnasium.com/mispy/twitter_ebooks)

A framework for building interactive twitterbots which respond to mentions/DMs. See [ebooks_example](https://github.com/mispy/ebooks_example) for a fully-fledged bot definition.

## New in 3.0

- About 80% less memory and storage use for models
- Bots run in their own threads (no eventmachine), and startup is parallelized
- Bots start with `ebooks start`, and no longer die on unhandled exceptions
- `ebooks auth` command will create new access tokens, for running multiple bots
- `ebooks console` starts a ruby interpreter with bots loaded (see Ebooks::Bot.all)
- Replies are slightly rate-limited to prevent infinite bot convos
- Non-participating users in a mention chain will be dropped after a few tweets
- [API documentation](http://rdoc.info/github/mispy/twitter_ebooks) and tests

Note that 3.0 is not backwards compatible with 2.x, so upgrade carefully! In particular, **make sure to regenerate your models** since the storage format changed.

## Installation

Requires Ruby 2.0+

```bash
gem install twitter_ebooks
```

## Setting up a bot

Run `ebooks new <reponame>` to generate a new repository containing a sample bots.rb file, which looks like this:

``` ruby
# This is an example bot definition with event handlers commented out
# You can define and instantiate as many bots as you like

class MyBot < Ebooks::Bot
  # Configuration here applies to all MyBots
  def configure
    # Consumer details come from registering an app at https://dev.twitter.com/
    # Once you have consumer details, use "ebooks auth" for new access tokens
    self.consumer_key = "" # Your app consumer key
    self.consumer_secret = "" # Your app consumer secret

    # Users to block instead of interacting with
    self.blacklist = ['tnietzschequote']

    # Range in seconds to randomize delay when bot.delay is called
    self.delay_range = 1..6
  end

  def on_startup
    scheduler.every '24h' do
      # Tweet something every 24 hours
      # See https://github.com/jmettraux/rufus-scheduler
      # tweet("hi")
      # pictweet("hi", "cuteselfie.jpg")
    end
  end

  def on_message(dm)
    # Reply to a DM
    # reply(dm, "secret secrets")
  end

  def on_follow(user)
    # Follow a user back
    # follow(user.screen_name)
  end

  def on_mention(tweet)
    # Reply to a mention
    # reply(tweet, meta(tweet).reply_prefix + "oh hullo")
  end

  def on_timeline(tweet)
    # Reply to a tweet in the bot's timeline
    # reply(tweet, meta(tweet).reply_prefix + "nice tweet")
  end
end

# Make a MyBot and attach it to an account
MyBot.new("abby_ebooks") do |bot|
  bot.access_token = "" # Token connecting the app to this account
  bot.access_token_secret = "" # Secret connecting the app to this account
end
```

`ebooks start` will run all defined bots in their own threads. The easiest way to run bots in a semi-permanent fashion is with [Heroku](https://www.heroku.com); just make an app, push the bot repository to it, enable a worker process in the web interface and it ought to chug along merrily forever.
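
On Heroku, enabling the worker process usually comes down to a one-line Procfile at the root of the bot repository. A minimal sketch (the exact contents depend on your setup; `bundle exec` assumes you are using Bundler):

```
worker: bundle exec ebooks start
```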

The underlying streaming and REST clients from the [twitter gem](https://github.com/sferik/twitter) can be accessed at `bot.stream` and `bot.twitter` respectively.

## Archiving accounts

twitter\_ebooks comes with a syncing tool to download and then incrementally update a local json archive of a user's tweets (in this case, my good friend @0xabad1dea):

``` zsh
➜ ebooks archive 0xabad1dea corpus/0xabad1dea.json
Currently 20209 tweets for 0xabad1dea
Received 67 new tweets
```

The first time you run this, it'll ask for auth details to connect with. Due to API limitations, for users with high numbers of tweets it may not be possible to get their entire history in the initial download. However, so long as you run it frequently enough, you can maintain a perfect copy indefinitely into the future.

## Text models

In order to use the included text modeling, you'll first need to preprocess your archive into a more efficient form:

``` zsh
➜ ebooks consume corpus/0xabad1dea.json
Reading json corpus from corpus/0xabad1dea.json
Removing commented lines and sorting mentions
Segmenting text into sentences
Tokenizing 7075 statements and 17947 mentions
Ranking keywords
Corpus consumed to model/0xabad1dea.model
```

Notably, this works with both json tweet archives and plaintext files (based on file extension), so you can make a model out of any kind of text.

Text files use newlines and full stops to separate statements.
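
A minimal standalone sketch of that newline/full-stop segmentation (hypothetical; the gem's actual `Ebooks::NLP` segmentation is more sophisticated than this):

```ruby
# Split raw text into statements: first on newlines, then on
# whitespace that follows a full stop. Hypothetical helper, not the
# gem's real segmenter.
def split_statements(text)
  text.split("\n")
      .flat_map { |line| line.split(/(?<=\.)\s+/) }
      .map(&:strip)
      .reject(&:empty?)
end
```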
|
||||
Once you have a model, the primary use is to produce statements and related responses to input, using a pseudo-Markov generator:
|
||||
|
||||
``` ruby
|
||||
> model = Ebooks::Model.load("model/0xabad1dea.model")
|
||||
> model.make_statement(140)
|
||||
=> "My Terrible Netbook may be the kind of person who buys Starbucks, but this Rackspace vuln is pretty straight up a backdoor"
|
||||
> model.make_response("The NSA is coming!", 130)
|
||||
=> "Hey - someone who claims to be an NSA conspiracy"
|
||||
```
|
||||
|
||||
The secondary function is the "interesting keywords" list. For example, I use this to determine whether a bot wants to fav/retweet/reply to something in its timeline:
|
||||
|
||||
``` ruby
|
||||
top100 = model.keywords.take(100)
|
||||
tokens = Ebooks::NLP.tokenize(tweet[:text])
|
||||
|
||||
if tokens.find { |t| top100.include?(t) }
|
||||
bot.favorite(tweet[:id])
|
||||
end
|
||||
```
|
||||
|
||||
## Bot niceness
|
||||
|
||||
twitter_ebooks will drop bystanders from mentions for you and avoid infinite bot conversations, but it won't prevent you from doing a lot of other spammy things. Make sure your bot is a good and polite citizen!
|
2  Rakefile (new file)
@@ -0,0 +1,2 @@
#!/usr/bin/env rake
require "bundler/gem_tasks"

389  bin/ebooks (new executable file)
@@ -0,0 +1,389 @@
#!/usr/bin/env ruby
# encoding: utf-8

require 'twitter_ebooks'
require 'ostruct'
require 'fileutils'
require 'csv'

module Ebooks::Util
  def pretty_exception(e)
  end
end

module Ebooks::CLI
  APP_PATH = Dir.pwd # XXX do some recursive thing instead
  HELP = OpenStruct.new

  HELP.default = <<STR
Usage:
   ebooks help <command>

   ebooks new <reponame>
   ebooks s[tart]
   ebooks c[onsole]
   ebooks auth
   ebooks consume <corpus_path> [corpus_path2] [...]
   ebooks consume-all <model_name> <corpus_path> [corpus_path2] [...]
   ebooks gen <model_path> [input]
   ebooks archive <username> [path]
   ebooks tweet <model_path> <botname>
STR

  def self.help(command=nil)
    if command.nil?
      log HELP.default
    else
      log HELP[command].gsub(/^ {4}/, '')
    end
  end

  HELP.new = <<-STR
    Usage: ebooks new <reponame>

    Creates a new skeleton repository defining a template bot in
    the current working directory specified by <reponame>.
  STR

  def self.new(reponame)
    if reponame.nil?
      help :new
      exit 1
    end

    path = "./#{reponame}"

    if File.exists?(path)
      log "#{path} already exists. Please remove if you want to recreate."
      exit 1
    end

    FileUtils.cp_r(Ebooks::SKELETON_PATH, path)
    FileUtils.mv(File.join(path, 'gitignore'), File.join(path, '.gitignore'))

    File.open(File.join(path, 'bots.rb'), 'w') do |f|
      template = File.read(File.join(Ebooks::SKELETON_PATH, 'bots.rb'))
      f.write(template.gsub("{{BOT_NAME}}", reponame))
    end

    File.open(File.join(path, 'Gemfile'), 'w') do |f|
      template = File.read(File.join(Ebooks::SKELETON_PATH, 'Gemfile'))
      f.write(template.gsub("{{RUBY_VERSION}}", RUBY_VERSION))
    end

    log "New twitter_ebooks app created at #{reponame}"
  end

  HELP.consume = <<-STR
    Usage: ebooks consume <corpus_path> [corpus_path2] [...]

    Processes some number of text files or json tweet corpuses
    into usable models. These will be output at model/<corpus_name>.model
  STR

  def self.consume(paths)
    if paths.empty?
      help :consume
      exit 1
    end

    paths.each do |path|
      filename = File.basename(path)
      shortname = filename.split('.')[0..-2].join('.')

      outpath = File.join(APP_PATH, 'model', "#{shortname}.model")
      Ebooks::Model.consume(path).save(outpath)
      log "Corpus consumed to #{outpath}"
    end
  end

  HELP.consume_all = <<-STR
    Usage: ebooks consume-all <model_name> <corpus_path> [corpus_path2] [...]

    Processes some number of text files or json tweet corpuses
    into one usable model. It will be output at model/<model_name>.model
  STR

  def self.consume_all(name, paths)
    if paths.empty?
      help :consume_all
      exit 1
    end

    outpath = File.join(APP_PATH, 'model', "#{name}.model")
    Ebooks::Model.consume_all(paths).save(outpath)
    log "Corpuses consumed to #{outpath}"
  end

  HELP.jsonify = <<-STR
    Usage: ebooks jsonify <tweets.csv> [tweets.csv2] [...]

    Takes a csv twitter archive and converts it to json.
  STR

  def self.jsonify(paths)
    if paths.empty?
      help :jsonify
      exit 1
    end

    paths.each do |path|
      name = File.basename(path).split('.')[0]
      new_path = name + ".json"

      tweets = []
      id = nil
      if path.split('.')[-1] == "csv" # from twitter archive
        csv_archive = CSV.read(path, :headers => :first_row)
        tweets = csv_archive.map do |tweet|
          { text: tweet['text'], id: tweet['tweet_id'] }
        end
      else
        File.read(path).split("\n").each do |l|
          if l.start_with?('# ')
            id = l.split('# ')[-1]
          else
            tweet = { text: l }
            if id
              tweet[:id] = id
              id = nil
            end
            tweets << tweet
          end
        end
      end

      File.open(new_path, 'w') do |f|
        log "Writing #{tweets.length} tweets to #{new_path}"
        f.write(JSON.pretty_generate(tweets))
      end
    end
  end

  HELP.gen = <<-STR
    Usage: ebooks gen <model_path> [input]

    Make a test tweet from the processed model at <model_path>.
    Will respond to input if provided.
  STR

  def self.gen(model_path, input)
    if model_path.nil?
      help :gen
      exit 1
    end

    model = Ebooks::Model.load(model_path)
    if input && !input.empty?
      puts "@cmd " + model.make_response(input, 135)
    else
      puts model.make_statement
    end
  end

  HELP.archive = <<-STR
    Usage: ebooks archive <username> [outpath]

    Downloads a json corpus of the <username>'s tweets.
    Output defaults to corpus/<username>.json
    Due to API limitations, this can only receive up to ~3000 tweets
    into the past.
  STR

  def self.archive(username, outpath=nil)
    if username.nil?
      help :archive
      exit 1
    end

    Ebooks::Archive.new(username, outpath).sync
  end

  HELP.tweet = <<-STR
    Usage: ebooks tweet <model_path> <botname>

    Sends a public tweet from the specified bot using text
    from the processed model at <model_path>.
  STR

  def self.tweet(modelpath, botname)
    if modelpath.nil? || botname.nil?
      help :tweet
      exit 1
    end

    load File.join(APP_PATH, 'bots.rb')
    model = Ebooks::Model.load(modelpath)
    statement = model.make_statement
    bot = Ebooks::Bot.get(botname)
    bot.configure
    bot.tweet(statement)
  end

  HELP.auth = <<-STR
    Usage: ebooks auth

    Authenticates your Twitter app for any account. By default, will
    use the consumer key and secret from the first defined bot. You
    can specify another by setting the CONSUMER_KEY and CONSUMER_SECRET
    environment variables.
  STR

  def self.auth
    consumer_key, consumer_secret = find_consumer
    require 'oauth'

    consumer = OAuth::Consumer.new(
      consumer_key,
      consumer_secret,
      site: 'https://twitter.com/',
      scheme: :header
    )

    request_token = consumer.get_request_token
    auth_url = request_token.authorize_url()

    pin = nil
    loop do
      log auth_url

      log "Go to the above url and follow the prompts, then enter the PIN code here."
      print "> "

      pin = STDIN.gets.chomp

      break unless pin.empty?
    end

    access_token = request_token.get_access_token(oauth_verifier: pin)

    log "Account authorized successfully. Make sure to put these in your bots.rb!\n" +
        "  access token: #{access_token.token}\n" +
        "  access token secret: #{access_token.secret}"
  end

  HELP.console = <<-STR
    Usage: ebooks c[onsole]

    Starts an interactive ruby session with your bots loaded
    and configured.
  STR

  def self.console
    load_bots
    require 'pry'; Ebooks.module_exec { pry }
  end

  HELP.start = <<-STR
    Usage: ebooks s[tart] [botname]

    Starts running bots. If botname is provided, only runs that bot.
  STR

  def self.start(botname=nil)
    load_bots

    if botname.nil?
      bots = Ebooks::Bot.all
    else
      bots = Ebooks::Bot.all.select { |bot| bot.username == botname }
      if bots.empty?
        log "Couldn't find a defined bot for @#{botname}!"
        exit 1
      end
    end

    threads = []
    bots.each do |bot|
      threads << Thread.new { bot.prepare }
    end
    threads.each(&:join)

    threads = []
    bots.each do |bot|
      threads << Thread.new do
        loop do
          begin
            bot.start
          rescue Exception => e
            bot.log e.inspect
            puts e.backtrace.map { |s| "\t" + s }.join("\n")
          end
          bot.log "Sleeping before reconnect"
          sleep 60
        end
      end
    end
    threads.each(&:join)
  end

  # Non-command methods

  def self.find_consumer
    if ENV['CONSUMER_KEY'] && ENV['CONSUMER_SECRET']
      log "Using consumer details from environment variables:\n" +
          "  consumer key: #{ENV['CONSUMER_KEY']}\n" +
          "  consumer secret: #{ENV['CONSUMER_SECRET']}"
      return [ENV['CONSUMER_KEY'], ENV['CONSUMER_SECRET']]
    end

    load_bots
    consumer_key = nil
    consumer_secret = nil
    Ebooks::Bot.all.each do |bot|
      if bot.consumer_key && bot.consumer_secret
        consumer_key = bot.consumer_key
        consumer_secret = bot.consumer_secret
        log "Using consumer details from @#{bot.username}:\n" +
            "  consumer key: #{bot.consumer_key}\n" +
            "  consumer secret: #{bot.consumer_secret}\n"
        return consumer_key, consumer_secret
      end
    end

    if consumer_key.nil? || consumer_secret.nil?
      log "Couldn't find any consumer details to auth an account with.\n" +
          "Please either configure a bot with consumer_key and consumer_secret\n" +
          "or provide the CONSUMER_KEY and CONSUMER_SECRET environment variables."
      exit 1
    end
  end

  def self.load_bots
    load 'bots.rb'

    if Ebooks::Bot.all.empty?
      puts "Couldn't find any bots! Please make sure bots.rb instantiates at least one bot."
    end
  end

  def self.command(args)
    if args.length == 0
      help
      exit 1
    end

    case args[0]
    when "new" then new(args[1])
    when "consume" then consume(args[1..-1])
    when "consume-all" then consume_all(args[1], args[2..-1])
    when "gen" then gen(args[1], args[2..-1].join(' '))
    when "archive" then archive(args[1], args[2])
    when "tweet" then tweet(args[1], args[2])
    when "jsonify" then jsonify(args[1..-1])
    when "auth" then auth
    when "console" then console
    when "c" then console
    when "start" then start(args[1])
    when "s" then start(args[1])
    when "help" then help(args[1])
    else
      log "No such command '#{args[0]}'"
      help
      exit 1
    end
  end
end

Ebooks::CLI.command(ARGV)
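
The plaintext corpus format accepted by `ebooks jsonify` above (an optional line beginning with `# ` carries the id for the tweet text on the following line) can be sketched as a standalone helper. This mirrors the parsing loop in `bin/ebooks`; the function name is hypothetical:

```ruby
# Parse a plaintext corpus into an array of tweet hashes.
# A line like "# 123" sets the id of the next text line.
def parse_plaintext_corpus(text)
  tweets = []
  id = nil
  text.split("\n").each do |line|
    if line.start_with?('# ')
      id = line.split('# ')[-1]
    else
      tweet = { text: line }
      if id
        tweet[:id] = id
        id = nil
      end
      tweets << tweet
    end
  end
  tweets
end
```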
1466  data/adjectives.txt (new file)
File diff suppressed because it is too large

2193  data/nouns.txt (new file)
File diff suppressed because it is too large

843  data/stopwords.txt (new file)
@@ -0,0 +1,843 @@
(one stopword per line in the file; listed here space-separated)
a able about above abst accordance according accordingly across act actually added
adj affected affecting affects after afterwards again against ah all almost alone
along already also although always am among amongst an and announce another any
anybody anyhow anymore anyone anything anyway anyways anywhere apparently approximately are aren
arent arise around as aside ask asking at auth available away awfully
b back be became because become becomes becoming been before beforehand begin
beginning beginnings begins behind being believe below beside besides between beyond biol
both brief briefly but by c ca came can cannot can't cause
causes certain certainly co com come comes contain containing contains could couldnt
d date did didn't different do does doesn't doing done don't down
downwards due during e each ed edu effect eg eight eighty either
else elsewhere end ending enough especially et et-al etc even ever every
everybody everyone everything everywhere ex except f far few ff fifth first
five fix followed following follows for former formerly forth found four from
further furthermore g gave get gets getting give given gives giving go
goes gone got gotten h had happens hardly has hasn't have haven't
having he hed hence her here hereafter hereby herein heres hereupon hers
herself hes hi hid him himself his hither home how howbeit however
hundred i id ie if i'll im immediate immediately importance important in
inc indeed index information instead into invention inward is isn't it itd
it'll its itself i've j just k keep keeps kept kg km
know known knows l largely last lately later latter latterly least less
lest let lets like liked likely line little 'll look looking looks
ltd m made mainly make makes many may maybe me mean means
meantime meanwhile merely mg might million miss ml more moreover most mostly
mr mrs much mug must my myself n na name namely nay
nd near nearly necessarily necessary need needs neither never nevertheless new next
nine ninety no nobody non none nonetheless noone nor normally nos not
noted nothing now nowhere o obtain obtained obviously of off often oh
ok okay old omitted on once one ones only onto or ord
other others otherwise ought our ours ourselves out outside over overall owing
own p page pages part particular particularly past per perhaps placed please
plus poorly possible possibly potentially pp predominantly present previously primarily probably promptly
proud provides put q que quickly quite qv r ran rather rd
re readily really recent recently ref refs regarding regardless regards related relatively
research respectively resulted resulting results right run s said same saw say
saying says sec section see seeing seem seemed seeming seems seen self
selves sent seven several shall she shed she'll shes should shouldn't show
showed shown showns shows significant significantly similar similarly since six slightly so
some somebody somehow someone somethan something sometime sometimes somewhat somewhere soon sorry
specifically specified specify specifying still stop strongly sub substantially successfully such sufficiently
suggest sup sure t take taken taking tell tends th than thank
thanks thanx that that'll thats that've the their theirs them themselves then
thence there thereafter thereby thered therefore therein there'll thereof therere theres thereto
thereupon there've these they theyd they'll theyre they've think this those thou
though thoughh thousand throug through throughout thru thus til tip to together
too took toward towards tried tries truly try trying ts twice two
u un under unfortunately unless unlike unlikely until unto up upon ups
us use used useful usefully usefulness uses using usually v value various
've very via viz vol vols vs w want wants was wasn't
way we wed welcome we'll went were weren't we've what whatever what'll
whats when whence whenever where whereafter whereas whereby wherein wheres whereupon wherever
whether which while whim whither who whod whoever whole who'll whom whomever
whos whose why widely willing wish with within without won't words world
would wouldn't www x y yes yet you youd you'll your youre
yours yourself yourselves you've z zero . ? !
http don people well will https time good thing twitter pretty it's
i'm that's you're they're there's things yeah find going work point years
guess bad problem real kind day better lot stuff i'd read thought
idea case word hey person long Dear internet tweet he's feel wrong
call hard phone ago literally remember reason called course bit question high
today told man actual year three book assume life true best wow
video times works fact completely totally imo open lol haha cool yep
ooh great ugh tonight talk sounds hahaha whoa cool we're guys sweet
fortunately hmm aren't sadly talking you'd place yup what's y'know basically god
shit holy interesting news guy wait oooh gonna current let's tomorrow omg
hate hope fuck oops night wear wanna fun finally whoops nevermind definitely
context screen free exactly big house half working play heard hmmm damn
woah tho set idk sort understand kinda seriously btw she's hah aww
ffs it'd that'd hopefully non entirely lots entire tend hullo clearly surely
weird start help nope
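
A stopword list like the one above is typically used to filter common words before ranking keywords. A minimal standalone sketch of that idea (hypothetical; this is not the gem's actual keyword-ranking code):

```ruby
# Count token frequencies, ignoring stopwords, and return the n most
# frequent tokens. Hypothetical helper for illustration only.
def top_keywords(tokens, stopwords, n)
  counts = Hash.new(0)
  tokens.each do |t|
    word = t.downcase
    counts[word] += 1 unless stopwords.include?(word)
  end
  counts.sort_by { |_, c| -c }.first(n).map(&:first)
end
```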
21  lib/twitter_ebooks.rb (new file)
@@ -0,0 +1,21 @@
$debug = false

def log(*args)
  STDERR.print args.map(&:to_s).join(' ') + "\n"
  STDERR.flush
end

module Ebooks
  GEM_PATH = File.expand_path(File.join(File.dirname(__FILE__), '..'))
  DATA_PATH = File.join(GEM_PATH, 'data')
  SKELETON_PATH = File.join(GEM_PATH, 'skeleton')
  TEST_PATH = File.join(GEM_PATH, 'test')
  TEST_CORPUS_PATH = File.join(TEST_PATH, 'corpus/0xabad1dea.tweets')
  INTERIM = :interim
end

require 'twitter_ebooks/nlp'
require 'twitter_ebooks/archive'
require 'twitter_ebooks/suffix'
require 'twitter_ebooks/model'
require 'twitter_ebooks/bot'

102 lib/twitter_ebooks/archive.rb Normal file

@@ -0,0 +1,102 @@
#!/usr/bin/env ruby
# encoding: utf-8

require 'twitter'
require 'json'

CONFIG_PATH = "#{ENV['HOME']}/.ebooksrc"

module Ebooks
  class Archive
    attr_reader :tweets

    def make_client
      if File.exists?(CONFIG_PATH)
        @config = JSON.parse(File.read(CONFIG_PATH), symbolize_names: true)
      else
        @config = {}

        puts "As Twitter no longer allows anonymous API access, you'll need to enter the auth details of any account to use for archiving. These will be stored in #{CONFIG_PATH} if you need to change them later."
        print "Consumer key: "
        @config[:consumer_key] = STDIN.gets.chomp
        print "Consumer secret: "
        @config[:consumer_secret] = STDIN.gets.chomp
        print "Access token: "
        @config[:oauth_token] = STDIN.gets.chomp
        print "Access secret: "
        @config[:oauth_token_secret] = STDIN.gets.chomp

        File.open(CONFIG_PATH, 'w') do |f|
          f.write(JSON.pretty_generate(@config))
        end
      end

      Twitter::REST::Client.new do |config|
        config.consumer_key = @config[:consumer_key]
        config.consumer_secret = @config[:consumer_secret]
        config.access_token = @config[:oauth_token]
        config.access_token_secret = @config[:oauth_token_secret]
      end
    end

    def initialize(username, path=nil, client=nil)
      @username = username
      @path = path || "corpus/#{username}.json"

      if File.directory?(@path)
        @path = File.join(@path, "#{username}.json")
      end

      @client = client || make_client

      if File.exists?(@path)
        @tweets = JSON.parse(File.read(@path, :encoding => 'utf-8'), symbolize_names: true)
        log "Currently #{@tweets.length} tweets for #{@username}"
      else
        @tweets = nil
        log "New archive for @#{username} at #{@path}"
      end
    end

    def sync
      retries = 0
      tweets = []
      max_id = nil

      opts = {
        count: 200,
        #include_rts: false,
        trim_user: true
      }

      opts[:since_id] = @tweets[0][:id] unless @tweets.nil?

      loop do
        opts[:max_id] = max_id unless max_id.nil?
        begin
          new = @client.user_timeline(@username, opts)
        rescue Twitter::Error::TooManyRequests
          log "Rate limit exceeded. Waiting for 5 mins before retry."
          sleep 60*5
          retry
        end
        break if new.length <= 1
        tweets += new
        log "Received #{tweets.length} new tweets"
        max_id = new.last.id
      end

      if tweets.length == 0
        log "No new tweets"
      else
        @tweets ||= []
        @tweets = tweets.map(&:attrs).each { |tw|
          tw.delete(:entities)
        } + @tweets
        File.open(@path, 'w') do |f|
          f.write(JSON.pretty_generate(@tweets))
        end
      end
    end
  end
end
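The `sync` loop above pages backwards through a timeline by repeatedly passing `max_id`; since Twitter's `max_id` is inclusive, the boundary tweet comes back at the top of each page, which is why a page of length <= 1 signals the end. A minimal self-contained sketch of that strategy, using a hypothetical `FakeTimeline` stub in place of the real `@client.user_timeline` call (names here are illustrative, not part of the gem):

```ruby
# Stand-in for a Twitter client: serves tweet ids newest-first,
# honoring an inclusive max_id like the real API.
class FakeTimeline
  def initialize(ids)
    @ids = ids.sort.reverse # newest first, like Twitter
  end

  # Return up to `count` ids with id <= max_id (all if max_id is nil)
  def user_timeline(count:, max_id: nil)
    page = max_id ? @ids.select { |id| id <= max_id } : @ids
    page.first(count)
  end
end

# Mirror of the sync pagination loop: fetch, remember the oldest id,
# and stop once a page contains only the boundary tweet.
def fetch_all(client, count)
  tweets = []
  max_id = nil
  loop do
    page = client.user_timeline(count: count, max_id: max_id)
    break if page.length <= 1
    tweets += page
    max_id = page.last
  end
  tweets.uniq # the inclusive max_id re-fetches the boundary tweet
end
```

The final `uniq` is a small refinement over the code above, which accumulates the duplicated boundary tweets as-is.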
469 lib/twitter_ebooks/bot.rb Normal file

@@ -0,0 +1,469 @@
# encoding: utf-8
require 'twitter'
require 'rufus/scheduler'

module Ebooks
  class ConfigurationError < Exception
  end

  # Represents a single reply tree of tweets
  class Conversation
    attr_reader :last_update

    # @param bot [Ebooks::Bot]
    def initialize(bot)
      @bot = bot
      @tweets = []
      @last_update = Time.now
    end

    # @param tweet [Twitter::Tweet] tweet to add
    def add(tweet)
      @tweets << tweet
      @last_update = Time.now
    end

    # Make an informed guess as to whether a user is a bot based
    # on their behavior in this conversation
    def is_bot?(username)
      usertweets = @tweets.select { |t| t.user.screen_name.downcase == username.downcase }

      if usertweets.length > 2
        if (usertweets[-1].created_at - usertweets[-3].created_at) < 10
          return true
        end
      end

      username.include?("ebooks")
    end

    # Figure out whether to keep this user in the reply prefix
    # We want to avoid spamming non-participating users
    def can_include?(username)
      @tweets.length <= 4 ||
        !@tweets.select { |t| t.user.screen_name.downcase == username.downcase }.empty?
    end
  end

  # Meta information about a tweet that we calculate for ourselves
  class TweetMeta
    # @return [Array<String>] usernames mentioned in tweet
    attr_accessor :mentions
    # @return [String] text of tweet with mentions removed
    attr_accessor :mentionless
    # @return [Array<String>] usernames to include in a reply
    attr_accessor :reply_mentions
    # @return [String] mentions to start reply with
    attr_accessor :reply_prefix
    # @return [Integer] available chars for reply
    attr_accessor :limit

    # @return [Ebooks::Bot] associated bot
    attr_accessor :bot
    # @return [Twitter::Tweet] associated tweet
    attr_accessor :tweet

    # Check whether this tweet mentions our bot
    # @return [Boolean]
    def mentions_bot?
      # To check if this is someone talking to us, ensure:
      # - The tweet mentions list contains our username
      # - The tweet is not being retweeted by somebody else
      # - Or soft-retweeted by somebody else
      @mentions.map(&:downcase).include?(@bot.username.downcase) && !@tweet.retweeted_status? && !@tweet.text.match(/([`'‘’"“”]|RT|via|by|from)\s*@/i)
    end

    # @param bot [Ebooks::Bot]
    # @param ev [Twitter::Tweet]
    def initialize(bot, ev)
      @bot = bot
      @tweet = ev

      @mentions = ev.attrs[:entities][:user_mentions].map { |x| x[:screen_name] }

      # Process mentions to figure out who to reply to
      # i.e. not self and nobody who has seen too many secondary mentions
      reply_mentions = @mentions.reject do |m|
        m.downcase == @bot.username.downcase || !@bot.conversation(ev).can_include?(m)
      end
      @reply_mentions = ([ev.user.screen_name] + reply_mentions).uniq

      @reply_prefix = @reply_mentions.map { |m| '@'+m }.join(' ') + ' '
      @limit = 140 - @reply_prefix.length

      mless = ev.text
      begin
        ev.attrs[:entities][:user_mentions].reverse.each do |entity|
          last = mless[entity[:indices][1]..-1]||''
          mless = mless[0...entity[:indices][0]] + last.strip
        end
      rescue Exception
        p ev.attrs[:entities][:user_mentions]
        p ev.text
        raise
      end
      @mentionless = mless
    end

    # Get an array of media uris in tweet.
    # @param size [String] A twitter image size to return. Supported sizes are thumb, small, medium (default), large
    # @return [Array<String>] image URIs included in tweet
    def media_uris(size_input = '')
      case size_input
      when 'thumb'
        size = ':thumb'
      when 'small'
        size = ':small'
      when 'medium'
        size = ':medium'
      when 'large'
        size = ':large'
      else
        size = ''
      end

      # Start collecting uris.
      uris = []
      if @tweet.media?
        @tweet.media.each do |each_media|
          uris << each_media.media_url.to_s + size
        end
      end

      # and that's pretty much it!
      uris
    end
  end

  class Bot
    # @return [String] OAuth consumer key for a Twitter app
    attr_accessor :consumer_key
    # @return [String] OAuth consumer secret for a Twitter app
    attr_accessor :consumer_secret
    # @return [String] OAuth access token from `ebooks auth`
    attr_accessor :access_token
    # @return [String] OAuth access secret from `ebooks auth`
    attr_accessor :access_token_secret
    # @return [Twitter::User] Twitter user object of bot
    attr_accessor :user
    # @return [String] Twitter username of bot
    attr_accessor :username
    # @return [Array<String>] list of usernames to block on contact
    attr_accessor :blacklist
    # @return [Hash{String => Ebooks::Conversation}] maps tweet ids to their conversation contexts
    attr_accessor :conversations
    # @return [Range, Integer] range of seconds to delay in delay method
    attr_accessor :delay_range

    # @return [Array] list of all defined bots
    def self.all; @@all ||= []; end

    # Fetches a bot by username
    # @param username [String]
    # @return [Ebooks::Bot]
    def self.get(username)
      all.find { |bot| bot.username == username }
    end

    # Logs info to stdout in the context of this bot
    def log(*args)
      STDOUT.print "@#{@username}: " + args.map(&:to_s).join(' ') + "\n"
      STDOUT.flush
    end

    # Initializes and configures bot
    # @param username [String] bot's username
    # @param b Block to call with new bot
    def initialize(username, &b)
      @blacklist ||= []
      @conversations ||= {}
      # Tweet ids we've already observed, to avoid duplication
      @seen_tweets ||= {}

      @username = username
      @delay_range ||= 1..6
      configure

      b.call(self) unless b.nil?
      Bot.all << self
    end

    def configure
      raise ConfigurationError, "Please override the 'configure' method for subclasses of Ebooks::Bot."
    end

    # Find or create the conversation context for this tweet
    # @param tweet [Twitter::Tweet]
    # @return [Ebooks::Conversation]
    def conversation(tweet)
      conv = if tweet.in_reply_to_status_id?
        @conversations[tweet.in_reply_to_status_id]
      end

      if conv.nil?
        conv = @conversations[tweet.id] || Conversation.new(self)
      end

      if tweet.in_reply_to_status_id?
        @conversations[tweet.in_reply_to_status_id] = conv
      end
      @conversations[tweet.id] = conv

      # Expire any old conversations to prevent memory growth
      @conversations.each do |k,v|
        if v != conv && Time.now - v.last_update > 3600
          @conversations.delete(k)
        end
      end

      conv
    end

    # @return [Twitter::REST::Client] underlying REST client from twitter gem
    def twitter
      @twitter ||= Twitter::REST::Client.new do |config|
        config.consumer_key = @consumer_key
        config.consumer_secret = @consumer_secret
        config.access_token = @access_token
        config.access_token_secret = @access_token_secret
      end
    end

    # @return [Twitter::Streaming::Client] underlying streaming client from twitter gem
    def stream
      @stream ||= Twitter::Streaming::Client.new do |config|
        config.consumer_key = @consumer_key
        config.consumer_secret = @consumer_secret
        config.access_token = @access_token
        config.access_token_secret = @access_token_secret
      end
    end

    # Calculate some meta information about a tweet relevant for replying
    # @param ev [Twitter::Tweet]
    # @return [Ebooks::TweetMeta]
    def meta(ev)
      TweetMeta.new(self, ev)
    end

    # Receive an event from the twitter stream
    # @param ev [Object] Twitter streaming event
    def receive_event(ev)
      case ev
      when Array # Initial array sent on first connection
        log "Online!"
        fire(:connect, ev)
        return
      when Twitter::DirectMessage
        return if ev.sender.id == @user.id # Don't reply to self
        log "DM from @#{ev.sender.screen_name}: #{ev.text}"
        fire(:message, ev)
      when Twitter::Tweet
        return unless ev.text # If it's not a text-containing tweet, ignore it
        return if ev.user.id == @user.id # Ignore our own tweets

        meta = meta(ev)

        if blacklisted?(ev.user.screen_name)
          log "Blocking blacklisted user @#{ev.user.screen_name}"
          twitter.block(ev.user.screen_name)
        end

        # Avoid responding to duplicate tweets
        if @seen_tweets[ev.id]
          log "Not firing event for duplicate tweet #{ev.id}"
          return
        else
          @seen_tweets[ev.id] = true
        end

        if meta.mentions_bot?
          log "Mention from @#{ev.user.screen_name}: #{ev.text}"
          conversation(ev).add(ev)
          fire(:mention, ev)
        else
          fire(:timeline, ev)
        end
      when Twitter::Streaming::Event
        case ev.name
        when :follow
          return if ev.source.id == @user.id
          log "Followed by #{ev.source.screen_name}"
          fire(:follow, ev.source)
        when :favorite, :unfavorite
          return if ev.source.id == @user.id # Ignore our own favorites
          log "@#{ev.source.screen_name} #{ev.name}d: #{ev.target_object.text}"
          fire(ev.name, ev.source, ev.target_object)
        when :user_update
          update_myself ev.source
        end
      when Twitter::Streaming::DeletedTweet
        # Pass
      else
        log ev
      end
    end

    # Updates @user and calls on_user_update.
    def update_myself(new_me=twitter.user)
      @user = new_me if @user.nil? || new_me.id == @user.id
      @username = @user.screen_name
      log 'User information updated'
      fire(:user_update)
    end

    # Configures client and fires startup event
    def prepare
      # Sanity check
      if @username.nil?
        raise ConfigurationError, "bot username cannot be nil"
      end

      if @consumer_key.nil? || @consumer_key.empty? ||
         @consumer_secret.nil? || @consumer_secret.empty?
        log "Missing consumer_key or consumer_secret. These details can be acquired by registering a Twitter app at https://apps.twitter.com/"
        exit 1
      end

      if @access_token.nil? || @access_token.empty? ||
         @access_token_secret.nil? || @access_token_secret.empty?
        log "Missing access_token or access_token_secret. Please run `ebooks auth`."
        exit 1
      end

      # Save old name
      old_name = username
      # Load user object and actual username
      update_myself
      # Warn about mismatches unless it was clearly intentional
      log "warning: bot expected to be @#{old_name} but connected to @#{username}" unless username == old_name || old_name.empty?

      fire(:startup)
    end

    # Start running user event stream
    def start
      log "starting tweet stream"

      stream.user do |ev|
        receive_event ev
      end
    end

    # Fire an event
    # @param event [Symbol] event to fire
    # @param args arguments for event handler
    def fire(event, *args)
      handler = "on_#{event}".to_sym
      if respond_to? handler
        self.send(handler, *args)
      end
    end

    # Delay an action for a variable period of time
    # @param range [Range, Integer] range of seconds to choose for delay
    def delay(range=@delay_range, &b)
      time = range.is_a?(Integer) ? range : range.to_a.sample
      sleep time
      b.call
    end

    # Check if a username is blacklisted
    # @param username [String]
    # @return [Boolean]
    def blacklisted?(username)
      @blacklist.map(&:downcase).include?(username.downcase)
    end

    # Reply to a tweet or a DM.
    # @param ev [Twitter::Tweet, Twitter::DirectMessage]
    # @param text [String] contents of reply excluding reply_prefix
    # @param opts [Hash] additional params to pass to twitter gem
    def reply(ev, text, opts={})
      opts = opts.clone

      if ev.is_a? Twitter::DirectMessage
        log "Sending DM to @#{ev.sender.screen_name}: #{text}"
        twitter.create_direct_message(ev.sender.screen_name, text, opts)
      elsif ev.is_a? Twitter::Tweet
        meta = meta(ev)

        if conversation(ev).is_bot?(ev.user.screen_name)
          log "Not replying to suspected bot @#{ev.user.screen_name}"
          return false
        end

        text = meta.reply_prefix + text unless text.match(/@#{Regexp.escape ev.user.screen_name}/i)
        log "Replying to @#{ev.user.screen_name} with: #{text}"
        tweet = twitter.update(text, opts.merge(in_reply_to_status_id: ev.id))
        conversation(tweet).add(tweet)
        tweet
      else
        raise Exception.new("Don't know how to reply to a #{ev.class}")
      end
    end

    # Favorite a tweet
    # @param tweet [Twitter::Tweet]
    def favorite(tweet)
      log "Favoriting @#{tweet.user.screen_name}: #{tweet.text}"

      begin
        twitter.favorite(tweet.id)
      rescue Twitter::Error::Forbidden
        log "Already favorited: #{tweet.user.screen_name}: #{tweet.text}"
      end
    end

    # Retweet a tweet
    # @param tweet [Twitter::Tweet]
    def retweet(tweet)
      log "Retweeting @#{tweet.user.screen_name}: #{tweet.text}"

      begin
        twitter.retweet(tweet.id)
      rescue Twitter::Error::Forbidden
        log "Already retweeted: #{tweet.user.screen_name}: #{tweet.text}"
      end
    end

    # Follow a user
    # @param user [String] username or user id
    def follow(user, *args)
      log "Following #{user}"
      twitter.follow(user, *args)
    end

    # Unfollow a user
    # @param user [String] username or user id
    def unfollow(user, *args)
      log "Unfollowing #{user}"
      twitter.unfollow(user, *args)
    end

    # Tweet something
    # @param text [String]
    def tweet(text, *args)
      log "Tweeting '#{text}'"
      twitter.update(text, *args)
    end

    # Get a scheduler for this bot
    # @return [Rufus::Scheduler]
    def scheduler
      @scheduler ||= Rufus::Scheduler.new
    end

    # Tweet some text with an image
    # @param txt [String]
    # @param pic [String] filename
    def pictweet(txt, pic, *args)
      log "Tweeting #{txt.inspect} - #{pic} #{args}"
      twitter.update_with_media(txt, File.new(pic), *args)
    end
  end
end
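The `fire` method above implements the bot's whole event API: `fire(:mention, ...)` converts the event symbol into an `on_mention` method name and dispatches to it only if the subclass defined one. A minimal self-contained sketch of that pattern (the `MiniBot`/`GreeterBot` names are illustrative, not part of the gem):

```ruby
# Base class with the same respond_to?/send dispatch as Ebooks::Bot#fire.
class MiniBot
  attr_reader :events

  def initialize
    @events = []
  end

  # Dispatch :mention -> on_mention, :timeline -> on_timeline, etc.
  # Undefined handlers are silently skipped.
  def fire(event, *args)
    handler = "on_#{event}".to_sym
    send(handler, *args) if respond_to?(handler)
  end
end

# Subclasses opt into events just by defining the matching method.
class GreeterBot < MiniBot
  def on_mention(user)
    @events << "replying to #{user}"
  end
end

bot = GreeterBot.new
bot.fire(:mention, "alice")  # dispatched to on_mention
bot.fire(:timeline, "bob")   # no on_timeline defined; ignored
```

This is why `configure`, `on_startup`, `on_mention`, and friends can all be plain method overrides rather than registered callbacks.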
299 lib/twitter_ebooks/model.rb Normal file

@@ -0,0 +1,299 @@
#!/usr/bin/env ruby
# encoding: utf-8

require 'json'
require 'set'
require 'digest/md5'
require 'csv'

module Ebooks
  class Model
    # @return [Array<String>]
    # An array of unique tokens. This is the main source of actual strings
    # in the model. Manipulation of a token is done using its index
    # in this array, which we call a "tiki"
    attr_accessor :tokens

    # @return [Array<Array<Integer>>]
    # Sentences represented by arrays of tikis
    attr_accessor :sentences

    # @return [Array<Array<Integer>>]
    # Sentences derived from Twitter mentions
    attr_accessor :mentions

    # @return [Array<String>]
    # The top 200 most important keywords, in descending order
    attr_accessor :keywords

    # Generate a new model from a corpus file
    # @param path [String]
    # @return [Ebooks::Model]
    def self.consume(path)
      Model.new.consume(path)
    end

    # Generate a new model from multiple corpus files
    # @param paths [Array<String>]
    # @return [Ebooks::Model]
    def self.consume_all(paths)
      Model.new.consume_all(paths)
    end

    # Load a saved model
    # @param path [String]
    # @return [Ebooks::Model]
    def self.load(path)
      model = Model.new
      model.instance_eval do
        props = Marshal.load(File.open(path, 'rb') { |f| f.read })
        @tokens = props[:tokens]
        @sentences = props[:sentences]
        @mentions = props[:mentions]
        @keywords = props[:keywords]
      end
      model
    end

    # Save model to a file
    # @param path [String]
    def save(path)
      File.open(path, 'wb') do |f|
        f.write(Marshal.dump({
          tokens: @tokens,
          sentences: @sentences,
          mentions: @mentions,
          keywords: @keywords
        }))
      end
      self
    end

    def initialize
      @tokens = []

      # Reverse lookup tiki by token, for faster generation
      @tikis = {}
    end

    # Reverse lookup a token index from a token
    # @param token [String]
    # @return [Integer]
    def tikify(token)
      @tikis[token] or (@tokens << token and @tikis[token] = @tokens.length-1)
    end

    # Convert a body of text into arrays of tikis
    # @param text [String]
    # @return [Array<Array<Integer>>]
    def mass_tikify(text)
      sentences = NLP.sentences(text)

      sentences.map do |s|
        tokens = NLP.tokenize(s).reject do |t|
          # Don't include usernames/urls as tokens
          t.include?('@') || t.include?('http')
        end

        tokens.map { |t| tikify(t) }
      end
    end

    # Consume a corpus into this model
    # @param path [String]
    def consume(path)
      content = File.read(path, :encoding => 'utf-8')

      if path.split('.')[-1] == "json"
        log "Reading json corpus from #{path}"
        lines = JSON.parse(content).map do |tweet|
          tweet['text']
        end
      elsif path.split('.')[-1] == "csv"
        log "Reading CSV corpus from #{path}"
        content = CSV.parse(content)
        header = content.shift
        text_col = header.index('text')
        lines = content.map do |tweet|
          tweet[text_col]
        end
      else
        log "Reading plaintext corpus from #{path} (if this is a json or csv file, please rename the file with an extension and reconsume)"
        lines = content.split("\n")
      end

      consume_lines(lines)
    end

    # Consume a sequence of lines
    # @param lines [Array<String>]
    def consume_lines(lines)
      log "Removing commented lines and sorting mentions"

      statements = []
      mentions = []
      lines.each do |l|
        next if l.start_with?('#') # Remove commented lines
        next if l.include?('RT') || l.include?('MT') # Remove soft retweets

        if l.include?('@')
          mentions << NLP.normalize(l)
        else
          statements << NLP.normalize(l)
        end
      end

      text = statements.join("\n")
      mention_text = mentions.join("\n")

      lines = nil; statements = nil; mentions = nil # Allow garbage collection

      log "Tokenizing #{text.count("\n")} statements and #{mention_text.count("\n")} mentions"

      @sentences = mass_tikify(text)
      @mentions = mass_tikify(mention_text)

      log "Ranking keywords"
      @keywords = NLP.keywords(text).top(200).map(&:to_s)

      self
    end

    # Consume multiple corpuses into this model
    # @param paths [Array<String>]
    def consume_all(paths)
      lines = []
      paths.each do |path|
        content = File.read(path, :encoding => 'utf-8')

        if path.split('.')[-1] == "json"
          log "Reading json corpus from #{path}"
          l = JSON.parse(content).map do |tweet|
            tweet['text']
          end
          lines.concat(l)
        elsif path.split('.')[-1] == "csv"
          log "Reading CSV corpus from #{path}"
          content = CSV.parse(content)
          header = content.shift
          text_col = header.index('text')
          l = content.map do |tweet|
            tweet[text_col]
          end
          lines.concat(l)
        else
          log "Reading plaintext corpus from #{path}"
          l = content.split("\n")
          lines.concat(l)
        end
      end
      consume_lines(lines)
    end

    # Correct encoding issues in generated text
    # @param text [String]
    # @return [String]
    def fix(text)
      NLP.htmlentities.decode text
    end

    # Check if an array of tikis comprises a valid tweet
    # @param tikis [Array<Integer>]
    # @param limit [Integer] how many chars we have left
    def valid_tweet?(tikis, limit)
      tweet = NLP.reconstruct(tikis, @tokens)
      tweet.length <= limit && !NLP.unmatched_enclosers?(tweet)
    end

    # Generate some text
    # @param limit [Integer] available characters
    # @param generator [SuffixGenerator, nil]
    # @param retry_limit [Integer] how many times to retry on invalid tweet
    # @return [String]
    def make_statement(limit=140, generator=nil, retry_limit=10)
      responding = !generator.nil?
      generator ||= SuffixGenerator.build(@sentences)

      retries = 0
      tweet = ""

      while (tikis = generator.generate(3, :bigrams)) do
        next if tikis.length <= 3 && !responding
        break if valid_tweet?(tikis, limit)

        retries += 1
        break if retries >= retry_limit
      end

      if verbatim?(tikis) && tikis.length > 3 # We made a verbatim tweet by accident
        while (tikis = generator.generate(3, :unigrams)) do
          break if valid_tweet?(tikis, limit) && !verbatim?(tikis)

          retries += 1
          break if retries >= retry_limit
        end
      end

      tweet = NLP.reconstruct(tikis, @tokens)

      if retries >= retry_limit
        log "Unable to produce valid non-verbatim tweet; using \"#{tweet}\""
      end

      fix tweet
    end

    # Test if a sentence has been copied verbatim from original
    # @param tikis [Array<Integer>]
    # @return [Boolean]
    def verbatim?(tikis)
      @sentences.include?(tikis) || @mentions.include?(tikis)
    end

    # Finds relevant and slightly relevant tokenized sentences to input
    # comparing non-stopword token overlaps
    # @param sentences [Array<Array<Integer>>]
    # @param input [String]
    # @return [Array<Array<Array<Integer>>, Array<Array<Integer>>>]
    def find_relevant(sentences, input)
      relevant = []
      slightly_relevant = []

      tokenized = NLP.tokenize(input).map(&:downcase)

      sentences.each do |sent|
        tokenized.each do |token|
          if sent.map { |tiki| @tokens[tiki].downcase }.include?(token)
            relevant << sent unless NLP.stopword?(token)
            slightly_relevant << sent
          end
        end
      end

      [relevant, slightly_relevant]
    end

    # Generates a response by looking for related sentences
    # in the corpus and building a smaller generator from these
    # @param input [String]
    # @param limit [Integer] characters available for response
    # @param sentences [Array<Array<Integer>>]
    # @return [String]
    def make_response(input, limit=140, sentences=@mentions)
      # Prefer mentions
      relevant, slightly_relevant = find_relevant(sentences, input)

      if relevant.length >= 3
        generator = SuffixGenerator.build(relevant)
        make_statement(limit, generator)
      elsif slightly_relevant.length >= 5
        generator = SuffixGenerator.build(slightly_relevant)
        make_statement(limit, generator)
      elsif sentences.equal?(@mentions)
        make_response(input, limit, @sentences)
      else
        make_statement(limit)
      end
    end
  end
end
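The "tiki" scheme in `tikify`/`mass_tikify` above is string interning: each distinct token is stored once in `@tokens`, sentences become arrays of small integers, and `@tikis` gives O(1) reverse lookup. A self-contained sketch of the data structure (the `TokenInterner` name is illustrative; `reconstruct` here just joins with spaces, unlike the gem's `NLP.reconstruct`, which consults punctuation rules):

```ruby
# Intern tokens into an array of unique strings and refer to them by index.
class TokenInterner
  attr_reader :tokens

  def initialize
    @tokens = []
    @tikis = {} # token => index, for O(1) reverse lookup
  end

  # Return the existing index for a token, or append it and return
  # the new index. Index 0 is truthy in Ruby, so ||= is safe here.
  def tikify(token)
    @tikis[token] ||= (@tokens << token; @tokens.length - 1)
  end

  # Simplified rebuild: map indices back to strings and space-join.
  def reconstruct(tikis)
    tikis.map { |i| @tokens[i] }.join(' ')
  end
end

interner = TokenInterner.new
sentence = %w[the cat saw the dog]
tikis = sentence.map { |t| interner.tikify(t) }
# repeated "the" maps to the same index, so only 4 tokens are stored
```

Representing sentences this way is what keeps marshalled models small and makes `verbatim?`'s array-equality check cheap.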
195
lib/twitter_ebooks/nlp.rb
Normal file
195
lib/twitter_ebooks/nlp.rb
Normal file
|
@ -0,0 +1,195 @@
|
|||
# encoding: utf-8
|
||||
require 'fast-stemmer'
|
||||
require 'highscore'
|
||||
|
||||
module Ebooks
|
||||
module NLP
|
||||
# We deliberately limit our punctuation handling to stuff we can do consistently
|
||||
# It'll just be a part of another token if we don't split it out, and that's fine
|
||||
PUNCTUATION = ".?!,"
|
||||
|
||||
# Lazy-load NLP libraries and resources
|
||||
# Some of this stuff is pretty heavy and we don't necessarily need
|
||||
# to be using it all of the time
|
||||
|
||||
# Lazily loads an array of stopwords
|
||||
# Stopwords are common English words that should often be ignored
|
||||
# @return [Array<String>]
|
||||
def self.stopwords
|
||||
@stopwords ||= File.read(File.join(DATA_PATH, 'stopwords.txt')).split
|
||||
end
|
||||
|
||||
# Lazily loads an array of known English nouns
|
||||
# @return [Array<String>]
|
||||
def self.nouns
|
||||
@nouns ||= File.read(File.join(DATA_PATH, 'nouns.txt')).split
|
||||
end
|
||||
|
||||
# Lazily loads an array of known English adjectives
|
||||
# @return [Array<String>]
|
||||
def self.adjectives
|
||||
@adjectives ||= File.read(File.join(DATA_PATH, 'adjectives.txt')).split
|
||||
end
|
||||
|
||||
# Lazily load part-of-speech tagging library
|
||||
# This can determine whether a word is being used as a noun/adjective/verb
|
||||
# @return [EngTagger]
|
||||
def self.tagger
|
||||
require 'engtagger'
|
||||
@tagger ||= EngTagger.new
|
||||
end
|
||||
|
||||
# Lazily load HTML entity decoder
|
||||
# @return [HTMLEntities]
|
||||
def self.htmlentities
|
||||
require 'htmlentities'
|
||||
@htmlentities ||= HTMLEntities.new
|
||||
end
|
||||
|
||||
### Utility functions
|
||||
|
||||
# Normalize some strange unicode punctuation variants
|
||||
# @param text [String]
|
||||
# @return [String]
|
||||
def self.normalize(text)
|
||||
htmlentities.decode text.gsub('“', '"').gsub('”', '"').gsub('’', "'").gsub('…', '...')
|
||||
end
|
||||
|
||||
# Split text into sentences
|
||||
# We use ad hoc approach because fancy libraries do not deal
|
||||
# especially well with tweet formatting, and we can fake solving
|
||||
# the quote problem during generation
|
||||
# @param text [String]
|
||||
# @return [Array<String>]
|
||||
def self.sentences(text)
|
||||
text.split(/\n+|(?<=[.?!])\s+/)
|
||||
end
|
||||
|
||||
# Split a sentence into word-level tokens
|
||||
# As above, this is ad hoc because tokenization libraries
|
||||
# do not behave well wrt. things like emoticons and timestamps
|
||||
# @param sentence [String]
|
||||
# @return [Array<String>]
|
||||
def self.tokenize(sentence)
|
||||
regex = /\s+|(?<=[#{PUNCTUATION}]\s)(?=[a-zA-Z])|(?<=[a-zA-Z])(?=[#{PUNCTUATION}]+\s)/
|
||||
sentence.split(regex)
|
||||
end
|
    # Get the 'stem' form of a word e.g. 'cats' -> 'cat'
    # @param word [String]
    # @return [String]
    def self.stem(word)
      Stemmer::stem_word(word.downcase)
    end

    # Use the highscore gem to find interesting keywords in a corpus
    # @param text [String]
    # @return [Highscore::Keywords]
    def self.keywords(text)
      # Preprocess to remove stopwords (highscore's blacklist is v. slow)
      text = NLP.tokenize(text).reject { |t| stopword?(t) }.join(' ')

      text = Highscore::Content.new(text)

      text.configure do
        #set :multiplier, 2
        #set :upper_case, 3
        #set :long_words, 2
        #set :long_words_threshold, 15
        #set :vowels, 1 # => default: 0 = not considered
        #set :consonants, 5 # => default: 0 = not considered
        #set :ignore_case, true # => default: false
        set :word_pattern, /(?<!@)(?<=\s)[\w']+/ # => default: /\w+/
        #set :stemming, true # => default: false
      end

      text.keywords
    end

    # Builds a proper sentence from a list of tikis
    # @param tikis [Array<Integer>]
    # @param tokens [Array<String>]
    # @return [String]
    def self.reconstruct(tikis, tokens)
      text = ""
      last_token = nil
      tikis.each do |tiki|
        next if tiki == INTERIM
        token = tokens[tiki]
        text += ' ' if last_token && space_between?(last_token, token)
        text += token
        last_token = token
      end
      text
    end

    # Determine if we need to insert a space between two tokens
    # @param token1 [String]
    # @param token2 [String]
    # @return [Boolean]
    def self.space_between?(token1, token2)
      p1 = self.punctuation?(token1)
      p2 = self.punctuation?(token2)

      if p1 && p2 # "foo?!"
        false
      elsif !p1 && p2 # "foo."
        false
      elsif p1 && !p2 # "foo. rah"
        true
      else # "foo rah"
        true
      end
    end
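As a quick illustration of how reconstruct and space_between? cooperate, here is a self-contained sketch. It is simplified for illustration (it takes tokens directly rather than tiki indices, and PUNCT is an assumed stand-in for the module's PUNCTUATION constant). Note that the four-way branch in space_between? reduces to "no space before a punctuation token":

```ruby
# Simplified sketch of reconstruct/space_between?; PUNCT is an assumed
# stand-in for the module's PUNCTUATION constant
PUNCT = '.?!,'

def punctuation?(token)
  token.chars.all? { |c| PUNCT.include?(c) }
end

# Equivalent to the four-way branch: space unless the next token is punctuation
def space_between?(token1, token2)
  !punctuation?(token2)
end

def reconstruct(tokens)
  text = ''
  last = nil
  tokens.each do |t|
    text += ' ' if last && space_between?(last, t)
    text += t
    last = t
  end
  text
end

puts reconstruct(['hello', ',', 'world', '!'])  # => hello, world!
```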
    # Is this token comprised of punctuation?
    # @param token [String]
    # @return [Boolean]
    def self.punctuation?(token)
      (token.chars.to_set - PUNCTUATION.chars.to_set).empty?
    end

    # Is this token a stopword?
    # @param token [String]
    # @return [Boolean]
    def self.stopword?(token)
      @stopword_set ||= stopwords.map(&:downcase).to_set
      @stopword_set.include?(token.downcase)
    end

    # Determine if a sample of text contains unmatched brackets or quotes
    # This is one of the more frequent and noticeable failure modes for
    # the generator; we can just tell it to retry
    # @param text [String]
    # @return [Boolean]
    def self.unmatched_enclosers?(text)
      enclosers = ['**', '""', '()', '[]', '``', "''"]
      enclosers.each do |pair|
        starter = Regexp.new('(\W|^)' + Regexp.escape(pair[0]) + '\S')
        ender = Regexp.new('\S' + Regexp.escape(pair[1]) + '(\W|$)')

        opened = 0

        tokenize(text).each do |token|
          opened += 1 if token.match(starter)
          opened -= 1 if token.match(ender)

          return true if opened < 0 # Too many ends!
        end

        return true if opened != 0 # Mismatch somewhere.
      end

      false
    end

    # Determine if a2 is a subsequence of a1
    # @param a1 [Array]
    # @param a2 [Array]
    # @return [Boolean]
    def self.subseq?(a1, a2)
      !a1.each_index.find do |i|
        a1[i...i + a2.length] == a2
      end.nil?
    end
  end
end
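The subsequence check is easiest to see with plain arrays. Here is a self-contained copy for illustration; note that "subsequence" here means a contiguous run, not a scattered one:

```ruby
# Self-contained copy of NLP.subseq? for illustration
def subseq?(a1, a2)
  # True if a2 appears as a contiguous run somewhere inside a1
  !a1.each_index.find { |i| a1[i...i + a2.length] == a2 }.nil?
end

p subseq?([5, 6, 7, 8], [6, 7])  # => true
p subseq?([5, 6, 7, 8], [7, 6])  # => false
```

SuffixGenerator calls this in both directions to reject generated token sequences that merely reproduce a segment of a verbatim corpus sentence.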
95  lib/twitter_ebooks/suffix.rb  Normal file
@@ -0,0 +1,95 @@
# encoding: utf-8

module Ebooks
  # This generator uses data identical to a Markov model, but
  # instead of making a chain by looking up bigrams it uses the
  # positions to randomly replace suffixes in one sentence with
  # matching suffixes in another
  class SuffixGenerator
    # Build a generator from a corpus of tikified sentences
    # @param sentences [Array<Array<Integer>>]
    # @return [SuffixGenerator]
    def self.build(sentences)
      SuffixGenerator.new(sentences)
    end

    def initialize(sentences)
      @sentences = sentences.reject { |s| s.length < 2 }
      @unigrams = {}
      @bigrams = {}

      @sentences.each_with_index do |tikis, i|
        last_tiki = INTERIM
        tikis.each_with_index do |tiki, j|
          @unigrams[last_tiki] ||= []
          @unigrams[last_tiki] << [i, j]

          @bigrams[last_tiki] ||= {}
          @bigrams[last_tiki][tiki] ||= []

          if j == tikis.length - 1 # Mark sentence endings
            @unigrams[tiki] ||= []
            @unigrams[tiki] << [i, INTERIM]
            @bigrams[last_tiki][tiki] << [i, INTERIM]
          else
            @bigrams[last_tiki][tiki] << [i, j + 1]
          end

          last_tiki = tiki
        end
      end

      self
    end

    # Generate a recombined sequence of tikis
    # @param passes [Integer] number of times to recombine
    # @param n [Symbol] :unigrams or :bigrams (affects how conservative the model is)
    # @return [Array<Integer>]
    def generate(passes = 5, n = :unigrams)
      index = rand(@sentences.length)
      tikis = @sentences[index]
      used = [index] # Sentences we've already used
      verbatim = [tikis] # Verbatim sentences to avoid reproducing

      0.upto(passes - 1) do
        varsites = {} # Map bigram start site => next tiki alternatives

        tikis.each_with_index do |tiki, i|
          next_tiki = tikis[i + 1]
          break if next_tiki.nil?

          alternatives = (n == :unigrams) ? @unigrams[next_tiki] : @bigrams[tiki][next_tiki]
          # Filter out suffixes from previous sentences
          alternatives.reject! { |a| a[1] == INTERIM || used.include?(a[0]) }
          varsites[i] = alternatives unless alternatives.empty?
        end

        variant = nil
        varsites.to_a.shuffle.each do |site|
          start = site[0]

          site[1].shuffle.each do |alt|
            verbatim << @sentences[alt[0]]
            suffix = @sentences[alt[0]][alt[1]..-1]
            potential = tikis[0..start + 1] + suffix

            # Ensure we're not just rebuilding some segment of another sentence
            unless verbatim.find { |v| NLP.subseq?(v, potential) || NLP.subseq?(potential, v) }
              used << alt[0]
              variant = potential
              break
            end
          end

          break if variant
        end

        tikis = variant if variant
      end

      tikis
    end
  end
end
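To make the position index concrete: given a hypothetical two-sentence corpus of integer tikis, initialize builds lookup tables like the one sketched below (simplified to cover only the unigram table; the numbers are made up for illustration).

```ruby
INTERIM = -1 # sentence-boundary marker, as in the gem

sentences = [[10, 11, 12], [20, 11, 21]]
unigrams = Hash.new { |h, k| h[k] = [] }

sentences.each_with_index do |tikis, i|
  last_tiki = INTERIM
  tikis.each_with_index do |tiki, j|
    # Record every (sentence, position) pair that can follow last_tiki
    unigrams[last_tiki] << [i, j]
    last_tiki = tiki
  end
end

# Tiki 11 occurs in both sentences, so both continuation points are indexed;
# generate() can splice either suffix onto any sequence currently ending in 11
p unigrams[11]  # => [[0, 2], [1, 2]]
```

This is why the generator can swap suffixes between sentences: any shared tiki is a splice point whose continuations are known across the whole corpus.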
3  lib/twitter_ebooks/version.rb  Normal file
@@ -0,0 +1,3 @@
module Ebooks
  VERSION = "3.1.0"
end
4  skeleton/Gemfile  Normal file
@@ -0,0 +1,4 @@
source 'http://rubygems.org'
ruby '{{RUBY_VERSION}}'

gem 'twitter_ebooks'
1  skeleton/Procfile  Normal file
@@ -0,0 +1 @@
worker: bundle exec ebooks start
60  skeleton/bots.rb  Normal file
@@ -0,0 +1,60 @@
require 'twitter_ebooks'

# This is an example bot definition with event handlers commented out
# You can define and instantiate as many bots as you like

class MyBot < Ebooks::Bot
  # Configuration here applies to all MyBots
  def configure
    # Consumer details come from registering an app at https://dev.twitter.com/
    # Once you have consumer details, use "ebooks auth" for new access tokens
    self.consumer_key = '' # Your app consumer key
    self.consumer_secret = '' # Your app consumer secret

    # Users to block instead of interacting with
    self.blacklist = ['tnietzschequote']

    # Range in seconds to randomize delay when bot.delay is called
    self.delay_range = 1..6
  end

  def on_startup
    scheduler.every '24h' do
      # Tweet something every 24 hours
      # See https://github.com/jmettraux/rufus-scheduler
      # tweet("hi")
      # pictweet("hi", "cuteselfie.jpg")
    end
  end

  def on_message(dm)
    # Reply to a DM
    # reply(dm, "secret secrets")
  end

  def on_follow(user)
    # Follow a user back
    # follow(user.screen_name)
  end

  def on_mention(tweet)
    # Reply to a mention
    # reply(tweet, "oh hullo")
  end

  def on_timeline(tweet)
    # Reply to a tweet in the bot's timeline
    # reply(tweet, "nice tweet")
  end

  def on_favorite(user, tweet)
    # Follow user who just favorited bot's tweet
    # follow(user.screen_name)
  end
end

# Make a MyBot and attach it to an account
MyBot.new("{{BOT_NAME}}") do |bot|
  bot.access_token = "" # Token connecting the app to this account
  bot.access_token_secret = "" # Secret connecting the app to this account
end
0  skeleton/corpus/.gitignore  vendored  Normal file (empty)
1  skeleton/gitignore  Normal file
@@ -0,0 +1 @@
corpus/
0  skeleton/model/.gitignore  vendored  Normal file (empty)
216  spec/bot_spec.rb  Normal file
@@ -0,0 +1,216 @@
require 'spec_helper'
require 'memory_profiler'
require 'tempfile'
require 'timecop'

class TestBot < Ebooks::Bot
  attr_accessor :twitter

  def configure
  end

  def on_message(dm)
    reply dm, "echo: #{dm.text}"
  end

  def on_mention(tweet)
    reply tweet, "echo: #{meta(tweet).mentionless}"
  end

  def on_timeline(tweet)
    reply tweet, "fine tweet good sir"
  end
end

module Ebooks::Test
  # Generates a random twitter id
  # Or a non-random one, given a string.
  def twitter_id(seed = nil)
    if seed.nil?
      (rand * 10**18).to_i
    else
      id = 1
      seed.downcase.each_byte do |byte|
        id *= byte / 10
      end
      id
    end
  end

  # Creates a mock direct message
  # @param username User sending the DM
  # @param text DM content
  def mock_dm(username, text)
    Twitter::DirectMessage.new(id: twitter_id,
                               sender: { id: twitter_id(username), screen_name: username },
                               text: text)
  end

  # Creates a mock tweet
  # @param username User sending the tweet
  # @param text Tweet content
  def mock_tweet(username, text, extra = {})
    mentions = text.split.find_all { |x| x.start_with?('@') }
    tweet = Twitter::Tweet.new({
      id: twitter_id,
      in_reply_to_status_id: 'mock-link',
      user: { id: twitter_id(username), screen_name: username },
      text: text,
      created_at: Time.now.to_s,
      entities: {
        user_mentions: mentions.map { |m|
          { screen_name: m.split('@')[1],
            indices: [text.index(m), text.index(m) + m.length] }
        }
      }
    }.merge!(extra))
    tweet
  end

  # Creates a mock user
  def mock_user(username)
    Twitter::User.new(id: twitter_id(username), screen_name: username)
  end

  def twitter_spy(bot)
    twitter = spy("twitter")
    allow(twitter).to receive(:update).and_return(mock_tweet(bot.username, "test tweet"))
    allow(twitter).to receive(:user).with(no_args).and_return(mock_user(bot.username))
    twitter
  end

  def simulate(bot, &b)
    bot.twitter = twitter_spy(bot)
    bot.update_myself # Usually called in prepare
    b.call
  end

  def expect_direct_message(bot, content)
    expect(bot.twitter).to have_received(:create_direct_message).with(anything(), content, {})
    bot.twitter = twitter_spy(bot)
  end

  def expect_tweet(bot, content)
    expect(bot.twitter).to have_received(:update).with(content, anything())
    bot.twitter = twitter_spy(bot)
  end
end

describe Ebooks::Bot do
  include Ebooks::Test
  let(:bot) { TestBot.new('Test_Ebooks') }

  before { Timecop.freeze }
  after { Timecop.return }

  it "responds to dms" do
    simulate(bot) do
      bot.receive_event(mock_dm("m1sp", "this is a dm"))
      expect_direct_message(bot, "echo: this is a dm")
    end
  end

  it "ignores its own dms" do
    simulate(bot) do
      expect(bot).to_not receive(:on_message)
      bot.receive_event(mock_dm("Test_Ebooks", "why am I talking to myself"))
    end
  end

  it "responds to mentions" do
    simulate(bot) do
      bot.receive_event(mock_tweet("m1sp", "@test_ebooks this is a mention"))
      expect_tweet(bot, "@m1sp echo: this is a mention")
    end
  end

  it "ignores its own mentions" do
    simulate(bot) do
      expect(bot).to_not receive(:on_mention)
      expect(bot).to_not receive(:on_timeline)
      bot.receive_event(mock_tweet("Test_Ebooks", "@m1sp i think that @test_ebooks is best bot"))
    end
  end

  it "responds to timeline tweets" do
    simulate(bot) do
      bot.receive_event(mock_tweet("m1sp", "some excellent tweet"))
      expect_tweet(bot, "@m1sp fine tweet good sir")
    end
  end

  it "ignores its own timeline tweets" do
    simulate(bot) do
      expect(bot).to_not receive(:on_timeline)
      bot.receive_event(mock_tweet("Test_Ebooks", "pudding is cute"))
    end
  end

  it "links tweets to conversations correctly" do
    tweet1 = mock_tweet("m1sp", "tweet 1", id: 1, in_reply_to_status_id: nil)
    tweet2 = mock_tweet("m1sp", "tweet 2", id: 2, in_reply_to_status_id: 1)
    tweet3 = mock_tweet("m1sp", "tweet 3", id: 3, in_reply_to_status_id: nil)

    bot.conversation(tweet1).add(tweet1)
    expect(bot.conversation(tweet2)).to eq(bot.conversation(tweet1))

    bot.conversation(tweet2).add(tweet2)
    expect(bot.conversation(tweet3)).to_not eq(bot.conversation(tweet2))
  end

  it "stops mentioning people after a certain limit" do
    simulate(bot) do
      bot.receive_event(mock_tweet("spammer", "@test_ebooks @m1sp 1"))
      expect_tweet(bot, "@spammer @m1sp echo: 1")

      Timecop.travel(Time.now + 60)
      bot.receive_event(mock_tweet("spammer", "@test_ebooks @m1sp 2"))
      expect_tweet(bot, "@spammer @m1sp echo: 2")

      Timecop.travel(Time.now + 60)
      bot.receive_event(mock_tweet("spammer", "@test_ebooks @m1sp 3"))
      expect_tweet(bot, "@spammer echo: 3")
    end
  end

  it "doesn't stop mentioning them if they reply" do
    simulate(bot) do
      bot.receive_event(mock_tweet("spammer", "@test_ebooks @m1sp 4"))
      expect_tweet(bot, "@spammer @m1sp echo: 4")

      Timecop.travel(Time.now + 60)
      bot.receive_event(mock_tweet("m1sp", "@spammer @test_ebooks 5"))
      expect_tweet(bot, "@m1sp @spammer echo: 5")

      Timecop.travel(Time.now + 60)
      bot.receive_event(mock_tweet("spammer", "@test_ebooks @m1sp 6"))
      expect_tweet(bot, "@spammer @m1sp echo: 6")
    end
  end

  it "doesn't get into infinite bot conversations" do
    simulate(bot) do
      bot.receive_event(mock_tweet("spammer", "@test_ebooks @m1sp 7"))
      expect_tweet(bot, "@spammer @m1sp echo: 7")

      Timecop.travel(Time.now + 2)
      bot.receive_event(mock_tweet("spammer", "@test_ebooks @m1sp 8"))
      expect_tweet(bot, "@spammer @m1sp echo: 8")

      Timecop.travel(Time.now + 2)
      bot.receive_event(mock_tweet("spammer", "@test_ebooks @m1sp 9"))
      expect(bot.twitter).to_not have_received(:update)
    end
  end

  it "blocks blacklisted users on contact" do
    simulate(bot) do
      bot.blacklist = ["spammer"]
      bot.receive_event(mock_tweet("spammer", "@test_ebooks @m1sp 7"))
      expect(bot.twitter).to have_received(:block).with("spammer")
    end
  end
end
203945  spec/data/0xabad1dea.json  Normal file
File diff suppressed because it is too large
6157  spec/data/0xabad1dea.model  Normal file
File diff suppressed because it is too large
37  spec/memprof.rb  Normal file
@@ -0,0 +1,37 @@
require 'objspace'

module MemoryUsage
  MemoryReport = Struct.new(:total_memsize)

  def self.full_gc
    GC.start(full_mark: true)
  end

  def self.report(&block)
    rvalue_size = GC::INTERNAL_CONSTANTS[:RVALUE_SIZE]

    full_gc
    GC.disable

    total_memsize = 0

    generation = nil
    ObjectSpace.trace_object_allocations do
      generation = GC.count
      block.call
    end

    ObjectSpace.each_object do |obj|
      next unless generation == ObjectSpace.allocation_generation(obj)
      memsize = ObjectSpace.memsize_of(obj) + rvalue_size
      # compensate for API bug
      memsize = rvalue_size if memsize > 100_000_000_000
      total_memsize += memsize
    end

    GC.enable
    full_gc

    return MemoryReport.new(total_memsize)
  end
end
74  spec/model_spec.rb  Normal file
@@ -0,0 +1,74 @@
require 'spec_helper'
require 'memory_profiler'
require 'tempfile'

def Process.rss; `ps -o rss= -p #{Process.pid}`.chomp.to_i; end

describe Ebooks::Model do
  describe 'making tweets' do
    before(:all) { @model = Ebooks::Model.consume(path("data/0xabad1dea.json")) }

    it "generates a tweet" do
      s = @model.make_statement
      expect(s.length).to be <= 140
      puts s
    end

    it "generates an appropriate response" do
      s = @model.make_response("hi")
      expect(s.length).to be <= 140
      expect(s.downcase).to include("hi")
      puts s
    end
  end

  it "consumes, saves and loads models correctly" do
    model = nil

    report = MemoryUsage.report do
      model = Ebooks::Model.consume(path("data/0xabad1dea.json"))
    end
    expect(report.total_memsize).to be < 200_000_000

    file = Tempfile.new("0xabad1dea")
    model.save(file.path)

    report2 = MemoryUsage.report do
      model = Ebooks::Model.load(file.path)
    end
    expect(report2.total_memsize).to be < 3_000_000

    expect(model.tokens[0]).to be_a String
    expect(model.sentences[0][0]).to be_a Fixnum
    expect(model.mentions[0][0]).to be_a Fixnum
    expect(model.keywords[0]).to be_a String

    puts "0xabad1dea.model uses #{report2.total_memsize} bytes in memory"
  end

  describe '.consume' do
    it 'interprets lines with @ as mentions' do
      file = Tempfile.new('mentions')
      file.write('@m1spy hello!')
      file.close

      model = Ebooks::Model.consume(file.path)
      expect(model.sentences.count).to eq 0
      expect(model.mentions.count).to eq 1

      file.unlink
    end

    it 'interprets lines without @ as statements' do
      file = Tempfile.new('statements')
      file.write('hello!')
      file.close

      model = Ebooks::Model.consume(file.path)
      expect(model.mentions.count).to eq 0
      expect(model.sentences.count).to eq 1

      file.unlink
    end
  end
end
6  spec/spec_helper.rb  Normal file
@@ -0,0 +1,6 @@
require 'twitter_ebooks'
require_relative 'memprof'

def path(relpath)
  File.join(File.dirname(__FILE__), relpath)
end
34  twitter_ebooks.gemspec  Normal file
@@ -0,0 +1,34 @@
# -*- encoding: utf-8 -*-
require File.expand_path('../lib/twitter_ebooks/version', __FILE__)

Gem::Specification.new do |gem|
  gem.authors       = ["Jaiden Mispy"]
  gem.email         = ["^_^@mispy.me"]
  gem.description   = %q{Markov chains for all your friends~}
  gem.summary       = %q{Markov chains for all your friends~}
  gem.homepage      = ""

  gem.files         = `git ls-files`.split($\)
  gem.executables   = gem.files.grep(%r{^bin/}).map { |f| File.basename(f) }
  gem.test_files    = gem.files.grep(%r{^(test|spec|features)/})
  gem.name          = "twitter_ebooks"
  gem.require_paths = ["lib"]
  gem.version       = Ebooks::VERSION

  gem.add_development_dependency 'rspec'
  gem.add_development_dependency 'rspec-mocks'
  gem.add_development_dependency 'memory_profiler'
  gem.add_development_dependency 'timecop'
  gem.add_development_dependency 'pry-byebug'
  gem.add_development_dependency 'yard'

  gem.add_runtime_dependency 'twitter', '~> 5.0'
  gem.add_runtime_dependency 'rufus-scheduler'
  gem.add_runtime_dependency 'gingerice'
  gem.add_runtime_dependency 'htmlentities'
  gem.add_runtime_dependency 'engtagger'
  gem.add_runtime_dependency 'fast-stemmer'
  gem.add_runtime_dependency 'highscore'
  gem.add_runtime_dependency 'pry'
  gem.add_runtime_dependency 'oauth'
end