Jaiden Mispy
a272bd69ca
Handle edge-case corpuses with short sentences
2016-01-21 12:51:33 -08:00
Jaiden Mispy
14f82a716f
Don't infinite loop for very small tweets. #78
2016-01-13 00:06:41 -08:00
Jaiden Mispy
25e3724f4d
Raise memory expectation in test slightly
2016-01-12 23:28:53 -08:00
Jaiden Mispy
42eee9f8e6
Merge pull request #82 from negatendo/consume_append
append to model
2015-06-13 18:07:20 +10:00
Joshua Charles Campbell
a885d5fe22
stuff I had to change to get the bot working
2015-06-04 10:46:01 -06:00
Brett O'Connor
43491cb668
added append method which reads and adds to an existing model file
2015-05-19 15:49:35 -06:00
Jaiden Mispy
9f9dfc9b0c
Add warning when consuming a plaintext corpus
2015-03-11 02:41:18 -07:00
Jaiden Mispy
0292264613
slightly less paranoid about including people
2014-12-16 10:59:58 +11:00
Jaiden Mispy
1977445b1c
Lots of documentation and cleanup
2014-12-05 21:12:39 +11:00
Jaiden Mispy
2e336fb9be
On second thought, we can't use a cache system
Simply because the corpuses are too darn big to keep around
2014-11-18 13:51:31 +11:00
Jaiden Mispy
b72a6db0e1
Threading!
2014-11-18 13:24:59 +11:00
Geoffroy Couprie
2698963fb1
consume multiple corpuses
2014-10-29 18:56:37 +01:00
Jaiden Mispy
0cb7abcb52
Test that models save and load correctly
2014-10-25 06:59:34 -07:00
Jaiden Mispy
302ea0229d
grr stupid mistake
2014-10-25 05:49:23 -07:00
Jaiden Mispy
4052d534b2
Save only necessary data into model
2014-10-25 04:26:52 -07:00
Jaiden Mispy
3b1d6f856d
Switch to using token indexes instead of strings
2014-10-24 09:55:49 -07:00
Paul Friedman
927efe7f07
Fix parser swapping mentions and sentences
2014-10-19 22:33:17 -07:00
Jaiden Mispy
228e0caa65
More memory profiling
2014-10-18 22:21:50 -07:00
Jaiden Mispy
b7f67ec0a6
Memory optimization
2014-10-16 03:02:39 -07:00
Jaiden Mispy
d09d968915
rspec and memory_profiler
2014-10-14 01:02:08 -07:00
Joel McCoy
be6ac9127f
MODEL: Read in utf-8, only parse CSV once
Ran into `Encoding::CompatibilityError` issue trying to consume my corpus (tweets.csv) on Windows 7, but this likely affects other environments as well.
Fix: force reading corpus file contents as utf-8.
Also a quick clean-up of the CSV flow to only parse the content once instead of double-dipping.
2014-06-27 18:42:51 -04:00
Brett O'Connor
2aac54c7aa
csv import now looks for text column
2014-05-03 16:44:07 -06:00
Joel McCoy
872dabdbf8
Support consuming tweets.csv from official twitter archives
2014-04-30 20:32:51 -04:00
Mispy
5d55d90f85
Be more paranoid about identifying mentions
2014-04-24 20:55:53 -07:00
Erik Michaels-Ober
7e033b7b3b
Fix file permissions
2014-02-12 16:23:49 +01:00
Mispy
34b8c5d0a0
Use binary read/write mode for Windows
2014-01-28 16:36:23 -08:00
Mispy
306c9ab873
Allow consumption of json archives
2013-11-27 05:12:54 -08:00
Mispy
61c5caee4d
Retry limit and mention separation
2013-11-20 12:07:24 -08:00
Mispy
95e96ceef9
2.0.9 - Whups, broke context
2013-11-14 10:19:48 -08:00
Mispy
00f0228dd4
2.0.8 -- different generation algorithm
2013-11-14 07:58:46 -08:00
Mispy
e87dc5862b
Github time!
2013-11-08 06:02:05 +11:00