31 commits

Author SHA1 Message Date
Jaiden Mispy
a272bd69ca Handle edge-case corpuses with short sentences 2016-01-21 12:51:33 -08:00
Jaiden Mispy
14f82a716f Don't infinite loop for very small tweets. #78 2016-01-13 00:06:41 -08:00
Jaiden Mispy
25e3724f4d Raise memory expectation in test slightly 2016-01-12 23:28:53 -08:00
Jaiden Mispy
42eee9f8e6 Merge pull request #82 from negatendo/consume_append
append to model
2015-06-13 18:07:20 +10:00
Joshua Charles Campbell
a885d5fe22 stuff I had to change to get the bot working 2015-06-04 10:46:01 -06:00
Brett O'Connor
43491cb668 added append method which reads and adds to an existing model file 2015-05-19 15:49:35 -06:00
Jaiden Mispy
9f9dfc9b0c Add warning when consuming a plaintext corpus 2015-03-11 02:41:18 -07:00
Jaiden Mispy
0292264613 slightly less paranoid about including people 2014-12-16 10:59:58 +11:00
Jaiden Mispy
1977445b1c Lots of documentation and cleanup 2014-12-05 21:12:39 +11:00
Jaiden Mispy
2e336fb9be On second thought, we can't use a cache system
Simply because the corpuses are too darn big to keep around
2014-11-18 13:51:31 +11:00
Jaiden Mispy
b72a6db0e1 Threading! 2014-11-18 13:24:59 +11:00
Geoffroy Couprie
2698963fb1 consume multiple corpuses 2014-10-29 18:56:37 +01:00
Jaiden Mispy
0cb7abcb52 Test that models save and load correctly 2014-10-25 06:59:34 -07:00
Jaiden Mispy
302ea0229d grr stupid mistake 2014-10-25 05:49:23 -07:00
Jaiden Mispy
4052d534b2 Save only necessary data into model 2014-10-25 04:26:52 -07:00
Jaiden Mispy
3b1d6f856d Switch to using token indexes instead of strings 2014-10-24 09:55:49 -07:00
Paul Friedman
927efe7f07 Fix parser swapping mentions and sentences 2014-10-19 22:33:17 -07:00
Jaiden Mispy
228e0caa65 More memory profiling 2014-10-18 22:21:50 -07:00
Jaiden Mispy
b7f67ec0a6 Memory optimization 2014-10-16 03:02:39 -07:00
Jaiden Mispy
d09d968915 rspec and memory_profiler 2014-10-14 01:02:08 -07:00
Joel McCoy
be6ac9127f MODEL: Read in utf-8, only parse CSV once
Ran into `Encoding::CompatibilityError` issue trying to consume my corpus (tweets.csv) on Windows 7, but this likely affects other environments as well. 

Fix: force reading corpus file contents as utf-8.

Also a quick clean-up of the CSV flow to only parse the content once instead of double-dipping.
2014-06-27 18:42:51 -04:00
Brett O'Connor
2aac54c7aa csv import now looks for text column 2014-05-03 16:44:07 -06:00
Joel McCoy
872dabdbf8 Support consuming tweets.csv from official twitter archives 2014-04-30 20:32:51 -04:00
Mispy
5d55d90f85 Be more paranoid about identifying mentions 2014-04-24 20:55:53 -07:00
Erik Michaels-Ober
7e033b7b3b Fix file permissions 2014-02-12 16:23:49 +01:00
Mispy
34b8c5d0a0 Use binary read/write mode for Windows 2014-01-28 16:36:23 -08:00
Mispy
306c9ab873 Allow consumption of json archives 2013-11-27 05:12:54 -08:00
Mispy
61c5caee4d Retry limit and mention separation 2013-11-20 12:07:24 -08:00
Mispy
95e96ceef9 2.0.9 - Whups, broke context 2013-11-14 10:19:48 -08:00
Mispy
00f0228dd4 2.0.8 -- different generation algorithm 2013-11-14 07:58:46 -08:00
Mispy
e87dc5862b Github time! 2013-11-08 06:02:05 +11:00