twitter-ebooks

Author	SHA1	Message	Date
Jaiden Mispy	a272bd69ca	Handle edge-case corpuses with short sentences	2016-01-21 12:51:33 -08:00
Jaiden Mispy	14f82a716f	Don't infinite loop for very small tweets. #78	2016-01-13 00:06:41 -08:00
Jaiden Mispy	25e3724f4d	Raise memory expectation in test slightly	2016-01-12 23:28:53 -08:00
Jaiden Mispy	42eee9f8e6	Merge pull request #82 from negatendo/consume_append: append to model	2015-06-13 18:07:20 +10:00
Joshua Charles Campbell	a885d5fe22	stuff I had to change to get the bot working	2015-06-04 10:46:01 -06:00
Brett O'Connor	43491cb668	added append method which reads and adds to an existing model file	2015-05-19 15:49:35 -06:00
Jaiden Mispy	9f9dfc9b0c	Add warning when consuming a plaintext corpus	2015-03-11 02:41:18 -07:00
Jaiden Mispy	0292264613	slightly less paranoid about including people	2014-12-16 10:59:58 +11:00
Jaiden Mispy	1977445b1c	Lots of documentation and cleanup	2014-12-05 21:12:39 +11:00
Jaiden Mispy	2e336fb9be	On second thought, we can't use a cache system, simply because the corpuses are too darn big to keep around	2014-11-18 13:51:31 +11:00
Jaiden Mispy	b72a6db0e1	Threading!	2014-11-18 13:24:59 +11:00
Geoffroy Couprie	2698963fb1	consume multiple corpuses	2014-10-29 18:56:37 +01:00
Jaiden Mispy	0cb7abcb52	Test that models save and load correctly	2014-10-25 06:59:34 -07:00
Jaiden Mispy	302ea0229d	grr stupid mistake	2014-10-25 05:49:23 -07:00
Jaiden Mispy	4052d534b2	Save only necessary data into model	2014-10-25 04:26:52 -07:00
Jaiden Mispy	3b1d6f856d	Switch to using token indexes instead of strings	2014-10-24 09:55:49 -07:00
Paul Friedman	927efe7f07	Fix parser swapping mentions and sentences	2014-10-19 22:33:17 -07:00
Jaiden Mispy	228e0caa65	More memory profiling	2014-10-18 22:21:50 -07:00
Jaiden Mispy	b7f67ec0a6	Memory optimization	2014-10-16 03:02:39 -07:00
Jaiden Mispy	d09d968915	rspec and memory_profiler	2014-10-14 01:02:08 -07:00
Joel McCoy	be6ac9127f	MODEL: Read in utf-8, only parse CSV once. Ran into an `Encoding::CompatibilityError` issue trying to consume my corpus (tweets.csv) on Windows 7, but this likely affects other environments as well. Fix: force reading corpus file contents as utf-8. Also a quick clean-up of the CSV flow to only parse the content once instead of double-dipping.	2014-06-27 18:42:51 -04:00
Brett O'Connor	2aac54c7aa	csv import now looks for text column	2014-05-03 16:44:07 -06:00
Joel McCoy	872dabdbf8	Support consuming tweets.csv from official twitter archives	2014-04-30 20:32:51 -04:00
Mispy	5d55d90f85	Be more paranoid about identifying mentions	2014-04-24 20:55:53 -07:00
Erik Michaels-Ober	7e033b7b3b	Fix file permissions	2014-02-12 16:23:49 +01:00
Mispy	34b8c5d0a0	Use binary read/write mode for Windows	2014-01-28 16:36:23 -08:00
Mispy	306c9ab873	Allow consumption of json archives	2013-11-27 05:12:54 -08:00
Mispy	61c5caee4d	Retry limit and mention separation	2013-11-20 12:07:24 -08:00
Mispy	95e96ceef9	2.0.9 - Whups, broke context	2013-11-14 10:19:48 -08:00
Mispy	00f0228dd4	2.0.8 -- different generation algorithm	2013-11-14 07:58:46 -08:00
Mispy	e87dc5862b	Github time!	2013-11-08 06:02:05 +11:00

31 commits