twitter-ebooks/NOTES.md

2013-11-08 06:02:05 +11:00
- Files in `text/` are preprocessed and serialized by `rake consume`
- For example, `text/foo.tweets` becomes `consumed/foo.corpus`
- `rake consume` checks hashes to determine which files need to be reprocessed
- Preprocessed corpus files are loaded at runtime with `Corpus.load('foo')`
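
The hash check described above could be sketched roughly as follows. This is not the project's actual Rakefile logic: the helper names (`stale_sources`, `record_hashes`, `corpus_path_for`), the JSON manifest, and the choice of SHA-256 are all assumptions made for illustration.

```ruby
require 'digest'
require 'fileutils'
require 'json'

# Hypothetical helper: read the manifest of previously recorded hashes,
# or return an empty hash if no manifest has been written yet.
def load_manifest(manifest_path)
  File.exist?(manifest_path) ? JSON.parse(File.read(manifest_path)) : {}
end

# Hypothetical helper: list the source files whose current content hash
# differs from the hash recorded in the manifest (new files count as stale).
def stale_sources(text_dir, manifest_path)
  recorded = load_manifest(manifest_path)
  Dir.glob(File.join(text_dir, '*')).sort.select do |path|
    File.file?(path) &&
      recorded[File.basename(path)] != Digest::SHA256.file(path).hexdigest
  end
end

# Hypothetical helper: store the current hash of each processed file so
# that the next run can skip files that have not changed.
def record_hashes(paths, manifest_path)
  recorded = load_manifest(manifest_path)
  paths.each do |path|
    recorded[File.basename(path)] = Digest::SHA256.file(path).hexdigest
  end
  FileUtils.mkdir_p(File.dirname(manifest_path))
  File.write(manifest_path, JSON.pretty_generate(recorded))
end

# Map a source file to its output name, e.g. text/foo.tweets to
# consumed/foo.corpus (the consumed/ default follows the example above).
def corpus_path_for(source_path, out_dir = 'consumed')
  File.join(out_dir, File.basename(source_path, '.*') + '.corpus')
end
```

A task built on this sketch would call `stale_sources`, preprocess each returned file into `corpus_path_for(path)`, and then call `record_hashes` on the files it handled. Hashing file contents rather than comparing modification times means a file that is touched but unchanged is not reprocessed.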