I'm roughly splitting this into two topics; feel free to comment on one or both...
1) Does anyone have experience with text mining packages in Python and/or R for parsing keywords/phrases from press releases (futures and/or stock markets)? Did you find that one method (NLP, prototype matching, etc.) worked better than others when the keywords are spaced widely apart, as opposed to clustered in a few sentences?
- Noted Python packages: textmining; Pattern
- Noted R packages: (none yet - this blog has a basic start to making scripts)
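To make the "spaced widely apart vs. clustered" case concrete, here is a minimal standard-library sketch of what I mean. It does not use any of the packages noted above; the function names `keyword_positions` and `min_span` and the sample sentences are made up for illustration, and it only handles single-word keywords.

```python
import re

def keyword_positions(text, keywords):
    """Map each (lowercased) keyword to the token indices where it occurs."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    wanted = {k.lower() for k in keywords}
    positions = {k: [] for k in wanted}
    for i, tok in enumerate(tokens):
        if tok in wanted:
            positions[tok].append(i)
    return positions

def min_span(positions):
    """Smallest token window holding at least one hit of every keyword.

    Returns None if any keyword never occurs.
    """
    if any(not hits for hits in positions.values()):
        return None
    # All hits in document order, then a sliding window over them.
    events = sorted((i, k) for k, hits in positions.items() for i in hits)
    need = len(positions)
    counts = {}
    best = None
    left = 0
    for right_idx, right_kw in events:
        counts[right_kw] = counts.get(right_kw, 0) + 1
        # Shrink from the left while the window still covers every keyword.
        while len(counts) == need:
            left_idx, left_kw = events[left]
            span = right_idx - left_idx + 1
            if best is None or span < best:
                best = span
            counts[left_kw] -= 1
            if counts[left_kw] == 0:
                del counts[left_kw]
            left += 1
    return best

# Hypothetical press-release snippets, invented for this example.
spread = ("Crop report: wheat yields fell. In other news, the weather "
          "was mild. Analysts expect futures to rise.")
clustered = "Wheat futures rose sharply."

print(min_span(keyword_positions(spread, ["wheat", "futures"])))
print(min_span(keyword_positions(clustered, ["wheat", "futures"])))
```

A small span means the keywords are clustered and a large one means they are spread across the release; what I'm unsure about is which methods hold up as that span grows.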
2) For stocks, does anyone have experience parsing the SEC's EDGAR system for specific, less-common filings (e.g. 13Fs)?
- I see a few scripts/packages available in Python for the more common filings like 10-Q/10-K, but none for 13Fs, for example, mostly because the SEC's site isn't set up to provide feeds for them.
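One route I'm considering instead of a feed is filtering EDGAR's quarterly index files by form type. The sketch below is only that: it assumes the index is pipe-delimited with the columns CIK, Company Name, Form Type, Date Filed and Filename (my understanding of the `master.idx` layout), the function name `filter_index` is hypothetical, and the sample rows are invented rather than real filings.

```python
def filter_index(index_text, form_prefix="13F"):
    """Return rows of a pipe-delimited index whose form type starts with form_prefix.

    Assumes five columns per data row:
    CIK | Company Name | Form Type | Date Filed | Filename
    """
    rows = []
    for line in index_text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        # Skip the preamble, the column header and the dashed separator.
        if len(parts) != 5 or parts[0] == "CIK":
            continue
        cik, company, form_type, date_filed, filename = parts
        if form_type.startswith(form_prefix):
            rows.append({
                "cik": cik,
                "company": company,
                "form_type": form_type,
                "date_filed": date_filed,
                "filename": filename,
            })
    return rows

# Invented sample in the assumed layout; not real companies or paths.
sample = """Description: Master Index (sample)

CIK|Company Name|Form Type|Date Filed|Filename
--------------------------------------------------
1000001|EXAMPLE CAPITAL LLC|13F-HR|2020-02-14|edgar/data/1000001/example-a.txt
1000002|SAMPLE WIDGETS INC|10-K|2020-02-20|edgar/data/1000002/example-b.txt
1000003|DEMO ADVISORS LP|13F-NT|2020-02-14|edgar/data/1000003/example-c.txt
"""

for row in filter_index(sample):
    print(row["form_type"], row["company"], row["filename"])
```

This only locates the filings; pulling the holdings out of each 13F document is the part I'd still need pointers on.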
Just starting into research on this, so any pointers are appreciated!