Looks like the latency is specific to my application. Running latency tests on a similar diamond Disruptor configuration with a much smaller memory footprint appears to reduce the multithreaded latency issues.
Tests for a Disruptor running multi-threaded at ~10-11 million messages/second:
Run 0, Disruptor=10,785,159 ops/sec
Run 1, Disruptor=10,907,504 ops/sec
Run 2, Disruptor=11,059,500 ops/sec
Run 3, Disruptor=10,725,010 ops/sec
Run 4, Disruptor=11,037,527 ops/sec
Run 5, Disruptor=10,703,200 ops/sec
Run 6, Disruptor=11,156,978 ops/sec
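For reference, the ops/sec figure printed in each run above is just the message count divided by the elapsed wall time. A minimal stdlib sketch of that calculation (the class name and the counts are hypothetical, not the actual Disruptor perf-test harness):

```java
public class ThroughputRun {
    // Operations per second given a count and elapsed wall time in nanos.
    static long opsPerSec(long ops, long elapsedNanos) {
        return ops * 1_000_000_000L / elapsedNanos;
    }

    public static void main(String[] args) {
        long ops = 100_000_000L;        // messages published in the run (hypothetical)
        long elapsed = 9_000_000_000L;  // ~9 s, as measured with System.nanoTime()
        System.out.printf("Run 0, Disruptor=%,d ops/sec%n", opsPerSec(ops, elapsed));
    }
}
```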
So the good news and the bad news is that the latency mostly originates from my program...I'm just not sure how best to resolve it. I'm not going to spend much more time on this before moving on, but I'll look around to see whether some larger changes might make a difference...or whether single-threaded is the way to go.
Figured out the problem with my latency tests. I had been testing both latency and throughput at once, publishing as many messages as possible to my program and measuring the latencies while it was maxed out.
I think a better latency test is to send a message, pause, send a message, pause, and so on, measuring the latency quantiles and overall throughput throughout the test, since this should replicate the bursty traffic we can expect to see from the market.
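The send-pause-send idea above can be sketched in plain Java. The pacing and quantile bookkeeping are real; the publish step is a hypothetical stand-in (a real test would timestamp the event and let the final handler record the delta), and `LockSupport.parkNanos` only honors a ~1 µs pause on a best-effort basis:

```java
import java.util.Arrays;
import java.util.concurrent.locks.LockSupport;

public class PacedLatencyTest {
    // q-quantile (0.0..1.0) of an ascending-sorted array of latencies.
    static long quantile(long[] sorted, double q) {
        int idx = (int) Math.min(sorted.length - 1L, Math.round(q * (sorted.length - 1)));
        return sorted[idx];
    }

    public static void main(String[] args) {
        int n = 10_000;
        long[] latNanos = new long[n];
        for (int i = 0; i < n; i++) {
            long t0 = System.nanoTime();
            // publish(event);  // hypothetical: hand the event to the pipeline here
            latNanos[i] = System.nanoTime() - t0;
            LockSupport.parkNanos(1_000);  // ~1 µs pause between sends (best effort)
        }
        Arrays.sort(latNanos);
        for (double q : new double[]{0.50, 0.90, 0.99, 0.999, 0.9999}) {
            System.out.printf("%.2f%% took %.1f µs, ", q * 100, quantile(latNanos, q) / 1000.0);
        }
        System.out.printf("worst took %d µs%n", latNanos[n - 1] / 1000);
    }
}
```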
So, adding a 1 µs pause between sending each event resulted in latencies that are only 10X the single-threaded case, vs. the 10,000X I was finding before. Throughput is still ~12MB/s, which is way above what I'd expect to see from IQFeed even with 1800 symbols being watched in real time. The other thing to take into account is that I'm currently running this on a dual-core computer, so until I run it on a computer with 4 or more cores I won't have a true picture of the latency profile.
Messages Per Second: 92,943
Message Size (bytes): 130
Total Throughput (KB/s): 12,082.6
Parser: 50.0% took 21.0 µs, 90.0% took 73917.0 µs, 99.0% took 97146.0 µs, 99.9% took 225502.0 µs, 99.99% took 298069.0 µs, worst took 299526 µs
Serializer: 50.0% took 14.0 µs, 90.0% took 31.0 µs, 99.0% took 608.0 µs, 99.9% took 34895.0 µs, 99.99% took 121863.0 µs, worst took 126712 µs
50.0% took 22.0 µs, 90.0% took 102.0 µs, 99.0% took 1659.0 µs, 99.9% took 59190.0 µs, 99.99% took 156052.0 µs, worst took 171280 µs
50.0% took 35.0 µs, 90.0% took 84655.0 µs, 99.0% took 125872.0 µs, 99.9% took 306603.0 µs, 99.99% took 331352.0 µs, worst took 334334 µs
At this point I've got a Connector that takes IQFeed messages in, writes them to disk, parses to objects and fires them off. The goal for this section was low latency, and since I'm not doing any really high frequency trading I believe my current connector will suffice at current latency / throughput levels.
End use: Backtesting
- Throughput is most important, latency doesn't really matter
- Ideally the program will be massively parallel streaming to multiple receiving computers
- Program state will be kept in memory using the stream of event objects
- Reading tic data from disk will not happen often in live trading; however, in case of program failure I will need to be able to quickly load the current program state, ideally with some sort of failover protection.
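The state-in-memory and failure-recovery points above amount to rebuilding program state by replaying the journaled event stream in received order. A minimal sketch under that assumption (the event and state types are hypothetical placeholders, not the actual classes):

```java
import java.util.Iterator;

public class Replay {
    interface Event { }

    // Hypothetical event type; real code would have trade/TOB/depth events.
    static final class Trade implements Event {
        final double price;
        Trade(double p) { price = p; }
    }

    // Hypothetical program state, rebuilt by folding events in received order.
    static final class State {
        double lastPrice;
        void apply(Event e) {
            if (e instanceof Trade) lastPrice = ((Trade) e).price;
        }
    }

    // On restart after a failure, stream the journal back through the state.
    static State rebuild(Iterator<? extends Event> journal) {
        State s = new State();
        while (journal.hasNext()) s.apply(journal.next());
        return s;
    }
}
```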
Overall design requirements:
- Updates in real-time (data can be compressed / verified at the EOD)
- Preserves event received order
- Data can be requested by day, symbol and type (trade, Bid/Ask TOB, Bid/Ask depth)
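One simple way to satisfy the "request by day, symbol and type" requirement is to key the on-disk layout on exactly those three fields. This is only a hypothetical layout I'm suggesting for illustration, not the author's actual storage scheme:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class TicStoreKey {
    // Hypothetical on-disk layout: one compressed file per (day, symbol, type),
    // so a query by day/symbol/type becomes a direct path lookup.
    static Path fileFor(Path root, String day, String symbol, String type) {
        return root.resolve(day).resolve(symbol).resolve(type + ".lz4");
    }
}
```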
Performance Measurement: based on throughput given various conditions
- One ticker, one feed type
- Multiple tickers, one feed type
- One Ticker, multiple feed types
- Multiple tickers, multiple feed types
- Startup performance (stream one whole market day, multiple tickers / multiple feeds)
Most of my backtesting occurs across an entire portfolio, using multiple feed types. I will likely bias toward the multiple-tickers / multiple-feed-types case, since this is my test case and should yield the best startup / reboot times.
The only thing I don't understand is all the focus on latency, when IQFeed is not a latency-focused app. It is not designed to be latency sensitive.
You're right, IQFeed is not particularly latency sensitive. I've looked at latency mainly because I want the code base to be modular, allowing me to plug/play feeds as I see fit, and as a general learning experience.
All the work I have done, except for the parsing functionality from String to Java object, is generic; even the objects are designed as generic objects. I parse an IQFeed L1 message into a lastEvent (last price/size/time) and two TOBEvents (bid/ask price/size/time), and the same goes for other message types. The latency measurements will therefore be useful if I move to a different feed and need to know what kind of latency I can expect from my program.
The second reason is learning: my background is mechanical engineering, and I only had one computer science class at university, so there is a lot I don't know about the field. I didn't really understand the relationship between buffers, latency, and throughput, and the testing has helped me shape a clearer mental model of it. Nor did I know what performance (latency / throughput) I could expect from the different types of classes, or how multi-threading would affect performance. This has at least helped me appreciate the complexity required to tackle lower latencies, and given me a sense of the latencies that are realistic to achieve on cheap commodity hardware without a ton of work.
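The one-message-to-three-events split described above could look roughly like this. The comma-separated field layout here is a hypothetical stand-in (NOT the real IQFeed L1 wire format), and the class names only echo the ones mentioned in the post:

```java
import java.util.Arrays;
import java.util.List;

public class L1Split {
    static final class LastEvent {
        final double price; final int size;
        LastEvent(double p, int s) { price = p; size = s; }
    }
    static final class TOBEvent {
        final boolean bid; final double price; final int size;
        TOBEvent(boolean b, double p, int s) { bid = b; price = p; size = s; }
    }

    // Hypothetical layout: symbol,last,lastSize,bid,bidSize,ask,askSize
    static List<Object> split(String msg) {
        String[] f = msg.split(",");
        return Arrays.asList(
            new LastEvent(Double.parseDouble(f[1]), Integer.parseInt(f[2])),
            new TOBEvent(true,  Double.parseDouble(f[3]), Integer.parseInt(f[4])),
            new TOBEvent(false, Double.parseDouble(f[5]), Integer.parseInt(f[6])));
    }
}
```

The payoff of this generic-event design is that only `split` (and its siblings for other message types) would need rewriting for a different feed; everything downstream consumes the generic events.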
I've written a tic datastore using LZ4 compression and Unsafe serialization that stores Trades, TopOfBook Updates (Bid/Ask), Depth Updates, and News Events.
My goals were the following:
- Ensure no duplicates - adding files to the database must not accidentally double up on the same events
- High stream rate - use LZ4 and Unsafe, but allow for swapping in another library in the future if it performs better
- Asynchronous - the data stream would be too large to dump into memory, so access is forced to be asynchronous
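The no-duplicates goal can be sketched as a guard in front of the writer: derive a key per event and skip any key already seen. This is a minimal in-memory illustration with a hypothetical key scheme, not the datastore's actual dedup logic (which would need to survive restarts, e.g. by persisting keys or checking against the files on disk):

```java
import java.util.HashSet;
import java.util.Set;

public class DedupWriter {
    private final Set<Long> seen = new HashSet<>();

    // Hypothetical 64-bit event key, e.g. symbol hash mixed with
    // exchange timestamp and a per-timestamp sequence number.
    boolean offer(long eventKey) {
        return seen.add(eventKey);  // false -> duplicate, skip the write
    }
}
```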
This will be used for both backtesting and data analysis; all of my data bars will be built from this information. After completing the database, here is the performance I'm seeing: