Multi-threaded Custom Financial Database - Matlab, R project and Python | futures io social day trading


Multi-threaded Custom Financial Database
Views / Replies: 2,249 / 16
Created by dcooke888, Attachments: 6


#1
Elite Member
Boston, MA
 
Futures Experience: Beginner
Platform: IB
Favorite Futures: Stocks
 
Posts: 29 since Feb 2012
Thanks: 6 given, 21 received

Multi-threaded Custom Financial Database

I'm not sure if this is the correct place to post, so please advise if another thread is more suitable.

After seeing Big Mike's post about his trading platform, I've been inspired to improve my own. After analyzing my strategy development over the past few months, it's become clear that my largest consistent time drain is getting data into a workable format for strategy testing/development. My goal is to create a database that works on both Windows and Unix and can output its data to any program or language, so that I can keep using it in the future regardless of language.

Programming Languages to use:
- Matlab
- Java
- SQL

Data feed:
- IQFeed main data source
- Other longer-interval data sources (Reuters, FRED, etc...)

My current database:
- Tick database:
  - Data: tick data; L1 (Bid/Ask/Last), L2 / market depth
  - File type: flat-file database
  - File format: custom binary format, split into files by symbol, date, and feed type (L1, L2/market depth)

New features to include - longer data intervals (best database type?):
- Bars - second / minute / daily bars (currently calculated on demand during backtesting)
- Calculated fields - VWAP, average volume, average volume for time of day, etc.
- Fundamental data - news events/times, VWAP, etc.
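Since bars are currently calculated on demand, here is a minimal sketch of aggregating ticks into fixed-interval OHLCV bars with a per-bar VWAP (sum of price * size divided by sum of size). The `Tick`, `Bar`, and `BarAggregator` names are hypothetical, timestamps are assumed to be epoch milliseconds, and ticks are assumed to arrive in timestamp order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tick: epoch-millisecond timestamp, trade price and size.
record Tick(long timeMillis, double price, long size) {}

// Hypothetical OHLCV bar carrying its volume-weighted average price.
record Bar(long startMillis, double open, double high, double low,
           double close, long volume, double vwap) {}

final class BarAggregator {
    /** Groups time-ordered ticks into bars of intervalMillis; bars with zero volume are skipped. */
    static List<Bar> aggregate(List<Tick> ticks, long intervalMillis) {
        List<Bar> bars = new ArrayList<>();
        long start = Long.MIN_VALUE;
        double open = 0, high = 0, low = 0, close = 0, priceTimesSize = 0;
        long volume = 0;
        for (Tick t : ticks) {
            // Floor the timestamp to the start of its interval.
            long bucket = Math.floorDiv(t.timeMillis(), intervalMillis) * intervalMillis;
            if (bucket != start) {
                if (volume > 0) {
                    bars.add(new Bar(start, open, high, low, close, volume, priceTimesSize / volume));
                }
                start = bucket;
                open = high = low = t.price();
                priceTimesSize = 0;
                volume = 0;
            }
            high = Math.max(high, t.price());
            low = Math.min(low, t.price());
            close = t.price();
            priceTimesSize += t.price() * t.size();   // running numerator for VWAP
            volume += t.size();                       // running denominator for VWAP
        }
        if (volume > 0) {
            bars.add(new Bar(start, open, high, low, close, volume, priceTimesSize / volume));
        }
        return bars;
    }
}
```

An interval of 1_000, 60_000, or 86_400_000 gives second, minute, or UTC-aligned daily bars; daily bars aligned to an exchange session would need explicit session boundaries instead.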

Any advice on the longer-interval database type would be much appreciated. I saw Mike uses a MySQL database; I've got some SQL experience, so I'm leaning toward that, but I'd love to hear any comments on why or why not to use it.

Also, I'm a noob when it comes to programming. My background is mechanical engineering, and I've only really picked up programming in Java over the past year. There are probably many things I'm doing wrong without knowing it (e.g., I never use unit tests because they've never been at the top of my priority list), so if you see me making any noob mistakes, please let me know.

#3

Initial Listener for IQFeed


I'm writing out the IQFeed listener and here is my thought process:

IQFeed has 5 client ports that you connect to:
  • Level1Port
  • LookupPort
  • Level2Port
  • AdminPort
  • DerivativePort

So for each of these ports I'm going to have a simple process whose throughput I need to test:
  • Read Socket
  • Journal Data: Write to disk
  • Parse Data: Parse to object from string
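A minimal single-threaded sketch of that per-port process, reading newline-terminated messages from any `Reader` (so it can be tested without a live IQFeed socket) and journaling each raw line before parsing it. `PortLoop` is a hypothetical name, and the `parser` and `sink` parameters are stand-ins for the real parse and publish steps.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.util.function.Consumer;
import java.util.function.Function;

final class PortLoop {
    /**
     * Reads lines until end of stream. Each raw line is handed to the journal Writer first,
     * then parsed and passed to the sink. Returns the number of messages processed.
     */
    static <T> long run(Reader in, Writer journal, Function<String, T> parser,
                        Consumer<T> sink) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        long count = 0;
        String line;
        while ((line = reader.readLine()) != null) {   // blocks until a full line is available
            journal.write(line);
            journal.write('\n');                        // one message per journal line
            sink.accept(parser.apply(line));
            count++;
        }
        journal.flush();
        return count;
    }
}
```

With a live connection, the reader could be `new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII)`, assuming the messages are ASCII text; timing each of the three calls inside the loop separately gives the per-step numbers.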

My goal is maximum throughput and modular code, so I'll test each step single-threaded and then figure out how to manage them in a multi-threaded context. The tests I'm going to run are:
1 - Socket read rate
2 - Journaling data to disk (multiple steps)
  - Serialize incoming data
  - Write bytes to file
3 - Parsing data from string to object

I believe optimizing each of the above processes in a single-threaded context should give me the fastest throughput once concurrency is applied... that is, if I can write correct concurrent code.

I'll post the results and link to the code when it's complete.

#4

Socket Read Performance

I've written some basic socket tests to get a sense of socket performance (mostly throughput) over a TCP socket. I tested both java.net.Socket and java.nio.channels.SocketChannel in blocking mode (for simplicity), with the results below.

[Images: socket read throughput results for java.net.Socket and java.nio.channels.SocketChannel]

My gut says I'm doing something wrong, since the Java NIO classes are supposed to be faster than the basic I/O classes. However, I think there are still some useful takeaways:

- Don't read data into a buffer smaller than 256 bytes
- Throughput over TCP/IP shouldn't be an issue as long as the buffer size is over 1 KB
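A minimal sketch of this kind of blocking read test, written against `java.nio.channels.ReadableByteChannel` so the same loop works for a `SocketChannel` (which implements that interface) or, for a quick check without a network, a byte array wrapped with `Channels.newChannel`. `ReadThroughput` and its methods are hypothetical names; this is not the code behind the results in this post.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

final class ReadThroughput {
    /** Reads the channel to end of stream using a buffer of bufferSize bytes; returns total bytes read. */
    static long drain(ReadableByteChannel ch, int bufferSize, boolean direct) throws IOException {
        ByteBuffer buf = direct ? ByteBuffer.allocateDirect(bufferSize) : ByteBuffer.allocate(bufferSize);
        long total = 0;
        int n;
        while ((n = ch.read(buf)) != -1) {   // in blocking mode, waits until at least one byte arrives
            total += n;
            buf.clear();                      // discard contents: this measures the raw read rate only
        }
        return total;
    }

    /** Times drain() and returns throughput in MB/s, taking 1 MB = 1,000,000 bytes. */
    static double megabytesPerSecond(ReadableByteChannel ch, int bufferSize, boolean direct)
            throws IOException {
        long start = System.nanoTime();
        long bytes = drain(ch, bufferSize, direct);
        double seconds = (System.nanoTime() - start) / 1e9;
        return bytes / 1e6 / seconds;
    }
}
```

Running it over the same connection with buffer sizes such as 64, 256, 1024, and 8192 bytes, heap and direct, reproduces the shape of the comparison above.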

Let me know if anyone is interested in the code and I'll start posting the tests to a GitHub repository.

Next up is journaling the data (serialization and file I/O).

#5

Java Object Serialization

Following the mantra of standing on other people's shoulders, here's an article on serialization speeds in Java:

Mechanical Sympathy: Native C/C++ Like Performance For Java Object Serialisation

It looks like Unsafe memory is the way to go for serializing objects:
- For pure speed, Java's sun.misc.Unsafe memory offers the fastest object-based serialization you can get
- The downside is that it requires custom-written serialization for each object (i.e., read the object from a byte[] and write the object to a byte[]) plus a supporting class to read/write Unsafe memory.
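For comparison, here is a minimal sketch of that hand-written, per-object serialization using `java.nio.ByteBuffer` instead of `sun.misc.Unsafe` (Unsafe is an internal API, and its memory-access methods are deprecated for removal in recent JDKs). The `Quote` type and its field layout are hypothetical; the point is the fixed-order write/read pair each event type needs.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical event type; the fields are illustrative, not IQFeed's L1 field set.
record Quote(String symbol, long timestampNanos, double bid, double ask, int bidSize, int askSize) {

    /** Writes the fields in a fixed order: symbol length and ASCII bytes, then the primitives. */
    void writeTo(ByteBuffer buf) {
        byte[] sym = symbol.getBytes(StandardCharsets.US_ASCII);
        buf.put((byte) sym.length);   // assumes symbols are shorter than 128 characters
        buf.put(sym);
        buf.putLong(timestampNanos);
        buf.putDouble(bid);
        buf.putDouble(ask);
        buf.putInt(bidSize);
        buf.putInt(askSize);
    }

    /** Reads the fields back in exactly the order writeTo() wrote them. */
    static Quote readFrom(ByteBuffer buf) {
        byte[] sym = new byte[buf.get()];
        buf.get(sym);
        // Java evaluates constructor arguments left to right, matching the write order.
        return new Quote(new String(sym, StandardCharsets.US_ASCII),
                buf.getLong(), buf.getDouble(), buf.getDouble(), buf.getInt(), buf.getInt());
    }
}
```

Passing a buffer from `ByteBuffer.allocateDirect` instead of `ByteBuffer.allocate` keeps the bytes off-heap without changing this code.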

#6

Java File I/O Read/Write Performance

Below are the results of testing read/write performance for a few different types of readers/writers.

The test wrote (or read) 100 MB of "Write Me because I need to be written%n", one line at a time, and measured the time taken to process the entire file.

Results (MB/s):
Reader/Writer type             Read HDD   Read SSD  Write HDD  Write SSD
BufferedReader/Writer             207.4      199.5       57.9      105.8
MemoryMappedRandomAccess          125.9      124.8       86.7       94.1
RandomAccessFile                  156.5      156.2       54.4       93.1

The test played to BufferedReader/Writer's strengths, since the data was entirely character data. For the other (byte-oriented) methods I had to handle reading/writing lines myself, which may have made the comparison less fair.
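A minimal sketch of the write half of that test: write the same line through a `BufferedWriter` until a target number of bytes is reached, then report MB/s. `LineWriteBenchmark` is a hypothetical name and 1 MB is taken as 1,000,000 bytes. Because the OS may still hold the data in its page cache when the writer closes, the result may reflect how fast Java hands data to the OS more than raw disk speed.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class LineWriteBenchmark {
    // Same text as the test above; %n corresponds to the platform line separator.
    static final String LINE = "Write Me because I need to be written" + System.lineSeparator();

    /** Writes LINE until at least targetBytes have been written; returns the bytes written. */
    static long writeLines(Path file, long targetBytes) throws IOException {
        long written = 0;
        try (BufferedWriter w = Files.newBufferedWriter(file, StandardCharsets.US_ASCII)) {
            while (written < targetBytes) {
                w.write(LINE);
                written += LINE.length();   // ASCII: one byte per char
            }
        }                                   // close() flushes the remaining buffer to the OS
        return written;
    }

    /** Times writeLines() and returns MB/s. */
    static double writeMegabytesPerSecond(Path file, long targetBytes) throws IOException {
        long start = System.nanoTime();
        long bytes = writeLines(file, targetBytes);
        return bytes / 1e6 / ((System.nanoTime() - start) / 1e9);
    }
}
```

For a 100 MB run: `LineWriteBenchmark.writeMegabytesPerSecond(Path.of("write-test.txt"), 100_000_000L)`.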

Some interesting things to note that surprised me:

- Streaming read performance is very close between an HDD and an SSD
- Write performance is generally much better on the SSD
- On an HDD, the write performance of memory-mapped files is much higher than that of the other methods. This is surprising, but I think it's because Java hands write duties off to the OS.

Because I'm going to want to timestamp and compress the data before it gets written to disk, I'm not sure this is a great test of my actual case. I think I'll have to compare the complete serialization / compression / write performance to get a fair test. That'll be next.

Code is attached.

Attached file: FileWritePerformance.zip (4.7 KB)

#7

Looking at holistic performance

During the tick database performance testing, I realized that different portions of the process have different goals.

1 - Real-time Streaming / Journal
- Goal: Lowest latency possible while ensuring throughput is not an issue
- Reads data stream from IQFeed
- Parses IQFeed data into event objects and sends them out over IPC
- Writes IQFeed data to a journal for parsing later

2 - EOD Parser / Backtesting streamer
- Goal: Highest Backtesting throughput possible...not latency sensitive
- Reads journal data at EOD and parses it into the best format for backtesting throughput
- Feed-independent data types to allow multi-feed compatibility

During the day I won't need to query tick data, since I'll be receiving it as a stream over IPC; as long as the data gets journaled (written to disk), I don't need to touch it again. Latency is the goal because the faster I can react to market data, the better.

When performing backtests I am recreating the stream using historical data, therefore I care about both correct data order and maximum throughput to minimize backtesting time.
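Recreating a single time-ordered stream from several per-symbol (or per-feed) journals is a k-way merge. A minimal sketch using `java.util.PriorityQueue`, assuming each input is already sorted by timestamp; `Event` and `ReplayMerger` are hypothetical names, and ties on timestamp are broken by source index so the replay order is deterministic.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical replay event.
record Event(long timestamp, String symbol) {}

final class ReplayMerger {
    // The next pending event of one source, plus the iterator over that source's remainder.
    private record Head(Event event, int source, Iterator<Event> rest) {}

    /** Merges per-source lists (each sorted by timestamp) into one timestamp-ordered list. */
    static List<Event> merge(List<List<Event>> sources) {
        PriorityQueue<Head> heap = new PriorityQueue<>(
                Comparator.comparingLong((Head h) -> h.event().timestamp())
                          .thenComparingInt(Head::source));   // tie: lower source index first
        for (int i = 0; i < sources.size(); i++) {
            Iterator<Event> it = sources.get(i).iterator();
            if (it.hasNext()) heap.add(new Head(it.next(), i, it));
        }
        List<Event> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head h = heap.poll();   // earliest pending event across all sources
            out.add(h.event());
            if (h.rest().hasNext()) heap.add(new Head(h.rest().next(), h.source(), h.rest()));
        }
        return out;
    }
}
```

The heap only ever holds one pending event per source, so memory stays proportional to the number of journals rather than their size; in a real backtester the output list would be replaced by a callback that feeds the strategy.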

#8

Design possibilities for realtime streamer/journaler/parser

Designing a minimal-latency, port-based parser / journaler / object stream

Goal:
- Listen across multiple ports (3 IQFeed Ports - L1, L2, Admin)
- Minimize overall latency from input message to output object (50th/90th/99th/99.9th/99.99th percentile and worst case)
- Keep throughput above IQFeed's maximum output rate (1 KB/s per symbol * 1,800 symbols = 1.8 MB/s, ~2 MB/s)
- Maximize program modularity by publishing to an output stream

Requirements:
- Stream order must be preserved
- Data must be written to disk before sent to output stream

Individual Program Components
1: TCP/IP Reader - Read message off of socket
2: Journaler - Write message to disk
3: Message Parser - Parse message to Object
4: Serializer - Parse Object to bytes
5: Compressor (Optional) - Compress bytes
6: IPC - Multicast out to listeners


Processing Configurations
Sequential Processing:
In -> Journaler -> Parse to Object -> Serialize -> Compress -> Publish

Parallel Processing:
        +-> Parse Object -> Serialize -> Compress -> +
        |                                            |
In ---> +---------------> Journaler ---------------> + -> Publish
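The parallel configuration can be sketched with `CompletableFuture`: the journal write and the parse/serialize branch run concurrently, and publishing waits for both, which keeps the requirement that data is journaled before it is sent. `ParallelPath` and its three function parameters are hypothetical placeholders. A thread-pool hop per message adds its own overhead, so this illustrates the dependency structure rather than a low-latency implementation.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.Consumer;
import java.util.function.Function;

final class ParallelPath {
    private final ExecutorService pool;
    private final Consumer<String> journal;                   // hypothetical: write raw message to disk
    private final Function<String, byte[]> parseAndSerialize; // hypothetical: message -> event bytes
    private final Consumer<byte[]> publish;                   // hypothetical: send to IPC listeners

    ParallelPath(ExecutorService pool, Consumer<String> journal,
                 Function<String, byte[]> parseAndSerialize, Consumer<byte[]> publish) {
        this.pool = pool;
        this.journal = journal;
        this.parseAndSerialize = parseAndSerialize;
        this.publish = publish;
    }

    /** Journals and parses one message in parallel; publishes only after both branches finish. */
    void handle(String message) {
        CompletableFuture<Void> journaled =
                CompletableFuture.runAsync(() -> journal.accept(message), pool);
        CompletableFuture<byte[]> encoded =
                CompletableFuture.supplyAsync(() -> parseAndSerialize.apply(message), pool);
        // publish runs once both futures complete; join() blocks the caller until it has run,
        // so messages handled one after another from a single thread are published in order.
        journaled.thenAcceptBoth(encoded, (ignored, bytes) -> publish.accept(bytes)).join();
    }
}
```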

Port Reading Configurations
3 Ports -> 3 Sockets (3 reader threads; each blocks and sends data to its processing path as it is received):
L1 ---------> TCP/IP reader -> Processing Path

L2 ---------> TCP/IP reader -> Processing Path

Admin ------> TCP/IP reader -> Processing Path

3 Ports -> 3 Readers (Multiple read threads -> 1 processing path)
L1 ---------> TCP/IP reader -> +
                               |
L2 ---------> TCP/IP reader -> + -> Processing Path
                               |
Admin ------> TCP/IP reader -> +

3 Ports -> 1 Selector (1 asynchronous thread using a Selector, sending to a single processing path):

L1 ------> +
           |
L2 ------> + -> Selector -> TCP/IP reader -> Processing Path
           |
Admin ---> +


1: TCP/IP Reader
Requirements:
- Read message from Port (all messages use line separator)
- Add the receive timestamp and the computer's time zone.

String Reader Output Message (byte timeZone, long timestamp, String message)
1: java.net.Socket -> java.io.InputStream -> java.io.InputStreamReader -> java.io.BufferedReader
- Blocks thread until full line is ready to be read
- Multiple ports: Requires 1 thread for each port

Byte Reader Output Message (byte timeZone, long timestamp, byte[] message)
1: java.nio.channels.SocketChannel -> direct java.nio.ByteBuffer (ByteBuffer.allocateDirect)
- Non-blocking (when configured that way) and reads all available bytes
- Scans backward from the end of the read to find the last newline character and sends the bytes up to and including it
- Compacts the remaining bytes (a partial message) to the front of the buffer for the next read
- Multiple ports: Can be serviced by a single thread using a Selector
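The framing step of the byte reader (find the last newline, hand off the complete messages, keep the partial tail) can be sketched as follows. `LineFramer` and the `onMessages` callback are hypothetical names; the buffer is assumed to be in write mode on entry, i.e. just filled by `channel.read(buf)`, and the logic is the same for heap and direct buffers.

```java
import java.nio.ByteBuffer;
import java.util.function.Consumer;

final class LineFramer {
    /**
     * Call after channel.read(buf). Passes all complete, newline-terminated bytes to onMessages
     * and compacts any trailing partial message to the front of the buffer, leaving buf ready
     * for the next read. Returns the number of bytes handed off.
     */
    static int frame(ByteBuffer buf, Consumer<byte[]> onMessages) {
        buf.flip();                                                  // read mode: [position, limit) is data
        int lastNewline = -1;
        for (int i = buf.limit() - 1; i >= buf.position(); i--) {  // scan backward from the end
            if (buf.get(i) == '\n') { lastNewline = i; break; }
        }
        int handedOff = 0;
        if (lastNewline >= 0) {
            handedOff = lastNewline - buf.position() + 1;
            byte[] chunk = new byte[handedOff];
            buf.get(chunk);                                          // position moves past the last newline
            onMessages.accept(chunk);
        }
        buf.compact();                                               // partial tail moves to index 0
        return handedOff;
    }
}
```

If a single message can be longer than the buffer, a full buffer with no newline would stall here; a real reader would need to grow the buffer or reject the message.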

2: Journaler
Requirements:
- Write message to disk with lowest possible latency

String Journaler
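1: java.io.BufferedWriter -> java.io.OutputStreamWriter -> java.io.FileOutputStream
- Write character data to the buffered writer
- Flush strategy:
- 1 - Flush after each write (hands the data to the OS immediately, so it isn't lost if the program crashes)
- 2 - Flush on close (reduces latency by writing to an in-memory buffer instead of to disk; the buffer is also written whenever it fills, but anything still buffered is lost on a program crash)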
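A minimal sketch of the String journaler with the flush strategy as a constructor flag. `StringJournaler` is a hypothetical name. Note that `flush()` only hands the data to the operating system: that protects against a program crash but not an OS crash or power loss, which would need `FileChannel.force` (not shown).

```java
import java.io.BufferedWriter;
import java.io.Closeable;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class StringJournaler implements Closeable {
    private final BufferedWriter out;
    private final boolean flushEachWrite;   // true = strategy 1, false = strategy 2

    StringJournaler(Path file, boolean flushEachWrite) throws IOException {
        // Append so that restarting the program does not overwrite an existing journal.
        this.out = Files.newBufferedWriter(file, StandardCharsets.US_ASCII,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        this.flushEachWrite = flushEachWrite;
    }

    void write(String message) throws IOException {
        out.write(message);
        out.newLine();                      // platform line separator
        if (flushEachWrite) out.flush();    // push buffered chars to the OS now
    }

    @Override
    public void close() throws IOException {
        out.close();                        // flushes anything still buffered
    }
}
```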
Byte Journaler
1: java.nio.MappedByteBuffer -> java.io.RandomAccessFile
- Maps the file into off-heap memory and lets the OS manage the file writes
- Because the OS owns the mapped pages, the data gets written even if the program crashes (though not on an OS crash or power loss unless MappedByteBuffer.force() has been called)
2: Java Chronicle (a library built on the approach above)
3: direct java.nio.ByteBuffer -> java.io.RandomAccessFile (via its FileChannel)
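A minimal sketch of option 1, assuming a fresh, fixed-size journal file per session that is mapped once up front (rolling to a new file when it fills is left out). `RandomAccessFile.setLength` and `FileChannel.map` are standard JDK calls; the `MappedJournal` class and its length-prefixed record layout are hypothetical, not Chronicle's format.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;

final class MappedJournal implements Closeable {
    private final RandomAccessFile file;
    private final MappedByteBuffer map;

    /** Sizes the file to sizeBytes (truncating or extending it) and maps it read/write. */
    MappedJournal(Path path, int sizeBytes) throws IOException {
        file = new RandomAccessFile(path.toFile(), "rw");
        file.setLength(sizeBytes);
        map = file.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, sizeBytes);
    }

    /** Appends one record as [int length][bytes]; returns false if the mapped region is full. */
    boolean append(byte[] record) {
        if (map.remaining() < Integer.BYTES + record.length) return false;
        map.putInt(record.length);
        map.put(record);
        return true;
    }

    /** Optional: write dirty pages through to the storage device. */
    void force() {
        map.force();
    }

    @Override
    public void close() throws IOException {
        file.close();   // the mapping itself stays valid after the file is closed
    }
}
```

`force()` is only needed if data must survive an OS crash or power loss; after a program crash, data already written to the mapping is still held by the OS.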

3: Parse message to object
Requirements:
- Transform incoming IQFeed Message(s) into object(s)

Byte parser to String(s)
1: new String(byte[] message)

String to Object parser
1: Reads in lines
- Loops through the lines one at a time
- Splits each line on commas
- Creates the object
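A sketch of the string-to-object step for a single message type. The line layout assumed here (`Q,symbol,last,bid,ask`, type code first) and the `L1Update`/`L1Parser` names are hypothetical; the real field order depends on how the feed is configured. Splitting with a limit of -1 keeps trailing empty fields, which plain `split(",")` would drop.

```java
// Hypothetical parsed event for an assumed "Q,symbol,last,bid,ask" line.
record L1Update(String symbol, double last, double bid, double ask) {}

final class L1Parser {
    /** Parses one line; returns null for message types this sketch does not handle. */
    static L1Update parse(String line) {
        String[] f = line.split(",", -1);   // -1 keeps trailing empty fields
        if (f.length < 5 || !f[0].equals("Q")) return null;
        return new L1Update(f[1], parseOrNaN(f[2]), parseOrNaN(f[3]), parseOrNaN(f[4]));
    }

    // Empty fields (no value in this update) become NaN instead of throwing.
    private static double parseOrNaN(String s) {
        return s.isEmpty() ? Double.NaN : Double.parseDouble(s);
    }
}
```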


4: Serializer
Requirements:
- Transforms object to byte array
- Examples can be found in "Mechanical Sympathy: Native C/C++ Like Performance For Java Object Serialisation"

Java Object
1: Unsafe - Use sun.misc.Unsafe memory
2: DirectByteBuffer - Use the ByteBuffer.allocateDirect(int size)
3: ByteBuffer - Use the ByteBuffer.allocate(int size)

#9

IQFeed Streamer Component - Testing issues

I've written the three components of the system:
  • Journaler - Writes the incoming message to disk
  • Parser - Parses the message from IQFeed format into my program's generic event format
  • Serializer - Serializes the parsed event to bytes


My initial tests focused on identifying the latency bottleneck in the system. Testing the overall system gives the following performance (using a single IQFeed L1 message as the input):
Single-threaded performance
Ops per second: 301,077
Message size (bytes): 130
Average throughput (KB/s): 39,140
50.0% took 3.0 µs, 90.0% took 4.0 µs, 99.0% took 12.0 µs, 99.9% took 28.0 µs, 99.99% took 783.0 µs

Multi-threaded, one producer -> three consumers (diamond) performance
Ops per second: 134,437
Message size (bytes): 130
Total throughput (KB/s): 17,476
OneToThreeDiamondLatency:
50.0% took 43950.0 µs, 90.0% took 126399.0 µs, 99.0% took 179364.0 µs, 99.9% took 327550.0 µs, 99.99% took 509577.0 µs, worst took 509586 µs

Multi-threaded, one producer -> three consumers (pipeline) performance
Ops per second: 109,663
Message size (bytes): 130
Total throughput (KB/s): 14,256
OneToThreePipelineLatency:
50.0% took 61047.0 µs, 90.0% took 134019.0 µs, 99.0% took 188120.0 µs, 99.9% took 341693.0 µs, 99.99% took 344948.0 µs

It's clear from the tests that the single-threaded case dominates the multi-threaded cases in both throughput (more than twice the throughput) and latency (roughly four orders of magnitude lower median latency). I'm confused about why this is the case and where this latency is actually coming from... clearly more to explore.

Here are the latency results for the individual components, each tested in single-threaded mode:

Journaling performance
ChronicleWriterLatency:
50.0% took 1.0 µs, 90.0% took 1.0 µs, 99.0% took 3.0 µs, 99.9% took 14.0 µs, 99.99% took 276.0 µs, worst took 4125 µs
BufferedWriterLatency:
50.0% took 1.0 µs, 90.0% took 1.0 µs, 99.0% took 61.0 µs, 99.9% took 95.0 µs, 99.99% took 634.0 µs, worst took 122255 µs

Parsing performance
ParsingL1FeedLatency:
50.0% took 1.0 µs, 90.0% took 1.0 µs, 99.0% took 2.0 µs, 99.9% took 4.0 µs, 99.99% took 38.0 µs, worst took 1742 µs
ParsingL2FeedLatency:
50.0% took 1.0 µs, 90.0% took 1.0 µs, 99.0% took 1.0 µs, 99.9% took 3.0 µs, 99.99% took 15.0 µs, worst took 1677 µs

Serialization performance
DirectBufferSerialization:
50.0% took 0.4 µs, 90.0% took 0.4 µs, 99.0% took 0.7 µs, 99.9% took 1.3 µs, 99.99% took 63.2 µs, worst took 2676 µs
UnsafeMemorySerialization:
50.0% took 0.3 µs, 90.0% took 0.3 µs, 99.0% took 0.5 µs, 99.9% took 1.1 µs, 99.99% took 72.0 µs, worst took 1609 µs

I'm guessing my error is in how I time the multi-threaded case, and that its latency isn't actually four orders of magnitude higher, since the throughput numbers are within a factor of about 2-3 of each other. I'm currently using the Disruptor framework, so any recommendations on how to test latency with the Disruptor would be useful. I'd also be glad to hear about common multi-threaded pitfalls when testing latency.
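One way to make the single- and multi-threaded numbers comparable is to measure the same interval in both: stamp each event with `System.nanoTime()` when it is published and record the difference when the last handler finishes with it. If the producer publishes as fast as it can, that interval probably also includes time spent queued behind earlier events, which single-threaded timing of one call never sees, and that may account for part of the gap. Below is a minimal sketch of the percentile summary, assuming the samples are collected into a `long[]` of nanoseconds; `LatencyReport` is a hypothetical name, and nearest-rank is one of several percentile definitions.

```java
import java.util.Arrays;
import java.util.Locale;

final class LatencyReport {
    static final double[] PERCENTILES = {50.0, 90.0, 99.0, 99.9, 99.99};

    /** Nearest-rank percentile of an ascending-sorted, non-empty array; p in (0, 100]. */
    static long percentile(long[] sorted, double p) {
        int rank = (int) Math.ceil(p * sorted.length / 100.0);   // 1-based rank
        return sorted[Math.min(Math.max(rank, 1), sorted.length) - 1];
    }

    /** Summarizes non-empty nanosecond samples in microseconds, in the thread's output style. */
    static String summarize(long[] latenciesNanos) {
        long[] s = latenciesNanos.clone();
        Arrays.sort(s);
        StringBuilder sb = new StringBuilder();
        for (double p : PERCENTILES) {
            sb.append(p).append("% took ")
              .append(String.format(Locale.ROOT, "%.1f", percentile(s, p) / 1000.0))
              .append(" us, ");
        }
        sb.append("worst took ")
          .append(String.format(Locale.ROOT, "%.1f", s[s.length - 1] / 1000.0))
          .append(" us");
        return sb.toString();
    }
}
```

Recording into a preallocated `long[]` in the final handler and calling `summarize` only after the run keeps the measurement itself from allocating.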

#10

More Timing


The high latency shows up at every stage of the process, so I need to find a more correct way to test while using the Disruptor. Below is an overview of the processing path, followed by the point-to-point timing within it:
        +-> Parse Object -> Serialize -> +
        |                                |
In ---> +----------> Journaler --------> + -> Publish

Results
Ops per second: 119,234
Message size (bytes): 130
Total throughput (KB/s): 15,500.4

Journaler
50.0% took 15020.0 µs, 90.0% took 78629.0 µs, 99.0% took 155205.0 µs, 99.9% took 344668.0 µs, 99.99% took 379129.0 µs, worst took 399884 µs

Parser
50.0% took 31833.0 µs, 90.0% took 45121.0 µs, 99.0% took 75483.0 µs, 99.9% took 160908.0 µs, 99.99% took 211112.0 µs, worst took 213630 µs

Serializer
50.0% took 49597.0 µs, 90.0% took 64072.0 µs, 99.0% took 107002.0 µs, 99.9% took 262475.0 µs, 99.99% took 287393.0 µs, worst took 287791 µs

Overall time
50.0% took 60470.0 µs, 90.0% took 103080.0 µs, 99.0% took 220118.0 µs, 99.9% took 386776.0 µs, 99.99% took 400173.0 µs, worst took 400175 µs


Last edited by dcooke888; December 20th, 2013 at 12:50 PM.
All times are GMT -4.

Copyright © 2017 by futures io, s.a., Av Ricardo J. Alfaro, Century Tower, Panama, +507 833-9432, info@futures.io
All information is for educational use only and is not investment advice.
There is a substantial risk of loss in trading commodity futures, stocks, options and foreign exchange products. Past performance is not indicative of future results.