KX Community


  • Interaction between peach and other optimisations

    Posted by erichards on November 17, 2023 at 12:00 am

    I understand there are various parallel optimisations that happen under the hood when running with some number of secondary threads, e.g. summing across multiple partitions. How do these interact with peach?

     

    For example:

    disk0/hdb/par.txt -> disk1/hdb/partitions , disk2/hdb/partitions
    disk1/hdb/partitions/1-3-5
    disk2/hdb/partitions/2-4-6

    If I ran a query such as

    select sum price by sym where int within (1;4)

    and I had two secondary threads available, thread #1 would retrieve data from partitions 1, 3 on disk 1, and thread #2 would retrieve data from partitions 2, 4 on disk 2 to maximise I/O throughput.

     

    But if my queries were wrapped in peach, would this still be possible, given peach would be using all available threads, e.g.

    {x[]} peach ( {select sum price by sym where int within (1;4)}; {select sum price by sym where int within (5;6)} )

     

    So are there situations when using peach can reduce performance? Thank you

  • 4 Replies
  • rocuinneagain

    Member
    November 17, 2023 at 12:00 am

    The parallelism can only go one layer deep.

    i.e. these two statements end up following the same execution path. In the first, the inner `peach` can only run like an `each`, as it is already running in a thread:

    data:8#enlist til 1000000
    \ts {{neg x} peach x} peach data
    553 1968
    \ts {{neg x} each x} peach data
    551 1936

    For queries, map-reduce will still be used to reduce the memory load of your nested queries when they run inside a `peach`, even though the sub-parts are not run in parallel.

    https://code.kx.com/q4m3/14_Introduction_to_Kdb%2B/#1437-map-reduce
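
    As a rough illustration of the map-reduce idea (not the actual kdb+ internals), an average over a partitioned column can be computed from per-partition sums and counts, so the full column never has to be held in memory at once. The `parts` list and the names `partial` and `mrAvg` are made up for this sketch:

    ```q
    / hypothetical data: one price vector per partition
    parts:(1 2 3f;4 5f;6 7 8 9f)
    / map: a partial (sum;count) pair per partition
    partial:{(sum x;count x)} each parts
    / reduce: combine the partials into one overall average
    mrAvg:sum[partial[;0]]%sum partial[;1]
    / compare with averaging the whole column in one go
    mrAvg~avg raze parts
    ```

    Only the small partial results need to be kept between partitions, which is why the memory benefit remains even when the partitions are processed one after another.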

    Where you choose to put your `peach` can be important and change the performance of your execution.

    My example actually runs better without `peach`: the overhead of passing data between threads outweighs the benefit, because `neg` is such a simple operation.

    \ts {{neg x} each x} each data
    348 91498576

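    `.Q.fc` (parallel on cut) exists to help in these cases: it cuts a vector into chunks and applies the function to the chunks in parallel.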

    \ts {.Q.fc[{neg x};x]} each data
    19 67110432

    https://code.kx.com/q/ref/dotq/#fc-parallel-on-cut
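
    A sketch of the idea behind "parallel on cut", assuming q was started with secondary threads (e.g. `q -s 4`). `fcSketch` is a hypothetical helper written for illustration, not the definition of `.Q.fc`:

    ```q
    / hypothetical helper: cut x into at most one chunk per secondary thread,
    / apply f to the chunks in parallel, then join the results back together
    fcSketch:{[f;x] n:count[x]&abs "j"$system "s"; $[1<n; raze f peach (n;0N)#x; f x]}
    / compare with applying f to the whole vector directly
    fcSketch[neg;til 1000000]~neg til 1000000
    ```

    Each thread receives one large chunk rather than many small items, so the vector primitive does most of the work on each chunk.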

    In fact, since `neg` has native multithreading and operates on vectors and vectors of vectors, it is best left on its own:

    \ts neg each data
    5 67109216
    \ts neg data
    5 67109104
    neg data

    This example is of course extreme, but it does show that each use case deserves some thought about where to iterate and where to place `peach`.

  • erichards

    Member
    November 17, 2023 at 12:00 am

    I guess a more succinct version of my question is “what happens to native parallelisations when running queries inside an instance of peach?”

  • erichards

    Member
    November 20, 2023 at 12:00 am

    Many thanks for the reply and examples.

     

    “in fact, since `neg` has native multithreading and operates on vectors and vectors of vectors, it is best left on its own”

    This is what I was keen to understand, and it’s useful to know that there are cases when you may be better off without peach.

  • rocuinneagain

    Member
    February 23, 2024 at 12:00 am

    kdb+ 4.1 has been released with some interesting improvements to peach, which change some of my answers, as nesting is now supported.

    https://code.kx.com/q//releases/ChangesIn4.1/#peachparallel-processing-enhancements
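
    To see what this means on a given machine, the earlier comparison can be re-run on both versions. A sketch, assuming q was started with secondary threads (e.g. `q -s 4`); timings are left out because they depend on the version and the hardware:

    ```q
    / kdb+ major version of the running process
    .z.K
    data:8#enlist til 1000000
    / nested peach
    \ts {{neg x} peach x} peach data
    / outer peach only
    \ts {{neg x} each x} peach data
    ```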

     
