BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20230124T170804Z
LOCATION:C1-2-3
DTSTART;TZID=America/Chicago:20221116T083000
DTEND;TZID=America/Chicago:20221116T170000
UID:submissions.supercomputing.org_SC22_sess274_rpost177@linklings.com
SUMMARY:Parameterized Radix-r Bruck Algorithm for All-to-All Communication
DESCRIPTION:Posters, Research Posters\n\nParameterized Radix-r Bruck Algor
 ithm for All-to-All Communication\n\nFan, Kumar\n\nThe standard implementa
 tion of MPI_Alltoall uses a combination of techniques, including the sprea
 d-out and Bruck algorithms. The existing Bruck algorithm implementation is
  limited to a radix of two, so the total number of communication steps is 
 fixed at log2(P) (P: total number of processes). The spread-out algorithm,
  on the other hand, requires P-1 communication steps. Between these two
  extremes of the communication spectrum lies a wide unexplored parameter
  space that can be tuned. In this paper, we formalize a generalized
  formula and implementation of the Bruck algorithm, whose radix can be
  vari
 ed from 2 to P-1. With this ability, both the total number of communicatio
 n steps and the total amount of data transmitted can be tuned, which allow
 s performance tuning. Our experiments demonstrate that the Bruck algorithm
  with the optimal radix is up to 57% faster than the vendor's optimized
  MPI_Alltoall on the Theta supercomputer.\n\nRegistrati
 on Category: Tech Program Reg Pass, Exhibits Reg Pass
END:VEVENT
END:VCALENDAR
