Now let's install benchmark programs on your cluster and measure its performance directly. This document installs and tests NetPIPE and SCALAPACK.
The test system used in this document:
CPU : AMD Athlon MP 1900+ 1.5Ghz, 256KB cache (single processor)
RAM : 2 GB PC2100 ECC memory
HDD : 10000 RPM SCSI
NIC : 3Com Gigabit card, 3Com 10/100 LAN card
NetPIPE is a benchmark program that measures network performance between machines or between Ethernet cards. It supports both PVM and MPI and runs on many kinds of hardware. Let's install it. Download the source from http://www.scl.ameslab.gov/netpipe/; at the time of writing, the latest NetPIPE version is 3.3. Working in a directory shared across the cluster (/home/share in the author's setup) is more convenient.
[micro@master share]$ tar xzf NetPIPE_3.3.tar.gz
[micro@master share]$ cd NetPIPE_3.3
Edit the makefile. Almost nothing needs to change except the MPI settings: set MPICC to the compiler that matches your system. To benchmark NetPIPE over LAM-MPI, point it at LAM's mpicc; to test MPICH, point it at MPICH's mpicc.
[micro@master NetPIPE_3.3]$ vi makefile
# For MPI, mpicc will set up the proper include and library paths
MPICC = /usr/local/mpich/bin/mpicc     # <- edit this line
......
MPI2CC = /usr/local/mpich/bin/mpicc    # <- edit this line
After editing, compile. To test plain TCP performance only, build with make tcp.
[micro@master NetPIPE_3.3]$ make tcp
NetPIPE measures performance with a two-way ping-pong test. For a TCP benchmark, run the program as the receiver on one machine and as the sender (transmitter) on the other.
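As an illustration of the measurement method only (this is not NetPIPE's code), the ping-pong loop can be sketched in Python. The hypothetical `ping_pong` helper sends a message over one end of a local `socket.socketpair()`. A thread on the other end echoes it back, and half of the averaged round-trip time is taken as the one-way latency. Because both ends run on one host, the numbers say nothing about a real network link.

```python
import socket
import threading
import time


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes (a single recv may return fewer)."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def ping_pong(size: int, repeats: int = 100) -> tuple[float, float]:
    """Return (one-way latency in usec, throughput in Mbps) for size-byte messages."""
    sender, receiver = socket.socketpair()

    def echo() -> None:
        # "Receiver" side: bounce every message straight back.
        for _ in range(repeats):
            receiver.sendall(_recv_exact(receiver, size))

    t = threading.Thread(target=echo)
    t.start()
    payload = b"x" * size
    start = time.perf_counter()
    for _ in range(repeats):
        sender.sendall(payload)          # ping
        _recv_exact(sender, size)        # pong
    elapsed = time.perf_counter() - start
    t.join()
    sender.close()
    receiver.close()

    one_way = elapsed / repeats / 2      # half of the average round trip, in seconds
    mbps = size * 8 / one_way / 2**20    # 2**20 bits per "Mb"
    return one_way * 1e6, mbps
```

Sweeping `size` over increasing values, for example powers of two, mimics the message-size loop whose output is shown below.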
[micro@master NetPIPE_3.3]$ ./NPtcp -r &
[micro@master NetPIPE_3.3]$ rsh node01
[micro@node01 NetPIPE_3.3]$ ./NPtcp -t -h master
Send and receive buffers are 512000 and 512000 bytes
(A bug in Linux doubles the requested buffer sizes)
Now starting the main loop
  0:       1 bytes    500 times -->      0.10 Mbps in      78.87 usec
  1:       2 bytes   1267 times -->      0.19 Mbps in      78.89 usec
  2:       3 bytes   1267 times -->      0.29 Mbps in      78.84 usec
  3:       4 bytes    845 times -->      0.39 Mbps in      79.14 usec
  4:       6 bytes    947 times -->      0.58 Mbps in      79.00 usec
......
123: 8388611 bytes      3 times -->    505.81 Mbps in  126529.34 usec
In the author's environment, the largest message size (8,388,611 bytes, the last line above) reached 505.81 Mbps.
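To pick the peak out of a long NetPIPE log, a small parser helps. The sketch below assumes the line format shown above (`N: B bytes R times --> X Mbps in T usec`). The throughput figures in that output are consistent with Mbps = bytes × 8 / time / 2^20. For example, 8,388,611 bytes in 126,529.34 µs gives 505.81. The parser therefore recomputes the value that way as a cross-check. Treat that convention as an inference from the numbers, not a documented NetPIPE definition. `parse_netpipe`, `recompute_mbps` and `peak` are hypothetical helpers.

```python
import re

# Matches lines such as "123: 8388611 bytes 3 times --> 505.81 Mbps in 126529.34 usec"
LINE = re.compile(
    r"^\s*(\d+):\s+(\d+)\s+bytes\s+(\d+)\s+times\s+-->\s+"
    r"([\d.]+)\s+Mbps\s+in\s+([\d.]+)\s+usec"
)


def parse_netpipe(text: str) -> list[dict]:
    """Return one dict per measurement line; all other lines are ignored."""
    rows = []
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            rows.append({"bytes": int(m.group(2)),
                         "mbps": float(m.group(4)),
                         "usec": float(m.group(5))})
    return rows


def recompute_mbps(nbytes: int, usec: float) -> float:
    """Throughput assuming 2**20 bits per reported 'Mbps'."""
    return nbytes * 8 / (usec * 1e-6) / 2**20


def peak(rows: list[dict]) -> dict:
    """Row with the highest reported throughput."""
    return max(rows, key=lambda r: r["mbps"])


sample = """\
  0:       1 bytes    500 times -->      0.10 Mbps in      78.87 usec
  1:       2 bytes   1267 times -->      0.19 Mbps in      78.89 usec
123: 8388611 bytes      3 times -->    505.81 Mbps in  126529.34 usec
"""
rows = parse_netpipe(sample)
best = peak(rows)
print(best["bytes"], best["mbps"], round(recompute_mbps(best["bytes"], best["usec"]), 2))
# prints: 8388611 505.81 505.81
```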
Next, measure network performance through MPI. This document measures LAM-MPI. If you have already pointed MPICC in the makefile at LAM's mpicc, compile:
[micro@master NetPIPE_3.3]$ make mpi
This builds the NPmpi program. Run it with LAM-MPI's mpirun:
[micro@master NetPIPE_3.3]$ mpirun -O -np 2 ./NPmpi
0: master
1: node01
  0:       1 bytes    500 times -->      0.09 Mbps in      81.29 usec
  1:       2 bytes   1230 times -->      0.19 Mbps in      81.40 usec
  2:       3 bytes   1228 times -->      0.28 Mbps in      81.38 usec
  3:       4 bytes    819 times -->      0.38 Mbps in      81.16 usec
......
123: 8388611 bytes      3 times -->    504.27 Mbps in  126917.33 usec
With LAM-MPI, the largest message size (8,388,611 bytes) reached 504.27 Mbps, nearly the same as raw TCP.
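Putting the two runs side by side quantifies what the MPI layer costs on this link. The numbers below are copied from the first and last lines of the NPtcp and NPmpi outputs above. The `overhead` helper is a hypothetical illustration.

```python
def overhead(tcp: float, mpi: float) -> float:
    """Relative difference of the MPI figure versus raw TCP, in percent."""
    return (mpi - tcp) / tcp * 100


# 1-byte latency (usec) and 8,388,611-byte throughput (Mbps) from the two runs above
tcp_lat, mpi_lat = 78.87, 81.29
tcp_bw, mpi_bw = 505.81, 504.27

print(f"latency:    +{mpi_lat - tcp_lat:.2f} usec ({overhead(tcp_lat, mpi_lat):+.1f}%)")
print(f"throughput: {overhead(tcp_bw, mpi_bw):+.2f}%")
```

With these numbers, LAM-MPI adds about 2.4 µs (about 3%) to small-message latency but costs only about 0.3% of large-message bandwidth.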
Try measuring MPICH's MPI network performance the same way.
SCALAPACK is a package for solving linear algebra problems on distributed-memory machines. It ships with test programs that can serve as a LINPACK-style benchmark, and most of the work is floating-point arithmetic. The LINPACK benchmark solves a dense system of N linear equations by Gaussian elimination (LU factorization), and the kernels it spends most of its time in come from BLAS (Basic Linear Algebra Subprograms). BLAS is the most fundamental library behind the LINPACK benchmark: a collection of routines implementing the basic linear algebra operations. It is written in Fortran, and its routines are divided into levels according to whether the operands and results are vectors or matrices: Level 1 (vector-vector), Level 2 (matrix-vector) and Level 3 (matrix-matrix). You can benchmark with this reference BLAS. You can also use ATLAS (Automatically Tuned Linear Algebra Software) to generate a routine library optimized for your platform. This document benchmarks with BLAS, BLACS, ATLAS and SCALAPACK. Only a brief installation and run guide is given here. For further details, see the SCALAPACK home page http://www.netlib.org/scalapack/ or the benchmark guide of the Korea Cluster Technology Center (http://www.cluster.or.kr/board/read.php?table=benchmark).
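To make the level distinction concrete, here is a plain-Python sketch of one representative operation per level. The functions are named after the double-precision BLAS routines DAXPY (Level 1), DGEMV (Level 2) and DGEMM (Level 3). They are simplified illustrations (no strides, no transpose options), not the BLAS interface itself. The flop counts in the docstrings show why Level 3 matters: it performs O(n^3) work on O(n^2) data. That ratio is what lets tuned libraries such as ATLAS reuse data held in cache.

```python
Vector = list[float]
Matrix = list[list[float]]  # row-major: A[i][j]


def axpy(alpha: float, x: Vector, y: Vector) -> Vector:
    """Level 1 (like DAXPY): y := alpha*x + y. About 2n flops."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def gemv(alpha: float, a: Matrix, x: Vector, beta: float, y: Vector) -> Vector:
    """Level 2 (like DGEMV): y := alpha*A*x + beta*y. About 2mn flops."""
    return [alpha * sum(aij * xj for aij, xj in zip(row, x)) + beta * yi
            for row, yi in zip(a, y)]


def gemm(alpha: float, a: Matrix, b: Matrix, beta: float, c: Matrix) -> Matrix:
    """Level 3 (like DGEMM): C := alpha*A*B + beta*C. About 2mnk flops."""
    k, n = len(b), len(b[0])
    return [[alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
             for j in range(n)]
            for i in range(len(a))]


x = [1.0, 2.0]
y = [10.0, 20.0]
a = [[1.0, 2.0], [3.0, 4.0]]
print(axpy(2.0, x, y))                                   # [12.0, 24.0]
print(gemv(1.0, a, x, 0.0, [0.0, 0.0]))                  # [5.0, 11.0]
print(gemm(1.0, a, a, 0.0, [[0.0, 0.0], [0.0, 0.0]]))    # [[7.0, 10.0], [15.0, 22.0]]
```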
Download BLAS from http://www.netlib.org/blas/blas.tgz and install it.
[micro@master share]$ mkdir BLAS
[micro@master share]$ cd BLAS
[micro@master BLAS]$ tar xzf blas.tgz
Compile it. A compiler optimized for your processor can improve performance (for example Intel's icc, PGI's pgcc or Compaq's ccc).
[micro@master BLAS]$ f77 -c *.f
Archive the generated object files (*.o) into a library.
[micro@master BLAS]$ ar cr blas_LINUX.a *.o
Installing BLACS (Basic Linear Algebra Communication Subprograms). BLACS is a linear algebra communication library for message passing between processes in a variety of distributed-memory environments. There are separate PVM and MPI versions, so download the files you need. Since we use MPI here, download mpiblacs.tgz and blacstester.tgz from http://www.netlib.org/blacs/. Unpacking mpiblacs.tgz creates a BLACS directory.
[micro@master share]$ tar xzf mpiblacs.tgz
[micro@master share]$ tar xzf blacstester.tgz BLACS/TESTING/*
[micro@master share]$ cd BLACS
From the BMAKES directory, copy the Bmake file that matches your machine into the BLACS directory as Bmake.inc.
[micro@master BLACS]$ cp BMAKES/Bmake.MPI-LINUX ./Bmake.inc
Edit Bmake.inc. The file is organized into three sections, each defining macros used during compilation. Section 1 defines the macros for the locations of the libraries and executables and for the names of the files make produces. Section 2 defines the C preprocessor values BLACS uses. Section 3 defines the macros that set the compilers and the linker/loader.
[micro@master BLACS]$ vi Bmake.inc
#============ SECTION 1: PATHS AND LIBRARIES =======================
SHELL = /bin/sh                 <- shell to use
BTOPdir = $(HOME)/BLACS         <- top-level BLACS directory
COMMLIB = MPI                   <- communication library: one of CMMD, MPI, PVM, MPL, NX
PLAT = LINUX                    <- platform
BLACSdir = $(BTOPdir)/LIB       <- where the BLACS libraries go
BLACSDBGLVL = 1                 <- debug level (0 = NO, 1 = YES)
BLACSFINIT = $(BLACSdir)/blacsF77init_$(COMMLIB)-$(PLAT)-$(BLACSDBGLVL).a
BLACSCINIT = $(BLACSdir)/blacsCinit_$(COMMLIB)-$(PLAT)-$(BLACSDBGLVL).a
BLACSLIB = $(BLACSdir)/blacs_$(COMMLIB)-$(PLAT)-$(BLACSDBGLVL).a    <- library names
MPIdir = /usr/local/mpich       <- MPICH location
MPIdev = ch_p4mpd               <- MPICH device type
MPIplat = LINUX
MPILIBdir = $(MPIdir)/$(MPIdev)/lib        <- MPICH library directory
MPIINCdir = $(MPIdir)/$(MPIdev)/include    <- MPICH header directory
MPILIB = $(MPILIBdir)/libmpich.a           <- MPICH library file
BTLIBS = $(BLACSFINIT) $(BLACSLIB) $(BLACSFINIT) $(MPILIB)    <- libraries needed for testing
INSTdir = $(BTOPdir)/INSTALL/EXE
TESTdir = $(BTOPdir)/TESTING/EXE
FTESTexe = $(TESTdir)/xFbtest_$(COMMLIB)-$(PLAT)-$(BLACSDBGLVL)
CTESTexe = $(TESTdir)/xCbtest_$(COMMLIB)-$(PLAT)-$(BLACSDBGLVL)
#================= End SECTION 1===============================

#============== SECTION 2: BLACS INTERNALS ========================
SYSINC = -I$(MPIINCdir)
INTFACE = -Df77IsF2C    <- how Fortran 77 names map to C: Add_, NoChange, UpCase or f77IsF2C.
                           If unsure, run the INSTALL/EXE/xintface program.
SENDIS =                <- with -DSndIsLocBlk, MPI_Send is treated as locally blocking, which is
                           more efficient; left empty, it is assumed to be globally blocking.
BUFF =
TRANSCOMM = -DUseMpich  <- this value differs by platform. The file's comments say to run
                           BLACS/INSTALL/xtc_CsameF77 and BLACS/INSTALL/xtc_UseMpich.
                           Build them in the BLACS/INSTALL directory:
                             $ make xtc_CsameF77
                             $ make xtc_UseMpich
                           Running them prints the value to set:
                             $ mpirun -np 2 BLACS/INSTALL/EXE/xtc_CsameF77
                             .............
                             Set TRANSCOMM = -DUseMpich
                             $ BLACS/INSTALL/EXE/xtc_UseMpich
                             Set TRANSCOMM = -DUseMpich
WHATMPI =
SYSERRORS =
DEBUGLVL = -DBlacsDebugLvl=$(BLACSDBGLVL)
DEFS1 = -DSYSINC $(SYSINC) $(INTFACE) $(DEFBSTOP) $(DEFCOMBTOP) $(DEBUGLVL)
BLACSDEFS = $(DEFS1) $(SENDIS) $(BUFF) $(TRANSCOMM) $(WHATMPI) $(SYSERRORS)
#================= End SECTION 2===============================

#================= SECTION 3: COMPILERS ============================
F77 = f77               <- Fortran compiler
#F77NO_OPTFLAGS = -Nx400
F77FLAGS = $(F77NO_OPTFLAGS) -O
F77LOADER = $(F77)
F77LOADFLAGS =
CC = gcc                <- C compiler
CCFLAGS = -O4
CCLOADER = $(CC)
CCLOADFLAGS =
ARCH = ar
ARCHFLAGS = r
RANLIB = ranlib
#================= End SECTION 3 ===============================
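The INTFACE choice is about how the Fortran 77 compiler names external symbols, so that C code and Fortran code can call each other's routines. The sketch below models the four conventions listed in the Bmake.inc comment, as this editor understands them. Add_ appends one underscore, NoChange keeps the lowercase name, and UpCase uppercases it. f77IsF2C follows the f2c/g77 rule of appending a second underscore when the name already contains one. `f77_symbol` is a hypothetical helper for reasoning about `nm` output, not part of BLACS; running INSTALL/EXE/xintface remains the authoritative check.

```python
def f77_symbol(name: str, convention: str) -> str:
    """Linker symbol a Fortran 77 compiler would emit for `name` (sketch)."""
    lower = name.lower()
    if convention == "Add_":
        return lower + "_"
    if convention == "NoChange":
        return lower
    if convention == "UpCase":
        return name.upper()
    if convention == "f77IsF2C":
        # f2c/g77 style: names that already contain '_' get two underscores
        return lower + ("__" if "_" in lower else "_")
    raise ValueError(f"unknown convention: {convention}")


for conv in ("Add_", "NoChange", "UpCase", "f77IsF2C"):
    print(f"{conv:9s} dgemm -> {f77_symbol('dgemm', conv):8s} "
          f"blacs_gridinit -> {f77_symbol('blacs_gridinit', conv)}")
```

To see which convention your compiler actually uses, compile a tiny Fortran subroutine and list the object file's symbols with `nm`.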
Compile:
[micro@master BLACS]$ make mpi
The file LIB/blacs_MPI-LINUX-1.a should be created. The SRC/ directory holds the user-callable routines, all of which have both C and Fortran 77 interfaces. All non-communication routines start with the prefix blacs_, and BLACS internal routines and global variables all carry the prefix BI_.
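The library names are assembled from the Bmake.inc macros, so one quick way to confirm the build is to expand those patterns and check that each archive exists. The sketch assumes the SECTION 1 values above (COMMLIB = MPI, PLAT = LINUX, BLACSDBGLVL = 1) and is meant to run from the BLACS top directory. The text only promises blacs_MPI-LINUX-1.a. The two init libraries are checked as well because BTLIBS and SLmake.inc link against them. `blacs_libs` is a hypothetical helper.

```python
from pathlib import Path


def blacs_libs(commlib: str = "MPI", plat: str = "LINUX", dbglvl: int = 1,
               libdir: str = "LIB") -> list[Path]:
    """Expand the BLACSFINIT, BLACSCINIT and BLACSLIB patterns from Bmake.inc."""
    suffix = f"{commlib}-{plat}-{dbglvl}.a"
    return [Path(libdir) / f"blacsF77init_{suffix}",
            Path(libdir) / f"blacsCinit_{suffix}",
            Path(libdir) / f"blacs_{suffix}"]


if __name__ == "__main__":
    for lib in blacs_libs():
        print(f"{lib}: {'found' if lib.exists() else 'MISSING'}")
```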
Installing ATLAS (Automatically Tuned Linear Algebra Software). Download the files from the ATLAS home page (http://math-atlas.sourceforge.net) and install it. At the time of writing, the latest ATLAS version is 3.5.2.
[micro@master share]$ tar xzf atlas3.5.2.tar.gz
[micro@master share]$ cd ATLAS
[micro@master ATLAS]$ make config CC=gcc    # if CC is not given, gcc is used
[micro@master ATLAS]$ make config
gcc -o xconfig config.c
./xconfig
ATLAS configure started.
160
159
......
001
Enter number at top left of screen [0]: 160    # enter the largest number visible on the screen
====================================================================
IMPORTANT
====================================================================
Before going any further, check http://math-atlas.sourceforge.net/errata.html.
This is the ATLAS errata file, which keeps a running count of all known
ATLAS bugs and system problems, with associated workarounds or fixes.
IF YOU DO NOT CHECK THIS FILE, YOU MAY BE COMPILING A LIBRARY WITH KNOWN BUGS.
Have you scoped the errata file? [y]: y    # read the errata document first
Configure will ask a series of questions, in one of two forms. The first
form of question is a menu of choices. One option in almost all menus is
'Other/UNKNOWN'. If you are unsure of the answer, always choose this option.
...... (omitted) ......
Are you ready to continue? [y]: y
I need to know if you are using a cross-compiler (i.e., you are compiling on
a different architecture than you want the library built for).
Are you using a cross-compiler? [n]: n
Probing to make operating system determination:
Operating system configured as Linux    # check that this is correct
Probing for architecture:
Architecture is set to ATHLON           # check that this is correct
Probing for supported ISA extensions:
make[2]: *** [atlas_run] Error 132
make[1]: *** [IRun_SSE1] Error 2
SSE2: NO.
SSE1: DETECTED!
Number of CPUs: 1
Required cache flush detected as : 524288 bytes
Looking for compilers:
F77 = /usr/bin/g77 -funroll-all-loops -O3
CC = /usr/bin/gcc -fomit-frame-pointer -O3 -funroll-all-loops
MCC = /usr/bin/gcc -fomit-frame-pointer -O
Looking for BLAS (this may take a while):
Unable to find usable BLAS, BLASlib left blank.
FINDING tar, gzip, AND gunzip
tar : /bin/tar
gzip : /bin/gzip
gunzip : /bin/gunzip
ATLAS has default parameters for OS='Linux' and system='ATHLON'.
If you want to just trust these default values, you can use express setup,
drastically reducing the amount of questions you are required to answer
use express setup? [y]: y
......
Enter Architecture name (ARCH) [Linux_ATHLONSSE1]:    # press Enter
[micro@master ATLAS]$ make install arch=<arch>
arch is the architecture name, which is printed at the end of the config step. As printed in the config run above, enter make install arch=Linux_ATHLONSSE1.
[micro@master ATLAS]$ make install arch=Linux_ATHLONSSE1
......... (omitted; takes more than an hour)
ATLAS install complete. Examine ATLAS/bin//INSTALL_LOG/SUMMARY.LOG for details.
Next, install SCALAPACK. MPICH, BLAS and BLACS must already be installed. Download the latest scalapack from http://www.netlib.org/scalapack/ and unpack it; this creates a SCALAPACK directory.
[micro@master share]$ tar xzf scalapack.tgz
Edit the SLmake.inc file. Every Makefile includes it, and it defines the macros needed for the build. Look in the INSTALL directory, copy the SLmake file that suits your system, and edit it.
[micro@master share]$ cd SCALAPACK
[micro@master SCALAPACK]$ cp INSTALL/SLmake.LINUX ./SLmake.inc
[micro@master SCALAPACK]$ vi SLmake.inc
Most values can stay at their defaults. Since we have already used ATLAS to build a BLAS optimized for this platform, adjust the related settings to match.
############################################################################
#
#  Program:         ScaLAPACK
#
#  Module:          SLmake.inc
#
#  Purpose:         Top-level Definitions
#
#  Creation date:   February 15, 2000
#
#  Modified:
#
#  Send bug reports, comments or suggestions to scalapack@cs.utk.edu
#
############################################################################
#
SHELL         = /bin/sh
#
#  The complete path to the top level of ScaLAPACK directory, usually
#  $(HOME)/SCALAPACK
#
home          = $(HOME)/SCALAPACK
#
#  The platform identifier to suffix to the end of library names
#
PLAT          = LINUX
#
#  BLACS setup.  All version need the debug level (0 or 1),
#  and the directory where the BLACS libraries are
#
BLACSDBGLVL   = 1
BLACSdir      = $(HOME)/BLACS/LIB
#
#  MPI setup; tailor to your system if using MPIBLACS
#  Will need to comment out these 6 lines if using PVM
#
USEMPI        = -DUsingMpiBlacs
#SMPLIB       = /usr/lib/mpi/build/LINUX/ch_p4/lib/libmpich.a
SMPLIB        = /usr/local/mpich/lib/libmpich.a
BLACSFINIT    = $(BLACSdir)/blacsF77init_MPI-$(PLAT)-$(BLACSDBGLVL).a
BLACSCINIT    = $(BLACSdir)/blacsCinit_MPI-$(PLAT)-$(BLACSDBGLVL).a
BLACSLIB      = $(BLACSdir)/blacs_MPI-$(PLAT)-$(BLACSDBGLVL).a
TESTINGdir    = $(home)/TESTING
#
#  PVMBLACS setup, uncomment next 6 lines if using PVM
#
#USEMPI       =
#SMPLIB       = $(PVM_ROOT)/lib/$(PLAT)/libpvm3.a
#BLACSFINIT   =
#BLACSCINIT   =
#BLACSLIB     = $(BLACSdir)/blacs_PVM-$(PLAT)-$(BLACSDBGLVL).a
#TESTINGdir   = $(HOME)/pvm3/bin/$(PLAT)
CBLACSLIB     = $(BLACSCINIT) $(BLACSLIB) $(BLACSCINIT)
FBLACSLIB     = $(BLACSFINIT) $(BLACSLIB) $(BLACSFINIT)
#
#  The directories to find the various pieces of ScaLapack
#
PBLASdir      = $(home)/PBLAS
SRCdir        = $(home)/SRC
TESTdir       = $(home)/TESTING
PBLASTSTdir   = $(TESTINGdir)
TOOLSdir      = $(home)/TOOLS
REDISTdir     = $(home)/REDIST
REDISTTSTdir  = $(TESTINGdir)
#
#  The fortran and C compilers, loaders, and their flags
#
F77           = /usr/local/mpich/bin/mpif77
CC            = /usr/local/mpich/bin/mpicc
NOOPT         =
F77FLAGS      = -funroll-all-loops -O3 $(NOOPT)
DRVOPTS       = $(F77FLAGS)
CCFLAGS       = -O4
SRCFLAG       =
#F77LOADER    = $(F77)
F77LOADER     = $(F77)
CCLOADER      = $(CC)
F77LOADFLAGS  =
CCLOADFLAGS   =
#
#  C preprocessor defs for compilation
#  (-DNoChange, -DAdd_, -DUpCase, or -Df77IsF2C)
#
CDEFS         = -Df77IsF2C -DNO_IEEE $(USEMPI)
#
#  The archiver and the flag(s) to use when building archive (library)
#  Also the ranlib routine.  If your system has no ranlib, set RANLIB = echo
#
ARCH          = ar
ARCHFLAGS     = cr
RANLIB        = ranlib
#
#  The name of the libraries to be created/linked to
#
SCALAPACKLIB  = $(home)/libscalapack.a
#BLASLIB      = $(HOME)/BLAS/blas_LINUX.a
# Point BLASLIB at the ATLAS BLAS libraries.
BLASLIB       = -L$(HOME)/ATLAS/lib/Linux_ATHLONSSE1 -lf77blas -latlas
#
PBLIBS        = $(SCALAPACKLIB) $(FBLACSLIB) $(BLASLIB) $(SMPLIB)
PRLIBS        = $(SCALAPACKLIB) $(CBLACSLIB) $(SMPLIB)
RLIBS         = $(SCALAPACKLIB) $(FBLACSLIB) $(CBLACSLIB) $(BLASLIB) $(SMPLIB)
LIBS          = $(PBLIBS)
############################################################################
Compile. If errors occur during compilation, fix SLmake.inc and compile again.
[micro@master SCALAPACK]$ make lib
This creates libscalapack.a in the SCALAPACK directory. If everything has worked so far, run a simple test program.
[micro@master SCALAPACK]$ cd TESTING
[micro@master TESTING]$ cd LIN
[micro@master LIN]$ make double
[micro@master LIN]$ cd ..
[micro@master TESTING]$ /usr/local/mpich/bin/mpirun -np [number of processes] ./xdlu
ScaLAPACK Ax=b by LU factorization.
'MPI Machine'

Tests of the parallel real double precision LU factorization and solve.
The following scaled residual checks will be computed:
 Solve residual         = ||Ax - b|| / (||x|| * ||A|| * eps * N)
 Factorization residual = ||A - LU|| / (||A|| * eps * N)
The matrix A is randomly generated for each test.

An explanation of the input/output parameters follows:
TIME    : Indicates whether WALL or CPU time was used.
M       : The number of rows in the matrix A.
N       : The number of columns in the matrix A.
NB      : The size of the square blocks the matrix A is split into.
NRHS    : The total number of RHS to solve for.
NBRHS   : The number of RHS to be put on a column of processes before going
          on to the next column of processes.
P       : The number of process rows.
Q       : The number of process columns.
THRESH  : If a residual value is less than THRESH, CHECK is flagged as PASSED
LU time : Time in seconds to factor the matrix
Sol Time: Time in seconds to solve the system.
MFLOPS  : Rate of execution for factor and solve.

The following parameter values will be used:
M       : 10000
N       : 10000
NB      :    36
NRHS    :     3
NBRHS   :     3
P       :     1
Q       :     7

Relative machine precision (eps) is taken to be 0.111022E-15
Routines pass computational tests if scaled residual is less than 1.0000

TIME     M     N  NB NRHS NBRHS    P    Q  LU Time Sol Time   MFLOPS  CHECK
---- ----- ----- --- ---- ----- ---- ---- -------- -------- -------- ------
WALL 10000 10000  36    3     3    1    7   100.52     0.46  6607.53 PASSED

Finished 1 tests, with the following results:
1 tests completed and passed residual checks.
0 tests completed and failed residual checks.
0 tests skipped because of illegal input values.

END OF TESTS.
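The PASSED verdict comes from the scaled residuals described in the output header. As a small serial illustration (not ScaLAPACK's distributed code), the sketch below solves Ax = b by Gaussian elimination with partial pivoting. It then evaluates the solve residual ||Ax - b|| / (||x|| * ||A|| * eps * N). It assumes infinity norms and IEEE double precision. The eps printed by xdlu (0.111022E-15, i.e. 2^-53) is half of Python's `sys.float_info.epsilon`. The matrix size and random seed are arbitrary illustrative choices.

```python
import random
import sys

EPS = sys.float_info.epsilon / 2  # 2**-53, the 0.111022E-15 printed by xdlu


def lu_solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve Ax = b by Gaussian elimination with partial pivoting (inputs are copied)."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix [A | b]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(m[i][k]))  # pivot: largest |entry| in column k
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def inf_norm_vec(v: list[float]) -> float:
    return max(abs(t) for t in v)


def inf_norm_mat(a: list[list[float]]) -> float:
    return max(sum(abs(t) for t in row) for row in a)


def scaled_residual(a: list[list[float]], x: list[float], b: list[float]) -> float:
    """||Ax - b|| / (||x|| * ||A|| * eps * N), using infinity norms."""
    n = len(a)
    r = [sum(aij * xj for aij, xj in zip(row, x)) - bi for row, bi in zip(a, b)]
    return inf_norm_vec(r) / (inf_norm_vec(x) * inf_norm_mat(a) * EPS * n)


random.seed(0)
n = 50
a = [[random.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
b = [random.uniform(-0.5, 0.5) for _ in range(n)]
x = lu_solve(a, b)
res = scaled_residual(a, x, b)
print(f"residual = {res:.4f} -> {'PASSED' if res < 1.0 else 'FAILED'}")
```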
Your xdlu run should produce results similar to the output shown above. The following assumes SCALAPACK is installed and explains how to run the LINPACK benchmark on a single node. To use the ATLAS-optimized routines, set SLmake.inc to use the BLAS routines that ATLAS provides.
[micro@master SCALAPACK]$ vi SLmake.inc
......(omitted)
BLASLIB = -L$(HOME)/ATLAS/lib/Linux_ATHLONSSE1 -lf77blas -latlas
......(omitted)
By default the SCALAPACK/TESTING directory contains 13 .dat files and the LIN and EIG directories. LIN and EIG hold the Fortran source and Makefiles of the benchmark programs: LIN has the programs for Linear Equations Testing, and EIG those for Eigenroutine Testing. The .dat files in TESTING and their purposes are:
BLLT.dat  'ScaLAPACK, Version 1.2, banded linear systems input file'
BLU.dat   'ScaLAPACK, Version 1.2, banded linear systems input file'
BRD.dat   'ScaLAPACK BRD input file'
HRD.dat   'ScaLAPACK HRD input file'
INV.dat   'ScaLAPACK, Version 1.0, Matrix Inversion Testing input file'
LLT.dat   'ScaLAPACK, LLt factorization input file'
LS.dat    'ScaLAPACK LS solve input file'
LU.dat    'SCALAPACK, LU factorization input file'
NEP.dat   'SCALAPACK NEP (Nonsymmetric Eigenvalue Problem) input file'
QR.dat    'ScaLAPACK, Orthogonal factorizations input file'
SEP.dat   'ScaLAPACK Symmetric Eigensolver Test File'
SVD.dat   'ScaLAPACK Singular Value Decomposition input file'
TRD.dat   'ScaLAPACK TRD computation input file'
Each .dat file stores the parameters for its benchmark. Let's look at LU.dat, which this test uses.
[micro@master TESTING]$ vi LU.dat
-- LU.dat --
'SCALAPACK, LU factorization input file'
'MPI Machine'
'LU.out'                 output file name (if any)
6                        device out
4                        number of problems sizes
4 10 17 13 23 31 57      values of M
4 12 13 13 23 31 50      values of N
3                        number of NB's
2 3 4 5                  values of NB
3                        number of NRHS's
1 3 9 28                 values of NRHS
3                        Number of NBRHS's
1 3 5 7                  values of NBRHS
4                        number of process grids (ordered pairs of P & Q)
1 2 1 4 2 3 8            values of P
1 2 4 1 3 2 1            values of Q
1.0                      threshold
T                        (T or F) Test Cond. Est. and Iter. Ref. Routines
-- LU.dat --
The format is roughly: for each parameter, one line gives how many values to use, and the next line lists the values. For example, number of problems sizes is 4, so the benchmark uses M = 4, 10, 17, 13 and N = 4, 12, 13, 13. The values of P and Q define the process grid: P is the number of process rows and Q the number of process columns. In the first grid above, P = 1 and Q = 1, so 1 × 1 = 1 and the test runs on one process. In the second, P = 2 and Q = 2, so 2 × 2 = 4 and the test runs on four processes (nodes). To build the benchmark programs, run make [type] in the LIN or EIG directory. There are four types: single, double, complex and complex16. For example, make single builds the benchmark programs for single-precision floating point, and make all builds the programs for all four types at once. The resulting file names and counts for each type are:
[micro@master LIN]$ make single
[micro@master LIN]$ ls ../xs*
xsdblu* xsdtlu* xsgblu* xsinv* xsllt* xsls* xslu* xspbllt* xsptllt* xsqr*
[micro@master LIN]$ make double
[micro@master LIN]$ ls ../xd*
xddblu* xddtlu* xdgblu* xdinv* xdllt* xdls* xdlu* xdpbllt* xdptllt* xdqr*
[micro@master LIN]$ make complex
[micro@master LIN]$ ls ../xc*
xcdblu* xcdtlu* xcgblu* xcinv* xcllt* xcls* xclu* xcpbllt* xcptllt* xcqr*
[micro@master LIN]$ make complex16
[micro@master LIN]$ ls ../xz*
xzdblu* xzdtlu* xzgblu* xzinv* xzllt* xzls* xzlu* xzpbllt* xzptllt* xzqr*
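The count-then-values layout of LU.dat can be captured in a few lines. The sketch below is a hypothetical reader, not ScaLAPACK's own input routine. It skips the two header strings plus the output-file and device lines. For each count line, it takes that many leading values from the following line and ignores the trailing comment text. Run on the LU.dat shown earlier, it reproduces M = 4, 10, 17, 13, N = 4, 12, 13, 13 and the four grids.

```python
def parse_lu_dat(text: str) -> dict:
    """Read the count/value pairs of a ScaLAPACK LU.dat file (sketch)."""
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("--")]
    body = lines[4:]  # skip title, machine name, output file, device out

    def count(line: str) -> int:
        return int(line.split()[0])

    def values(line: str, n: int, cast=int) -> list:
        # A value line holds numbers followed by a free-text comment
        return [cast(tok) for tok in line.split()[:n]]

    nsizes, nnb, nrhs, nbrhs, ngrids = (count(body[i]) for i in (0, 3, 5, 7, 9))
    p = values(body[10], ngrids)
    q = values(body[11], ngrids)
    return {"M": values(body[1], nsizes), "N": values(body[2], nsizes),
            "NB": values(body[4], nnb), "NRHS": values(body[6], nrhs),
            "NBRHS": values(body[8], nbrhs), "grids": list(zip(p, q)),
            "threshold": float(body[12].split()[0])}


sample = """\
'SCALAPACK, LU factorization input file'
'MPI Machine'
'LU.out'                 output file name (if any)
6                        device out
4                        number of problems sizes
4 10 17 13 23 31 57      values of M
4 12 13 13 23 31 50      values of N
3                        number of NB's
2 3 4 5                  values of NB
3                        number of NRHS's
1 3 9 28                 values of NRHS
3                        Number of NBRHS's
1 3 5 7                  values of NBRHS
4                        number of process grids (ordered pairs of P & Q)
1 2 1 4 2 3 8            values of P
1 2 4 1 3 2 1            values of Q
1.0                      threshold
T                        (T or F) Test Cond. Est. and Iter. Ref. Routines
"""
cfg = parse_lu_dat(sample)
for p, q in cfg["grids"]:
    print(f"P={p} Q={q} -> {p * q} processes")
```

The largest P × Q in the list (4 here) suggests starting xdlu with at least that many processes.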
The four make runs above produce 40 executables in total, and running make the same way in the EIG directory builds those programs too. Try running each program. To run one under MPI, use mpirun -np <number of processes> <program>. For the LINPACK benchmark, you set the parameters in LU.dat. ScaLAPACK performs its computation block by block, so to get peak performance on a parallel machine such as a cluster you must find the block size that suits that machine. A rough value can be calculated first and then refined empirically over many runs. To find the largest problem size the machine can handle (Nmax), increase the problem size step by step on a single processor until you find the largest size that fits in a node's memory. Finally, based on that, run the LU factorization routine at the largest size the whole multi-node machine can hold to obtain the maximum performance (Rmax).
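ScaLAPACK's "block unit" is the two-dimensional block-cyclic distribution. The matrix is cut into NB × NB blocks, which are dealt out round-robin over the P × Q process grid. The sketch below shows the ownership rule, assuming square blocks and a distribution that starts at process (0, 0), which is the usual default. It maps a 0-based global row or column index to the owning process coordinate and the local index there. This is the job that ScaLAPACK's INDXG2P and INDXG2L tool routines perform. `owner` and `owner_2d` are hypothetical Python helpers, not those routines.

```python
def owner(g: int, nb: int, nprocs: int) -> tuple[int, int]:
    """Map a 0-based global row/column index to (process coordinate, 0-based local index).

    Assumes the first block sits on process 0 (no source-process offset).
    """
    block = g // nb                 # which NB-sized block g falls in
    proc = block % nprocs           # blocks are dealt out round-robin
    local_block = block // nprocs   # how many of this process's blocks come before it
    return proc, local_block * nb + g % nb


def owner_2d(i: int, j: int, nb: int, p: int, q: int) -> tuple[int, int]:
    """Grid coordinates (process row, process column) holding global entry (i, j)."""
    return owner(i, nb, p)[0], owner(j, nb, q)[0]


# 1 x 7 grid with NB = 30 (illustrative values)
nb, q = 30, 7
for j in (0, 29, 30, 209, 210):
    proc, local = owner(j, nb, q)
    print(f"global column {j:3d} -> process column {proc}, local column {local}")
```

A small NB spreads the work more evenly but means more, smaller messages and less cache reuse per block, while a large NB does the opposite. That trade-off is why the best NB has to be found by experiment.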
In the Fortran source LIN/pdludriver.f, vary the value of TOTMEM to find the range at which a segmentation fault occurs. The limit depends not only on main memory but also on the size of the swap area: setting TOTMEM larger than the swap area will cause a segmentation fault. Remember to recompile after changing the source. With 2 GB of memory and a 500 MB swap area, the author set TOTMEM to 500000000. Edit LU.dat in the TESTING directory as follows and run xdlu. If the computation needs more memory than main memory provides, you will see the hard disk working as the swap area is accessed.
-- LU.dat --
'SCALAPACK, LU factorization input file'
'MPI Machine'
'LU.out'                          output file name (if any)
6                                 device out
6                                 number of problems sizes
1000 1200 1400 1600 1800 2000     values of M
1000 1200 1400 1600 1800 2000     values of N
1                                 number of NB's
60                                values of NB
1                                 number of NRHS's
1                                 values of NRHS
1                                 Number of NBRHS's
1                                 values of NBRHS
1                                 number of process grids (ordered pairs of P & Q)
1                                 values of P
1                                 values of Q
1.0                               threshold
T                                 (T or F) Test Cond. Est. and Iter. Ref. Routines
-- LU.dat --
The results of running this twice are as follows. First run:
TIME     M     N  NB NRHS NBRHS    P    Q  LU Time Sol Time   MFLOPS  CHECK
---- ----- ----- --- ---- ----- ---- ---- -------- -------- -------- ------
WALL  1000  1000  60    1     1    1    1     0.64     0.01  1026.42 PASSED
WALL  1200  1200  60    1     1    1    1     1.05     0.02  1078.62 PASSED
WALL  1400  1400  60    1     1    1    1     1.67     0.02  1083.34 PASSED
WALL  1600  1600  60    1     1    1    1     2.29     0.03  1177.45 PASSED
WALL  1800  1800  60    1     1    1    1     3.13     0.04  1227.84 PASSED
WALL  2000  2000  60    1     1    1    1     4.37     0.05  1207.76 PASSED
Second run:
TIME     M     N  NB NRHS NBRHS    P    Q  LU Time Sol Time   MFLOPS  CHECK
---- ----- ----- --- ---- ----- ---- ---- -------- -------- -------- ------
WALL  1000  1000  60    1     1    1    1     0.63     0.01  1032.06 PASSED
WALL  1200  1200  60    1     1    1    1     1.05     0.02  1079.69 PASSED
WALL  1400  1400  60    1     1    1    1     1.59     0.02  1134.49 PASSED
WALL  1600  1600  60    1     1    1    1     2.28     0.03  1184.27 PASSED
WALL  1800  1800  60    1     1    1    1     3.12     0.04  1231.93 PASSED
WALL  2000  2000  60    1     1    1    1     4.37     0.05  1207.30 PASSED
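The MFLOPS column can be reproduced approximately from the two time columns. The sketch assumes the standard dense LU operation count: about 2/3·N^3 flops for the factorization plus about 2·N^2 per right-hand side for the solve, divided by LU Time + Sol Time. The driver's exact operation count may differ slightly, and the printed times are rounded. Expect agreement within a fraction of a percent rather than to the last digit. `lu_mflops` is a hypothetical helper.

```python
def lu_mflops(n: int, lu_time: float, sol_time: float, nrhs: int = 1) -> float:
    """Approximate MFLOPS for an N x N LU factorization plus solve."""
    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2 * nrhs
    return flops / (lu_time + sol_time) / 1e6


# (N, LU Time, Sol Time, reported MFLOPS) taken from the first run above
rows = [(1000, 0.64, 0.01, 1026.42), (2000, 4.37, 0.05, 1207.76)]
for n, lu, sol, reported in rows:
    est = lu_mflops(n, lu, sol)
    print(f"N={n}: estimated {est:.1f} vs reported {reported} MFLOPS "
          f"({(est - reported) / reported:+.2%})")
```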
MFLOPS rises as the problem grows (with a slight dip at N = 2000 in both runs). Once the problem becomes large enough to use swap, performance drops. Next, vary NB: adjusting M and N to suit your system, change NB from 28 to 60.
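Editing LU.dat by hand for every sweep gets tedious. The sketch below writes an LU.dat in the same layout as the files above from Python lists, with M and N set to the same values. The sizes, NB list and 1 × 7 grid in the example call are illustrative choices. `make_lu_dat` is a hypothetical helper, so compare its output with your own LU.dat before running xdlu.

```python
def make_lu_dat(sizes, nbs, grids, nrhs=(1,), nbrhs=(1,), threshold=1.0) -> str:
    """Return LU.dat text for square problems (M = N) over the given NBs and P x Q grids."""
    def row(values, label):
        # values first, then a free-text label; the space keeps them apart even for long lists
        return f"{' '.join(str(v) for v in values):<23} {label}"

    ps = [p for p, _ in grids]
    qs = [q for _, q in grids]
    lines = [
        "'SCALAPACK, LU factorization input file'",
        "'MPI Machine'",
        row(["'LU.out'"], "output file name (if any)"),
        row([6], "device out"),
        row([len(sizes)], "number of problems sizes"),
        row(sizes, "values of M"),
        row(sizes, "values of N"),
        row([len(nbs)], "number of NB's"),
        row(nbs, "values of NB"),
        row([len(nrhs)], "number of NRHS's"),
        row(nrhs, "values of NRHS"),
        row([len(nbrhs)], "Number of NBRHS's"),
        row(nbrhs, "values of NBRHS"),
        row([len(grids)], "number of process grids (ordered pairs of P & Q)"),
        row(ps, "values of P"),
        row(qs, "values of Q"),
        row([threshold], "threshold"),
        row(["T"], "(T or F) Test Cond. Est. and Iter. Ref. Routines"),
    ]
    return "\n".join(lines) + "\n"


text = make_lu_dat(sizes=[5000, 7000, 10000], nbs=range(28, 62, 4), grids=[(1, 7)])
print(text)
```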
TIME     M     N  NB NRHS NBRHS    P    Q  LU Time Sol Time   MFLOPS  CHECK
---- ----- ----- --- ---- ----- ---- ---- -------- -------- -------- ------
WALL  5000  5000  28    1     1    1    7    19.85     0.14  4170.73 PASSED
WALL  5000  5000  30    1     1    1    7    14.85     0.14  5562.10 PASSED
WALL  5000  5000  32    1     1    1    7    15.40     0.13  5367.77 PASSED
WALL  5000  5000  34    1     1    1    7    15.89     0.15  5198.10 PASSED
WALL  7000  7000  28    1     1    1    7    49.39     0.24  4608.81 PASSED
WALL  7000  7000  30    1     1    1    7    37.77     0.27  6013.41 PASSED
WALL  7000  7000  32    1     1    1    7    38.96     0.25  5833.21 PASSED
WALL  7000  7000  34    1     1    1    7    39.07     0.26  5816.04 PASSED
WALL 10000 10000  28    1     1    1    7   133.66     0.41  4973.60 PASSED
WALL 10000 10000  30    1     1    1    7    99.69     0.45  6659.18 PASSED
WALL 10000 10000  32    1     1    1    7   102.15     0.43  6500.25 PASSED
WALL 10000 10000  34    1     1    1    7   101.73     0.40  6529.03 PASSED
The best performance measured in this experiment was 6659.18 MFLOPS at M = N = 10000, NB = 30. Now fix M = N and look for the optimal NB.
TIME     M     N  NB NRHS NBRHS    P    Q  LU Time Sol Time   MFLOPS  CHECK
---- ----- ----- --- ---- ----- ---- ---- -------- -------- -------- ------
WALL  1000  1000  28    1     1    1    7     1.09     0.03   595.45 PASSED
WALL  1000  1000  30    1     1    1    7     0.33     0.02  1939.09 PASSED
WALL  1000  1000  32    1     1    1    7     0.33     0.02  1911.82 PASSED
WALL  1000  1000  34    1     1    1    7     0.37     0.02  1699.22 PASSED
WALL  1000  1000  36    1     1    1    7     0.38     0.02  1695.32 PASSED
WALL  1000  1000  38    1     1    1    7     0.40     0.02  1606.20 PASSED
With M = N = 1000 fixed, NB = 30 gave the best performance (1939.09 MFLOPS).
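With longer sweeps, it is easier to let a script read the result table. The sketch assumes the xdlu row layout shown above (TIME M N NB NRHS NBRHS P Q, then LU Time, Sol Time, MFLOPS and CHECK). `parse_rows` and `best_per_size` are hypothetical helpers. They keep only PASSED rows and report the fastest NB for each problem size.

```python
def parse_rows(text: str) -> list[dict]:
    """Parse xdlu result rows that start with WALL or CPU."""
    rows = []
    for line in text.splitlines():
        f = line.split()
        if len(f) == 12 and f[0] in ("WALL", "CPU"):
            rows.append({"n": int(f[2]), "nb": int(f[3]),
                         "mflops": float(f[10]), "ok": f[11] == "PASSED"})
    return rows


def best_per_size(rows: list[dict]) -> dict[int, dict]:
    """Fastest PASSED row for each problem size N."""
    best: dict[int, dict] = {}
    for r in rows:
        if r["ok"] and (r["n"] not in best or r["mflops"] > best[r["n"]]["mflops"]):
            best[r["n"]] = r
    return best


table = """\
TIME     M     N  NB NRHS NBRHS    P    Q  LU Time Sol Time   MFLOPS  CHECK
WALL  1000  1000  28    1     1    1    7     1.09     0.03   595.45 PASSED
WALL  1000  1000  30    1     1    1    7     0.33     0.02  1939.09 PASSED
WALL  1000  1000  32    1     1    1    7     0.33     0.02  1911.82 PASSED
WALL  1000  1000  34    1     1    1    7     0.37     0.02  1699.22 PASSED
"""
for n, r in sorted(best_per_size(parse_rows(table)).items()):
    print(f"N={n}: best NB={r['nb']} ({r['mflops']} MFLOPS)")
```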
Next, let's benchmark with HPL, the program used to benchmark large-memory systems (it is also the program the TOP500 site uses to rank the world's supercomputers). Before installing HPL, BLAS, MPICH and CBLAS must be installed; since we use ATLAS's BLAS routines here, ATLAS must be installed as well. For installing CBLAS, refer to the Korea Cluster Technology Center (http://www.cluster.or.kr/board/read.php?table=benchmark=3) or to http://www.netlib.org/blas/. Download HPL from http://www.netlib.org/benchmark/hpl/ and unpack it.
[micro@master share]$ tar xzf hpl.tgz
From the setup directory inside hpl, copy the make file that matches your platform into the top-level hpl directory. Here we use an Athlon chip under Linux together with CBLAS, the C interface to BLAS, so the file name is:
[micro@master share]$ cd hpl
[micro@master hpl]$ cp setup/Make.Linux_ATHLON_CBLAS .
Edit that file.
[micro@master hpl]$ vi Make.Linux_ATHLON_CBLAS
------ Make.Linux_ATHLON_CBLAS -------
SHELL        = /bin/sh
#
CD           = cd
CP           = cp
LN_S         = ln -s
MKDIR        = mkdir
RM           = /bin/rm -f
TOUCH        = touch
ARCH         = Linux_ATHLON_CBLAS
TOPdir       = $(HOME)/hpl
INCdir       = $(TOPdir)/include
BINdir       = $(TOPdir)/bin/$(ARCH)
LIBdir       = $(TOPdir)/lib/$(ARCH)
#
HPLlib       = $(LIBdir)/libhpl.a
#CC          = gcc
CC           = /usr/local/mpich/bin/mpicc    <- MPICH's C compiler
NOOPT        =
#CCFLAGS     = -fomit-frame-pointer -O3 -funroll-loops -W -Wall
CCFLAGS      = -fomit-frame-pointer -O3 -funroll-loops
#
#LINKER      = gcc
LINKER       = /usr/local/mpich/bin/mpicc
LINKFLAGS    = $(CCFLAGS)
#
ARCHIVER     = ar
ARFLAGS      = r
RANLIB       = echo
MPdir        = /usr/local/mpich
MPinc        = -I$(MPdir)/include
MPlib        = $(MPdir)/lib/libmpich.a
F2CDEFS      =
NOOPT        =
F77          = /usr/local/mpich/bin/mpif77
F77LOADER    = /usr/local/mpich/bin/mpif77
F77FLAGS     = -O $(NOOPT)
LAdir        = $(HOME)/ATLAS/lib/Linux_ATHLONSSE1
LAinc        = $(HOME)/ATLAS/include/Linux_ATHLONSSE1
LAlib        = $(LAdir)/libcblas.a $(LAdir)/libatlas.a
#HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc)    <- LAinc has no -I prefix, so change this line as below
HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) -I$(LAinc) $(MPinc)
HPL_LIBS     = $(HPLlib) $(LAlib) $(MPlib)
HPL_OPTS     = -DHPL_CALL_CBLAS
HPL_DEFS     = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
------ Make.Linux_ATHLON_CBLAS -------
Next, set arch in both Make.top and Makefile:
#
arch = Linux_ATHLON_CBLAS
#
Compile by entering make arch=[your system]. A [your system] directory will then be created under bin.
[micro@master hpl]$ make arch=Linux_ATHLON_CBLAS
[micro@master hpl]$ cd bin/Linux_ATHLON_CBLAS
In bin/Linux_ATHLON_CBLAS you will find HPL.dat and xhpl. Like the LU.dat configuration file of the LINPACK benchmark earlier, HPL.dat sets the benchmark parameters. xhpl is the program that actually runs the benchmark. Let's look at the HPL.dat format.
[micro@master Linux_ATHLON_CBLAS]$ vi HPL.dat
----- HPL.dat -----
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
10000
1            # of NBs
85           NBs
1            # of process grids (P x Q)
1            Ps
7            Qs
16.0         threshold
1            # of panel fact
1            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
2            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
----- HPL.dat -----
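The Ps and Qs lines above set the process grid, and P × Q should match the number of MPI processes you start. HPL's tuning notes generally favor a grid that is close to square, with P ≤ Q. The hypothetical helpers below list the exact factorizations of a process count and pick the most nearly square one. With 7 processes, as in this document, 7 is prime, so 1 × 7 is the only choice.

```python
def grids(nprocs: int) -> list[tuple[int, int]]:
    """All (P, Q) with P * Q == nprocs and P <= Q."""
    return [(p, nprocs // p) for p in range(1, nprocs + 1)
            if nprocs % p == 0 and p <= nprocs // p]


def best_grid(nprocs: int) -> tuple[int, int]:
    """Most nearly square grid with P <= Q (largest feasible P)."""
    return max(grids(nprocs), key=lambda pq: pq[0])


for n in (7, 8, 12, 16):
    print(n, grids(n), "->", best_grid(n))
```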
HPL.dat is basically not very different from LU.dat of the LINPACK benchmark. The main differences are that the problem size is now one-dimensional (a single N instead of separate M and N) and that a swapping threshold can be specified. For details see the HPL tuning page http://www.netlib.org/benchmark/hpl/tuning.html. Now run xhpl.
[micro@master Linux_ATHLON_CBLAS]$ mpirun -np 7 xhpl
============================================================================
HPLinpack 1.0  --  High-Performance Linpack benchmark  --  September 27, 2000
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Labs.,  UTK
============================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :   10000
NB     :      85
P      :       1
Q      :       7
PFACT  :   Crout
NBMIN  :       4
NDIV   :       2
RFACT  :   Right
BCAST  :  1ringM
DEPTH  :       1
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

============================================================================
T/V                N    NB     P     Q               Time             Gflops
----------------------------------------------------------------------------
W11R2C4        10000    85     1     7              70.49          9.460e+00
----------------------------------------------------------------------------
||Ax-b||_oo / ( eps * ||A||_1  * N        ) =        0.0646673 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) =        0.0153022 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) =        0.0034203 ...... PASSED
============================================================================
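The run above reports 9.46 Gflops at N = 10000, NB = 85. As with LINPACK, tune the problem size and NB to your system to draw out the best performance it can deliver. After editing HPL.dat, the author rebuilds as shown below. xhpl reads HPL.dat from its working directory when it starts, so if you edit the copy in bin/Linux_ATHLON_CBLAS directly, simply rerunning xhpl is enough.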
[micro@master Linux_ATHLON_CBLAS]$ rm -f ./xhpl
[micro@master Linux_ATHLON_CBLAS]$ cd ../../
[micro@master hpl]$ make clean
[micro@master hpl]$ make all
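To compare runs while tuning N and NB, it helps to know how HPL's figure is computed and how large N can reasonably get. HPL reports Gflops as (2/3·N^3 + 3/2·N^2) / time / 10^9, and `hpl_gflops` below reproduces the 9.460 from the run above. `hpl_n_for_memory` suggests a starting N. It applies a common rule of thumb: let the N × N double matrix use about 80% of total memory, leaving the rest for the OS and buffers. That 80% is a heuristic assumption, not an HPL requirement. The 7-node, 2 GB-per-node example is likewise hypothetical.

```python
import math


def hpl_gflops(n: int, seconds: float) -> float:
    """Gflops as HPL reports it: (2/3 N^3 + 3/2 N^2) / time / 1e9."""
    return ((2.0 / 3.0) * n**3 + 1.5 * n**2) / seconds / 1e9


def hpl_n_for_memory(total_bytes: float, fraction: float = 0.8, nb: int = 1) -> int:
    """Largest N (rounded down to a multiple of nb) whose 8-byte matrix fits in fraction*memory."""
    n = math.isqrt(int(fraction * total_bytes / 8))
    return n - n % nb


print(f"{hpl_gflops(10000, 70.49):.3f}")           # 9.460
# Hypothetical cluster: 7 nodes x 2 GB each, NB = 85
print(hpl_n_for_memory(7 * 2 * 1024**3, nb=85))    # 38760
```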