4.5. Install & Run

Now let's install benchmark programs on your cluster system and check its performance directly. This document installs and tests NetPIPE and SCALAPACK.

4.5.1. Test Specifications

4.5.2. NetPIPE

NetPIPE is a benchmark program that measures network performance between machines or between Ethernet cards. It supports both PVM and MPI, as well as several kinds of hardware. Let's install it. Download the source from http://www.scl.ameslab.gov/netpipe/. At the time of writing, the latest version of NetPIPE is 3.3. It is more convenient to work in a directory shared across the cluster (/home/share in the author's environment).

[micro@master share]$ tar xzf NetPIPE_3.3.tar.gz 
[micro@master share]$ cd NetPIPE_3.3
	  

Now edit the makefile. There is almost nothing to change; only the MPI settings need editing. Set MPICC to the compiler that matches your system: to benchmark NetPIPE over LAM-MPI, specify LAM's mpicc; to test the performance of MPICH, specify MPICH's mpicc.

[micro@master NetPIPE_3.3]$ vi makefile 
# For MPI, mpicc will set up the proper include and library paths
MPICC       = /usr/local/mpich/bin/mpicc    	# <- this line
......
MPI2CC   = /usr/local/mpich/bin/mpicc		# <- this line
	  

Once the edits are made, compile. To test plain TCP performance, compile with make tcp.

[micro@master NetPIPE_3.3]$ make tcp
	  

By default NetPIPE measures performance with a bidirectional ping-pong test. For the TCP benchmark, run it as the receiver on one side and as the sender on the other.

[micro@master NetPIPE_3.3]$ ./NPtcp -r &
[micro@master NetPIPE_3.3]$ rsh node01
[micro@node01 NetPIPE_3.3]$ ./NPtcp -t -h master 
Send and receive buffers are 512000 and 512000 bytes
(A bug in Linux doubles the requested buffer sizes)
Now starting the main loop
  0:       1 bytes    500 times -->      0.10 Mbps in      78.87 usec
  1:       2 bytes   1267 times -->      0.19 Mbps in      78.89 usec
  2:       3 bytes   1267 times -->      0.29 Mbps in      78.84 usec
  3:       4 bytes    845 times -->      0.39 Mbps in      79.14 usec
  4:       6 bytes    947 times -->      0.58 Mbps in      79.00 usec
......
123: 8388611 bytes      3 times -->    505.81 Mbps in  126529.34 usec
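
To make the ping-pong idea concrete, here is a minimal sketch in Python. It is not NetPIPE's implementation: the function names are hypothetical, and it runs both ends on the loopback interface only so that the example is self-contained. One side echoes every message back, the other times the round trips and halves the average to estimate the one-way time.

```python
import socket
import threading
import time


def recv_exact(sock, size):
    """Read exactly `size` bytes from a stream socket."""
    buf = bytearray()
    while len(buf) < size:
        chunk = sock.recv(size - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def echo_side(listener, size, repeats):
    """Receiver role: bounce every message straight back to the sender."""
    conn, _ = listener.accept()
    with conn:
        for _ in range(repeats):
            conn.sendall(recv_exact(conn, size))


def one_way_latency(size, repeats=100):
    """Sender role: time `repeats` round trips, return seconds per one-way trip."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # loopback only, OS picks a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    worker = threading.Thread(target=echo_side, args=(listener, size, repeats))
    worker.start()

    payload = b"x" * size
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        start = time.perf_counter()
        for _ in range(repeats):
            sock.sendall(payload)         # ping
            recv_exact(sock, size)        # pong
        elapsed = time.perf_counter() - start

    worker.join()
    listener.close()
    return elapsed / repeats / 2.0        # half a round trip


if __name__ == "__main__":
    for size in (1, 1024, 65536):
        print(size, "bytes:", one_way_latency(size) * 1e6, "usec one-way")
```

On a real cluster the two roles run on different hosts, as in the NPtcp transcript above; loopback figures say nothing about the network itself.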
	 

In the author's environment the peak result was as follows.

Table 4-1. NetPIPE benchmark results (tcp)

Message Size    Bandwidth    Latency (usec)
8.3 Mbytes      505 Mbps     126529
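
The bandwidth and time columns of the NPtcp output are related by simple arithmetic. The sketch below assumes that NetPIPE counts one Mbit as 2^20 bits; this is inferred from the printed rows, which it reproduces, and not taken from the NetPIPE documentation. The function name is hypothetical.

```python
def netpipe_mbps(message_bytes, one_way_usec):
    """Throughput in Mbps from message size and one-way time in microseconds.

    Assumption: one Mbit is taken as 2**20 bits, which matches the rows shown
    in the transcript above.
    """
    bits = message_bytes * 8
    seconds = one_way_usec * 1e-6
    return bits / 2**20 / seconds


if __name__ == "__main__":
    # Rows taken from the NPtcp transcript above.
    print(round(netpipe_mbps(1, 78.87), 2))
    print(round(netpipe_mbps(8388611, 126529.34), 2))
```

Applied to the last row (8388611 bytes in 126529.34 usec), this agrees with the 505.81 Mbps printed by NPtcp.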

Now let's measure network performance over MPI. This document measures the MPI performance of LAM-MPI. If you have already set mpicc in the makefile as described earlier, go ahead and compile.

[micro@master NetPIPE_3.3]$ make mpi 
	 

The NPmpi program should now be compiled. Run it with LAM-MPI's mpirun.

[micro@master NetPIPE_3.3]$ mpirun -O -np 2 ./NPmpi
0: master
1: node01
0:       1 bytes    500 times -->      0.09 Mbps in      81.29 usec
1:       2 bytes   1230 times -->      0.19 Mbps in      81.40 usec
2:       3 bytes   1228 times -->      0.28 Mbps in      81.38 usec
3:       4 bytes    819 times -->      0.38 Mbps in      81.16 usec
......
123: 8388611 bytes      3 times -->    504.27 Mbps in  126917.33 usec
	 

The peak MPI network performance of LAM-MPI was as follows.

Table 4-2. NetPIPE benchmark results (mpi)

Message Size    Bandwidth    Latency (usec)
8.3 Mbytes      508 Mbps     94347

Likewise, try testing the MPI network performance of MPICH.

4.5.3. SCALAPACK (LINPACK Benchmark)

SCALAPACK is a package for solving linear algebra problems. The LINPACK benchmark discussed here is included with SCALAPACK by default and consists largely of floating-point operations. The routines the LINPACK benchmark relies on most solve a system of N linear equations by Gaussian elimination, and they are built on BLAS (Basic Linear Algebra Subprograms). BLAS is the most fundamental library for the LINPACK benchmark: a collection of implementations of basic linear algebra operations. It is written in Fortran, and its functions are divided into levels according to whether their operands and results are vectors or matrices. You can benchmark with this BLAS, but you can also use ATLAS (Automatically Tuned Linear Algebra Software) to generate a routine library optimized for your platform. This document benchmarks with BLAS, BLACS, ATLAS, and SCALAPACK. Only a concise guide to installation and running is given here; for further details, see the SCALAPACK home page http://www.netlib.org/scalapack/ or the benchmark guide of the Korea Cluster Technology Center (http://www.cluster.or.kr/board/read.php?table=benchmark).

4.5.3.1. LINPACK

Download BLAS from http://www.netlib.org/blas/blas.tgz and install it.

[micro@master share]$ mkdir BLAS
[micro@master share]$ cd BLAS
[micro@master BLAS]$ tar xzf blas.tgz 
	    

Compile. Note that a compiler optimized for your processor can improve performance (for example pgcc on Intel processors or ccc on Compaq systems).

[micro@master BLAS]$ f77 -c *.f 
	    

Build a library from the generated object files (the *.o files).

[micro@master BLAS]$ ar cr blas_LINUX.a *.o 
	    

Installing BLACS (Basic Linear Algebra Communication Subprograms). BLACS is a linear algebra library for message passing between processors in a variety of distributed-memory environments. There are separate versions for PVM and MPI, so download the files you need. We use MPI here, so download mpiblacs.tgz and blacstester.tgz from http://www.netlib.org/blacs/. Unpacking mpiblacs.tgz creates the BLACS directory.

[micro@master share]$ tar xzf mpiblacs.tgz 
[micro@master share]$ tar xzf blacstester.tgz BLACS/TESTING/*
[micro@master share]$ cd BLACS 
	    

From the BMAKES directory, copy the Bmake file that matches your machine into the BLACS directory.

[micro@master BLACS]$ cp BMAKES/Bmake.MPI-LINUX ./Bmake.inc 
	    

Edit Bmake.inc. The file is divided into three sections, each of which defines macros needed during compilation. Section 1 defines the macros that specify the locations of the libraries and executables and the names of the files that make produces. Section 2 defines the C preprocessor values used by BLACS. Section 3 defines the macros that set the compilers and the linker/loaders.

[micro@master BLACS]$ vi Bmake.inc
#============ SECTION 1: PATHS AND LIBRARIES =======================
SHELL = /bin/sh			<- shell to use

BTOPdir = $(HOME)/BLACS  	<- top-level BLACS directory

COMMLIB = MPI			<- communication library to use: one of CMMD, MPI, PVM, MPL, NX

PLAT = LINUX               	<- platform

BLACSdir    = $(BTOPdir)/LIB	<- location of the BLACS libraries
BLACSDBGLVL = 1			<- debug level (0 = NO, 1 = YES)
BLACSFINIT  = $(BLACSdir)/blacsF77init_$(COMMLIB)-$(PLAT)-$(BLACSDBGLVL).a
BLACSCINIT  = $(BLACSdir)/blacsCinit_$(COMMLIB)-$(PLAT)-$(BLACSDBGLVL).a
BLACSLIB    = $(BLACSdir)/blacs_$(COMMLIB)-$(PLAT)-$(BLACSDBGLVL).a
			<- library names

MPIdir = /usr/local/mpich		<- location of MPICH
MPIdev = ch_p4mpd			<- MPICH device type
MPIplat = LINUX			
MPILIBdir = $(MPIdir)/$(MPIdev)/lib	<- MPICH library directory
MPIINCdir = $(MPIdir)/$(MPIdev)/include	<- MPICH header file directory
MPILIB = $(MPILIBdir)/libmpich.a	<- MPICH library file

BTLIBS = $(BLACSFINIT) $(BLACSLIB) $(BLACSFINIT) $(MPILIB)
			<- libraries needed for testing
INSTdir = $(BTOPdir)/INSTALL/EXE
TESTdir = $(BTOPdir)/TESTING/EXE
FTESTexe = $(TESTdir)/xFbtest_$(COMMLIB)-$(PLAT)-$(BLACSDBGLVL)
CTESTexe = $(TESTdir)/xCbtest_$(COMMLIB)-$(PLAT)-$(BLACSDBGLVL)
#================= End SECTION 1===============================
#============== SECTION 2: BLACS INTERNALS ========================
SYSINC = -I$(MPIINCdir)
INTFACE = -Df77IsF2C	<- how Fortran77 interfaces to C: Add_, NoChange, UpCase, or f77IsF2C. If unsure, run the INSTALL/EXE/xintface program.
SENDIS =           <- if set to -DSndIsLocBlk, MPI_Send is treated as a
                   locally-blocking routine, which is more efficient.
                   If left blank, globally-blocking is assumed.
BUFF = 
TRANSCOMM = -DUseMpich <- the parameter defined here differs from platform to platform. The comments in the file say to run BLACS/INSTALL/xtc_CsameF77 and BLACS/INSTALL/xtc_UseMpich. They are built as follows:
	$ BLACS/INSTALL/make xtc_CsameF77
	$ BLACS/INSTALL/make xtc_UseMpich
Running these programs prints the value to set.
	$ BLACS/INSTALL/EXE/mpirun -np 2 xtc_CsameF77
	.............
	Set TRANSCOMM = -DUseMpich
	$ BLACS/INSTALL/EXE/xtc_UseMpich
	Set TRANSCOMM = -DUseMpich
WHATMPI =
SYSERRORS =
DEBUGLVL = -DBlacsDebugLvl=$(BLACSDBGLVL)
DEFS1 = -DSYSINC $(SYSINC) $(INTFACE) $(DEFBSTOP) $(DEFCOMBTOP) $(DEBUGLVL)
BLACSDEFS = $(DEFS1) $(SENDIS) $(BUFF) $(TRANSCOMM) $(WHATMPI) $(SYSERRORS)
#================= End SECTION 2===============================
#================= SECTION 3: COMPILERS ============================
F77            = f77            <- Fortran compiler
#F77NO_OPTFLAGS = -Nx400
F77FLAGS       = $(F77NO_OPTFLAGS) -O
F77LOADER      = $(F77)
F77LOADFLAGS   =
CC             = gcc            <- C compiler
CCFLAGS        = -O4
CCLOADER       = $(CC)
CCLOADFLAGS    =

ARCH      = ar
ARCHFLAGS = r
RANLIB    = ranlib
#================= End SECTION 3 ===============================
	   

Compile.

[micro@master BLACS]$ make mpi 
	   

The file LIB/blacs_MPI-LINUX-1.a should have been created. The SRC/ directory holds the user-callable routines, all of which have both C and Fortran77 interfaces. All non-communication routines begin with the prefix blacs_. BLACS internal routines and global variables all carry the prefix BI_.

Installing ATLAS (Automatically Tuned Linear Algebra Software). Download the files from the ATLAS home page (http://math-atlas.sourceforge.net) and install them. At the time of writing, the latest ATLAS version is 3.5.2.

[micro@master share]$ tar xzf atlas3.5.2.tar.gz
[micro@master share]$ cd ATLAS
[micro@master ATLAS]$ make config CC=gcc  # if CC is not specified, gcc is used.
[micro@master ATLAS]$ make config
gcc -o xconfig config.c
./xconfig
ATLAS configure started.
160
159
......
001
Enter number at top left of screen [0]: 160 # enter the largest number shown on the screen.
====================================================================
IMPORTANT
====================================================================
Before going any further, check
http://math-atlas.sourceforge.net/errata.html.
This is the ATLAS errata file, which keeps a running count of all known
ATLAS bugs and system problems, with associated workarounds or fixes.
IF YOU DO NOT CHECK THIS FILE, YOU MAY BE COMPILING A LIBRARY WITH KNOWN BUGS.

Have you scoped the errata file? [y]: y 	# be sure to read the errata document.
Configure will ask a series of questions, in one of two forms. The first form of question is a menu of choices. One option in almost all menus is
'Other/UNKNOWN'. If you are unsure of the answer, always choose this option.
...... (omitted) ......
Are you ready to continue? [y]: y 
I need to know if you are using a cross-compiler (i.e., you are compiling on a different architecture than you want the library built for).

Are you using a cross-compiler? [n]: n
Probing to make operating system determination:
Operating system configured as Linux # check that this is correct

Probing for architecture:
Architecture is set to ATHLON # check that this is correct

Probing for supported ISA extensions:
make[2]: *** [atlas_run] Error 132
make[1]: *** [IRun_SSE1] Error 2
SSE2: NO.
SSE1: DETECTED!
Number of CPUs: 1
Required cache flush detected as : 524288 bytes
Looking for compilers:

F77 = /usr/bin/g77 -funroll-all-loops -O3
CC = /usr/bin/gcc -fomit-frame-pointer -O3 -funroll-all-loops
MCC = /usr/bin/gcc -fomit-frame-pointer -O

Looking for BLAS (this may take a while):
Unable to find usable BLAS, BLASlib left blank.
FINDING tar, gzip, AND gunzip
tar : /bin/tar
gzip : /bin/gzip
gunzip : /bin/gunzip


ATLAS has default parameters for OS='Linux' and system='ATHLON'.
If you want to just trust these default values, you can use express setup,
drastically reducing the amount of questions you are required to answer

use express setup? [y]: y
......
Enter Architecture name (ARCH) [Linux_ATHLONSSE1]: Enter 

[micro@master ATLAS]$ make install arch=<arch>
	   

arch is the architecture name, which is printed at the end of the config step. Following the config output above, enter make install arch=Linux_ATHLONSSE1.

[micro@master ATLAS]$ make install arch=Linux_ATHLONSSE1
.........
(omitted; takes more than an hour)
ATLAS install complete. Examine
ATLAS/bin//INSTALL_LOG/SUMMARY.LOG for details.
	   

Next, install SCALAPACK. MPICH, BLAS, and BLACS must already be installed. Download the latest version of scalapack from http://www.netlib.org/scalapack/; unpacking it creates the SCALAPACK directory.

[micro@master share]$ tar xzf scalapack.tgz 
	   

Edit SLmake.inc. This file is included by every Makefile and defines the macros needed for installation. Look in the INSTALL directory, copy the SLmake.inc variant that suits your system, and edit it.

[micro@master share]$ cd SCALAPACK
[micro@master SCALAPACK]$ cp INSTALL/SLmake.LINUX ./SLmake.inc
[micro@master SCALAPACK]$ vi SLmake.inc 
	   

Most values can be left at their defaults. Since we built a platform-optimized BLAS with ATLAS earlier, adjust the related settings to match.

############################################################################
#
#  Program:         ScaLAPACK
#
#  Module:          SLmake.inc
#
#  Purpose:         Top-level Definitions
#
#  Creation date:   February 15, 2000
#
#  Modified:
#
#  Send bug reports, comments or suggestions to scalapack@cs.utk.edu
#
############################################################################
#
SHELL         = /bin/sh
#
#  The complete path to the top level of ScaLAPACK directory, usually
#  $(HOME)/SCALAPACK
#
home          = $(HOME)/SCALAPACK
#
#  The platform identifier to suffix to the end of library names
#
PLAT          = LINUX
#
#  BLACS setup.  All version need the debug level (0 or 1),
#  and the directory where the BLACS libraries are
#
BLACSDBGLVL   = 1
BLACSdir      = $(HOME)/BLACS/LIB
#
#  MPI setup; tailor to your system if using MPIBLACS
#  Will need to comment out these 6 lines if using PVM
#
USEMPI        = -DUsingMpiBlacs
#SMPLIB        = /usr/lib/mpi/build/LINUX/ch_p4/lib/libmpich.a
SMPLIB        = /usr/local/mpich/lib/libmpich.a
BLACSFINIT    = $(BLACSdir)/blacsF77init_MPI-$(PLAT)-$(BLACSDBGLVL).a
BLACSCINIT    = $(BLACSdir)/blacsCinit_MPI-$(PLAT)-$(BLACSDBGLVL).a
BLACSLIB      = $(BLACSdir)/blacs_MPI-$(PLAT)-$(BLACSDBGLVL).a
TESTINGdir    = $(home)/TESTING

#
#  PVMBLACS setup, uncomment next 6 lines if using PVM
#
#USEMPI        =
#SMPLIB        = $(PVM_ROOT)/lib/$(PLAT)/libpvm3.a
#BLACSFINIT    =
#BLACSCINIT    =
#BLACSLIB      = $(BLACSdir)/blacs_PVM-$(PLAT)-$(BLACSDBGLVL).a
#TESTINGdir    = $(HOME)/pvm3/bin/$(PLAT)

CBLACSLIB     = $(BLACSCINIT) $(BLACSLIB) $(BLACSCINIT)
FBLACSLIB     = $(BLACSFINIT) $(BLACSLIB) $(BLACSFINIT)

#
#  The directories to find the various pieces of ScaLapack
#
PBLASdir      = $(home)/PBLAS
SRCdir        = $(home)/SRC
TESTdir       = $(home)/TESTING
PBLASTSTdir   = $(TESTINGdir)
TOOLSdir      = $(home)/TOOLS
REDISTdir     = $(home)/REDIST
REDISTTSTdir  = $(TESTINGdir)
#
#  The fortran and C compilers, loaders, and their flags
#
F77           = /usr/local/mpich/bin/mpif77
CC            = /usr/local/mpich/bin/mpicc
NOOPT        = 
F77FLAGS     =  -funroll-all-loops -O3 $(NOOPT)
DRVOPTS      = $(F77FLAGS)
CCFLAGS      = -O4
SRCFLAG       =
#F77LOADER     = $(F77)
F77LOADER     = $(F77)
CCLOADER      = $(CC)
F77LOADFLAGS  =
CCLOADFLAGS   =
#
#  C preprocessor defs for compilation 
#  (-DNoChange, -DAdd_, -DUpCase, or -Df77IsF2C)
#
CDEFS         = -Df77IsF2C -DNO_IEEE $(USEMPI)
#
#  The archiver and the flag(s) to use when building archive (library)
#  Also the ranlib routine.  If your system has no ranlib, set RANLIB = echo
#
ARCH          = ar
ARCHFLAGS     = cr
RANLIB        = ranlib
#
#  The name of the libraries to be created/linked to
#
SCALAPACKLIB  = $(home)/libscalapack.a
#BLASLIB       = $(HOME)/BLAS/blas_LINUX.a
# Point BLASLIB at the BLAS library built by ATLAS.
BLASLIB       = -L$(HOME)/ATLAS/lib/Linux_ATHLONSSE1 -lf77blas -latlas
#
PBLIBS        = $(SCALAPACKLIB) $(FBLACSLIB) $(BLASLIB) $(SMPLIB)
PRLIBS        = $(SCALAPACKLIB) $(CBLACSLIB) $(SMPLIB)
RLIBS         = $(SCALAPACKLIB) $(FBLACSLIB) $(CBLACSLIB) $(BLASLIB) $(SMPLIB)
LIBS          = $(PBLIBS)
############################################################################
	   

Compile. If errors occur during compilation, fix SLmake.inc and compile again.

[micro@master SCALAPACK]$ make lib 
	   

The file libscalapack.a is created under the SCALAPACK directory. If everything has gone well so far, let's run a simple test program.

[micro@master SCALAPACK]$ cd TESTING
[micro@master TESTING]$ cd LIN
[micro@master LIN]$ make double
[micro@master LIN]$ cd ..
[micro@master TESTING]$ /usr/local/mpich/bin/mpirun -np [number of processors] ./xdlu 
ScaLAPACK Ax=b by LU factorization.
'MPI Machine'

Tests of the parallel real double precision LU factorization and solve.
The following scaled residual checks will be computed:
 Solve residual         = ||Ax - b|| / (||x|| * ||A|| * eps * N)
 Factorization residual = ||A - LU|| / (||A|| * eps * N)
The matrix A is randomly generated for each test.

An explanation of the input/output parameters follows:
TIME    : Indicates whether WALL or CPU time was used.
M       : The number of rows in the matrix A.
N       : The number of columns in the matrix A.
NB      : The size of the square blocks the matrix A is split into.
NRHS    : The total number of RHS to solve for.
NBRHS   : The number of RHS to be put on a column of processes before going
          on to the next column of processes.
P       : The number of process rows.
Q       : The number of process columns.
THRESH  : If a residual value is less than THRESH, CHECK is flagged as PASSED
LU time : Time in seconds to factor the matrix
Sol Time: Time in seconds to solve the system.
MFLOPS  : Rate of execution for factor and solve.

The following parameter values will be used:
  M       :         10000
  N       :         10000
  NB      :            36
  NRHS    :             3
  NBRHS   :             3
  P       :             1
  Q       :             7

Relative machine precision (eps) is taken to be       0.111022E-15
Routines pass computational tests if scaled residual is less than   1.0000

TIME     M     N  NB NRHS NBRHS    P    Q  LU Time Sol Time  MFLOPS  CHECK
---- ----- ----- --- ---- ----- ---- ---- -------- -------- -------- ------
WALL 10000 10000  36     3    3    1    7   100.52     0.46  6607.53 PASSED
Finished      1 tests, with the following results:
    1 tests completed and passed residual checks.
    0 tests completed and failed residual checks.
    0 tests skipped because of illegal input values.

END OF TESTS.
	   

You should see results similar to the above. Next, assuming SCALAPACK is installed, we describe how to run the LINPACK benchmark on a single node. To use the ATLAS-optimized routines, set SLmake.inc to use the BLAS routines provided by ATLAS.

[micro@master SCALAPACK]$ vi SLmake.inc 
......(omitted)
BLASLIB       = -L$(HOME)/ATLAS/lib/Linux_ATHLONSSE1 -lf77blas -latlas
......(omitted)
	   

By default the SCALAPACK/TESTING directory contains 13 .dat files and the LIN and EIG directories. LIN and EIG hold the Fortran source code and Makefiles for the benchmarks. LIN contains the programs for Linear Equations Testing; EIG contains the programs for Eigenroutine Testing. The .dat files in the TESTING directory and their purposes are as follows.

BLLT.dat 'ScaLAPACK, Version 1.2, banded linear systems input file'
BLU.dat  'ScaLAPACK, Version 1.2, banded linear systems input file'
BRD.dat  'ScaLAPACK BRD input file'
HRD.dat  'ScaLAPACK HRD input file'
INV.dat  'ScaLAPACK, Version 1.0, Matrix Inversion Testing input file'
LLT.dat  'ScaLAPACK, LLt factorization input file'
LS.dat   'ScaLAPACK LS solve input file'
LU.dat   'SCALAPACK, LU factorization input file'
NEP.dat  'SCALAPACK NEP (Nonsymmetric Eigenvalue Problem) input file'
QR.dat   'ScaLAPACK, Orthogonal factorizations input file'
SEP.dat  'ScaLAPACK Symmetric Eigensolver Test File'
SVD.dat  'ScaLAPACK Singular Value Decomposition  input file'
TRD.dat  'ScaLAPACK TRD computation input file'
	   

Each .dat file stores the variables its benchmark needs. Let's look at LU.dat, which is the one needed for our test.

[micro@master TESTING]$ vi LU.dat 
-- LU.dat --
'SCALAPACK, LU factorization input file'
'MPI Machine'
'LU.out'                output file name (if any)
6                       device out
4                       number of problems sizes
4 10 17 13 23 31 57     values of M
4 12 13 13 23 31 50     values of N
3                       number of NB's
2 3 4 5                 values of NB
3                       number of NRHS's
1 3 9 28                values of NRHS
3                       Number of NBRHS's
1 3 5 7                 values of NBRHS
4                       number of process grids (ordered pairs of P & Q)
1 2 1 4 2 3 8           values of P
1 2 4 1 3 2 1           values of Q
1.0                     threshold
T                       (T or F) Test Cond. Est. and Iter. Ref. Routines
-- LU.dat --
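
As a rough illustration of how the count lines in this file select values, the sketch below reads the pairs "count line, values line" from the listing above and derives the process grids. It is a simplified reader written for this example, not the parser used by the ScaLAPACK test drivers, and the function name take is hypothetical.

```python
def take(count_line, values_line):
    """Return the first `count` integers of a values line.

    `count_line` is e.g. "4   number of problems sizes"; `values_line` is
    e.g. "4 10 17 13 23 31 57   values of M". Trailing text is ignored.
    """
    count = int(count_line.split()[0])
    values = []
    for token in values_line.split():
        if not token.isdigit():
            break                      # reached the descriptive text
        values.append(int(token))
    return values[:count]


if __name__ == "__main__":
    m = take("4   number of problems sizes", "4 10 17 13 23 31 57   values of M")
    n = take("4   number of problems sizes", "4 12 13 13 23 31 50   values of N")
    p = take("4   number of process grids", "1 2 1 4 2 3 8   values of P")
    q = take("4   number of process grids", "1 2 4 1 3 2 1   values of Q")
    print("M =", m)
    print("N =", n)
    for rows, cols in zip(p, q):
        # A P x Q grid needs P * Q MPI processes.
        print(f"grid {rows} x {cols} -> {rows * cols} processes")
```

Only the first four grids are used here, so the largest run needs four processes; mpirun must be started with at least that many.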
	  

Roughly, the file consists of the values each benchmark needs, each list preceded by the number of values to use. For example, since number of problems sizes is 4, the values used for M and N are M = 4, 10, 17, 13 and N = 4, 12, 13, 13. The values of P and Q define the process grid: P is the number of process rows and Q the number of process columns. In the file above, the first grid, P = 1, Q = 1, gives 1 x 1 = 1, so the test runs on one processor. The second, P = 2, Q = 2, gives 2 x 2 = 4, so it runs on four processors (nodes). To build the benchmark programs, run make [type] in the LIN or EIG directory. There are four types: single, double, complex, and complex16. For example, make single builds the benchmark programs for single-precision floating point. make all compiles the benchmark programs for all four types at once. The file names and counts for each type are as follows.

[micro@master LIN]$ make single 
[micro@master LIN]$ ls ../xs* 
xsdblu*  xsdtlu*  xsgblu*  xsinv*  xsllt*  xsls*  xslu*  xspbllt*  xsptllt*  xsqr*
[micro@master LIN]$ make double
[micro@master LIN]$ ls ../xd*
xddblu*  xddtlu*  xdgblu*  xdinv*  xdllt*  xdls*  xdlu*  xdpbllt*  xdptllt*  xdqr*
[micro@master LIN]$ make complex 
[micro@master LIN]$ ls ../xc*
xcdblu*  xcdtlu*  xcgblu*  xcinv*  xcllt*  xcls*  xclu*  xcpbllt*  xcptllt*  xcqr*
[micro@master LIN]$ make complex16 
[micro@master LIN]$ ls ../xz* 
xzdblu*  xzdtlu*  xzgblu*  xzinv*  xzllt*  xzls*  xzlu*  xzpbllt*  xzptllt*  xzqr*
	  

This produces 40 executables in total; running make the same way in the EIG directory builds its programs as well. Let's run them. To run under MPI, use mpirun -np N program. The LINPACK benchmark is driven by the parameter settings in LU.dat. ScaLAPACK performs its computations block by block, so to get peak performance on a parallel computer such as a cluster you must find a block size suited to the machine. You can get a rough value by calculation and then refine it empirically over many runs. To find the largest problem size the computer can handle (Nmax), increase the problem size step by step on a single processor to learn the largest size that fits in a node's memory. Finally, on that basis, run the LU factorization routine at the largest size the multi-node parallel computer can handle to obtain the peak performance (Rmax).
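
A starting point for the Nmax search can be computed before any run. The sketch below assumes a dense N x N double-precision matrix (8 bytes per element) and, as a common rule of thumb that is not taken from this document, that about 80% of physical memory is available to the benchmark. The function name and the fraction parameter are hypothetical.

```python
import math


def estimate_nmax(mem_bytes_per_node, nodes=1, fraction=0.8, nb=None):
    """Largest N such that an N x N matrix of doubles fits in the given memory.

    fraction: share of memory assumed usable (assumption, tune per system).
    nb:       if given, round N down to a multiple of the block size.
    """
    usable_bytes = int(mem_bytes_per_node * nodes * fraction)
    n = math.isqrt(usable_bytes // 8)      # 8 bytes per double
    if nb:
        n -= n % nb
    return n


if __name__ == "__main__":
    two_gb = 2 * 1024**3
    print(estimate_nmax(two_gb))               # single node
    print(estimate_nmax(two_gb, nodes=7))      # seven such nodes
    print(estimate_nmax(two_gb, nb=60))        # rounded to a multiple of NB
```

This gives only an upper bound to start from; the empirical search described above is still needed, because the operating system, MPI buffers, and workspace arrays also consume memory.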

In LIN/pdludriver.f, which is written in Fortran, vary the value of TOTMEM and find the range in which a segmentation fault occurs. This depends on the size of the swap area as well as on main memory. Setting TOTMEM larger than the swap area will cause a segmentation fault. Of course, after modifying the source you must recompile. With 2 GB of memory and 500 MB of swap, TOTMEM was set to 500000000. Edit LU.dat in the TESTING directory as follows and run xdlu. If the memory needed for the calculation exceeds main memory, you will see the hard disk working to access the swap area.

-- LU.dat --
'SCALAPACK, LU factorization input file'
'MPI Machine'
'LU.out'                output file name (if any)
6                       device out
6                       number of problems sizes
1000 1200 1400 1600 1800 2000 values of M
1000 1200 1400 1600 1800 2000 values of N
1                       number of NB's
60		        values of NB
1                       number of NRHS's
1 	               values of NRHS
1                       Number of NBRHS's
1	                 values of NBRHS
1                       number of process grids (ordered pairs of P & Q)
1		       values of P
1		       values of Q
1.0                     threshold
T                       (T or F) Test Cond. Est. and Iter. Ref. Routines
-- LU.dat --
	  

Running this twice gave the following results. First run:

TIME     M     N  NB NRHS NBRHS    P    Q  LU Time Sol Time  MFLOPS  CHECK
---- ----- ----- --- ---- ----- ---- ---- -------- -------- -------- ------
WALL  1000  1000  60     1    1    1    1     0.64     0.01  1026.42 PASSED
WALL  1200  1200  60     1    1    1    1     1.05     0.02  1078.62 PASSED
WALL  1400  1400  60     1    1    1    1     1.67     0.02  1083.34 PASSED
WALL  1600  1600  60     1    1    1    1     2.29     0.03  1177.45 PASSED
WALL  1800  1800  60     1    1    1    1     3.13     0.04  1227.84 PASSED
WALL  2000  2000  60     1    1    1    1     4.37     0.05  1207.76 PASSED
	  

Second run:

TIME     M     N  NB NRHS NBRHS    P    Q  LU Time Sol Time  MFLOPS  CHECK
---- ----- ----- --- ---- ----- ---- ---- -------- -------- -------- ------

WALL  1000  1000  60     1    1    1    1     0.63     0.01  1032.06 PASSED
WALL  1200  1200  60     1    1    1    1     1.05     0.02  1079.69 PASSED
WALL  1400  1400  60     1    1    1    1     1.59     0.02  1134.49 PASSED
WALL  1600  1600  60     1    1    1    1     2.28     0.03  1184.27 PASSED
WALL  1800  1800  60     1    1    1    1     3.12     0.04  1231.93 PASSED
WALL  2000  2000  60     1    1    1    1     4.37     0.05  1207.30 PASSED
	  

MFLOPS rises as the problem size grows, then drops once the problem becomes large enough to use swap. Next, let's vary NB. Adjust M and N to suit your own system and vary NB from 28 to 60.

TIME     M     N  NB NRHS NBRHS    P    Q  LU Time Sol Time  MFLOPS  CHECK
---- ----- ----- --- ---- ----- ---- ---- -------- -------- -------- ------

WALL  5000  5000  28     1    1    1    7    19.85     0.14  4170.73 PASSED
WALL  5000  5000  30     1    1    1    7    14.85     0.14  5562.10 PASSED
WALL  5000  5000  32     1    1    1    7    15.40     0.13  5367.77 PASSED
WALL  5000  5000  34     1    1    1    7    15.89     0.15  5198.10 PASSED
WALL  7000  7000  28     1    1    1    7    49.39     0.24  4608.81 PASSED
WALL  7000  7000  30     1    1    1    7    37.77     0.27  6013.41 PASSED
WALL  7000  7000  32     1    1    1    7    38.96     0.25  5833.21 PASSED
WALL  7000  7000  34     1    1    1    7    39.07     0.26  5816.04 PASSED
WALL 10000 10000  28     1    1    1    7   133.66     0.41  4973.60 PASSED
WALL 10000 10000  30     1    1    1    7    99.69     0.45  6659.18 PASSED
WALL 10000 10000  32     1    1    1    7   102.15     0.43  6500.25 PASSED
WALL 10000 10000  34     1    1    1    7   101.73     0.40  6529.03 PASSED
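
The MFLOPS column can be cross-checked from the two time columns. The sketch below uses the usual operation count for LU factorization, about (2/3)N^3 - (1/2)N^2 flops, plus 2N^2 flops per right-hand side for the solve. Whether the xdlu driver uses exactly this count is an assumption; the printed times are also rounded, so expect agreement only to within a fraction of a percent.

```python
def lu_mflops(n, nrhs, lu_seconds, solve_seconds):
    """Approximate MFLOPS for factoring and solving an N x N system."""
    factor_flops = (2.0 / 3.0) * n**3 - 0.5 * n**2
    solve_flops = 2.0 * n**2 * nrhs
    return (factor_flops + solve_flops) / (lu_seconds + solve_seconds) / 1e6


if __name__ == "__main__":
    # (N, NRHS, LU time, solve time, reported MFLOPS) from the table above
    rows = [
        (10000, 1, 133.66, 0.41, 4973.60),
        (10000, 1, 99.69, 0.45, 6659.18),
    ]
    for n, nrhs, lu_t, sol_t, reported in rows:
        print(n, round(lu_mflops(n, nrhs, lu_t, sol_t), 2), "reported:", reported)
```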
	 

The best performance measured in the experiment above is 6659.18 MFLOPS at M=N=10000, NB=30. Now fix M=N and look for the optimal NB.

TIME     M     N  NB NRHS NBRHS    P    Q  LU Time Sol Time  MFLOPS  CHECK
---- ----- ----- --- ---- ----- ---- ---- -------- -------- -------- ------

WALL  1000  1000  28     1    1    1    7     1.09     0.03   595.45 PASSED
WALL  1000  1000  30     1    1    1    7     0.33     0.02  1939.09 PASSED
WALL  1000  1000  32     1    1    1    7     0.33     0.02  1911.82 PASSED
WALL  1000  1000  34     1    1    1    7     0.37     0.02  1699.22 PASSED
WALL  1000  1000  36     1    1    1    7     0.38     0.02  1695.32 PASSED
WALL  1000  1000  38     1    1    1    7     0.40     0.02  1606.20 PASSED
	 

With M=N fixed at 1000, NB = 30 gave the best performance.
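
When many NB values are swept, picking the best one by eye becomes tedious. The sketch below parses result rows in the format printed by xdlu and reports the NB with the highest MFLOPS for each problem size. The rows are copied from the table above; the function name best_nb is hypothetical.

```python
ROWS = """
WALL  1000  1000  28     1    1    1    7     1.09     0.03   595.45 PASSED
WALL  1000  1000  30     1    1    1    7     0.33     0.02  1939.09 PASSED
WALL  1000  1000  32     1    1    1    7     0.33     0.02  1911.82 PASSED
WALL  1000  1000  34     1    1    1    7     0.37     0.02  1699.22 PASSED
WALL  1000  1000  36     1    1    1    7     0.38     0.02  1695.32 PASSED
WALL  1000  1000  38     1    1    1    7     0.40     0.02  1606.20 PASSED
"""


def best_nb(text):
    """Map each problem size M to (NB, MFLOPS) of its fastest PASSED row."""
    best = {}
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) != 12 or fields[-1] != "PASSED":
            continue                     # skip headers and failed runs
        m, nb, mflops = int(fields[1]), int(fields[3]), float(fields[10])
        if m not in best or mflops > best[m][1]:
            best[m] = (nb, mflops)
    return best


if __name__ == "__main__":
    for m, (nb, mflops) in best_nb(ROWS).items():
        print(f"M=N={m}: best NB={nb} at {mflops} MFLOPS")
```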

4.5.3.2. HPL (High-Performance Linpack Benchmark)

Next, let's benchmark with HPL, the program used to benchmark large-memory systems (it is the program used by the TOP 500 site, which ranks the world's supercomputers). Before installing HPL, BLAS, MPICH, CBLAS, and so on must be installed. Since we will use ATLAS's BLAS routines here, ATLAS must be installed too. For installing CBLAS, see the Korea Cluster Technology Center page http://www.cluster.or.kr/board/read.php?table=benchmark=3 or http://www.netlib.org/blas/. Download HPL from http://www.netlib.org/benchmark/hpl/ and unpack it.

[micro@master share]$ tar xzf hpl.tgz
	    

From the setup directory inside the hpl directory, copy the make file that matches your platform into the top-level hpl directory. Here we use an Athlon chip on Linux with CBLAS, the C interface to BLAS, so the file name is as follows.

[micro@master share]$ cd hpl 
[micro@master hpl]$ cp setup/Make.Linux_ATHLON_CBLAS . 
	    

Edit that file.

[micro@master hpl]$ vi Make.Linux_ATHLON_CBLAS
------ Make.Linux_ATHLON_CBLAS -------
SHELL        = /bin/sh
#
CD           = cd
CP           = cp
LN_S         = ln -s
MKDIR        = mkdir
RM           = /bin/rm -f
TOUCH        = touch

ARCH         = Linux_ATHLON_CBLAS

TOPdir       = $(HOME)/hpl
INCdir       = $(TOPdir)/include
BINdir       = $(TOPdir)/bin/$(ARCH)
LIBdir       = $(TOPdir)/lib/$(ARCH)
#
HPLlib       = $(LIBdir)/libhpl.a

#CC           = gcc
CC           = /usr/local/mpich/bin/mpicc  <- MPICH C compiler
NOOPT        =
#CCFLAGS      = -fomit-frame-pointer -O3 -funroll-loops -W -Wall
CCFLAGS      = -fomit-frame-pointer -O3 -funroll-loops
#
#LINKER       = gcc
LINKER       = /usr/local/mpich/bin/mpicc
LINKFLAGS    = $(CCFLAGS)
#
ARCHIVER     = ar
ARFLAGS      = r
RANLIB       = echo

MPdir        = /usr/local/mpich
MPinc        = -I$(MPdir)/include
MPlib        = $(MPdir)/lib/libmpich.a

F2CDEFS      =
NOOPT        =
F77          = /usr/local/mpich/bin/mpif77
F77LOADER    = /usr/local/mpich/bin/mpif77
F77FLAGS     = -O $(NOOPT)

LAdir        = $(HOME)/ATLAS/lib/Linux_ATHLONSSE1
LAinc        = $(HOME)/ATLAS/include/Linux_ATHLONSSE1
LAlib        = $(LAdir)/libcblas.a $(LAdir)/libatlas.a

#HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc)
		<- change the line above to the line below.
HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) -I$(LAinc) $(MPinc)
HPL_LIBS     = $(HPLlib) $(LAlib) $(MPlib)

HPL_OPTS     = -DHPL_CALL_CBLAS
HPL_DEFS     = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
------ Make.Linux_ATHLON_CBLAS -------
	    

Next, set the arch entry in the Make.top file and in the Makefile.

#
arch             = Linux_ATHLON_CBLAS
#
	    

Compile. Run make arch=[your system]. A [your system] directory should then be created under the bin directory.

[micro@master hpl]$ make arch=Linux_ATHLON_CBLAS 
[micro@master hpl]$ cd bin/Linux_ATHLON_CBLAS 
	    

In the bin/Linux_ATHLON_CBLAS directory you will find the HPL.dat and xhpl files. HPL.dat, like the configuration file in the earlier LINPACK benchmark, is where the parameters for the benchmark are set, and xhpl is the executable that actually runs the benchmark. Let's look at the format of HPL.dat.

[micro@master Linux_ATHLON_CBLAS]$ vi HPL.dat
----- HPL.dat -----
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
10000        Ns
1            # of NBs
85           NBs
1            # of process grids (P x Q)
1            Ps
7            Qs
16.0         threshold
1            # of panel fact
1            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
2            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
----- HPL.dat -----
	    

As you can see, it is not very different from the LU.dat of the LINPACK benchmark. Among the differences, the problem size is now one-dimensional and a swapping threshold can be specified; for details, see the tuning page http://www.netlib.org/benchmark/hpl/tuning.html. Now run xhpl.

[micro@master Linux_ATHLON_CBLAS]$ mpirun -np 7 xhpl
====================================================================
HPLinpack 1.0  --  High-Performance Linpack benchmark  --  September 27, 2000
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Labs.,  UTK
====================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :   10000
NB     :    85
P      :       1
Q      :       7
PFACT  :   Crout
NBMIN  :       4
NDIV   :       2
RFACT  :   Right
BCAST  :  1ringM
DEPTH  :       1
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words
============================================================================
T/V                N    NB     P     Q               Time             Gflops
----------------------------------------------------------------------------
W11R2C4        10000    85     1     7              70.49          9.460e+00
----------------------------------------------------------------------------
||Ax-b||_oo / ( eps * ||A||_1  * N        ) =        0.0646673 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) =        0.0153022 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) =        0.0034203 ...... PASSED
============================================================================
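
The Gflops figure follows from N and the wall time. The sketch below uses the operation count (2/3)N^3 + (3/2)N^2, which is the count HPL is generally described as using and which reproduces the figure printed above; treat it as an assumption if your HPL version differs. The function name is hypothetical.

```python
def hpl_gflops(n, seconds):
    """Gflops for solving an order-N dense system in `seconds` of wall time."""
    flops = (2.0 / 3.0) * n**3 + 1.5 * n**2
    return flops / seconds / 1e9


if __name__ == "__main__":
    # N and Time taken from the xhpl output above.
    print(f"{hpl_gflops(10000, 70.49):.3e}")
```

Because the flop count depends only on N, comparing runs with different NB or grid shapes at the same N is simply a comparison of wall times.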
	    

The result above is 9.46 Gflops at N = 10000, NB = 85. As with LINPACK, adjust the problem size and NB to suit your system and try to draw out the best performance it can deliver. HPL.dat is read when xhpl starts, so after editing it you can simply run xhpl again; if you also changed the make file and need to rebuild, recompile as follows.

[micro@master Linux_ATHLON_CBLAS]$ rm -f ./xhpl
[micro@master Linux_ATHLON_CBLAS]$ cd ../../
[micro@master hpl]$ make clean
[micro@master hpl]$ make all