VREDUCEPD (Intel x86/64 assembly instruction)

모두의 코드

Published: 2020-09-01

VREDUCEPD

Perform Reduction Transformation on Packed Float64 Values

Note

For guidance on interpreting the table below, see the article "How to Read the x86-64 Instruction Reference".

Opcode/Instruction	Op/En	64/32-bit Mode Support	CPUID Feature Flag	Description
`EVEX.128.66.0F3A.W1 56 /r ib` VREDUCEPD xmm1 {k1}{z}, xmm2/m128/m64bcst, imm8	FV	V/V	AVX512VL AVX512DQ	Perform reduction transformation on packed double-precision floating-point values in xmm2/m128/m64bcst by subtracting a number of fraction bits specified by the imm8 field. Stores the result in xmm1 register under writemask k1.
`EVEX.256.66.0F3A.W1 56 /r ib` VREDUCEPD ymm1 {k1}{z}, ymm2/m256/m64bcst, imm8	FV	V/V	AVX512VL AVX512DQ	Perform reduction transformation on packed double-precision floating-point values in ymm2/m256/m64bcst by subtracting a number of fraction bits specified by the imm8 field. Stores the result in ymm1 register under writemask k1.
`EVEX.512.66.0F3A.W1 56 /r ib` VREDUCEPD zmm1 {k1}{z}, zmm2/m512/m64bcst{sae}, imm8	FV	V/V	AVX512DQ	Perform reduction transformation on packed double-precision floating-point values in zmm2/m512/m64bcst by subtracting a number of fraction bits specified by the imm8 field. Stores the result in zmm1 register under writemask k1.

Instruction Operand Encoding

Op/En	Operand 1	Operand 2	Operand 3	Operand 4
FV	ModRM:reg (w)	ModRM:r/m (r)	Imm8	NA

Description

Perform reduction transformation of the packed binary encoded double-precision FP values in the source operand (the second operand) and store the reduced results in binary FP format to the destination operand (the first operand) under the writemask k1.

The reduction transformation subtracts the integer part and the leading M fractional bits from the binary FP source value, where M is an unsigned integer specified by imm8[7:4]; see Figure 5-28. Specifically, the reduction transformation can be expressed as:

$$\mathrm{dest} = \mathrm{src} - \mathrm{ROUND}(2^{M} \cdot \mathrm{src}) \cdot 2^{-M}$$

where ROUND() treats src, $2^{M}$, and their product as binary FP numbers with normalized significands and biased exponents.

The magnitude of the reduced result can be expressed by considering $\mathrm{src} = 2^{p} \cdot \mathrm{man2}$, where man2 is the normalized significand and p is the unbiased exponent.

If RC = RNE: $0 \le |\text{Reduced Result}| \le 2^{p-M-1}$

If RC ≠ RNE: $0 \le |\text{Reduced Result}| < 2^{p-M}$

This instruction may set the precision exception flag. However, if SPE (Suppress Precision Exception, imm8[3]) is set to 1, no precision exception is reported.

EVEX.vvvv is reserved and must be 1111b; otherwise the instruction will #UD.

Figure 5-28. Imm8 Controls for VREDUCEPD/SD/PS/SS
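Since the figure itself is not reproduced here, the following sketch shows one way to assemble the immediate byte. The text above gives imm8[7:4] = M and imm8[3] = SPE; the remaining two fields (imm8[2] selecting MXCSR.RC, imm8[1:0] holding the rounding mode) are stated here as an assumption based on the layout in Intel's manual and should be checked against it. The function and enum names are hypothetical.

```c
#include <stdint.h>

/* Assumed imm8[1:0] rounding-control encodings (hypothetical names). */
enum {
    RC_NEAREST_EVEN = 0, /* RNE */
    RC_DOWN         = 1, /* RNI, toward negative infinity */
    RC_UP           = 2, /* RPI, toward positive infinity */
    RC_TRUNCATE     = 3  /* toward zero */
};

/* Hypothetical helper: pack the VREDUCEPD control fields into imm8.
 *   m         -> imm8[7:4], number of fraction bits to keep (0..15)
 *   spe       -> imm8[3],   1 suppresses the precision exception
 *   use_mxcsr -> imm8[2],   1 takes the rounding mode from MXCSR.RC (assumed)
 *   rc        -> imm8[1:0], rounding mode used when use_mxcsr is 0 (assumed) */
static uint8_t vreduce_imm8(unsigned m, int spe, int use_mxcsr, unsigned rc)
{
    return (uint8_t)(((m & 0xFu) << 4) |
                     ((spe ? 1u : 0u) << 3) |
                     ((use_mxcsr ? 1u : 0u) << 2) |
                     (rc & 0x3u));
}
```

Under these assumptions, `vreduce_imm8(1, 0, 0, RC_NEAREST_EVEN)` yields 0x10, that is M = 1 with round-to-nearest-even taken from the immediate.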

The handling of special-case input values is listed in Table 5-21.

Table 5-21. VREDUCEPD/SD/PS/SS Special Cases

Source value	Round Mode	Returned value
|Src1| < 2^(-M-1)	RNE	Src1
|Src1| < 2^-M	RPI, Src1 > 0	Round(Src1 - 2^-M) *
|Src1| < 2^-M	RPI, Src1 <= 0	Src1
|Src1| < 2^-M	RNI, Src1 >= 0	Src1
|Src1| < 2^-M	RNI, Src1 < 0	Round(Src1 + 2^-M) *
Src1 = ±0, or Dest = ±0 (Src1 != INF)	NOT RNI	+0.0
Src1 = ±0, or Dest = ±0 (Src1 != INF)	RNI	-0.0
Src1 = ±INF	any	+0.0
Src1 = ±NAN	n/a	QNaN(Src1)

* Round control = (imm8.MS1)? MXCSR.RC: imm8.RC

Operation

VREDUCEPD

(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j <- 0 TO KL-1
    i <- j * 64
    IF k1[j] OR *no writemask* THEN
        IF (EVEX.b == 1) AND (SRC *is memory*)
            THEN DEST[i+63:i] <- ReduceArgumentDP(SRC[63:0], imm8[7:0]);
            ELSE DEST[i+63:i] <- ReduceArgumentDP(SRC[i+63:i], imm8[7:0]);
        FI;
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[i+63:i] remains unchanged*
            ELSE ; zeroing-masking
                DEST[i+63:i] <- 0
        FI;
    FI;
ENDFOR;
DEST[MAX_VL-1:VL] <- 0

Intel C/C++ Compiler Intrinsic Equivalent

VREDUCEPD __m512d _mm512_reduce_round_pd(__m512d a, int imm, int sae)
VREDUCEPD __m512d _mm512_mask_reduce_round_pd(__m512d s, __mmask8 k, __m512d a, int imm, int sae)
VREDUCEPD __m512d _mm512_maskz_reduce_round_pd(__mmask8 k, __m512d a, int imm, int sae)
VREDUCEPD __m256d _mm256_reduce_pd(__m256d a, int imm)
VREDUCEPD __m256d _mm256_mask_reduce_pd(__m256d s, __mmask8 k, __m256d a, int imm)
VREDUCEPD __m256d _mm256_maskz_reduce_pd(__mmask8 k, __m256d a, int imm)
VREDUCEPD __m128d _mm_reduce_pd(__m128d a, int imm)
VREDUCEPD __m128d _mm_mask_reduce_pd(__m128d s, __mmask8 k, __m128d a, int imm)
VREDUCEPD __m128d _mm_maskz_reduce_pd(__mmask8 k, __m128d a, int imm)

SIMD Floating-Point Exceptions

Invalid, Precision

If SPE is enabled, precision exception is not reported (regardless of MXCSR exception mask).

Other Exceptions

See Exceptions Type E2; additionally:

#UD If EVEX.vvvv != 1111B.
