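`_prob_QK`, which selects the queries (Q) and keys (K), is the core of the model and deserves a careful read. Here is the formula for the query sparsity measurement:

$$
\overline{M}(q_i, K) = \max_{j}\left\{\frac{q_i k_j^{T}}{\sqrt{d}}\right\} - \frac{1}{L_K}\sum_{j=1}^{L_K}\frac{q_i k_j^{T}}{\sqrt{d}}
$$

Here $q_i$ is the $i$-th query, $k_j$ is the $j$-th key, $L_K$ is the number of keys, and $d$ is the feature dimension. The first term is the largest scaled dot product of $q_i$ with any key, and the second term is the mean over all keys, so a query with a large $\overline{M}$ has attention concentrated on a few keys.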
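To make the formula concrete, here is a minimal NumPy sketch of the sparsity measurement and the top-u query selection it drives. The function names `sparsity_measure` and `top_u_queries` are hypothetical, and the sketch scores each query against all keys for clarity; it is not the `_prob_QK` code itself and is only meant to illustrate the formula.

```python
import numpy as np

def sparsity_measure(Q, K):
    """Return M(q_i, K) for every row q_i of Q.

    Q has shape (L_Q, d) and K has shape (L_K, d).
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)  # scaled dot products, shape (L_Q, L_K)
    # max over keys minus mean over keys, one value per query
    return scores.max(axis=-1) - scores.mean(axis=-1)

def top_u_queries(Q, K, u):
    """Indices of the u queries with the largest sparsity measurement."""
    M = sparsity_measure(Q, K)
    return np.argsort(-M)[:u]

# Toy example: 3 queries, 4 keys, d = 4 (values chosen by hand).
K = np.eye(4)
Q = np.array([
    [4.0, 0.0, 0.0, 0.0],  # strongly matches one key  -> large M
    [1.0, 1.0, 1.0, 1.0],  # matches all keys equally  -> M = 0
    [0.0, 0.0, 2.0, 0.0],  # weaker match on one key   -> medium M
])
M = sparsity_measure(Q, K)
selected = top_u_queries(Q, K, u=2)
print(M, selected)
```

The uniform query gets a measurement of zero because its maximum equals its mean, which is why such "lazy" queries are the ones left out of the selected set.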
train 24425
val 3485
test 6989
iters: 100, epoch: 1 | loss: 0.4753647
speed: 5.8926s/iter; left time: 26393.0550s
iters: 200, epoch: 1 | loss: 0.3887450
speed: 5.6093s/iter; left time: 24563.0934s
iters: 300, epoch: 1 | loss: 0.3397639
speed: 5.6881s/iter; left time: 24339.4008s
iters: 400, epoch: 1 | loss: 0.3773919
speed: 5.5947s/iter; left time: 23380.1260s
iters: 500, epoch: 1 | loss: 0.3424160
speed: 5.8912s/iter; left time: 24030.1962s
iters: 600, epoch: 1 | loss: 0.3589063
speed: 6.0372s/iter; left time: 24021.9204s
iters: 700, epoch: 1 | loss: 0.3522923
speed: 5.2896s/iter; left time: 20518.3927s
Epoch: 1 cost time: 4319.718204259872
Epoch: 1, Steps: 763 | Train Loss: 0.3825711 Vali Loss: 0.4002144 Test Loss: 0.3138740
Validation loss decreased (inf --> 0.400214). Saving model ...
Updating learning rate to 0.0001
iters: 100, epoch: 2 | loss: 0.3452260
speed: 12.8896s/iter; left time: 47897.7932s
iters: 200, epoch: 2 | loss: 0.2782844
speed: 4.7867s/iter; left time: 17308.6180s
iters: 300, epoch: 2 | loss: 0.2653053
speed: 4.7938s/iter; left time: 16855.0160s
iters: 400, epoch: 2 | loss: 0.3157508
speed: 4.7083s/iter; left time: 16083.5403s
iters: 500, epoch: 2 | loss: 0.3046930
speed: 4.7699s/iter; left time: 15816.8855s
iters: 600, epoch: 2 | loss: 0.2360453
speed: 4.8311s/iter; left time: 15536.9307s
iters: 700, epoch: 2 | loss: 0.2668953
speed: 4.7713s/iter; left time: 14867.4169s
Epoch: 2 cost time: 3644.3840498924255
Epoch: 2, Steps: 763 | Train Loss: 0.2945577 Vali Loss: 0.3963071 Test Loss: 0.3274192
Validation loss decreased (0.400214 --> 0.396307). Saving model ...
Updating learning rate to 5e-05
iters: 100, epoch: 3 | loss: 0.2556470
speed: 12.6569s/iter; left time: 37375.7115s
iters: 200, epoch: 3 | loss: 0.2456252
speed: 4.7655s/iter; left time: 13596.0810s
iters: 300, epoch: 3 | loss: 0.2562804
speed: 4.7336s/iter; left time: 13031.4940s
iters: 400, epoch: 3 | loss: 0.2049552
speed: 4.7622s/iter; left time: 12634.1883s
iters: 500, epoch: 3 | loss: 0.2604980
speed: 4.7524s/iter; left time: 12132.7789s
iters: 600, epoch: 3 | loss: 0.2539216
speed: 4.7413s/iter; left time: 11630.3915s
iters: 700, epoch: 3 | loss: 0.2098076
speed: 4.7394s/iter; left time: 11151.7416s
Epoch: 3 cost time: 3628.159082174301
Epoch: 3, Steps: 763 | Train Loss: 0.2486252 Vali Loss: 0.4155475 Test Loss: 0.3301197
EarlyStopping counter: 1 out of 3
Updating learning rate to 2.5e-05
iters: 100, epoch: 4 | loss: 0.2175551
speed: 12.6253s/iter; left time: 27649.4546s
iters: 200, epoch: 4 | loss: 0.2459734
speed: 4.7335s/iter; left time: 9892.9213s
iters: 300, epoch: 4 | loss: 0.2354426
speed: 4.7546s/iter; left time: 9461.6300s
iters: 400, epoch: 4 | loss: 0.2267139
speed: 4.7719s/iter; left time: 9018.9749s
iters: 500, epoch: 4 | loss: 0.2379844
speed: 4.8038s/iter; left time: 8598.7446s
iters: 600, epoch: 4 | loss: 0.2434178
speed: 4.7608s/iter; left time: 8045.7994s
iters: 700, epoch: 4 | loss: 0.2231207
speed: 4.7765s/iter; left time: 7594.6586s
Epoch: 4 cost time: 3649.547614812851
Epoch: 4, Steps: 763 | Train Loss: 0.2224283 Vali Loss: 0.4230270 Test Loss: 0.3334258
EarlyStopping counter: 2 out of 3
Updating learning rate to 1.25e-05
iters: 100, epoch: 5 | loss: 0.1837259
speed: 12.7564s/iter; left time: 18203.3974s
iters: 200, epoch: 5 | loss: 0.1708880
speed: 4.7804s/iter; left time: 6343.6200s
iters: 300, epoch: 5 | loss: 0.2529005
speed: 4.7426s/iter; left time: 5819.1675s
iters: 400, epoch: 5 | loss: 0.2434390
speed: 4.7388s/iter; left time: 5340.6568s
iters: 500, epoch: 5 | loss: 0.2078404
speed: 4.7515s/iter; left time: 4879.7921s
iters: 600, epoch: 5 | loss: 0.2372987
speed: 4.7986s/iter; left time: 4448.2748s
iters: 700, epoch: 5 | loss: 0.2022571
speed: 4.7718s/iter; left time: 3946.2739s
Epoch: 5 cost time: 3636.7107157707214
Epoch: 5, Steps: 763 | Train Loss: 0.2088229 Vali Loss: 0.4305894 Test Loss: 0.3341273
EarlyStopping counter: 3 out of 3
Early stopping
>>>>>>>testing : informer_WTH_ftM_sl96_ll48_pl24_dm512_nh8_el2_dl1_df2048_atprob_fc5_ebtimeF_dtTrue_mxTrue_test_0<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
test 6989
test shape: (218, 32, 24, 12) (218, 32, 24, 12)
test shape: (6976, 24, 12) (6976, 24, 12)
mse:0.3277873396873474, mae:0.3727897107601166
Use CPU
>>>>>>>start training : informer_WTH_ftM_sl96_ll48_pl24_dm512_nh8_el2_dl1_df2048_atprob_fc5_ebtimeF_dtTrue_mxTrue_test_1>>>>>>>>>>>>>>>>>>>>>>>>>>
train 24425
val 3485
test 6989
iters: 100, epoch: 1 | loss: 0.4508476
speed: 4.7396s/iter; left time: 21228.7904s
iters: 200, epoch: 1 | loss: 0.3859568
speed: 4.7742s/iter; left time: 20906.0895s
iters: 300, epoch: 1 | loss: 0.3749838
speed: 4.7690s/iter; left time: 20406.5500s
iters: 400, epoch: 1 | loss: 0.3673764
speed: 4.8070s/iter; left time: 20088.4627s
iters: 500, epoch: 1 | loss: 0.3068828
speed: 4.7643s/iter; left time: 19433.6961s
iters: 600, epoch: 1 | loss: 0.4173551
speed: 4.7621s/iter; left time: 18948.4516s
iters: 700, epoch: 1 | loss: 0.2720438
speed: 4.7609s/iter; left time: 18467.4719s
Epoch: 1 cost time: 3639.997560977936
Epoch: 1, Steps: 763 | Train Loss: 0.3788956 Vali Loss: 0.3947107 Test Loss: 0.3116618
Validation loss decreased (inf --> 0.394711). Saving model ...
Updating learning rate to 0.0001
iters: 100, epoch: 2 | loss: 0.3547252
speed: 12.6113s/iter; left time: 46863.7093s
iters: 200, epoch: 2 | loss: 0.3236437
speed: 4.7504s/iter; left time: 17177.4475s
iters: 300, epoch: 2 | loss: 0.2898968
speed: 4.7720s/iter; left time: 16778.2666s
iters: 400, epoch: 2 | loss: 0.3107039
speed: 4.7412s/iter; left time: 16195.8892s
iters: 500, epoch: 2 | loss: 0.2816701
speed: 4.7244s/iter; left time: 15666.2476s
iters: 600, epoch: 2 | loss: 0.2226012
speed: 4.7348s/iter; left time: 15227.0618s
iters: 700, epoch: 2 | loss: 0.2239729
speed: 4.8806s/iter; left time: 15208.0025s
Epoch: 2 cost time: 3635.6160113811493
Epoch: 2, Steps: 763 | Train Loss: 0.2962583 Vali Loss: 0.4018708 Test Loss: 0.3213752
EarlyStopping counter: 1 out of 3
Updating learning rate to 5e-05
iters: 100, epoch: 3 | loss: 0.2407307
speed: 12.5584s/iter; left time: 37084.8281s
iters: 200, epoch: 3 | loss: 0.2294409
speed: 5.1105s/iter; left time: 14580.3263s
iters: 300, epoch: 3 | loss: 0.3180184
speed: 5.9484s/iter; left time: 16376.0364s
iters: 400, epoch: 3 | loss: 0.2101320
speed: 5.7987s/iter; left time: 15384.0189s
iters: 500, epoch: 3 | loss: 0.2701742
speed: 5.5463s/iter; left time: 14159.6749s
iters: 600, epoch: 3 | loss: 0.2391748
speed: 4.8338s/iter; left time: 11857.4335s
iters: 700, epoch: 3 | loss: 0.2280931
speed: 4.7718s/iter; left time: 11228.1147s
Epoch: 3 cost time: 3975.2745430469513
Epoch: 3, Steps: 763 | Train Loss: 0.2494072 Vali Loss: 0.4189631 Test Loss: 0.3308771
EarlyStopping counter: 2 out of 3
Updating learning rate to 2.5e-05
iters: 100, epoch: 4 | loss: 0.2260314
speed: 12.7037s/iter; left time: 27821.0994s
iters: 200, epoch: 4 | loss: 0.2191769
speed: 4.7906s/iter; left time: 10012.3575s
iters: 300, epoch: 4 | loss: 0.2044496
speed: 4.7498s/iter; left time: 9452.0362s
iters: 400, epoch: 4 | loss: 0.2167130
speed: 4.7545s/iter; left time: 8985.9758s
iters: 500, epoch: 4 | loss: 0.2340788
speed: 4.7329s/iter; left time: 8471.8863s
iters: 600, epoch: 4 | loss: 0.2137127
speed: 4.7037s/iter; left time: 7949.1748s
iters: 700, epoch: 4 | loss: 0.1899967
speed: 4.7049s/iter; left time: 7480.8388s
Epoch: 4 cost time: 3624.2080821990967
Epoch: 4, Steps: 763 | Train Loss: 0.2222918 Vali Loss: 0.4390603 Test Loss: 0.3350959
EarlyStopping counter: 3 out of 3
Early stopping
>>>>>>>testing : informer_WTH_ftM_sl96_ll48_pl24_dm512_nh8_el2_dl1_df2048_atprob_fc5_ebtimeF_dtTrue_mxTrue_test_1<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
test 6989
test shape: (218, 32, 24, 12) (218, 32, 24, 12)
test shape: (6976, 24, 12) (6976, 24, 12)
mse:0.3116863965988159, mae:0.36840054392814636
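
The numbers in this log can be cross-checked with a little arithmetic. The sketch below assumes a batch size of 32 (read off the second dimension of the test shape), that the incomplete last batch is dropped, and that the learning rate starts at 1e-4 and halves every epoch; these are inferences from the log, not settings confirmed elsewhere in this section.

```python
# Sanity-check the figures printed in the training log.
batch_size = 32          # assumption: taken from test shape (218, 32, 24, 12)
train_samples = 24425    # "train 24425"
test_samples = 6989      # "test 6989"

# Assumption: incomplete final batches are dropped, hence floor division.
steps_per_epoch = train_samples // batch_size   # matches "Steps: 763"
test_batches = test_samples // batch_size       # matches the leading 218
test_rows = test_batches * batch_size           # matches the flattened 6976

# Assumption: the rate logged after epoch e is 1e-4 * 0.5 ** (e - 1).
learning_rates = [1e-4 * 0.5 ** (epoch - 1) for epoch in range(1, 5)]

print(steps_per_epoch, test_batches, test_rows)
print(learning_rates)
```

Under these assumptions, 13 of the 6989 test samples are not evaluated, which is why the reported mse and mae cover 6976 windows.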