1. 02 Aug, 2019 (18 commits)
    • Migrate neg's CUDA implementation to ATen. (#23617) · b2f6e2bd
      Committed by Hong Xu
      Summary:
      Pull Request resolved: https://github.com/pytorch/pytorch/pull/23617
      
      This doesn't seem to cause any performance regression; the performance difference
      in the benchmarks below is negligible.
      
      Benchmark script:
      
      ```python
      import timeit
      
      for n, t in [(10, 100000),
                   (1000, 10000)]:
          print('a.neg() (a.numel() == {}) for {} times'.format(n, t))
          for device in ('cpu', 'cuda'):
              for dtype in ('torch.int8', 'torch.uint8', 'torch.int16', 'torch.int32', 'torch.int64', 'torch.float', 'torch.double') + (('torch.half',) if device == 'cuda' else ()):
                  print(f'device: {device}, dtype: {dtype}, {t} times', end='\t\t')
                  print(timeit.timeit(f'a.neg()\nif "{device}" == "cuda": torch.cuda.synchronize()', setup=f'import torch; a = torch.ones({n}, device="{device}", dtype={dtype})', number=t))
      ```
      
      Before:
      
      ```
      a.neg() (a.numel() == 10) for 100000 times
      device: cpu, dtype: torch.int8, 100000 times            2.5537249100016197
      device: cpu, dtype: torch.uint8, 100000 times           2.512518662999355
      device: cpu, dtype: torch.int16, 100000 times           2.548207502000878
      device: cpu, dtype: torch.int32, 100000 times           2.5974994509997487
      device: cpu, dtype: torch.int64, 100000 times           2.6533011499996064
      device: cpu, dtype: torch.float, 100000 times           2.6474813019995054
      device: cpu, dtype: torch.double, 100000 times          2.6949866009999823
      device: cuda, dtype: torch.int8, 100000 times           5.820120684998983
      device: cuda, dtype: torch.uint8, 100000 times          5.732108927997615
      device: cuda, dtype: torch.int16, 100000 times          5.791249125999457
      device: cuda, dtype: torch.int32, 100000 times          5.816761754998879
      device: cuda, dtype: torch.int64, 100000 times          5.935873205999087
      device: cuda, dtype: torch.float, 100000 times          6.276509613999224
      device: cuda, dtype: torch.double, 100000 times         6.122782447000645
      device: cuda, dtype: torch.half, 100000 times           6.161522764999972
      a.neg() (a.numel() == 1000) for 10000 times
      device: cpu, dtype: torch.int8, 10000 times             0.3766637519984215
      device: cpu, dtype: torch.uint8, 10000 times            0.37288786600038293
      device: cpu, dtype: torch.int16, 10000 times            0.3485262310023245
      device: cpu, dtype: torch.int32, 10000 times            0.41810554200128536
      device: cpu, dtype: torch.int64, 10000 times            0.5609612200023548
      device: cpu, dtype: torch.float, 10000 times            0.39054008099992643
      device: cpu, dtype: torch.double, 10000 times           0.4946578170020075
      device: cuda, dtype: torch.int8, 10000 times            0.5843639539998549
      device: cuda, dtype: torch.uint8, 10000 times           0.5780841570012853
      device: cuda, dtype: torch.int16, 10000 times           0.5819949180004187
      device: cuda, dtype: torch.int32, 10000 times           0.5827294059999986
      device: cuda, dtype: torch.int64, 10000 times           0.5861426519986708
      device: cuda, dtype: torch.float, 10000 times           0.5929420489992481
      device: cuda, dtype: torch.double, 10000 times          0.594638443999429
      device: cuda, dtype: torch.half, 10000 times            0.5903799709994928
      ```
      
      After:
      
      ```
      a.neg() (a.numel() == 10) for 100000 times
      device: cpu, dtype: torch.int8, 100000 times            2.4983287129980454
      device: cpu, dtype: torch.uint8, 100000 times           2.479393904999597
      device: cpu, dtype: torch.int16, 100000 times           2.5382055320005747
      device: cpu, dtype: torch.int32, 100000 times           2.5587980189993687
      device: cpu, dtype: torch.int64, 100000 times           2.637738788002025
      device: cpu, dtype: torch.float, 100000 times           2.602799075997609
      device: cpu, dtype: torch.double, 100000 times          2.6648931070012623
      device: cuda, dtype: torch.int8, 100000 times           5.793338211999071
      device: cuda, dtype: torch.uint8, 100000 times          5.782462584000314
      device: cuda, dtype: torch.int16, 100000 times          5.824340334998851
      device: cuda, dtype: torch.int32, 100000 times          5.851659068001027
      device: cuda, dtype: torch.int64, 100000 times          5.8898071570001775
      device: cuda, dtype: torch.float, 100000 times          5.913144636000652
      device: cuda, dtype: torch.double, 100000 times         5.963339805999567
      device: cuda, dtype: torch.half, 100000 times           5.87889370099947
      a.neg() (a.numel() == 1000) for 10000 times
      device: cpu, dtype: torch.int8, 10000 times             0.37244726499920944
      device: cpu, dtype: torch.uint8, 10000 times            0.36641623199830065
      device: cpu, dtype: torch.int16, 10000 times            0.3449854829996184
      device: cpu, dtype: torch.int32, 10000 times            0.4127863069988962
      device: cpu, dtype: torch.int64, 10000 times            0.5551902160004829
      device: cpu, dtype: torch.float, 10000 times            0.38593814199703047
      device: cpu, dtype: torch.double, 10000 times           0.48877579500185675
      device: cuda, dtype: torch.int8, 10000 times            0.5862828740027908
      device: cuda, dtype: torch.uint8, 10000 times           0.5836667540024791
      device: cuda, dtype: torch.int16, 10000 times           0.5918155769977602
      device: cuda, dtype: torch.int32, 10000 times           0.5961457039993547
      device: cuda, dtype: torch.int64, 10000 times           0.5963898690024507
      device: cuda, dtype: torch.float, 10000 times           0.5985483309996198
      device: cuda, dtype: torch.double, 10000 times          0.6027148480025062
      device: cuda, dtype: torch.half, 10000 times            0.5961164370019105
      ```
      
      Test Plan: Imported from OSS
      
      Differential Revision: D16617574
      
      Pulled By: ezyang
      
      fbshipit-source-id: c90aa410f6385ce94fe6b84ebeceffa5effd0267
    • Adjust maintainers list (#23693) · acc5cedf
      Committed by Dmytro Dzhulgakov
      Summary:
      Adds new people and reorders the sections so that they make more sense.
      Pull Request resolved: https://github.com/pytorch/pytorch/pull/23693
      
      Differential Revision: D16618230
      
      Pulled By: dzhulgakov
      
      fbshipit-source-id: 74191b50c6603309a9e6d14960b7c666eec6abdd
    • Compress debug symbols when serializing TorchScript models. · d1e0a3dd
      Committed by Owen Anderson
      Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23659
      
      Differential Revision: D16603775
      
      fbshipit-source-id: f2912048bdee36b3bcaa779e801c61bfbb5f30e5
    • Remove more uses of `DimensionedTensorType` · 3d15ee1b
      Committed by Nikolay Korovaiko
      Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23060
      
      Differential Revision: D16460391
      
      Pulled By: Krovatkin
      
      fbshipit-source-id: b50ee87d22ad18b8cbfff719b199ea876ef172f1
    • fix conv2d · 3314d60a
      Committed by James Reed
      Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23690
      
      Test Plan: Imported from OSS
      
      Reviewed By: suo
      
      Differential Revision: D16610734
      
      Pulled By: jamesr66a
      
      fbshipit-source-id: e190174f11d1810e6f87e2df256543028e9154ef
    • Support Copy Op · df8638b0
      Committed by Hao Lu
      Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23705
      
      Reviewed By: yinghai
      
      Differential Revision: D16354204
      
      fbshipit-source-id: 158b0ee556606c117e52bee875d3dc89cc944b5a
    • Support nn.GRU in script · 9d2cc2c9
      Committed by Wanchao Liang
      Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23266
      
      Test Plan: Imported from OSS
      
      Differential Revision: D16466586
      
      Pulled By: wanchaol
      
      fbshipit-source-id: 0f5b8013167bb7b246bd7e28d87a4a9e9c3b34d5
    • Reduce input sets for tests to speed them up. (#23692) · b22c88b8
      Committed by Mikhail Zolotukhin
      Summary:
      Pull Request resolved: https://github.com/pytorch/pytorch/pull/23692
      
      Before this change the tests took ~40s to finish; with it, they take ~2s.
      
      Test Plan: Imported from OSS
      
      Differential Revision: D16611479
      
      Pulled By: ZolotukhinM
      
      fbshipit-source-id: 391235483029d2ab860fcc4597ce84f4964025f1
    • Updating submodules · c91f2091
      Committed by svcscm
      Reviewed By: zpao
      
      fbshipit-source-id: ff6387055e7fa2cde88bd870081a05c3adbf56ef
    • Fix pin_memory_thread not exiting quickly (#23646) · 0539462c
      Committed by Tongzhou Wang
      Summary:
      fixes https://github.com/pytorch/pytorch/issues/23642
      Pull Request resolved: https://github.com/pytorch/pytorch/pull/23646
      
      Differential Revision: D16600874
      
      Pulled By: soumith
      
      fbshipit-source-id: 50f0828d774a558d6f21e9dd21135906bd5be128
    • Move addcmul to Aten (#22874) · 3b5daef6
      Committed by Vitaly Fedyunin
      Summary:
      Move the CPU implementation of the `addcmul` operator to ATen (https://github.com/pytorch/pytorch/issues/22797).
      
      ### before
      
      ```python
      In [11]: timeit x.addcmul(a, b)
      1.31 ms ± 18.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
      ```
      
      ### after
      
      ```python
      In [9]: timeit x.addcmul(a, b)
      588 µs ± 22.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
      ```
      
      Adding custom code for the case when `value == 1` doesn't provide a significant performance gain.
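
      For context, the timings above were collected with IPython's `timeit` magic, and the tensor
      setup isn't shown in this summary. The sketch below is only a hypothetical reconstruction for
      taking a similar measurement; the names `x`, `a`, `b` and the shapes are assumptions, not the
      shapes behind the numbers above.

      ```python
      # Hypothetical benchmark setup -- the shapes are assumptions, not the ones
      # used for the timings quoted above.
      import torch

      x = torch.randn(1000, 1000)
      a = torch.randn(1000, 1000)
      b = torch.randn(1000, 1000)

      # addcmul computes x + value * a * b elementwise (value defaults to 1).
      out = x.addcmul(a, b)

      # In IPython, a comparable measurement would be:
      #   %timeit x.addcmul(a, b)
      ```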
      Pull Request resolved: https://github.com/pytorch/pytorch/pull/22874
      
      Differential Revision: D16359348
      
      Pulled By: VitalyFedyunin
      
      fbshipit-source-id: 941ead835672fca78a1fcc762da052e64308b111
    • add setup metadata to help PyPI flesh out content on pypi package page (#22085) · dded794e
      Committed by Soumith Chintala
      Summary:
      Add setup metadata to help PyPI flesh out content on the PyPI package page.
      
      According to driazati, this might also help populate the "Used By" feature.
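
      The relevant knobs are the standard setuptools metadata arguments; the sketch below is only a
      generic illustration of the kind of metadata PyPI renders on a package page (the field values
      are placeholders, not taken from PyTorch's setup.py).

      ```python
      # Illustration only: standard setuptools metadata fields that PyPI displays
      # on a project page. The values here are placeholders, not PyTorch's.
      from setuptools import setup

      setup(
          name="example-package",
          version="0.0.1",
          description="One-line summary shown at the top of the PyPI page",
          long_description=open("README.md").read(),
          long_description_content_type="text/markdown",
          url="https://github.com/example/example-package",
          author="Example Author",
          license="BSD-3-Clause",
          keywords="example, packaging",
          classifiers=[
              "Development Status :: 5 - Production/Stable",
              "Programming Language :: Python :: 3",
          ],
      )
      ```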
      Pull Request resolved: https://github.com/pytorch/pytorch/pull/22085
      
      Differential Revision: D16604703
      
      Pulled By: soumith
      
      fbshipit-source-id: ddb4f7ba7c24fdf718260aed28cc7bc9afb46de9
    • Add in-place check to AliasDb · ff3dd724
      Committed by Bram Wasti
      Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23210
      
      Test Plan: Imported from OSS
      
      Differential Revision: D16444529
      
      Pulled By: bwasti
      
      fbshipit-source-id: 83af54d423989a2a726158b521093660584ee9c2
    • Slightly improve dataloader docs on when auto-batching is disabled (#23671) · 336c9be7
      Committed by Tongzhou Wang
      Summary:
      cc gchanan
      Pull Request resolved: https://github.com/pytorch/pytorch/pull/23671
      
      Differential Revision: D16604387
      
      Pulled By: soumith
      
      fbshipit-source-id: 0ebc120bcaa0f6fa09158b1d0459a72ab11a53d6
    • Remove useless code from shape info · 7ac41b1c
      Committed by Rui Zhu
      Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23663
      
      Reviewed By: yinghai
      
      Differential Revision: D16592163
      
      fbshipit-source-id: de1482305abef45f7ef0e3e57b0c93cd2acac450
    • Adam/AdamW implementation minor fix (#22628) · fed5ca19
      Committed by Farhad Ramezanghorbani
      Summary:
      I have noticed a small discrepancy between theory and the implementation of AdamW, and of Adam in general. The epsilon in the denominator of the following Adam update should not be scaled by the bias correction [(Algorithm 2, L9-12)](https://arxiv.org/pdf/1711.05101.pdf); only the running averages of the gradient (_m_) and of the squared gradients (_v_) should be scaled by their corresponding bias corrections.
      
      ![adam_update](https://user-images.githubusercontent.com/13050245/60894105-11117f00-a230-11e9-9ba0-adad2ae2e0ae.png)
      
      In the current implementation, the epsilon is scaled by the square root of `bias_correction2`.  I have plotted this ratio as a function of step given `beta2 = 0.999` and `eps = 1e-8`. In the early steps of optimization, this ratio slightly deviates from theory (denoted by the horizontal red line).
      
      ![plot](https://user-images.githubusercontent.com/13050245/60893952-cabc2000-a22f-11e9-8dc2-6353ad5d674d.png)
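
      Restated as code, the fix is about where `eps` enters the denominator. The sketch below is a
      plain scalar restatement of the textbook Adam update, not this PR's diff; the function name
      and the default hyperparameters are just illustrative.

      ```python
      import math

      # Scalar sketch of one textbook Adam step, to show where eps belongs.
      # m and v are the running averages of the gradient and squared gradient.
      def adam_step(p, grad, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
          m = beta1 * m + (1 - beta1) * grad
          v = beta2 * v + (1 - beta2) * grad * grad

          bias_correction1 = 1 - beta1 ** step
          bias_correction2 = 1 - beta2 ** step

          m_hat = m / bias_correction1  # bias-corrected first moment
          v_hat = v / bias_correction2  # bias-corrected second moment

          # eps is added after the bias correction, so it is *not* scaled by
          # sqrt(bias_correction2), which is the discrepancy described above.
          p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
          return p, m, v
      ```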
      Pull Request resolved: https://github.com/pytorch/pytorch/pull/22628
      
      Differential Revision: D16589914
      
      Pulled By: vincentqb
      
      fbshipit-source-id: 8791eb338236faea9457c0845ccfdba700e5f1e7
    • ConvBn2d/ConvBnReLU2d (#23357) · 6cf9ed4a
      Committed by Jerry Zhang
      Summary:
      Added _intrinsic.qat.ConvBn2d/_intrinsic.qat.ConvBnReLU2d.
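
      For background, fused Conv+BN modules exist because at inference time batch norm can be folded
      into the convolution's weights and bias, and quantization-aware training wants to account for
      that. The snippet below is only a generic illustration of the folding arithmetic; the helper
      name is made up and this is not the code added in this PR.

      ```python
      import torch

      # Generic illustration of folding BatchNorm2d statistics into a Conv2d
      # (background only; not the implementation added here).
      def fold_bn_into_conv(conv: torch.nn.Conv2d, bn: torch.nn.BatchNorm2d) -> torch.nn.Conv2d:
          fused = torch.nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                                  stride=conv.stride, padding=conv.padding, bias=True)
          with torch.no_grad():
              scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
              fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
              conv_bias = conv.bias if conv.bias is not None else torch.zeros_like(bn.running_mean)
              fused.bias.copy_((conv_bias - bn.running_mean) * scale + bn.bias)
          return fused
      ```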
      
      Pull Request resolved: https://github.com/pytorch/pytorch/pull/23357
      ghstack-source-id: 87519573
      
      Differential Revision: D16295500
      
      fbshipit-source-id: 81e6d1d10d05bf6e343721fc5701d3d6bd7e07e6
    • allow forward hooks in tracing (#23613) · 029c8e77
      Committed by Elias Ellison
      Summary:
      As far as I can tell, forward hooks work out of the box, so allow them during tracing. We don't have any way of supporting backward hooks, though.
      
      Fixes https://github.com/pytorch/pytorch/issues/20862 and fixes https://github.com/pytorch/pytorch/issues/17571
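
      As context, a minimal sketch of what this enables, assuming a toy module and hook (these are
      made-up examples, not the PR's tests):

      ```python
      import torch

      class M(torch.nn.Module):
          def __init__(self):
              super().__init__()
              self.linear = torch.nn.Linear(4, 4)

          def forward(self, x):
              return self.linear(x)

      def forward_hook(module, inputs, output):
          # Forward hooks run after forward() and observe its inputs/output.
          print("output shape:", output.shape)

      m = M()
      m.register_forward_hook(forward_hook)

      # With this change, tracing a module that has forward hooks attached is allowed;
      # the hook simply runs as the module is traced.
      traced = torch.jit.trace(m, torch.randn(2, 4))
      ```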
      Pull Request resolved: https://github.com/pytorch/pytorch/pull/23613
      
      Differential Revision: D16601437
      
      Pulled By: eellison
      
      fbshipit-source-id: ecf5dc6201ca08b3b9afdb9fcdb0fda8741133a9
  2. 01 Aug, 2019 (22 commits)