PyTorch documentation
PyTorch is an optimized tensor library for deep learning using GPUs and CPUs.
Features described in this documentation are classified by release status:
Stable: These features will be maintained long-term and there should generally be no major performance limitations or gaps in documentation. We also expect to maintain backwards compatibility (although breaking changes can happen and notice will be given one release ahead of time).
Beta: These features are tagged as Beta because the API may change based on user feedback, because the performance needs to improve, or because coverage across operators is not yet complete. For Beta features, we are committing to seeing the feature through to the Stable classification. We are not, however, committing to backwards compatibility.
Prototype: These features are typically not available as part of binary distributions like PyPI or Conda, except sometimes behind run-time flags, and are at an early stage for feedback and testing.
- CUDA Automatic Mixed Precision examples
- Autograd mechanics
- Broadcasting semantics
- CPU threading and TorchScript inference
- CUDA semantics
- Distributed Data Parallel
- Extending PyTorch
- Extending torch.func with autograd.Function
- Frequently Asked Questions
- Gradcheck mechanics
- HIP (ROCm) semantics
- Features for large-scale deployments
- Modules
- MPS backend
- Multiprocessing best practices
- Numerical accuracy
- Reproducibility
- Serialization semantics
- Windows FAQ
- torch.compile
- Getting Started
- PyTorch 2.0 Troubleshooting
- Frequently Asked Questions
- Technical Overview
- Guards Overview
- Custom Backends
- TorchDynamo APIs to control fine-grained tracing
- Profiling to understand torch.compile performance
- TorchInductor GPU Profiling
- TorchDynamo Deeper Dive
- CUDAGraph Trees
- PyTorch 2.0 Performance Dashboard
- torch.func interaction with torch.compile
- IRs
- Dynamic shapes
- Fake tensor
- torch._logging
- Writing Graph Transformations on ATen IR
- torch
- torch.nn
- Parameter
- UninitializedParameter
- UninitializedBuffer
- Containers
- Convolution Layers
- Pooling layers
- Padding Layers
- Non-linear Activations (weighted sum, nonlinearity)
- Non-linear Activations (other)
- Normalization Layers
- Recurrent Layers
- Transformer Layers
- Linear Layers
- Dropout Layers
- Sparse Layers
- Distance Functions
- Loss Functions
- Vision Layers
- Shuffle Layers
- DataParallel Layers (multi-GPU, distributed)
- Utilities
- Quantized Functions
- Lazy Modules Initialization
- torch.nn.functional
- torch.Tensor
- Tensor Attributes
- Tensor Views
- torch.amp
- torch.autograd
- torch.autograd.backward
- torch.autograd.grad
- Forward-mode Automatic Differentiation
- Functional higher level API
- Locally disabling gradient computation
- Default gradient layouts
- In-place operations on Tensors
- Variable (deprecated)
- Tensor autograd functions
- Function
- Context method mixins
- Numerical gradient checking
- Profiler
- Anomaly detection
- Autograd graph
- torch.library
- torch.cuda
- StreamContext
- torch.cuda.can_device_access_peer
- torch.cuda.current_blas_handle
- torch.cuda.current_device
- torch.cuda.current_stream
- torch.cuda.default_stream
- device
- torch.cuda.device_count
- device_of
- torch.cuda.get_arch_list
- torch.cuda.get_device_capability
- torch.cuda.get_device_name
- torch.cuda.get_device_properties
- torch.cuda.get_gencode_flags
- torch.cuda.get_sync_debug_mode
- torch.cuda.init
- torch.cuda.ipc_collect
- torch.cuda.is_available
- torch.cuda.is_initialized
- torch.cuda.memory_usage
- torch.cuda.set_device
- torch.cuda.set_stream
- torch.cuda.set_sync_debug_mode
- torch.cuda.stream
- torch.cuda.synchronize
- torch.cuda.utilization
- torch.cuda.temperature
- torch.cuda.power_draw
- torch.cuda.clock_rate
- torch.cuda.OutOfMemoryError
- Random Number Generator
- Communication collectives
- Streams and events
- Graphs (beta)
- Memory management
- NVIDIA Tools Extension (NVTX)
- Jiterator (beta)
- Stream Sanitizer (prototype)
- torch.mps
- torch.backends
- torch._export.export
- torch.distributed
- Backends
- Basics
- Initialization
- Post-Initialization
- Distributed Key-Value Store
- Groups
- Point-to-point communication
- Synchronous and asynchronous collective operations
- Collective functions
- Profiling Collective Communication
- Multi-GPU collective functions
- Third-party backends
- Launch utility
- Spawn utility
- Debugging torch.distributed applications
- Logging
- torch.distributed.algorithms.join
- torch.distributed.elastic
- torch.distributed.fsdp
- torch.distributed.optim
- torch.distributed.tensor.parallel
- parallelize_module()
- RowwiseParallel
- ColwiseParallel
- PairwiseParallel
- SequenceParallel
- make_input_replicate_1d()
- make_input_reshard_replicate()
- make_input_shard_1d()
- make_input_shard_1d_last_dim()
- make_output_replicate_1d()
- make_output_reshard_tensor()
- make_output_shard_1d()
- make_output_tensor()
- TensorParallelMultiheadAttention
- enable_2d_with_fsdp()
- torch.distributed.checkpoint
- torch.distributions
- Score function
- Pathwise derivative
- Distribution
- ExponentialFamily
- Bernoulli
- Beta
- Binomial
- Categorical
- Cauchy
- Chi2
- ContinuousBernoulli
- Dirichlet
- Exponential
- FisherSnedecor
- Gamma
- Geometric
- Gumbel
- HalfCauchy
- HalfNormal
- Independent
- Kumaraswamy
- LKJCholesky
- Laplace
- LogNormal
- LowRankMultivariateNormal
- MixtureSameFamily
- Multinomial
- MultivariateNormal
- NegativeBinomial
- Normal
- OneHotCategorical
- Pareto
- Poisson
- RelaxedBernoulli
- LogitRelaxedBernoulli
- RelaxedOneHotCategorical
- StudentT
- TransformedDistribution
- Uniform
- VonMises
- Weibull
- Wishart
- KL Divergence
- Transforms
- Constraints
- Constraint Registry
- torch.compiler
- torch.fft
- torch.func
- torch.futures
- torch.fx
- torch.hub
- torch.jit
- torch.linalg
- torch.monitor
- torch.signal
- torch.special
- torch.overrides
- torch.package
- torch.profiler
- torch.nn.init
- torch.onnx
- torch.onnx diagnostics
- torch.optim
- Complex Numbers
- DDP Communication Hooks
- Pipeline Parallelism
- Quantization
- Distributed RPC Framework
- torch.random
- torch.masked
- torch.nested
- torch.sparse
- torch.Storage
- torch.testing
- torch.utils.benchmark
- torch.utils.bottleneck
- torch.utils.checkpoint
- torch.utils.cpp_extension
- torch.utils.data
- torch.utils.jit
- torch.utils.dlpack
- torch.utils.mobile_optimizer
- torch.utils.model_zoo
- torch.utils.tensorboard
- Type Info
- Named Tensors
- Named Tensors operator coverage
- torch.__config__
- torch._logging