## Comparison of built-in functions and numerical methods

*PCA is an important tool for dimensionality reduction in data science and for computing grasp poses for robot manipulation from point-cloud data. Because PCA is differentiable, it can also be used directly within larger machine learning frameworks. We derive a numerical implementation of PCA, using the two principal components of a point cloud for robotic grasping as a running example, to show what PCA is and how it works.*

Principal component analysis (PCA) is widely used in data analysis and machine learning to reduce the dimensionality of datasets. The goal is to find a set of linearly uncorrelated (orthogonal) variables called *principal components* that capture the maximum variance in the data. The first principal component represents the direction of maximum variance, the second principal component is orthogonal to the first and represents the direction of next-highest variance, and so on. PCA is also used in *robot manipulation*: the principal axes of a point cloud can be used to orient a gripper.

Mathematically, principal component orthogonality is achieved by finding the eigenvectors of the *covariance matrix* of the original data. These eigenvectors form a set of orthogonal basis vectors, and the corresponding eigenvalues indicate the amount of variance captured by each principal component. This orthogonality is essential to the interpretability and usefulness of principal components for reducing dimensionality and capturing key patterns in data.
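This orthogonality can be checked numerically. The following sketch (a minimal example with an arbitrary matrix, not taken from the text) builds a small symmetric covariance-like matrix and confirms that its eigenvectors, as returned by `torch.linalg.eigh`, form an orthonormal basis:

```python
import torch

# A small symmetric matrix, e.g. a 2x2 covariance matrix
C = torch.tensor([[2.0, 0.8],
                  [0.8, 1.0]])

# eigh is specialized for symmetric matrices and returns real results
eigenvalues, eigenvectors = torch.linalg.eigh(C)

# The eigenvectors form an orthonormal basis: V^T V = I
identity = torch.matmul(eigenvectors.t(), eigenvectors)
print(torch.allclose(identity, torch.eye(2), atol=1e-6))  # True
```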

As you know, the *sample variance* of a set of data points is a measure of the spread or dispersion of the values. For a random variable *X* and *n* data points, the sample variance *s²* is calculated as follows:

s² = Σᵢ (xᵢ − x̄)² / (n − 1)

where *x̄* is the mean of the dataset. Dividing by n − 1 instead of n corrects for bias when estimating the population variance from a sample. This correction is known as Bessel's correction and helps provide an unbiased estimate of the population variance based on sample data.
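As a quick sanity check (a small sketch using made-up numbers), PyTorch's built-in `torch.var` applies Bessel's correction by default, matching the manual formula with n − 1:

```python
import torch

x = torch.tensor([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = x.numel()

# Manual sample variance with Bessel's correction (divide by n - 1)
manual = torch.sum((x - x.mean()) ** 2) / (n - 1)

# torch.var uses the unbiased (n - 1) estimator by default
print(torch.isclose(manual, torch.var(x)))  # True
```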

Assuming that each data point is itself a d-dimensional vector, the variance generalizes to a d×d matrix known as the *sample covariance matrix*. The covariance matrix is therefore defined as:

C = Σᵢ (**x**ᵢ − **x̄**)(**x**ᵢ − **x̄**)ᵀ / (n − 1)
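This definition can be verified against PyTorch's built-in `torch.cov` (a sketch with random data; note that `torch.cov` expects variables in rows, so the data matrix is transposed):

```python
import torch

torch.manual_seed(0)
points = torch.randn(100, 2)  # 100 two-dimensional data points

# Manual sample covariance: center the data, then X^T X / (n - 1)
centered = points - points.mean(dim=0)
manual_cov = torch.matmul(centered.t(), centered) / (points.shape[0] - 1)

# torch.cov treats rows as variables, so pass the transpose
builtin_cov = torch.cov(points.t())
print(torch.allclose(manual_cov, builtin_cov, atol=1e-6))  # True
```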

This can be implemented in PyTorch using built-in functions:

```python
import math

import torch
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
import matplotlib.patches as pat

# Create synthetic data
data, labels = make_blobs(n_samples=100, n_features=2, centers=1, random_state=15)

# Convert NumPy array to PyTorch tensor
tensor_data = torch.from_numpy(data).float()

# Compute the mean of the data
mean = torch.mean(tensor_data, dim=0)

# Center the data by subtracting the mean
centered_data = tensor_data - mean

# Compute the covariance matrix
covariance_matrix = torch.mm(centered_data.t(), centered_data) / (tensor_data.size(0) - 1)

# Perform eigendecomposition of the covariance matrix
eigenvalues, eigenvectors = torch.linalg.eig(covariance_matrix)

# Sort eigenvalues and corresponding eigenvectors in descending order
sorted_indices = torch.argsort(eigenvalues.real, descending=True)
eigenvalues = eigenvalues[sorted_indices]
eigenvectors = eigenvectors.real[:, sorted_indices]

# Pick the first two components, scaled by their eigenvalues
principal_components = eigenvectors[:, :2] * eigenvalues[:2].real

# Plot the original data with eigenvectors
plt.scatter(tensor_data[:, 0], tensor_data[:, 1], c=labels, cmap='viridis', label='Original Data')
plt.axis('equal')
plt.arrow(mean[0], mean[1], principal_components[0, 0], principal_components[1, 0], color='red', label='PC1')
plt.arrow(mean[0], mean[1], principal_components[0, 1], principal_components[1, 1], color='blue', label='PC2')

# Plot an ellipse around the eigenvectors
angle = torch.atan2(principal_components[1, 1], principal_components[0, 1]).detach().numpy() / math.pi * 180
ellipse = pat.Ellipse((mean[0].detach(), mean[1].detach()),
                      2 * torch.norm(principal_components[:, 1]),
                      2 * torch.norm(principal_components[:, 0]),
                      angle=angle, fill=False)
ax = plt.gca()
ax.add_patch(ellipse)

plt.legend()
plt.show()
```

Note that the output of `torch.linalg.eig` is not necessarily ordered by largest eigenvalue and is complex-valued in general; we therefore sort by the real part of each eigenvalue and plot the resulting eigenvectors below. Also note that we scale each eigenvector by multiplying it by the real part of its eigenvalue.

It can be seen that the first principal component, the dominant eigenvector, is aligned with the long axis of the random point cloud, while the second eigenvector is aligned with the short axis. In a robot application, we can grasp this object by rotating the gripper so that it is parallel to PC1 and closing along PC2. Of course this also works in 3D, where you can also adjust the pitch of the gripper.

But what does the PCA function actually do? In practice, PCA is implemented using *singular value decomposition* (SVD). Solving the characteristic equation det(*A* − *λI*) = 0 for the eigenvalues *λ* of a matrix *A* directly can be infeasible for large matrices. Instead, we'll look at simple numerical methods for building intuition.
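The connection between SVD and the covariance eigendecomposition can be sketched as follows (an illustrative example, not the library's internal code): the right singular vectors of the centered data matrix are the principal directions, and the squared singular values divided by n − 1 equal the eigenvalues of the covariance matrix:

```python
import torch

torch.manual_seed(1)
X = torch.randn(200, 3)
Xc = X - X.mean(dim=0)  # center the data
n = Xc.shape[0]

# SVD of the centered data matrix (singular values come out descending)
U, S, Vt = torch.linalg.svd(Xc, full_matrices=False)

# Eigendecomposition of the covariance matrix (eigh returns ascending order)
cov = torch.matmul(Xc.t(), Xc) / (n - 1)
eigvals, _ = torch.linalg.eigh(cov)

# Squared singular values / (n - 1) match the eigenvalues, sorted descending
print(torch.allclose(S**2 / (n - 1), eigvals.flip(0), atol=1e-5))  # True
```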

Power iteration is a simple and intuitive algorithm used to find the dominant eigenvectors and eigenvalues of a square matrix. This is especially useful in data science and machine learning contexts with large, high-dimensional datasets, when only the first few eigenvectors are needed.

Eigenvectors are nonzero vectors that, when multiplied by a square matrix, result in a scaled version of the same vector. In other words, if *A* is a square matrix and **v** is a nonzero vector, then **v** is an eigenvector of *A* if there exists a scalar *λ* (called the eigenvalue) such that:

*A***v** = *λ***v**

Here:

*A* is a square matrix, **v** is the eigenvector, and *λ* is the eigenvalue associated with **v**.

In other words, multiplication by the matrix reduces to scaling by a single number when applied to an eigenvector. The power iteration method takes advantage of this by repeatedly multiplying a random vector by *A* and normalizing the result.

The details of the power iteration method are as follows.

**Initialization**: Start with a random vector. This vector does not require any specific properties; it is just a starting point.

**Iterative process**:

– Multiply the matrix by the current vector. This generates a new vector.

– Normalize the new vector to unit length (divide by its norm).

**Convergence**:

– Repeat the iterative process for a fixed number of iterations or until the vector converges (stops changing significantly).

**Result**:

– The last vector is the dominant eigenvector.

– The corresponding eigenvalue is obtained by multiplying the transpose of the eigenvector by the original matrix and the eigenvector.

And here is the code:

```python
def power_iteration(A, num_iterations=100):
    # 1. Initialize a random vector
    v = torch.randn(A.shape[1], 1, dtype=A.dtype)
    for _ in range(num_iterations):
        # 2. Multiply matrix by current vector and normalize
        v = torch.matmul(A, v)
        v /= torch.norm(v)
        # 3. Repeat
    # Compute the dominant eigenvalue (Rayleigh quotient v^T A v)
    eigenvalue = torch.matmul(torch.matmul(v.t(), A), v)
    return eigenvalue.item(), v.squeeze()

# Example: find the dominant eigenvector and eigenvalue of a covariance matrix
dominant_eigenvalue, dominant_eigenvector = power_iteration(covariance_matrix)
```

Essentially, the power iteration method utilizes the repeated application of a matrix to a vector to align the vector with the dominant eigenvector. This is a simple and computationally efficient method that is particularly useful for large datasets where direct computation of eigenvectors is not practical.

Power iteration finds only the dominant eigenvector and eigenvalue. To find the next eigenvector, you need to *deflate* the matrix. For this, we subtract the dominant eigenvector's contribution from the original matrix:

*A*′ = *A* − *λ***vv**ᵀ

or in code:

```python
def deflate_matrix(A, eigenvalue, eigenvector):
    # Subtract the dominant component lambda * v v^T from A.
    # torch.outer forms the outer product for a 1-D eigenvector.
    deflated_matrix = A - eigenvalue * torch.outer(eigenvector, eigenvector)
    return deflated_matrix
```

The purpose of the deflated matrix is to enable subsequent iterative computation of further eigenvalue-eigenvector pairs. After finding the first pair, we can run power iteration on the deflated matrix to find the next one.
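To see deflation at work, here is a minimal sketch with a small symmetric matrix chosen for illustration: after subtracting the dominant component, the largest remaining eigenvalue of the deflated matrix is the original second eigenvalue:

```python
import torch

A = torch.tensor([[4.0, 1.0],
                  [1.0, 3.0]])

# Exact eigendecomposition (eigh returns eigenvalues in ascending order)
eigvals, eigvecs = torch.linalg.eigh(A)
lam1, v1 = eigvals[-1], eigvecs[:, -1]  # dominant pair

# Deflate: A' = A - lambda_1 * v1 v1^T
A_deflated = A - lam1 * torch.outer(v1, v1)

# The dominant eigenvalue of A' is the original second eigenvalue
deflated_vals = torch.linalg.eigvalsh(A_deflated)
print(torch.isclose(deflated_vals.max(), eigvals[0], atol=1e-5))  # True
```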

However, the results of this approach are not necessarily orthogonal. Gram-Schmidt orthogonalization is a process in linear algebra used to transform a set of linearly independent vectors into a set of orthogonal (perpendicular) vectors. The method is named after the mathematicians Jørgen Gram and Erhard Schmidt and is particularly useful when dealing with non-orthogonal bases.

```python
def gram_schmidt(v, u):
    # Gram-Schmidt orthogonalization: remove from v its projection onto u
    projection = torch.matmul(u.t(), v) / torch.matmul(u.t(), u)
    v = v.squeeze(-1) - projection * u
    return v
```

- Input parameters:

– `v`: the vector to be orthogonalized.

– `u`: the vector against which `v` is orthogonalized.

- Orthogonalization step:

– `torch.matmul(u.t(), v)`: computes the dot product between `u` and `v`.

– `torch.matmul(u.t(), u)`: computes the dot product of `u` with itself (the squared norm of `u`).

– `projection = ... / ...`: computes the scalar projection of `v` onto `u`, i.e. how much of `v` lies in the direction of `u`.

– `v = v.squeeze(-1) - projection * u`: subtracts the projection of `v` onto `u` from `v`. The result is a new vector `v` that is orthogonal to `u`.

- Return:

– The function returns the orthogonalized vector `v`.
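A quick check (restated here with example vectors so the snippet is self-contained) confirms that the returned vector is orthogonal to `u`:

```python
import torch

def gram_schmidt(v, u):
    # Remove from v its projection onto u, leaving a vector orthogonal to u
    projection = torch.matmul(u.t(), v) / torch.matmul(u.t(), u)
    return v.squeeze(-1) - projection * u

v = torch.tensor([1.0, 2.0])
u = torch.tensor([1.0, 0.0])

v_orth = gram_schmidt(v, u)
print(torch.isclose(torch.dot(v_orth, u), torch.tensor(0.0)))  # True
```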

You can now add Gram-Schmidt orthogonalization to the power iteration and compute both the dominant eigenvector and the second eigenvector.

```python
# Power iteration to find the dominant eigenvector
dominant_eigenvalue, dominant_eigenvector = power_iteration(covariance_matrix)

# Deflate the covariance matrix and use one of its columns as a starting vector
deflated_matrix = deflate_matrix(covariance_matrix, dominant_eigenvalue, dominant_eigenvector)
second_eigenvector = deflated_matrix[:, 0].view(-1, 1)

# Power iteration to find the second eigenvector
for _ in range(100):
    second_eigenvector = torch.matmul(covariance_matrix, second_eigenvector)
    second_eigenvector /= torch.norm(second_eigenvector)
    # Orthogonalize with respect to the first eigenvector
    second_eigenvector = gram_schmidt(second_eigenvector, dominant_eigenvector)

# Normalize so the Rayleigh quotient yields the eigenvalue
second_eigenvector /= torch.norm(second_eigenvector)
second_eigenvalue = torch.matmul(torch.matmul(second_eigenvector.t(), covariance_matrix), second_eigenvector)
```

Eigenvectors computed using either method can now be plotted side by side.

```python
# Scale eigenvectors by their eigenvalues
dominant_eigenvector *= dominant_eigenvalue
second_eigenvector *= second_eigenvalue

# Plot the original data with eigenvectors computed either way
plt.scatter(tensor_data[:, 0], tensor_data[:, 1], c=labels, cmap='viridis', label='Original Data')
plt.axis('equal')
plt.arrow(mean[0], mean[1], principal_components[0, 0], principal_components[1, 0], color='red', label='PC1')
plt.arrow(mean[0], mean[1], principal_components[0, 1], principal_components[1, 1], color='blue', label='PC2')
plt.arrow(mean[0], mean[1], dominant_eigenvector[0], dominant_eigenvector[1], color='orange', label='approx1')
plt.arrow(mean[0], mean[1], second_eigenvector[0], second_eigenvector[1], color='purple', label='approx2')
plt.legend()
plt.show()
```

The result should look like this:

Notice that the second eigenvector resulting from the power iteration (purple) is plotted on top of the exactly computed second eigenvector (blue). The dominant eigenvector from the power iteration (orange) points in the opposite direction from the exact dominant eigenvector (red). This isn't too surprising: power iteration starts from a random vector, so the sign of the result can go either way.

We presented a simple numerical method to derive the principal components of a dataset by computing the eigenvectors of its covariance matrix, and along the way shed light on the properties of eigenvectors and eigenvalues.