Faster solver for Sylvester-like systems

Preliminaries

The operators of a tensor can provide a wide lens into its structure. In particular, the operators can distinguish two tensors under the equivalence relation of change of bases. Throughout, we suppose tt is a (d×d×d)(d\times d\times d)-grid of elements of some field KK. For 1i,j,kd1\leq i,j,k\leq d, we write tijkKt_{ij}^k\in K for the (i,j,k)(i,j,k)-entry of tt. Then tt can be interpreted as a KK-multilinear map V×VVV\times V\rightarrowtail V, where V=KdV=K^d. Taking this interpretation a bit further, we can evaluate tt at (u,v)V×V(u, v)\in V\times V, and we write tu,vV\langle t \mid u, v\rangle \in V for this evaluation.

The ring of adjoint operators of tt is the KK-algebra

Adj(t)={(X,Y)Md(K)×Md(K) | u,vV,  tuX,v=tu,Yv},\begin{aligned} \mathrm{Adj}(t) &= \left\{(X, Y) \in \mathbb{M}_d(K)\times\mathbb{M}_d(K) ~\middle|~ \forall u,v \in V,\; \langle t \mid uX^\dagger, v\rangle = \langle t \mid u, Yv\rangle \right\}, \end{aligned}

where ^\dagger denotes the matrix transpose. There is a naive, super-quadratic-time algorithm to compute the adjoint operators of tt.

Lemma. 1 There exists a linear system of 2d22d^2 variables and d3d^3 equations whose kernel determines a basis for Adj(t)\mathrm{Adj}(t). This is solved using O(d7)O(d^7) field operations.

This computation is the bottleneck of many computations aimed at the isomorphism problem for tensors. In some nice cases, we can construct a basis for Adj(t)\mathrm{Adj}(t) in nearly linear time. Before we discuss special cases, we need to discuss special cases of slicing tensors. These have general descriptions for general tensors, but we keep the discussion to what is necessary for bilinear maps.

For 1i,jd1\leq i,j\leq d, a 11-slice of tt is the tensor t,ej:VV\langle t\mid\cdot, e_j \rangle : V \rightarrow V, and a 22-slice of tt is tei,:VV\langle t\mid e_i, \cdot \rangle : V \rightarrow V. A 00-slice is slightly different, resulting in a bilinear form V×VKV\times V\rightarrowtail K. These \ell-slices correspond to matrices in the (d×d×d)(d\times d\times d)-grid. For example, a 22-slice for i=1i=1 corresponds to the matrix whose (j,k)(j,k)-entry is t1jkt_{1j}^k for 1j,kd1\leq j,k\leq d.

For 1kd1\leq k\leq d, denote the 00-slice of tt at eke_k by tkt^k, so tk:V×VK\langle t^k\mid : V\times V\rightarrowtail K. The defining equations for Adj(t)\mathrm{Adj}(t) can be divided into dd different equations based on the 00-slices.

tuX,v=tu,Yv(k)tkuX,v=tku,Yv\begin{aligned} \langle t \mid uX^\dagger, v\rangle &= \langle t | u, Yv\rangle &\Longleftrightarrow & & (\forall k) \quad \langle t^k \mid uX^\dagger, v\rangle &= \langle t^k \mid u, Yv\rangle \end{aligned}

If TkT^k is the corresponding matrix from the 00-slice tkt^k, then each of these dd equations can be represented as a Sylvester-like system:

tkuX,v=tku,YvXTkTkY=0.\begin{aligned} \langle t^k \mid uX^\dagger, v\rangle &= \langle t^k \mid u, Yv\rangle & \Longleftrightarrow & & XT^k - T^kY &= 0. \end{aligned}

The Sylvester equation determines TkT^k given the matrices XX and YY.

Theorem 1. [Maglione–Wilson (2018)] There exists an algorithm that, given a (d×d×d)(d\times d\times d)-grid with an invertible 11- and 22-slice, constructs a basis for Adj(t)\mathrm{Adj}(t) using O(d4)O(d^4) field operations.

The key idea of Theorem 1 is to consider the structure of the linear system in Lemma 1 in terms of slices. Because of the invertibility conditions, we can clear most of the entries in the system. What remains is a homogeneous system whose solution is given by a formula in terms of certain slices.

Problems

We propose the following questions as a gradual approach to more general solutions. We think the approach taken in Theorem 1 might lead to an affirmative answer to the first question.

Problem 1. Does there exist a sub-quadratic-time or quadratic-time algorithm to construct a basis for Adj(t)\mathrm{Adj}(t), given a (d×d×d)(d\times d\times d) tensor tt?

We suspect that an affirmative answer to Open Problem 1 would apply immediately to general tensors and general nuclei, improving the computational complexity in all cases. In addition, computations of centroids should also benefit from such an affirmative answer as they are, in some sense, the intersection of nuclei. It is not clear if such a solution would apply to derivations. The derivation algebra of a (d×d×d)(d\times d\times d)-grid is

Der(t)={(X,Y,Z)Md(K)3 | u,vV,tuX,v+tu,vY=tu,vZ}.\begin{aligned} \mathrm{Der}(t) &= \left\{ (X, Y, Z) \in \mathbb{M}_d(K)^{\oplus 3} ~\middle|~ \forall u, v\in V,\right. \\ & \qquad\qquad\qquad \left. \langle t | uX, v\rangle + \langle t \mid u, vY\rangle = \langle t \mid u, v\rangle Z \right\}. \end{aligned}

The proof of Lemma 1 applies to Der(t)\mathrm{Der}(t), and so computing Der(t)\mathrm{Der}(t) has the same complexity.

Problem 2 Does there exist a sub-quadratic-time or quadratic-time algorithm to construct a basis for Der(t)\mathrm{Der}(t), given a (d×d×d)(d\times d\times d) tensor tt?

An affirmative answer to Open Problem 2 implies an affirmative answer to Open Problem 2. Assuming such an answer extends to general tensors, this implies that every operator set of a tensor tt whose annihilator is generated by linear polynomials can be constructed more efficiently than the naive approach.

Additional Questions?

If you have additional questions, feel free to reach out to a maintainer / contributor on the contact page.


Shrunk subspaces of alternating matrix tuples

Let p3p\geq 3 be a prime. Let Λ(n,p)\Lambda(n,p) be the set of alternating matrices over...

Min-Rank

Takunari Miyazaki (Trinity College, Har...

Tensors
Algebras
Filters
Solvers
Identity
Resources