Coffee Space


Listen:

Super Permutations

Preview Image

Preview Image

People sometimes ask me where my brain goes when I check out - and the answer is that I like to work on small, well defined problems and attack them using different approaches. This could be tiny single-pass neural networks, small social media networks, small embedded device drivers, arbitrary encryption algorithms (creating and cracking), and sometimes super permutations.

What?

Super permutations is a simple problem to explain, but a hard problem to solve. Essentially you take all the permutations of a string and try to generate a minimal large string containing all permutations, where overlaps are allowed.

For example, the permutations of a symbol set of length 3 (i.e. {1, 2, 3}):

Depending on the algorithm you use, the minimal string length is: 123121321. If you look carefully, you will find each permutations represented in the set.

There are some generally open questions about the problem space:

  1. Minimal string -- We have no idea how to produce minimal strings without serious amounts of computation. The best method so far has been to express the problem as the Travelling Salesman Problem and to spend tonnes of computing power to finding minimal solutions.
  2. Minimal size -- We don't even know when we have found the minimal size - for all we know we find it the first time we look, maybe years. There is no method for determining exactly what the minimal length is.
  3. Lower bound -- We don't even know what the minimal bound is - there are estimations of course, but generally these are still yet to be proven.
  4. Upper bound -- We don't even know how large a solution may be. Again, there are estimations, but there is really no idea how long a solution may be.

You can compute lengths of 1, 2, 3 near instantly with brute force, 4 and 5 near instantly with tricks, and length 6 is still being computed. As it currently stands, unless something changes, it's predicted that a minimal solution for super permutations of length 7 will not be found.

Why?

Super permutations are interesting because they are simple. In a short time you could explain the problem to anybody, even a child - and yet the problem is hard to solve. Insight into solving the problem could literally come from anywhere - and I think this is what really makes the problem interesting.

In reality, it's really not clear what problems a solution could help to solve. Perhaps it could reveal deep insight into solving the Travelling Salesman Problem, or perhaps it reveals some ideas about better compression methods.

One real life problem space I have heard is a garage door that accepts a code, but it checks for the opening key using a sliding window. Any security based on sliding windows could potentially be susceptible to an attack generated by an efficient solution to this problem.

Insights

Here I am going to detail some insights I've had before now, in the hope that they may help somebody else.

Palindromes

These are a special case of super permutation, where they read the same left-right and right-left. It's suspected that these may not always produce the most efficient super permutations, but in theory should be the easiest to crack, given you only have to find half of the super permutation string.

I did generate a method for producing super permutation palindromes from 1-5 in a split second, by studying the patterns in the numbers generated. Unfortunately this shows multiple solutions for 5 (although recoverable) and fails for length 6 or more.

Repetition

As echoed by others, for every palindrome found so far, it seems to be the case that there are no repetitions in the strings produced. You might think "this is obvious", but actually it really is not. It's not yet been proven that this holds up past lengths of 6.

Search Reduction

Almost exclusively for palindromes, as you are building up the strings it allows you to say two things:

  1. The string or it's reverse should not be repeated in this half on the palindrome.
  2. The reverse of the string should appear in the other half of the palindrome.

These allow you to drastically reduce the search space.

Approaches

These are some approaches explored so far...

Brute Force

For this method we are checking every combination possible, starting from the smallest via string size predicted by the lower bound, up until the upper bound (which should not be met). This is of course very slow, so we make a few optimizations:

  • We skip any strings with symbol repeats - that sort of waste should never be seen in an optimal string.
  • We then check whether the combinations are all present within the string using a pre-calculated list of permutations. We quit the first time we are unable to find a permutation and check the proposal string a window at a time.

To give an idea of speed: For length 4 we can process approximately 3 million proposals a second using 8 cores on a i7-6700HQ.

Compression

Another method is to take all the combinations of the permutations and performing compression on the strings. By compression, I mean that we remove substrings that are already represented.

Whilst in theory this could be faster, in reality there may be issues as you have to pick one of two or more repeats to keep. The consequence is that it would affect subsequent compressions, meaning that you also need to test all combinations of compressions too.

TSP

The problem can be represented as the travelling salesman problem - by far the current most popular method is to use TSP solvers after representing the problem in the appropriate way. Whilst this has yielded minimal strings of length 7, this method does not seem feasible for 8 or higher and also doesn't really add greater insight into the problem space.

Pattern

This method is near instant for values less than 6 and fails at 6 or more symbols. This is a problem I am currently working on solving for palindromes, as it seems like the most viable approach to getting near minimal strings. At least for strings up to 5 in length, strings are generated near instantly.

Hunches

These are just a series of unsupported hunches about the problem space:

  1. Patterns -- The solution will likely be some form of low-computation pattern. I think that once we crack lengths up to 10, these patterns may be more obvious.
  2. 1 year -- I suspect some advancements in the space will happen in the next year. It's been relatively quiet in 2020 and I think we're overdue some new insight.
  3. Hardware -- I believe it's only a matter of time we see somebody implement the problem in something like an FPGA, where we should be able to get much closer to clock speed checking of proposed super permutation strings.