What is the longest common subsequence problem?

IONOS editorial team2023-10-139 mins

Contents

Longest common subsequences (LCS) are a typical problem in IT . Approaches to solve the LCS problem often appear in interviews for programmers and algorithm courses.

What is the longest common subsequence?

The aim is to find the longest common subsequence in two sequences. This is where a subsequence is derived from a sequence. The subsequence has the same elemental order, in some instances when elements are removed. Let’s take a look at some examples to get a better idea of the principle:

Sequence `X`	Sequence `Y`	`LCS(X, Y)`
`FATHER`	`MATTER`	`ATER`
`MOTHER`	`HERBAL`	`HER`
`DAVID`	`DANIEL`	`DAI`

Note

There is an important difference between the longest common subsequence and the longest common substring being that a substring may not have any gaps. In the example of “DAVID” and “DANIEL” the longest common substring would be “DA” since “I” is separated by “V” and “N”.

Are there any practical examples of the LCS problem?

The longest common subsequence problem is an important issue in all areas in which sequences which stem from each other are used. There are certain ways to find LCS to see whether there are similarities or differences and, in turn, discover any plagiarism. The well-known “diff” utility which checks for changes to source text files is based on the LCS problem.

In bioinformatics the longest common subsequence problem is often used when analyzing DNA sequences. DNA bases change in certain positions over time due to mutations. The presence of a long common subsequence in two sequences would suggest a genetic relationship. This can allow us to retrace evolution between species thanks to genetic development.

Linguists can also use the longest common subsequence problem to research linguistic development over centuries. If you have two words from different languages which have the same meaning and a long common subsequence, it suggests that they have the same root and etymology. This would be considered, then, a similar historical development.

What are the solutions to the longest common subsequence problem?

First of all, you can solve the LCS problem with a naïve approach. This is a simple approach without using any optimizations or special methods. To do so you simply compute all subsequences from two sequences and find the longest common subsequence. Unfortunately, this approach isn’t very efficient and is, therefore, only suitable for short sequences.

Below you will find three efficient solutions to the LCS problem:

Recursive approach
Optimization through memoization
Dynamic programming

What all approaches have in common is that in regard to the two sequences, they differ in three cases:

The last letter is the same.
The last letter is not the same.
One of the sequences’ length is zero.

The approaches each differ in their time complexity (asymptotic run time) and space complexity (memory use):

Approach	Run time	Memory
Naive approach	`O(n * n²)`	`O(n)`
Recursive approach	`O(n²)`	`O(1)`
Optimization via memoization	`O(n *m)`	`O(n* m)`
Dynamic programming	`O(n *m)`	`O(n* m)`

Note

The algorithms set out below all compute the length of the longest common subsequence. There are, if applicable, multiple set subsequences of this length which can be determined using other steps.

Recursive determination of the longest common subsequence

If you look at the LCS problem, you can see that it has an “optimal substructure”. This means that the problem can be reduced to subproblems. As a solution you can use a recursive approach. Below you can find some examples of algorithms to implement this in three of the most common programming languages.

Python

def lcs(X, Y, m, n):
    # Base case
    if m == 0 or n == 0:
        return 0
    # Last elements are equal
    elif X[m-1] == Y[n-1]:
        return 1 + lcs(X, Y, m-1, n-1)
    # Last elements differ
    else:
        return max(lcs(X, Y, m, n-1), lcs(X, Y, m-1, n))
# Let's test
X, Y = "DANIEL", "DAVID"
lcs_len = lcs(X, Y, len(X), len(Y))
print(f"Length of LCS is: {lcs_len}")

python

Java

import java.io.*;
class LCS {
    public static int lcs(String A, String B, int m, int n)
    {
        // Base case
        if (m == 0 || n == 0)
            return 0;
        // Last elements are equal
        if (A.charAt(m - 1) == B.charAt(n - 1))
            return 1 + lcs(A, B, m - 1, n - 1);
        // Last elements differ
        else
            return Math.max(lcs(A, B, m, n - 1),
             lcs(A, B, m - 1, n));
    }
    
    // Let's test
    public static void main(String[] args)
        {
            String X = "DAVID";
            String Y = "DANIEL";
            int lcsLength = LCS.lcs(X, Y, X.length(), Y.length());
            System.out.println("Length of LCS is: " + lcsLength);
        }
}

java

$1 Domain Names – Register yours today!

Simple registration
Premium TLDs at great prices
24/7 personal consultant included
Free privacy protection for eligible domains

C++

#include <iostream>
using namespace std;
int lcs(string X, string Y, int m, int n)
{
    // Base case
    if (m == 0 || n == 0) {
        return 0;
    }
    // Last elements are equal
    if (X[m - 1] == Y[n - 1]) {
        return 1 + lcs(X, Y, m - 1, n - 1);
    }
    // Last elements differ
    else {
        return max(lcs(X, Y, m, n - 1), lcs(X, Y, m - 1, n));
    }
}
// Let's test
int main()
{
    // Initialize variables
    string X = "DAVID";
    string Y = "DANIEL";
    // Compute and output length of LCS
    cout << "Length of LCS is " << lcs(X, Y, X.size(), Y.size());
    return 0;
}

c++

Optimizing the recursive approach using memoization

When looking again at the recursive approach, you can see that the overlapping parts are computed. These characteristics, called “overlapping subproblems” are known from Fibonacci sequences. In this case as well recursive sequences are constantly computed for a solution. To make the process more efficient, it’s worthwhile using memoization. In other words, you can cache the subproblems that have already been computed in a two-dimensional matrix.

Python

def lcs(X, Y, m, n, table):
    
    # Base case
    if (m == 0 or n == 0):
        return 0
    # Already computed value at given position
    if (table[m][n] != -1):
        return table[m][n]
    # Last elements are equal
    if X[m - 1] == Y[n - 1]:
        table[m][n] = 1 + lcs(X, Y, m - 1, n - 1, table)
    # Last elements differ
    else:
        table[m][n] = max(lcs(X, Y, m, n - 1, table), lcs(X, Y, m - 1, n, table))
    return table[m][n]
# Let's test
X = "DAVID"
Y = "DANIEL"
m, n = len(X), len(Y)
# Initialize table fields to `-1`
table = [[-1 for i in range(n + 1)] for j in range(m + 1)]
// Compute and output length of LCS
print(f"Length of LCS is: {lcs(X, Y, m, n, table)}")

python

Java

import java.io.*;
class LCS {
    public static int lcs(String X, String Y, int m, int n, int[][] table) {
        // Base case
        if (m == 0 || n == 0) {
            return 0;
        }
        // Already computed value at given position
        if (table[m][n] != -1) {
            return table[m][n];
        }
        // Last elements are equal
        if(X.charAt(m - 1) == Y.charAt(n - 1)) {
            table[m][n] = 1 + lcs(X, Y, m - 1, n - 1, table);
            return table[m][n];
        }
        // Last elements differ
        else {
            table[m][n] = Math.max(lcs(X, Y, m, n - 1, table),lcs(X, Y, m - 1, n, table));
            return table[m][n];
        }
    }
    // Let's test
    public static void main(String args[]){
        // Initialize variables
        String X = "DAVID";
        String Y = "DANIEL";
        int m = X.length();
        int n = Y.length();
        int[][] table = new int[m + 1][n + 1];
        
        // Initialize table fields to `-1`
        for(int i=0; i < m + 1; i++) {
            for(int j=0; j < n + 1; j++) {
                table[i][j] = -1;
            }
        }
        // Compute and output length of LCS
        int lcsLength = lcs(X, Y, m, n, table);
        System.out.println("Length of LCS is: " + lcsLength);
    }
}

java

C++

#include <bits/stdc++.h>
using namespace std;
int lcs(char *X, char* Y, int m, int n, vector<vector<int>>& table)
{
    // Base case
    if (m == 0 || n == 0)
        return 0;
    // Already computed value at given position
    if (table[m][n] != -1) {
        return table[m][n];
    }
    // Last elements are equal
    if (X[m - 1] == Y[n - 1]) {
        table[m][n] = 1 + lcs(X, Y, m - 1, n - 1, table);
        return table[m][n];
    }
    // Last elements differ
    else { 
        table[m][n] = max(lcs(X, Y, m, n - 1, table), lcs(X, Y, m - 1, n, table));
        return table;
    }
}
// Let's test
int main()
{
    // Initialize variables
    char X[] = "DAVID";
    char Y[] = "DANIEL";
    int m = strlen(X);
    int n = strlen(Y);
    // Initialize table with `-1`
    vector<vector<int>> table(m + 1, vector<int>(n + 1, -1));
    
    // Compute and output length of LCS
    cout << "Length of LCS is: " << lcs(X, Y, m, n, table);
    return 0;
}

c++

Dynamic programming for the longest common subsequence

Dynamic programming is a non-recursive way to solve optimization problems by dividing them into smaller subproblems and then solving them “bottom up” Among other things, dynamic programming is applied as a solution for pathfinding algorithms. The longest common subsequence problem can also be solved using dynamic programming when using a two-dimensional matrix.

Python

def lcs(X , Y, m, n): 
    
    # Initialize dynamic programming table fields to `None`
    table = [[None] * (n + 1) for _ in range(m + 1)]
    
    # Compute table values
    for i in range(m + 1):
        for j in range(n + 1):
            # Base case
            if i == 0 or j == 0 :
                table[i][j] = 0
            # Last elements are equal
            elif X[i - 1] == Y[j - 1]:
                table[i][j] = table[i - 1][j - 1]+ 1
            # Last elements differ
            else:
                table[i][j] = max(table[i - 1][j] , table[i][j - 1])
    
    return table[m][n]
# Let's test
X = "DAVID"
Y = "DANIEL"
# Compute and output length of LCS
lcs_len = lcs(X, Y, len(X), len(Y))
print(f"Length of LCS is: {lcs_len}")

python

Java

import java.io.*;
class LCS {
    public static int lcs(String X, String Y, int m, int n)
    {
        // Initialize dynamic programming table fields
        int table[][] = new int[m + 1][n + 1];
        for (int i = 0; i <= m; i++) {
            for (int j = 0; j <= n; j++) {
                // Base case
                if (i == 0 || j == 0)
                    table[i][j] = 0;
                // Last elements are equal
                else if (X.charAt(i - 1) == Y.charAt(j - 1))
                    table[i][j] = table[i - 1][j - 1] + 1;
                // Last elements differ
                else
                    table[i][j] = Math.max(table[i - 1][j], table[i][j - 1]);
            }
        }
        return table[m][n];
    }
    // Let's test
    public static void main(String args[]){
        // Initialize variables
        String X = "DAVID";
        String Y = "DANIEL";
        int m = X.length();
        int n = Y.length();
        // Compute and output length of LCS
        int lcsLength = lcs(X, Y, m, n);
        System.out.println("Length of LCS is: " + lcsLength);
    }
}

java

C++

#include <bits/stdc++.h>
using namespace std;
int lcs(string X, string Y, int m, int n) {
	// Initialize dynamic programming table
	int table[m + 1][n + 1];
	// Compute table values
	for (int i = 0; i <= m; i++) {
		for (int j = 0; j <= n; j++) {
			// Base case
			if (i == 0 || j == 0)
				table[i][j] = 0;
			// Last elements are equal
			else if (X[i - 1] == Y[j - 1])
				table[i][j] = table[i - 1][j - 1] + 1;
			// Last elements differ
			else
				table[i][j] = max(table[i - 1][j], table[i][j - 1]);
		}
	}
	return table[m][n];
}
// Let's test
int main() {
  // Initialize variables
  string X = "DAVID";
  string Y = "DANIEL";
  int m = X.size();
  int n = Y.size();
  // Compute and output length of LCS
  cout << "Length of LCS is " << lcs(X, Y, m, n);
  return 0;
}

c++

Was this article helpful?

How to use Python operators

You’re probably already familiar with arithmetic operators from your school days. Plus, minus, multiply, and divide are all symbols for mathematical operations. A programming language like Python knows many more operators. It’s not only numbers which can be processed, but also…

Tutorials
Python

Kunst BilderShutterstock

What is the layer 8 issue?

When problems occur with a system or network due to user error, IT professionals sometimes call it a layer 8 issue. This is actually an in-joke in the industry. Layer 8 refers to the OSI model, which actually only has seven layers. In our article, we’ll tell you about the Layer 8…

Security

ra2 studioShutterstock

Which programming language should you learn? A practical guide

Expertise in programming languages determines career opportunities, lucrative jobs and regular assignments. The number of programming languages continues to increase but classics like Java or Python aren’t always the best choice. So which programming language should you learn? We…

PHP
HTML
JavaScript