3 ways to sort a list unique

Written by Christian Ritter on Mar 20th, 2023

Today I would like to show you a performance table comparing different ways to sort a list/array unique.

Sometimes it is necessary to sort a list or an array unique to get rid of duplicates. This can be a time-consuming task.

In this post we will have a look at 3 ways to sort a list unique.

  • Sort-Object -Unique
  • Get-Unique
  • HashSet-Class
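
As a quick orientation, the three approaches can be sketched on a tiny list as follows. The sample data and the variable names ($Colors, $BySortObject, $ByGetUnique, $Set) are illustrative and not part of the benchmark in this post.

```powershell
# Tiny sample list with duplicates (illustrative data only)
$Colors = "Red","Blue","Red","Green","Blue"

# 1) Sort-Object -Unique: sorts and removes duplicates in one step
$BySortObject = $Colors | Sort-Object -Unique

# 2) Get-Unique: only drops adjacent duplicates, so the input is sorted first
$ByGetUnique = $Colors | Sort-Object | Get-Unique

# 3) HashSet: removes duplicates, but does not guarantee any ordering
$Set = New-Object System.Collections.Generic.HashSet[string]
foreach ($Color in $Colors) { [void]$Set.Add($Color) }

$BySortObject -join ","   # Blue,Green,Red
$ByGetUnique -join ","    # Blue,Green,Red
$Set.Count                # 3
```

Note that only the first two approaches return a sorted result; the HashSet only guarantees uniqueness.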

First we will create 3 different lists containing random strings in three sizes (small: 101, medium: 10,001, large: 1,000,001 elements):

#List elements
$ListOptionA = "Blue","Red","Green"
$ListOptionB = "Dog","Horse","Cat"

#Note: Get-Random treats -Maximum as exclusive, so "-Maximum (count-1)" only returns the indexes 0 and 1.
#"Green" and "Cat" are therefore never picked, and each list contains at most 2 x 2 x 10 = 40 distinct strings.

#Create a small set of strings based on list elements and a random number
$ListSmall = (0..100).ForEach({
 "$($ListOptionA[$(Get-Random -Minimum 0 -Maximum ($ListOptionA.count-1))])_$($ListOptionB[$(Get-Random -Minimum 0 -Maximum ($ListOptionB.count-1))])_$(Get-Random -Maximum 10 -Minimum 0)"
})

#Create a medium set of strings based on list elements and a random number
$ListMedium = (0..10000).ForEach({
 "$($ListOptionA[$(Get-Random -Minimum 0 -Maximum ($ListOptionA.count-1))])_$($ListOptionB[$(Get-Random -Minimum 0 -Maximum ($ListOptionB.count-1))])_$(Get-Random -Maximum 10 -Minimum 0)"
})

#Create a large set of strings based on list elements and a random number
$ListLarge = (0..1000000).ForEach({
 "$($ListOptionA[$(Get-Random -Minimum 0 -Maximum ($ListOptionA.count-1))])_$($ListOptionB[$(Get-Random -Minimum 0 -Maximum ($ListOptionB.count-1))])_$(Get-Random -Maximum 10 -Minimum 0)"
})
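
The three generator blocks above differ only in their size, so they could be folded into one helper. The following is a minimal sketch, not the code used for the measurements: the function name New-RandomList and its parameters are hypothetical. Because Get-Random treats -Maximum as exclusive, the sketch passes .Count instead of count-1, so every list element is reachable and up to 3 x 3 x 10 = 90 distinct strings can occur. Its unique counts will therefore not match the ones reported in this post.

```powershell
# Hypothetical helper; New-RandomList is not part of the original post
function New-RandomList {
    param(
        [int]$Count,
        [string[]]$OptionA = @("Blue","Red","Green"),
        [string[]]$OptionB = @("Dog","Horse","Cat")
    )
    for ($i = 0; $i -lt $Count; $i++) {
        # -Maximum is exclusive, so passing .Count makes every index reachable
        $A = $OptionA[(Get-Random -Minimum 0 -Maximum $OptionA.Count)]
        $B = $OptionB[(Get-Random -Minimum 0 -Maximum $OptionB.Count)]
        $N = Get-Random -Minimum 0 -Maximum 10
        # Emit one string such as "Red_Cat_7"
        "{0}_{1}_{2}" -f $A, $B, $N
    }
}

$Sample = New-RandomList -Count 101
$Sample.Count   # 101
```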

Now we can start collecting results:

#Collects one result row per method and list size
$Results = New-Object -TypeName System.Collections.Generic.List[PSCustomObject]

$ListOptions = "Small","Medium","Large"
$Method = "Sort-Object -Unique"
$Index = 0
($ListSmall,$ListMedium,$ListLarge).ForEach({
    $StopWatch = New-Object System.Diagnostics.Stopwatch
    $StopWatch.Start()
    $UniqueList = $($_ | Sort-Object -Unique)
    $StopWatch.Stop()
    $Results.Add([PSCustomObject]@{
        MethodName = $Method
        ListSize = "$($ListOptions[$Index]) $($_.Count)"
        Result = $UniqueList.count
        TimeElapsed = $StopWatch.Elapsed
        TimeElapsedMS = $StopWatch.ElapsedMilliseconds
    })
    $Index++
})
#Get-Unique only removes adjacent duplicates, so the list has to be sorted first
$Method = "get-unique"
$Index = 0
($ListSmall,$ListMedium,$ListLarge).ForEach({
    $StopWatch = New-Object System.Diagnostics.Stopwatch
    $StopWatch.Start()
    $UniqueList = $($_ | Sort-Object | Get-Unique)
    $StopWatch.Stop()
    $Results.Add([PSCustomObject]@{
        MethodName = $Method
        ListSize = "$($ListOptions[$Index]) $($_.Count)"
        Result = $UniqueList.count
        TimeElapsed = $StopWatch.Elapsed
        TimeElapsedMS = $StopWatch.ElapsedMilliseconds
    })
    $Index++
})
#A HashSet only removes duplicates; it does not sort the remaining elements
$Method = "Hashset"
$Index = 0
($ListSmall,$ListMedium,$ListLarge).ForEach({
    $StopWatch = New-Object System.Diagnostics.Stopwatch
    $StopWatch.Start()
    $HashSet = New-Object System.Collections.Generic.HashSet[string]
    foreach($ListElement in $_){
        #Add() returns $true or $false; Out-Null suppresses that output
        $HashSet.Add($ListElement) | Out-Null
    }
    $StopWatch.Stop()
    $Results.Add([PSCustomObject]@{
        MethodName = $Method
        ListSize = "$($ListOptions[$Index]) $($_.Count)"
        Result = $HashSet.count
        TimeElapsed = $StopWatch.Elapsed
        TimeElapsedMS = $StopWatch.ElapsedMilliseconds
    })
    $Index++
})
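
The three measurement loops above share the same stopwatch and result-row logic. As a sketch of how that repetition could be reduced, the timing can be wrapped in a function that takes the method as a script block. The name Measure-UniqueMethod and its parameters are hypothetical and not part of the original post. Invoking a script block adds a small overhead of its own, so timings from this helper would not be directly comparable to the numbers in this post.

```powershell
# Hypothetical helper; Measure-UniqueMethod is not part of the original post
function Measure-UniqueMethod {
    param(
        [string]$MethodName,
        [string]$ListName,
        [object[]]$List,
        [scriptblock]$Action   # receives the list as its first argument and returns the unique items
    )
    $StopWatch = [System.Diagnostics.Stopwatch]::StartNew()
    $UniqueList = @(& $Action $List)
    $StopWatch.Stop()
    # Same row layout as the result objects used in this post
    [PSCustomObject]@{
        MethodName    = $MethodName
        ListSize      = "$ListName $($List.Count)"
        Result        = $UniqueList.Count
        TimeElapsed   = $StopWatch.Elapsed
        TimeElapsedMS = $StopWatch.ElapsedMilliseconds
    }
}

# Usage with a small illustrative list
$Demo = "b","a","b","c","a"
$Row = Measure-UniqueMethod -MethodName "Sort-Object -Unique" -ListName "Demo" -List $Demo -Action {
    param($Items)
    $Items | Sort-Object -Unique
}
$Row.Result   # 3
```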

On my machine, the results of this run look like this:

MethodName          ListSize      Result TimeElapsed      TimeElapsedMS
----------          --------      ------ -----------      -------------
Sort-Object -Unique Small 101         34 00:00:00.0003934             0
Sort-Object -Unique Medium 10001      40 00:00:00.0582319            58
Sort-Object -Unique Large 1000001     40 00:00:12.6371431         12637
get-unique          Small 101         34 00:00:00.0005651             0
get-unique          Medium 10001      40 00:00:00.0877467            87
get-unique          Large 1000001     40 00:00:15.0103995         15010
Hashset             Small 101         34 00:00:00.0050367             5
Hashset             Medium 10001      40 00:00:00.0995172            99
Hashset             Large 1000001     40 00:00:07.8959100          7895

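What can we conclude from the table above? First, no single method is the best solution for every situation. Sort-Object -Unique was the fastest for the small and medium lists (101 and 10,001 elements), so it is a good choice for lists of that size. If the list grows dramatically, we should choose the HashSet approach: for the large list (1,000,001 elements) it took about 7.9 seconds, compared to about 12.6 seconds for Sort-Object -Unique. Keep in mind that a HashSet only removes duplicates and does not sort the remaining elements. Finally, we should not use Get-Unique, because it requires sorting the list first, and that took longer than plain Sort-Object -Unique in every row of the result table.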
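
Because a HashSet does not return its elements in sorted order, a sorted unique list needs one more step. The following sketch shows two possible ways to get one; neither was part of the measurements above, so their timings are unknown. It assumes PowerShell 5.0 or later for the ::new() syntax, and the sample data and variable names are illustrative.

```powershell
# Illustrative data only
$Items = "Red_Dog_3","Blue_Cat_1","Red_Dog_3","Blue_Cat_1","Green_Horse_7"

# Option 1: deduplicate with a HashSet, then sort the (much smaller) set
$HashSet = [System.Collections.Generic.HashSet[string]]::new([string[]]$Items)
$SortedFromHashSet = $HashSet | Sort-Object

# Option 2: a SortedSet deduplicates and keeps its elements in sorted order
$SortedSet = [System.Collections.Generic.SortedSet[string]]::new([string[]]$Items)

$SortedFromHashSet -join ","   # Blue_Cat_1,Green_Horse_7,Red_Dog_3
@($SortedSet) -join ","        # Blue_Cat_1,Green_Horse_7,Red_Dog_3
```

One difference to be aware of: HashSet[string] and SortedSet[string] compare strings case-sensitively by default, while Sort-Object -Unique ignores case unless -CaseSensitive is specified.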

Best regards, Christian