Threat group attribution with open-source datasets

Case study: Reading between the lines of a press release.

You are a threat analyst doing your job and reading the news. It is December 13, 2020. FireEye has just announced that the SolarWinds Orion product was used, reportedly by Russia, as a backdoor for initial access into the systems of governments and organizations of all kinds. The full scope of the breach is still unclear; hundreds, thousands, and probably more are impacted.

Fast forward to late January 2021. Chaos has ensued in the media over recent weeks: many individuals, and even companies, are attributing the attack to specific Russian groups on flimsy proof and no hard evidence. Finally, US officials share that "an Advanced Persistent Threat (APT) actor, likely Russian in origin" is responsible for the SolarWinds attack, which they describe as "an intelligence-gathering effort." (CISA)

We look at 2 highlights, our key takeaways...

 "likely Russian in origin"

 "an intelligence-gathering effort"

"Who caused the SolarWinds attack?", we ask ourselves.

It is alleged that the “APT” is Russian in origin, with few other details of attribution. We are unlikely to know who the APT is for a while. So we ask, "based on the known APT groups from Russia that are focused on intelligence gathering, who could have caused the SolarWinds supply chain attack?" We decide to look for an answer using open-source threat group information and text manipulation on the command line.

To clarify, this is an educational threat analysis and tool workshop, and nothing more. We will learn about text manipulation on top of two datasets:

 ThaiCERT threat actor encyclopedia

 “EternalLiberty.csv” threat actor attribution dataset.

Our toolset: Text manipulation essentials.

BASH command-line interpreter (we use this for navigating everything; mostly web downloads, command execution, and excessive for loops)

nano – command-line text editor for editing our TXT and JSON files

cat – print the content of the files, top-to-bottom

tac – print the content of the files in reverse, bottom-to-top (useful for flipping an already sorted list between least-to-greatest and greatest-to-least)

grep – print lines that match patterns

awk – pattern scanning and processing language

sed – filtering and transforming text

sort – sort lines of text files

uniq – report or omit repeated lines

head – output the first part of files (default first 10 lines, adjust the count with the -n flag)

tail – output the last part of files (default last 10 lines, adjust the count with the -n flag)

curl – transfer a URL (transfer data from or to a server, typically HTTP/HTTPS)

wget – simply download files (HTTP/HTTPS/FTP)

wc – print newline, word, and byte counts (we use it with the -l flag to count how many lines are in a file)
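As a quick warm-up, here is a minimal sketch of how several of these tools chain together. The country codes are made-up sample data and are not taken from either dataset.

```shell
# Sample input: one (made-up) country code per line.
sample=$(printf '%s\n' CN RU CN IR RU CN KP)

# sort groups identical lines, uniq -c counts each group,
# sort -n orders the counts least-to-greatest, tac flips that order,
# head keeps the top two, and awk reformats "count code" as "code: count".
top_two=$(printf '%s\n' "$sample" | sort | uniq -c | sort -n | tac | head -n 2 | awk '{print $2": "$1}')
echo "$top_two"

# wc -l counts lines; here, the number of distinct codes.
distinct=$(printf '%s\n' "$sample" | sort -u | wc -l)
echo "distinct codes: $((distinct))"
```

The same count-then-rank pattern comes up repeatedly when summarizing tags in a threat dataset.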

Our dataset: ThaiCERT threat actor encyclopedia.

ThaiCERT kindly maintains an encyclopedia of threat actors and threat groups that shares insights for cybercrime and nation-state cyber campaigns. For a threat researcher, this is an ideal resource to keep on hand.

As of January 20, 2021, the encyclopedia documented 345 threat groups (260 APT, 51 other, 34 unknown). It also tracks group aliases (924), total operations (1396), total counter operations (92), unique source countries (28), unique tools (1434), total tool aliases (2084), and more. The top five countries with documented threat actors are, in order, China, Russia, Iran, North Korea, and the USA; notably, China (111), Russia (46), and Iran (31) account for far more groups than the rest (10, 8, 5, 4, 3, 3, 3, 3, 2, 2, …).

We can use this data to aid our intelligence analysis when our conclusions need supporting statistics. This idea was inspired by Lab52 in their blog post “Exploiting APT data for fun and no profit”, and this section is credited to their write-up (thank you): https://lab52.io/blog/exploiting-apt-data-for-fun-and-no-profit/

We download the ThaiCERT threat actor encyclopedia:

curl -o out.json 'https://apt.thaicert.or.th/cgi-bin/getmisp.cgi?o=g'

Download and audit a third-party script to help us manipulate ThaiCERT’s file content:

curl -o JSON.sh https://raw.githubusercontent.com/dominictarr/JSON.sh/master/JSON.sh

Search for outbound connections in the file (there should be none):

nano JSON.sh
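Reading the script by hand is the real audit, but a quick grep can point us at the lines worth reading first. The sketch below runs against a made-up stand-in file so that it works offline; in practice we would point "target" at JSON.sh. The keyword list is our own assumption and is not exhaustive, so a clean result does not prove a script is safe.

```shell
# Stand-in script so the example runs offline (hypothetical content).
workdir=$(mktemp -d)
target="$workdir/example.sh"
printf '%s\n' \
    '#!/bin/sh' \
    'echo "parsing input"' \
    'curl -s http://example.invalid/beacon' \
    > "$target"

# Keywords that commonly indicate network activity (assumed, non-exhaustive).
pattern='curl|wget|nc |ncat|/dev/tcp|/dev/udp|ssh |scp |ftp |telnet'

# -n prints line numbers, -E enables the | alternation syntax.
hits=$(grep -nE "$pattern" "$target")
if [ -n "$hits" ]; then
    echo "Review these lines by hand:"
    echo "$hits"
else
    echo "No obvious network commands found (still read the script)."
fi
rm -rf "$workdir"
```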

If the file is not making outbound connections, adjust the file permissions to allow the script to be executable.

chmod +x JSON.sh

Run the shell script to transform the JSON format into a flat format.

cat out.json |./JSON.sh -l > work.txt


Next, we split the flattened file into a few hundred smaller files, creating a "threat info card" for each group in its own file.

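# find the highest group index under "values"
n=$(awk -F, 'index($1,"values")>0 {print $2}' work.txt | grep -v value | sort -n | uniq | tail -1)

# JSON.sh numbers array elements from 0, so we start at 0 to keep the first group
for i in $(seq 0 "$n"); do grep "values\",$i," work.txt > "$i.txt"; done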
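For comparison, here is a minimal sketch of the same split done in a single awk pass, run on a tiny hand-made stand-in for work.txt. The keys and values are illustrative assumptions about the flattened shape, not real ThaiCERT records, and the sketch assumes that all lines for one group are contiguous (as they are when a parser emits them in document order).

```shell
workdir=$(mktemp -d)

# Hand-made stand-in for work.txt: a bracketed key path, a tab, then the value.
printf '%s\t%s\n' \
    '["name"]' '"Sample Encyclopedia"' \
    '["values",0,"value"]' '"Group Zero"' \
    '["values",0,"meta","country"]' '"CN"' \
    '["values",1,"value"]' '"Group One"' \
    '["values",1,"meta","country"]' '"RU"' \
    '["values",1,"meta","synonyms",0]' '"Alias One"' \
    > "$workdir/work.txt"

# With "," as the separator, field 2 of every "values" line is the group index,
# so each line is written to <index>.txt inside the working directory.
awk -F, -v dir="$workdir" '
    index($1, "values") > 0 && $2 ~ /^[0-9]+$/ {
        out = dir "/" $2 ".txt"
        if (out != prev) {              # index changed: finish the previous card
            if (prev != "") close(prev)
            prev = out
        }
        print > out
    }
' "$workdir/work.txt"

wc -l "$workdir"/[0-9]*.txt
```

Reading the file once instead of once per group makes little difference at a few hundred groups, but it scales better on larger datasets.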

Identify the profiles that are both from “Russia” and classified as “Information theft and espionage”.

Next, we find all Russian APTs focused on "Information theft and espionage".

First, we grab all of the "RU" (Russia) country tags by parsing for the "country" tag inside all files whose names start with a number and end with ".txt":

grep \"country\" [0-9]*.txt|grep -w RU|awk -F":" '{print $1}' > russia

We then search those files for the "motivation" tag, use grep to keep only the "Information theft and espionage" tag, and save the matching file names to a file named "infotheft":

for x in `cat russia`;do grep "\"motivation\"" $x|grep "Information theft and espionage";done|awk -F"\"," '{print $2}'|sed 's/,\"meta/\.txt/g' > infotheft
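The two filters can also be sketched with grep -l, which prints only the names of matching files. The three cards below are hand-made samples (hypothetical groups in an assumed flattened shape), so the example runs without the real dataset.

```shell
workdir=$(mktemp -d)

# make_card <index> <country> <motivation>: write one hypothetical card.
make_card() {
    printf '%s\t%s\n' \
        "[\"values\",$1,\"value\"]" "\"Sample Group $1\"" \
        "[\"values\",$1,\"meta\",\"country\"]" "\"$2\"" \
        "[\"values\",$1,\"meta\",\"motivation\",0]" "\"$3\"" \
        > "$workdir/$1.txt"
}
make_card 1 RU "Information theft and espionage"
make_card 2 CN "Information theft and espionage"
make_card 3 RU "Financial gain"

# First grep keeps files tagged RU; xargs hands those file names to the
# second grep, which keeps the ones with the espionage motivation.
matches=$(grep -l '"country".*"RU"' "$workdir"/[0-9]*.txt \
    | xargs grep -l '"motivation".*"Information theft and espionage"')
echo "$matches"
```

Only card 1 satisfies both conditions, so it is the only file name printed.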

We approach the next part in 5 big steps:

1. grab all synonym threat group names for each associated key (group ID), save to infotheft-ru1.txt

for x in `cat infotheft`;do grep "synonyms" $x;done|awk -F"\"" '{print $3"~"$8}'|awk '{print substr($0,2)}'|sed 's/,~/~/g'|sed 's/"/"/g'|sed 's/"/"/g' > infotheft-ru1.txt

2. grab all primary threat group names for each associated key (group ID), save to infotheft-ru2.txt

for x in `cat infotheft`;do head -n 1 $x; done|awk -F"\"" '{print substr($3"~"$6,2)}'|sed 's/,~/~/g' > infotheft-ru2.txt

3. we combine infotheft-ru1.txt and infotheft-ru2.txt into infotheft-all.txt and remove duplicates

cat infotheft-*|sort|awk -F"," '{print $1}'|sort -u > infotheft-all.txt

4. print all of the final keys/IDs inside infotheft-all.txt and save them to a file all-id

cat infotheft-all.txt|awk -F"~" '{print $1}'|sort|uniq > all-id

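5. grab all threat group names for each ID in all-id and print them on one line per group, deduplicated. We anchor the match on the ID field so that an ID such as 88 cannot also match a name such as “Group 88”, and we pass each name to printf as data rather than as the format string.

for x in `cat all-id`;do grep "^$x~" infotheft-all.txt|awk -F"~" '{printf "%s, ", $2}' && echo;done|sed 's/, $//' > all-id-names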
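As an alternative sketch, awk can do the grouping in one pass by comparing the ID as a whole field. The "id~name" pairs below are made up for illustration; note that the digits 88 also appear inside one of the names.

```shell
workdir=$(mktemp -d)

# Sample "id~name" pairs (hypothetical IDs).
printf '%s\n' \
    '12~Turla' \
    '12~Group 88' \
    '12~Venomous Bear' \
    '88~Cyber Berkut' \
    '88~Kiberberkut' \
    > "$workdir/infotheft-all.txt"

# names[] accumulates a comma-separated list per ID;
# order[] remembers the order in which IDs were first seen.
grouped=$(awk -F'~' '
    !($1 in names) { order[++n] = $1; names[$1] = $2; next }
    { names[$1] = names[$1] ", " $2 }
    END { for (i = 1; i <= n; i++) print names[order[i]] }
' "$workdir/infotheft-all.txt")
echo "$grouped"
```

Because the comparison is on the whole first field, "Group 88" stays with ID 12 and never leaks into the line for ID 88.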

Assess the results of ThaiCERT threat actor encyclopedia.

Finally, we have a list of possible threat groups from Russia, based on the ThaiCERT dataset, that focus specifically on "Information theft and espionage". If the SolarWinds incident was caused by an existing, documented threat group, then it would likely be one of the groups listed below (one group per line, with every known name for that group).

BlueAlpha, Gamaredon Group, Primitive Bear, Winterflounder

IAmTheKing

ATK 116, Cloud Atlas, Inception Framework, Operation “Cloud Atlas”, Operation “RedOctober”, Oxygen, The Rocra

InvisiMole

Operation BugDrop

Operation Domino, Operation Kremlin, Operation “Domino”, Operation “Kremlin”

APT 28, ATK 5, Fancy Bear, Grizzly Steppe, Group 74, ITG05, Iron Twilight, Operation “DealersChoice”, Operation “Dear Joohn”, Operation “Komplex”, Operation “Pawn Storm”, Operation “Russian Doll”, Pawn Storm, SIG40, Sednit, Snakemackerel, Sofacy, Strontium, Swallowtail, T-APT-12, TAG-0700, TG-4127, Tsar Team

Iron Lyric, SIG39, TeamSpy Crew

ATK 13, Belugasturgeon, CTG-8875, Group 88, ITG12, Iron Hunter, Krypton, Makersmark, Operation “Epic Turla”, Operation “Moonlight Maze”, Operation “Penguin Turla”, Operation “Satellite Turla”, Operation “Skipper Turla”, Operation “Turla Mosquito”, Operation “WITCHCOVEN”, Pacifier APT, Popeye, SIG15, SIG2, SIG23, TAG-0530, Turla, Venomous Bear, Waterbug, Wraith

Dark Halo, SolarStorm, StellarParticle, UNC2452

APT 29, ATK 7, CloudLook, Cozy Bear, Grizzly Steppe, Group 100, ITG11, Iron Hemlock, Minidionis, Operation “Ghost”, Operation “Office monkeys”, The Dukes, Yttrium

Cyber Berkut, Kiberberkut

APT-C-34, DustSquad, Golden Falcon, Nomadic Octopus

The current output represents threat groups from all time. To be more specific, we could limit the results by activity dates, but we are not doing that today.

During our analysis, we observe that naming conventions differ between datasets, which matters for our next steps. The differences are small but important. For example, dataset A may say “APT29, SIG15, TAPT12” while dataset B says “APT-29, SIG 15, T-APT-12”. In both cases we want a positive match, so we generate a list of name variants (including synonyms) for the enrichment phase of our analysis.

1. split each entity of all-id-names onto its own line, and generate the possible naming variants into a list (tedious, I know, but this is how real-world threat analysts get one-time stuff done quickly).

cat all-id-names | sed 's/, /\n/g' > all-id-names_enriched

cat all-id-names | sed 's/, /\n/g' | sed 's/Operation //g' | sed 's/“\|”\|"//g' >> all-id-names_enriched

cat all-id-names | sed 's/, /\n/g' | sed 's/Operation //g' | sed 's/“\|”\|"//g' | sed 's/ //g' >> all-id-names_enriched

cat all-id-names | sed 's/, /\n/g' | sed 's/Operation //g' | sed 's/“\|”\|"//g' | sed 's/ /-/g' >> all-id-names_enriched

cat all-id-names | sed 's/, /\n/g' | sed 's/Operation //g' | sed 's/“\|”\|"//g' | sed 's/ATK /ATK/g' | sed 's/SIG/SIG /g' | sed 's/UNC/UNC /g' >> all-id-names_enriched

cat all-id-names | sed 's/, /\n/g' | sed 's/Operation //g' | sed 's/“\|”\|"//g' | sed 's/-/ /g' >> all-id-names_enriched

cat all-id-names | sed 's/, /\n/g' | sed 's/Operation //g' | sed 's/“\|”\|"//g' | sed 's/-//g' >> all-id-names_enriched

2. sort and de-duplicate the all-id-names_enriched list.

sort all-id-names_enriched | uniq > all-id-names_enriched-clean
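The variant generation can also be wrapped in a small function, which makes it easier to test on a single name. This is a minimal sketch: name_variants is a hypothetical helper of our own, and it covers only the space, hyphen, quote, and "Operation" rules, not the ATK/SIG/UNC spacing rule.

```shell
# name_variants <name>: print spelling variants of one group name, one per line.
name_variants() {
    local name=$1
    local base
    # Strip straight and curly quotes, then a leading "Operation ".
    base=$(printf '%s\n' "$name" \
        | sed -e 's/"//g' -e 's/“//g' -e 's/”//g' -e 's/^Operation //')
    {
        printf '%s\n' "$name"                   # original spelling
        printf '%s\n' "$base"                   # cleaned spelling
        printf '%s\n' "$base" | sed 's/ //g'    # "APT 29"   -> "APT29"
        printf '%s\n' "$base" | sed 's/ /-/g'   # "APT 29"   -> "APT-29"
        printf '%s\n' "$base" | sed 's/-/ /g'   # "T-APT-12" -> "T APT 12"
        printf '%s\n' "$base" | sed 's/-//g'    # "T-APT-12" -> "TAPT12"
    } | sort -u
}

variants=$(name_variants 'APT 29')
echo "$variants"
```

For "APT 29" this yields three distinct spellings: "APT 29", "APT-29", and "APT29" (the listing order depends on the locale's sort rules).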

Enrich the results of ThaiCERT with the EternalLiberty library.

We use this data to pivot, enrich, and expand with the EternalLiberty library: https://github.com/StrangerealIntel/EternalLiberty/blob/main/EternalLiberty.csv

We download the EternalLiberty.csv threat actor moniker file...

wget https://raw.githubusercontent.com/StrangerealIntel/EternalLiberty/main/EternalLiberty.csv 

Then search for the info and extract it...

IFS=$'\n'

for x in `cat all-id-names_enriched-clean|sed 's/\"//g'|awk -F"," '{print $1}'`;do grep -in "$x" EternalLiberty.csv|grep -v "$x[0-9]";done|sort|uniq|grep -v "Threat Actor Official Name"|sort -n > thaicert_eternalliberty_data

unset IFS
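One way to reduce accidental substring hits (for instance, a short name such as Krypton also matching Kryptonite Panda) is to compare whole CSV fields instead of searching anywhere in the line. The sketch below uses two made-up rows and a two-name lookup list. It assumes plain comma-separated fields with no quoted commas, and it will not match names packed into one field with slashes.

```shell
workdir=$(mktemp -d)

# Two sample CSV rows and a lookup list (illustrative only).
printf '%s\n' \
    'Turla,High,APT,Russia,Turla,Krypton,Venomous Bear' \
    'APT 40,High,APT,China,Leviathan,Kryptonite Panda' \
    > "$workdir/sample.csv"
printf '%s\n' 'krypton' 'Venomous Bear' > "$workdir/names.txt"

# While reading the first file (names.txt), remember every name in lowercase.
# While reading the second file (sample.csv), print a row once, prefixed with
# its line number, if any whole field equals a remembered name.
exact=$(awk -F, '
    NR == FNR { wanted[tolower($0)] = 1; next }
    {
        for (i = 1; i <= NF; i++)
            if (tolower($i) in wanted) { print FNR ":" $0; break }
    }
' "$workdir/names.txt" "$workdir/sample.csv")
echo "$exact"
```

The trade-off is recall: exact matching misses partial spellings that the substring search would have caught, so the right choice depends on how much manual review we are willing to do afterwards.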

This presents us with a more complete set of data... or so we think. We see a problem that jeopardizes the reliability of our analysis: inconsistencies between the datasets. If a ThaiCERT name does not exist in the EternalLiberty dataset, it will not be included in our final output. In this case, we are choosing to trust the accuracy of EternalLiberty over ThaiCERT, due to our analyst bias, empirically assessing it as a reliable dataset; if a name is not in it, it may have been irrelevant anyway. Having acknowledged this, we accept the risk of missed attribution observations, and we move on to analyzing and further processing the output.

48:APT 28,High,APT,Russia,APT 28,,Sofacy,Fancy Bear,Strontium,G0007,ITG05,,Iron Twilight,TG-4127,,,ATK 5,,Swallowtail,T-APT-12,APT-C-20,Pawn Storm,,,,,,,,,,,,,,,,,,,,Sednit,,,,,,,Group 74,,,,,,,Tsar Team,,,,,,,,Grizzly Steppe,,,,,,,,,,,,Snakemackerel,,,,,,,,,,,,,,,SIG40

49:APT 29,High,APT,Russia,APT 29,,CloudLook,Cozy Bear,Yttrium,G0016,ITG11,,Iron Hemlock,,,,ATK7,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Group 100,,,,,,Minidionis,,,,,,,,,Grizzly Steppe,,,,,,,,,,,The Dukes,,,,,,,,,,,,,,,,

60:APT 40,High,APT,China,TEMP.Periscope/TEMP.Jumper,Leviathan,,Kryptonite Panda,Gadolinium,G0065,ITG09,,Bronze Mohawk,,,,ATK29,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Flaccid Rose,Nanhaishu,Mudcarp,,,,,,,,,,,,,,,

91:Cyber Berkut,Mid,Hacktivists,Russia,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Cyber Berkut,,,,,,,,,,,,,,,,,,,,,,

100:APT-C-34,High,APT,Russia,,,DustSquad,,,,,,,,,,,,,,Golden Falcon,,,,,,,,,,,,,,,,,,,,,Nomadic Octopus,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

112:Gamaredon Group,High,APT,Russia,Temp.Armageddon/UNC530,,,Primitive Bear,,G0047,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Gamaredon Group,,,,,,,,,,,,,,,,,,,,,Winterflounder,BlueAlpha,,,,,,,,,,,,,,

115:GhostNet,High,APT,China,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Snooping Dragon,GhostNet,,,,,,,,,,

121:IAmTheKing,Mid,APT,Russia,,,IAmTheKing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,PowerPool,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

123:Cloud Atlas,High,APT,Russia,,,Cloud Atlas,,Oxygen,G0100,,,,,,,ATK 116,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

149:BugDrop,Mid,APT,Russia,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,BugDrop,,,,,,,,,,,,,,,,,,

201:TeamSpy Crew,High,APT,Russia,,,TeamSpy Crew,,,,,,Iron Lyric,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,SIG39

206:Turla,High,APT,Russia,,,Turla,Venomous Bear,Krypton,G0010,ITG12,,Iron Hunter,CTG-8875,,,ATK 13,,Waterbug,,,,,,,,,,,,,,,,,,,,,,Pacifier APT,Makersmark,,,,,,,Group 88,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,SIG2/SIG15/SIG23

263:Ghost Jackal,Unknown,Hacktivists,Unknown,,,,Ghost Jackal,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

297:GHOST SQUAD HACKERS,High,Hacktivists,Worldwide,,,,,,,,,,,,,ATK 135,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

316:Skipper Turla,High,APT,Russia,,,White Bear,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

332:UNC2452,Mid,APT,Russia,UNC2452,,,StellarParticle,Solorigate,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,SolarStorm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Dark Halo,,,,,,

 

We can still clean that up a bit (again, I am a threat analyst, not a developer) by collapsing each run of commas into a single comma and trimming any trailing comma.

sed -E 's/,+/,/g; s/,$//' thaicert_eternalliberty_data

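Finally, we are presented with a formatted list that is mostly Russian threat groups focused on information theft and espionage. Because the search matches substrings, a few unrelated rows slip in (for example, “Krypton” matches “Kryptonite Panda” in the APT 40 row, and “Ghost” matches GhostNet, Ghost Jackal, and GHOST SQUAD HACKERS), so check the country column before drawing conclusions.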

48:APT 28,High,APT,Russia,APT 28,Sofacy,Fancy Bear,Strontium,G0007,ITG05,Iron Twilight,TG-4127,ATK 5,Swallowtail,T-APT-12,APT-C-20,Pawn Storm,Sednit,Group 74,Tsar Team,Grizzly Steppe,Snakemackerel,SIG40

49:APT 29,High,APT,Russia,APT 29,CloudLook,Cozy Bear,Yttrium,G0016,ITG11,Iron Hemlock,ATK7,Group 100,Minidionis,Grizzly Steppe,The Dukes

60:APT 40,High,APT,China,TEMP.Periscope/TEMP.Jumper,Leviathan,Kryptonite Panda,Gadolinium,G0065,ITG09,Bronze Mohawk,ATK29,Flaccid Rose,Nanhaishu,Mudcarp

91:Cyber Berkut,Mid,Hacktivists,Russia,Cyber Berkut

100:APT-C-34,High,APT,Russia,DustSquad,Golden Falcon,Nomadic Octopus

112:Gamaredon Group,High,APT,Russia,Temp.Armageddon/UNC530,Primitive Bear,G0047,Gamaredon Group,Winterflounder,BlueAlpha

115:GhostNet,High,APT,China,Snooping Dragon,GhostNet

121:IAmTheKing,Mid,APT,Russia,IAmTheKing,PowerPool

123:Cloud Atlas,High,APT,Russia,Cloud Atlas,Oxygen,G0100,ATK 116

149:BugDrop,Mid,APT,Russia,BugDrop

201:TeamSpy Crew,High,APT,Russia,TeamSpy Crew,Iron Lyric,SIG39

206:Turla,High,APT,Russia,Turla,Venomous Bear,Krypton,G0010,ITG12,Iron Hunter,CTG-8875,ATK 13,Waterbug,Pacifier APT,Makersmark,Group 88,SIG2/SIG15/SIG23

263:Ghost Jackal,Unknown,Hacktivists,Unknown,Ghost Jackal

297:GHOST SQUAD HACKERS,High,Hacktivists,Worldwide,ATK 135

316:Skipper Turla,High,APT,Russia,White Bear

332:UNC2452,Mid,APT,Russia,UNC2452,StellarParticle,Solorigate,SolarStorm,Dark Halo
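Since the country and actor type sit in fixed columns, a final filter can narrow the list to the rows we actually asked about. The sketch below uses three abbreviated rows in the same column layout as the list above; whether to also drop the Hacktivists rows is a judgment call for the analyst.

```shell
# Three abbreviated sample rows: line number and name, confidence, type, country, aliases.
rows=$(printf '%s\n' \
    '49:APT 29,High,APT,Russia,APT 29,CloudLook,Cozy Bear' \
    '60:APT 40,High,APT,China,TEMP.Periscope/TEMP.Jumper,Leviathan' \
    '91:Cyber Berkut,Mid,Hacktivists,Russia,Cyber Berkut')

# Field 3 is the actor type and field 4 is the country.
russian_apts=$(printf '%s\n' "$rows" | awk -F, '$3 == "APT" && $4 == "Russia"')
echo "$russian_apts"

# wc -l gives a quick count of the rows that survived the filter.
count=$(printf '%s\n' "$russian_apts" | wc -l)
echo "Russian APT rows: $((count))"
```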

A word on attribution.

We have a list of known threat group names now. So, what?

The threat group(s) that targeted SolarWinds could certainly be one of the listed groups, or they could be a new group with no overlap. We simply do not know, because no government has confirmed anything to us beyond the actor being “Russian” and focused on an “intelligence gathering effort”.

We should leave threat group attribution to governments and reputable cybersecurity vendors. That does not mean we cannot do some fun analysis with a smidge of weight behind it, but do not go around using this analysis as a basis for saying that any particular known group is responsible for the SolarWinds attack.

FireEye started tracking the threat group as UNC2452 in December 2020, but it was not until January 2021 that US officials publicly described the actor as likely Russian in origin. To quote MITRE ATT&CK on APT29: in April 2021, "the US and UK governments attributed the SolarWinds supply chain compromise cyber operation to the SVR; public statements included citations to APT29, Cozy Bear, and The Dukes. Victims of this campaign included government, consulting, technology, telecom, and other organizations in North America, Europe, Asia, and the Middle East. Industry reporting referred to the actors involved in this campaign as UNC2452, NOBELIUM, StellarParticle, and Dark Halo." APT29 is the threat group that has been attributed to Russia's Foreign Intelligence Service (SVR).

A word on text manipulation.

Text manipulation is so important for any type of analyst! Proficiency with text manipulation tools is a great asset, both as a professional and as a researcher. You may even be surprised by how many people are not comfortable with these tools, which makes this a great area to excel in as an analyst and to fill that skill gap on your team.

Truly, we can accomplish so much just by being able to move text around in JSON, CSV, and TXT files however we want! For a threat analyst, this could mean parsing credentials, event logs, or random datasets from the Internet to conclude a threat analysis (e.g. ThaiCERT, EternalLiberty).
