Threat group attribution with open-source datasets
Case study: Reading between the lines of a press release.
You're a threat analyst doing your job, reading the news. It's December 13, 2020. FireEye just announced that it identified the SolarWinds Orion solution being used by Russia as a backdoor for initial access into the systems of all types of governments and organizations. The full scope of the breach is still unclear; hundreds, thousands, and probably more are impacted.
Fast forward: it's now late January 2021. Chaos has ensued in the media over recent weeks; many individuals and even companies are claiming attribution to specific Russian groups using flimsy proof and no hard evidence. Finally, US officials share that "an Advanced Persistent Threat (APT) actor, likely Russian in origin" is responsible for the SolarWinds attack, described as "an intelligence-gathering effort."
We look at two highlights, our key takeaways:
– "likely Russian in origin"
– "an intelligence-gathering effort"
"Who caused the SolarWinds attack?", we ask ourselves.
It is alleged that the “APT” is Russian in origin, with few other details of attribution. We are unlikely to know who the APT is for a while. We ask, "based on the known APT groups from Russia that are focused on intelligence gathering, who could have caused the SolarWinds supply chain attack?" We decide to find this answer using open-source threat group information and text manipulation on the command line.
To clarify, this is an educational threat analysis and tool workshop, and nothing more. We will learn about text manipulation on top of two datasets:
– ThaiCERT threat actor encyclopedia
– “EternalLiberty.csv” threat actor attribution dataset.
Our toolset: Text manipulation essentials.
BASH – command-line interpreter (we use this for navigating everything; mostly web downloads, command execution, and excessive for loops)
nano – command-line text editor for editing our TXT and JSON files
cat – print the content of the files, top-to-bottom
tac – print the content of the files in reverse, bottom-to-top (useful for flipping an already sorted list between greatest-to-least and least-to-greatest)
grep – print lines that match patterns
awk – pattern scanning and processing language
sed – filtering and transforming text
sort – sort lines of text files
uniq – report or omit repeated lines
head – output the first part of files (first 10 lines by default; adjust with the -n flag)
tail – output the last part of files (last 10 lines by default; adjust with the -n flag)
curl – transfer a URL (transfer data from or to a server, typically HTTP/HTTPS)
wget – simply download files (HTTP/HTTPS/FTP)
wc – print newline, word, and byte counts (we use it with the -l flag to count how many lines are in a file)
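As a quick warm-up, several of these tools can be chained into a frequency count. This is only a sketch: the country codes are made-up sample input rather than data from either dataset, and top_counts is a hypothetical helper name.

```shell
# Count how often each value appears on stdin and print the N most frequent.
# top_counts is a hypothetical helper; the input below is made-up sample data.
top_counts() {
    # sort groups identical lines, uniq -c counts them,
    # sort -rn orders by count (descending), head keeps the first N,
    # awk normalises the output to "count value".
    sort | uniq -c | sort -rn | head -n "$1" | awk '{print $1, $2}'
}

printf '%s\n' CN RU CN IR RU CN | top_counts 2
```

The same sort | uniq -c | sort -rn pattern works on any single-column list, such as country tags or tool names.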
Our dataset: ThaiCERT threat actor encyclopedia.
ThaiCERT kindly maintains an encyclopedia of threat actors and threat groups that shares insights into cybercrime and nation-state cyber campaigns. For a threat researcher, this is an ideal resource to keep on hand.
As of January 20, 2021, the encyclopedia documented 345 threat groups (260 APT, 51 other, 34 unknown). It also tracks group aliases (924), total operations (1396), total counter operations (92), unique source countries (28), unique tools (1434), total tool aliases (2084), and more. The top five countries with documented threat actors are, in order, China, Russia, Iran, North Korea, and the USA; notably, China (111), Russia (46), and Iran (31) account for far more groups than the others (10, 8, 5, 4, 3, 3, 3, 3, 2, 2, …).
We can use this data to aid our intelligence analysis when our conclusions need supporting statistical data. This idea was inspired by Lab52 in their blog post “Exploiting APT data for fun and no profit”, and this section is credited to their write-up (thank you): https://lab52.io/blog/exploiting-apt-data-for-fun-and-no-profit/
We download the ThaiCERT threat actor encyclopedia (the URL is quoted so the shell does not treat the "?" as a wildcard):
curl -o out.json "https://apt.thaicert.or.th/cgi-bin/getmisp.cgi?o=g"
Download and audit a third-party script to help us manipulate ThaiCERT’s file content:
curl -o JSON.sh https://raw.githubusercontent.com/dominictarr/JSON.sh/master/JSON.sh
Open the script and check it for outbound connections (there should be none):
nano JSON.sh
If the script is not making outbound connections, adjust the file permissions to make it executable.
chmod +x JSON.sh
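Reading the script in nano is the real audit; a grep pass can simply point at lines worth reading first. The sketch below assumes a short, non-exhaustive list of network-related command names, and flag_network_lines is a hypothetical helper. A clean result here does not prove a script is safe.

```shell
# Flag lines in a script that mention common network tools or /dev/tcp.
# The pattern list is an assumption for illustration and is not exhaustive.
flag_network_lines() {
    grep -nE \
        -e '(^|[^[:alnum:]_])(curl|wget|nc|ncat|ssh|scp|ftp|telnet)([^[:alnum:]_]|$)' \
        -e '/dev/(tcp|udp)/' "$1"
}

# Demo on a throwaway file so the example runs without downloading anything.
demo=$(mktemp)
cat > "$demo" <<'EOF'
#!/bin/sh
echo "parsing input"
curl -s http://example.invalid/payload | sh
EOF
flag_network_lines "$demo" || echo "no network-related lines found"
rm -f "$demo"
```

Against the real download, the call would be flag_network_lines JSON.sh.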
Run the shell script to transform the JSON format into a flat format.
cat out.json | ./JSON.sh -l > work.txt
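To make the flat format concrete, the sketch below works on a few hand-written lines that imitate what JSON.sh -l prints: a bracketed path, a tab, then the leaf value. The group names, country codes, and indices are placeholders, and the real file carries many more fields per group.

```shell
# Hand-written sample imitating `JSON.sh -l` output: a bracketed path,
# a tab, then the leaf value. The indices and values are placeholders.
sample_flat() {
    printf '["values",1,"value"]\t"Example Group A"\n'
    printf '["values",1,"meta","country"]\t"XX"\n'
    printf '["values",2,"value"]\t"Example Group B"\n'
    printf '["values",2,"meta","country"]\t"YY"\n'
}

# Print "index<TAB>name" for every top-level "value" leaf.
list_group_names() {
    awk -F'\t' '$1 ~ /^\["values",[0-9]+,"value"\]$/ {
        split($1, path, ",")      # path[2] is the array index
        gsub(/"/, "", $2)         # drop the surrounding quotes
        print path[2] "\t" $2
    }'
}

sample_flat | list_group_names
```

Because every leaf sits on its own line with its full path, grep and awk can address any field without a JSON parser.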
Next, we split the flattened file into a few hundred smaller files, creating a "threat info card" for each group in its own independent file.
n=$(awk -F, 'index($1,"values")>0 {print $2}' work.txt | grep -v value | sort -n | uniq | tail -1)
export n
for i in $(seq 1 $n); do grep "values\",$i," work.txt > $i.txt; done
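The loop above rereads work.txt once per group. As an alternative sketch, a single awk pass can write each line to the file named after its index. split_cards is a hypothetical helper and the input is placeholder data in the assumed flat layout.

```shell
# Split flattened lines into one file per "values" index in a single awk pass.
# split_cards and the sample input are hypothetical; output files are written
# to the directory passed as the second argument.
split_cards() {
    awk -F, -v dir="$2" '$1 == "[\"values\"" && $2 ~ /^[0-9]+$/ {
        out = dir "/" $2 ".txt"
        print >> out        # append the whole line to <index>.txt
        close(out)          # avoid running out of open file handles
    }' "$1"
}

work=$(mktemp -d)
printf '["values",1,"value"]\t"Example Group A"\n["values",1,"meta","country"]\t"XX"\n["values",2,"value"]\t"Example Group B"\n' > "$work/work.txt"
split_cards "$work/work.txt" "$work"
wc -l < "$work/1.txt"
```

Because the helper appends, it should be pointed at an empty directory on each run.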
Identify the profiles that are both from “Russia” and classified as “Information theft and espionage”.
Next, we find all Russian APTs focused on "Information theft and espionage".
First, we grab all of the "RU" (Russia) country tags by searching for the "country" tag inside all files that start with a number and end with ".txt", and save the matching file names to a file called "russia":
grep \"country\" [0-9]*.txt | grep -w RU | awk -F":" '{print $1}' > russia
We then search those files for the "motivation" tag, use grep to keep only the "Information theft and espionage" value, and save the matching file names to a file called "infotheft":
for x in `cat russia`; do grep "\"motivation\"" $x | grep "Information theft and espionage"; done | awk -F"\"," '{print $2}' | sed 's/,\"meta/\.txt/g' > infotheft
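The two filters can also be expressed as one check per card. The sketch below runs on three placeholder cards that imitate the flattened layout; they are not real ThaiCERT entries, "Financial gain" is only a stand-in for some other motivation value, and matching_cards is a hypothetical helper.

```shell
# List card files whose country is RU and whose motivation includes
# "Information theft and espionage". The cards are placeholders that
# imitate the flattened layout; they are not real ThaiCERT entries.
cards=$(mktemp -d)
printf '["values",1,"meta","country"]\t"RU"\n["values",1,"meta","motivation",0]\t"Information theft and espionage"\n' > "$cards/1.txt"
printf '["values",2,"meta","country"]\t"RU"\n["values",2,"meta","motivation",0]\t"Financial gain"\n' > "$cards/2.txt"
printf '["values",3,"meta","country"]\t"XX"\n["values",3,"meta","motivation",0]\t"Information theft and espionage"\n' > "$cards/3.txt"

matching_cards() {
    for f in "$1"/[0-9]*.txt; do
        # grep -q exits 0 on a match, so && requires both conditions.
        grep '"country"' "$f" | grep -qw 'RU' &&
            grep '"motivation"' "$f" | grep -q 'Information theft and espionage' &&
            basename "$f"
    done
    return 0
}

matching_cards "$cards"
```

Only the first card satisfies both conditions, so only its file name is printed.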
We approach the next part in 5 big steps:
1. grab all synonym threat group names for each key and save them to infotheft-ru1.txt
for x in `cat infotheft`; do grep "synonyms" $x; done | awk -F"\"" '{print $3"~"$8}' | awk '{print substr($0,2)}' | sed 's/,~/~/g' | sed 's/"/"/g' | sed 's/"/"/g' > infotheft-ru1.txt
2. grab all primary threat group names for each key and save them to infotheft-ru2.txt
for x in `cat infotheft`; do head -n 1 $x; done | awk -F"\"" '{print substr($3"~"$6,2)}' | sed 's/,~/~/g' > infotheft-ru2.txt
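3. combine infotheft-ru1.txt and infotheft-ru2.txt into infotheft-all.txt and remove duplicates (the glob is narrowed to infotheft-ru* so that a second run does not read its own output file)
cat infotheft-ru*.txt | sort | awk -F"," '{print $1}' | sort -u > infotheft-all.txt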
4. print all of the final keys/IDs inside infotheft-all.txt and save them to a file called all-id
cat infotheft-all.txt | awk -F"~" '{print $1}' | sort | uniq > all-id
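5. grab all threat group names for each ID in all-id and print them on one line per group, deduplicated. The match is anchored to the start of the line ("^ID~"); without the anchor, an ID such as 88 would also match digits inside unrelated names like "Group 88".
for x in `cat all-id`; do grep "^$x~" infotheft-all.txt | awk -F"~" '{printf "%s, ", $2}' && echo; done | sed 's/, $//' > all-id-names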
Assess the results of the ThaiCERT threat actor encyclopedia.
Finally, we have a list of possible threat groups from Russia, based on the ThaiCERT dataset, that are focused specifically on "Information theft and espionage". This means that if the SolarWinds incident was caused by an existing, documented threat group, we would expect it to be one of the groups listed below (one group per line, with multiple names listed for each group):
BlueAlpha, Gamaredon Group, Primitive Bear, Winterflounder
IAmTheKing
ATK 116, Cloud Atlas, Inception Framework, Operation “Cloud Atlas”, Operation “RedOctober”, Oxygen, The Rocra
InvisiMole
Operation BugDrop
Operation Domino, Operation Kremlin, Operation “Domino”, Operation “Kremlin”
APT 28, ATK 5, Fancy Bear, Grizzly Steppe, Group 74, ITG05, Iron Twilight, Operation “DealersChoice”, Operation “Dear Joohn”, Operation “Komplex”, Operation “Pawn Storm”, Operation “Russian Doll”, Pawn Storm, SIG40, Sednit, Snakemackerel, Sofacy, Strontium, Swallowtail, T-APT-12, TAG-0700, TG-4127, Tsar Team
Iron Lyric, SIG39, TeamSpy Crew
ATK 13, Belugasturgeon, CTG-8875, Group 88, ITG12, Iron Hunter, Krypton, Makersmark, Operation “Epic Turla”, Operation “Moonlight Maze”, Operation “Penguin Turla”, Operation “Satellite Turla”, Operation “Skipper Turla”, Operation “Turla Mosquito”, Operation “WITCHCOVEN”, Pacifier APT, Popeye, SIG15, SIG2, SIG23, TAG-0530, Turla, Venomous Bear, Waterbug, Wraith
Dark Halo, SolarStorm, StellarParticle, UNC2452
APT 29, ATK 7, CloudLook, Cozy Bear, Grizzly Steppe, Group 100, ITG11, Iron Hemlock, Minidionis, Operation “Ghost”, Operation “Office monkeys”, The Dukes, Yttrium
CTG-8875, Group 88, Cyber Berkut, Kiberberkut
APT-C-34, DustSquad, Golden Falcon, Nomadic Octopus
The current output represents threat groups from all time. To be more specific, we could limit the results by activity dates, but we are not doing that today.
During our analysis, we observe that naming schemes differ between datasets, which matters for our next steps. The differences are small but important. For example, dataset A may say “APT29, SIG15, TAPT12” but dataset B may say “APT-29, SIG 15, T-APT-12”. In both cases we want positive matches, so we generate a list of names (including synonym names) in several spellings for the enrichment phase of our analysis.
1. split each entry of all-id-names onto its own line, one-by-one, and generate the possible naming schemes into a list (tedious, I know, but this is how real-world threat analysts get one-time stuff done quickly).
cat all-id-names | sed 's/, /\n/g' > all-id-names_enriched
cat all-id-names | sed 's/, /\n/g' | sed 's/Operation //g' | sed 's/“\|”\|"//g' >> all-id-names_enriched
cat all-id-names | sed 's/, /\n/g' | sed 's/Operation //g' | sed 's/“\|”\|"//g' | sed 's/ //g' >> all-id-names_enriched
cat all-id-names | sed 's/, /\n/g' | sed 's/Operation //g' | sed 's/“\|”\|"//g' | sed 's/ /-/g' >> all-id-names_enriched
cat all-id-names | sed 's/, /\n/g' | sed 's/Operation //g' | sed 's/“\|”\|"//g' | sed 's/ATK /ATK/g' | sed 's/SIG/SIG /g' | sed 's/UNC/UNC /g' >> all-id-names_enriched
cat all-id-names | sed 's/, /\n/g' | sed 's/Operation //g' | sed 's/“\|”\|"//g' | sed 's/-/ /g' >> all-id-names_enriched
cat all-id-names | sed 's/, /\n/g' | sed 's/Operation //g' | sed 's/“\|”\|"//g' | sed 's/-//g' >> all-id-names_enriched
2. sort and de-duplicate the all-id-names_enriched list.
sort all-id-names_enriched | uniq > all-id-names_enriched-clean
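The same idea can be wrapped in a small function that reads one name per line and prints its spelling variants. This is a simplified sketch: name_variants is a hypothetical helper, and it covers only the space and hyphen cases, not the "Operation", quote, or ATK/SIG/UNC rewrites.

```shell
# Print spelling variants of each name read from stdin (one name per line):
# as-is, without spaces, spaces as hyphens, hyphens as spaces, no hyphens.
# name_variants is a hypothetical helper; it covers only these simple cases.
name_variants() {
    while IFS= read -r name; do
        printf '%s\n' "$name"
        printf '%s\n' "$name" | sed 's/ //g'
        printf '%s\n' "$name" | sed 's/ /-/g'
        printf '%s\n' "$name" | sed 's/-/ /g'
        printf '%s\n' "$name" | sed 's/-//g'
    done | LC_ALL=C sort -u
}

printf '%s\n' 'APT 29' 'T-APT-12' | name_variants
```

For the two sample names this yields six distinct spellings, including "APT29", "APT-29", "T APT 12", and "TAPT12".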
Enrich the results of ThaiCERT with the EternalLiberty library.
We use this data to pivot, enrich, and expand with the EternalLiberty library: https://github.com/StrangerealIntel/EternalLiberty/blob/main/EternalLiberty.csv
We download the EternalLiberty.csv threat actor moniker file...
wget https://raw.githubusercontent.com/StrangerealIntel/EternalLiberty/main/EternalLiberty.csv
Then search for the info and extract it...
IFS=$'\n'
for x in `cat all-id-names_enriched-clean | sed 's/\"//g' | awk -F"," '{print $1}'`; do grep -in "$x" EternalLiberty.csv | grep -v "$x[0-9]"; done | sort | uniq | grep -v "Threat Actor Official Name" | sort -n > thaicert_eternalliberty_data
unset IFS
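Plain substring matching is generous: a short name can hit longer, unrelated names that merely contain it. The sketch below compares that behaviour with fixed-string, whole-word matching (grep -F -w). The CSV rows are made-up placeholders in a simplified two-column layout, not rows from EternalLiberty.csv, and both helper names are hypothetical.

```shell
# Compare plain substring matching with fixed-string, whole-word matching.
# The CSV rows are made-up placeholders in a simplified name,country layout.
csv=$(mktemp)
cat > "$csv" <<'EOF'
Example Ghost,Country A
ExampleGhostNet,Country B
Other Group,Country C
EOF

# -i: ignore case, -c: count matching lines,
# -F: treat the name as literal text, -w: match whole words only
loose_hits()  { grep -ic -- "$1" "$csv"; }
strict_hits() { grep -icFw -- "$1" "$csv"; }

loose_hits 'Ghost'
strict_hits 'Ghost'
```

The loose search counts both rows that contain "Ghost", while the strict search counts only the row where "Ghost" stands as its own word. Whether the stricter match is preferable depends on how much recall we are willing to trade for precision.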
This presents us with a more complete set of data... or so we think. We see a problem that jeopardizes the reliability of our analysis: inconsistencies. If a ThaiCERT name does not exist in the EternalLiberty dataset, it will not be included in our final output. In this case, we are choosing to trust the accuracy of EternalLiberty over ThaiCERT. This is our analyst bias: from experience, we assess it as a reliable dataset, so if a name is not in it, it may have been irrelevant anyway. Having acknowledged this, we accept the risk of missed attribution observations and move on to analyzing and further processing the output.
48:APT 28,High,APT,Russia,APT 28,,Sofacy,Fancy Bear,Strontium,G0007,ITG05,,Iron Twilight,TG-4127,,,ATK 5,,Swallowtail,T-APT-12,APT-C-20,Pawn Storm,,,,,,,,,,,,,,,,,,,,Sednit,,,,,,,Group 74,,,,,,,Tsar Team,,,,,,,,Grizzly Steppe,,,,,,,,,,,,Snakemackerel,,,,,,,,,,,,,,,SIG40
49:APT 29,High,APT,Russia,APT 29,,CloudLook,Cozy Bear,Yttrium,G0016,ITG11,,Iron Hemlock,,,,ATK7,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Group 100,,,,,,Minidionis,,,,,,,,,Grizzly Steppe,,,,,,,,,,,The Dukes,,,,,,,,,,,,,,,,
60:APT 40,High,APT,China,TEMP.Periscope/TEMP.Jumper,Leviathan,,Kryptonite Panda,Gadolinium,G0065,ITG09,,Bronze Mohawk,,,,ATK29,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Flaccid Rose,Nanhaishu,Mudcarp,,,,,,,,,,,,,,,
91:Cyber Berkut,Mid,Hacktivists,Russia,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Cyber Berkut,,,,,,,,,,,,,,,,,,,,,,
100:APT-C-34,High,APT,Russia,,,DustSquad,,,,,,,,,,,,,,Golden Falcon,,,,,,,,,,,,,,,,,,,,,Nomadic Octopus,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
112:Gamaredon Group,High,APT,Russia,Temp.Armageddon/UNC530,,,Primitive Bear,,G0047,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Gamaredon Group,,,,,,,,,,,,,,,,,,,,,Winterflounder,BlueAlpha,,,,,,,,,,,,,,
115:GhostNet,High,APT,China,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Snooping Dragon,GhostNet,,,,,,,,,,
121:IAmTheKing,Mid,APT,Russia,,,IAmTheKing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,PowerPool,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
123:Cloud Atlas,High,APT,Russia,,,Cloud Atlas,,Oxygen,G0100,,,,,,,ATK 116,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
149:BugDrop,Mid,APT,Russia,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,BugDrop,,,,,,,,,,,,,,,,,,
201:TeamSpy Crew,High,APT,Russia,,,TeamSpy Crew,,,,,,Iron Lyric,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,SIG39
206:Turla,High,APT,Russia,,,Turla,Venomous Bear,Krypton,G0010,ITG12,,Iron Hunter,CTG-8875,,,ATK 13,,Waterbug,,,,,,,,,,,,,,,,,,,,,,Pacifier APT,Makersmark,,,,,,,Group 88,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,SIG2/SIG15/SIG23
263:Ghost Jackal,Unknown,Hacktivists,Unknown,,,,Ghost Jackal,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
297:GHOST SQUAD HACKERS,High,Hacktivists,Worldwide,,,,,,,,,,,,,ATK 135,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
316:Skipper Turla,High,APT,Russia,,,White Bear,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
332:UNC2452,Mid,APT,Russia,UNC2452,,,StellarParticle,Solorigate,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,SolarStorm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Dark Halo,,,,,,
We can still clean that up a bit (again, I am a threat analyst, not a developer).
cat thaicert_eternalliberty_data | sed 's/,,/,/g' | sed 's/,,/,/g' | sed 's/,,/,/g' | sed 's/,,/,/g' | sed 's/,,/,/g' | sed 's/,,/,/g' | sed 's/,,/,/g' | sed 's/,,/,/g' | sed 's/\,$//'
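The repeated substitutions can be collapsed into a single sed call with an extended regular expression. A minimal sketch, using a made-up row rather than a line from EternalLiberty.csv; squeeze_commas is a hypothetical helper name.

```shell
# Collapse runs of commas and drop a trailing comma in one sed call.
# The row is a made-up placeholder, not a line from EternalLiberty.csv.
squeeze_commas() {
    sed -E 's/,+/,/g; s/,$//'
}

printf '%s\n' '1:Example Group,High,APT,,,,Alias One,,,,,,,Alias Two,,,' | squeeze_commas
```

The first expression turns any run of commas into one comma, however long the run, and the second removes the comma left at the end of the line.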
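Finally, we are presented with a formatted list of threat groups that are focused on information theft and espionage. Most are Russian, but the loose name matching also pulled in a few rows whose country field is not Russia (APT 40, GhostNet, Ghost Jackal, and GHOST SQUAD HACKERS); treat those as false positives.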
48:APT 28,High,APT,Russia,APT 28,Sofacy,Fancy Bear,Strontium,G0007,ITG05,Iron Twilight,TG-4127,ATK 5,Swallowtail,T-APT-12,APT-C-20,Pawn Storm,Sednit,Group 74,Tsar Team,Grizzly Steppe,Snakemackerel,SIG40
49:APT 29,High,APT,Russia,APT 29,CloudLook,Cozy Bear,Yttrium,G0016,ITG11,Iron Hemlock,ATK7,Group 100,Minidionis,Grizzly Steppe,The Dukes
60:APT 40,High,APT,China,TEMP.Periscope/TEMP.Jumper,Leviathan,Kryptonite Panda,Gadolinium,G0065,ITG09,Bronze Mohawk,ATK29,Flaccid Rose,Nanhaishu,Mudcarp
91:Cyber Berkut,Mid,Hacktivists,Russia,Cyber Berkut
100:APT-C-34,High,APT,Russia,DustSquad,Golden Falcon,Nomadic Octopus
112:Gamaredon Group,High,APT,Russia,Temp.Armageddon/UNC530,Primitive Bear,G0047,Gamaredon Group,Winterflounder,BlueAlpha
115:GhostNet,High,APT,China,Snooping Dragon,GhostNet
121:IAmTheKing,Mid,APT,Russia,IAmTheKing,PowerPool
123:Cloud Atlas,High,APT,Russia,Cloud Atlas,Oxygen,G0100,ATK 116
149:BugDrop,Mid,APT,Russia,BugDrop
201:TeamSpy Crew,High,APT,Russia,TeamSpy Crew,Iron Lyric,SIG39
206:Turla,High,APT,Russia,Turla,Venomous Bear,Krypton,G0010,ITG12,Iron Hunter,CTG-8875,ATK 13,Waterbug,Pacifier APT,Makersmark,Group 88,SIG2/SIG15/SIG23
263:Ghost Jackal,Unknown,Hacktivists,Unknown,Ghost Jackal
297:GHOST SQUAD HACKERS,High,Hacktivists,Worldwide,ATK 135
316:Skipper Turla,High,APT,Russia,White Bear
332:UNC2452,Mid,APT,Russia,UNC2452,StellarParticle,Solorigate,SolarStorm,Dark Halo
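Not every row in this list has Russia in its country field (APT 40 and GhostNet, for example, are marked China). One way to narrow the list is to test the fourth comma-separated field exactly. A sketch, assuming the "line:name,confidence,type,country,..." layout seen in the rows above; only_russia is a hypothetical helper, and the sample rows are shortened copies of lines from the list.

```shell
# Keep only rows whose fourth comma-separated field is exactly "Russia".
# Assumes the layout "line:name,confidence,type,country,aliases...".
only_russia() {
    awk -F, '$4 == "Russia"'
}

printf '%s\n' \
    '60:APT 40,High,APT,China,TEMP.Periscope/TEMP.Jumper,Leviathan' \
    '206:Turla,High,APT,Russia,Turla,Venomous Bear' \
    '263:Ghost Jackal,Unknown,Hacktivists,Unknown,Ghost Jackal' |
    only_russia
```

Against the real output, the helper would read thaicert_eternalliberty_data on stdin. It relies on the first three fields never containing a comma.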
A word on attribution.
We have a list of known threat group names now. So, what?
The threat group(s) that targeted SolarWinds could certainly be one of the listed groups, or they could be a new group that has no overlap. We simply do not know, because no government has confirmed anything to us beyond the actor being “Russian” and focused on an “intelligence gathering effort”.
We should leave threat group attribution to governments and reputable cybersecurity vendors. That does not mean that we cannot do some fun analysis with a smidge of weight behind it, but do not go around using this analysis as a basis to say that any particular known group is responsible for the SolarWinds attack.
FireEye started tracking the threat group as
A word on text manipulation.
Text manipulation is so important for any type of analyst! If you are proficient with text manipulation tools, this will become a great asset for your skills both as a professional and as a researcher. You may even be surprised by how many people are not comfortable using text manipulation tools, which makes this a great area to excel in as an analyst and to fill that skill gap in our teams.
Truly, we can accomplish so much just by being able to move text around in JSON, CSV, and TXT files the way we want to! For a threat analyst, this could be parsing credentials, event logs, or random datasets from the Internet (e.g. ThaiCERT, EternalLiberty) to reach a conclusion in a threat analysis.