SheIn.com Data Breach Analysis

Foreword

SHEIN (also spelled SheIn), a US-based online women’s fashion retailer, apparently suffered a data breach in June 2018, but the company only discovered it in late August 2018. SHEIN stated that the intruders gained access to customers’ email addresses and encrypted passwords.

What data is at risk?

When the data breach was discovered, SHEIN stated that the hackers had gained access to the email addresses and encrypted passwords stored in its system. However, the leaked data shows no signs of encryption, so it is likely that the passwords were decrypted before the data was published.

Email addresses

The breached data covers a very wide array of email providers. Let’s take a look at the 100 most common domains:

# | Email Domain | Quantity
1 | gmail.com | 13,679,190
2 | hotmail.com | 4,019,832
3 | Yahoo.com | 2,192,258
4 | icloud.com | 729,539
5 | HOTMAIL.FR | 528,368
6 | mail.ru | 526,108
7 | web.de | 403,258
8 | aol.com | 401,925
9 | outlook.com | 318,435
10 | hotmail.co.uk | 313,180
11 | gmx.de | 297,622
12 | orange.fr | 226,573
13 | yahoo.fr | 195,841
14 | yandex.ru | 182,469
15 | live.com | 156,084
16 | hotmail.de | 131,248
17 | hotmail.it | 110,296
18 | yahoo.de | 106,191
19 | yahoo.co.uk | 104,798
20 | live.fr | 104,342
21 | libero.it | 102,292
22 | googlemail.com | 97,320
23 | t-online.de | 95,561
24 | msn.com | 94,348
25 | laposte.net | 85,934
26 | comcast.net | 84,970
27 | hotmail.es | 83,819
28 | ymail.com | 81,731
29 | free.fr | 73,624
30 | outlook.fr | 72,114
31 | me.com | 68,649
32 | sfr.fr | 68,064
33 | wanadoo.fr | 63,208
34 | yahoo.com.tw | 63,159
35 | yahoo.es | 57,809
36 | live.co.uk | 52,470
37 | gamil.com | 51,544
38 | gmx.net | 46,147
39 | bk.ru | 45,914
40 | btinternet.com | 44,002
41 | gmail.con | 43,782
42 | sbcglobal.net | 38,499
43 | yahoo.it | 38,245
44 | freenet.de | 37,974
45 | att.net | 37,381
46 | yahoo.co.in | 35,944
47 | bigpond.com | 33,423
48 | wp.pl | 32,642
49 | live.de | 31,715
50 | live.it | 31,586
51 | mail.com | 31,243
52 | outlook.de | 30,401
53 | outlook.sa | 30,082
54 | list.ru | 28,461
55 | rambler.ru | 28,223
56 | rediffmail.com | 26,302
57 | inbox.ru | 26,123
58 | sky.com | 26,116
59 | neuf.fr | 25,320
60 | qq.com | 25,123
61 | rocketmail.com | 24,858
62 | yahoo.in | 24,577
63 | yahoo.com.au | 24,125
64 | verizon.net | 24,023
65 | windowslive.com | 23,852
66 | gmil.com | 23,120
67 | alice.it | 20,452
68 | hotmil.com | 19,979
69 | bellsouth.net | 19,002
70 | hotmail.con | 18,627
71 | cox.net | 18,206
72 | arcor.de | 18,109
73 | virgilio.it | 18,080
74 | aim.com | 17,910
75 | live.nl | 17,908
76 | live.com.au | 17,629
77 | gmai.com | 16,991
78 | yahoo.com.hk | 16,093
79 | outlook.es | 16,037
80 | bbox.fr | 14,134
81 | tiscali.it | 13,796
82 | seznam.cz | 12,907
83 | online.de | 12,612
84 | o2.pl | 12,477
85 | yahoo.com.br | 12,295
86 | email.com | 12,278
87 | outlook.it | 11,201
88 | live.com.mx | 11,021
89 | optonline.net | 9,594
90 | charter.net | 9,006
91 | interia.pl | 8,947
92 | yahoo.com.mx | 8,857
93 | mac.com | 8,549
94 | yahoo.ca | 8,492
95 | gmail.co | 8,491
96 | optusnet.com.au | 8,306
97 | abv.bg | 7,984
98 | ntlworld.com | 7,926
99 | live.se | 7,674
100 | ya.ru | 7,624
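
A tally like the one above could be produced along the following lines. This is only a minimal sketch: how the original counts were generated is not stated, and the function name `count_domains` and the sample addresses are made up for illustration. The sketch lowercases domains before counting, which is an assumption; the table lists entries such as "Yahoo.com" and "HOTMAIL.FR", so the original analysis may have kept the casing as found in the dump.

```python
from collections import Counter

def count_domains(emails):
    """Count email domains case-insensitively (lowercasing is an assumption)."""
    counts = Counter()
    for email in emails:
        # Split on the last "@"; entries without a domain part are skipped.
        _, sep, domain = email.strip().rpartition("@")
        if sep and domain:
            counts[domain.lower()] += 1
    return counts

# Hypothetical sample data, not taken from the breach.
sample = ["a@gmail.com", "b@Gmail.com", "c@HOTMAIL.FR", "d@gamil.com", "broken"]
domain_counts = count_domains(sample)
for domain, quantity in domain_counts.most_common(3):
    print(domain, quantity)
```

Misspelled domains such as "gamil.com" are counted as they appear, which matches how they show up as separate rows in the table.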

The length of the email addresses in this data breach also varies widely. Ordered from the smallest group to the largest:

  • 7 email addresses were 100 or more characters long (the smallest group);
  • 11 email addresses were 5 or fewer characters long;
  • 13 email addresses were 90 or more characters long;
  • 25 email addresses were 80 or more characters long;
  • 117 email addresses were 70 or more characters long;
  • 178 email addresses were 60 or more characters long;
  • 385 email addresses were 50 or more characters long;
  • 10,183 email addresses were 40 or more characters long;
  • 16,755 email addresses were 10 or fewer characters long;
  • 843,073 email addresses were 30 or more characters long;
  • 9,848,312 email addresses were 20 or fewer characters long;
  • 22,322,666 email addresses were 20 or more characters long.
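
The length groups listed above are cumulative thresholds rather than disjoint buckets, so one address can fall into several groups (an address of exactly 20 characters counts as both "20 or fewer" and "20 or more"). A minimal sketch of that kind of count, with a hypothetical helper `length_summary` and made-up sample addresses, might look like this:

```python
def length_summary(values, at_most=(5, 10, 20), at_least=(20, 30, 40)):
    """Count how many values fall under or over each length threshold."""
    values = list(values)  # allow several passes over the data
    summary = {}
    for limit in at_most:
        summary[f"<= {limit}"] = sum(1 for v in values if len(v) <= limit)
    for limit in at_least:
        summary[f">= {limit}"] = sum(1 for v in values if len(v) >= limit)
    return summary

# Hypothetical sample data: lengths 5, 19 and 37 characters.
sample = ["a@b.c", "someone@example.com", "x" * 25 + "@example.com"]
summary = length_summary(sample)
print(summary)
```

The same helper would work for password lengths, since it only looks at string length.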

Looking at the top-level domains (TLDs), we can also create a list of countries that SheIn users were using the service from:

# | TLD | Quantity | Purpose / Country
1 | .com | 17,699,022 | Commercial / United States
2 | .edu | 1,813 | Education
3 | .net | 85,934 | Network Infrastructure
4 | .de | 403,258 | Germany
5 | .fr | 754,941 | France
6 | .au | 24,125 | Australia
7 | .it | 110,296 | Italy
8 | .ru | 526,108 | Russia
9 | .uk | 313,180 | United Kingdom
10 | .es | 83,819 | Spain
11 | .pl | 45,119 | Poland
12 | .con | 43,782 | None, probably misspelled
13 | .br | 12,295 | Brazil
14 | .ca | 8,492 | Canada
15 | .nl | 17,908 | The Netherlands
16 | .mx | 11,021 | Mexico
17 | .co | 8,491 | Colombia
18 | .no | 5,712 | Norway
19 | .be | 2,130 | Belgium
20 | .in | 35,944 | India
21 | .se | 7,674 | Sweden
22 | .at | 6,910 | Austria
23 | .ch | 4,639 | Switzerland
24 | .dk | 2,675 | Denmark
25 | .nz | 2,321 | New Zealand
26 | .pt | 2,243 | Portugal
27 | .ar | 2,229 | Argentina
28 | .tw | 63,159 | Taiwan
29 | .ae | 1,532 | United Arab Emirates
30 | .cz | 12,907 | Czech Republic
31 | .cn | 1,393 | China
32 | .bg | 7,984 | Bulgaria
33 | .gr | 4,178 | Greece
34 | .cim | 3,815 | None, probably misspelled
35 | .ua | 828 | Ukraine
36 | .hu | 3,141 | Hungary
37 | .eu | 2,393 | European Union
38 | .cm | 1,945 | Cameroon (probably a misspelled .com)
39 | .sk | 1,813 | Slovakia
40 | .sa | 30,082 | Saudi Arabia
41 | .ie | 1,496 | Ireland
42 | .ro | 1,330 | Romania
43 | .fm | 1,221 | Federated States of Micronesia
44 | .id | 1,206 | Indonesia
45 | .cl | 1,200 | Chile
46 | .om | 1,188 | Oman
47 | .lv | 6,980 | Latvia
48 | .comm | 1,177 | None, probably misspelled
49 | .me | 1,029 | Montenegro
50 | .qa | 1,003 | Qatar
51 | .clm | 853 | None, probably misspelled
52 | .fi | 840 | Finland
53 | .ee | 773 | Estonia
54 | .ph | 2,847 | The Philippines
55 | .by | 736 | Belarus
56 | .cpm | 714 | None, probably misspelled
57 | .cat | 703 | Catalonia
58 | .hr | 699 | Croatia
59 | .XOM | 621 | None, probably misspelled
60 | .fe | 598 | None, probably misspelled
61 | .vn | 2,206 | Vietnam
62 | .cok | 586 | None, probably misspelled
63 | .il | 2,202 | Israel
64 | .te | 562 | None, probably misspelled
65 | .jp | 1,928 | Japan
66 | .come | 1,858 | None, probably misspelled
67 | .vom | 1,615 | None, probably misspelled
68 | .hk | 16,093 | Hong Kong
69 | .col | 1,517 | None, probably misspelled
70 | .sg | 1,464 | Singapore

Here are the letters that email addresses begin with. When the analysis is run on the database with duplicates, 29,026,175 email addresses begin with a letter. The most popular letter is A, followed by M and then S. Email addresses beginning with letters make up about 99.06% of the entire user base:

# | The letter an email address begins with | Quantity
1 | A | 3,206,739
2 | B | 1,187,451
3 | C | 1,770,137
4 | D | 1,195,226
5 | E | 1,053,108
6 | F | 670,340
7 | G | 842,864
8 | H | 872,318
9 | I | 567,572
10 | J | 1,497,023
11 | K | 1,524,405
12 | L | 1,795,120
13 | M | 3,133,130
14 | N | 1,267,323
15 | O | 300,603
16 | P | 997,513
17 | Q | 56,536
18 | R | 1,308,177
19 | S | 3,101,369
20 | T | 1,007,586
21 | U | 107,682
22 | V | 635,428
23 | W | 293,056
24 | X | 96,957
25 | Y | 306,122
26 | Z | 232,390

Now that letters have been covered, we can also take a look at the numbers. Email addresses beginning with numbers are much less prevalent than those beginning with letters. Combined, there are just 213,390 email addresses that begin with a number, which is about 0.73% of the total entries in the SheIn data breach.

The number an email address begins with | Quantity
0 | 17,052
1 | 63,972
2 | 39,719
3 | 17,427
4 | 11,964
5 | 9,447
6 | 8,266
7 | 15,081
8 | 14,165
9 | 16,337

About 0.21% of the email addresses in the SheIn data breach start with neither a letter nor a number. That is exactly 62,108 accounts when the records are checked against the database with duplicate entries, or an estimated 58,457 accounts when the same share is applied to the database without duplicate entries.
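
The three groups (addresses starting with a letter, a number, or anything else) can be cross-checked with a few lines of arithmetic. The sketch below assumes the three reported counts together cover the whole database with duplicates; the helper `first_char_class` is a hypothetical illustration of the classification, restricted to ASCII letters and digits, and is not taken from the original analysis.

```python
def first_char_class(value):
    """Classify a string by its first character: letter, digit or other."""
    if not value:
        return "other"
    ch = value[0]
    if ch.isascii() and ch.isalpha():
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return "other"

# Counts as reported in the text (database with duplicates).
reported = {"letter": 29_026_175, "digit": 213_390, "other": 62_108}
total = sum(reported.values())
shares = {group: 100 * count / total for group, count in reported.items()}

for group, share in shares.items():
    print(f"{group}: {share:.2f}%")
```

Under that assumption the total comes to 29,301,673 entries, and the resulting shares agree with the percentages quoted in the text.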

Passwords

The password distribution in the SheIn data breach is very interesting: hundreds of different passwords were each used by many different people. There are the usual combinations, but there are also thousands of passwords like “sheinside”, which suggests that the users thought of the password on the spot, or “shein18” and “Shein2018”, which suggests that the users created their accounts in 2018. There were also 293,688 users whose passwords consisted only of empty spaces. Here’s the list of the 100 most common passwords:

# | Password | Quantity
1 | (blank, whitespace only) | 290,394
2 | 123456 | 89,122
3 | 123456789 | 41,637
4 | 1234567890 | 22,968
5 | 12345678 | 20,673
6 | Shein123 | 13,773
7 | shopping | 11,664
8 | 1234567 | 11,634
9 | password | 11,298
10 | 123123 | 11,155
11 | aa123456 | 11,072
12 | sheinside | 10,063
13 | shein | 7,978
14 | 1234 | 7,297
15 | 12345 | 7,153
16 | 11223344 | 6,767
17 | shein1 | 6,679
18 | 112233 | 5,874
19 | 0987654321 | 5,415
20 | 111111 | 5,281
21 | 1122334455 | 5,071
22 | 123321 | 4,781
23 | Aa123123 | 4,742
24 | qwerty | 4,737
25 | Shein2018 | 4,715
26 | sheinshein | 4,403
27 | qwert | 3,949
28 | qwertyuiop | 3,904
29 | 123123123 | 3,902
30 | Aa112233 | 3,881
31 | Aa11223344 | 3,785
32 | 1234512345 | 3,737
33 | shein2017 | 3,682
34 | onedirection | 3,542
35 | password1 | 3,473
36 | iloveyou | 3,295
37 | (blank, whitespace only) | 3,294
38 | qwer1234 | 3,156
39 | 12344321 | 3,086
40 | azerty | 2,979
41 | 12345678910 | 2,934
42 | chocolate | 2,920
43 | motdepasse | 2,885
44 | abc123 | 2,784
45 | sunshine | 2,754
46 | princess | 2,745
47 | asDF1234 | 2,662
48 | asdfghjkl | 2,586
49 | 000000 | 2,567
50 | [email protected] | 2,554
51 | shein.com | 2,547
52 | loulou | 2,524
53 | SheIn2016 | 2,522
54 | Mm123456 | 2,515
55 | 1234554321 | 2,456
56 | as123456 | 2,401
57 | 987654321 | 2,399
58 | qwerty123 | 2,389
59 | shein1234 | 2,381
60 | justinbieber | 2,363
61 | 112233445566 | 2,354
62 | abcd1234 | 2,330
63 | shopping1 | 2,329
64 | chouchou | 2,313
65 | doudou | 2,289
66 | 654321 | 2,276
67 | passwort | 2,267
68 | hallo123 | 2,254
69 | chocolat | 2,246
70 | 121212 | 2,204
71 | forever21 | 2,176
72 | hellokitty | 2,165
73 | Aa12341234 | 2,126
74 | ichliebedich | 2,110
75 | clothes | 2,092
76 | ss123456 | 2,024
77 | fashion | 1,934
78 | incorrect | 1,888
79 | shopping123 | 1,881
80 | Aa123456789 | 1,877
81 | hello123 | 1,849
82 | 12345678900 | 1,842
83 | soleil | 1,778
84 | 12341234 | 1,766
85 | charlotte | 1,756
86 | compras | 1,735
87 | michelle | 1,715
88 | 11111111 | 1,707
89 | butterfly | 1,704
90 | Rr123456 | 1,701
91 | azertyuiop | 1,661
92 | shein18 | 1,651
93 | sheinpassword | 1,633
94 | Password123 | 1,621
95 | charlie | 1,620
96 | Aa1234567 | 1,618
97 | zxcvbnm | 1,600
98 | 20092012 | 1,592
99 | 123456aA | 1,590
100 | welcome1 | 1,586

It should also be noted that the system contained 3,294 one-character passwords, so it is probably safe to assume that SheIn did not enforce many password strength rules.

Judging by the passwords that users chose, we can assume that the service has been in operation since at least 2015 and has grown steadily since then: the “shein2015” password was chosen by 699 users, “shein2016” by 2,522 users, “shein2017” by 3,682 users and “shein2018” by 4,715 users.

This means that the number of users choosing a year-based password grew by 1,823 in 2016, by 1,160 in 2017 and by 1,033 in 2018, an average increase of about 1,339 per year. If that average increase had continued from the 4,715 users seen for 2018, roughly 6,054 users would have chosen a 2019-based password and roughly 7,392 a 2020-based one.
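
The year-over-year increases and their average follow directly from the four counts quoted above. The short sketch below only reproduces that arithmetic; the variable names are illustrative, and the counts are the ones reported in the text.

```python
# Users per year-based password, as reported in the text.
year_counts = {2015: 699, 2016: 2522, 2017: 3682, 2018: 4715}

years = sorted(year_counts)
# Increase in each year compared with the previous one.
increments = {
    year: year_counts[year] - year_counts[previous]
    for previous, year in zip(years, years[1:])
}
average_increase = sum(increments.values()) / len(increments)

print(increments)
print(round(average_increase, 2))
```

Note that the 2018 figure only covers part of the year, since the breach took place in June 2018, so any trend read from it is rough at best.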

More interesting password choices include very short passwords like “&”, “S”, “43” and “(”. The word “sonnenschein” was used 1,356 times, “papillon” 1,131 times, “1q2w3e4r5t” 1,065 times and “ritinhasantos4” 1,021 times.

We can also see that there are multiple passwords that have been used the same number of times – there are 73 of them:

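# | Password | Quantity | Password Repeat Times
1 | estrella | 1,001 | 2
2 | 00000 | 1,060 | 2
3 | happy123 | 1,062 | 2
4 | ; | 1,095 | 2
5 | Iloveshein | 1,131 | 2
6 | Aa1122334455 | 1,356 | 2
7 | 10203040 | 616 | 3
8 | 123456788 | 619 | 2
9 | jesus123 | 625 | 2
10 | 999999 | 627 | 4
11 | samantha1 | 631 | 2
12 | 123123AA | 634 | 3
13 | chicken | 635 | 2
14 | 2 | 639 | 2
15 | Computer | 642 | 3
16 | Aa100100 | 648 | 3
17 | alessandro | 649 | 3
18 | Daisy123 | 652 | 4
19 | lolipop | 655 | 2
20 | family | 656 | 2
21 | purple123 | 657 | 2
22 | love2shop | 666 | 3
23 | ashley | 667 | 2
24 | monkey123 | 673 | 2
25 | ( | 676 | 2
26 | justine | 679 | 3
27 | 11223344556677 | 684 | 2
28 | angela | 692 | 2
29 | 123456789Aa | 697 | 2
30 | fuckyou | 698 | 2
31 | michelle1 | 699 | 2
32 | 224466 | 702 | 2
33 | 1234abcd | 705 | 2
34 | 7654321 | 712 | 2
35 | Mm11223344 | 715 | 2
36 | 123098 | 717 | 4
37 | aa12345 | 718 | 3
38 | 131313 | 720 | 3
39 | alessia | 721 | 2
40 | elizabeth1 | 724 | 2
41 | beatrice | 725 | 2
42 | cooper | 730 | 2
43 | a1234567 | 731 | 2
44 | buddy123 | 733 | 3
45 | amandine | 738 | 4
46 | motherlode | 739 | 2
47 | 090909 | 740 | 3
48 | fatima | 746 | 2
49 | banana | 751 | 2
50 | hannah123 | 754 | 2
51 | lovelove | 757 | 2
52 | barbie | 759 | 2
53 | 88888888 | 773 | 2
54 | asd123 | 779 | 3
55 | asdfgh | 783 | 2
56 | 112233445566778899 | 796 | 3
57 | 12121212 | 800 | 2
58 | pepper | 811 | 2
59 | 00000000 | 823 | 2
60 | 009988 | 824 | 3
61 | aB123456 | 842 | 2
62 | 123456a | 847 | 2
63 | 87654321 | 853 | 2
64 | cocacola | 860 | 2
65 | coucou | 874 | 2
66 | 123654 | 884 | 4
67 | 1 | 885 | 2
68 | lalala | 897 | 2
69 | d | 925 | 2
70 | 123455 | 952 | 2
71 | Asd12345 | 964 | 2
72 | marina | 981 | 2
73 | patricia | 998 | 2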

The best guess is that these passwords were created by users who had more than one account in the system, so the number of times a password repeats would match the number of accounts that user had.
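
One way to find passwords that share the same usage count is to tally every password and then group the passwords by their tally. The sketch below does that on made-up data. It reads the "Password Repeat Times" column as the number of distinct passwords sharing a given quantity, which is an assumption (it fits the figures quoted in this article, where both "sonnenschein" and "Aa1122334455" appear with 1,356 uses), and the function name `passwords_sharing_counts` is hypothetical.

```python
from collections import Counter, defaultdict

def passwords_sharing_counts(passwords):
    """Map each usage count to the passwords used exactly that many times,
    keeping only counts shared by more than one distinct password."""
    usage = Counter(passwords)
    by_count = defaultdict(list)
    for password, count in usage.items():
        by_count[count].append(password)
    return {count: sorted(group) for count, group in by_count.items() if len(group) > 1}

# Hypothetical sample: "a" and "b" are both used twice, "c" three times, "d" once.
sample = ["a", "a", "b", "b", "c", "c", "c", "d"]
shared = passwords_sharing_counts(sample)
print(shared)
```

In this reading, the length of each group corresponds to the value in the last column of the table.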

Apart from this, there are also a lot of passwords that begin with alphabetical letters and numbers. Here is the list of passwords that begin with letters:

# | The letter the password begins with | Quantity
1 | A | 1,992,010
2 | B | 1,298,884
3 | C | 1,455,374
4 | D | 964,988
5 | E | 710,719
6 | F | 803,599
7 | G | 789,006
8 | H | 887,892
9 | I | 660,077
10 | J | 978,300
11 | K | 906,390
12 | L | 1,342,680
13 | M | 2,109,455
14 | N | 904,379
15 | O | 458,681
16 | P | 1,213,355
17 | Q | 349,112
18 | R | 940,811
19 | S | 2,286,583
20 | T | 955,001
21 | U | 289,858
22 | V | 536,327
23 | W | 492,077
24 | X | 255,748
25 | Y | 381,001
26 | Z | 391,859

Here is the list of passwords that begin with numbers:

The number the password begins with | Quantity
0 | 656,343
1 | 1,423,879
2 | 613,947
3 | 299,329
4 | 236,096
5 | 231,729
6 | 232,289
7 | 235,043
8 | 247,879
9 | 341,314

In the data dump, 408,406 passwords are 5 characters or shorter, 20,919,888 are 10 characters or shorter and 29,187,461 are 20 characters or shorter. At the other end, 65,519 passwords are 20 characters or longer and 40,642 are 30 characters or longer. There are even 48 passwords that are 40 characters or longer. It is very likely that the passwords of 20 characters or more were generated by password managers.

Summary

To summarize, the SheIn data breach, although relatively small compared to the biggest ones, caused a lot of damage to the company and its customers. The good thing is that SheIn notified all of its customers that their data was at risk. The company also worked with cybersecurity investigators who monitored the network and tried to ensure that future data breaches could be prevented.
