Analysis of Vulnerabilities That Can Occur When Generating One-Time Password

A one-time password (OTP) is a password that is valid for only one login session or transaction in IT systems or digital devices. It is one of the human-centered security services and is commonly used for multi-factor authentication. Generating an OTP is very similar to generating a pseudo-random bit stream in cryptography. However, only a portion of that bit stream is actually used as the OTP, so the OTP mechanism requires an algorithm for extracting that portion. It is also necessary to convert the extracted hexadecimal value to decimal so that the bit string becomes familiar to humans. In this paper, we classify three algorithms for extracting the final data from the pseudo-random bit sequence. We also show that a vulnerability can occur during the extraction process, resulting in a high frequency of certain numbers, even when cryptographically secure generation algorithms are used.


Introduction
In general, fixed passwords are vulnerable to reuse attacks because a malicious attacker can collect them. A One-Time Password (OTP), on the other hand, is a password that is different for each login session or transaction [1]. This makes it impossible to reuse previously collected passwords. OTP also has the property of randomness, which makes it mathematically infeasible to infer the next password from the current one. It is known to be highly secure because of these features, and it can be used as an authentication service in Information & Communication Technology (ICT) [2] fields such as mobile payment [3,4] and mobile-cloud computing [5,6].
The OTP mechanism generates passwords using secrets shared in advance between the certification authority and the user, together with time or event-occurrence information, so an attacker cannot infer a password on his own. OTP generators are currently provided as tokens or card-type devices (hardware) and as mobile-phone applications (software) [7,8]. The hardware type is frequently used in the financial sector, and the number of issuances and transactions is continuously increasing. Software OTP on mobile phones is widely used in online games and IT industries, and financial institutions are developing mobile OTP based on USIM.
With this, it is possible to generate a valid OTP and even obtain future OTP values. In order to eliminate such functional and logical vulnerabilities, various security reviews are required when implementing an OTP generation program mounted on a PC. In particular, anti-reversing technology must be applied so that attackers cannot reverse the OTP program and easily perform security-function bypass attacks [17].

Vulnerabilities of OTP in Internet Banking
Although OTP is one of the strongest authentication methods for protecting against account theft, the possibility of account theft always remains as hacking methods evolve. The MITM (Man-in-the-Middle) attack accesses data transaction paths and snatches information in the middle of data transactions. The MITB (Man-in-the-Browser) attack refers to hacking methods that use malignant programs in the web browser. The MITPC (Man-in-the-PC) attack abuses vulnerabilities of the hardware environment or the operating system; it controls and attacks every data access path in the OS, including key input, mouse movement, the screen, and even memory [18].

SP800-90 (A, B, C)
SP800-90A describes PRNGs that use approved cryptographic algorithms, which makes it impossible to distinguish between an ideal random sequence and a pseudo-random sequence without infinite computational power; it is designed with a fixed, finite security strength to measure the workload required to attack the PRNG [19]. SP800-90B describes mechanisms and entropy sources for obtaining unpredictable bits through non-deterministic processes; the entropy value of such a source is equal to its security strength [20]. SP800-90C describes the overall RNG structure combining 90A and 90B, as shown in Figure 1 [12].

OTP Mechanism
OTP is generated following the procedure in Figure 1. The OTP generation process can be roughly classified into three steps [21]: 'association information generation', 'generation algorithm', and 'extraction algorithm'. This is very similar to the process of generating pseudo-random numbers used in cryptography [22]. Figure 1 shows the similar parts of the two processes in the same color.

Association Information Generation
Association information refers to a random value that can be collected from time, event information, and the like. Depending on the collection method, it may be a raw value, or it may be a random number converted through additional processes such as hashing or encryption. Association information is similar to the noise source (or entropy source) needed to generate cryptographic pseudo-random numbers and is necessary for the unpredictability of OTP. This process corresponds to the red part of Figure 1.

Generation Algorithm
This is the cryptographic algorithm used to generate the OTP. It encrypts the association information to produce a 20-byte bit string [23]. The Korean OTP standard [24] introduces the cryptographic algorithms used in this step. Since the generation algorithm is deterministic, it generates the same ciphertext from the same association information. The cryptographic algorithms used should provide a security strength of at least 112 bits [21,23,25]. Different OTP values may be generated from one and the same bit string, depending on the extraction algorithm. This process corresponds to the blue part of Figure 1.
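As a concrete sketch of this step, RFC 4226 [23] instantiates the generation algorithm as HMAC-SHA1 over the shared secret and an 8-byte event counter, which yields exactly a 20-byte string (the key and counter values below are illustrative, not from the standard):

```python
import hashlib
import hmac

def generation_algorithm(secret: bytes, counter: int) -> bytes:
    """Deterministic generation step: MAC the association information
    (here an 8-byte event counter) into a 20-byte bit string."""
    association_info = counter.to_bytes(8, "big")
    return hmac.new(secret, association_info, hashlib.sha1).digest()

cipher = generation_algorithm(b"shared-secret", 1)
assert len(cipher) == 20   # 20-byte string fed to the extraction step
# Deterministic: same association information -> same ciphertext.
assert cipher == generation_algorithm(b"shared-secret", 1)
```

SHA1 here only mirrors RFC 4226; per the security-strength discussion later in this paper, a 112-bit-strength primitive such as HMAC with SHA-256 would be preferred today.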

Extraction Algorithm
This is an algorithm for extracting 3-byte data from the cipher string for use as the OTP. It can be divided into static and dynamic algorithms (RFC 4226 [23], an OTP-related standard, specifies only a dynamic algorithm, while the Telecommunications Technology Association (TTA) standard [21] specifies static, dynamic, and improved dynamic algorithms). The algorithm uses 'extraction information' to specify the location to extract from, and the value at the location specified by the extraction information becomes the 'extraction data'. The extraction information may be a value taken from a specific area of the ciphertext or a predefined value. The extracted 3-byte hexadecimal value is then encoded in decimal so that humans can recognize it. The range of the extracted 3 bytes is [0, 16777215] (i.e., [0, 2^24 − 1]), but the top two decimal digits change very little (the 10^7 digit can only be 0 or 1, and the 10^6 digit then only 0 to 6). Therefore, these two digits are discarded and only the lower six digits are used, so the final range is [0, 999999]. This process corresponds to the yellow part of Figure 1.
The static algorithm extracts data by specifying in advance the extraction information (the offset of the cipher text) to be used [21].
An example is shown in Figure 2: the extraction information designates the first byte of the ciphertext as the starting point, and 3 contiguous bytes are extracted from there. If the extracted data is 0x1A2B3C, this is 1,715,004 as a decimal number, and the final OTP is 715,004, the lower 6 digits of that number.
The dynamic algorithm obtains the extraction information from the ciphertext itself [21,23]. An example is shown in Figure 3: the value of the lower 4 bits of the lowest (i.e., 19th) byte of the ciphertext is used as the extraction information, and the extracted data is the contiguous 3 bytes starting at the index given by the extraction information. If the extracted data is 0xE7F809, its decimal value is 15,202,313, and the final OTP is 202,313, the lower 6 digits.
The improved dynamic algorithm is used when generating the OTP with a chip such as a smart card or USIM [21]. Figure 4 shows the improved dynamic algorithm; it extracts 3 bytes using two or more pieces of extraction information from the ciphertext, each of which designates a specific area of the ciphertext. If the value of the lower 4 bits of the lowest byte is used as the first extraction information (namely, the LSB 4 bits 0x0D of the lowest ciphertext byte 0x4D), then the byte 0xE7 at that location (0x0D) is used as the first extraction data. The second extraction information is then 0x07, the value of the lower 4 bits of the first extraction data. The same method is repeated to select 3 bytes of extracted data. The final OTP is 171,883, the lower 6 digits of 15,171,883, which is the extracted data 0xE7812B converted to decimal.
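The three extraction variants above can be sketched in a few lines (array indexing as in the figures; the helper names are ours, not the standard's):

```python
def to_otp(data3: bytes) -> int:
    """Encode the extracted bytes as the final 6-digit decimal OTP."""
    return int.from_bytes(data3, "big") % 10**6

def static_extract(cipher: bytes, offset: int = 0) -> int:
    """Static: 3 contiguous bytes at a pre-agreed, fixed offset."""
    return to_otp(cipher[offset:offset + 3])

def dynamic_extract(cipher: bytes) -> int:
    """Dynamic: the offset is the low 4 bits of the last (19th) byte."""
    offset = cipher[-1] & 0x0F
    return to_otp(cipher[offset:offset + 3])

def improved_dynamic_extract(cipher: bytes) -> int:
    """Improved dynamic: the low 4 bits of each extracted byte give
    the extraction information for the next byte."""
    offset = cipher[-1] & 0x0F
    out = bytearray()
    for _ in range(3):
        out.append(cipher[offset])
        offset = out[-1] & 0x0F
    return to_otp(bytes(out))

# Paper's static example: extracted data 0x1A2B3C = 1,715,004 -> 715,004.
assert static_extract(bytes.fromhex("1a2b3c") + bytes(17)) == 715004
```

Note that if the low 4 bits of cipher[19] point at a byte whose own low 4 bits point back to the same offset, `improved_dynamic_extract` returns a triple-duplicated byte; this is the failure case analyzed in the vulnerability section below.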

Vulnerability in OTP Generation Algorithm
The pseudo-random number generation algorithms specified in the TTA standard "One-time password (OTP) encryption key management security requirements" [24] are shown in Table 1.

(Table 1 classifies the specified standard algorithms, e.g., hash-based DRBGs and the Micali–Schnorr DRBG.)
Table 2 shows the security strength defined by NIST [25]. Currently, Korea's OTP standard recommends that the cryptographic algorithm used to generate OTP provide a security strength of more than 112 bits. As shown in Table 2, SHA1 has a security strength of 80 bits or less, so NIST prohibits its use for key derivation and random number generation. However, the TTA standard still includes SHA1, which is unsuitable for OTP; since HAS-160 is also a hash function built on SHA1, its security level is similar, and it should not be used anymore either. In addition, SHA1 and HAS-160 are currently excluded from the National Intelligence Service (NIS) approved-algorithm list, as are AES and Triple-DES. To proceed with the procedure of generating and communicating random numbers between entities for authentication purposes, an NIS-approved algorithm should be used, but this is not currently observed in the OTP standard.

Vulnerabilities in OTP Extraction
The improved dynamic extraction algorithm can make OTP generation vulnerable. Figure 5 illustrates a situation where this vulnerability can occur. The improved dynamic algorithm derives new extraction information from the location indicated by the current extraction information. However, if the first extraction information and the next one are the same, all three bytes of the extraction data are duplicated. For example, when the least significant byte arr[19] of the ciphertext is 0x42, the extraction information is its LSB 4 bits, 0x02, and the first extraction data is arr[2]. The LSB 4 bits of arr[2] then become the second extraction information. If the LSB 4 bits of arr[2] are also 0x02, the second extraction data is arr[2] again, and the third and last step behaves the same way. As a result, all three bytes are duplicated, and the output is arr[2]||arr[2]||arr[2]. When all the extracted data values are duplicated in this way, the OTP range shrinks to the set of values of the form arr[i]||arr[i]||arr[i]. The probability that this duplication occurs in the extraction algorithm can be obtained as follows. Since the bit string is the output of an encryption algorithm, all bits are i.i.d.; that is, each bit is independent and the probability of a 0 or a 1 is 1/2.

Then:

Pr[all extraction information is duplicated]
= Pr[LSB4(arr[e]) = e, where e = LSB4(arr[19])]
= 1/2^4 = 1/16,

since exactly the 4 low bits of the byte arr[e] must match e (and once the second extraction information equals the first, the third is automatically the same). The same value follows from counting: the number of 160-bit strings in which the lower 4 bits of the two positions coincide, divided by the total number of 160-bit strings, is 2^156 / 2^160 = 1/16.

This means that whenever a random 20-byte (160-bit) ciphertext is generated, there is a 1/16 probability that the above duplication occurs, resulting in a vulnerability that reduces the size of the range (that is, the probability that the extraction information is duplicated is equal to the probability that the range is reduced). Similarly, even if the data is duplicated only twice instead of three times, the process reduces the range and increases the prediction probability. Moreover, when converting the extracted 3 bytes (24 bits) to a 6-digit decimal number, the ideal probability that each value appears as output is as follows. Let N ≥ m ≥ 1 be integers, and let (q, r) = IntDiv(N, m) (i.e., N = mq + r). Let the random variable X be uniformly distributed over Z_N. For z in Z_m, let P_{N,m}(z) = Pr[X mod m = z : X randomly picked in Z_N] [23].
Then, for any z in Z_m:

P_{N,m}(z) = (q + 1)/N for 0 ≤ z < r, and q/N for r ≤ z < m.

For Pr[X mod 10^6 = z : X randomly picked in Z_{2^24}], we have N = 2^24 and m = 10^6, so q = 16 and r = 777,216; each z therefore appears with probability 17/2^24 or 16/2^24. In other words, the ideal probability is approximately 1/2^20 (indeed, 16/2^24 = 1/2^20 exactly).

On the other hand, if the above-mentioned vulnerability occurs, the output is determined by a single duplicated byte, so the probability of each element of the reduced set R is Pr[X mod 2^8 = z : X randomly picked in Z_{2^24}], with q = 2^16 and r = 0. Since no integer z satisfies 0 ≤ z < 0, every z in the range 0 ≤ z < 256 appears with probability 2^16/2^24 = 1/2^8. Therefore, when the vulnerability occurs, each number appears with probability about 4096 (= 2^12) times higher than in the ideal situation, and this happens in 1 out of 16 cases.

As mentioned at the beginning of this paper, OTP is used for authentication in various ways and is particularly widespread in online banking. If an attacker targets an online banking system affected by this vulnerability, he can significantly reduce the number of OTP predictions and the time they take. In general, an authentication system that performs user authentication through OTP defines a Time-To-Live (TTL), which limits the OTP value to be valid only for a specific time; to succeed, the attacker must find a valid value within that timeframe. According to the above analysis, the probability of a successful prediction increases significantly, from 1/2^20 to 1/2^8. This extraction method is therefore weak in terms of unpredictability, although it is specified in the standard.

The TTA standard for OTP [21] was issued in 2012, and RFC 2289 [22], RFC 4226 [23], and RFC 1760 [1] were published between 1995 and 2005 and have not been updated to date. By contrast, standards for generating and applying cryptographic random numbers, such as NIST's SP800-90 series [12,19,20], have been steadily revised in recent years: final versions of 90A and 90B were released in 2015 and 2018, and 90C is still in draft.
Thus, the period in which the field has focused on the various properties of cryptographic random numbers (predictability, bias, independence, entropy, etc.) is not long, and only recently has the security of random numbers been treated in depth. Therefore, examining the RFC and TTA OTP standards from a modern point of view can reveal vulnerabilities that were not identified in the past.
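Both figures derived above are easy to verify with a short script: the 1/16 duplication rate by Monte Carlo over random 20-byte strings, and the collapsed 1/2^8 distribution by direct arithmetic (os.urandom stands in for the cipher output; the trial count is arbitrary):

```python
import os

def duplicated(cipher: bytes) -> bool:
    """The chain collapses when the byte addressed by the first
    extraction information points back to its own offset."""
    offset = cipher[-1] & 0x0F
    return (cipher[offset] & 0x0F) == offset

trials = 200_000
rate = sum(duplicated(os.urandom(20)) for _ in range(trials)) / trials
assert abs(rate - 1 / 16) < 0.005      # ~ 1/16 = 0.0625

# Ideal case: (q, r) = IntDiv(2^24, 10^6) -> q = 16, r = 777,216,
# so every z occurs with probability 16/2^24 or 17/2^24 ~ 1/2^20.
q, r = divmod(2**24, 10**6)
assert (q, r) == (16, 777216)
assert q / 2**24 == 1 / 2**20

# Duplicated case: the output is b*0x010101 mod 10^6 for one byte b,
# i.e., 256 distinct values, each with probability 1/256 = 1/2^8.
reduced_range = {b * 0x010101 % 10**6 for b in range(256)}
assert len(reduced_range) == 256
```

The last assertion also confirms |R| = 256: the 256 candidate values do not collide after the mod-10^6 truncation, because 0x010101 is coprime to 10^6.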
To solve this problem, we suggest the following solutions.
• Use a static or dynamic algorithm: Rather than using the improved algorithm, which can cause this vulnerability, using a static or dynamic algorithm preserves the randomness and unpredictability of OTP.
• Increase the number of LSB bits used as extraction information, and increase the length of the output bit string accordingly: Since 16 offsets can be selected with LSB 4-bit extraction information, an output bit string of 20 bytes (= 16 + 4) is used. If LSB 5 bits or 6 bits are used instead, the number of selectable offsets increases to 32 or 64, so the output bit string becomes 36 bytes (= 32 + 4) or 68 bytes (= 64 + 4). In this case, the probability of offset overlap halves each time one bit is added (for example, Pr[all extraction information is duplicated] is 1/32 for 5 bits and 1/64 for 6 bits).
• Increase the amount of extracted data to 4 bytes: When the extraction is increased from 3 bytes to 4 bytes, the range grows from [0, 16777215] (i.e., [0, 2^24 − 1]) to [0, 4294967295] (i.e., [0, 2^32 − 1], about 4 billion cases). Therefore, the OTP can use 7 or 8 digits instead of 6. Since the ranges [0, 10^7 − 1] and [0, 10^8 − 1] can then be used, the prediction probability decreases by a factor of 10 with each added digit.
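A sketch of the last suggestion, taking the dynamic algorithm as the base (the function name and sample bytes are ours): widening the extraction to 4 bytes directly yields an 8-digit OTP.

```python
def dynamic_extract_8digit(cipher: bytes) -> int:
    """Dynamic extraction widened from 3 to 4 bytes: the range grows
    from [0, 2^24 - 1] to [0, 2^32 - 1], so truncating to 8 decimal
    digits keeps every digit position fully variable."""
    offset = cipher[-1] & 0x0F           # LSB 4 bits, as before
    data = int.from_bytes(cipher[offset:offset + 4], "big")
    return data % 10**8

# Reusing the paper's bytes 0xE7812B plus one more byte 0x4D:
cipher = bytes.fromhex("e7812b4d") + bytes(16)   # last byte 0 -> offset 0
assert dynamic_extract_8digit(cipher) == 84002125  # 0xE7812B4D mod 10^8
```

With a 20-byte string the worst-case offset 15 still leaves bytes 15 to 18 available, so the plain dynamic variant needs no extra output length for 4-byte extraction; the "+4" padding discussed later matters when the LSB window itself is widened.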

Experiment 1: Does This Extraction Vulnerability Really Happen?
To verify that this vulnerability actually occurs depending on the extraction algorithm, the following two experiments were conducted. Figure 6 shows an experiment that checks whether duplicate 3-byte values actually occur frequently when the different extraction algorithms are applied to the same encrypted bit string. The experiment proceeds as follows. In order to continuously obtain 3-byte OTP data, an arbitrary bit sequence (2^25 bits, or 2^22 bytes) is input to the block cipher AES [26], generating a continuous cipher sequence. The data to which no extraction method is applied is called the pure ciphertext sequence. From this bit string, each of the static/dynamic/improved algorithms extracts 3 bytes from every 20-byte block, repeated until 1,000,000 bytes are accumulated per algorithm. NIST's statistical randomness tests [27] were then applied to each data set. The experimental group is the improved-algorithm sequence, and the control groups are the pure ciphertext sequence before OTP data extraction, the static-algorithm sequence, and the dynamic-algorithm sequence.
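A scaled-down version of this experiment can be scripted as follows. We substitute SHA-256 in counter mode for the AES stream (an assumption made for self-containment; any PRF stream serves the comparison) and count how often the improved algorithm emits a triple-duplicated 3-byte block:

```python
import hashlib

def cipher_blocks(seed: bytes, count: int):
    """Stand-in cipher stream: SHA-256 in counter mode, truncated to
    20-byte blocks (substituting for the AES sequence in the paper)."""
    for i in range(count):
        yield hashlib.sha256(seed + i.to_bytes(8, "big")).digest()[:20]

def improved_bytes(cipher: bytes) -> bytes:
    """Improved dynamic extraction, returning the raw 3 bytes."""
    offset = cipher[-1] & 0x0F
    out = bytearray()
    for _ in range(3):
        out.append(cipher[offset])
        offset = out[-1] & 0x0F
    return bytes(out)

samples = [improved_bytes(c) for c in cipher_blocks(b"exp1", 50_000)]
dup_rate = sum(s[0] == s[1] == s[2] for s in samples) / len(samples)
# An ideal 24-bit block repeats one byte three times with probability
# 2^-16; the improved algorithm does so about 1/16 of the time.
assert dup_rate > 0.05
```

These duplicated blocks are exactly the 0x00 to 0xFF triples visible in Figure 8, and they are what drives the statistical-test failures in Figure 7.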
The result is shown in Figure 7. All of the control groups passed every statistical randomness test, but the experimental group did not. Although these statistical tests cannot entirely substitute for cryptographic analysis, they are useful at least as a first step in determining whether a pseudo-random number generator is appropriate for a particular cryptographic application. The experimental group failed a number of tests, such as BlockFrequency, Longest Run, and the Fourier Transform test, and did not cross the minimum margin line for use as a cryptographic pseudo-random number. Therefore, this data set can be judged to be very weak for use as a cryptographic pseudo-random number. In Figure 8, it can be seen that the 256 cases of 3-byte overlap, from 0x00 to 0xFF, occur frequently.
The second experiment is shown in Figure 10. It checks how often the phenomenon shown in Experiment 1 occurs. It extracts 3 bytes from contiguous ciphertext with the static, dynamic, and improved algorithms and converts them to decimal. The samples in the experiment are the results of reducing the bit stream generated by each algorithm from mod 10^2 up to mod 10^6 (the range size increases by a factor of 10 at each step), with 2^20 samples examined per algorithm. For example, converting the 3-byte OTP data 0xE7812B to decimal gives 15,171,883; the mod 10^2 result is 83, with possible range [0, 10^2 − 1], and the mod 10^4 result is 1883, with possible range [0, 10^4 − 1]. As before, the experimental group is the improved-algorithm data set, and the controls are the static/dynamic-algorithm data sets. Since this experiment checks the frequency of specific decimal numbers after extraction, the pure ciphertext sequence is not included in the control group. The results of truncating the values obtained by each algorithm to each number of digits are summarized in Table 3.

Results
The random variable X represents the frequency of occurrence of each 'element' of the range, where an element is a decimal number obtained through an extraction algorithm. For this experiment, we collect a total of 2^20 elements and check the frequency of each decimal number. E(X) is the expected value: if all elements occur with uniform probability, the frequency of each element in the range should be distributed evenly around E(X), and the maximum observed frequency MAX(X) should not differ much from E(X). In all ranges, the control groups' MAX(X) is close to E(X), and the difference between the two control groups (static, dynamic) is 5 or less. The experimental group shows a similar tendency when the range is at most [0, 10^2 − 1]. However, the gap between the experimental group and the control groups widens once the range reaches [0, 10^3 − 1] or more: the MAX(X) of the experimental group differs from its E(X) by more than 200 to 300, and the difference in MAX(X) between the control groups and the experimental group is likewise about 300.
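The gap between MAX(X) and E(X) reported in Table 3 can be approximated with a quick simulation (again using a SHA-256 counter-mode stream as a stand-in for the cipher sequence, and smaller sample counts than the paper's 2^20; the factor-of-3 threshold is illustrative, not the paper's H):

```python
import hashlib
from collections import Counter

def cipher_blocks(seed: bytes, count: int):
    """Stand-in 20-byte cipher stream (SHA-256 in counter mode)."""
    for i in range(count):
        yield hashlib.sha256(seed + i.to_bytes(8, "big")).digest()[:20]

def extract(cipher: bytes, improved: bool) -> int:
    """3-byte extraction: dynamic, or improved dynamic if requested."""
    offset = cipher[-1] & 0x0F
    if not improved:
        return int.from_bytes(cipher[offset:offset + 3], "big")
    out = bytearray()
    for _ in range(3):
        out.append(cipher[offset])
        offset = out[-1] & 0x0F
    return int.from_bytes(out, "big")

n, mod = 2**17, 10**4
for improved in (False, True):
    freq = Counter(extract(c, improved) % mod
                   for c in cipher_blocks(b"exp2", n))
    e_x, max_x = n / mod, max(freq.values())
    if improved:
        assert max_x > 3 * e_x    # experimental group: heavy spikes
    else:
        assert max_x < 3 * e_x    # control group: MAX(X) near E(X)
```

With these counts, the roughly 1/16 of samples that collapse onto 256 values push the improved group's peak several-fold above the uniform expectation, mirroring the #(X > H) = 256 behavior in Table 3.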
Based on the above analysis, we present a threshold H, an experimentally obtained value that can clearly separate the experimental and control groups. The samples in the control groups stay close to the expected values, while the samples in the experimental group differ significantly from E(X). Therefore, the improved algorithm can be identified among the three algorithms using the threshold value for each range. If the improved algorithm is used, the OTP is selected from the reduced set R (|R| = 256) with probability 1/16, so many samples map onto those 256 elements. Evidence for this is that the number of elements exceeding the threshold (#(X > H)) is exactly 256 (in bold), while not a single element in the control groups exceeds it.

Discussion
Based on this, we define the eccentricity e, a relative measure of how far a sample is from normal, as e = MAX(X)/H, which gives the last column in Table 3. The larger the range, the closer e gets to 0 in the control groups, while it grows beyond 2 in the experimental group. This means that the closer we get to the OTP ranges people use in real life, the less random the data from the improved algorithm is. In other words, OTP can provide sufficient randomness because the greater the range, the smaller the uniform probability of each element that can be mapped; but when the improved algorithm is used, the range is frequently reduced, leading to a sharp increase in the probability that a certain 256 elements will appear.
Combining Experiments 1 and 2: if an attacker wants to predict the value of an OTP generated with the improved algorithm, he can do so with probability 1/2^8, much higher than 1/2^20. From the point of view of generating secure OTP values, and to alleviate the problems presented in the two vulnerabilities and the experiments against them, we recap our proposed solutions.
• Enhance the generation algorithm: When generating an OTP, a cryptographic algorithm that provides security strength suitable for the current era should be used. As shown in Table 2 above, one can use hash functions such as SHA2 and SHA3 that provide 112-bit or higher security strength, or block ciphers such as AES-128/192/256. It is also acceptable to use the standard cryptographic algorithms defined by the country implementing the OTP service; in Korea, LEA-128/192/256, ARIA-128/192/256, etc. can be used.
• Use a static or dynamic algorithm: The improved algorithm should not be used, since it increases the probability of predicting the OTP value. OTP is widely used in cryptosystems because of its advantages: a value is valid only once, has randomness, and does not allow prediction of the next value. If values can be easily predicted, as in the results of Experiments 1 and 2, the point of using OTP fades. Therefore, static and dynamic algorithms should be used instead of the improved algorithm, which eliminates these advantages.
• Increase the amount of extracted data to 4 bytes: Increase the OTP extraction range. This also allows digits whose variation is currently small to be used, reducing the prediction probability with each added digit. When the extraction is increased from 3 bytes to 4 bytes, the range grows from [0, 16777215] (i.e., [0, 2^24 − 1]) to [0, 4294967295] (i.e., [0, 2^32 − 1], about 4 billion cases). Therefore, the OTP can use 7 or 8 digits instead of 6. Since the ranges [0, 10^7 − 1] and [0, 10^8 − 1] can then be used, the prediction probability decreases by a factor of 10 with each added digit.
• Increase the number of LSB bits used as extraction information, and increase the length of the output bit string accordingly: Since 16 offsets can be selected using LSB 4-bit (2^4) extraction information, an output bit string of 20 bytes (= 16 + 4) is used.
Here, adding 4 to the extracted length has the following meaning. Suppose the dynamic algorithm is used; the maximum value that can be expressed in 4 bits is 15. If 3-byte data is extracted from that offset, it becomes arr[15]||arr[16]||arr[17], and if 4 bytes are extracted, it becomes arr[15]||arr[16]||arr[17]||arr[18]. If 4 were not added to the length, there would be insufficient extraction data, causing a memory overflow, or the data at the last offset (the byte used for extraction information) would be accessed again, so the data would not be independent. When the LSB is extended to 5 or 6 bits, it is likewise appropriate to generate at least 4 additional bytes. In this case, the probability of offset overlap halves each time one bit is added.

Conclusions
OTP is used for various purposes such as multi-factor authentication [28]. Although 5G and next-generation cryptography are being studied, in reality legacy cryptosystems are still trusted and used. However, the OTP standards in Korea have not been updated since their enactment in the early 2010s, and even when cryptographically secure algorithms (satisfying the required security strength) are used in the OTP generation process, the vulnerability shown in this paper can occur during the extraction process. We presented and analyzed weaknesses of the OTP system, which is still widely used in daily life. Both the process of extracting a small bit string and the process of converting it to decimal are computing steps included for human convenience; we argue that such processes, applied for human-centered computing, can cause critical defects. We also contributed solutions that address the claimed vulnerabilities, prompting reflection on the security perspectives that current OTP systems should consider. In fact, no system has been critically attacked through the vulnerability mentioned in this paper, and the problem has not occurred in practice. Even so, one should be aware that problems can arise and should consider them in authentication systems, which are currently in a vulnerable situation. It is also necessary to revise the standard, and ongoing research is needed so that existing authentication methods such as OTP can be trusted and used until they are replaced by another multi-factor authentication technology.