Last modified by Сергей Коршунов on 2023/01/04 00:07

= How to check a hard drive's health from the command line using smartctl =

The **smartmontools** package is generally available in the default repositories of all the major Linux distributions. It contains two utilities for checking the status of storage devices with **S.M.A.R.T** support (//Self-Monitoring, Analysis and Reporting Technology//): **smartctl** and **smartd**. The former is the utility we use directly to check S.M.A.R.T attributes, run tests, or perform other actions; the latter is the daemon that can be used to schedule operations in the background. In this tutorial we will learn the basic usage of **smartctl**.

**In this tutorial you will learn**:

* How to install the smartmontools package on various distributions
* What the differences between the S.M.A.R.T self-tests are
* How to use smartctl to check the health of a storage device
* How to run tests on a storage device from the command line

== Software requirements and conventions used ==

Software Requirements and Linux Command Line Conventions

|=Category|=Requirements, Conventions or Software Version Used
|System|Distribution independent
|Software|The smartmontools package (see instructions)
|Other|Root permissions
|Conventions|# – requires the given [[linux-commands>>url:https://linuxconfig.org/linux-commands]] to be executed with root privileges, either directly as the root user or by using the sudo command
$ – requires the given [[linux-commands>>url:https://linuxconfig.org/linux-commands]] to be executed as a regular non-privileged user

== Installation ==

As mentioned before, the **smartmontools** package is available in the repositories of all the major Linux distributions, so all we have to do to install it is use our favorite package manager. If you are running Debian or one of its derivatives, such as Ubuntu or Mint, you can run:

{{{$ sudo apt-get update && sudo apt-get install smartmontools
}}}

On recent versions of Red Hat Enterprise Linux, CentOS and Fedora we can use **dnf**:

{{{$ sudo dnf install smartmontools
}}}

If Arch Linux is your favorite distribution, you can use **pacman**:

{{{$ sudo pacman -S smartmontools
}}}

== Checking if SMART is enabled ==

Let’s become familiar with the **smartctl** utility. The first thing we want to check is whether S.M.A.R.T support is active on the device. To do so, we can run smartctl with the -i option (short for ~-~-info):

{{{$ sudo smartctl -i /dev/sda
}}}

The command produces the following output:

{{{=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red
Device Model: WDC WD10EFRX-68FYTN0
LU WWN Device Id: 5 0014ee 20c672def
Firmware Version: 82.00A82
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Thu Sep 24 18:13:19 2020 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Disabled
}}}

The output shows basic information such as the device family, model and sector sizes. What interests us the most, however, is the content of the last two lines: they tell us that the device has SMART capabilities and that, in this case, SMART support is disabled. What if we want to enable it? All we have to do is run **smartctl** with the -s option, using “on” as its argument:

{{{$ sudo smartctl -s on /dev/sda
smartctl 6.6 2017-11-05 r4594 [armv6l-linux-5.4.51+] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF ENABLE/DISABLE COMMANDS SECTION ===
SMART Enabled.
}}}

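The check described above can be scripted. The following is a minimal sketch, assuming the two "SMART support is:" lines look like the ones in the sample output; smart_state is a hypothetical helper name, not part of smartmontools. It reads the output of smartctl -i from standard input and prints the SMART state.

```shell
#!/bin/sh
# Sketch: summarize SMART support from `smartctl -i` output read on stdin.
# smart_state is a hypothetical helper, not a smartmontools command.
# Prints "enabled", "disabled", "unknown" or "unsupported".
smart_state() {
    awk '
        /^SMART support is:/ {
            # First line reports capability, second line reports the state.
            if ($0 ~ /Available/) available = 1
            if ($0 ~ /Enabled/)   state = "enabled"
            if ($0 ~ /Disabled/)  state = "disabled"
        }
        END {
            if (!available)       print "unsupported"
            else if (state == "") print "unknown"
            else                  print state
        }
    '
}
```

A possible use is `sudo smartctl -i /dev/sda | smart_state`, enabling SMART with the -s option only when the result is "disabled".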
== Getting familiar with smartctl ==

To get all the available SMART information about a storage device, we can launch the utility with the -a option (short for ~-~-all), passing the path of the device we want to check as the last argument of the command. Suppose we want to check the current status of the /dev/sda device; we would run:

{{{$ sudo smartctl -a /dev/sda
}}}

The command above produces a lot of output. Among other things, we can see the status of various SMART parameters:

{{{SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 135 125 021 Pre-fail Always - 4216
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 941
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 085 085 000 Old_age Always - 11285
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 446
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 108
193 Load_Cycle_Count 0x0032 199 199 000 Old_age Always - 4258
194 Temperature_Celsius 0x0022 111 099 000 Old_age Always - 32
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0
}}}

Two very important parameters to check are “Reallocated_Sector_Ct” and “Current_Pending_Sector”. In both cases, if the **RAW_VALUE** is anything other than 0, we should be very careful and start backing up the data on the hard drive. The **Reallocated_Sector_Ct** attribute is the count of sectors that could not be used reliably and have therefore been remapped: when such a sector is found, it is remapped to one of the available spare sectors of the storage device, and the data it contained is relocated. The **Current_Pending_Sector** attribute, instead, is the count of bad sectors that are still waiting to be remapped. If you want to know more about the S.M.A.R.T attributes and their meaning, a good starting point is the [[Wikipedia S.M.A.R.T page>>url:https://en.wikipedia.org/wiki/S.M.A.R.T.]].

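The check on these two attributes can be automated. The sketch below assumes the ten-column attribute table shown above, where RAW_VALUE is the tenth whitespace-separated field; attr_raw and check_sectors are hypothetical helper names introduced only for illustration.

```shell
#!/bin/sh
# Sketch: inspect the attribute table of `smartctl -a` (or `smartctl -A`)
# output read on stdin. Helper names are hypothetical.

# attr_raw NAME: print the RAW_VALUE (10th column) of the named attribute.
attr_raw() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# check_sectors: warn and return non-zero when either sector count is not 0.
check_sectors() {
    report=$(cat)
    status=0
    for attr in Reallocated_Sector_Ct Current_Pending_Sector; do
        raw=$(printf '%s\n' "$report" | attr_raw "$attr")
        if [ -n "$raw" ] && [ "$raw" != "0" ]; then
            echo "WARNING: $attr raw value is $raw"
            status=1
        fi
    done
    return "$status"
}
```

For example, `sudo smartctl -a /dev/sda | check_sectors || echo "back up your data"` prints a reminder only when one of the two counters is non-zero. Attributes that a device does not report are silently skipped.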
In the output we can also see a log of the tests performed on the device:

{{{SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 9590 -
# 2 Short offline Completed without error 00% 2941 -
# 3 Extended offline Completed without error 00% 21 -
# 4 Short offline Completed without error 00% 18 -
# 5 Short offline Completed without error 00% 0 -
# 6 Short offline Completed without error 00% 0 -
}}}

The **Test_Description** column shows that different kinds of tests were run, and that all of them completed without error. In the next section we will see what the differences between them are and how to actually launch a test on a storage device.

== Available SMART tests ==

The **smartctl** utility can be used to launch a variety of self-tests:

* short
* long
* conveyance (ATA devices only)
* select (ATA devices only)

Let’s quickly see what the differences between them are.

The **short** test is meant to quickly check for the most common problems that can be found on a storage device. It checks the mechanical, electrical and read performance of the disk, and should take no more than 10 minutes.

The **long** test is basically a more thorough version of the “short” test. It can take a long time to complete: as stated in the smartctl manual, it can last from tens of minutes to several hours.

The **conveyance** test is meant to check for damage that may have occurred while the device was being transported. It usually takes a few minutes to complete, and is available only on ATA devices.

The **select** test, like the “conveyance” one, is available only on ATA devices, and is meant to check only the specified range of LBAs (Logical Block Addresses). The range of addresses is specified when launching the test. For example, to check addresses from 10 to 20 (inclusive) on the /dev/sda device, we would run:

{{{$ sudo smartctl -t select,10-20 /dev/sda
}}}

It is possible to specify a maximum of 5 different ranges of LBAs to check by repeating the -t option:

{{{$ sudo smartctl -t select,0-5 -t select,5-10 /dev/sda
}}}

The -t option is short for ~-~-test and is used to execute a test immediately.

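Because at most 5 ranges can be given, a script that builds the command line may want to validate them first. This is a minimal sketch under stated assumptions: select_args is a hypothetical helper, and it only accepts the plain START-END form used in this section, not any other span syntax smartctl may support.

```shell
#!/bin/sh
# Sketch: build the "-t select,START-END" arguments for up to 5 LBA ranges.
# select_args is a hypothetical helper, not part of smartmontools.
select_args() {
    if [ "$#" -lt 1 ] || [ "$#" -gt 5 ]; then
        echo "select_args: expected 1 to 5 LBA ranges, got $#" >&2
        return 1
    fi
    args=""
    for range in "$@"; do
        case "$range" in
            ''|*[!0-9-]*|-*|*-|*-*-*)
                echo "select_args: invalid range: $range" >&2
                return 1 ;;
            *-*) ;;   # exactly one hyphen between two numbers
            *)
                echo "select_args: invalid range: $range" >&2
                return 1 ;;
        esac
        start=${range%-*}
        end=${range#*-}
        if [ "$start" -gt "$end" ]; then
            echo "select_args: start is after end: $range" >&2
            return 1
        fi
        args="$args -t select,$range"
    done
    # Drop the leading space before printing.
    printf '%s\n' "${args# }"
}
```

The result is meant to be expanded unquoted so that the shell splits it into separate arguments, for example `sudo smartctl $(select_args 0-5 6-10) /dev/sda`.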
== Running a test ==

We saw which tests we can run with the **smartctl** utility. Now let’s see how to actually launch one. As we saw at the end of the previous section, the -t option is used to run a test immediately; we must provide the type of test we want to run as the argument of the option. To execute a **short** test on the /dev/sda device we would run:

{{{$ sudo smartctl -t short /dev/sda
smartctl 6.6 2017-11-05 r4594 [armv6l-linux-5.4.51+] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Thu Sep 24 14:39:05 2020

Use smartctl -X to abort test.
}}}

The output of the command reports how long we should wait for the test to finish, and the date and time at which it should be complete. After the specified time interval, we can check the results of the test by running:

{{{$ sudo smartctl -a /dev/sda
}}}

As you can see, the test (the first in the list, #1) and its results have been added to the log. It completed without errors:

{{{SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 11286 -
# 2 Short offline Completed without error 00% 9590 -
# 3 Short offline Completed without error 00% 2941 -
# 4 Extended offline Completed without error 00% 21 -
# 5 Short offline Completed without error 00% 18 -
# 6 Short offline Completed without error 00% 0 -
# 7 Short offline Completed without error 00% 0 -
}}}

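Reading the log by eye works, but a script can also check the most recent entry. The sketch below assumes the log layout shown above, where the newest test is the row numbered 1 and a successful run contains the text "Completed without error"; latest_selftest_ok is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: succeed only if the newest self-test log entry (row "# 1")
# reports "Completed without error". Reads smartctl output on stdin.
# latest_selftest_ok is a hypothetical helper, not a smartmontools command.
latest_selftest_ok() {
    awk '
        $1 == "#" && $2 == "1" {
            found = 1
            ok = ($0 ~ /Completed without error/)
        }
        END {
            if (found && ok) exit 0
            exit 1
        }
    '
}
```

It could be used as `sudo smartctl -a /dev/sda | latest_selftest_ok && echo "last self-test passed"`. The function also fails when no log entry is found, so a missing log is never mistaken for a passed test.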
It is possible to know in advance the estimated time a test will take to finish. This information should be included in the output of the smartctl -a /dev/sdx command, but it can also be requested explicitly by launching **smartctl** with the -c option (short for ~-~-capabilities). The relevant lines of the output are the following:

{{{$ sudo smartctl -c /dev/sda
[...]
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 157) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
[...]
}}}

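The polling times can be extracted with a small filter, for instance to decide how long a script should sleep before checking the results. This is a sketch that assumes the line layout shown above; polling_minutes is a hypothetical helper name. Note that the “long” test is reported as “Extended” in this output.

```shell
#!/bin/sh
# Sketch: print the recommended polling time, in minutes, for one kind of
# self-test, reading `smartctl -c` output on stdin.
# polling_minutes is a hypothetical helper; KIND is short, extended or conveyance.
polling_minutes() {
    awk -v want="$1" '
        /^(Short|Extended|Conveyance) self-test routine/ {
            kind = tolower($1)        # remember which routine we are in
        }
        /recommended polling time/ {
            minutes = $0
            gsub(/[^0-9]/, "", minutes)   # keep only the digits
            if (kind == want) { print minutes; exit }
        }
    '
}
```

For example, `sudo smartctl -c /dev/sda | polling_minutes extended` would print 157 for the device used in this tutorial.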
Now let’s run a conveyance test:

{{{$ sudo smartctl -t conveyance /dev/sda
}}}

We wait 5 minutes, and then check the results. As expected, the test now appears in the list, and luckily no errors were found:

{{{SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Conveyance offline Completed without error 00% 11286 -
# 2 Short offline Completed without error 00% 11286 -
# 3 Short offline Completed without error 00% 9590 -
# 4 Short offline Completed without error 00% 2941 -
# 5 Extended offline Completed without error 00% 21 -
# 6 Short offline Completed without error 00% 18 -
# 7 Short offline Completed without error 00% 0 -
# 8 Short offline Completed without error 00% 0 -
}}}

Now, for a simple **select** test:

{{{$ sudo smartctl -t select,100-150 /dev/sda
smartctl 6.6 2017-11-05 r4594 [armv6l-linux-5.4.51+] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Selective self-test routine immediately in off-line mode".
SPAN STARTING_LBA ENDING_LBA
0 100 150
Drive command "Execute SMART Selective self-test routine immediately in off-line mode" successful.
Testing has begun.
}}}

This test also completed successfully:

{{{SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Selective offline Completed without error 00% 11287 -
# 2 Conveyance offline Completed without error 00% 11286 -
# 3 Short offline Completed without error 00% 11286 -
# 4 Short offline Completed without error 00% 9590 -
# 5 Short offline Completed without error 00% 2941 -
# 6 Extended offline Completed without error 00% 21 -
# 7 Short offline Completed without error 00% 18 -
# 8 Short offline Completed without error 00% 0 -
# 9 Short offline Completed without error 00% 0 -
}}}

Again, the results of the tests are included in the output generated when smartctl is launched with the -a option. If you want to focus only on the logs, you can instead use the -l option (~-~-log) and specify which kind of log should be displayed. To display only the **error** log, you would run:

{{{$ sudo smartctl -l error /dev/sda
}}}

To also include the **selftest** log:

{{{$ sudo smartctl -l error -l selftest /dev/sda
}}}

When **smartctl** is launched with the -a option, the **error**, **selftest** and **selective** logs are included in the output for ATA devices.