I assume that by cluster you mean what Windows calls an allocation unit, commonly called a block in the Linux world. It's the unit of data storage a filesystem uses. A cluster/block may consist of, or correspond to, multiple sectors on the underlying storage media (HDD or flash storage). A sector is the minimum atomic unit of storage an OS can read or write. Block size is decided when the filesystem is created, i.e. when we format a partition (high-level formatting). Partitions and filesystems make it easy to categorize and organize our data on the physical storage device.
HARD DISK DRIVES
"Sector" may refer to a physical sector or a logical sector. In the early days of HDDs there were only physical sectors, created by dividing the tracks on rotating disks into small parts. Cylinder/Head/Sector (CHS) was the popular addressing method before Logical Block Addressing (LBA) was devised. Instead of dealing with physical sectors directly, the OS now talks to the disk controller firmware (through SATA/SCSI commands), referring to an LBA number. The firmware in turn maintains the LBA to CHS mapping itself, and also handles ECC, the G-list (the disk's defect table) etc. This mapping (1:1 / sequenced / linear) is created during low-level formatting of the drive at manufacturing time and never changes, except when a sector is marked bad and remapped to a spare sector. So the OS / filesystem is aware of the physical geometry of the disk, which is proportional to the geometry of LBAs.
On flash media (SSDs, eMMC, UFS, SD cards etc.) there are no rotating disks and hence no cylinders. NAND flash is made of silicon cells; each cell stores one (Single-Level Cell), two (Multi-Level Cell), three (Triple-Level Cell) or four (Quad-Level Cell) bits. Cells are grouped into pages (e.g. of 4 KB) and pages into erase blocks (e.g. of 128 KB). The mapping of LBAs to Physical Block Addresses is fully controlled by the Flash Translation Layer (FTL), a part of the flash controller firmware. The OS knows nothing about it; at most it can see the LBAs, not what's happening below them. It doesn't even see the ECC corrections of failing memory cells, and that's why we don't notice the bad health of an eMMC until it fails, except by reading EXT_CSD (requires root) using mmc-utils or from /sys/class/mmc_host/*/*/life_time (if the driver supports it).
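As a rough sketch of how those life_time values can be read: the sysfs file is expected to hold two hex values (the type A and type B estimates), and as far as I understand the JEDEC eMMC 5.x EXT_CSD definition, each step stands for 10% of the estimated device life. The function name decode_life_time is made up for illustration, and the exact file format is an assumption that may differ between kernels and drivers:

```shell
# decode_life_time <value>
# Turn one life-time estimate (e.g. 0x03) into a human-readable range.
# Assumption: 0x01..0x0A mean 0-10% .. 90-100% of life used, 0x0B means exceeded.
decode_life_time() {
    val=$(( ${1:-0} ))        # shell arithmetic accepts hex such as 0x03
    if [ "$val" -ge 1 ] && [ "$val" -le 10 ]; then
        echo "$(( (val - 1) * 10 ))-$(( val * 10 ))% used"
    elif [ "$val" -eq 11 ]; then
        echo "exceeded estimated life time"
    else
        echo "undefined"
    fi
}

# Print the estimates for every eMMC that exposes the file (needs driver support).
for f in /sys/class/mmc_host/*/*/life_time; do
    [ -r "$f" ] || continue   # skip if the glob matched nothing
    read -r a b < "$f"
    echo "$f: type A $(decode_life_time "$a"), type B $(decode_life_time "$b")"
done
```

On a device without the sysfs node the loop simply prints nothing.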
Unlike on HDDs, a page of flash memory can't simply be overwritten: a whole erase block has to be Erased first before it can be Programmed (written). A side effect is that a number of pages are erased and re-written, and the physical mapping changes, even if only a small file is edited. The extra writes caused by this read-modify-write (RMW) cycle are called Write Amplification (WA). On HDDs, files aren't physically moved unless shortened or lengthened. The OS is aware of these physical changes on an HDD, but not on flash memory.
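To put a number on it, here is a back-of-the-envelope sketch using the example page and erase block sizes mentioned earlier (4 KB and 128 KB). It assumes the worst case where the controller rewrites a whole erase block for a single-page update; real FTLs usually write the new page elsewhere and invalidate the old one, so the actual factor is typically much lower. The function names are only illustrative:

```shell
PAGE_KB=4            # example page size
ERASE_BLOCK_KB=128   # example erase block size

# How many pages share one erase block.
pages_per_erase_block() {
    echo $(( ERASE_BLOCK_KB / PAGE_KB ))
}

# write_amplification <KB written by host> <KB written to NAND>
# Integer ratio of physical writes to logical writes.
write_amplification() {
    echo $(( $2 / $1 ))
}

pages_per_erase_block                              # prints 32
write_amplification "$PAGE_KB" "$ERASE_BLOCK_KB"   # prints 32 (worst case)
```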
WHAT IS SECTOR?
So after all, what we are concerned with is the logical sector. Storage media inform the OS of their logical sector size, but "the default of 512 covers most hardware" because HDDs have been using 512 B since the early days, though things have changed with the 4Kn format for HDDs too.
512 B is the size the OS sees; the actual sector is a bit larger to make room for a header, ECC etc. The physical sector size on flash storage is of no use to us.
I would like to know if a certain file moves on the Flash memory (internal storage), between writes/updates to it.
It depends on the definition of "move". On flash storage, even if you don't write to a file, its true physical location may keep changing due to background Garbage Collection (GC), a process controlled internally by the FTL to reduce WA, achieve Wear Leveling and provide high write throughput (Program operations) by erasing blocks of invalid pages in the background (Erase operations).
Is there some way I can get the file's location on the Flash memory (Cluster Number, Sector Number etc.)?
Yes, you can get the filesystem block addresses of a file, which have a linear mapping to the LBAs of the underlying block device (partition). But these addresses aren't the file's actual physical location on the flash memory. Usually, however, one isn't concerned with the true physical location unless forensics or data recovery is involved.
~# cat /sys/block/mmcblk0/queue/logical_block_size
512
~# blockdev --getss /dev/block/by-name/cache
512
~# tune2fs -l /dev/block/by-name/cache | grep 'Block size'
Block size:               4096
So the sector size here is 512B while filesystem block size is 4KiB. Let's create a test file:
~# echo foobar >/cache/test_file
~# cat /cache/test_file
foobar
~# filefrag -sv -b512 /cache/test_file
Filesystem type is: ef53
File size of /cache/test_file is 7 (8 block of 512 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..       7:     307200..    307207:      8:             last,eof
/cache/test_file: 1 extent found
debugfs -R 'stat test_file' /dev/block/by-name/cache can also be used in place of filefrag.
The created file (7 bytes in size) occupies 1 filesystem block.
-b512 makes filefrag report in units of sectors (512 B) instead of filesystem blocks (4096 B). "test_file" should be at the 307200th sector, counted from the start of both the partition and the filesystem, because the filesystem occupies the whole partition:
~# blockdev --getsize64 /dev/block/by-name/cache | awk '$1 /= 4096'
65536
~# tune2fs -l /dev/block/by-name/cache | grep 'Block count'
Block count:              65536
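The conversion between filesystem blocks and sectors used here is plain arithmetic. A minimal sketch with the sizes reported earlier (512 B sectors, 4096 B blocks); the helper names are hypothetical, not part of any tool:

```shell
SECTOR_SIZE=512   # logical sector size reported by the block device
BLOCK_SIZE=4096   # filesystem block size reported by tune2fs

# fs_block_to_sector <filesystem block number>
fs_block_to_sector() {
    echo $(( $1 * BLOCK_SIZE / SECTOR_SIZE ))
}

# sector_to_fs_block <sector number> (integer division, rounds down)
sector_to_fs_block() {
    echo $(( $1 * SECTOR_SIZE / BLOCK_SIZE ))
}

sector_to_fs_block 307200   # prints 38400, the filesystem block holding test_file
fs_block_to_sector 65536    # prints 524288, the partition size in sectors
```

So sector 307200 lies well inside the 524288 sectors of the partition, and the 8 sectors reported by filefrag are exactly one 4096 B block.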
Let's read the file directly from partition:
~# dd if=/dev/block/by-name/cache skip=307200 count=1 | head -c7
foobar
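If you want to try the skip/count mechanics without touching a real partition, the same idea can be sketched on a scratch image file. This needs no root; the helper names and the sector number 5 are arbitrary choices for illustration:

```shell
# write_at_sector <image> <sector> <text>
# Write text at the start of the given 512-byte sector without truncating the image.
write_at_sector() {
    printf '%s' "$3" | dd of="$1" bs=512 seek="$2" conv=notrunc 2>/dev/null
}

# read_sector <image> <sector> <bytes>
# Read one 512-byte sector and print its first <bytes> bytes.
read_sector() {
    dd if="$1" bs=512 skip="$2" count=1 2>/dev/null | head -c "$3"
}

img=$(mktemp)
dd if=/dev/zero of="$img" bs=512 count=16 2>/dev/null   # 16 zero-filled sectors
write_at_sector "$img" 5 foobar
read_sector "$img" 5 6; echo                            # prints foobar
rm -f "$img"
```

dd counts skip and seek in units of bs, which is why a sector number can be passed directly when bs is 512 (also dd's default).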
It's there. Now locate the file from the start of eMMC:
~# readlink /dev/block/by-name/cache
/dev/block/mmcblk0p25
~# cat /sys/block/mmcblk0/mmcblk0p25/start
7471104
~# dd if=/dev/block/mmcblk0 skip=$(( 7471104 + 307200 )) count=1 | head -c7
foobar
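The offset passed to dd is just the partition's start sector plus the file's sector within the partition. A small sketch of that arithmetic, with hypothetical helper names, assuming 512 B sectors as reported earlier:

```shell
# abs_sector <partition start sector> <sector within partition>
# Sector number counted from the start of the whole device.
abs_sector() {
    echo $(( $1 + $2 ))
}

# sector_to_mib <sector number>
# Offset in MiB for 512-byte sectors (2048 sectors per MiB, rounds down).
sector_to_mib() {
    echo $(( $1 / 2048 ))
}

abs_sector 7471104 307200                       # prints 7778304
sector_to_mib "$(abs_sector 7471104 307200)"    # prints 3798
```

So the file sits 3798 MiB from the start of the eMMC, at least as far as LBAs are concerned.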
So even if filesystem and partition are deleted, you can read the file (provided that it's not overwritten).
Let's do some more digging:
~# rm /cache/test_file; sync; echo -n 1 >/proc/sys/vm/drop_caches
~# dd if=/dev/block/mmcblk0 skip=$(( 7471104 + 307200 )) count=1 | head -c7
foobar
File is deleted from filesystem, but physically still there. Let's ask FTL to delete it permanently:
~# fstrim /cache; sync; echo -n 1 >/proc/sys/vm/drop_caches
~# dd if=/dev/block/mmcblk0 skip=$(( 7471104 + 307200 )) count=1 | head -c7
And it's gone. But most probably it's still there somewhere in the Over-Provisioning Space, scheduled to be Erased in the next GC; we just don't know where it is.
dd on naked partitions is a killer. Be cautious!
tune2fs, debugfs and filefrag are part of e2fsprogs.
filefrag isn't shipped with Android; build it from source or use a prebuilt binary.
fstrim is a busybox applet.
- When should fstrim run?
- What makes recovery of deleted data difficult / impossible?