
Thursday, June 30, 2011

Supercomputing Freakonomics - Finding Meaning Beyond the Headlines

Twice a year, the Top500 Project publishes its list of the fastest supercomputers in the world. In the latest announcement, Linux continues to dominate the list. This is nothing new, since Linux has dominated since the mid-2000s; in fact, Linux's share of supercomputing looks a lot like Microsoft's historical share of the desktop market. I thought it would be interesting to take a step back and look at the performance capability of these computers as a whole, and at how the rise of Linux mirrors the geographical expansion of supercomputing.


Everybody tends to watch the number of Linux systems on the Top500, but there's a fascinating story being told by the Rmax performance numbers. (Rmax is a computer's maximum performance, measured in Gflop/s, achieved on the HPL benchmark.) In many ways this is a much more enlightening statistic, because it shows us the overall nature of performance on the list instead of just focusing on individual computers. (This time around, five Linux systems were actually bumped off the bottom of the list, even though Linux's *total* computing power grew by 38%.)


Linux's dominance in the overall compute power of this list isn't surprising: Linux runs every one of the top ten computers, and there is a full order of magnitude difference in performance between the first machine and the twelfth, a gap that only widens as you move down the list. The first non-Linux system shows up at number 40.


Supercomputing power was on the rise well before Linux arrived, but looking at the history, it was Linux-powered machines that really caused the big ramp in the mid-2000s. In fact, when you graph the historical Rmax results of the Top500 by OS, you can see that not only has supercomputing gone almost entirely to Linux, Linux has also been the only OS driving the exponentially rising curve since 2005.


Next, let's look at where this is happening. This time Fujitsu of Japan tops the list, and we have also seen players from China and Europe entering the fray. What's really interesting is how this power is distributed worldwide, and the role that Linux (including Linux machines classified as "Mixed," like BlueGene) plays in making that happen.


It's not surprising that the list has become very geographically diverse over time. What's interesting is that this, too, is being driven almost entirely by Linux. In the graph below, all of the colored segments reflect the computing power deployed on "Linux" and "Mixed" machines in countries around the world. Blue is the US, white is Japan, red is China, orange is France, yellow is Germany, and so on. The dark segment on the bottom is all of the computing power deployed worldwide on platforms _other_ than Linux. Notice anything? (Here's a hint: look at which OS is enabling this national diversity in supercomputing.)


Last, the good news here is that there is more and more raw computing power being made available on a global basis thanks to Linux, and a lot of this innovation is making its way back into the kernel. As more countries adopt smart grid technology or seek to forecast the effects of global warming, there is one common thread: the need for more computing power is endless. And just as we've come to understand with Watson (beyond its embarrassing humans at Jeopardy!), this technology will be applied in smaller systems as we address one of the most pressing business issues of today: big data.


Once again these numbers are great for Linux. But more than the numbers, it is Linux's ability to give anyone access to the source code, to be optimized, and to have those optimizations returned to the common projects for ever-increasing innovation, that has created an unbreakable virtuous cycle in computing. I would also like to take this opportunity to congratulate our platinum member Fujitsu, which has done impressive research and development on Linux in supercomputers and for the enterprise, and which in this announcement has taken the number one position.



Saturday, June 18, 2011

Things You Can't Do With a GUI: Finding Stuff on Linux

What's better, a graphical interface or the Linux command line? Both of them. They blend seamlessly on Linux so you don't have to choose. A good graphical user interface (GUI) has a logical, orderly flow, helps guide you to making the right command choices, and is reasonably fast and efficient. Since this describes a minority of all GUIs, I still live on the command line a lot. The CLI has three advantages: it's faster for many operations, it's scriptable, and it is many times more flexible. Linux's Unix heritage means you can string together commands in endless ways so they do exactly what you want.


Here is a collection of some of my favorite finding-things command line incantations.


In graphical file managers like Dolphin and Nautilus you can right-click on a folder and click Properties to see how big it is. But even on my quad-core super-duper system it takes time, and for me it's faster to type the du or df commands than to open a file manager, navigate to a directory, and then pointy-clicky. How big is my home directory?

$ du -hs ~
748G	/home/carla

How much space is left on my hard drive or drives? This particular incantation is one of my favorites because it uses egrep to exclude temporary directories, and shows the filesystem types:

$ df -hT | egrep -i "file|^/"
Filesystem  Type  Size  Used  Avail  Use%  Mounted on
/dev/sda2   ext4   51G  3.6G    32G   11%  /
/dev/sda3   ext4  136G  2.3G   127G    2%  /home
/dev/sda1   ext3  244G  114G    70G   63%  /home/carla/photoshare
/dev/sdb2   ext3   54G  5.8G    45G   12%  /home/carla/music

What files were changed on this day, in the current directory?

$ ls -lrt | awk '{print $6" "$7" "$9 }' | grep 'May 22'
May 22 file_a.txt
May 22 file_b.txt

Using a simple grep search displays complete file information:

$ ls -lrt | grep 'May 22'
-rw-r--r-- 1 carla carla 383244 May 22 20:21 file_a.txt
-rw-r--r-- 1 carla carla 395709 May 22 20:23 file_b.txt

Or all files from a past year:


ls -lR | grep 2006


Run complex commands one section at a time to see how they work; for example, start with ls -lrt, then ls -lrt | awk '{print $6" "$7" "$9 }', and so on. To avoid hassles with upper- and lower-case filenames, use grep -i for a case-insensitive search.
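

For instance, here is how that last pipeline builds up stage by stage; the output at each step will of course depend on your own files:

$ ls -lrt                                    # long listing, sorted by time, oldest first
$ ls -lrt | awk '{print $6" "$7" "$9 }'      # keep only the month, day, and filename
$ ls -lrt | awk '{print $6" "$7" "$9 }' | grep -i 'may 22'   # filter to one date, case-insensitively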


Want to sort files by creation date? You can't in Linux, but you can in FreeBSD. Want to specify a different directory? Use ls -lrt directoryname.


Which files were changed in the last three minutes? This is a quick, slick way to see what actually changed after you modify your system:


find / -mmin -3


You can also specify a time range; for example, what changed in the current directory between three and six minutes ago?


find . -mmin +3 -mmin -6


The dot means current directory.
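

If you want full file details for each match instead of just the path, tack on find's -ls action, a small extension of the command above:

find . -mmin +3 -mmin -6 -ls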


Need to track down disk space hogs? This is probably still a top-ten task, even in this era of terabyte hard drives. This lists the five largest directories or files in the named directory, including the top-level directory itself:

$ du -a directoryname | sort -nr | head -n 5
119216208	.
55389884	./photos
40650788	./Photos
37020884	./photos/2007
20188284	./carla

Omit the -a option to list only directories.
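

That looks something like this, listing only the five largest directories:

$ du directoryname | sort -nr | head -n 5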


It is well worth getting acquainted with the find command because it can do everything except make good beer. This nifty incantation finds the five biggest files on your system, and sorts them from largest to smallest, in bytes:

# find / -type f -printf '%s %p\n' | sort -nr | head -5
1351655936 /home/carla/sda1/carla/.VirtualBox/Machines/ubuntu-hoary/Snapshots/{671041dd-700c-4506-68a8-7edfcd0e3c58}.vdi
1332959240 /home/carla/sda1/carla/51mix.wav
1061154816 /proc/kcore
962682880 /home/carla/sda1/Photos/2007-sept-montana/video_ts/vts_01_4.vob
962682880 /home/carla/sda1/photos/2007/2007-sept-montana/video_ts/vts_01_4.vob

You really don't need to include the /proc pseudo-filesystem, since it occupies no disk space. Use the -wholename and -prune options to exclude it:


find / -wholename '/proc' -prune -o -type f -printf '%s %p\n' | sort -nr | head -5


There is a potential gotcha: find will recurse into all mounted filesystems, including remote filesystems. If you don't want it to do this, add the -xdev option:


find / -xdev -wholename '/proc' -prune -o -type f -printf '%s %p\n' | sort -nr | head -5


Another potential gotcha with -xdev is that find will only search the filesystem the command is run from, and no other mounted filesystems, not even local ones. So if your files are spread over multiple partitions or hard drives on one computer and you want to search all of them, don't use -xdev. I'm sure there is a clever way to distinguish between local and remote filesystems; one possibility worth experimenting with is sketched below.
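

A hedged sketch of one such possibility: GNU find has an -fstype test, so instead of -xdev you can prune remote mounts by their filesystem type. This assumes your remote filesystems are NFS; substitute whatever types your systems actually use (cifs and so on):

find / -fstype nfs -prune -o -type f -printf '%s %p\n' | sort -nr | head -5   # assumes remote mounts are NFS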


Now let's string together a splendid find incantation to convert those large indigestible blobs of bytes into a nice readable format:

# find / -type f -print0 | xargs -0 ls -s | sort -rn | awk '{size=$1/1024; printf("%dMb %s\n", size, $2);}' | head -5
1290Mb /home/carla/sda1/carla/.VirtualBox/Machines/ubuntu-hoary/Snapshots/{671041dd-700c-4506-68a8-7edfcd0e3c58}.vdi
1272Mb /home/carla/sda1/carla/51mix.wav
918Mb /home/carla/sda1/Photos/2007-sept-montana/video_ts/vts_01_4.vob
918Mb /home/carla/sda1/photos/2007/2007-sept-montana/video_ts/vts_01_4.vob
918Mb /home/carla/sda1/Photos/2007-sept-montana/video_ts/vts_01_1.vob
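
If your coreutils is reasonably recent there is a shortcut worth knowing: GNU sort has a -h (human-numeric) option that understands the size suffixes du -h emits, so du can do the pretty-printing itself. A sketch, assuming GNU coreutils 7.5 or later:

$ du -ah / 2>/dev/null | sort -rh | head -5   # needs sort -h from GNU coreutils 7.5+
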

Yes, I know, you can do many of these things in graphical search applications. To me they are slow and clunky, and it's a lot faster to replay searches from my Bash history or copy them from my cheat sheet. I even have some aliased in Bash; for example, I use that last long find incantation a lot, so I have it aliased to find5 in my .bashrc:


alias find5='find / -wholename /proc -prune -o -wholename /sys -prune -o -type f -print0 | xargs -0 ls -s | sort -rn | awk '\''{size=$1/1024; printf("%dMb %s\n", size, $2);}'\'' | head -5'


In this example I have excluded both the /proc and /sys directories. Note the '\'' sequences: that is how you embed the literal single quotes around the awk program inside a single-quoted alias.


The locate command is very fast because it searches a database of your filenames instead of the filesystem itself. The database needs periodic updating, and many distros do this automatically; to update it manually, simply run the updatedb command as root. locate and grep are powerful together. For example, find all .jpg files with 1024 in their names or paths, such as wallpapers saved at 1024 pixels wide:


locate "*.jpg" | grep 1024


Search for image files in three different formats for an application:


locate claws-mail | grep -iE "(jpg|gif|ico)"


Well, here we are at the end already! Thanks for reading, and please consult the fine man pages for these commands to learn what all the different options mean.

