<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-2425290326823263574</id><updated>2012-01-28T07:22:47.960-08:00</updated><category term='calendar'/><category term='time wasted log'/><category term='wiki'/><category term='sysadmin'/><category term='bugs'/><category term='ISCA comp.arch BT'/><category term='shopping'/><category term='life optimization'/><category term='privacy'/><category term='upgrade'/><category term='bitching'/><category term='fnancial'/><category term='filesystems'/><category term='git'/><category term='user interface'/><category term='PC'/><category term='patching'/><category term='wesabe'/><category term='toshiba'/><category term='hg'/><category term='mint'/><category term='programming languages'/><category term='update'/><category term='version-control'/><category term='xml testing'/><category term='cvs'/><category term='bzr'/><category term='security'/><category term='politics'/><category term='software design'/><category term='Perl'/><category term='blog'/><category term='computers'/><category term='customer-service'/><category term='collaboration tools'/><category term='complaint'/><category term='human factors'/><category term='boring'/><category term='twiki'/><category term='VCS'/><category term='websites'/><category term='wireless'/><category term='software'/><category term='iPod iTunes'/><category term='quicken'/><category term='content-based'/><category term='SW Engineering'/><category term='memex'/><category term='testing'/><category term='annoying'/><category term='free speech'/><category term='google'/><category term='yodlee moneycenter'/><title type='text'>Krazy Glew's Blog</title><subtitle 
type='html'>&lt;p&gt;Andy "Krazy" Glew is a computer architect, a long time poster on comp.arch ... and an evangelist of collaboration tools such as wikis, calendars, blogs, etc. Plus an occasional commentator on politics, taxes, and policy. Particularly the politics of multi-ethnic societies such as Quebec, my birthplace.&lt;/p&gt;
&lt;em&gt;&lt;b&gt;The content of this blog is my personal opinion. It is not that of my employer. See Disclaimer. &lt;/b&gt;&lt;/em&gt;
&lt;p&gt;Photo credit: http://docs.google.com/View?id=dcxddbtr_23cg5thdfj&lt;/p&gt;</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><link rel='next' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default?start-index=101&amp;max-results=100'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>326</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-627719417477975488</id><published>2012-01-28T07:22:00.000-08:00</published><updated>2012-01-28T07:22:47.974-08:00</updated><title type='text'></title><content type='html'>[[Write coalescing]] is the term some GPU vendors, notably AMD/ATI and Nvidia, use to describe how they, umm, combine or coalesce writes from N different SIMD threads into a single, or at least fewer than N, accesses.  
There is also [[read coalescing]], and one can imagine other forms of coalescing, such as atomic fetch-and-op coalescing.&lt;br /&gt;&lt;br /&gt;At AFDS11 I (Glew) asked an AMD/ATI GPU architect&lt;br /&gt;"What is the difference between [[write coalescing]] and [[write combining]]?"&lt;br /&gt;&lt;br /&gt;He replied that [[write combining]] was an x86 CPU feature that used a [[write combining buffer]],&lt;br /&gt;whereas [[write coalescing]] was a GPU feature that performed the optimization between multiple writes that were occurring simultaneously, not in a buffer.&lt;br /&gt;&lt;br /&gt;Hmmm...&lt;br /&gt;&lt;br /&gt;Since I (Glew) had a lot to do with x86 write combining&lt;br /&gt;- arguably I invented it on P6, although I was inspired by a long line of work in this area,&lt;br /&gt;most notably the [[NYU Ultracomputer]] [[fetch-and-op]] [[combining network]]&lt;br /&gt;- I am not sure that this distinction is fundamental.&lt;br /&gt;&lt;br /&gt;Or, rather, it _is_ useful to distinguish between buffer based implementations and implementations that look at simultaneous accesses.&lt;br /&gt;&lt;br /&gt;However, in the original NYU terminology, [[combining]] referred to both:&lt;br /&gt;operations received at the same time by a switch in the [[combining network]],&lt;br /&gt;and operations received at a later time that match an operation buffered in the switch,&lt;br /&gt;awaiting either to be forwarded on,&lt;br /&gt;or a reply.&lt;br /&gt;(I'm not sure which was in the Ultracomputer.)&lt;br /&gt;&lt;br /&gt;A single P6 processor  only did one store per cycle, so a buffer based implementation that performed [[write combining]] between stores &lt;br /&gt;at different times was the only possibility. 
Or at least the most useful.&lt;br /&gt;Combining stores from different processors was not done (at least, not inside the processor, and could not legally be done to all UC stores).&lt;br /&gt;&lt;br /&gt;The NYU Ultracomputer performed this optimization in a switch for multiple processors,&lt;br /&gt;so combining both simultaneous operations and operations performed at different times&lt;br /&gt;was a possibility.&lt;br /&gt;&lt;br /&gt;GPUs do many, many, stores at the same time, in a [[data memory coherent]] manner.&lt;br /&gt;This creates a great opportunity for optimizing simultaneous stores.&lt;br /&gt;Although I would be surprised and disappointed to learn that &lt;br /&gt;GPUs did not combine or coalesce&lt;br /&gt;(a) stores from different cycles in the typically 4 cycle wavefront or warp,&lt;br /&gt;and&lt;br /&gt;(b) stores from different SIMD engines, if they encounter each other on the way to memory.&lt;br /&gt;&lt;br /&gt;I conclude therefore that the difference between [[write combining]] and [[write coalescing]] is really one of emphasis.&lt;br /&gt;Indeed, this may be yet another example  where my&lt;br /&gt;(Glew's) predilection is to [[create new terms by using adjectives]],&lt;br /&gt;e.g. 
[[write combining buffer]] or [[buffer-based write combining]]&lt;br /&gt;versus [[simultaneous write combining]] (or the [[AFAIK]] hypothetical special case [[snoop based write combining]]),&lt;br /&gt;rather than creating gratuitous new terminology,&lt;br /&gt;such as [[write combining]] (implicitly restricted to buffer based)&lt;br /&gt;versus [[write coalescing]] (simultaneous, + ...).&lt;br /&gt;&lt;br /&gt;= See Also =&lt;br /&gt;&lt;br /&gt;This discussion prompts me to create&lt;br /&gt;&lt;br /&gt;* [[a vocabulary of terms for memory operation combining]]&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-627719417477975488?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/627719417477975488/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=627719417477975488' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/627719417477975488'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/627719417477975488'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2012/01/write-coalescing-is-term-some-gpus.html' title=''/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' 
src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-6881430511547694226</id><published>2012-01-28T07:16:00.000-08:00</published><updated>2012-01-28T07:20:03.722-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='SW Engineering'/><category scheme='http://www.blogger.com/atom/ns#' term='VCS'/><title type='text'>Project structure: code, tests, external dependencies</title><content type='html'>Say you have a project foo, which I will call .../foo, to emphasize that there is probably a project directory somewhere in the namespace.&lt;br /&gt;&lt;blockquote class="tr_bq"&gt;.../foo&lt;/blockquote&gt;Where do you put the tests? &amp;nbsp;It's nice to put them in .../foo/tests, so that when you check out the project, you get the tests as well:&lt;br /&gt;&lt;blockquote class="tr_bq"&gt;.../foo/tests&lt;/blockquote&gt;&lt;ul&gt;by which I mean&lt;br /&gt;&lt;blockquote class="tr_bq"&gt;.../foo&lt;br /&gt;&amp;nbsp; &amp;nbsp; .../foo/tests&lt;/blockquote&gt;or&lt;br /&gt;&lt;blockquote class="tr_bq"&gt;...&lt;br /&gt;&amp;nbsp; &amp;nbsp;foo&lt;br /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; tests&lt;/blockquote&gt;but &amp;nbsp;emphasizing the full context via .../ and .../foo/&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;It's also good to minimize the external dependencies of .../foo.&lt;br /&gt;&lt;br /&gt;But what if the tests have more external dependencies than the non-test part of the project? &amp;nbsp;Should you increase the external dependencies of the non-test part just to have the tests? &lt;br /&gt;&lt;br /&gt;Conversely, should you make it harder to write the tests by forbidding external dependencies in them? 
&amp;nbsp;Tests are hard enough to write, and they often depend on extra libraries that the source code per se does not.&lt;br /&gt;&lt;br /&gt;More specifically, should you get stuff that has increased external dependencies if you check out .../foo and .../foo/tests comes along for the ride?&lt;br /&gt;&lt;br /&gt;If you don't want the extra dependencies in .../foo/tests to come along with .../foo, you might structure them as separate modules, possibly separate in the file space:&lt;br /&gt;&lt;br /&gt;.../foo+tests&lt;br /&gt;&amp;nbsp; &amp;nbsp; .../foo&lt;br /&gt;&amp;nbsp; &amp;nbsp; .../tests&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;This works, but it creates extra levels of such "meta-modules": .../foo+tests, .../foo+interactive_tests, .../foo+debugging_tools+tests, etc.&lt;br /&gt;&lt;br /&gt;You might structure it as a series of optional modules, all within a single metamodule:&lt;br /&gt;&lt;br /&gt;.../foo+stuff&lt;br /&gt;&amp;nbsp; &amp;nbsp; .../foo&lt;br /&gt;&amp;nbsp; &amp;nbsp; .../tests&lt;br /&gt;&amp;nbsp; &amp;nbsp; .../stress-tests&lt;br /&gt;&lt;br /&gt;or, rather&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;.../foo+stuff&lt;br /&gt;&amp;nbsp; &amp;nbsp; .../foo+stuff/foo&lt;br /&gt;&amp;nbsp; &amp;nbsp; .../foo+stuff/tests&lt;br /&gt;&amp;nbsp; &amp;nbsp; .../foo+stuff/stress-tests&lt;br /&gt;&lt;br /&gt;and so on.&lt;br /&gt;&lt;br /&gt;It's still annoying to have an extra level of indirection, but the purity of the bodily fluids of .../foo is maintained.&lt;br /&gt;&lt;div&gt;Now, of course, it is highly likely that you will have some tests within foo and some that are associated with foo, but that foo won't let in:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;.../foo+stuff&lt;br /&gt;&amp;nbsp; &amp;nbsp; .../foo+stuff/foo&lt;br /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 
.../foo+stuff/foo/tests&lt;br /&gt;&amp;nbsp; &amp;nbsp; .../foo+stuff/tests&lt;br /&gt;&amp;nbsp; &amp;nbsp; .../foo+stuff/stress-tests&lt;/div&gt;&lt;div&gt;This can be confusing, but apart from changing names it may be unavoidable: if a country has immigration control, you often get refugee camps on the border. &amp;nbsp;The country may ignore them, but the UNHCR does not.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Some systems allow this "stuff" to be overlays within foo:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;.../foo&lt;/div&gt;&lt;div&gt;&amp;nbsp; &amp;nbsp; .../foo/tests -- optional&lt;/div&gt;&lt;div&gt;&amp;nbsp; &amp;nbsp; .../foo/stress-tests &amp;nbsp;-- optional&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-6881430511547694226?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/6881430511547694226/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=6881430511547694226' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/6881430511547694226'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/6881430511547694226'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2012/01/project-structure-code-tests-external.html' title='Project structure: code, tests, external dependencies'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' 
height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-8369532577860688127</id><published>2012-01-26T12:21:00.001-08:00</published><updated>2012-01-28T07:18:53.208-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='SW Engineering'/><category scheme='http://www.blogger.com/atom/ns#' term='Perl'/><category scheme='http://www.blogger.com/atom/ns#' term='testing'/><title type='text'>Perl test brackets with deterministic finalization</title><content type='html'>I just realized that you can use deterministic finalization in Perl just like C++. Yay!&lt;br /&gt;&lt;pre&gt;{&lt;br /&gt;    package Brackets;&lt;br /&gt;    # my first attempt to use deterministic finalization in Perl;&lt;br /&gt;    sub name {&lt;br /&gt;        my $self = {};&lt;br /&gt;        $self-&amp;gt;{name} = shift @_;&lt;br /&gt;        bless $self;&lt;br /&gt;        print "&amp;lt;TEST START $self-&amp;gt;{name}&amp;gt;\n";&lt;br /&gt;        return $self;&lt;br /&gt;    }&lt;br /&gt;    sub DESTROY {&lt;br /&gt;        my $self = shift @_;&lt;br /&gt;        print "&amp;lt;/TEST END $self-&amp;gt;{name}&amp;gt;\n";&lt;br /&gt;    }&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;Note that I use my "pseudo-XML" notation:&lt;br /&gt;&lt;pre&gt;&amp;lt;TEST START name&amp;gt;&lt;br /&gt;...&lt;br /&gt;&amp;lt;/TEST END name&amp;gt;&lt;br /&gt;&lt;/pre&gt;It's not nice to violate the standard, but IMHO this is a lot easier to read than&lt;br /&gt;&lt;pre&gt;&amp;lt;TEST context="START" name="name"&amp;gt;&lt;br /&gt;...&lt;br /&gt;&amp;lt;/TEST context="END" name="name"&amp;gt;&lt;br /&gt;&lt;/pre&gt;and readability is a concern - since I plop these things in test output read by my human coworkers. Human coworkers who really don't like XML because it is so ugly. I can readily translate my pseudo XML into real XML, in case I want to use any real XML tools. And can do many operations 
without such translation. Unfortunately, there aren't many real XML tools that can be used without a lot of work :-(. I need to fix up my pseudo-XML UNIX command line tools, suitable for use in pipes and so on.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-8369532577860688127?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/8369532577860688127/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=8369532577860688127' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/8369532577860688127'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/8369532577860688127'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2012/01/perl-test-brackets-with-deterministic.html' title='Perl test brackets with deterministic finalization'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-6310369170465902453</id><published>2012-01-25T12:45:00.000-08:00</published><updated>2012-01-28T07:19:21.780-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='bugs'/><category scheme='http://www.blogger.com/atom/ns#' term='google'/><title type='text'>googletalkplugin issues</title><content type='html'>I started having problems with google talk, specifically 
the googletalkplugin that I get when clicking on "phone" in gmail. I believe I have had these before.  Not sure if the old workarounds have all been tried; none so far work. Started off with not being able to hear gtalk dial. Then got messages like "XXX-XXX-XXXX (a phone number) cannot be reached." Uninstalled/reinstalled. Witho rebooting. Running as Administrator.  No joy. Just noticed many googletalkplugin processes running.  This is what tickled my memory. A new one starts every time I start Chrome.  Doesn't always go away.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-6310369170465902453?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/6310369170465902453/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=6310369170465902453' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/6310369170465902453'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/6310369170465902453'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2012/01/googletalkplugin-issues.html' title='googletalkplugin issues'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' 
src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-4821702270240559917</id><published>2012-01-24T16:08:00.000-08:00</published><updated>2012-01-28T07:21:10.365-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='programming languages'/><title type='text'>Code ordering, want none: perl example</title><content type='html'>I have several times posted how I find languages without code ordering pleasant, more readable.&lt;br /&gt;&lt;br /&gt;E.g.&lt;br /&gt;&lt;pre&gt;  let x = y + z&lt;br /&gt;      where z = some long and complicated expressions&lt;br /&gt;&lt;/pre&gt;Here's an example from writing "shell scripts" - although here I am doing the shell script in Perl. Start off with:&lt;br /&gt;&lt;pre&gt;   system("command with some long and complicated command line");&lt;br /&gt;&lt;/pre&gt;Realize that you want to repeat the command in a pre-announcement of what you are doing, and in an error message:&lt;br /&gt;&lt;pre&gt;   print "RUNNING: command with some long and complicated command line\n";&lt;br /&gt;   my $exitcode = system("command with some long and complicated command line");&lt;br /&gt;   print "error: exitcode=$exitcode: command with some long and complicated command line\n"&lt;br /&gt;      if there_is_a_problem($exitcode);&lt;br /&gt;&lt;/pre&gt;Now avoid repetition. Standard way:&lt;br /&gt;&lt;pre&gt;   my $cmd = "command with some long and complicated command line";&lt;br /&gt;   print "RUNNING: $cmd\n";&lt;br /&gt;   my $exitcode = system($cmd);&lt;br /&gt;   print "error: exitcode=$exitcode: $cmd\n"&lt;br /&gt;      if there_is_a_problem($exitcode);&lt;br /&gt;&lt;/pre&gt;In my opinion the non-code ordered way is more readable:&lt;br /&gt;&lt;pre&gt;{&lt;br /&gt;   my $cmd;  # maybe some declaration to say it is not ordered?&lt;br /&gt;   print "RUNNING: 
$cmd\n";&lt;br /&gt;   my $exitcode = system($cmd = "command with some long and complicated command line");&lt;br /&gt;   print "error: exitcode=$exitcode: $cmd\n"&lt;br /&gt;      if there_is_a_problem($exitcode);&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;I think it is more readable because you see the value set at the point where it matters, where it is used most intensely (the other uses are mild, just prints). Note that I have used a scope.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-4821702270240559917?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/4821702270240559917/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=4821702270240559917' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/4821702270240559917'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/4821702270240559917'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2012/01/code-ordering-want-none-perl-example.html' title='Code ordering, want none: perl example'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-7491142055908740273</id><published>2012-01-20T19:17:00.000-08:00</published><updated>2012-01-28T07:20:37.311-08:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='memex'/><title type='text'>Tagging is so passe'</title><content type='html'>Tagging is so passe'.&lt;br /&gt;&lt;br /&gt;Manually adding keywords to stuff you do.&lt;br /&gt;&lt;br /&gt;What we need is "tag suggestion" software. &amp;nbsp;Software that looks at what you have written, compares it to a corpus - perhaps your stuff, but perhaps stuff from others - and offers you candidate tags to choose from.&lt;br /&gt;&lt;br /&gt;---&lt;br /&gt;&lt;br /&gt;Automatic email folder classification rules are so passe'...&lt;br /&gt;&lt;br /&gt;---&lt;br /&gt;&lt;br /&gt;Gmail's labels are so passe'. &amp;nbsp;Same reasons. &lt;br /&gt;&lt;br /&gt;(Plus the absolute lack of structure.)&lt;br /&gt;&lt;br /&gt;---&lt;br /&gt;&lt;br /&gt;I have played around with Bayesian codes, for determining if tags or labels should apply.&lt;br /&gt;&lt;br /&gt;---&lt;br /&gt;&lt;br /&gt;Gmail's "important" filter is a step. &amp;nbsp;But more is needed. &amp;nbsp;Plus, a more personal classification system.&lt;br /&gt;&lt;br /&gt;---&lt;br /&gt;&lt;br /&gt;I remember GNUS gnus-topic-mode.el &amp;nbsp;in Emacs fondly. &amp;nbsp;It realizes that at different times of the day, or in different modes, I may prioritize things differently.&lt;br /&gt;&lt;br /&gt;THERE ARE NO FIXED PRIORITIES for personal information management.&lt;br /&gt;&lt;br /&gt;My priorities when I am reading email on vacation, or in the evening at home, are different than in the day at work.&lt;br /&gt;&lt;br /&gt;---&lt;br /&gt;&lt;br /&gt;Why tag?&lt;br /&gt;&lt;br /&gt;Why not just use search?&lt;br /&gt;&lt;br /&gt;Tagging is a crystallization of information content. &amp;nbsp;E.g. it records the fact that, at some time, you decided that a post was about VCS, Version Control Software, even though it might not contain the phrase in a way that search would turn up.&lt;br /&gt;&lt;br /&gt;Tags make it easier to track [[terminology drift]]. 
&amp;nbsp;(TBD, need to write a wiki/blog on that).&lt;br /&gt;&lt;br /&gt;E.g. what we now call VCS (Version Control) might have been called Revision Control years ago, or CM (Configuration Management).&lt;br /&gt;&lt;br /&gt;Terminology drifts over time. &amp;nbsp;Tags make it easier to track such drift, although even tags drift.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-7491142055908740273?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/7491142055908740273/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=7491142055908740273' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/7491142055908740273'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/7491142055908740273'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2012/01/tagging-is-so-passe.html' title='Tagging is so passe&apos;'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-8274184647850519389</id><published>2012-01-20T19:04:00.000-08:00</published><updated>2012-01-20T19:04:40.043-08:00</updated><title type='text'>AAdvantage versus Account Aggregation</title><content type='html'>I just learned that American Airlines' AAdvantage 
frequent flyer program has sent cease and desist letters to account aggregators.&lt;br /&gt;&lt;br /&gt;(This shows how often I check my AAdvantage miles balance - only when I am planning vacation. &amp;nbsp;It also shows exactly why I depend on an account aggregator - for all of these bleeding accounts...)&lt;br /&gt;&lt;br /&gt;Now, perhaps I am exposing myself to hackers because I admit that I use an account aggregator. &amp;nbsp;Single point of failure, and all that.&lt;br /&gt;&lt;br /&gt;(By the way, I would be much happier if the aggregators had read-only access to my accounts - if they could only see balances, but not change passwords.)&lt;br /&gt;&lt;br /&gt;But the overall thing is: there are, I have, too many bleeding accounts. Too many blinking passwords.&lt;br /&gt;&lt;br /&gt;Account aggregators are one major tool to manage this.&lt;br /&gt;&lt;br /&gt;If a company will not let *any* account aggregator access them, well, then I do not need to be a customer of that company.&lt;br /&gt;&lt;br /&gt;I was considering dropping American Airlines anyway, because of their financial position. 
This is just more incentive.&lt;br /&gt;&lt;br /&gt;---&lt;br /&gt;&lt;br /&gt;Heck, if AAdvantage was implementing better security, such as captchas, I would be happy. But sending "cease and desist" letters - that's garbage.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-8274184647850519389?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/8274184647850519389/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=8274184647850519389' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/8274184647850519389'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/8274184647850519389'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2012/01/aadvantage-versus-account-aggregation.html' title='AAdvantage versus Account Aggregation'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-8256239205760632994</id><published>2012-01-19T11:33:00.000-08:00</published><updated>2012-01-19T11:33:12.722-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='VCS'/><title type='text'>Branch purpose commit message</title><content type='html'>I would still like to have a commit message that I write at the time I generate a branch, saying why I am 
bothering to generate a named branch.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-8256239205760632994?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/8256239205760632994/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=8256239205760632994' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/8256239205760632994'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/8256239205760632994'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2012/01/branch-purpose-commit-message.html' title='Branch purpose commit message'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-4898073026270679463</id><published>2012-01-18T23:50:00.000-08:00</published><updated>2012-01-20T10:28:14.585-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='VCS'/><title type='text'>VCS thoughts</title><content type='html'>Some fast, probably cryptic, thoughts after a day merging with a "I wasn't familiar with it originally but I am painfully familiar with it now" VCS tool.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&amp;nbsp;Merges are branches.&lt;/li&gt;&lt;ul&gt;&lt;li&gt;Mercurial tracks merge workflow in a workspace, assuming 
that a merge will be a single commit.&lt;/li&gt;&lt;li&gt;But today I had a complicated enough merge that I started a named branch off just for it. Accomplished the merge between my task branch and the trunk. And then merged back from this "merge branch" to the trunk.&lt;/li&gt;&lt;li&gt;Worked fine, but it would have been nice to have some workflow tracking along the branch. Like "hg resolve", but "hg resolve" stops at the first commit boundary.&lt;/li&gt;&lt;li&gt;Had to fall back to tracking things by hand, in a text file.&lt;/li&gt;&lt;/ul&gt;&lt;/ul&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;ul&gt;&lt;li&gt;Mercurial's names are awkward&lt;/li&gt;&lt;ul&gt;&lt;li&gt;"hg revert" isn't revert&lt;/li&gt;&lt;ul&gt;&lt;li&gt;"hg revert -r REV file" isn't "revert". &amp;nbsp;It is "include this revision of the file in the candidate commit that you are building in your workspace."&lt;/li&gt;&lt;li&gt;hg revert corresponds to cvs update&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;"hg update" isn't update&lt;/li&gt;&lt;ul&gt;&lt;li&gt;"hg update -r REVorBRANCH" isn't "update". It is "switch the revision or branch that&amp;nbsp;the candidate commit you are building in the workspace will be applied to as a child."&lt;/li&gt;&lt;li&gt;hg update corresponds to cvs checkout, although cvs is pretty sucky there too. &amp;nbsp;&lt;/li&gt;&lt;li&gt;It's rather like a rebase when you have no revisions checked in yet&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;Mercurial's branches are not branches.&lt;/li&gt;&lt;ul&gt;&lt;li&gt;They are floating tags&lt;/li&gt;&lt;li&gt;Not necessarily lines of evolution.&lt;/li&gt;&lt;li&gt;What would be a better name? "Stream of development?" 
"Genealogy line"?&lt;/li&gt;&lt;li&gt;I'd like to have evolutionary branch-lines, as well as what Mercurial has.&lt;/li&gt;&lt;/ul&gt;&lt;/ul&gt;&lt;/ul&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;Tags should be versioned.&amp;nbsp;&lt;/li&gt;&lt;ul&gt;&lt;li&gt;At least Mercurial got that right.&lt;/li&gt;&lt;li&gt;But old tag versions should be visible in the log.&lt;/li&gt;&lt;li&gt;And it should be possible to refer to an old tag version, something like "last week's official release"&lt;/li&gt;&lt;/ul&gt;&lt;/ul&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;Mercurial's anonymous branches rule&lt;/li&gt;&lt;ul&gt;&lt;li&gt;See, I am not purely dissing hg&lt;/li&gt;&lt;li&gt;But "hg tip" has obviously not been brought up to date with named branches&lt;/li&gt;&lt;/ul&gt;&lt;/ul&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;Can anyone tell me how to do the equivalent of "cvs update -j branch-base -j branch" in Mercurial?&lt;/li&gt;&lt;ul&gt;&lt;li&gt;Without using an external patch&lt;/li&gt;&lt;li&gt;Hint: "hg merge -r branch" doesn't work, if you have anti-patches on the branch&lt;/li&gt;&lt;/ul&gt;&lt;/ul&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;I want to be able to "pull onto a branch" or "push a branch". &amp;nbsp;Not just "push or pull all the branches in the repository". &amp;nbsp;I.e. 
I want branch renaming or mapping in pull and push&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;Mercurial doesn't do partial checkouts or checkins?&lt;/li&gt;&lt;ul&gt;&lt;li&gt;Maybe not everywhere&lt;/li&gt;&lt;li&gt;But you can screw yourself up in the same way with "hg ci incomplete list of files"&lt;/li&gt;&lt;ul&gt;&lt;li&gt;I'm not suggesting denying partial "hg ci files"&lt;/li&gt;&lt;li&gt;But I'd like to do it in general.&lt;/li&gt;&lt;li&gt;&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/ul&gt;&lt;li&gt;Partial checkins and checkouts should correspond to branches, with merging actively encouraged.&lt;/li&gt;&lt;ul&gt;&lt;li&gt;I've talked about this many times.&lt;/li&gt;&lt;/ul&gt;&lt;/ul&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;Enough already.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-4898073026270679463?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/4898073026270679463/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=4898073026270679463' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/4898073026270679463'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/4898073026270679463'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2012/01/vcs-thoughts.html' title='VCS thoughts'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' 
src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-4608470409912681534</id><published>2012-01-18T11:11:00.000-08:00</published><updated>2012-01-18T11:17:17.262-08:00</updated><title type='text'>"system/command" versus "system command"</title><content type='html'>A lot of tools have a master command with subcommands.  For example&lt;pre&gt;&lt;br /&gt;hg clone ...&lt;br /&gt;hg merge&lt;br /&gt;&lt;br /&gt;cvs co&lt;br /&gt;cvs update&lt;br /&gt;&lt;br /&gt;git clone&lt;br /&gt;...&lt;br /&gt;&lt;/pre&gt;I think that the first place I encountered this was with mh. Or, since my friend MH says that mh did not have subcommands, I may have made the following change myself for mh. Why subcommands?  I think mainly to avoid name collisions in the bin. Years ago, I kluged - I think it may have been the shell, whoever processes PATH - to search not just for "executable" on the PATH, but also "dir/executable". I.e. instead of saying&lt;pre&gt;&lt;br /&gt;hg update&lt;br /&gt;&lt;/pre&gt;I could have said&lt;pre&gt;&lt;br /&gt;hg/update&lt;br /&gt;&lt;/pre&gt;Not much of a difference.  
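A minimal sketch of the kind of lookup I mean, in Python rather than in the shell where the original kluge probably lived. The function name find_subcommand is made up for illustration, and the directory layout it assumes (a directory called hg on the PATH, holding one executable per subcommand) is hypothetical: the real hg is a single executable.

```python
import os


def find_subcommand(name, path=None):
    """Return the first executable matching name on PATH, or None.

    Unlike the usual lookup, name may contain a slash: "hg/update"
    is tried as DIR/hg/update for every DIR on the search path.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, *name.split("/"))
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

With a lookup like this, "hg/update" is just an ordinary executable in a directory named hg somewhere on the PATH, and find_subcommand("hg/update") returns its full path, or None if nothing matches.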
But it makes it easier for guys to code systems that have lots of subcommands. Also, for users of shells like bash and csh: !hg/update works, whereas "!hg update" doesn't (unless your shell has tweaks).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-4608470409912681534?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/4608470409912681534/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=4608470409912681534' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/4608470409912681534'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/4608470409912681534'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2012/01/systemcommand-versus-system-command.html' title='&quot;system/command&quot; versus &quot;system command&quot;'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-7473160667489946487</id><published>2012-01-17T11:55:00.001-08:00</published><updated>2012-01-17T11:55:50.152-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='VCS'/><category scheme='http://www.blogger.com/atom/ns#' term='hg'/><title type='text'></title><content type='html'>Mercurial whine:&lt;pre&gt;&lt;br /&gt;hg merge -r tip&lt;br 
/&gt;&lt;/pre&gt;is NOT the same as &lt;pre&gt;&lt;br /&gt;hg merge -r default&lt;br /&gt;&lt;/pre&gt;Because Mercurial's tip may be on a branch.  tip is just the most recent changeset, anywhere. I have hit bugs caused by following hg recipes that talk about using -r tip. Sigh.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-7473160667489946487?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/7473160667489946487/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=7473160667489946487' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/7473160667489946487'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/7473160667489946487'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2012/01/mercurial-whine-hg-merge-r-tip-is-not.html' title=''/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-9180715506511227130</id><published>2012-01-17T11:47:00.000-08:00</published><updated>2012-01-28T07:21:50.546-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='VCS'/><title type='text'>Tentative checkin for branch only, not to be merged?</title><content type='html'>Here's another:Working on a branch, I just did a pull, and merged from 
trunk onto the branch. I notice that somebody else has a minor bug on the trunk: it looks like they forgot to check in a test reference pattern, although it is also possible that the test has been checked in before the golden behavior is established. I report the bug. But I would like to silence the error, at least on my branch.  I can make the change to the reference patterns. However, I do not know if that change is good.  I just want to silence it, or mark it as a known error. I don't want my change to propagate back to the trunk when eventually I merge and push. I.e. my branch now contains some tentative stuff, as well as some stuff that I am confident will soon be merged. On CVS I would create a per file branch for the reference, and work off a merged directory. Eventually, I would update to the main branch, and not get my patched test. Unclear how to do this in Mercurial, apart from remembering that I have to delete it eventually. Oh, here's a way:&lt;br /&gt;&lt;pre&gt;# working on branch B&lt;br /&gt;hg pull&lt;br /&gt;hg merge -r tip&lt;br /&gt;make test&lt;br /&gt;# make change to silence test error&lt;br /&gt;hg branch Bb&lt;br /&gt;# somehow copy old version of patched file to pre branch&lt;br /&gt;hg ci # on Bb&lt;br /&gt;hg update -r B&lt;br /&gt;# now work&lt;br /&gt;# when ready to merge, do the usual&lt;br /&gt;hg pull&lt;br /&gt;hg merge&lt;br /&gt;make test&lt;br /&gt;# now do the extra step to be confident that you aren't pushing anything broken&lt;br /&gt;hg update -r Bb&lt;br /&gt;hg merge -r B&lt;br /&gt;# now should have B, except for that one change that was tentative&lt;br /&gt;make test&lt;br /&gt;# see if the bug was fixed by someone else...&lt;br /&gt;# ok, now merge to default branch, and then push&lt;br /&gt;# - since I am paranoid, I might pull and test again&lt;br /&gt;hg update -r default&lt;br /&gt;hg merge -r Bb&lt;br /&gt;make test&lt;br /&gt;hg push&lt;br /&gt;&lt;/pre&gt;Works, but is complicated.  
I am quite likely to forget that I am supposed to merge from branch B onto Bb before merging to the trunk.{{Category-VCS}}{{Category-hg}}&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-9180715506511227130?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/9180715506511227130/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=9180715506511227130' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/9180715506511227130'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/9180715506511227130'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2012/01/tentatuive-checkin-for-branch-only-not.html' title='Tentative checkin for branch only, not to be merged?'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-668655027215530883</id><published>2012-01-17T11:28:00.000-08:00</published><updated>2012-01-28T07:22:11.546-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='VCS'/><category scheme='http://www.blogger.com/atom/ns#' term='hg'/><title type='text'></title><content type='html'>AFAICT Mercurial only allows you to merge entire changesets.&lt;br /&gt;&lt;br /&gt;Here is an example of why I may want to do a merge at lower 
granularity: &lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;I am working on a branch B&lt;/li&gt;&lt;li&gt;I do a pull from the parent, hg pull&lt;/li&gt;&lt;li&gt;I merge from the parent on to my branch B, because I am not yet ready to merge back onto the trunk (aka default branch)&lt;/li&gt;&lt;li&gt;I run my tests&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;Now I notice that the .hgignore that I got from the parent is missing a file. &amp;nbsp;I fix it in my repository, to shut it up.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Then I realize that I should push such a generic change back to the trunk asap.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;What I want to do is something like:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;pre&gt;  hg clone work-repo quick-fix&lt;br /&gt;  cd quick-fix&lt;br /&gt;  hg update -r default&lt;br /&gt;  hg merge -r B  .hgignore     # to merge just the change to .hgignore into the trunk&lt;br /&gt;  make clean;hg purge&lt;br /&gt;  make test&lt;br /&gt;  hg ci&lt;br /&gt;  ...&lt;br /&gt;  hg push&lt;br /&gt;&lt;/pre&gt;I.e. I want to merge JUST the change to .hgignore, using "hg merge -r B  .hgignore". Instead I do&lt;br /&gt;&lt;pre&gt;  hg clone work-repo quick-fix&lt;br /&gt;  cd quick-fix&lt;br /&gt;  hg update -r default&lt;br /&gt;  cp ../work-repo/.hgignore .&lt;br /&gt;  make clean;hg purge&lt;br /&gt;  make test&lt;br /&gt;  hg ci&lt;br /&gt;  ...&lt;br /&gt;  hg push&lt;br /&gt;&lt;/pre&gt;I.e. "cp ../work-repo/.hgignore ." is used instead of "hg merge -r B  .hgignore". Although this works, it makes me unhappy. E.g. I may blithely have overwritten other changes, e.g. 
if I cloned from the master rather than the work-repo. Not to mention the fact that the "hg push" above would push my branch. And my teammates do not want my branch to be pushed, because it has too many fine grain checkins - they want just a single checkin message. But that's another story.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-668655027215530883?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/668655027215530883/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=668655027215530883' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/668655027215530883'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/668655027215530883'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2012/01/afaict-mercurial-only-allows-you-to.html' title=''/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-3766820961622797756</id><published>2012-01-14T16:23:00.001-08:00</published><updated>2012-01-14T16:23:58.570-08:00</updated><title type='text'></title><content type='html'>&lt;br /&gt;I'd like to have a text parser, like Perl CPAN Text::ParseWords,&lt;br /&gt;that *only* breaks the text into words&lt;br /&gt;- but which does not transform the words, 
handle escape characters, etc.&lt;br /&gt;&lt;br /&gt;For example,&lt;br /&gt;&amp;nbsp; &amp;nbsp;Text::ParseWords::&lt;br /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; shellwords("a b 'c d' e")&lt;br /&gt;returns&lt;br /&gt;&amp;nbsp; &amp;nbsp;a&lt;br /&gt;&amp;nbsp; &amp;nbsp;b&lt;br /&gt;&amp;nbsp; &amp;nbsp;c d&lt;br /&gt;&amp;nbsp; &amp;nbsp;e&lt;br /&gt;i.e. it breaks the text up into words,&lt;br /&gt;but it also transforms the words.&lt;br /&gt;&lt;br /&gt;I would like to separate the breakup from the transformation:&lt;br /&gt;&amp;nbsp; &amp;nbsp;a&lt;br /&gt;&amp;nbsp; &amp;nbsp;b&lt;br /&gt;&amp;nbsp; &amp;nbsp;'c d'&lt;br /&gt;&amp;nbsp; &amp;nbsp;e&lt;br /&gt;&lt;br /&gt;Note that if you ever encounter such a list whose words can themselves be further broken up,&lt;br /&gt;then you know that it has been parsed by some tool after your original parser.&lt;br /&gt;&lt;br /&gt;[[Category:Programming]] [[Category:Text]]&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-3766820961622797756?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/3766820961622797756/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=3766820961622797756' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/3766820961622797756'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/3766820961622797756'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2012/01/id-like-to-have-text-parser-like-perl.html' title=''/><author><name>Andy "Krazy" 
Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-7595562158411204696</id><published>2012-01-08T18:28:00.000-08:00</published><updated>2012-01-08T18:29:10.178-08:00</updated><title type='text'>Thumb drive as a webserver / NAS</title><content type='html'>I wish that I could find a flash drive that acted as a NAS.&lt;br /&gt;&lt;br /&gt;Most USB flash drives are passive storage. Encryption is done by the OS you plug into. &amp;nbsp;Since I run Windows and various *IXes, and want to access my data from each, depending on a particular OS is a pain.&lt;br /&gt;&lt;br /&gt;Booting an OS from the USB flash drive is better - but still not great, since it makes assumptions about the platform you are plugging in to. Typically, that it is a PC.&lt;br /&gt;&lt;br /&gt;Flash drives have non-trivial processors in them. &amp;nbsp;Probably running Linux. &amp;nbsp;Why not make them a peer?&lt;br /&gt;&lt;br /&gt;Issue: network interface.&lt;br /&gt;&lt;br /&gt;I don't really want to add a typical Cat 5 ethernet connector - what is that, RJ45, 8P8C plug? - to the flash drive, neither in addition to nor replacing USB.&lt;br /&gt;&lt;br /&gt;Q: I am sure there is a standard for networking over USB - but how ubiquitous is it? 
&amp;nbsp;I do not recall ever getting the option, when I plug in a USB flash drive, of connecting a web browser to a server running on the drive.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-7595562158411204696?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/7595562158411204696/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=7595562158411204696' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/7595562158411204696'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/7595562158411204696'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2012/01/i-wish-that-i-could-find-flash-drive.html' title='Thumb drive as a webserver / NAS'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-3985307521170519319</id><published>2012-01-06T15:51:00.000-08:00</published><updated>2012-01-06T15:51:08.089-08:00</updated><title type='text'>Gratuitous key position changes.</title><content type='html'>Was wondering why "search" &amp;nbsp;or "find" in Google chrome was broken.&lt;br /&gt;&lt;br /&gt;Wasn't. 
&amp;nbsp;For the past week I was using a keyboard that was not my usual, with the keys in the lower left hand corner looking like Ctrl,Fn,...&lt;br /&gt;&lt;br /&gt;Just switched back to what has been my usual keyboard for several months, with those keys swapped: Fn,Ctrl,...&lt;br /&gt;&lt;br /&gt;It's funny how these trivial differences in devices are some of the most annoying.&lt;br /&gt;&lt;br /&gt;Last time I shopped for keyboards, I bought 3 identical. &amp;nbsp;Not my favorite keyboard, but cheap enough that I could afford 3 of them. &amp;nbsp;Home, work, and Oceanside.&lt;br /&gt;&lt;br /&gt;Yeah, yeah: remapping. Loses.&lt;br /&gt;&lt;br /&gt;(The only good form of keyboard remapping is one that is downloaded into a PROM on the keyboard, so that it works in all modes, even before the OS boots.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-3985307521170519319?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/3985307521170519319/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=3985307521170519319' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/3985307521170519319'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/3985307521170519319'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2012/01/gratuitous-key-position-changes.html' title='Gratuitous key position changes.'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' 
src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-7418140151828507054</id><published>2012-01-05T13:04:00.000-08:00</published><updated>2012-01-05T13:04:42.186-08:00</updated><title type='text'>hg schemes extension</title><content type='html'>&lt;a href="http://mercurial.selenic.com/wiki/SchemesExtension"&gt;http://mercurial.selenic.com/wiki/SchemesExtension&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I like the idea of this extension, that allows you to create shortcuts for URLs you use often, e.g. cloning.&lt;br /&gt;&lt;br /&gt;E.g.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre style="background-color: white; border-bottom-color: rgb(174, 189, 204); border-bottom-style: solid; border-bottom-width: 1pt; border-image: initial; border-left-color: rgb(174, 189, 204); border-left-style: solid; border-left-width: 1pt; border-right-color: rgb(174, 189, 204); border-right-style: solid; border-right-width: 1pt; border-top-color: rgb(174, 189, 204); border-top-style: solid; border-top-width: 1pt; font-family: courier, monospace; font-size: 16px; padding-bottom: 5pt; padding-left: 5pt; padding-right: 5pt; padding-top: 5pt; white-space: pre-wrap; word-wrap: break-word;"&gt;[extensions]&lt;br /&gt;&lt;span class="anchor" id="line-2"&gt;&lt;/span&gt;hgext.schemes=&lt;br /&gt;&lt;span class="anchor" id="line-3"&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="anchor" id="line-4"&gt;&lt;/span&gt;[schemes]&lt;br /&gt;ups = ssh://my-uarch-performance-simulator/...&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;However, it was something of a no-sale when I saw that the URL was not expanded in the clone:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre style="background-color: white; border-bottom-color: rgb(174, 189, 204); border-bottom-style: solid; border-bottom-width: 1pt; border-image: initial; border-left-color: rgb(174, 189, 204); border-left-style: 
solid; border-left-width: 1pt; border-right-color: rgb(174, 189, 204); border-right-style: solid; border-right-width: 1pt; border-top-color: rgb(174, 189, 204); border-top-style: solid; border-top-width: 1pt; padding-bottom: 5pt; padding-left: 5pt; padding-right: 5pt; padding-top: 5pt; word-wrap: break-word;"&gt;&lt;span style="font-family: courier, monospace; font-size: medium; white-space: pre-wrap;"&gt;&amp;gt; hg clone ups: u&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family: courier, monospace; font-size: medium; white-space: pre-wrap;"&gt;...&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family: courier, monospace; font-size: medium; white-space: pre-wrap;"&gt;&amp;gt; cd u&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family: courier, monospace; font-size: medium; white-space: pre-wrap;"&gt;&amp;gt; hg paths&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family: courier, monospace; font-size: medium; white-space: pre-wrap;"&gt;default = ups://&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family: courier, monospace; font-size: medium; white-space: pre-wrap;"&gt;&amp;gt; &lt;/span&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br class="Apple-interchange-newline" /&gt;&lt;br /&gt;Or in WorkDir/.hg/hgrc&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre style="background-color: white; border-bottom-color: rgb(174, 189, 204); border-bottom-style: solid; border-bottom-width: 1pt; border-image: initial; border-left-color: rgb(174, 189, 204); border-left-style: solid; border-left-width: 1pt; border-right-color: rgb(174, 189, 204); border-right-style: solid; border-right-width: 1pt; border-top-color: rgb(174, 189, 204); border-top-style: solid; border-top-width: 1pt; padding-bottom: 5pt; padding-left: 5pt; padding-right: 5pt; padding-top: 5pt; word-wrap: break-word;"&gt;&lt;span style="font-family: courier, monospace; font-size: medium;"&gt;&lt;span style="white-space: pre-wrap;"&gt;[paths]&lt;br /&gt;default = ups:&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;br /&gt;PRO: when the remote 
repository changes location, this *might* allow you to change the scheme URL shorthand in hgrc, and have things work in the possibly multiple clones you have outstanding.&lt;br /&gt;&lt;br /&gt;CON: it loses some documentation. If I got run over by a beer truck, my replacement might have trouble finding where it was from. &amp;nbsp;Worse, if a repo was restored from backup (particularly if people keep branches as repos), and my home directory with the ~/.hgrc enabling SchemesExtension was &amp;nbsp;not present.&lt;br /&gt;&lt;br /&gt;I'm not sure what I would like. Possibly always expand the shorthand. &amp;nbsp;Possibly record the shorthand with the long path as a comment:&lt;br /&gt;&lt;pre style="background-color: white; border-bottom-color: rgb(174, 189, 204); border-bottom-style: solid; border-bottom-width: 1pt; border-image: initial; border-left-color: rgb(174, 189, 204); border-left-style: solid; border-left-width: 1pt; border-right-color: rgb(174, 189, 204); border-right-style: solid; border-right-width: 1pt; border-top-color: rgb(174, 189, 204); border-top-style: solid; border-top-width: 1pt; padding-bottom: 5pt; padding-left: 5pt; padding-right: 5pt; padding-top: 5pt; word-wrap: break-word;"&gt;&lt;span style="font-family: courier, monospace; font-size: medium;"&gt;&lt;span style="white-space: pre-wrap;"&gt;[paths]&lt;br /&gt;default = ups: &lt;/span&gt;&lt;span style="white-space: pre-wrap;"&gt;# &lt;/span&gt;&lt;/span&gt;&lt;span style="font-family: courier, monospace; font-size: 16px; white-space: pre-wrap;"&gt;ssh://my-uarch-performance-simulator/...&lt;/span&gt;&lt;/pre&gt;&lt;pre style="background-color: white; border-bottom-color: rgb(174, 189, 204); border-bottom-style: solid; border-bottom-width: 1pt; border-image: initial; border-left-color: rgb(174, 189, 204); border-left-style: solid; border-left-width: 1pt; border-right-color: rgb(174, 189, 204); border-right-style: solid; border-right-width: 1pt; border-top-color: rgb(174, 189, 204); 
border-top-style: solid; border-top-width: 1pt; padding-bottom: 5pt; padding-left: 5pt; padding-right: 5pt; padding-top: 5pt; word-wrap: break-word;"&gt;&lt;span style="font-family: courier, monospace; font-size: medium;"&gt;&lt;span style="white-space: pre-wrap;"&gt;&amp;gt; hg paths&lt;br /&gt;default = ups:// # &lt;/span&gt;&lt;/span&gt;&lt;span style="font-family: courier, monospace; font-size: 16px; white-space: pre-wrap;"&gt;ssh://my-uarch-performance-simulator/...&lt;/span&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br class="Apple-interchange-newline" /&gt;&lt;br /&gt;&lt;br /&gt;TBD: move this to my wiki Category:VCS.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-7418140151828507054?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/7418140151828507054/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=7418140151828507054' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/7418140151828507054'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/7418140151828507054'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2012/01/hg-schemes-extension.html' title='hg schemes extension'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' 
src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-1091388235568944411</id><published>2012-01-03T21:23:00.000-08:00</published><updated>2012-01-03T21:23:17.836-08:00</updated><title type='text'>VirtualBoxen</title><content type='html'>Gave in, worn down... Installed VirtualBox, and Ubuntu in a virtual box, on my tablet PC running Windows 7.&lt;br /&gt;&lt;br /&gt;Stream of &amp;nbsp;consciousness...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-1091388235568944411?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/1091388235568944411/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=1091388235568944411' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/1091388235568944411'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/1091388235568944411'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2012/01/virtualboxen.html' title='VirtualBoxen'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' 
src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-3412535485457143253</id><published>2012-01-02T14:25:00.000-08:00</published><updated>2012-01-02T14:25:48.958-08:00</updated><title type='text'>Recovering diverged home  directory version control</title><content type='html'>I have long version controlled my home directory. &amp;nbsp;CVS. Git. Hg.&lt;br /&gt;&lt;br /&gt;Unfortunately, they diverged. &amp;nbsp;Divergence happens naturally with CVS. &amp;nbsp;You have to work hard to get git and hg to diverge, but I did.&lt;br /&gt;&lt;br /&gt;Now I want to merge the diverged home directories back together. &amp;nbsp;Preserving the history if possible, from the different VCS. &amp;nbsp;Sometimes just merging files.&lt;br /&gt;&lt;br /&gt;==&lt;br /&gt;&lt;br /&gt;Today: I want to start merging a diverged linux tree, from a flash drive, with my working tree (which started off on cygwin).&lt;br /&gt;&lt;br /&gt;I created a new hg repo, created a branch on it, and then imported the tree to be merged.&lt;br /&gt;&lt;br /&gt;I pulled/pushed this with my main home hg repo. &amp;nbsp;Had to use -f, to force unrelated repos to be together.&lt;br /&gt;&lt;br /&gt;Now I have a single repo, with my current working (cygwin derived) homedir and a non-working linux homedir on different branches. &amp;nbsp;The former is on the default branch.&lt;br /&gt;&lt;br /&gt;That's okay. &amp;nbsp;Not so bad. &amp;nbsp;A single history object (although I have kept separate working space trees).&lt;br /&gt;&lt;br /&gt;===&lt;br /&gt;&lt;br /&gt;Now I want to merge, a file or a few files at a time.&lt;br /&gt;&lt;br /&gt;E.g. copy the README from the linux branch to the default (working, cygwin derived) branch.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;AAARGH!!!! &amp;nbsp;Hg doesn't handle partials... neither merges, nor copies, nor... 
&amp;nbsp;Hg just plain really wants to lose track, not make doing this activity easy.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-3412535485457143253?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/3412535485457143253/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=3412535485457143253' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/3412535485457143253'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/3412535485457143253'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2012/01/recovering-diverged-home-directory.html' title='Recovering diverged home  directory version control'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-1479058972475458556</id><published>2012-01-02T13:26:00.000-08:00</published><updated>2012-01-03T14:50:56.599-08:00</updated><title type='text'>Missing X fonts</title><content type='html'>Like&amp;nbsp;&lt;a href="http://ubuntuforums.org/showthread.php?p=11544410"&gt;http://ubuntuforums.org/showthread.php?p=11544410&lt;/a&gt;, I was getting errors such as&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre class="alt2" dir="ltr" style="background-attachment: initial; background-clip: initial; 
background-color: white; background-image: initial; background-origin: initial; border-bottom-style: inset; border-bottom-width: 1px; border-color: initial; border-image: initial; border-left-style: inset; border-left-width: 1px; border-right-style: inset; border-right-width: 1px; border-top-style: inset; border-top-width: 1px; font-size: 12px; height: 50px; overflow-x: auto; overflow-y: auto; padding-bottom: 6px; padding-left: 6px; padding-right: 6px; padding-top: 6px; text-align: left; width: 640px;"&gt;Warning: Cannot convert string "-*-courier-medium-r-*-*-*-120-*-*-*-*-iso8859-*" to type FontStruct&lt;br /&gt;Warning: Cannot convert string "-*-helvetica-medium-r-*--*-120-*-*-*-*-iso8859-1" to type FontStruct&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;(Except that I was getting the errors, not on a fresh Ubuntu install, but when trying to use a fairly new machine at work, with what is probably a newer version of Red Hat.)&lt;br /&gt;&lt;br /&gt;The post I quote fixed things by setting ~/.Xdefaults to&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre class="alt2" dir="ltr" style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial; border-bottom-style: inset; border-bottom-width: 1px; border-color: initial; border-image: initial; border-left-style: inset; border-left-width: 1px; border-right-style: inset; border-right-width: 1px; border-top-style: inset; border-top-width: 1px; font-size: 12px; height: 34px; overflow-x: auto; overflow-y: auto; padding-bottom: 6px; padding-left: 6px; padding-right: 6px; padding-top: 6px; text-align: left; width: 640px;"&gt;emacs*font: 7x14&lt;/pre&gt;&lt;br /&gt;And then doing&amp;nbsp;&lt;span style="background-color: white; font-size: 12px; text-align: left;"&gt;xrdb -merge ~/.Xdefaults&lt;/span&gt;&lt;br /&gt;&lt;span style="background-color: white; font-size: 12px; text-align: left;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;div style="text-align: 
left;"&gt;&lt;span style="font-size: 12px;"&gt;On my end, I found that neither 7x14 nor courier worked. &amp;nbsp;But font "fixed" did.&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;span style="font-size: 12px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;span style="font-size: 12px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-1479058972475458556?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/1479058972475458556/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=1479058972475458556' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/1479058972475458556'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/1479058972475458556'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2012/01/like-httpubuntuforums.html' title='Missing X fonts'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-3047413739125327338</id><published>2012-01-02T11:33:00.001-08:00</published><updated>2012-01-02T11:33:57.548-08:00</updated><title type='text'>An Amusing and Frustrating Anecdote about Google 2-step Verification</title><content type='html'>&lt;br /&gt;I 
like Google 2-step verification - in which you normally log in with a password, but where, if you are logging in from a new machine, etc., you "verify" your login by entering a one time code sent to your cell phone by voice or text.&lt;br /&gt;&lt;br /&gt;I liked the idea as soon as I heard of it, but was reluctant to sign up for it because I frequently use my laptop computer in places where cell phones don't work - e.g. at the coast in Oregon, and, most recently, at my new house in Portland's hills.&lt;br /&gt;&lt;br /&gt;I was scared that I might end up unable to log into my gmail, lacking cell phone coverage.&lt;br /&gt;&lt;br /&gt;Nevertheless, after reading rave reviews, I finally gave in and signed up for 2-step verification. &amp;nbsp;And have continued to use my laptop fairly successfully at the coast without cell phone coverage, since normally it has already been verified.&lt;br /&gt;&lt;br /&gt;But over New Year's I finally tripped up:&lt;br /&gt;&lt;br /&gt;Because of a bug with googlevoiceplugin.exe (a new copy was spawned every time I started Chrome: I had 52 copies when I realized what was happening) I uninstalled and reinstalled the plugin, and eventually Chrome itself (since the plugin would not uninstall while Chrome was running).&lt;br /&gt;&lt;br /&gt;So when I tried to log back into gmail on Chrome, 2-step verification was required. &amp;nbsp;But my cell phone doesn't work at the coast.&lt;br /&gt;&lt;br /&gt;Now, Google 2-step verification has a backup phone number, which can be a land line using voice. &amp;nbsp;But remember that I said the new house that I have just bought also lacks cell phone coverage? &amp;nbsp;Guess what my backup phone was? &lt;br /&gt;&lt;br /&gt;And Google 2-step verification does have a backup set of one time passwords. &amp;nbsp;I know I printed them out for my wallet. &amp;nbsp;Umm... 
&amp;nbsp;got a new wallet, recently, smaller, and did not carry it over.&lt;br /&gt;&lt;br /&gt;So now the fun begins: &amp;nbsp;I don't have to drive too far to receive a text message. &amp;nbsp;I'll try to login, get Google to send the verification code, drive to where I can receive a text message, drive back.&lt;br /&gt;&lt;br /&gt;Try #1: got the message. &amp;nbsp;Actually, got several verification code messages. &amp;nbsp;Drive back, they don't work. &amp;nbsp;Perhaps I got confused, and typed the wrong verification code into the wrong box.&lt;br /&gt;&lt;br /&gt;Try #2: I realize that I may not need to drive the few miles to the next town. &amp;nbsp;The mountain next to my house may have reception. &amp;nbsp;Drive up it, yep, received the text message. &amp;nbsp;Drive down... Nope, didn't work. &lt;br /&gt;&lt;br /&gt;I'm beginning to think there is a timeout.&lt;br /&gt;&lt;br /&gt;Try #3: Repeat. An hour or so later, since I had to charge my cell phone - the battery drains quickly in this area. But this time, I can't get any bars on top of the mountain. &amp;nbsp;The fog has moved in and the sun has set, affecting signal strength.&lt;br /&gt;&amp;nbsp; &lt;br /&gt;So I drive to the next town. Signal, but no text message. &amp;nbsp;I wait ten minutes, start driving back... and the message arrives while I am driving. &amp;nbsp;Have I mentioned that AT&amp;amp;T Wireless has occasionally taken &amp;gt;4 hours to deliver text messages? &amp;nbsp;20 minutes is par for the course.&lt;br /&gt;&lt;br /&gt;Doesn't work. 
&amp;nbsp;I'm getting pretty sure there is a timeout.&lt;br /&gt;&lt;br /&gt;Try #4: This time I request the verification code, drive out, and call back to ask my wife to enter it.&lt;br /&gt;&amp;nbsp; &amp;nbsp; &lt;br /&gt;However, my laptop has gone into power saving mode, and although I disabled the power-on password, my wife doesn't realize what has happened, and tries to use the second computer sitting next to the laptop that needs the verification code.&lt;br /&gt;&lt;br /&gt;2 days later we try again: my wife drives to the next town with my cell phone. &amp;nbsp;She calls me back when she gets a signal. &amp;nbsp;I request the verification code. She waits a minute or so - fortunately, this morning AT&amp;amp;T is fast - and reads it back to me over the phone. I enter the code, and all is well.&lt;br /&gt;&lt;br /&gt;I change my backup phone number to the landline at the coast. Realizing that this will need to be changed again when I get back to Portland.&lt;br /&gt;&lt;br /&gt;And I write the backup passwords down by hand.&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-3047413739125327338?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/3047413739125327338/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=3047413739125327338' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/3047413739125327338'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/3047413739125327338'/><link rel='alternate' type='text/html' 
href='http://andyglew.blogspot.com/2012/01/i-like-google-2-step-verification-in.html' title='An Amusing and Frustrating Anecdote about Google 2-step Verification'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-1513771503848036058</id><published>2011-12-12T17:29:00.000-08:00</published><updated>2011-12-12T17:29:16.448-08:00</updated><title type='text'>Powering up multiple displays on Windows 7 - lots of flashing :-(</title><content type='html'>I have 4 external 1920x1200 displays connected to my Lenovo Thinkpad X220 Tablet PC, via Diamond DisplayLink based USB adapters.&lt;br /&gt;&lt;br /&gt;When I power the beast up, not just cold start, but also, most annoyingly, warm start, from Hibernate (Suspend to Disk) or Suspend (to RAM) power down modes, it engages in excessive flashing.&lt;br /&gt;&lt;br /&gt;The screen blanks and redisplays 15 times!!!&lt;br /&gt;&lt;br /&gt;Quite annoying.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Continuing&amp;nbsp;&lt;a href="http://andyglew.blogspot.com/2011/12/multiple-displays-on-windows-7-good-but.html"&gt;http://andyglew.blogspot.com/2011/12/multiple-displays-on-windows-7-good-but.html&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-1513771503848036058?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/1513771503848036058/comments/default' title='Post Comments'/><link rel='replies' type='text/html' 
href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=1513771503848036058' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/1513771503848036058'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/1513771503848036058'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/12/powering-up-multiple-displays-on.html' title='Powering up multiple displays on Windows 7 - lots of flashing :-('/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-2136313802983746584</id><published>2011-12-11T00:37:00.000-08:00</published><updated>2011-12-11T00:41:12.033-08:00</updated><title type='text'>Lifehack: Binder Clips help Debug Chistmas Lights</title><content type='html'>Fixing Christmas lights today.&lt;br /&gt;&lt;br /&gt;Figured out the binary search approach - e.g. as described by&amp;nbsp;&lt;a href="http://www.squidoo.com/howto-fix-broken-christmas-lights#module12970984"&gt;http://www.squidoo.com/howto-fix-broken-christmas-lights#module12970984&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;My own embellishment: I find it hard to track where I am in checking the Christmas lights, and/or strand continuity. &amp;nbsp;Even leaving empty sockets behind doesn't help that much, since the empty sockets are hard to see. &amp;nbsp;And since I have a small work space, I had to fold the string of lights back on itself 3 times.&lt;br /&gt;&lt;br /&gt;So I marked my position with binder clips. 
&amp;nbsp;I happened to have at least 4 colours of binder clip. I used four clips to define the boundaries of the interval where I was searching for a duff lightbulb - two clips at each boundary. &amp;nbsp;Two clips at each boundary because, given the folding of the string of lights, it became hard to tell which direction was towards the end, and which inside the interval. &amp;nbsp;So I used two colours of binder clip at each end.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-2136313802983746584?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://www.squidoo.com/howto-fix-broken-christmas-lights#module12970984' title='Lifehack: Binder Clips help Debug Chistmas Lights'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/2136313802983746584/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=2136313802983746584' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/2136313802983746584'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/2136313802983746584'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/12/lifehack-binder-clips-help-debug.html' title='Lifehack: Binder Clips help Debug Chistmas Lights'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' 
src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-3136879790051267852</id><published>2011-12-08T07:21:00.000-08:00</published><updated>2011-12-08T07:22:02.301-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='security'/><title type='text'>A Modest 1 bit Proposal about Quotification - making the Default Easy</title><content type='html'>Listening to an old "Security Now" podcast while doing my morning stretches.&lt;br /&gt;&lt;br /&gt;Leo Laporte's TWIT website was hacked, and Steve Gibson, the Security Guy, says "Any time you are soliciting user input, there is a risk of malicious input somehow tricking the backend and executing that input, when it is meant to be, you know, benign [input data, like] user name and password."&lt;br /&gt;&lt;br /&gt;This is typical of the classic SQL injection hack, and, indeed, of any hack where the attacker is able to inject scripting code and fool the backend into executing it. &amp;nbsp;Typically by embedding quotes or the like in the input string.&lt;br /&gt;&lt;br /&gt;(For that matter, Steve's description also applies to binary injection via buffer overflow. &amp;nbsp;But we won't go there; this page will talk only about non-buffer-overflow attacks, since we have elsewhere described our agenda for preventing buffer overflow attacks.)&lt;br /&gt;&lt;br /&gt;Say that you are taking user input like NAME, and are somehow using it to create an SQL or other language command, like "SELECT FIELDLIST FROM TABLE WHERE NAME = '$NAME' &amp;nbsp;". &amp;nbsp; But now the attacker, instead of providing a nicely formed string like "John Doe", provides instead something like "anything' OR 'x' = 'x &amp;nbsp;". (I added spaces between the single and double quotes for readability.) I.e. 
the user provides a string that contains quotes in the target language - not the language where the query string is composed, but a language further along. &amp;nbsp;So the query string becomes&amp;nbsp;"SELECT FIELDLIST FROM TABLE WHERE NAME = 'anything' OR 'x' = 'x' &amp;nbsp;". And now the query matches any row in the table. &amp;nbsp;(&lt;a href="http://www.unixwiz.net/techtips/sql-injection.html"&gt;http://www.unixwiz.net/techtips/sql-injection.html&lt;/a&gt;&amp;nbsp;provides examples, as does Wikipedia.)&lt;br /&gt;&lt;br /&gt;The general solution to this is "quotification": take the user input, and either delete or quote anything that looks like a quote in the target language. E.g. transform the attacker's string&amp;nbsp;"anything' OR 'x' = 'x &amp;nbsp;" into either&amp;nbsp;"anything OR x = x &amp;nbsp;" or&amp;nbsp;"anything\' OR \'x\' = \'x &amp;nbsp;".&lt;br /&gt;&lt;br /&gt;The problem with deleting stuff from the user string is that sometimes the user is supposed to have quotelike things. &amp;nbsp;Consider names like "O'Toole". &amp;nbsp;Or consider providing, e.g. via cut and paste, Chinese Unicode names in an application whose original programmer was English, but where the system is otherwise capable of displaying Chinese. &amp;nbsp;It is a pity if the barrier to internationalization is the "security" code scattered throughout your application that sanitizes user input. Worse, that is the sort of code that might get fixed by somebody who is fixing internationalization problems but doesn't understand the security issues.&lt;br /&gt;&lt;br /&gt;The problem with quotifying stuff is that it is hard. &amp;nbsp;It is not just a case, for you Perl aficionados, of doing s/'/\\'/g - what about input strings that already have \\' inside them? 
&amp;nbsp;And so on.&lt;br /&gt;&lt;br /&gt;But the real problem, applicable to both deleting and quotification strategies, is that the code doing the user input sanitization does not necessarily know the syntax of all of the languages downstream. &amp;nbsp;It may know that there is SQL along the way - but it may not know that somebody has just added a special filter that looks for French quotes, &amp;lt;&amp;lt; and &amp;gt;&amp;gt;. &amp;nbsp;Etc. &amp;nbsp;Not just special symbols: I have defined sublanguages where QuOtE and EnDqUoTe were the quotes.&lt;br /&gt;&lt;br /&gt;The security code may know the syntax at the time the sanitization code was written. &amp;nbsp;But the downstream processing may have changed. &amp;nbsp;The syntax of the language may have been extended, in a new revision of the SQL or Perl or ... . &amp;nbsp;(I found a bug like that last year.)&lt;br /&gt;&lt;br /&gt;The problem is that the user input sanitization code is trying to transform user input from strings that may be unsafe, to strings that are guaranteed to be safe forever and ever, no matter what revisions are made to the language, etc. &amp;nbsp; The problem is that the default for character strings is that ANY CHARACTER MAY BE PART OF A COMMAND unless specially quoted.&lt;br /&gt;&lt;br /&gt;We need to change this default. &amp;nbsp;Here is my modest proposal:&lt;br /&gt;&lt;br /&gt;Let us define a character set whereby there is a special bit free in all characters. &amp;nbsp;And whereby, if that special bit is set, it is guaranteed by ANY REASONABLE LANGUAGE that no character with that special bit set will be part of any command or language syntax like a quote symbol.&lt;br /&gt;&lt;br /&gt;We should strongly suggest that the visual display for the characters with and without the special bit set is the same. 
&amp;nbsp;Or at least, the same in most situations - in others you may want to distinguish them, e.g., by shading.&lt;br /&gt;&lt;br /&gt;If you are using something like BNF to describe your language, then it might be:&lt;br /&gt;&lt;br /&gt;ORDINARY_CHARACTER ::= 'A' | 'B' | &amp;nbsp;...&lt;br /&gt;&lt;br /&gt;TAINTED_CHARACTER ::= 1'A' | 1'B' | &amp;nbsp;...&lt;br /&gt;POSSIBLY_TAINTED_CHARACTER ::= ORDINARY_CHARACTER | TAINTED_CHARACTER&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;where I am using the syntax 1'A' to describe a single character literal with the special bit set.&lt;br /&gt;&lt;br /&gt;STRING_LITERAL ::= QUOTED_STRING | TAINTED_STRING&lt;br /&gt;TAINTED_STRING ::= TAINTED_CHARACTER+&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;QUOTED_STRING ::= " ORDINARY_CHARACTER* "&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;(Actually, I am not sure whether a quoted string should be the above, or&lt;br /&gt;&amp;nbsp; &amp;nbsp; QUOTED_STRING ::= " POSSIBLY_TAINTED_CHARACTER* "&lt;br /&gt;)&lt;br /&gt;&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;And we require that characters with the tainted bit set are permitted ONLY in strings. &amp;nbsp;Nowhere else in the language. &amp;nbsp;Not in keywords, symbols, operators....&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Then we just have to ensure that all of our input routines set the special bit. If you really need to form operators from input, the programmer can untaint the data explicitly. &amp;nbsp;Better to have to untaint explicitly in a few places, than to have to quotify correctly in all places.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Perhaps better to make tainting the default. &amp;nbsp;To flip the polarity of the special bit. 
&amp;nbsp;And to require that language syntax, keywords, etc., be recognized only if the special bit is set.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This is just the well known taint or poison propagation strategy. &amp;nbsp;Exposed to programming language syntax definitions.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I have elsewhere espoused taking advantage of extensible XML syntax for programming languages. &amp;nbsp;This is similar, although orthogonal.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;wiki'ed at&amp;nbsp;&lt;a href="http://wiki.andy.glew.ca/wiki/A_Modest_1_bit_Proposal_about_Quotification_-_making_the_Default_Easy"&gt;http://wiki.andy.glew.ca/wiki/A_Modest_1_bit_Proposal_about_Quotification_-_making_the_Default_Easy&lt;/a&gt;&lt;/li&gt;&lt;li&gt;as well as posted on my blog&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-3136879790051267852?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://wiki.andy.glew.ca/wiki/A_Modest_1_bit_Proposal_about_Quotification_-_making_the_Default_Easy' title='A Modest 1 bit Proposal about Quotification - making the Default Easy'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/3136879790051267852/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=3136879790051267852' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/3136879790051267852'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/2425290326823263574/posts/default/3136879790051267852'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/12/modest-1-bit-proposal-about.html' title='A Modest 1 bit Proposal about Quotification - making the Default Easy'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-4422400420239563929</id><published>2011-12-07T18:00:00.000-08:00</published><updated>2011-12-07T18:00:53.897-08:00</updated><title type='text'>Multiple Displays on Windows 7... Good, but...</title><content type='html'>I love the DisplayLink USB display adapters that let my piddling little tablet PC drive 4 1920x1200 external monitors.&lt;br /&gt;&lt;br /&gt;However, many, umm, irregularities happen with Windows 7 support and/or the display or device drivers.&lt;br /&gt;&lt;br /&gt;Earlier today, and for weeks if not months, I have been able to have 5 displays - my laptop/tablet PC's built in LCD, and my 4 external displays.&lt;br /&gt;&lt;br /&gt;However, I went into hibernation while travelling from home to work (where I did not use the PC) and back to home again.&lt;br /&gt;&lt;br /&gt;And now that it has woken up, I cannot get my laptop built in display to work when the external displays are plugged in. 
Sure, it works when they are not plugged in; but when they are plugged in, the laptop built in display gets "blackened" in the Orientation box.&lt;br /&gt;&lt;br /&gt;Just yet another Windows 7 strangeness.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-4422400420239563929?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/4422400420239563929/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=4422400420239563929' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/4422400420239563929'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/4422400420239563929'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/12/multiple-displays-on-windows-7-good-but.html' title='Multiple Displays on Windows 7... Good, but...'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-4395539211054044559</id><published>2011-12-03T22:33:00.000-08:00</published><updated>2011-12-03T22:33:30.425-08:00</updated><title type='text'>Organization and time management, technology for</title><content type='html'>I felt inspired to start writing this. 
&amp;nbsp;More like collecting.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-4395539211054044559?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://wiki.andy.glew.ca/wiki/Organization,_technology,_evolution_of' title='Organization and time management, technology for'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/4395539211054044559/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=4395539211054044559' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/4395539211054044559'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/4395539211054044559'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/12/organization-and-time-management.html' title='Organization and time management, technology for'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-1447290182614650202</id><published>2011-12-03T16:57:00.000-08:00</published><updated>2011-12-03T16:57:03.092-08:00</updated><title type='text'>Human handwritten proofreading marks</title><content type='html'>&lt;a href="http://creativeservices.iu.edu/resources/guide/marks.shtml"&gt;http://creativeservices.iu.edu/resources/guide/marks.shtml&lt;/a&gt;&amp;nbsp;has a nice collection of human proofreading 
marks.&lt;br /&gt;&lt;br /&gt;Some are irrelevant to pen or touch tablet computers - why mark that a period should be inserted, when you can just do it?&lt;br /&gt;&lt;br /&gt;But some might be usefully considered for use as commands in a pen driven text editing system.&lt;br /&gt;&lt;br /&gt;Possibly also for depiction of edits and revisions.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-1447290182614650202?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://creativeservices.iu.edu/resources/guide/marks.shtml' title='Human handwritten proofreading marks'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/1447290182614650202/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=1447290182614650202' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/1447290182614650202'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/1447290182614650202'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/12/human-handwritten-proofreading-marks.html' title='Human handwritten proofreading marks'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' 
src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-5395171424093733993</id><published>2011-12-03T14:38:00.000-08:00</published><updated>2011-12-03T14:38:22.879-08:00</updated><title type='text'>Organization Tools, Evolution of</title><content type='html'>I am fascinated by, or at least interested in, tools for organization and scheduling. Time management. Filing.&lt;br /&gt;&lt;br /&gt;American schools say "Time management skills are the most important thing you have to learn in high school to be prepared for success in college." &amp;nbsp;But, as far as I know, nobody actually teaches techniques for organization and scheduling.&lt;br /&gt;&lt;br /&gt;Perhaps one of the reasons I am fascinated by this is that it does not come naturally to me. &amp;nbsp;But, as a result, I have studied it, and my career as a computer programmer and computer architect is really all about organization and scheduling. &amp;nbsp;Albeit for computers, not people.&lt;br /&gt;&lt;br /&gt;My whole career has been about scheduling for computers. &amp;nbsp;First software operating system schedulers, in particular real time schedulers. Then hardware OOO CPU schedulers.&lt;br /&gt;&lt;br /&gt;But I remain interested in scheduling and organization in many different aspects: computers, factories (JIT and kanban), people (organizers, PDAs, DayTimers, etc.).&lt;br /&gt;&lt;br /&gt;It is quite interesting to see how organizing tools for people have evolved. &amp;nbsp;I probably need to convert this into a wiki post so that I can evolve it. 
&amp;nbsp;&lt;a href="http://wiki.andy.glew.ca/wiki/Organization,_technology,_evolution_of"&gt;http://wiki.andy.glew.ca/wiki/Organization,_technology,_evolution_of&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;First paper, and now on computer.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;In paper:&lt;br /&gt;&lt;br /&gt;Duotangs&lt;br /&gt;&lt;br /&gt;Three ring folders.&lt;br /&gt;&lt;br /&gt;Trapper Keeper --- I just learned from a friend how this system for holding papers together became popular for elementary and high school kids in the US in the 1980s. &amp;nbsp;Essentially plastic sheets that folded to provide an envelope out of which it was hard for things to fall.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Three ring folders have made a comeback in my daughter's school, but with zippers around them.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Staples&lt;br /&gt;&lt;br /&gt;Paper clips.&lt;br /&gt;&lt;br /&gt;A famous engineering author explains how, prior to paper clips and staples, people used straight pins.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-5395171424093733993?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://wiki.andy.glew.ca/wiki/Organization,_technology,_evolution_of' title='Organization Tools, Evolution of'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/5395171424093733993/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=5395171424093733993' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/5395171424093733993'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/5395171424093733993'/><link rel='alternate' 
type='text/html' href='http://andyglew.blogspot.com/2011/12/organization-tools-evolution-of.html' title='Organization Tools, Evolution of'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-4089645274400046282</id><published>2011-12-03T12:45:00.000-08:00</published><updated>2011-12-03T12:45:03.325-08:00</updated><title type='text'>School websites</title><content type='html'>School websites sound like a good thing: use your browser to keep track of your kid's school, homework assignments, etc.&lt;br /&gt;&lt;br /&gt;In reality, it turns into a mess unless the website is organized. &amp;nbsp;And teachers are not necessarily good website architects[*].&lt;br /&gt;&lt;br /&gt;The school may have a website. &amp;nbsp;Good. &amp;nbsp;In a content management system like Drupal. &amp;nbsp;Okay. &amp;nbsp;With pages for every class, a news system, etc. Not so bad.&lt;br /&gt;&lt;br /&gt;But then some teachers may use a separate Moodle. &lt;br /&gt;&lt;br /&gt;And some other teachers may use a separate Google Apps or Google Sites site.&lt;br /&gt;&lt;br /&gt;So, for class #1 you have to go to the Drupal site. For class #2 you have to go to the Moodle. &amp;nbsp;For class #3, the homework is posted on the Google Sites site, but there's some background info on the Moodle. &amp;nbsp;And, oh yes, still other stuff on the Drupal.&lt;br /&gt;&lt;br /&gt;AAARGH!!!! &amp;nbsp;I can't remember which webtool - Drupal, Moodle, Google Sites - to use for any particular class. Yes, I may be a Luddite parent - but I am a reasonably web.savvy Luddite parent. 
&lt;br /&gt;&lt;br /&gt;And it's not just me, the Luddite parent, who finds this confusing. &amp;nbsp;The students do too. &amp;nbsp;Not just my kid: &amp;nbsp;I have had a teacher complain to me that this teacher keeps telling the class to use the Google Sites site for the class, but the students never seem to understand. &amp;nbsp;Could it be because this teacher is their only teacher using Google Sites instead of Drupal or Moodle, and the kids naturally look to one place? &amp;nbsp; &amp;nbsp;Perhaps if this teacher created a class page in a standard place on the Drupal or Moodle page, and linked to the Google Sites site for his class? It might also help the Luddite parents.&lt;br /&gt;&lt;br /&gt;Note that above at the [*] I decided to say "website architect" or "website organizer" rather than "website designer". &amp;nbsp;Website designers, at least the not so good ones, often concentrate on visual flashy effects. &amp;nbsp;School websites have plenty of that. &amp;nbsp;What they seem to lack is organization, or architecture. &lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Note: it's not just a question of installing a new content management system. &amp;nbsp;They already have more than enough. &amp;nbsp;Note that I am not saying "too many" - I am saying "more than enough". &amp;nbsp;I am not against using different CMSes. &amp;nbsp; But, if you do, you need to link them together, so you can get from one to the other.&lt;br /&gt;&lt;br /&gt;---&lt;br /&gt;&lt;br /&gt;Here is a dream, a suggestion, a modest proposal for school websites:&lt;br /&gt;&lt;br /&gt;(0) Have a front end, in whatever CMS you want. &amp;nbsp;Drupal, say.&lt;br /&gt;&lt;br /&gt;(1) Allow the teachers to create sites in whatever other CMSes they insist on using. Moodle. Google Sites. &amp;nbsp;Whatever.&lt;br /&gt;&lt;br /&gt;(2) But insist that there be a classroom page created on the front end site (Drupal, in this example). 
&amp;nbsp;If nothing else, it should point to where on the other CMSes the real classroom contents are.&lt;br /&gt;&lt;br /&gt;Note that I don't mean just with text saying "some teachers use the Moodle or other tools". &amp;nbsp;Which basically means that you have to figure out how to get there on your own. &amp;nbsp;No, I mean a real, clickable, link to that other site. &amp;nbsp;If you can't create a link into that other site, well... &amp;nbsp;(IMHO that's the only reason to forbid using a different tool.)&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;---&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;More on the same lines: Use dates, so that stale data does not keep rising to the top.&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I keep clicking on links and URLs that say stuff like "Fourth Grade Science". &amp;nbsp;But it turns out that it was 4th grade science for 2008. &amp;nbsp;Or last year's class. &amp;nbsp;Or ...&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;A page identifier like "4th grade science" is really a floating identifier. &amp;nbsp;At any particular time it should point to something like "4th grade science 2011-2012". &amp;nbsp;Every year, it should be updated.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I'm quite happy to have the old years' stuff remain on the website(s). &amp;nbsp;It's nice to look at. &amp;nbsp;But, I don't like it getting in my way when I am looking for this year's stuff.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;---&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Perhaps "News" should expire after a while?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I.e. I am not at all sure that class presentations for history class from two years ago are still "News". &amp;nbsp;More like "Olds". 
&amp;nbsp;Again, useful for reference, but a time waster when I, as a parent, am trying to find what is really new at my child's school website.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;---&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In my dreams: &amp;nbsp;wouldn't it be nice to have a page you or your child could log into, that contained links to the particular class webpages for all the classes your child is taking?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Actually, my effort to do that myself - to create a wiki/Evernote/whatever page, pointing to all of the key information on my child's school website - is what inspired this post. &amp;nbsp;It's darned frustrating finding it.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;---&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Hypertext systems like the web have great potential. &amp;nbsp;But they are a pain to use if disorganized.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;It's hard to keep them organized.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Search engines like Google are the saving grace for the real web. &amp;nbsp;But Google is not allowed to index a school website with possibly sensitive information. &amp;nbsp; Most CMSes have their own search, but when you have multiple CMSes, this doesn't help much.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;It would be nice if each school ran its own private Google indexer on its various websites and CMSes. &amp;nbsp;(Hmm, Google, maybe that would be a nice public service? &amp;nbsp;Or are you hell bent on getting as many schools as possible to use Google Sites and Apps, giving non-Google Drupal and Moodle the cold shoulder?)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;But even if there were decent search for school websites, some modicum of organization is desirable. 
&amp;nbsp;I think what I suggest is pretty basic: &amp;nbsp;(1) All class webpages in a fixed place, linking to whatever other websites or services the teacher chooses to use. &amp;nbsp;(2) With dated content, and floating labels - ("4th grade science 2008-9" versus "This year's 4th grade science".)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;===&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Nothing that I say here is specific to schools. &amp;nbsp;This is just good website design - or should I say "architecture" - for a heterogeneous system.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Nothing that I say here is critical of school IT. &amp;nbsp;School IT just sets up the system - the teachers provide the content. &amp;nbsp; &amp;nbsp;The only slight criticism is that IT should enforce the very minimal standards that I suggest. &amp;nbsp;Perhaps school IT should set up the default classroom pages on the primary website. &amp;nbsp;Teachers should be responsible for linking to whatever other non-standard website technology they choose to use.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;===&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I just had a thought about why this schizophrenia arises at school websites.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Teachers are often not the most sophisticated web.users, whether in their own college days, or afterwards.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In college, teachers probably used whatever their university IT department provided. And university IT departments are often pretty good, in particular good at providing services to unsophisticated users (faculty and students).&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;But different universities have different CMSes. 
Some use Drupal, some Moodle, with Google Apps and Google Sites increasingly common.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;So a newly graduated teacher is used to whatever they used at school. &amp;nbsp;And is also used to having reasonably good IT support.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;But when they start teaching at a K-12 school, you get teachers who graduated from different university programs, each used to different CMSes. &amp;nbsp;And they all want to use what they are familiar with. &amp;nbsp;Moreover, the K-12 school IT, although perfectly competent, is NOT as full-fledged as a university IT department.&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I am sure that large school districts enforce website standards: e.g. they probably tell their teachers that Moodle is the only CMS supported, use it or else. &amp;nbsp;Makes it easier to organize. &amp;nbsp;However, that is not true for the schools that I have interacted with. &amp;nbsp;And I remain an advocate of heterogeneous systems. 
&amp;nbsp;It's just that they require a certain minimum amount of work.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;===&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I should probably volunteer to help my kid's school organize its website.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-4089645274400046282?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/4089645274400046282/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=4089645274400046282' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/4089645274400046282'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/4089645274400046282'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/12/school-websites.html' title='School websites'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-5450118748589070137</id><published>2011-11-16T08:17:00.000-08:00</published><updated>2011-11-16T08:17:22.268-08:00</updated><title type='text'>Reagan Cattledog Links - reverifiable COW BOW links</title><content type='html'>Thinking about updating shared libraries:&lt;br /&gt;&lt;br /&gt;Shared 
libraries' true advantage is their true disadvantage: multiple users can be updated, whether for good or ill.&lt;br /&gt;&lt;br /&gt;Perhaps what we need are shared library linkages that are not automatically updated. Where an update is marked pending, encouraging the user of the library to update it asap, but the change is not made.&lt;br /&gt;&lt;br /&gt;I am calling this a reverifiable COW link. A link that is broken when somebody else writes to the linked object (hence COW, Copy on Write, or BOW, Break on Write). &amp;nbsp;But which is reverifiable. Retestable. &amp;nbsp;(As one of my friends says, "If you really believe in unit tests..." &amp;nbsp;I do, he doesn't.)&lt;br /&gt;&lt;br /&gt;I would like very much to be able to have the acronym COWBOY instead of COW BOW. &amp;nbsp;But I am humour deprived.&lt;br /&gt;&lt;br /&gt;In the meantime I can call them Reagan Cattledog links. &amp;nbsp;Get it? BOW, as in bowwow, dog. &amp;nbsp;Reagan, as in "trust, but verify."&lt;br /&gt;&lt;br /&gt;---&lt;br /&gt;&lt;br /&gt;This is not just for shared libraries. &amp;nbsp;Any sharing. &amp;nbsp;Web pages. Like the "cached link" I have described elsewhere. 
&amp;nbsp;Cached links are really just COW BOW links which are assumed to be updated when the linkee comes back online.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-5450118748589070137?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/5450118748589070137/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=5450118748589070137' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/5450118748589070137'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/5450118748589070137'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/11/reagan-cattledog-links-reverifiable-cow.html' title='Reagan Cattledog Links - reverifiable COW BOW links'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-3050857962262144265</id><published>2011-11-16T08:08:00.000-08:00</published><updated>2011-11-16T08:08:59.808-08:00</updated><title type='text'>Shared libraries and data deduplication</title><content type='html'>People have talked about the advantages of shared libraries: reducing virtual memory requirements, reducing disk space requirements, etc, because of sharing.&lt;br /&gt;&lt;br /&gt;Here's a thought: Q: if we had truly ubiquitous data deduplication, 
what would be the advantages of shared libraries?&lt;br /&gt;&lt;br /&gt;A: none of the performance wins through sharing need apply. &amp;nbsp;Deduplication beats them in a more flexible, more abstract, way.&lt;br /&gt;&lt;br /&gt;(Of course, we do not have truly ubiquitous deduplication. And it usually requires things to be block or file aligned.)&lt;br /&gt;&lt;br /&gt;This leaves the only fundamental advantage of shared libraries:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;the fact that you can effect a ubiquitous change by updating a shared library.&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;Which is also their fundamental disadvantage. &amp;nbsp;You can propagate a bug fix. &amp;nbsp;But you can also propagate bugs.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-3050857962262144265?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/3050857962262144265/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=3050857962262144265' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/3050857962262144265'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/3050857962262144265'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/11/shared-libraries-and-data-deduplication.html' title='Shared libraries and data deduplication'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' 
src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-4284809299982667096</id><published>2011-11-16T07:55:00.000-08:00</published><updated>2011-11-16T07:55:33.178-08:00</updated><title type='text'>Modules(1)</title><content type='html'>So I read Furlani et al's papers on "Modules".&lt;br /&gt;&lt;br /&gt;Modules is not so bad - it is a nice way of dealing with an annoying problem.&lt;br /&gt;&lt;br /&gt;Or, rather, Modules may be the best possible way of dealing with environment dependent code. &amp;nbsp;But it might be better to discourage environment dependent code in the first place. &amp;nbsp;See my earlier post about environment dependent code being a dominant meme.&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Minor observation: I would like to bring some non-Modules source scripts "kicking and screaming into the '90s with Modules". &amp;nbsp;I would like to simply wrapperize some existing legacy code that requires you to "set some environment variables and then source foo". 
&amp;nbsp;I.e., I don't want to rewrite foo - I would just like to wrap it in a module.&lt;br /&gt;&lt;br /&gt;Modules does not seem to be able to do this.&lt;br /&gt;&lt;br /&gt;Although it looks as if it would only be a minor extension to modules to handle it.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-4284809299982667096?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/4284809299982667096/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=4284809299982667096' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/4284809299982667096'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/4284809299982667096'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/11/modules1.html' title='Modules(1)'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-3035114641361711259</id><published>2011-11-16T07:49:00.001-08:00</published><updated>2011-11-16T07:49:42.424-08:00</updated><title type='text'>To do foo, start off in a new window</title><content type='html'>How many times have you seen "How to" directions begin:&lt;br /&gt;&lt;br /&gt;&lt;blockquote class="tr_bq"&gt;To do foo&lt;/blockquote&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Start in a 
fresh xterm&lt;/li&gt;&lt;li&gt;Start in a fresh shell (typically csh)&lt;/li&gt;&lt;li&gt;Log out and log in again so that you get a clean environment&lt;/li&gt;&lt;/ol&gt;etc.&lt;br /&gt;&lt;br /&gt;While this may be good advice, certainly good advice for debugging brokenness and/or avoiding bugs in the first place - it is basically an admission that something is not right.&lt;br /&gt;&lt;br /&gt;Some tool depends on the environment in weird ways. Possibly not just the standard UNIX environment string; possibly also the extended shell environment.&lt;br /&gt;&lt;br /&gt;Tools should be callable from almost arbitrary environments. &amp;nbsp;They should not DEPEND on environment variables. It may be acceptable to USE environment variables to change some behaviors, but, consider: if you had a setuid script, it would probably be unwise to depend on environment variables. &amp;nbsp;Environment variables should be considered tainted.&lt;br /&gt;&lt;br /&gt;I suppose my version of the above is to say&lt;br /&gt;&lt;br /&gt;&lt;blockquote class="tr_bq"&gt;To do foo&lt;/blockquote&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Empty all of your environment variables, and start off with a minimum environment&lt;/li&gt;&lt;li&gt;Type the absolute path to the tool, /a/b/.../c/d/tool&lt;/li&gt;&lt;/ol&gt;&lt;div&gt;IMHO tools should work when invoked like this. &amp;nbsp;If they are using the equivalent of Perl's FindBin, they should be able to locate all of the relevant library files, etc., they need. Or else they should have, embedded in them, the paths to same.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;GLEW OPINION: much of the reason for environment abuse is the broken, non-object oriented, UNIX installation model, where a tool may be put in /bin, its libraries in /usr/lib, etc - where the directories are not known at build time. &amp;nbsp;PATH, LIBPATH, MANPATH. 
&amp;nbsp;FindBin can live with this - a FindBin script can be relocated by copying - so long as the relative locations of what it depends on are maintained.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-3035114641361711259?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/3035114641361711259/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=3035114641361711259' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/3035114641361711259'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/3035114641361711259'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/11/to-do-foo-start-off-in-new-window.html' title='To do foo, start off in a new window'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-1144883739342003879</id><published>2011-11-16T07:36:00.000-08:00</published><updated>2011-11-16T07:37:13.116-08:00</updated><title type='text'>source scripts</title><content type='html'>I am going to try calling the sort of shell command file that must be source'd, i.e. 
read into the user's shell so that it can modify the environment, a "source-script".&lt;br /&gt;&lt;br /&gt;As opposed to a "shell-script" or other command which is usually executed as a subprocess.&lt;br /&gt;&lt;br /&gt;UNIX-style subprocess execution of commands provides isolation. &amp;nbsp;The parent is unaffected by the child, except for interactions through the filesystem. &amp;nbsp;(Although with the /proc filesystem, or the child applying a debugger to the parent, that could be significant.)&lt;br /&gt;&lt;br /&gt;Whereas, consider a csh style source-script. &amp;nbsp;It can be changing directories all over the place to get its work done. &amp;nbsp;And it may terminate with an error before finishing properly. &amp;nbsp;So the "caller" of the source-script may not know what directory he is in after the source-script terminates.&lt;br /&gt;&lt;br /&gt;Q: &amp;nbsp;how many people do:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;set saved_dir=`pwd`&lt;br /&gt;source srcscript.csh&lt;br /&gt;cd $saved_dir&lt;/blockquote&gt;&lt;br /&gt;And, of course, even that has obvious bugs.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-1144883739342003879?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/1144883739342003879/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=1144883739342003879' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/1144883739342003879'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/1144883739342003879'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/11/source-scripts.html' title='source 
scripts'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-3523005075403323264</id><published>2011-11-16T07:29:00.000-08:00</published><updated>2011-11-16T07:29:41.990-08:00</updated><title type='text'>environment setting a dominant meme?</title><content type='html'>Thinking about why I go through this paroxysm of disgust whenever I encounter a toolchain that depends on environment variables. &amp;nbsp;Like Modules or modulefiles(1). Like many CAD tools.&lt;br /&gt;&lt;br /&gt;This morning it struck me: they are a dominant meme. &amp;nbsp;An evolutionarily stable strategy.&lt;br /&gt;&lt;br /&gt;Not because environment based tools are better.&lt;br /&gt;&lt;br /&gt;But because cleanly written stuff, like I try to write, can be called pretty safely from anywhere. &amp;nbsp;Whereas stuff that does what I consider unclean environment modifications cannot be called so easily from other code. &amp;nbsp;It can call other code, but it is hard to be called from other code. &amp;nbsp;So there is a tendency for users to just give in, and write in csh (since csh is so often the language associated with such environment dependent tools).&lt;br /&gt;&lt;br /&gt;Sure, you can try to write code that prints the environment and which then gets called. However, this only catches the UNIX environment - modulefiles(1) rely on side effects in the extended shell environment, shell functions and aliases. 
&amp;nbsp;You could print these, but would have to parse them to pass to a different language, or at least reread them if passing to a later compatible shell.&lt;br /&gt;&lt;br /&gt;Bandaid.&lt;br /&gt;&lt;br /&gt;The best way to work with such tools is to start a persistent subprocess, pass it commands, and interpret the results. &amp;nbsp;Expect style. &amp;nbsp;Coroutines. Which is doable, but is more complex than ordinary function calls / UNIX style subprocess invocations.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-3523005075403323264?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/3523005075403323264/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=3523005075403323264' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/3523005075403323264'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/3523005075403323264'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/11/environment-setting-dominant-meme.html' title='environment setting a dominant meme?'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' 
src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-9137833529282086484</id><published>2011-11-13T13:37:00.000-08:00</published><updated>2011-11-13T13:38:53.659-08:00</updated><title type='text'>calling a function to change the environment</title><content type='html'>I think that what I really object to in tools that depend on environment variables is that it is hard to put a wrapper around environment variables.&amp;nbsp;I.e. it is hard to "call a function" to set environment variables.&lt;br /&gt;&lt;br /&gt;However, I have to parse out what I mean by the terms above.&lt;br /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; Remember: I am talking about scripting, in shell and Perl, etc.&lt;br /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; "Calling a function" in a shell script usually means executing a child process.  And, by definition, a child process does not pass environment variables back to its parent.&lt;br /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; Now, of course, in every scripting language worth its salt, it is possible to write a function that sets environment variables.  But that's a function in the language you are running in.  It's a lot harder to have, e.g. perl or bash call a csh function, and have that child csh set environment variables in the parent perl or bash.&lt;br /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; Similarly, you can source a file in csh, or "." it in bash, and have some degree of modularization.  But again it is the same language.&lt;br /&gt;&lt;br /&gt;Why do I care?  Mainly because I am writing stuff in bash or perl or python, and I want to get whatever environment variables legacy csh scripts set up.  &lt;br /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; But, in general, you lose abstraction if you are constrained to call functions only written in your current language. 
&amp;nbsp;Or, even then, if only callable via a special syntax, e.g. csh's "source file" rather than just executing file.&lt;br /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; Loss of abstraction. &amp;nbsp;But, requiring a special syntax provides a bit of security. You can tell who is side effectful, and who is not. &amp;nbsp;Pure vs impure functions.&lt;br /&gt;&lt;br /&gt;My clear-env script does things oppositely - it allows me to call a script in a subshell with a clean environment, but I don't necessarily see what it set up in the environment.&lt;br /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; Similarly, my friend's trick where bash can get a csh script's environment by doing something like&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;csh -c "source module; exec bash"&lt;/span&gt;&lt;/div&gt;is NOT a "function call". It's more like continuations.&lt;br /&gt;&lt;br /&gt;Part of the trouble is that the whole point of processes is to isolate the parent from the child.&lt;br /&gt;&amp;nbsp; &amp;nbsp; Except that here, the whole point is to get access to the child's side effects.&lt;br /&gt;&lt;br /&gt;I think that I may need to create a family of scripts in various languages that execute or source or whatever, and then, at the end, printenv into a file or stdout - so that the parent can parse the printenv.&lt;br /&gt;&lt;br /&gt;A better way would be to do something like stop the child process in a debugger - and then have the parent look through the child's environment with the debugger.&lt;br /&gt;&lt;br /&gt;---&lt;br /&gt;I haven't even touched on other side effects. &amp;nbsp;Not just the environment, but file descriptors.&lt;br /&gt;&lt;br /&gt;E.g. 
"I have a bash script that wants to call a csh script that redirects stdout and stderr - and then have the parent bash script use those settings".&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-9137833529282086484?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/9137833529282086484/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=9137833529282086484' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/9137833529282086484'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/9137833529282086484'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/11/calling-function-to-change-environment.html' title='calling a function to change the environment'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-5898647775906685383</id><published>2011-11-12T10:32:00.000-08:00</published><updated>2011-11-12T10:33:16.046-08:00</updated><title type='text'>Why depend on UNIX environment variables?</title><content type='html'>Previously I have posted about how I find tools that depend on sourcing csh files to get environment variables set a real pain to deal with.  
Largely because I don't have good ways of automating such procedures, both for use and test.  And also because of environment variable interference.&lt;br /&gt;&lt;br /&gt;So, I ask myself: Q: why depend on environment variables?&lt;br /&gt;&lt;br /&gt;Then I remember how excited I was to learn of shell environment variables.&lt;br /&gt;&lt;br /&gt;By setting an environment variable I could arrange for a parameter to be passed from the outside world right into the middle of a tool.  Without having to modify any of the intervening levels.  &lt;br /&gt;&lt;br /&gt;No need to create argument parsing code.&lt;br /&gt;&lt;br /&gt;In fact, in Bell Labs type shells, argument parsers are implicitly created for environment variables:&lt;br /&gt;&lt;blockquote&gt;VAR1=val1 VAR2=val2 program options...&lt;/blockquote&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-5898647775906685383?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/5898647775906685383/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=5898647775906685383' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/5898647775906685383'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/5898647775906685383'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/11/why-depend-on-unix-environment.html' title='Why depend on UNIX environment variables?'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' 
src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-2523924630959376300</id><published>2011-11-12T10:27:00.000-08:00</published><updated>2011-11-12T10:27:37.381-08:00</updated><title type='text'>I hate (csh) environment based tools</title><content type='html'>I hate tools that have heavy, undocumented, environment dependencies.&lt;br /&gt;&lt;br /&gt;csh scripts seem to be the classic example.  Beware of anything that says&lt;br /&gt;&lt;pre&gt;source file1&lt;br /&gt;source file2&lt;br /&gt;do this&lt;br /&gt;source file3&lt;br /&gt;do that&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;where the csh files you are source'ing mainly act by setting up environment variables, but also may act by side effects such as cd'ing.&lt;br /&gt;&lt;br /&gt;---&lt;br /&gt;&lt;br /&gt;Why do I hate these things?  They are hard to automate. Especially to automatically test.&lt;br /&gt;&lt;br /&gt;Really, to automate I need to insert checks into the script above after each step.  At least if it is flakey.  (If it is not flakey and is all working, I don't care what it does.  As long as I can put it in a black box, and don't have to let its side effects escape.)&lt;br /&gt;&lt;br /&gt;---&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Why do I hate these things?&lt;br /&gt;&lt;br /&gt;So often they are designed for interactive use.  And interfere with other stuff you may be using interactively.&lt;br /&gt;&lt;br /&gt;Oftentimes I need to fall back to removing all of my interactive customizations to get something like this working in a clean environment.&lt;br /&gt;&lt;br /&gt;---&lt;br /&gt;&lt;br /&gt;I have a script I call clear-env that deletes environment variables, and starts up new subshells.  
Has saved my bacon many times.&lt;br /&gt;&lt;br /&gt;However, today I am running into problems that depend on running exactly the site standard initialization files, .login and .cshrc, before running any other csh-&amp;**^&amp;**^&amp;^-source modules.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-2523924630959376300?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/2523924630959376300/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=2523924630959376300' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/2523924630959376300'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/2523924630959376300'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/11/i-hate-csh-environment-based-tools.html' title='I hate (csh) environment based tools'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-8010333642882445387</id><published>2011-11-12T10:19:00.000-08:00</published><updated>2011-11-12T10:19:41.998-08:00</updated><title type='text'>Testing: just (start) doing it</title><content type='html'>Trying to test something that I don't know how to automate.  
In fact, my goal is to automate - but I can't automate it enough even to run the smallest self checking automated test.  The blinking procedure only works, at all, interactively.&lt;br /&gt;&lt;br /&gt;So, do what I can:&lt;br /&gt;&lt;br /&gt;I can't run the program under test automatically.  (Yet: I hope to change this.)&lt;br /&gt;&lt;br /&gt;But the program under test does leave some output files, etc.&lt;br /&gt;&lt;br /&gt;Therefore, I can automate CHECKING the output of the manual test.&lt;br /&gt;&lt;br /&gt;Perhaps in a while I will be able to automate the whole thing.&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;It is amazing the sense of ease that even this small degree of automation brings.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-8010333642882445387?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/8010333642882445387/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=8010333642882445387' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/8010333642882445387'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/8010333642882445387'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/11/testing-just-start-doing-it.html' title='Testing: just (start) doing it'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' 
src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-8006265798130076577</id><published>2011-11-08T11:51:00.000-08:00</published><updated>2011-11-08T11:51:03.441-08:00</updated><title type='text'>hgignore</title><content type='html'>&lt;a href="http://www.selenic.com/mercurial/hgignore.5.html"&gt;hgignore&lt;/a&gt;:&lt;br /&gt;&lt;br /&gt;&lt;a style="font-size:13px" href="https://chrome.google.com/webstore/detail/pengoopmcjnbflcjbmoeodbmoflcgjlk"&gt;'via Blog this'&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Most version control tools have a .ignore file or ignore command - .cvsignore, .hgignore, vk ignore, bzr ignore, etc.&lt;br /&gt;&lt;br /&gt;All that I am aware of ignore files based on pattern matching on the filename.  E.g. in Mercurial:&lt;br /&gt;&lt;blockquote&gt;&lt;/blockquote&gt;&lt;blockquote&gt;An untracked file is ignored if its path relative to the repository root directory, or any prefix path of that path, is matched against any pattern in .hgignore.&lt;/blockquote&gt;I would like to extend this to be able to do simple queries.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;E.g. I usually have an ignore rule that looks something like&lt;/div&gt;&lt;div&gt;&lt;blockquote&gt;*.novc/&lt;/blockquote&gt;&lt;/div&gt;&lt;div&gt;I.e. I ignore directories that are suffixed .novc (for no version control). &lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This works fine, but is somewhat verbose.  Plus, it gets in the way when certain tools have other conventions about names.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I should like to get the .novc directive out of the filename, and into a placeholder file in the directory.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;E.g. 
if .../path/.novc exists, then ignore .../path&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Q: is it there, and I do not know?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-8006265798130076577?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://www.selenic.com/mercurial/hgignore.5.html' title='hgignore'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/8006265798130076577/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=8006265798130076577' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/8006265798130076577'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/8006265798130076577'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/11/hgignore.html' title='hgignore'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-2481271564453690615</id><published>2011-10-17T21:23:00.000-07:00</published><updated>2011-10-17T21:23:56.141-07:00</updated><title type='text'>Comp-arch.net wiki on hold from October 17, 2011 - CompArch</title><content type='html'>&lt;a href="http://semipublic.comp-arch.net/wiki/Comp-arch.net_wiki_on_hold_from_October_17,_2011"&gt;Comp-arch.net wiki 
on hold from October 17, 2011 - CompArch&lt;/a&gt;: &lt;br&gt;&lt;br&gt;&lt;a style="font-size:13px" href="https://chrome.google.com/webstore/detail/pengoopmcjnbflcjbmoeodbmoflcgjlk"&gt;'via Blog this'&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;= On Hold =&lt;br /&gt;&lt;br /&gt;The comp-arch.net wiki is being put on hold - or at least to a very low boil back burner - &lt;br /&gt;as of Monday, October 17, 2011.&lt;br /&gt;&lt;br /&gt;Reason: I (Andy Glew, the creator of and main contributor to comp-arch.net)&lt;br /&gt;have decided to go back to work in industry as a computer architect at MIPS Technologies.&lt;br /&gt;&lt;br /&gt;= (Pre)History of comp-arch.net =&lt;br /&gt;&lt;br /&gt;I have long wanted to write a book that I threatened to call "The Art of Computer Architecture".&lt;br /&gt;I would like it to be like Knuth's "The Art of Computer Programming",&lt;br /&gt;except that I am no Don Knuth:  I am willing to compromise, not necessarily provide full academic references,&lt;br /&gt;if in exchange I can document the "folklore" of computer architecture - the things that hardware engineers&lt;br /&gt;know or used to know, but which never seem to get written down, and which get re-invented every decade or so.&lt;br /&gt;&lt;br /&gt;The web, and wikis in particular, provided a forum and technology to allow me to write this in small pieces.&lt;br /&gt;&lt;br /&gt;At most of the computer companies I have worked at, in particular AMD and Intel&lt;br /&gt;(but also to a lesser extent at Gould and Motorola, prior to the creation of the web)&lt;br /&gt;I have created wikis like this.&lt;br /&gt;Whenever other engineers asked me a question&lt;br /&gt;for which the answer was known, in the literature and/or folklore,&lt;br /&gt;but not necessarily easily accessible,&lt;br /&gt;I would write up a white paper or wiki page explaining the topic.&lt;br /&gt;Bonus: often in so doing I would realize missing alternatives in a taxonomy,&lt;br /&gt;which would lead to new 
inventions.&lt;br /&gt;&lt;br /&gt;These company-internal web and wiki sites proved popular.&lt;br /&gt;Several times, former co-workers who had left industry to become academics &lt;br /&gt;asked me if they were accessible outside the company.&lt;br /&gt;I always had to say "No".&lt;br /&gt;&lt;br /&gt;Intel and AMD never allowed me to create public versions of such websites.&lt;br /&gt;Perhaps they would have given extensive legal review - but I found the process of getting such approval a large disincentive.&lt;br /&gt;The document or wiki or website would be stale before approved.&lt;br /&gt;&lt;br /&gt;In September 2009 I left Intel to join Intellectual Ventures.&lt;br /&gt;One of the conditions for my joining IV was that I be allowed to work on such a public website,&lt;br /&gt;http://comp-arch.net.&lt;br /&gt;(I actually created the website during the brief period between leaving Intel and starting work at IV.)&lt;br /&gt;I am incredibly grateful to IV for giving me that opportunity.&lt;br /&gt;&lt;br /&gt;Progress on comp-arch.net was slow, and probably not steady, but at least visible.&lt;br /&gt;In two years I had created 300 public pages on comp-arch.net.&lt;br /&gt;&lt;br /&gt;In addition, I created a certain number of private wiki pages, sometimes on topics that I thought might be worth patenting,&lt;br /&gt;sometimes because I was afraid that disclosing the topics I was writing about might create conflict with IV.&lt;br /&gt;Even though my employment agreement might give me the right to work on something in public,&lt;br /&gt;I would not want to get in the way of my employer's business or marketing strategy.&lt;br /&gt;Such conflicts would have loomed very large for Intel&lt;br /&gt;- I would have had trouble writing honestly about Itanium at the time when Itanium was Intel's main emphasis&lt;br /&gt;- and were much less of a problem for IV,&lt;br /&gt;but still FUD and self-censorship were an impediment&lt;br /&gt;to work on comp-arch.net.&lt;br 
/&gt;&lt;br /&gt;However, I say again: I am immensely grateful to Intellectual Ventures for giving me the chance to start working on comp-arch.net.&lt;br /&gt;THANK YOU, IV!!!!&lt;br /&gt;If I was confident I could stay state-of-the-art as a computer architect while continuing to work on comp-arch.net&lt;br /&gt;and with Intellectual Ventures, I would keep doing so.&lt;br /&gt;&lt;br /&gt;= Present of comp-arch.net =&lt;br /&gt;&lt;br /&gt;For reasons such as these I left Intellectual Ventures to return to work in industry as a computer architect.&lt;br /&gt;On October 17, 2011, I joined MIPS Technologies.&lt;br /&gt;&lt;br /&gt;At MIPS I do not expect to be able to write pages on comp-arch.net and post them in real time.&lt;br /&gt;I will continue to try to work on comp-arch.net in private,&lt;br /&gt;and occasionally seek approval to make stuff public.&lt;br /&gt;&lt;br /&gt;= Working on the Wiki =&lt;br /&gt;&lt;br /&gt;I will also take this opportunity to work on the technology of comp-arch.net.&lt;br /&gt;&lt;br /&gt;In 2009 I started comp-arch.net using mediawiki.&lt;br /&gt;&lt;br /&gt;I have long wanted a better wiki technology.  My wiki wishlist is documented elsewhere on this wiki, and other places.&lt;br /&gt;It includes better support for drawings,&lt;br /&gt;and better support for melding of public and private content&lt;br /&gt;- since it appears that such restrictions will be something I have to live with for the foreseeable future.&lt;br /&gt;The computer hardware industry is not "open" as the wiki philosophy would have it,&lt;br /&gt;except possibly within companies.&lt;br /&gt;&lt;br /&gt;= Future of comp-arch.net =&lt;br /&gt;&lt;br /&gt;As mentioned above, I hope to continue working on comp-arch.net,&lt;br /&gt;making stuff public occasionally, when permitted.&lt;br /&gt;&lt;br /&gt;Plus, if ever I retire, I hope to continue this labor of love.&lt;br /&gt;&lt;br /&gt;= It's not just about me! 
=&lt;br /&gt;&lt;br /&gt;Although I have been the main contributor to comp-arch.net,&lt;br /&gt;there have been other contributors.&lt;br /&gt;&lt;br /&gt;For example, Paul Aaron Clayton has made some contributions.&lt;br /&gt;&lt;br /&gt;I hope that others can continue to work on comp-arch.net&lt;br /&gt;during this time when I must leave it on hold.&lt;br /&gt;&lt;br /&gt;If you are interested, please contact me, and I will arrange for access.&lt;br /&gt;&lt;br /&gt;(I may need to do a bit of wiki work to separate the old stuff from new stuff.)&lt;br /&gt;&lt;br /&gt;= Au revoir, et &amp;agrave; bient&amp;ocirc;t =&lt;br /&gt;&lt;br /&gt;I hope to see myself working on comp-arch.net again.&lt;br /&gt;&lt;br /&gt;But for the moment, I am excited to be working towards actual shipping product again at MIPS.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-2481271564453690615?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://semipublic.comp-arch.net/wiki/Comp-arch.net_wiki_on_hold_from_October_17,_2011' title='Comp-arch.net wiki on hold from October 17, 2011 - CompArch'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/2481271564453690615/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=2481271564453690615' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/2481271564453690615'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/2481271564453690615'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/10/comp-archnet-wiki-on-hold-from-october.html' title='Comp-arch.net wiki on hold from October 17, 2011 - 
CompArch'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-7209999369774592749</id><published>2011-10-16T21:51:00.000-07:00</published><updated>2011-10-16T21:51:54.023-07:00</updated><title type='text'>perlsec - Perl security</title><content type='html'>&lt;a href="http://www.washington.edu/perl5man/pod/perlsec.html"&gt;perlsec - Perl security&lt;/a&gt;: &lt;br&gt;&lt;br&gt;&lt;a style="font-size:13px" href="https://chrome.google.com/webstore/detail/pengoopmcjnbflcjbmoeodbmoflcgjlk"&gt;'via Blog this'&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;More of the same.&lt;br /&gt;&lt;br /&gt;This is probably stale.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-7209999369774592749?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://www.washington.edu/perl5man/pod/perlsec.html' title='perlsec - Perl security'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/7209999369774592749/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=7209999369774592749' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/7209999369774592749'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/7209999369774592749'/><link rel='alternate' type='text/html' 
href='http://andyglew.blogspot.com/2011/10/perlsec-perl-security.html' title='perlsec - Perl security'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-6632570983771949729</id><published>2011-10-14T13:21:00.000-07:00</published><updated>2011-10-14T13:21:31.506-07:00</updated><title type='text'>Mint to Quicken 2011</title><content type='html'>&lt;a href="http://satisfaction.mint.com/mint/topics/mint_to_quicken_2011?from_gsfn=true"&gt;Mint to Quicken 2011&lt;/a&gt;: &lt;br&gt;&lt;br&gt;&lt;a style="font-size:13px" href="https://chrome.google.com/webstore/detail/pengoopmcjnbflcjbmoeodbmoflcgjlk"&gt;'via Blog this'&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I would be posting this on a Mint.com forum, but I don't want to bother with registering, so I will post it here.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;I am yet another person who would like to synch Quicken from Mint, and possibly vice versa.&lt;br /&gt;&lt;br /&gt;Although just being able to take transactions from Mint into Quicken would be wonderful.  A backup.&lt;br /&gt;&lt;br /&gt;In my case, although I am a past Quicken user, I am not currently.  Not even a Mint user.  I use Yodlee.  
I would consider switching to any product that allowed me to keep both online and offline accounts.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-6632570983771949729?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://satisfaction.mint.com/mint/topics/mint_to_quicken_2011?from_gsfn=true' title='Mint to Quicken 2011'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/6632570983771949729/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=6632570983771949729' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/6632570983771949729'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/6632570983771949729'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/10/mint-to-quicken-2011.html' title='Mint to Quicken 2011'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-7823114589573916236</id><published>2011-10-12T10:57:00.000-07:00</published><updated>2011-10-12T10:57:35.261-07:00</updated><title type='text'>Google Plus Traffic Drops, 1269% Gains Erased</title><content type='html'>&lt;a href="http://www.readwriteweb.com/archives/google_plus_traffic_drops_1269_gains_erased.php"&gt;Google Plus Traffic Drops, 1269% Gains Erased&lt;/a&gt;: 
&lt;br&gt;&lt;br&gt;&lt;a style="font-size:13px" href="https://chrome.google.com/webstore/detail/pengoopmcjnbflcjbmoeodbmoflcgjlk"&gt;'via Blog this'&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Google+ is declining.&lt;br /&gt;&lt;br /&gt;But I am still using Google+.  &lt;br /&gt;&lt;br /&gt;For one big reason: the "circles" provide better access control than LinkedIn or FaceBook.&lt;br /&gt;&lt;br /&gt;Why do you want access control?  Well, it often becomes obvious, looking at LinkedIn or FaceBook updates, when a person from company 1 is interviewing with company 2.  Now, you can often turn off the automatic notification - but I know former coworkers/managers who proactively searched LinkedIn and FaceBook, to see if people they knew had suddenly become, e.g. linked directly to people in other companies, or even rivals within the same company.&lt;br /&gt;&lt;br /&gt;I.e. YOU SHOULD NOT ENTER JOB INTERVIEW CONTACTS IN LINKEDIN OR FACEBOOK!!!&lt;br /&gt;&lt;br /&gt;(Not if there is a possibility that your present employer might retaliate.)&lt;br /&gt;&lt;br /&gt;(Yes, there are limited fixes for this in LinkedIn and FaceBook.  None satisfactory.)&lt;br /&gt;&lt;br /&gt;Now, Google+ circles may help here.   Except, as is typical with so many Google apps, they don't scale, in the user interface.&lt;br /&gt;&lt;br /&gt;E.g. I have separate Google+ circles for all companies I used to work for, as well as old schools, etc.  But when you reach more than a dozen circles, they become a pain to deal with.  I currently have 24 circles - that's too many for Google's user interface.  It needs some sort of hierarchy - e.g. a meta-circle of Companies containing several sub-circles of particular companies, and so on.&lt;br /&gt;&lt;br /&gt;As is typical with so many Google apps, the user interface doesn't scale.  People repeat, over and over and over again, the mistake of providing a flat set of categories (Google+ circles, Gmail labels), rather than providing structure and nesting.  
I can just imagine the "data driven" conversation at Google: "prove that people need more than 8 circles", "prove that people want nested circles and labels".&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-7823114589573916236?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://www.readwriteweb.com/archives/google_plus_traffic_drops_1269_gains_erased.php' title='Google Plus Traffic Drops, 1269% Gains Erased'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/7823114589573916236/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=7823114589573916236' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/7823114589573916236'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/7823114589573916236'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/10/google-plus-traffic-drops-1269-gains.html' title='Google Plus Traffic Drops, 1269% Gains Erased'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-4231249882633027175</id><published>2011-10-11T10:04:00.000-07:00</published><updated>2011-10-11T10:04:50.905-07:00</updated><title type='text'>Colorado College | Block Plan</title><content type='html'>I learned about the Colorado College Block Plan from my 
daughter's school counselor. Instead of juggling 8 classes at a time, take one class at a time, intensely.  &lt;br /&gt;&lt;br /&gt;I find this interesting and attractive - this is almost, for example, how I got into computers, via a summer session Concordia University organized for high school students in the Montreal region.&lt;br /&gt;&lt;br /&gt;The school itself has the usual issues with small liberal arts colleges versus universities.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-4231249882633027175?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://coloradocollege.edu/welcome/blockplan/' title='Colorado College | Block Plan'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/4231249882633027175/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=4231249882633027175' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/4231249882633027175'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/4231249882633027175'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/10/colorado-college-block-plan.html' title='Colorado College | Block Plan'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' 
src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-8765141336882240015</id><published>2011-10-01T20:34:00.001-07:00</published><updated>2011-10-01T20:34:33.132-07:00</updated><title type='text'>Load-linked/store-conditional (LL/SC)</title><content type='html'>http://semipublic.comp-arch.net/wiki/Load-linked/store-conditional_(LL/SC)&lt;br /&gt;&lt;br /&gt;The load-linked/store-conditional instruction pair provides a [[RISC]] flavor of [[atomic RMW]] [[synchronization]], emphasizing primitives which can be flexibly composed.  It can be viewed as a minimal form of [[transactional memory]].&lt;br /&gt;&lt;br /&gt;Part of the [[RISC philosophy]] was to espouse [[load/store architecture]] - instruction sets that separated load and store memory operations&lt;br /&gt;from computational or ALU operations such as add or increment.  This works fine for single-processor operations, but runs into problems&lt;br /&gt;for multiprocessor synchronization.&lt;br /&gt;&lt;br /&gt;== Pre-LL/SC ==&lt;br /&gt;&lt;br /&gt;After years of evolution, prior to the so-called [[RISC revolution]], &lt;br /&gt;multiprocessor synchronization instruction sets were converging on simple [[atomic RMW]] instructions such as&lt;br /&gt;[[compare-and-swap]], [[atomic increment memory]] or [[fetch-and-add]] and other [[fetch-and-op]]s.&lt;br /&gt;Now, these atomic RMWs can be seen as composed of fairly simple primitives, for example:&lt;br /&gt;&lt;br /&gt;    [[locked increment memory]]&lt;br /&gt;         begin-atomic&lt;br /&gt;             tmp := load(MEM)&lt;br /&gt;             tmp := tmp+1&lt;br /&gt;             store( MEM, tmp )&lt;br /&gt;         end-atomic&lt;br /&gt;&lt;br /&gt;However, the implementation of begin-atomic/end-atomic is not necessarily simple.&lt;br /&gt;&lt;br /&gt;The atomicity can be provided in a simple 
way:&lt;br /&gt;&lt;br /&gt;    [[locked increment memory]]&lt;br /&gt;             tmp := [[load-locked]](MEM)&lt;br /&gt;             tmp := tmp+1&lt;br /&gt;             [[store-unlock]]( MEM, tmp )&lt;br /&gt;        &lt;br /&gt;Where [[load-locked]] may be&lt;br /&gt;* implemented at the memory module&lt;br /&gt;** locking the entire module&lt;br /&gt;** or a limited number of memory locations at the module&lt;br /&gt;** or potentially an arbitrary number of memory locations, using per-location lock-bit [[memory metadata]], e.g. [[stolen from ECC]]&lt;br /&gt;&lt;br /&gt;or&lt;br /&gt;* implemented via a bus lock&lt;br /&gt;&lt;br /&gt;or&lt;br /&gt;* implemented via a cache protocol&lt;br /&gt;** e.g. an [[address lock]]: acquiring ownership of the cache line, and then refusing to respond to [[snoops or probes]] until the [[address lock]] is released&lt;br /&gt;** or, more primitive: acquiring ownership of the cache line, and then acquiring a [[cache lock]], locking the entire cache or a fraction thereof&lt;br /&gt;&lt;br /&gt;Interestingly, implementing locking at the memory module quite possibly came first, since many early multiprocessor systems were not snoopy or bus-based.&lt;br /&gt;&lt;br /&gt;So far, so good: [[load-locked]] and [[store-unlock]] are somewhat RISCy.&lt;br /&gt;&lt;br /&gt;But they have a problem:  [[load-locked]] and [[store-unlock]] as separate instructions &lt;br /&gt;raise certain security, performance, and reliability problems in some (but not necessarily all) implementations.&lt;br /&gt;&lt;br /&gt;E.g. 
what happens if a user uses load-locked to lock the bus, &lt;br /&gt;and never unlocks it?&lt;br /&gt;That *might* be interpreted as giving one user exclusive access to the bus.&lt;br /&gt;&lt;br /&gt;Obviously, one can eliminate these problems by architecture and microarchitecture.&lt;br /&gt;But doing so is a source of complexity.&lt;br /&gt;Many, probably most, implementations have found it easier to package the atomic RMW &lt;br /&gt;so that the primitive operations are not exposed to the user.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;== [[Load-linked/store-conditional]] ==&lt;br /&gt;&lt;br /&gt;The basic idea behind [[LL/SC]] is that, instead of guaranteeing forward progress with a [[load-locked]] that always succeeds,&lt;br /&gt;but which may deadlock,&lt;br /&gt;[[load-linked]] would employ [[optimistic concurrency control]].&lt;br /&gt;Software would assume that it had worked, perform the operation that you want to be atomic,&lt;br /&gt;and then try to commit the operation using [[store-conditional]].&lt;br /&gt;&lt;br /&gt;If the operation was atomic, i.e. if nobody else had accessed the memory location load-link'ed, the store-conditional would proceed.&lt;br /&gt;&lt;br /&gt;But if the operation were non-atomic, if somebody else had accessed the memory location, the store-conditional would fail.&lt;br /&gt;And the user software would be required to handle that failure, e.g. 
by looping.&lt;br /&gt;&lt;br /&gt;     fetch-and-add:&lt;br /&gt;         L:  oldval := load-linked MEM&lt;br /&gt;             newval := oldval+a&lt;br /&gt;             store-conditional( MEM, newval )&lt;br /&gt;             if SC_failed goto L&lt;br /&gt;             return oldval&lt;br /&gt;&lt;br /&gt;Typically LL/SC are implemented via a snoopy bus protocol:&lt;br /&gt;a memory location is "linked" by putting its address into a snooper.&lt;br /&gt;If another processor writes to it, the link is broken so that SC will fail.&lt;br /&gt;&lt;br /&gt;Some implementations did not implement an address snooper - they might fail if there was ANY other activity on the bus.&lt;br /&gt;Needless to say, this does not exhibit good performance on contended systems.&lt;br /&gt;&lt;br /&gt;Non-snoopy implementations of LL/SC are possible. &lt;br /&gt;E.g. implementing them at a memory module is possible. [[TBD]].&lt;br /&gt;&lt;br /&gt;As with more extensive forms of transactional memory, LL/SC has issues with fairness and forward progress.  It is not desirable to loop forever in the LL/SC code.&lt;br /&gt;This is solvable - through much the same mechanisms as one uses to implement a [[locked atomic RMW]] with [[load-locked]].&lt;br /&gt;&lt;br /&gt;One way amounts to converting a [[load-linked]] into a [[load-locked]] after a certain number of failures.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;== Synthesizing Complex RMWs ==&lt;br /&gt;&lt;br /&gt;The nice thing about [[LL/SC]] is that they can be used to implement many different forms of synchronization. E.g. almost any form of [[fetch-and-op]].&lt;br /&gt;Say you want [[floating point fetch-and-add]] .. you can build that with LL/SC. 
&lt;br /&gt;Whereas if you don't have LL/SC, and just have a fixed repertoire of integer [[atomic RMW]]s, you may just be out of luck.&lt;br /&gt;&lt;br /&gt;This *is* one of the big advantages of RISC: provide primitives, rather than full solutions.&lt;br /&gt;&lt;br /&gt;The problem, of course, is what was mentioned above: "plain" LL/SC may have problems with fairness and forward progress.  Mechanisms that solve these problems&lt;br /&gt;for LL/SC may be more complicated than they would be for non-LL/SC instructions.&lt;br /&gt;&lt;br /&gt;See [[list of possible atomic RMW operations]].&lt;br /&gt;&lt;br /&gt;== [[Remote Atomic RMWs]] ==&lt;br /&gt;&lt;br /&gt;The "most natural" implementation of [[LL/SC]] is to pull the data into registers and/or cache of the processor on which the instructions are executing. By "the instructions" I mean those before the LL, the LL itself, between the LL and SC, the SC itself, and after the SC. &lt;br /&gt;&lt;br /&gt;This is also the most natural implementation for locked implementations of atomic RMWs:&lt;br /&gt;&lt;br /&gt;    [[LOCK INC mem]]&lt;br /&gt;         tmp := [[load-locked]] M[addr]&lt;br /&gt;         dstreg := tmp+1&lt;br /&gt;         [[store-unlock]] M[addr] := dstreg&lt;br /&gt;&lt;br /&gt;I call this the "local implementation" of the atomic RMW.&lt;br /&gt;&lt;br /&gt;However, many systems have found it useful to create "remote implementations" of the atomic RMW.&lt;br /&gt;&lt;br /&gt;For example, special [[bus or interconnect]] transactions could be created that indicate that the memory controller should do the atomic RMW operation itself.&lt;br /&gt;&lt;br /&gt;For example, the [[NYU Ultracomputer]] allowed many [[fetch-and-add]] operations to be exported.  They could be performed at the ultimate destination, the memory controller.  
But they could also be handled specially at intermediate nodes in the interconnection fabric, where conflicting atomic RMWs to the same location could be combined, forwarding only their net effect on towards the destination.&lt;br /&gt;&lt;br /&gt;You can imagine such [[remote atomic RMW]]s as being performed by&lt;br /&gt;&lt;br /&gt;    [[LOCK INC mem]]&lt;br /&gt;         send the command "fetch-and-add M[addr],1" to the outside system&lt;br /&gt;&lt;br /&gt;This is the advantage of CISCy atomic RMW instructions: they can hide the implementation.  &lt;br /&gt;If the interconnect fabric supports only [[fetch-and-add]] but not [[fetch-and-OR]], then the two respective [[microoperation expansions]] might be:&lt;br /&gt;&lt;br /&gt;    [[LOCK INC mem]]&lt;br /&gt;         send the command "fetch-and-add M[addr],1" to the outside system&lt;br /&gt;&lt;br /&gt;    [[LOCK FETCH-AND-OR mem]]&lt;br /&gt;         oldval := [[load-locked]] M[addr]&lt;br /&gt;         newtmp := oldval OR src1&lt;br /&gt;         [[store-unlock]] M[addr] := newtmp&lt;br /&gt;&lt;br /&gt;For that matter, the CISC implementation might well use [[LL/SC]] optimistic concurrency control within its [[microflow]].&lt;br /&gt;&lt;br /&gt;It is more work to create such a [[remote atomic RMW]] with [[LL/SC]], since the very strength of LL/SC&lt;br /&gt;- that the user can place almost arbitrarily complicated instruction sequences between the [[LL]] and [[SC]] -&lt;br /&gt;is also its weakness.&lt;br /&gt;If you wanted to create a [[remote atomic RMW]] implementation that supported just, say, [[fetch-and-add]],&lt;br /&gt;then you would have to create an [[idiom recognizer]] that recognized &lt;br /&gt;code sequences such as:&lt;br /&gt;&lt;br /&gt;     fetch-and-add:&lt;br /&gt;         L:  oldval := load-linked MEM&lt;br /&gt;             newval := oldval+a&lt;br /&gt;             store-conditional( MEM, newval )&lt;br /&gt;             if SC_failed goto L&lt;br /&gt;             return oldval&lt;br 
/&gt;&lt;br /&gt;Actually, you might not have to recognize the full sequence, complete with the retry loop.&lt;br /&gt;You would only need to recognize the sequence&lt;br /&gt;&lt;br /&gt;             oldval := load-linked MEM&lt;br /&gt;             newval := oldval+a&lt;br /&gt;             store-conditional( MEM, newval )&lt;br /&gt;&lt;br /&gt;and convert that into whatever bus operations create the [[remote atomic RMW]].&lt;br /&gt;&lt;br /&gt;Recognizing a 3-instruction idiom is not THAT hard.  &lt;br /&gt;But in general [[it is harder to combine separate instructions than to break up complex instructions in an instruction decoder]].&lt;br /&gt;&lt;br /&gt;One might even imagine creating a fairly complete set of instructions executable on the interconnect.&lt;br /&gt;&lt;br /&gt;=== The best of both worlds? ===&lt;br /&gt;&lt;br /&gt;Let me end this section by describing a hybrid implementation of the CISC atomic RMWs that combines the best of both worlds with respect to [[local and remote atomic RMWs]].&lt;br /&gt;&lt;br /&gt;    [[LOCK FETCH-AND-OP mem]]&lt;br /&gt;         oldval := [[load-locked/fetch-and-op]] M[addr]&lt;br /&gt;         newtmp := oldval OP src1&lt;br /&gt;         [[store-unlock/fetch-and-op]] M[addr] := newtmp&lt;br /&gt;         return oldval&lt;br /&gt;&lt;br /&gt;If M[addr] is exclusive in the local cache, then the [[load-locked/fetch-and-op]] and [[store-unlock/fetch-and-op]]&lt;br /&gt;are equivalent to ordinary [[load-locked]] and [[store-unlock]].&lt;br /&gt;&lt;br /&gt;If M[addr] is not present in the local cache, then [[load-locked/fetch-and-op]] is equivalent to sending a remote atomic RMW fetch-and-op command to the external network. The [[store-unlock/fetch-and-op]] may then become a NOP (although it may be used for bookkeeping purposes).&lt;br /&gt;&lt;br /&gt;If M[addr] is present in the local cache in a shared state, then we *COULD* perform the atomic RMW both locally and remotely. 
The remote version might invalidate or update other copies.  However, if the operation is supposed to be [[serializing]], then the local update cannot proceed without coordinating with the remote.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;== Extending the semantics ==&lt;br /&gt;&lt;br /&gt;[[PAC]]:&lt;br /&gt;&lt;br /&gt;(Paul Clayton added this.)&lt;br /&gt;&lt;br /&gt;Because existing LL/SC definitions either fail or have [[undefined semantics]] if other memory operations are performed, the semantics can be safely extended without sacrificing [[compatibility]].  (While it may be possible in some architectures ([[TBD]]: check whether this is the case for Alpha, MIPS, ARM, Power, etc.) that a non-SC memory operation was used by software to cancel an atomic section, such is generally discouraged and unlikely to have been done.)  Such extensibility could allow increasingly more extensive forms of transactional memory to be implemented without requiring additional instructions.  While an implementation of an older architecture would not generate an illegal instruction, an architecture which guaranteed failure on additional memory accesses would guarantee either deadlock (which would simplify debugging and fixing) or the use of an already in place software fallback (which, while slower, would maintain correctness).  In architectures which use a full-sized register to return the success condition and define a single success condition, that register could be used to communicate failure information.&lt;br /&gt;&lt;br /&gt;A natural progression for such extensions might be: to initially constrain the transaction to an aligned block of 32 or 64 bytes or the implementation-defined unit of coherence (This last could cause a significant performance incompatibility but would maintain the semantics.), perhaps with a single write, then to expand the semantics to something like those presented in Cliff Click's "IWannaBit!" 
where any cache miss or eviction cancels the reservation (For many uses, this would require that the cache be warmed up before the transaction can succeed.  If the reservation mechanism is expensive, prefetching data outside of the atomic section might be preferred over retrying the transaction.), perhaps extending writes to an entire cache line, and then to an arbitrary number of memory accesses.  &lt;br /&gt;&lt;br /&gt;Providing non-transactional memory accesses within the atomic section would be more difficult.  It would be possible to define register numbers which when used as base pointers for memory accesses have non-transactional semantics (See [[Semantic register numbers]]).  This definition would have to be made at the first extension of LL/SC and it might be difficult to establish compiler support.&lt;br /&gt;&lt;br /&gt;A non-transactional extension of LL/SC would be to guarantee the success of the store-conditional under certain limited conditions.  E.g., a simple register-register operation might be guaranteed to succeed, providing the semantics of most RMW instructions.  Such a guaranteed transaction could itself be extended to allow more complex operations, though it might be desirable to allow a transaction that can be guaranteed to fail if lock semantics are not desired under contention.  
E.g., a thread might work on other tasks rather than wait for the completion of a heavily contended locked operation.&lt;br /&gt;&lt;br /&gt;== See Also ==&lt;br /&gt;&lt;br /&gt;* [[locked versus atomic]]&lt;br /&gt;* [[Returning the old versus new value: fetch-and-add versus add-to-mem]]&lt;br /&gt;* [[atomic RMW spinloops and CISCy instructions]]&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-8765141336882240015?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://semipublic.comp-arch.net/wiki/Load-linked/store-conditional_(LL/SC)' title='Load-linked/store-conditional (LL/SC)'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/8765141336882240015/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=8765141336882240015' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/8765141336882240015'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/8765141336882240015'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/10/load-linkedstore-conditional-llsc.html' title='Load-linked/store-conditional (LL/SC)'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' 
src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-6222383745372330374</id><published>2011-09-28T18:06:00.001-07:00</published><updated>2011-09-28T18:06:47.115-07:00</updated><title type='text'>Xfinity failing every afternoon 2:30-6pm approx; comcast DNS problems fixed by google</title><content type='html'>Xfinity failing every afternoon 2:30-6pm approx; comcast DNS problems fixed by google&lt;br /&gt;&lt;br /&gt;My comcast internet connection was failing regularly, every weekday, starting at approximately 2:30pm, lasting until 6 or 7pm.  This started happening shortly after Labor Day.&lt;br /&gt;&lt;br /&gt;I conjecture that this was probably happening because schoolkids were getting off school (which in my area happens circa 2:30pm) and starting to use the net.&lt;br /&gt;&lt;br /&gt;The failures were accompanied by error messages of the form "DNS error... DNS lookup failure... Error 105 net:ERR_NAME_NOT_RESOLVED".  That error message was from a Windows 7 machine.  I also regularly have a Vista machine and an Apple Mac connected to my home network.  The Vista machine occasionally worked when the Windows 7 machine did not, and the Apple Mac more frequently worked; but all regularly failed about the same time in the afternoon.   Yes, I tried powering off/disconnecting/connecting only 1 machine at a time.  Not a cure.&lt;br /&gt;&lt;br /&gt;Coincidence?  I think not.&lt;br /&gt;&lt;br /&gt;I reported the problem to Comcast, who very quickly sent over a "cable guy" who apologized that my 7 year old Linksys cable modem had not been replaced.  So now I have a new SMC cable modem, and the net performs much better, up to 10X faster than in earlier speed tests.  
This is great!&lt;br /&gt;&lt;br /&gt;But, it did not fix the DNS problem.&lt;br /&gt;&lt;br /&gt;My Windows 7 machine (actually, the Windows 7 laptop my employer gave me) still got DNS errors between roughly 2:30 and 7pm.&lt;br /&gt;&lt;br /&gt;My Vista machine sometimes ran when the Windows 7 machine failed.  &lt;br /&gt;&lt;br /&gt;And the Apple Mac seemed nearly always to work, failing only occasionally.&lt;br /&gt;&lt;br /&gt;But the Windows 7 machine reliably failed between 2:30 and 7pm.&lt;br /&gt;&lt;br /&gt;OK, so I get more serious.  I check the DNS settings.  Finally, I do what I should have done in the first place: I switch the Windows 7 machine DNS server from "automatic", i.e. from comcast's DNS servers, to Google's DNS servers, 8.8.8.8 and 8.8.4.4.  &lt;br /&gt;&lt;br /&gt;Fixed!  :-)&lt;br /&gt;&lt;br /&gt;Conclusion: Comcast DNS servers have problems.  Particularly when the kids get home from school.&lt;br /&gt;&lt;br /&gt;But I'm happy that I got a new, faster, cable modem out of this.   And it is interesting that different machines with their different OSes behaved differently. Different timeouts (DNS settings that don't appear at the usual configuration places)?  BTW, the machine that failed most often is the fastest machine by far.&lt;br /&gt;&lt;br /&gt;===&lt;br /&gt;&lt;br /&gt;BTW, I did try fixing DNS earlier, to no avail.  OpenDNS did not seem to work.  But I wasn't that serious, so I did not try all possibilities.  
It is possible that both the cable modem needed to be upgraded, AND the DNS needed to point away from comcast's DNS servers.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-6222383745372330374?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/6222383745372330374/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=6222383745372330374' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/6222383745372330374'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/6222383745372330374'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/09/xfinity-failing-every-afternoon-230-6pm.html' title='Xfinity failing every afternoon 2:30-6pm approx; comcast DNS problems fixed by google'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-896309664647001325</id><published>2011-09-17T13:13:00.000-07:00</published><updated>2011-09-17T13:13:38.122-07:00</updated><title type='text'>Pipelined Bloom Filter</title><content type='html'>http://semipublic.comp-arch.net/wiki/Bloom_Filters#Pipelined_Bloom_Filter&lt;br /&gt;&lt;br /&gt;=  [[Pipelined Bloom Filter]] =&lt;br /&gt;&lt;br /&gt;Occasionally one wants to keep track of what (micro)instructions are in a 
pipeline.&lt;br /&gt;For example, an aggressively speculative x86 may need to&lt;br /&gt;[[snoop the pipeline]]&lt;br /&gt;to detect [[self-modifying code]] that writes to instructions&lt;br /&gt;that are already in the process of being executed.&lt;br /&gt;&lt;br /&gt;A brute force technique might be to [[CAM]] all of the instructions in the pipeline.&lt;br /&gt;Or to mark I$ lines or even ITLB pages when instructions from them are fetched for execution.&lt;br /&gt;These can be expensive and/or complex.&lt;br /&gt;&lt;br /&gt;This suggests a Bloom filter implementation:&lt;br /&gt;set bit(s) in a bitvector according to a hash of the physical address of instructions&lt;br /&gt;as they enter the pipeline.&lt;br /&gt;&lt;br /&gt;Unfortunately, a straightforward Bloom filter only ever sets bits.&lt;br /&gt;It only ever accumulates. Eventually this will result in false conflicts being detected.&lt;br /&gt;&lt;br /&gt;The basic idea behind a [[pipelined Bloom filter]] is to have a set of Bloom filters.&lt;br /&gt;E.g. two 1024-bit Bloom filters.&lt;br /&gt;A new Bloom filter is (re)allocated by clearing it every 128 instructions.&lt;br /&gt;It accumulates bits as instructions are executed for the next 128 instructions.&lt;br /&gt;It is then checked, but new instructions are not allocated into it, for the following 128 instructions.&lt;br /&gt;It is then reallocated, and cleared.&lt;br /&gt;Two such filters therefore can ping pong, providing coverage to the pipeline.&lt;br /&gt;&lt;br /&gt;== Generalized ==&lt;br /&gt;&lt;br /&gt;A set of K filters, used to protect a pipeline.&lt;br /&gt;&lt;br /&gt;Use filter j for a specified period of time&lt;br /&gt;- e.g. 
L instructions, or perhaps until B bits are set,&lt;br /&gt;or C entries in a counted Bloom filter are saturated.&lt;br /&gt;Switch to the next.&lt;br /&gt;During this period one sets bits in the filter as new items are placed in the pipeline.&lt;br /&gt;&lt;br /&gt;After this "allocation" period, continue to use the filter for checking, without writing new entries.&lt;br /&gt;&lt;br /&gt;Deallocate the filter when it is guaranteed that no items inserted into the filter remain in the system, i.e. the pipeline.&lt;br /&gt;&lt;br /&gt;== Other Terminology ==&lt;br /&gt;&lt;br /&gt;It appears that there are some papers that use the term "pipelined Bloom filter"&lt;br /&gt;to refer solely to a physical or electronic pipeline, and not the logical structure&lt;br /&gt;of overlapping Bloom filters that can be deallocated.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-896309664647001325?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://semipublic.comp-arch.net/wiki/Bloom_Filters#Pipelined_Bloom_Filter' title='Pipelined Bloom Filter'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/896309664647001325/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=896309664647001325' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/896309664647001325'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/896309664647001325'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/09/pipelined-bloom-filter.html' title='Pipelined Bloom Filter'/><author><name>Andy "Krazy" 
Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-5456968461369002994</id><published>2011-09-10T10:36:00.000-07:00</published><updated>2011-09-10T10:36:00.492-07:00</updated><title type='text'>Google Desktop (Search) Dying</title><content type='html'>Google is end-of-life-ing several products, including Google Desktop, in particular Google Desktop Search, which I use constantly, and Google Notebook and Sidewiki, which I would like to use, but which, fortunately, I decided not to become dependent on.&lt;br /&gt;&lt;br /&gt;Google says &lt;br /&gt;&lt;br /&gt;&lt;UL&gt;Desktop: In the last few years, there’s been a huge shift from local to cloud-based storage and computing, as well as the integration of search and gadget functionality into most modern operating systems. People now have instant access to their data, whether online or offline. As this was the goal of Google Desktop, the product will be discontinued on September 14, including all the associated APIs, services, plugins, gadgets and support.&lt;/UL&gt;&lt;br /&gt;Fair enough.  I *am* trying to keep most of my information in the cloud.  But unfortunately I do not have full connectivity - large areas of rural Oregon still do not have cell phone connectivity, voice, let alone data, and, for that matter, I am too cheap to pay for data connectivity at all times.  Plus, some things I am not allowed to keep in the cloud, and &lt;br /&gt;&lt;br /&gt;So it looks like I will have to use Microsoft Desktop Search, as built in to Windows 7, for stuff on my laptop.  
I prefer the Google user interface - but since Google Desktop Search will be discontinued, I don't have much choice.  And because I am using Microsoft Desktop Search, I have started using Bing for my web search.   Here, I still prefer the Google user interface, which I am familiar with - but Bing's user interface is more similar to MS Desktop Search, and I value consistency between Desktop and Web more than I value the more familiar Google Search interface.&lt;br /&gt;&lt;br /&gt;I suspect that Microsoft Desktop Search cannot search my Google Chrome browser history.  Just like Google Desktop Search could not search my Microsoft OneNote.&lt;br /&gt;&lt;br /&gt;Unfortunately, this begins to look like dominoes falling.  So long as I have substantial local data on my Microsoft PC, I am attracted, despite my personal preferences, to Microsoft tools like Microsoft Desktop Search, Bing, and Internet Explorer.&lt;br /&gt;&lt;br /&gt;Not everyone will be in this situation.  Some people will be purely cloud based.  But, unfortunately, I am not. Yet?&lt;br /&gt;&lt;br /&gt;I wonder if this will affect my next choice of cell phone?  
I was leaning towards Android, but perhaps this is enough to tilt me towards Windows 8.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-5456968461369002994?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://googleblog.blogspot.com/2011/09/fall-spring-clean.html' title='Google Desktop (Search) Dying'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/5456968461369002994/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=5456968461369002994' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/5456968461369002994'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/5456968461369002994'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/09/google-desktop-search-dying.html' title='Google Desktop (Search) Dying'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-8432657207182052480</id><published>2011-09-05T16:22:00.001-07:00</published><updated>2011-09-05T16:22:37.429-07:00</updated><title type='text'>Bit Interleaving, Instruction Set Support for</title><content type='html'>[[Category:ISA]]&lt;br /&gt;&lt;br /&gt;http://semipublic.comp-arch.net/wiki/Bit_Interleaving,_Instruction_Set_Support_for&lt;br /&gt;= Instruction Set Support for Bit Interleaving 
=&lt;br /&gt;&lt;br /&gt;== What is Bit Interleaving? ==&lt;br /&gt;&lt;br /&gt;[[Bit interleaving]] is a very common form of multi-operand bit permutation.&lt;br /&gt;It has been important enough to occasionally warrant special instruction set support.&lt;br /&gt;&lt;br /&gt;2-way bit interleaving of A and B:&lt;br /&gt;&lt;br /&gt; for i = 0 to N do&lt;br /&gt;   result.bit[2*i] := A.bit[i]&lt;br /&gt;   result.bit[2*i+1] := B.bit[i]&lt;br /&gt;&lt;br /&gt;3-way bit interleaving of A, B, C:&lt;br /&gt;&lt;br /&gt; for i = 0 to N do&lt;br /&gt;   result.bit[3*i] := A.bit[i]&lt;br /&gt;   result.bit[3*i+1] := B.bit[i]&lt;br /&gt;   result.bit[3*i+2] := C.bit[i]&lt;br /&gt;&lt;br /&gt;M-way bit interleaving of A[j]:&lt;br /&gt;&lt;br /&gt; for i = 0 to N do&lt;br /&gt;   for j = 0 to M do &lt;br /&gt;     result.bit[M*i+j] := A[j].bit[i]&lt;br /&gt;&lt;br /&gt;Note that since bit interleaving combines multiple operands into a single operand, issues of overflow may arise.&lt;br /&gt;For example, the result is often used in address arithmetic,&lt;br /&gt;in which case the inputs (aka the coordinates) &lt;br /&gt;are probably not full width addresses.&lt;br /&gt;The bit interleaving may be used in only a subrange of the input bits.&lt;br /&gt;&lt;br /&gt;== Why Bit Interleaving? 
==&lt;br /&gt;&lt;br /&gt;See [[http://en.wikipedia.org/wiki/Morton_number_(number_theory) Wikipedia Morton numbers]].&lt;br /&gt;&lt;br /&gt;Bit interleaving is often used as a component of hash functions.&lt;br /&gt;&lt;br /&gt;In particular, bit interleaving of 2D and 3D coordinates in computer graphics&lt;br /&gt;often produces hash functions that have better cache locality than uninterleaved coordinates used to index an array.&lt;br /&gt;&lt;br /&gt;== Bit Interleaving Instructions ==&lt;br /&gt;&lt;br /&gt;Intel Larrabee has 1:1 and 2:1 interleave instructions, in both scalar and vector form:&lt;br /&gt;&lt;br /&gt;  vbitinterleave11pi: 1:1 bit-interleave vectors&lt;br /&gt;  vbitinterleave21pi: 2:1 bit-interleave vectors&lt;br /&gt;&lt;br /&gt;  bitinterleave11: 1:1 bit-interleave scalar&lt;br /&gt;  bitinterleave21: 2:1 bit-interleave scalar&lt;br /&gt;&lt;br /&gt;(See [[http://drdobbs.com/architecture-and-design/216402188?pgno=5 Mike Abrash's Dr. Dobb's article]] which comments that&lt;br /&gt;"is useful for generating swizzled addresses, particularly in conjunction with vinsertfield, for example in preparation for texture sample fetches (volume textures in the case of vbitinterleave21pi".)&lt;br /&gt;&lt;br /&gt;The 1:1 interleaving instruction obviously accomplishes 2D   interleaving, &lt;br /&gt;i.e. 
the interleaving of two coordinates for a 2D system.&lt;br /&gt;&lt;br /&gt;The 2:1 interleaving instruction accomplishes 3D interleaving.&lt;br /&gt;Or, rather, first one interleaves two coordinates, say X and Y, using 1:1,&lt;br /&gt;and then one interleaves the third coordinate, Z, using 2:1.&lt;br /&gt;&lt;br /&gt;4:1 interleaving is accomplished by two 1:1 interleaves.&lt;br /&gt;However, operations on objects and spaces of more than 1D, 2D, or 3D are much less common.&lt;br /&gt;&lt;br /&gt;== Expand (Interleave with 0) ==&lt;br /&gt;&lt;br /&gt;Although 1:1 and 2:1 interleaving "fit" into common instruction sets, &lt;br /&gt;higher degrees of interleaving may not - too many operands may be required.&lt;br /&gt;&lt;br /&gt;Some support may be afforded by bit expansion,&lt;br /&gt;or interleaving with 0.&lt;br /&gt;This takes 2 (or 3) inputs:&lt;br /&gt;* the bitvector into which 0s will be inserted between elements&lt;br /&gt;* the number of 0 bits to be inserted between elements&lt;br /&gt;* possibly an offset to cause the result to be aligned appropriately for the last step.&lt;br /&gt;&lt;br /&gt;The last step would be ORing the results together.&lt;br /&gt;(Or equivalently, ADDing, since the expanded operands have no set bits in common.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-8432657207182052480?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://semipublic.comp-arch.net/wiki/Bit_Interleaving,_Instruction_Set_Support_for' title='Bit Interleaving, Instruction Set Support for'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/8432657207182052480/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=8432657207182052480' title='0 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/2425290326823263574/posts/default/8432657207182052480'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/8432657207182052480'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/09/bit-interleaving-instruction-set.html' title='Bit Interleaving, Instruction Set Support for'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-389663627174261254</id><published>2011-08-28T19:54:00.000-07:00</published><updated>2011-08-28T19:54:33.832-07:00</updated><title type='text'>comp-arch.net mal-hacked - cloned, maybe broken into.</title><content type='html'>&lt;br /&gt;&lt;br /&gt;Googling "glew FP16" today, I found my own website, comp-arch.net.&lt;br /&gt;&lt;br /&gt;But I also found another website, waboba.info, that seems to be a clone of comp-arch.net.  &lt;br /&gt;&lt;br /&gt;This is probably a malware site, probably directing people to attack code, or at least trying to promote search engine scores.&lt;br /&gt;&lt;br /&gt;It is certainly unauthorized. I.e. nobody asked me about setting it up.&lt;br /&gt;&lt;br /&gt;It is probably a violation of the very loose copyright and licensing setup for comp-arch.net - a derivative of Creative Commons.&lt;br /&gt;&lt;br /&gt;In a way, it is flattering that comp-arch.net might be considered worth cloning.  
But then again, I imagine that this sort of thing is automated by the bad guys.&lt;br /&gt;&lt;br /&gt;If you ever use comp-arch.net stuff, beware of waboba.info, and other possible clones.&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;&lt;br /&gt;If anyone here has advice on steps to take, I'd appreciate it.&lt;br /&gt;&lt;br /&gt;===&lt;br /&gt;&lt;br /&gt;Of more concern is the fact that my copyright notice,&lt;br /&gt;https://semipublic.comp-arch.net/wiki/index.php?title=CompArch:Copyrights_and_Other_Intellectual_Property_Rights,&lt;br /&gt;seems to have disappeared.&lt;br /&gt;&lt;br /&gt;Now, this may have happened for innocuous reasons - such as an automatic update of the underlying mediawiki version.&lt;br /&gt;&lt;br /&gt;But, it may also indicate that my site has been hacked.&lt;br /&gt;If only to legitimize the unauthorized cloning described above.&lt;br /&gt;&lt;br /&gt;I will investigate - but that may take a while.&lt;br /&gt;&lt;br /&gt;---&lt;br /&gt;&lt;br /&gt;The real problem here is that I spend very little time working on this site, and I want to spend that time writing content, not administering Mediawiki.&lt;br /&gt;&lt;br /&gt;I may need to give in and use a hosted wiki.&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-389663627174261254?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://semipublic.comp-arch.net/wiki/Admin:Wiki_Admin_Log#waboba.info_unauthorized_clone_-_probable_malwebsite' title='comp-arch.net mal-hacked - cloned, maybe broken into.'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/389663627174261254/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=389663627174261254' title='0 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/2425290326823263574/posts/default/389663627174261254'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/389663627174261254'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/08/comp-archnet-mal-hacked-cloned-maybe.html' title='comp-arch.net mal-hacked - cloned, maybe broken into.'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-604249401765574985</id><published>2011-08-13T17:29:00.000-07:00</published><updated>2011-08-13T17:29:14.941-07:00</updated><title type='text'>Register_File_Port_Reduction_using_Time-based_SIMD</title><content type='html'>http://semipublic.comp-arch.net/wiki/Register_File_Port_Reduction_using_Time-based_SIMD&lt;br /&gt;&lt;br /&gt;= Background: 4-cycle GPU SIMD instruction groups =&lt;br /&gt;&lt;br /&gt;The most popular GPUs circa 2010 - Nvidia, AMD/ATI, Intel Gen - share the following characteristics:&lt;br /&gt;* they have [[SIMD or SIMT]] [[coherent threading]]&lt;br /&gt;* the SIMD is 16-way spatially: &lt;br /&gt;** i.e. each SIMD engine is circa 16 [[lanes]] wide&lt;br /&gt;** although the definition of the lane varies, e.g. from 32 bit wide scalar for Nvidia to 5-way VLIW for AMD/ATI&lt;br /&gt;* the SIMD is 4 way temporally&lt;br /&gt;** i.e. 
every [[SIMD instruction group]], [[wavefront]] or [[warp]], occupies the 16 lanes for 4 cycles.&lt;br /&gt;&lt;br /&gt;This wiki page discusses the temporal aspect,&lt;br /&gt;at least one specific advantage of taking at least 4 cycles per [[SIMD instruction group]].&lt;br /&gt;This is a particular case of &lt;br /&gt;[[register file port reduction]].&lt;br /&gt;&lt;br /&gt;= [[Register File Port Reduction using Time-based SIMD]] =&lt;br /&gt;&lt;br /&gt;For simplicity, let us ignore the possibility of [[spatial SIMD]].&lt;br /&gt;Let us assume that we have only one ALU,&lt;br /&gt;taking 2 inputs per cycle and producing a single output per cycle.&lt;br /&gt;(Or possibly 3 inputs, if we want to consider [[multiply-add]].)&lt;br /&gt;(Or more different combinations of inputs and outputs, if we are particularly aggressive.)&lt;br /&gt;&lt;br /&gt;I.e.&lt;br /&gt;    dest := opcode( src, src2, src3 )&lt;br /&gt;&lt;br /&gt;On a conventional scalar microarchitecture, this would require 3 read ports and 1 write port on the register file per cycle.&lt;br /&gt;(Let's forget the possibility of architecturally requiring the registers to belong to banks with non-interfering ports.)&lt;br /&gt;&lt;br /&gt;Now, instead, let us imagine that we are dealing with a SIMD instruction group or wavefront, i=0,3.&lt;br /&gt;And let us say that it is distributed over time, 4 successive clock cycles t0..t3.&lt;br /&gt;&lt;br /&gt;I.e.&lt;br /&gt;    t0: dest[0] := opcode( src[0], src2[0], src3[0] )&lt;br /&gt;    t1: dest[1] := opcode( src[1], src2[1], src3[1] )&lt;br /&gt;    t2: dest[2] := opcode( src[2], src2[2], src3[2] )&lt;br /&gt;    t3: dest[3] := opcode( src[3], src2[3], src3[3] )&lt;br /&gt;&lt;br /&gt;Now, instead of reading 3 separate (e.g. 
32-bit) values per cycle, and writing a single such value every cycle,&lt;br /&gt;we could make each separate access 4X larger, but only do one such 4X larger access every cycle:&lt;br /&gt;&lt;br /&gt;I.e.&lt;br /&gt;    t-3: src3_4x := RF.read( src3 )[0:127] &lt;br /&gt;    t-2: src2_4x := RF.read( src2 )[0:127] &lt;br /&gt;    t-1: src1_4x := RF.read( src1 )[0:127] &lt;br /&gt;    t0: dest_4x[0:31] := opcode( src1_4x[0:31], src2_4x[0:31], src3_4x[0:31] )&lt;br /&gt;    t1: dest_4x[32:63] := opcode( src1_4x[32:63], src2_4x[32:63], src3_4x[32:63] )&lt;br /&gt;    t2: dest_4x[64:95] := opcode( src1_4x[64:95], src2_4x[64:95], src3_4x[64:95] )&lt;br /&gt;    t3: dest_4x[96:127] := opcode( src1_4x[96:127], src2_4x[96:127], src3_4x[96:127] )&lt;br /&gt;    t3+1: RF.write( dest )[0:127] := dest_4x[0:127]&lt;br /&gt;&lt;br /&gt;(As is typical in these discussions, we are constrained by the lack of universally understood [[slicing notation]].)&lt;br /&gt;&lt;br /&gt;This shows that you can get away with a single 4X wider read/write port, &lt;br /&gt;at the cost of muxing/delay elements.&lt;br /&gt;&lt;br /&gt;The pipeline might look like&lt;br /&gt;&lt;br /&gt;    W0.write   W1.exec_0&lt;br /&gt;    W2.read1*  W1.exec_1&lt;br /&gt;    W2.read2*  W1.exec_2&lt;br /&gt;    W2.read3*  W1.exec_3&lt;br /&gt;    W1.write   W2.exec_0*&lt;br /&gt;    W3.read1   W2.exec_1*&lt;br /&gt;    W3.read2   W2.exec_2*&lt;br /&gt;    W3.read3   W2.exec_3*&lt;br /&gt;    W2.write*  W3.exec_0&lt;br /&gt;    ...        
...&lt;br /&gt;               &lt;br /&gt;&lt;br /&gt;although of course it can be extended to be deeper, tolerating more ALU latency, etc.&lt;br /&gt;&lt;br /&gt;= 4X Wider versus 4X Time-skewed =&lt;br /&gt;&lt;br /&gt;This register file port reduction can be obtained in at least two ways:&lt;br /&gt;&lt;br /&gt;* 4x wider&lt;br /&gt;** by performing 4X wider reads and writes&lt;br /&gt;** in a single "cycle" of RF access&lt;br /&gt;** to and from 4X wider temporary registers&lt;br /&gt;** and then muxing 1/4 of those wider registers in any given cycle of execution&lt;br /&gt;&lt;br /&gt;* 4x time skewing&lt;br /&gt;** by having 4 register files&lt;br /&gt;** each skewed from the others by 1 cycle&lt;br /&gt;** e.g. providing the same register number to each, but delayed 1 cycle for each skewing&lt;br /&gt;&lt;br /&gt;These are very similar.&lt;br /&gt;&lt;br /&gt;The 4X wider approach has the advantage of being very easy to express in conventional synthesized logic,&lt;br /&gt;even though it might be marginally more expensive in full custom logic.&lt;br /&gt;As of 2011, time-skewed register files are hard to express in RTL languages.&lt;br /&gt;&lt;br /&gt;= Why 4X? =&lt;br /&gt;&lt;br /&gt;4X is a convenient power of two,&lt;br /&gt;and [[powers of 2 are convenient in computer architecture]].&lt;br /&gt;&lt;br /&gt;4X matches 3 reads and 1 write&lt;br /&gt;for a register file with a single read and write port in any cycle.&lt;br /&gt;Which conveniently matches multiply add, A:=B*C+D, one of the most common operations in graphics&lt;br /&gt;- and GPUs were one of the first places this occurred.&lt;br /&gt;&lt;br /&gt;3X could be a possibility, if restricted to [[2-input operations]].&lt;br /&gt;But I suspect that 4X is just plain nicer.&lt;br /&gt;&lt;br /&gt;Larger than 4X is also a possibility.  
But, again, 4X is the smallest convenient size&lt;br /&gt;that reaps most of the benefits.&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-604249401765574985?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://semipublic.comp-arch.net/wiki/Register_File_Port_Reduction_using_Time-based_SIMD' title='Register_File_Port_Reduction_using_Time-based_SIMD'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/604249401765574985/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=604249401765574985' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/604249401765574985'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/604249401765574985'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/08/registerfileportreductionusingtime.html' title='Register_File_Port_Reduction_using_Time-based_SIMD'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-2653490716099111665</id><published>2011-07-14T07:57:00.001-07:00</published><updated>2011-07-14T07:57:34.322-07:00</updated><title type='text'>Things you can do with a double precision multiplier</title><content type='html'>Consider a processor that has hardware sufficient to do a double precision 
multiplication.&lt;br /&gt;Or perhaps a multiply-add.&lt;br /&gt;&lt;br /&gt;([[WLOG]] we will talk about floating point; similar discussion applies to 2X wide integer.)&lt;br /&gt;&lt;br /&gt;A double precision multiplier overall has a multiplier capable of forming 4 single precision products.&lt;br /&gt;Let us draw something like this, except using byte wide multipliers as subcomponents rather than single bit:&lt;br /&gt;Compare the partial products array for 64x64 to 32x32:&lt;br /&gt;        XXXX/XXXX&lt;br /&gt;       X  X/X  X&lt;br /&gt;      X  X/X  X&lt;br /&gt;     XXXX/XXXX&lt;br /&gt;    ----+-----&lt;br /&gt;    XXXX/XXXX&lt;br /&gt;   X  X/X  X&lt;br /&gt;  X  X/X  X&lt;br /&gt; XXXX/XXXX&lt;br /&gt;&lt;br /&gt;or briefly if the numbers are (Ahi+Blo)*(Xhi+Ylo)&lt;br /&gt;   AY BY&lt;br /&gt;  AX BX&lt;br /&gt;&lt;br /&gt;the summation network is similarly larger, although the final [[CPA (Carry Propagate Adder)]] is "only" 2X wider&lt;br /&gt;(more than 2X more gates, but only a bit deeper in logic depth).&lt;br /&gt;&lt;br /&gt;Given such a double precision multiplier, we can synthesize several different types of single precision operations&lt;br /&gt;&lt;br /&gt;* [[LINE]]: v=p*u+q&lt;br /&gt;:: this is just an [[FMA]], with possibly a different arrangement of inputs&lt;br /&gt;* [[PLANE]]: w=p*u+q*v+r&lt;br /&gt;:: this has 2 multiplications, although the sum network must be adjusted to align the products differently. This can be achieved by shifting the input to the upper half of the multiplier array&lt;br /&gt;* [[LRP]] or [[BLEND]]: w=u*x+v*(1-x)&lt;br /&gt;:: This is like [[PLANE]], except the second multiplier part is calculated. Like 2X, etc. 
products for advanced [[Booth encoding]]?&lt;br /&gt;&lt;br /&gt;The operations above have all 4 multiplications of the double precision multiplier available,&lt;br /&gt;but only use 2 of them.&lt;br /&gt;We can be more aggressive, trying to use all 4 - but then the summation network needs considerable adjustment.&lt;br /&gt;&lt;br /&gt;An arbitrary 2D outer product:&lt;br /&gt; [[OUTER2]] = (a b) X (x y) =&lt;br /&gt;     ax ay&lt;br /&gt;     bx by&lt;br /&gt;although this causes some difficulties because it needs to write back an output twice as wide as its inputs.&lt;br /&gt;&lt;br /&gt;[[CMUL (Complex multiply)]]: can be achieved using this multiplier: (a+bi) X (x+yi) = ax-by  + (ay+bx)i&lt;br /&gt;although once again there are difficulties with alignment in the summation network.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-2653490716099111665?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://semipublic.comp-arch.net/wiki/Things_you_can_do_with_a_double_precision_multiplier' title='Things you can do with a double precision multiplier'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/2653490716099111665/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=2653490716099111665' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/2653490716099111665'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/2653490716099111665'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/07/things-you-can-do-with-double-precision.html' title='Things you can do with a double precision multiplier'/><author><name>Andy "Krazy" 
Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-8828210847045254848</id><published>2011-07-09T15:38:00.000-07:00</published><updated>2011-07-09T15:39:55.405-07:00</updated><title type='text'>Paradox and Consistency</title><content type='html'>I am fascinated by paradox.  I am interested in the possibility of logical systems which are not consistent but where the inconsistency is somehow limited, so that it does not propagate and pollute the whole system.  I am unhappy with systems where, given any inconsistency, any theorem can be proven.&lt;br /&gt;&lt;br /&gt;I suspect that there may be much existing work in this area.  Certainly I am aware of Russell's "set of all sets that do not contain themselves".  And of Godel (although I probably do not understand Godel as well as I would like).  I should probably research the literature.  &lt;br /&gt;&lt;br /&gt;But since this is a hobby, I thought I might begin by writing down some simple observations. Not new - not to me, and probably well known.  &lt;br /&gt;&lt;br /&gt;But fun.&lt;br /&gt;&lt;br /&gt;Consider simple systems of statements:&lt;br /&gt;&lt;br /&gt;A single statement system:&lt;br /&gt;&lt;br /&gt;S0:  "This statement is false."  The classic Liar's Paradox.  The statement is neither false nor true: some say that it reflects an extra logic value, "paradox".  Three valued logic.&lt;br /&gt;&lt;br /&gt;A two statement system:&lt;br /&gt;&lt;br /&gt;S1: "S2 is false"&lt;br /&gt;S2: "S1 is false"&lt;br /&gt;&lt;br /&gt;This system is self referential, but not necessarily paradoxical.  
It might be that S1 is true, in which case S2 is false, confirming that S1 is true.  Or, vice versa.   Thus, it is not contradictory. Either situation is consistent.  But, from the outside, we don't know which.&lt;br /&gt;&lt;br /&gt;What should we call this?  Bistable? Metastable, if more than 2 stable states?  Unknowable?&lt;br /&gt;&lt;br /&gt;A three statement system:&lt;br /&gt;&lt;br /&gt;S1: "S2 is false"&lt;br /&gt;S2: "S3 is false"&lt;br /&gt;S3: "S1 is false"&lt;br /&gt;&lt;br /&gt;This is paradoxical, in the same way that the single statement Liar's Paradox is paradoxical.&lt;br /&gt;&lt;br /&gt;Conjecture: I suspect that, to be a true paradox, the feedback loop must have an odd number of stages.  Whereas to be bistable or metastable, it must have an even number of stages.&lt;br /&gt;&lt;br /&gt;Compare to inverter rings or storage bitcells with cross coupled inverters in computer electronics.&lt;br /&gt;&lt;br /&gt;Q: can you construct a system that has intersecting self referential rings of odd and even length?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-8828210847045254848?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/8828210847045254848/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=8828210847045254848' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/8828210847045254848'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/8828210847045254848'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/07/paradox-and-consistency.html' title='Paradox and Consistency'/><author><name>Andy 
"Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-2528717815413688691</id><published>2011-07-06T23:32:00.000-07:00</published><updated>2011-07-06T23:32:15.464-07:00</updated><title type='text'>Address Generation Writeback</title><content type='html'>http://semipublic.comp-arch.net/wiki/Address_Generation_Writeback&lt;br /&gt;&lt;br /&gt;= Hardware Motivation for Address Register Modifying Addressing Modes =&lt;br /&gt;&lt;br /&gt;Apart from the software datastructure motivation,&lt;br /&gt;there is a hardware motivation for addressing modes such as pre-increment or decrement.&lt;br /&gt;&lt;br /&gt;In more generality - addressing modes that calculate an address, and then save the results of that calculation &lt;br /&gt;in a register - typically one of the registers involved in the addressing mode.&lt;br /&gt;&lt;br /&gt;Consider [[Mitch Alsup's Favorite Addressing Mode]],&lt;br /&gt;the x86's [[Base+Scaled-Index+Offset]].&lt;br /&gt;&lt;br /&gt;A succession of such instructions might look like:&lt;br /&gt;&lt;br /&gt;;an [[RMW]]&lt;br /&gt;  r1 := load( M[ rB+rO*4+offset1 ] )&lt;br /&gt;  r1 := ... 
some calculation involving r1 ...&lt;br /&gt;  store( M[ rB+rO*4+offset1 ] := r1 )&lt;br /&gt;&lt;br /&gt;or&lt;br /&gt;&lt;br /&gt;; nearby loads&lt;br /&gt;  r1 := load( M[ rB+rO*4+offset1 ] )&lt;br /&gt;  r2 := load( M[ rB+rO*4+offset2 ] )&lt;br /&gt;  r3 := load( M[ rB+rO*4+offset3 ] )&lt;br /&gt;&lt;br /&gt;Notice the possibility of [[CSE (Common Subexpression Elimination)]]:&lt;br /&gt;&lt;br /&gt;For the [[RMW]] case, this is straightforward:&lt;br /&gt;&lt;br /&gt;  rAtmp := [[lea]]( M[ rB+rO*4+offset1 ] )&lt;br /&gt;  r1 := load( M[ rAtmp ] )&lt;br /&gt;  r1 := ... some calculation involving r1 ...&lt;br /&gt;  store( M[ rAtmp ] := r1 )&lt;br /&gt;&lt;br /&gt;Is this a win? It depends:&lt;br /&gt;* in terms of performance, on a classic RISC, probably not: the extra instruction adds latency, probably costs a cycle.&lt;br /&gt;* in terms of power, possibly&lt;br /&gt;&lt;br /&gt;For the nearby load case, we could do:&lt;br /&gt;&lt;br /&gt;  rAtmp := lea( M[ rB+rO*4 ] )&lt;br /&gt;  r1 := load( M[ rAtmp+offset1 ] )&lt;br /&gt;  r2 := load( M[ rAtmp+offset2 ] )&lt;br /&gt;  r3 := load( M[ rAtmp+offset3 ] )&lt;br /&gt;&lt;br /&gt;or&lt;br /&gt;&lt;br /&gt;  rAtmp := lea( M[ rB+rO*4+offset1 ] )&lt;br /&gt;  r1 := load( M[ rAtmp ] )&lt;br /&gt;  r2 := load( M[ rAtmp+(offset2-offset1) ] )&lt;br /&gt;  r3 := load( M[ rAtmp+(offset3-offset1) ] )&lt;br /&gt;&lt;br /&gt;Is this a win? 
Again, it depends:&lt;br /&gt;* probably not in performance&lt;br /&gt;* possibly in terms of power.&lt;br /&gt;&lt;br /&gt;&lt;UL&gt;As an aside, let me note that if the base address rB+rO*4 is aligned, then the subsequent addresses could use [[OR indexing]] rather than [[ADD indexing]]. Which may further save power. At the cost of yet another piece of crap in the instruction set.&lt;/UL&gt;&lt;br /&gt;To avoid conundrums like this - is it better to [[CSE]] the addressing mode or not?&lt;br /&gt;- processors like the [[AMD 29K]]&lt;br /&gt;abandoned addressing modes except for [[absolute addressing mode]] and [[register indirect addressing mode]]&lt;br /&gt;- so to form any non-trivial address you had to save the results to a register, and thereby were&lt;br /&gt;encouraged to CSE it whenever possible.&lt;br /&gt;&lt;br /&gt;Generalized pre-increment/decrement is a less RISCy approach to this.&lt;br /&gt;Instead of encouraging CSE via separate instructions,&lt;br /&gt;it encourages CSE by making it "free" to save the result of the addressing mode calculation.&lt;br /&gt;&lt;br /&gt;:Let's call this [[Address Generation Saving]], or, if it modifies a register used in the addressing mode, [[Address Register Modification]]:&lt;br /&gt;&lt;br /&gt;In our examples:&lt;br /&gt;&lt;br /&gt;;RMW&lt;br /&gt;  r1 := load( M[ rAtmp := rB+rO*4+offset1 ] )&lt;br /&gt;  r1 := ... 
some calculation involving r1 ...&lt;br /&gt;  store( M[ rAtmp ] := r1 )&lt;br /&gt;&lt;br /&gt;; nearby load&lt;br /&gt;  r1 := load( M[ rAtmp := rB+rO*4+offset1 ] )&lt;br /&gt;  r2 := load( M[ rAtmp+(offset2-offset1) ] )&lt;br /&gt;  r3 := load( M[ rAtmp+(offset3-offset1) ] )&lt;br /&gt;&lt;br /&gt;Note that the nearby case that changes the offsets is simpler than the nearby case that saves only part of the address calculation:&lt;br /&gt;&lt;br /&gt;;Using C's comma expression:&lt;br /&gt;  r1 := load( M[ (rAtmp := rB+rO*4, rAtmp+offset1) ] )&lt;br /&gt;  r2 := load( M[ rAtmp+offset2 ] )&lt;br /&gt;  r3 := load( M[ rAtmp+offset3 ] )&lt;br /&gt;&lt;br /&gt;:: Again, [[OR indexing]] may benefit, for addressing mode [[(Base+Scaled-Index)|Offset]]&lt;br /&gt;&lt;br /&gt;What is the problem with this?&lt;br /&gt;* Writeback ports&lt;br /&gt;&lt;br /&gt;Such an [[Address Generation Saving]] requires an extra writeback port, in addition to the port for the result of an instruction like load.&lt;br /&gt;&lt;br /&gt;For this reason, some instruction sets have proposed to NOT have [[address generation writeback]] for instructions that write other results, such as load,&lt;br /&gt;but only to have [[address generation writeback]] for instructions that do not have a normal writeback, such as a store.&lt;br /&gt;&lt;br /&gt;: Glew opinion: workable, minor benefit, but ugly.&lt;br /&gt;&lt;br /&gt;Except that pre-increment/decre more general form:&lt;br /&gt;&lt;br /&gt;Post-increment and decrement go further. Generalized in this way,&lt;br /&gt;the address calculation looks like&lt;br /&gt;&lt;br /&gt;   { old := address_reg; address_reg := new_address; return old }&lt;br /&gt;&lt;br /&gt;Not only does this require a writeback port for the updated address register,&lt;br /&gt;but it also requires two paths from the AGU or RF to where the address is used&lt;br /&gt;- one for the address, the other for the updated address_reg.&lt;br /&gt;&lt;br /&gt;OVERALL OBSERVATION:  
addressing modes begin the slippery slope to VLIW.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-2528717815413688691?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://semipublic.comp-arch.net/wiki/Address_Generation_Writeback' title='Address Generation Writeback'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/2528717815413688691/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=2528717815413688691' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/2528717815413688691'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/2528717815413688691'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/07/address-generation-writeback.html' title='Address Generation Writeback'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-7267351113612255846</id><published>2011-07-04T16:40:00.000-07:00</published><updated>2011-07-04T16:50:35.877-07:00</updated><title type='text'>Mitre Top 25 SW bugs</title><content type='html'>http://cwe.mitre.org/top25/index.html&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;Rank Score ID Name&lt;br /&gt;[1] 93.8 CWE-89 Improper Neutralization of Special Elements used in an SQL Command ('SQL Injection')&lt;br /&gt;[2] 83.3 CWE-78 Improper 
Neutralization of Special Elements used in an OS Command ('OS Command Injection')&lt;br /&gt;&lt;br /&gt;-- the top two are handled by taint propagation. "Improper Neutralization" =&gt; "quotification"&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;[3] 79.0 CWE-120 Buffer Copy without Checking Size of Input ('Classic Buffer Overflow')&lt;br /&gt;&lt;br /&gt;-- handled by stuff like Milo Martin's Hardbound and Softbound.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;[4] 77.7 CWE-79 Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')&lt;br /&gt;&lt;br /&gt;-- tainting/quotification&lt;br /&gt;&lt;br /&gt;[5] 76.9 CWE-306 Missing Authentication for Critical Function&lt;br /&gt;[6] 76.8 CWE-862 Missing Authorization&lt;br /&gt;&lt;br /&gt;-- more like real SW bugs, with no crutches like bounds checks or tainting.&lt;br /&gt;&lt;br /&gt;-- except that we can imagine that capability systems might help here - it might be made mandatory to activate some capabilities positively, rather than passively inheriting them.&lt;br /&gt;&lt;br /&gt;[7] 75.0 CWE-798 Use of Hard-coded Credentials&lt;br /&gt;[8] 75.0 CWE-311 Missing Encryption of Sensitive Data&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;[9] 74.0 CWE-434 Unrestricted Upload of File with Dangerous Type&lt;br /&gt;&lt;br /&gt;-- while we might hope that a capability system might upload such a file but strip it of all ability to do harm, experience suggests that a user will just blindly give the file whatever privileges it asks for.&lt;br /&gt;&lt;br /&gt;[10] 73.8 CWE-807 Reliance on Untrusted Inputs in a Security Decision&lt;br /&gt;&lt;br /&gt;-- tainting&lt;br /&gt;&lt;br /&gt;[11] 73.1 CWE-250 Execution with Unnecessary Privileges&lt;br /&gt;&lt;br /&gt;-- capability systems are supposed to make it easier to implement the "Principle of Least Privilege".  
However, it still happens.&lt;br /&gt;&lt;br /&gt;-- we can imagine security tools that profile privileges - that determine what privileges have never been used, to allow narrowing the privileges as much as possible.&lt;br /&gt;&lt;br /&gt;[12] 70.1 CWE-352 Cross-Site Request Forgery (CSRF)&lt;br /&gt;&lt;br /&gt;-- tainting, capabilities&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;[13] 69.3 CWE-22 Improper Limitation of a Pathname to a Restricted Directory ('Path Traversal')&lt;br /&gt;[14] 68.5 CWE-494 Download of Code Without Integrity Check&lt;br /&gt;[15] 67.8 CWE-863 Incorrect Authorization&lt;br /&gt;&lt;br /&gt;[16] 66.0 CWE-829 Inclusion of Functionality from Untrusted Control Sphere&lt;br /&gt;&lt;br /&gt;-- tainting&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;[17] 65.5 CWE-732 Incorrect Permission Assignment for Critical Resource&lt;br /&gt;[18] 64.6 CWE-676 Use of Potentially Dangerous Function&lt;br /&gt;[19] 64.1 CWE-327 Use of a Broken or Risky Cryptographic Algorithm&lt;br /&gt;&lt;br /&gt;[20] 62.4 CWE-131 Incorrect Calculation of Buffer Size&lt;br /&gt;&lt;br /&gt;-- classic buffer overflow, such as Martin Hardbound/Softbound.&lt;br /&gt;&lt;br /&gt;[21] 61.5 CWE-307 Improper Restriction of Excessive Authentication Attempts&lt;br /&gt;[22] 61.1 CWE-601 URL Redirection to Untrusted Site ('Open Redirect')&lt;br /&gt;&lt;br /&gt;[23] 61.0 CWE-134 Uncontrolled Format String&lt;br /&gt;[24] 60.3 CWE-190 Integer Overflow or Wraparound&lt;br /&gt;&lt;br /&gt;-- these two, 23 and 24, are somewhat handled by buffer overflow checking as in Hardbound and Softbound - the security flaws with integer overflow often manifest themselves as unchecked buffer overflows.&lt;br /&gt;&lt;br /&gt;-- however, they can manifest themselves in other ways as well.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;[25] 59.9 CWE-759 Use of a One-Way Hash without a Salt&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' 
src='https://blogger.googleusercontent.com/tracker/2425290326823263574-7267351113612255846?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://cwe.mitre.org/top25/index.html' title='Mitre Top 25 SW bugs'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/7267351113612255846/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=7267351113612255846' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/7267351113612255846'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/7267351113612255846'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/07/mitre-top-25-sw-bugs.html' title='Mitre Top 25 SW bugs'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-1830733869915994329</id><published>2011-06-29T15:10:00.001-07:00</published><updated>2011-06-29T15:11:27.926-07:00</updated><title type='text'>Notions of Time in Computer Architecture</title><content type='html'>http://semipublic.comp-arch.net/wiki/Notions_of_Time_in_Computer_Architecture&lt;br /&gt;&lt;br /&gt;Advanced computer architecture admits several different notions of time.&lt;br /&gt;&lt;br /&gt;= Real World Time =&lt;br /&gt;&lt;br /&gt;First, there is [[real time]] or [[wall clock time]].  
Time in the real world (forgetting Einsteinian relativistic issues for the moment).&lt;br /&gt;&lt;br /&gt;Unfortunately, the term [[real time]], which would be the best term for this, is often overloaded with the additional meaning&lt;br /&gt;of pertaining to a system that must respond in real time - e.g. "Drop the rods in the nuclear power plant quickly or the nuclear fuel will melt down."&lt;br /&gt;Or "adjust the aileron pitch within a few milliseconds, or the plane will stall and crash."&lt;br /&gt;These applications are often called [[hard real time]], and there are also [[soft real time]] applications that have less stringent timing requirements.&lt;br /&gt;&lt;br /&gt;Because of the use of real time in phrases such as [[real time operating system]] or [[real time response]], terms such as [[wall clock time]] may be used.&lt;br /&gt;&lt;br /&gt;== Absolute versus Relative ==&lt;br /&gt;&lt;br /&gt;Real time (and, indeed, other times) may be absolute or relative.&lt;br /&gt;&lt;br /&gt;Absolute time is wall clock time, or calendar time.&lt;br /&gt;&lt;br /&gt;Relative time is the amount of elapsed time since some other event.  E.g. one might start a timer on entering a function, and read it on exit.&lt;br /&gt;&lt;br /&gt;Obviously, you can calculate relative time from absolute time samples.&lt;br /&gt;But, it can be considerably easier to build the hardware to measure relative time than it is to measure absolute time.&lt;br /&gt;Timers used in relative timing may run in isolation, whereas clocks measuring absolute time may need to be kept in synchronization &lt;br /&gt;with other clocks in other systems.&lt;br /&gt;&lt;br /&gt;== [[Timers versus Timestamps]] ==&lt;br /&gt;&lt;br /&gt;Oftentimes applications do not need actual time. 
&lt;br /&gt;Instead, what they really need is timestamps that indicate relative position.&lt;br /&gt;The application may care who came first or second, but not how much time elapsed between arrivals.&lt;br /&gt;&lt;br /&gt;Therefore, some timer architectures provide absolute time in the upper bits,&lt;br /&gt;but simply provide a unique counter in the lower bits of a timestamp.&lt;br /&gt;&lt;br /&gt;= Von Neumann Time =&lt;br /&gt;&lt;br /&gt;In out-of-order and speculative processors,&lt;br /&gt;even when single threaded, it is often convenient to talk about the &lt;br /&gt;[[Von Neumann sequence number]] or [[Von Neumann time]].&lt;br /&gt;&lt;br /&gt;In a single threaded program, this is the sequence number at which each [[dynamic instruction instance]] executes.&lt;br /&gt;&lt;br /&gt;The [[Von Neumann time]] or sequence number for an instruction is NOT the [[wall clock time]]&lt;br /&gt;at which an instruction executes.&lt;br /&gt;Even on simple in-order processors, where instructions may have differing [[instruction latency]],&lt;br /&gt;the VN time is not linearly related to the wall clock time.&lt;br /&gt;On complex out-of-order processors, instructions that are later in the Von Neumann instruction sequence&lt;br /&gt;may execute earlier in wall clock time (that's what makes them out-of-order).&lt;br /&gt;&lt;br /&gt;The [[Von Neumann time]] or sequence number may be a convenient book-keeping tool.&lt;br /&gt;* it is often made available in a [[simulator]] user interface&lt;br /&gt;* I have proposed using it in [[hardware data structures]] such as store buffers and schedulers.&lt;br /&gt;** When I talk about an "age based scheduler", I usually mean a [[Von Neumann age]] based scheduler. 
&lt;br /&gt;** Of course, real hardware must implement a finite number of bits, typically wrapping around.&lt;br /&gt;&lt;br /&gt;== Multithreading ==&lt;br /&gt;&lt;br /&gt;The [[Von Neumann time]] or sequence number is unambiguous for a single threaded program.&lt;br /&gt;&lt;br /&gt;For multithreaded programs that interact, it is necessary to establish correspondences - a VN time of thread A corresponds to a VN time of thread B,&lt;br /&gt;at different points of interaction.&lt;br /&gt;&lt;br /&gt;It should be seen that this multithreaded time is a partial order.  Ideally one from which a total order can be constructed.&lt;br /&gt;(Although certain microarchitectural interactions may imply cycles in such a time graph.)&lt;br /&gt;&lt;br /&gt;I know of no standard term for this [[multithreaded time]], or [[Multithreaded Von Neumann time]].&lt;br /&gt;&lt;br /&gt;I think that it might be nice to call it [[Lamport time]],&lt;br /&gt;since Leslie Lamport was a pioneer in this area.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;= Pipeline Time(s) =&lt;br /&gt;&lt;br /&gt;Any given [[dynamic instruction instance]] doesn't execute at a single point in time.&lt;br /&gt;Rather, it executes in different ways at many different points in time:&lt;br /&gt;* [[instruction fetch time]]&lt;br /&gt;* [[decode time]]&lt;br /&gt;* [[schedule time]]&lt;br /&gt;* [[operand ready time]]&lt;br /&gt;* [[execution time]]&lt;br /&gt;* [[register read time]]&lt;br /&gt;* [[writeback time]]&lt;br /&gt;* [[re-execution time]] or [[replay time]]&lt;br /&gt;* [[commit time]]&lt;br /&gt;* [[retirement time]]&lt;br /&gt;&lt;br /&gt;Plus of course there is the&lt;br /&gt;* [[Von Neumann time]] or [[Von Neumann sequence number]] of a [[dynamic instruction instance]].&lt;br /&gt;&lt;br /&gt;For that matter&lt;br /&gt;* the [[Von Neumann time]] relative to a given thread of execution&lt;br /&gt;may differ from the&lt;br /&gt;* [[Von Neumann time]] relative to the processor it is executed on&lt;br /&gt;because of 
[[timesharing]] or [[multitasking]] between threads or processors.&lt;br /&gt;&lt;br /&gt;= Simulator Times =&lt;br /&gt;&lt;br /&gt;Consider&lt;br /&gt;* Real time of the machine on which the simulator runs: [[simulator real time]]&lt;br /&gt;* [[Simulated real time]] of the workload running on the simulator: [[simulatee real time]]&lt;br /&gt;* [[Von Neumann time]] or sequence number for the workload running on the simulator.&lt;br /&gt;&lt;br /&gt;This can be fully recursive.  A simulator may be running on top of a simulator on top of ...&lt;br /&gt;&lt;br /&gt;= Conclusion =&lt;br /&gt;&lt;br /&gt;Be careful what notion of time you are talking about.&lt;br /&gt;It may not be the notion of time for the person you are talking to.&lt;br /&gt;&lt;br /&gt;[[Microarchitecture]] is concerned with [[real time]].&lt;br /&gt;[[Macroarchitecture]] is concerned more with abstract time, such as [[Von Neumann time]].&lt;br /&gt;Systems that have different microarchitectures and hence different real time behavior,&lt;br /&gt;may execute the same [[macroarchitecture]].&lt;br /&gt;But, of course, [[macroarchitecture]] features such as [[ISA]] greatly influence [[real time]] performance.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-1830733869915994329?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://semipublic.comp-arch.net/wiki/Notions_of_Time_in_Computer_Architecture#Timers_versus_Timestamps' title='Notions of Time in Computer Architecture'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/1830733869915994329/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=1830733869915994329' title='0 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/2425290326823263574/posts/default/1830733869915994329'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/1830733869915994329'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/06/notions-of-time-in-computer.html' title='Notions of Time in Computer Architecture'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-3252781192343429530</id><published>2011-06-23T10:48:00.000-07:00</published><updated>2011-06-23T10:48:18.306-07:00</updated><title type='text'>cygwin startxwin problem with multiple displays</title><content type='html'>I use cygwin on my Windows PCs. Especially my tablet PC. AFAIK there is no equally good UNIX or Linux or Apple equivalent for a Microsoft Windows tablet PC. Tell me if there is. (Yes, I write and draw with the pen.)&lt;br /&gt;&lt;br /&gt;Unfortunately, my cygwin setup has decayed - things stopped working on various upgrades, and I do not always fix the problem immediately.  Here's one:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;For a while, I would use startxwin to start up X Windows in Cygwin's multiwindow mode.  At some point, it worked, but the xterm started by default was offscreen and could not be seen.  If I maximized it I could find it, but when non-maximized, it could not be seen.&lt;br /&gt;&lt;br /&gt;Turns out that startxwin's default was to start "xterm -geometry +1+1 ...".&lt;br /&gt;&lt;br /&gt;And I have been using multiple displays - e.g. currently have 4.  Irregularly arranged.  
So that there is no display at +1+1 in the bounding box of all displays.&lt;br /&gt;&lt;br /&gt;FIX: provide my own ~/.startxwinrc, with no location specified for the xterm.&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;&lt;br /&gt;Meta-issue: almost want a sanity check, to make sure that no windows are raised offscreen, invisibly.&lt;br /&gt;&lt;br /&gt;Overall, there are probably a lot of such issues relating to assumptions about screen geometry being regular and rectangular, that no longer hold in the world of multiple displays.&lt;br /&gt;&lt;br /&gt;---&lt;br /&gt;&lt;br /&gt;Another meta-issue: took a while to find out where the xterm was coming from.  &lt;br /&gt;&lt;br /&gt;It would have been nice to have an audit trail that allowed me to find where the xterm came from.&lt;br /&gt;&lt;br /&gt;Similarly, to find out where its geometry parameter came from.&lt;br /&gt;&lt;br /&gt;As it was, I just guessed "startxwin".  But I am embarrassed to say how long it took...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-3252781192343429530?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://x.cygwin.com/docs/man1/startxwin.1.html' title='cygwin startxwin problem with multiple displays'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/3252781192343429530/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=3252781192343429530' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/3252781192343429530'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/3252781192343429530'/><link rel='alternate' type='text/html' 
href='http://andyglew.blogspot.com/2011/06/cygwin-startxwin-problem-with-multiple.html' title='cygwin startxwin problem with multiple displays'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-7251275627012748032</id><published>2011-06-19T18:14:00.000-07:00</published><updated>2011-06-19T18:14:05.173-07:00</updated><title type='text'>A vocabulary of terms for memory operation combining</title><content type='html'>There are many different terms for the common optimization of memory operation combining or coalescing.&lt;br /&gt;See the discussions of the [[difference between write combining and write coalescing]],&lt;br /&gt;as well as [[write combining]] and [[write coalescing]],&lt;br /&gt;and [[NYU Ultracomputer]] [[fetch-and-op]] [[combining networks]].&lt;br /&gt;&lt;br /&gt;Here is a by no means complete list of such terms:&lt;br /&gt;&lt;br /&gt;* [[combining]]&lt;br /&gt;** [[write combining]]&lt;br /&gt;*** [[write combining buffer]]s&lt;br /&gt;*** [[write combining cache]]&lt;br /&gt;** [[read combining]]&lt;br /&gt;** [[fetch-and-op combining]]&lt;br /&gt;*** [[NYU Ultracomputer]] [[fetch-and-op]] [[combining networks]]&lt;br /&gt;&lt;br /&gt;* [[coalescing]]&lt;br /&gt;:: on GPUs...&lt;br /&gt;** [[write coalescing]]&lt;br /&gt;** [[read coalescing]]&lt;br /&gt;&lt;br /&gt;I have not heard anyone talk about fetch-and-op or atomic RMW coalescing&lt;br /&gt;as distinct from the historic&lt;br /&gt;[[NYU Ultracomputer]] [[fetch-and-op combining]].&lt;br /&gt;But I suppose that inevitably this will arise.&lt;br /&gt;&lt;br /&gt;* [[squashing]]&lt;br /&gt;:: mainly refers to 
finding that an operation is unnecessary, and cancelling it - or at least the unnecessary part&lt;br /&gt;** [[load squashing]]&lt;br /&gt;::: On P6, referred to a cache miss finding that there was already a cache miss in flight for the same cache line. No need to issue a new bus request, but the squashed request was not completely cancelled - arrangement was made so that data return from the  squashing cache miss would &lt;br /&gt;** [[store squashing]]&lt;br /&gt;::: I am not aware of this being done, but it could refer to comparing store addresses in a [[store buffer]], and determining that an older store is unnecessary since a later store completely overwrites it (and there would be no intervening operations that make it visible according to the [[memory ordering model]]).  (Actually, I am not sure that there could be such, but I am leaving the possibility as a placeholder.)&lt;br /&gt;::: [[Write combining]] accomplishes much the same thing, although [[store squashing]] in the store buffer gets it done earlier.&lt;br /&gt;::: Note that this is similar to [[store buffer combining]] - combining entries in the store buffer, separate  from a [[write combining buffer]].&lt;br /&gt;&lt;br /&gt;Again, I have not heard of fetch-and-op squashing, although I am sure that it could be done, e.g. for lossy operations such as AND and OR (a later fetch-and-OR with a bitmask that completely includes an earlier...).&lt;br /&gt;&lt;br /&gt;* [[snarfing]]&lt;br /&gt;:: I have usually seen this term used in combination with a [[snoopy bus]], although it can also be used with other interconnects.  
[[Read snarfing]] means that a pending request from P1 that has not yet received the bus in arbitration observes the data for the same cache line from a different processor P0, and "snarfs" the data as it goes by.&lt;br /&gt;Depending on the cache protocol, it may be necessary to assert a signal to put data into [[shared (S) state]] rather than [[exclusive (E) state]].&lt;br /&gt;&lt;br /&gt;I am not sure what [[write snarfing]] would look like.&lt;br /&gt;An [[update cache]] is somewhat like this, but it updates an existing cache line, not an existing request.&lt;br /&gt;I.e. an update cache protocol snarfs write data from the bus or interconnect to update a cache line.&lt;br /&gt;Whereas a pending load can snarf data either from read replies or write data transactions on the bus or interconnect.&lt;br /&gt;I.e. a read can snarf from a read or a write.&lt;br /&gt;But does a write "snarf"?&lt;br /&gt;More like a write may combine with another write -&lt;br /&gt;[[snoopy bus based write combining]],&lt;br /&gt;as distinct from [[buffer based write combining]].&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-7251275627012748032?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://semipublic.comp-arch.net/wiki/A_vocabulary_of_terms_for_memory_operation_combining' title='A vocabulary of terms for memory operation combining'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/7251275627012748032/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=7251275627012748032' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/7251275627012748032'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/2425290326823263574/posts/default/7251275627012748032'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/06/vocabulary-of-terms-for-memory.html' title='A vocabulary of terms for memory operation combining'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-7552997877440924677</id><published>2011-06-19T18:13:00.001-07:00</published><updated>2011-06-19T18:13:24.245-07:00</updated><title type='text'>Difference between write combining and write coalescing</title><content type='html'>[[Write coalescing]] is the term some GPUs, notably AMD/ATI and Nvidia, use to describe how they, umm, combine or coalesce writes from N different SIMD threads into a single access, or at least fewer than N accesses.  
There is also [[read coalescing]], and one can imagine other forms of coalescing, such as atomic fetch-and-op coalescing.&lt;br /&gt;&lt;br /&gt;At AFDS11 I (Glew) asked an AMD/ATI GPU architect&lt;br /&gt;"What is the difference between [[write coalescing]] and [[write combining]]?"&lt;br /&gt;&lt;br /&gt;He replied that [[write combining]] was an x86 CPU feature that used a [[write combining buffer]],&lt;br /&gt;whereas [[write coalescing]] was a GPU feature that performed the optimization between multiple writes that were occurring simultaneously, not in a buffer.&lt;br /&gt;&lt;br /&gt;Hmmm...&lt;br /&gt;&lt;br /&gt;Since I (Glew) had a lot to do with x86 write combining&lt;br /&gt;- arguably I invented it on P6, although I was inspired by a long line of work in this area,&lt;br /&gt;most notably the [[NYU Ultracomputer]] [[fetch-and-op]] [[combining network]]&lt;br /&gt;- I am not sure that this distinction is fundamental.&lt;br /&gt;&lt;br /&gt;Or, rather, it _is_ useful to distinguish between buffer based implementations and implementations that look at simultaneous accesses.&lt;br /&gt; &lt;br /&gt;However, in the original NYU terminology, [[combining]] referred to both:&lt;br /&gt;operations received at the same time by a switch in the [[combining network]],&lt;br /&gt;and operations received at a later time that match an operation buffered in the switch,&lt;br /&gt;awaiting either to be forwarded on,&lt;br /&gt;or a reply.&lt;br /&gt;(I'm not sure which was in the Ultracomputer.)&lt;br /&gt;&lt;br /&gt;A single P6 processor  only did one store per cycle, so a buffer based implementation that performed [[write combining]] between stores &lt;br /&gt;at different times was the only possibility. 
Or at least the most useful.&lt;br /&gt;Combining stores from different processors was not done (at least, not inside the processor, and could not legally be done to all UC stores).&lt;br /&gt;&lt;br /&gt;The NYU Ultracomputer performed this optimization in a switch for multiple processors,&lt;br /&gt;so combining both simultaneous operations and operations performed at different times&lt;br /&gt;was a possibility.&lt;br /&gt;&lt;br /&gt;GPUs do many, many, stores at the same time, in a [[data memory coherent]] manner.&lt;br /&gt;This creates a great opportunity for optimizing simultaneous stores.&lt;br /&gt;Although I would be surprised and disappointed to learn that &lt;br /&gt;GPUs did not combine or coalesce&lt;br /&gt;(a) stores from different cycles in the typically 4 cycle wavefront or warp,&lt;br /&gt;and&lt;br /&gt;(b) stores from different SIMD engines, if they encounter each other on the way to memory.&lt;br /&gt;&lt;br /&gt;I conclude therefore that the difference between [[write combining]] and [[write coalescing]] is really one of emphasis.&lt;br /&gt;Indeed, this may be yet another example  where my&lt;br /&gt;(Glew's) predilection is to [[create new terms by using adjectives]],&lt;br /&gt;e.g. 
[[write combining buffer]] or [[buffer-based write combining]]&lt;br /&gt;versus [[simultaneous write combining]] (or the [[AFAIK]] hypothetical special case [[snoop based write combining]]),&lt;br /&gt;rather than creating gratuitous new terminology,&lt;br /&gt;such as [[write combining]] (implicitly restricted to buffer based)&lt;br /&gt;versus [[write coalescing]] (simultaneous, + ...).&lt;br /&gt;&lt;br /&gt;= See Also =&lt;br /&gt;&lt;br /&gt;This discussion prompts me to create&lt;br /&gt;&lt;br /&gt;* [[a vocabulary of terms for memory operation combining]]&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-7552997877440924677?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://semipublic.comp-arch.net/wiki/Difference_between_write_combining_and_write_coalescing' title='Difference between write combining and write coalescing'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/7552997877440924677/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=7552997877440924677' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/7552997877440924677'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/7552997877440924677'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/06/difference-between-write-combining-and.html' title='Difference between write combining and write coalescing'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' 
src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-3461994257459388981</id><published>2011-06-15T23:24:00.000-07:00</published><updated>2011-06-15T23:24:36.637-07:00</updated><title type='text'>Write combining</title><content type='html'>http://semipublic.comp-arch.net/wiki/Write_combining#WC_policies&lt;br /&gt;&lt;br /&gt;Many processors have [[write combining]] support. The [[WC buffers]] and [[USWC memory type]] that I (Andy Glew) added to the Intel P6 are probably by no means the first, or last, such feature. Although arguably pretty successful.&lt;br /&gt;&lt;br /&gt;= The basic idea of write combining =&lt;br /&gt;&lt;br /&gt;Say that you have a memory bus of width N, e.g. N=64b.&lt;br /&gt;&lt;br /&gt;But say that software, for its own nefarious reasons, is writing only in size M, M&lt;N.&lt;br /&gt;E.g. say that software is writing a byte at a time.&lt;br /&gt;&lt;br /&gt;Simplistically, each such byte write would be wasting 7/8 of the 64b bus.&lt;br /&gt;&lt;br /&gt;The basic idea of a write combining buffer is to have a buffer at least N bits wide, with subset validity bits&lt;br /&gt;- e.g. 64 bits, with 1 bit per byte indicating that a byte has been written.&lt;br /&gt;&lt;br /&gt;With the valid bits initially empty, as you write to the WC buffer you place data in the corresponding byte, and set the corresponding byte valid bit.&lt;br /&gt;&lt;br /&gt;At some later point in time, you may evict the WC buffer.  If all of the bytes have been written, you use a single efficient N=64b write.  If not, you do some sort of [[partial eviction]]. 
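&lt;br /&gt;&lt;br /&gt;The buffer described above can be sketched in software. This is a toy model under stated assumptions: an 8 byte (64b) buffer, one valid flag per byte, and a deliberately naive partial eviction that emits one write per valid byte. The names WCBuffer, store, evict and demo are hypothetical and are not P6 terminology.&lt;br /&gt;
&lt;pre&gt;
```python
# Hypothetical model of a write combining buffer with per-byte valid bits.
LINE_BYTES = 8  # N = 64b bus width, as in the example in the text


class WCBuffer:
    def __init__(self):
        self.data = bytearray(LINE_BYTES)
        self.valid = [False] * LINE_BYTES  # one valid bit per byte

    def store(self, offset, payload):
        """Place payload bytes at offset and set the matching valid bits."""
        for i, byte in enumerate(payload):
            self.data[offset + i] = byte
            self.valid[offset + i] = True

    def evict(self):
        """Return the bus transactions needed to drain the buffer."""
        if all(self.valid):
            # Every byte was written: a single full width write suffices.
            txns = [("full", 0, bytes(self.data))]
        else:
            # Naive partial eviction: one partial write per valid byte.
            txns = [("partial", i, bytes(self.data[i:i + 1]))
                    for i in range(LINE_BYTES) if self.valid[i]]
        self.valid = [False] * LINE_BYTES
        return txns


def demo():
    buf = WCBuffer()
    for i in range(LINE_BYTES):      # software writes one byte at a time
        buf.store(i, bytes([i]))
    full = buf.evict()               # all bytes valid

    buf.store(2, b"\xaa")
    buf.store(5, b"\xbb")
    partial = buf.evict()            # only two bytes valid
    return full, partial


if __name__ == "__main__":
    print(demo())
```
&lt;/pre&gt;
In this sketch, eight single byte stores drain as one full width write, while two scattered byte stores drain as two partial writes. A real implementation would choose a cheaper partial eviction sequence than one write per byte.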
&lt;br /&gt;&lt;br /&gt;If the silly software completely overwrites the WC buffer, you have used 8/8 of the bus bandwidth, rather than only 1/8.&lt;br /&gt;&lt;br /&gt;== A more realistic example ==&lt;br /&gt;&lt;br /&gt;There may be several WC buffers,&lt;br /&gt;each a full cache line (e.g. 64B - not bits, but bytes) in length.&lt;br /&gt;&lt;br /&gt;The [[bus]] is optimized for 8-chunk bursts.  &lt;br /&gt;The bus may permit full utilization for 64B burst transfers, but smaller transfers are less efficient&lt;br /&gt;- and, in fact, in some systems occupy exactly the same number of cycles.&lt;br /&gt;Let's say 8 cycles - 8x8B.&lt;br /&gt;&lt;br /&gt;The processor may be clocked faster than the bus.  E.g. 8GHz, versus a 1GHz memory bus.&lt;br /&gt;&lt;br /&gt;The processor does 64b stores to memory that is uncached but not memory mapped I/O.&lt;br /&gt;Each creates a [[bus write partial line]] command,&lt;br /&gt;occupying 8 cycles on the bus in our example.&lt;br /&gt;&lt;br /&gt;If, instead, we use a write combining buffer,&lt;br /&gt;the first 64b/8B store may allocate the [[WC buffer]],&lt;br /&gt;storing data in it&lt;br /&gt;and setting the byte valid bits to &lt;br /&gt; 11111111_00000000_00000000_00000000_00000000_00000000_00000000_00000000&lt;br /&gt;&lt;br /&gt;The second 64b/8B store may hit the WC buffer, and set the byte valid bits to&lt;br /&gt; 11111111_11111111_00000000_00000000_00000000_00000000_00000000_00000000&lt;br /&gt;&lt;br /&gt;And so on...&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;When the [[write combining buffer]] is full, and when it is eventually evicted,&lt;br /&gt;then a [[bus write full line]] transaction would be done.&lt;br /&gt;In our contrived example, it occupies the same 8 cycles that each of the stores would separately occupy with a bus write partial line transaction.&lt;br /&gt;Bottom line: 8X speedup.&lt;br /&gt;1/8 the bus occupancy - freeing up the bus for other use.&lt;br /&gt;Etc.&lt;br /&gt;&lt;br /&gt;The point of this exercise was to show that write 
combining does not need to work with tiny 64b buffers.&lt;br /&gt;Full cache line buffers are possible, and, in fact, are what the Intel P6 and most subsequent x86 machines have.&lt;br /&gt;Also, clocking the processor faster than the bus may motivate WC.&lt;br /&gt;Also, it's not just a question of silly software doing 8b writes: even software doing 64b writes may benefit.&lt;br /&gt;&lt;br /&gt;Real world example:  on the Intel P6 family, a 64B full cache line transfer ([[BWL (Bus Write Line)]] occupied circa 5 cycles.&lt;br /&gt;A [[BWP (Bus Write Partial)]] transaction occupied 3 cycles.&lt;br /&gt;So the speedup is not quite so extreme as in the contrived example, but is still significant.&lt;br /&gt;Even if the partial writes had been optimized, integer code could only do 32 bit stores, and the bus was 64b wide.&lt;br /&gt;&lt;br /&gt;== Write Combining versus Wide Instructions ==&lt;br /&gt;&lt;br /&gt;Some may consider that write combining is a poor substitute for having wider instructions.&lt;br /&gt;I.e. instead of building a complicated write combining mechanism, why not create full cache line wide instructions such as&lt;br /&gt;* Load/Store 64B (512b) between vector registers and memory&lt;br /&gt;* [[Load/store-multiple-registers instruction]]&lt;br /&gt;&lt;br /&gt;Now, I am nearly always an advocate of explicit software control, even here.&lt;br /&gt;I am quite in favor of vector instructions,&lt;br /&gt;and less so, but still positive, on load/store-multiple-registers.&lt;br /&gt;&lt;br /&gt;But history shows: Intel x86 had only 32b integer registers&lt;br /&gt;and worked quite successfully with write combining for more than 5 years before [[x86-64]], 1996-2002;&lt;br /&gt;and even at the time I am writing this does not have 512b/64B full cache line registers.&lt;br /&gt;(Proposed in LRBNI, but not yet implemented.)&lt;br /&gt;I.e. 
history shows that you can be successful for quite a long time without explicit instruction set support.&lt;br /&gt;&lt;br /&gt;Explicit instruction set support may have advantages in other ways.&lt;br /&gt;But it also has costs, e.g. context switching the vector registers.&lt;br /&gt;&lt;br /&gt;== Write Combining versus Writeback, DMA, ... ==&lt;br /&gt;&lt;br /&gt;Many other arguments have been made against write combining.&lt;br /&gt;&lt;br /&gt;E.g. Why not just use [[writeback (WB)]] memory everywhere?&lt;br /&gt;A: although there is more and more cache coherent I/O, at the time I am writing this uncached memory for framebuffers, etc.,&lt;br /&gt;is still very common.  &lt;br /&gt;&lt;br /&gt;E.g. Why not use DMA engines?  Or smart I/O devices such as GPUs?&lt;br /&gt;A: not a complete answer, but basically these do not always exist.  &lt;br /&gt;Moreover, I am writing this at a GPU conference,&lt;br /&gt;where the designers of AMD's Llano APU (CPU/GPU hybrid) memory subsystem&lt;br /&gt;expressed, in conversation, the desire for better uncacheable / write combining performance.&lt;br /&gt;&lt;br /&gt;Bottom line: the need for write combining is not going away.&lt;br /&gt;&lt;br /&gt;= WC policies =&lt;br /&gt;&lt;br /&gt;== P6 USWC ==&lt;br /&gt;&lt;br /&gt;Intel P6 family USWC memory manages write-combining buffers&lt;br /&gt;in a weakly ordered manner.&lt;br /&gt;E.g. you can write A0, B0, A1, B1, etc. to two different cache lines,&lt;br /&gt;and two different WC buffers will be allocated.&lt;br /&gt;No account is taken, for USWC memory, of write ordering.&lt;br /&gt;E.g. 
line B may be evicted first, so that B1 is observed before A0.&lt;br /&gt;&lt;br /&gt;Eviction is almost random - from the point of view of a programmer.&lt;br /&gt;Certain events are defined to cause evictions, including:&lt;br /&gt;* interrupts&lt;br /&gt;* I/O instructions&lt;br /&gt;* uncacheable memory accesses (to [[UC]] memory) (which are possibly memory mapped I/O)&lt;br /&gt;* possibly certain fence and flush operations (although x86 fence instructions were added later).&lt;br /&gt;&lt;br /&gt;The motivation for this USWC WC buffer eviction policy&lt;br /&gt;was mainly to try to make it look transparent when coordinating with a GPU,&lt;br /&gt;sending commands via MMIO or I/O.&lt;br /&gt;&lt;br /&gt;Note: there was no guarantee of timeliness, no guarantee that USWC writes would eventually be flushed out,&lt;br /&gt;e.g. to the framebuffer.  Except that most systems at the time had regular timer interrupts.&lt;br /&gt;Now, circa 2011, [[tickless OS]]es are common.&lt;br /&gt;It might be advisable to have hardware periodically flush USWC.&lt;br /&gt;&lt;br /&gt;Carl Amdahl observed that USWC was a cache - a small, non-coherent, cache - not a buffer.&lt;br /&gt;&lt;br /&gt;== [[Left to right write combining]] ==&lt;br /&gt;&lt;br /&gt;An alternative to P6 USWC's WC policies&lt;br /&gt;was [[left to right write combining]].&lt;br /&gt;&lt;br /&gt;This would typically only allocate a WC buffer for a write to byte 0 of a line.&lt;br /&gt;(Although starting in the middle can be imagined.)&lt;br /&gt;&lt;br /&gt;It would perform write combining so long as the write was adjacent to, and to the right of, bytes already written in the WC buffer.&lt;br /&gt;&lt;br /&gt;If a write was performed that was not adjacent and to the right, the WC buffer would be evicted immediately.&lt;br /&gt;&lt;br /&gt;[[Left to right write combining]] has the putative advantage that it works with some forms of memory mapped I/O - devices for which the registers are designed so that they 
are written from low address to high.  Typically these are parameters, with the last location written triggering a side effect.&lt;br /&gt;This is actually a very attractive feature, since it allows efficient MMIO for all devices designed in this way.&lt;br /&gt;Unfortunately, it is not compatible with all MMIO devices - some MMIO devices actually look at the size of the bus transaction, interpreting that as an aspect of the MMIO command.  (You might consider this stupid, but... )&lt;br /&gt;&lt;br /&gt;[[Left to right write combining]] also has the advantage that it preserves write ordering, so long as evictions are done left to right.&lt;br /&gt;&lt;br /&gt;In an ideal world, there would be two write-combining memory types: left to right, and USWC.  &lt;br /&gt;Unfortunately, in P6's [[feature diets]] we had to choose only one, and USWC was it.&lt;br /&gt;USWC is better for certain types of framebuffer workloads.&lt;br /&gt;[[Left to right write combining]] is better for well behaved MMIO.&lt;br /&gt;&lt;br /&gt;== WC for WB ==&lt;br /&gt;&lt;br /&gt;The previous two topics, USWC and [[Left to right write combining]],&lt;br /&gt;deal with write combining with what is fundamentally uncached memory.&lt;br /&gt;&lt;br /&gt;Write combining can also be used for [[WB (Writeback)]] memory.&lt;br /&gt;&lt;br /&gt;If weakly ordered, this is straightforward.&lt;br /&gt;Similarly, it is straightforward if used to optimize writethrough traffic,&lt;br /&gt;e.g. from a WT L1$ to a WB L2$.&lt;br /&gt;So long as exclusive ownership has already been obtained.&lt;br /&gt;&lt;br /&gt;However, write combining for a store-ordered memory system like [[TSO]] or [[processor consistency]]&lt;br /&gt;is more of a challenge.&lt;br /&gt;&lt;br /&gt;TBD: seek the patents on this.  
Public info.&lt;br /&gt;&lt;br /&gt;= Eviction Mechanism =&lt;br /&gt;At some point it is necessary to evict a write combining buffer.&lt;br /&gt;&lt;br /&gt;If the buffer is completely written, then evicting it,&lt;br /&gt;on USWC memory, is straightforward:&lt;br /&gt;* use a [[BWL (Burst Write Line)]] bus transaction that writes the entire line&lt;br /&gt;:: (Caveat: some systems require obtaining ownership before writing like this. But not P6.)&lt;br /&gt;&lt;br /&gt;If the buffer is not completely written, you can use any of the following:&lt;br /&gt;* read the missing bytes using a [[BRL (Burst Read Line)]], merge, and then write using [[BWL]].&lt;br /&gt;** works but potentially uses MORE memory traffic than not write combining&lt;br /&gt;* use the most efficient sequence of [[partial writes]] possible&lt;br /&gt;** e.g. using 64b [[write bytes under mask bus transaction]]s - eliding empty chunks&lt;br /&gt;** some systems do not have [[write bytes under mask bus transaction]], but only support 8b, 16b, 32b, etc. partial writes.  A state machine can emit a sequence of these&lt;br /&gt;*** note: this probably violates any pretense of the original writes being atomic, particularly if not aligned&lt;br /&gt;* ideally, use an efficient [[BWLM (Burst Write Line under Mask)]] bus transaction.&lt;br /&gt;** this might consist of 4 or 8 data chunks, along with a data chunk that contains a 64 bit mask for the operation.&lt;br /&gt;*** Issue: is the mask attached to address or data?  Possibly both.  Possibly there is no distinction.&lt;br /&gt;*** Aside: in [[BS: Bitmask Coherency for Writeback Caches]] I discuss the possibility of having two bitmasks, dirty and clean, on such a bus transaction. Possibly with a read version.&lt;br /&gt;&lt;br /&gt;= TBD - fatigue =&lt;br /&gt;&lt;br /&gt;I'm too tired to finish this.  
Let me just list some topics:&lt;br /&gt;&lt;br /&gt;* [[WC buffer as a store buffer extension]] - constraining eviction order if you want to maintain processor consistency.&lt;br /&gt;&lt;br /&gt;* The [[difference between write combining and write coalescing]]&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-3461994257459388981?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://semipublic.comp-arch.net/wiki/Write_combining#WC_policies' title='Write combining'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/3461994257459388981/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=3461994257459388981' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/3461994257459388981'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/3461994257459388981'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/06/write-combining.html' title='Write combining'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-7765746585263858045</id><published>2011-06-15T22:35:00.000-07:00</published><updated>2011-06-15T22:36:38.552-07:00</updated><title type='text'>Writethrough with write allocate versus no write allocate</title><content 
type='html'>http://semipublic.comp-arch.net/wiki/Writehrough_with_write_allocate_versus_no_write_allocate&lt;br /&gt;&lt;br /&gt;Write-through caches can be write allocate or no-write allocate.&lt;br /&gt;&lt;br /&gt;A no-write allocate write-through cache can only put data in the cache on a read.&lt;br /&gt;A write that misses is written through, to the next cache level or memory, but not stored in the cache.&lt;br /&gt;&lt;br /&gt;A write-through with write-allocate cache, if it does not read clean data,&lt;br /&gt;must necessarily have valid bits for bytes or words within the cache line.&lt;br /&gt;Unless the writes can only be cache line sized.&lt;br /&gt;&lt;br /&gt;Actually, we can imagine a write-through with write allocate cache, that reads clean data,&lt;br /&gt;and thereby avoids the need to have byte valid bits.&lt;br /&gt;I was about to say that this somewhat misses the point...&lt;br /&gt;and it certainly would on "write and forget" systems.&lt;br /&gt;But, some systems must obtain ownership even on write-through caches.&lt;br /&gt;E.g. the IBM z-series (descendants of the System 360)&lt;br /&gt;must ensure that all other copies of a cache line are invalidated before&lt;br /&gt;"performing" the write.&lt;br /&gt;If you have to do that, you might almost as well obtain the clean data,&lt;br /&gt;and not have to maintain byte dirty bits.&lt;br /&gt;&lt;br /&gt;Contrast: &lt;br /&gt;* P6 family: the write-through invalidates other caches, but the data can be read from the local cache before the remote invalidations. Do not need to get a reply from the invalidation.&lt;br /&gt;* IBM family: must invalidate before forwarding locally; i.e. 
must wait for the invalidation to be complete.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-7765746585263858045?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://semipublic.comp-arch.net/wiki/Writehrough_with_write_allocate_versus_no_write_allocate' title='Writethrough with write allocate versus no write allocate'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/7765746585263858045/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=7765746585263858045' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/7765746585263858045'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/7765746585263858045'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/06/writethrough-with-write-allocate-versus.html' title='Writethrough with write allocate versus no write allocate'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-1312401781471375092</id><published>2011-05-18T06:40:00.000-07:00</published><updated>2011-05-18T06:42:32.932-07:00</updated><title type='text'>Dynamic shifts</title><content type='html'>http://semipublic.comp-arch.net/wiki/Why_dynamic_shift_count_instructions_are_often_slower_than_constant_shift_count_instructions&lt;br /&gt;&lt;br 
/&gt;This essay isn't finished - need to collect more info.&lt;br /&gt;&lt;br /&gt;= [[Why dynamic shift count instructions are often slower than constant shift count instructions]] =&lt;br /&gt;&lt;br /&gt;I don't know.&lt;br /&gt;&lt;br /&gt;I don't even know if they are, in general. Terje Mathisen says that they are in his experience,&lt;br /&gt;and on at least two anecdotal machines dynamic shifts have been a pain.  But I am aware of no fundamental reason.&lt;br /&gt;&lt;br /&gt;TBD: survey.&lt;br /&gt;&lt;br /&gt;== Possible Circuit Reasons - Unlikely ==&lt;br /&gt;&lt;br /&gt;Paul Clayton suggests that knowing the static shift count could be used to do early set up of a shifter, whereas dynamic shift counts may arrive too late.  This is possibly true, even probably true - it _could_ be used as an optimization.  But in my experience I have not seen this done.  Ditto Mitch Alsup's.&lt;br /&gt;&lt;br /&gt;Now, a very basic reason is that a shift by a small constant, e.g. a shift left or right by 1, can be much cheaper.  On a 4 pipeline machine I can easily imagine building 4 narrow shifters but only one general shifter.  Similarly, I can imagine converting &lt;&lt;1 into ADD instructions.  I.e. I can imagine why we might have more small width shifters than a full width shifter.  Similarly for latency.  But I still don't see a generic reason.&lt;br /&gt;&lt;br /&gt;== x86 dynamic shift - flags hassle ==&lt;br /&gt;&lt;br /&gt;The x86 OOO/P6 family slowness for variable shifts is largely due to the fact that a variable shift by zero was defined to be a NOP.  
On a 2-input OOO machine, this necessitated a third input for the old flags, and a second uop:&lt;br /&gt;&lt;br /&gt;tmp := concat( value_to_be_shifted, old_flags )&lt;br /&gt;dest,new_flags := shift( tmp, shift_count )&lt;br /&gt;&lt;br /&gt;or (with lower latency for the shift, and a widget uop to handle the flag selection)&lt;br /&gt;&lt;br /&gt;dest,tmp_flags := shift( value_to_be_shifted, shift_count )&lt;br /&gt;final_flags := select_shift_flags( tmp_flags, old_flags)      &lt;br /&gt;&lt;br /&gt;&lt;br /&gt;If the instruction had been defined without this 0-NOP flag business, e.g. if you set a flag combination on zero rather than inheriting one, it would have been faster.&lt;br /&gt;&lt;br /&gt;Now that 3-input datapaths are common for multiply-add, this could be undone.  Perhaps it has already been?&lt;br /&gt;&lt;br /&gt;== Gould - no dynamic shift instruction ==&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;E.g. on Gould we did not have them:  we had to resort to the moral equivalent of self-modifying code, generating the shift by constant in a register and then using the execute register instruction.&lt;br /&gt;&lt;br /&gt;== Discussion ==&lt;br /&gt;&lt;br /&gt;Apart from this, why are dynamic shifts slow?   What machines make them slow?&lt;br /&gt;&lt;br /&gt;A dynamic shift is always going to be more expensive than a shift by 1 or 2 bits.  At least, you can probably build 4 shift by 1s, but only 1 dynamic full width shifter, on a typical datapath.  But not necessarily more expensive than a shift by a large constant, unless the sort of optimization Paul was talking about is done.  Which has not been the case in my experience.&lt;br /&gt;&lt;br /&gt;But, you are right: dynamic shifts are often penalized.&lt;br /&gt;&lt;br /&gt;This sounds like an essay for the comp-arch.net wiki.  
&lt;br /&gt;&lt;br /&gt;What machines make dynamic shifts slower than, say, a shift by 29?&lt;br /&gt;&lt;br /&gt;Why?&lt;br /&gt;&lt;br /&gt;Is it fundamental, or is it an accident of the instruction set, as it was for x86 and Gould?&lt;br /&gt;&lt;br /&gt;What would an instruction set definition of dynamic shift look like that did NOT cause such implementation artifacts?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-1312401781471375092?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://semipublic.comp-arch.net/wiki/Why_dynamic_shift_count_instructions_are_often_slower_than_constant_shift_count_instructions' title='Dynamic shifts'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/1312401781471375092/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=1312401781471375092' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/1312401781471375092'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/1312401781471375092'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/05/dynamic-shifts.html' title='Dynamic shifts'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' 
src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-7762299530707286811</id><published>2011-05-15T12:51:00.000-07:00</published><updated>2011-05-15T12:51:20.183-07:00</updated><title type='text'>ISO ACLs for Wiki; considering Drupal</title><content type='html'>I have long wanted ACLs for wikis, both at companies like Intel and AMD, and on comp-arch.net.&lt;br /&gt;&lt;br /&gt;I punted on this when I set up comp-arch.net, with the semipublic and public areas.  Abandoning the public areas when I got spammed.&lt;br /&gt;&lt;br /&gt;But now my need is growing again. I am wiki'ing things that may get patented, so they cannot be initially public --- but it is too much hassle to write them up privately, and then later post to a wiki, a year or so later when the patent is published.  I want to write them up in the wiki once, and then have them go public when allowed.&lt;br /&gt;&lt;br /&gt;Setting up a private wiki is a pain, because wiki admin wants to be shared.  It's a pain to have to propagate changes made to a template on a public wiki to a private wiki, and vice versa.&lt;br /&gt;&lt;br /&gt;---&lt;br /&gt;&lt;br /&gt;This became annoying enough today that I considered switching to a CMS like Drupal or Joomla, from mediawiki.&lt;br /&gt;&lt;br /&gt;Transferring the mediawiki content will be a hassle.&lt;br /&gt;&lt;br /&gt;---&lt;br /&gt;&lt;br /&gt;I'm starting to dream of a file based wiki.  Each wiki page, or attachment, living in a filesystem.   OS filesystem security and ACLs.  Webdav if appropriate.  Wikiserver running as OS user accounts - requiring setuid CGI scripts or the equivalent.&lt;br /&gt;&lt;br /&gt;The content files being separate from the wiki web presentation engine.  E.g. one might have a CMS-like web system, or a wiki system, that interpret the files separately.  
In which case changing the CMS/wiki engine is just changing the scripts.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-7762299530707286811?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/7762299530707286811/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=7762299530707286811' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/7762299530707286811'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/7762299530707286811'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/05/iso-acls-for-wiki-considering-drupal.html' title='ISO ACLs for Wiki; considering Drupal'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-5109492614157626579</id><published>2011-05-11T13:27:00.000-07:00</published><updated>2011-05-11T13:27:59.873-07:00</updated><title type='text'></title><content type='html'>Love it!!!!!&lt;br /&gt;&lt;br /&gt;http://www.math.utah.edu/~palais/pi.html&lt;br /&gt;&lt;br /&gt;Pi is wrong.&lt;br /&gt;&lt;br /&gt;Not mathematically - but observes that setting some other symbol to be what we now call 2pi or 6.28... 
leads to many simplifications in formulae, many fewer factors of 2 - and probably far fewer errors.&lt;br /&gt;&lt;br /&gt;The astounding thing is how recently the modern use of pi=3.14... emerged.  Of course, mathematicians back to the Greeks knew the concept, but the convention arose only in the 18th century, and stuck when Euler used it, borrowing it from a much less prominent Welsh mathematician. (I want to say "an obscure Welsh mathematician", William Jones, but I'll get in trouble for that.)  Euler apparently previously used p/c, where p is the periphery and c the radius of any circle.&lt;br /&gt;&lt;br /&gt;I'm just piling on to a flash mob, and I'm arriving late.  But, what the heck.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-5109492614157626579?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://www.math.utah.edu/~palais/pi.html' title=''/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/5109492614157626579/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=5109492614157626579' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/5109492614157626579'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/5109492614157626579'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/05/love-it-httpwww.html' title=''/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' 
src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-6968323412457040531</id><published>2011-04-18T20:47:00.001-07:00</published><updated>2011-04-18T20:47:53.216-07:00</updated><title type='text'>Potential customers  for tagged memory</title><content type='html'>http://semipublic.comp-arch.net/wiki/Potential_customers_for_tagged_memory&lt;br /&gt;&lt;br /&gt;There are many potential customers for [[tagged memory]] or [[memory metadata]].&lt;br /&gt;I have long maintained a list (or, rather, lists, since I have had to recreate these lists on 3 occasions)&lt;br /&gt;of uses for tagged memory.&lt;br /&gt;Here is a list that is by no means complete (previous versions of this list have exceeded 20 items).&lt;br /&gt;&lt;br /&gt;Which should be allocated the tags?&lt;br /&gt;Obviously, my favorite idea...&lt;br /&gt;&lt;br /&gt;Glew opinion:  tagged memory should be microarchitecture, not macroarchitecture.  If there are physical tags, it is a good idea to be able to use them to speed things up.  But there should always be a fallback proposal that does not depend on having main memory tags.  See Milo Martin's Hardbound for an example of how to have metadata in memory, without tags.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;* Tagged memory to identify data types - integer, floating point ...  &lt;br /&gt;:* Problem: there is an infinite number of types ...&lt;br /&gt;:* 1..n bits. Per qword?  Maybe per byte?&lt;br /&gt;* Tagged memory to aid in [[garbage collection]]. [[Read and write barriers]].&lt;br /&gt;:* 1 bit (or more) per minimum allocation unit, 1-4 words&lt;br /&gt;* Tagged memory to aid security, e.g. 
to create [[non-forgeable pointers]] as in the IBM AS400&lt;br /&gt;:* This may well be my favorite use of only a single bit of tagged memory.&lt;br /&gt;:* 1 bit per pointer - 64 or 128 bits&lt;br /&gt;* Tagged memory for [[taint or poison propagation]], as in [[DIFT (Dynamic Information Flow Tracking)]]. E.g. Raksha.&lt;br /&gt;:* 1 bit per ... word? byte?&lt;br /&gt;:* Raksha uses multiple bits&lt;br /&gt;* Tagged memory for [[transactional memory]] support&lt;br /&gt;:* 1 bit per ...  many TM systems use cache line granularity, although smaller is better&lt;br /&gt;* Tagged memory for [[debugging]]&lt;br /&gt;:* 1 bit per ... can deal with coarse granularity, e.g. by trapping, determining not interested, and going on.&lt;br /&gt;* Tagged memory for [[performance monitoring]] - e.g. to mark all memory accessed in a time interval.&lt;br /&gt;:* 1 bit per cache line&lt;br /&gt;* Tagged memory for [[uninitialized memory]]&lt;br /&gt;:* 1 bit per ... byte? word? &lt;br /&gt;* Tagged memory for [[synchronization]] of parallel programs.  E.g. [[full/empty bits]].&lt;br /&gt;* Fine grain [[copy-on-write]] data structures&lt;br /&gt;** [[Myrias style parallelism]]&lt;br /&gt;:* 1 bit per ... byte? word? ...&lt;br /&gt;&lt;br /&gt;Heck, I suppose that conventional width oriented ECC can be considered a form of tagged memory.  And although ECC is common, at least at the time of writing it is not in 99% of all PCs.  
It causes problems enough to warrant the invention of [[Poor Man's ECC]], a non-width oriented implementation of ECC that avoids physical tagged memory.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-6968323412457040531?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://semipublic.comp-arch.net/wiki/Potential_customers_for_tagged_memory' title='Potential customers  for tagged memory'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/6968323412457040531/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=6968323412457040531' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/6968323412457040531'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/6968323412457040531'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/04/potential-customers-for-tagged-memory.html' title='Potential customers  for tagged memory'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-7360209062919769901</id><published>2011-04-18T20:37:00.001-07:00</published><updated>2011-04-18T20:37:49.571-07:00</updated><title type='text'>Wide operand cache</title><content type='html'>http://semipublic.comp-arch.net/wiki/Wide_operand_cache&lt;br /&gt;= By Microunity =&lt;br /&gt;&lt;br 
/&gt;The [[wide operand cache]] is a concept originally by [[Microunity]].&lt;br /&gt;&lt;br /&gt;Situation:&lt;br /&gt;* you want to design a RISC machine, with "naturally" 32 or 64 bit registers.  &lt;br /&gt;** or even 1024 bit registers - no matter how wide you go, there will nearly always be a reason to have significantly wider operands.&lt;br /&gt;* but you also want to support "simple" instructions that just happen to have very wide operands.&lt;br /&gt;&lt;br /&gt;See [[#Examples of instructions with wide operands]]. &lt;br /&gt;For  the purposes of this discussion, we will consider [[BMM (bit matrix multiply)]].&lt;br /&gt;BMM wants to have one, say, 64 bit operand, and a second operand, the matrix, that is 64x64 bits in size - 4Kib.&lt;br /&gt;&lt;br /&gt;Defining an instruction such as&lt;br /&gt; BMM destreg64b := srcreg64b * srcreg64x64b&lt;br /&gt;could be possible - but it has many issues:&lt;br /&gt;* how many 64x64=4Kib operand registers do you want to define?  1? 2? 4?&lt;br /&gt;* you probably do not want to copy such a wide operand around - instead, you might want it to live inside its execution unit&lt;br /&gt;etc.&lt;br /&gt;&lt;br /&gt;The [[wide operand cache]] approach is as follows: define an instruction with a memory operand&lt;br /&gt; BMM destreg64b := srcreg64b * M[...]&lt;br /&gt;* You may specify the memory operand simply as register direct, M[reg], or you may define it using a normal addressing mode such as M[basereg+offset], or even using scaled indexing.&lt;br /&gt;&lt;br /&gt;Conceptually, the wide operand is loaded before every use:&lt;br /&gt; BMM destreg64b := srcreg64b * M[addr]&lt;br /&gt;   tmpreg64x64b := load M[addr]&lt;br /&gt;   destreg64b := srcreg64b * tmpreg64x64b&lt;br /&gt;&lt;br /&gt;However, we may avoid unnecessary memory traffic by caching the wide operand.&lt;br /&gt;&lt;br /&gt;This leads to the basic issue: [[#TLB semantics versus coherent|]].&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;= TLB semantics 
versus coherent =&lt;br /&gt;&lt;br /&gt;The [[wide operand cache]] concept admits a basic question:&lt;br /&gt;* do you snoop the wide operand cache, keeping it coherent with main memory, so that if somebody writes to a cached wide operand it is reflected in the next execution of the instruction?&lt;br /&gt;* or do you use "[[TLB semantics]]", i.e.  making it a [[noncoherent cache]], requiring the user to explicitly invalidate it (requiring an [[invalidate wide operand cache(s) instruction]])?&lt;br /&gt;&lt;br /&gt;If coherent then an implementation can vary the number of wide operand cache entries transparently.&lt;br /&gt;&lt;br /&gt;If noncoherent then not only can the number of entries be detected, but also you run the risk of getting different answers depending on the context switching rate.&lt;br /&gt;&lt;br /&gt;[[Glew opinion]]: I prefer coherent, but would live with noncoherent if necessary.&lt;br /&gt;&lt;br /&gt;= So Close... =&lt;br /&gt;&lt;br /&gt;Frustrating anecdote: at AMD I was working out, with Alex Klaiber, how to support instructions with very wide operands, such&lt;br /&gt;as [[BMM]].  My whiteboard was full of scribblings, with the basic question of [[#TLB semantics versus coherent|]].&lt;br /&gt;&lt;br /&gt;I then went to a meeting where Microunity presented their patents.&lt;br /&gt;&lt;br /&gt;So close...  Actually, probably off by 5-10 years.  
But I was following a path that I did not know had been trailblazed...&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;= Examples of instructions with wide operands =&lt;br /&gt;Any instruction that has an operand that is significantly wider than some of its inputs,&lt;br /&gt;and which either tends to be constant, or which tends to be modified in place,&lt;br /&gt;is a candidate for a [[wide operand in memory]] implemented via a [[wide operand cache]].&lt;br /&gt;For that matter, one could have wide operand instructions whose operands are all pseudo-registers in a wide operand cache.&lt;br /&gt;* [[BMM (bit matrix multiply)]] - 64b X 64x64b&lt;br /&gt;* [[vector-matrix instructions]] - N X NxN&lt;br /&gt;* [[permutation index vector instruction]] - Nbits*log2(Nbits)&lt;br /&gt;* [[permutation bit matrix instruction]] - really a form of [[BMM]]&lt;br /&gt;* [[superaccumulator]] - 32 bit - thousands ...&lt;br /&gt;* [[regex instructions]] - with large regex operands that can be compiled&lt;br /&gt;* [[LUT (lookup table) instructions]]&lt;br /&gt;* [[texture sampling]]&lt;br /&gt;* [[interpolation instructions]]&lt;br /&gt;* [[CAM instructions]]&lt;br /&gt;&lt;br /&gt;TBD&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-7360209062919769901?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://semipublic.comp-arch.net/wiki/Wide_operand_cache' title='Wide operand cache'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/7360209062919769901/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=7360209062919769901' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/7360209062919769901'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/2425290326823263574/posts/default/7360209062919769901'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/04/wide-operand-cache.html' title='Wide operand cache'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-6504811479818128769</id><published>2011-04-09T12:09:00.000-07:00</published><updated>2011-04-09T12:10:10.024-07:00</updated><title type='text'>New models for Industrial Research (in reply to: The death of Intel Labs and what it means for industrial research)</title><content type='html'>Matt Welsh's post on "The death of Intel Labs and what it means for industrial research" must have struck a nerve with me, because I have spent a morning writing a long response.&lt;br /&gt;&lt;br /&gt;&lt;h1&gt;BRIEF:&lt;/h1&gt;Intel's lablets have been shut down, not the labs. I helped start Intel's labs, but not the lablets. It's not clear how effective the lablets ever were. Same for the labs.  I discuss models for research, including&lt;br /&gt;&lt;br /&gt;(1) Academia  far-out, and industry close-in  (nice if it were true)&lt;br /&gt;&lt;br /&gt;(2) Google's 20%&lt;br /&gt;&lt;br /&gt;(3) IBM and HP's business group motivated research labs&lt;br /&gt;&lt;br /&gt;(4) Some of my experience from Intel, in both product groups and research labs&lt;br /&gt;&lt;br /&gt;(5) Open Source  (if ever I can retire I would work on Open Source.  
But I have not yet managed to find a job that allowed me to work on Open Source.)&lt;br /&gt;&lt;br /&gt;I think my overall point is that each of these models works, sometimes - and each is subject to herd mentality, deference to power, etc. Perhaps there is room for new ways of doing research, invention, and innovation - a new business model.&lt;br /&gt;&lt;br /&gt;Finally, I briefly mention Intellectual Ventures' website, providing links to quotes. With a disclaimer saying that I don't speak for IV, although obviously I have hope for its potential since I left Intel to join IV.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;h1&gt;DETAIL:&lt;/h1&gt;&lt;em&gt;Intel recently announced that it is closing down its three "lablets"&lt;br /&gt;in Berkeley, Seattle, and Pittsburgh&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;So it goes.  This might be unfortunate.  None of the lablet work in my&lt;br /&gt;field, computer architecture, has caught my eye, although I did enjoy&lt;br /&gt;interacting with Todd Mowry's group in Pittsburgh on Log Based&lt;br /&gt;Architecture (I had come up with Log Based Microarchitecture at&lt;br /&gt;Wisconsin).&lt;br /&gt;&lt;br /&gt;However, it is wrong to say that the closing of the lablets reflects the death of&lt;br /&gt;Intel Labs.  I was involved with the creation of Intel Labs, circa&lt;br /&gt;1995 inside Intel.&lt;br /&gt;&lt;br /&gt;This was historically a hard sell, since Intel had been *created* by&lt;br /&gt;refugees from the research labs of other companies.  It was a&lt;br /&gt;touchstone of Intel culture that Intel would never do ivory tower&lt;br /&gt;research not relevant to product groups.&lt;br /&gt;&lt;br /&gt;E.g. while campaigning for the creation of Intel Labs I created a&lt;br /&gt;slideset that said "Intel must start doing our own research in&lt;br /&gt;computer architecture, now that we have copied all of the ideas from&lt;br /&gt;older companies."  
I am not sure, but it seems like Craig Barrett may&lt;br /&gt;have seen these slides when he was quoted in the Wall Street Journal&lt;br /&gt;&lt;em&gt;''Now we're at the head of the class, and there is nothing left to&lt;br /&gt;copy,'' Mr. Barrett was quoted as having said.&lt;/em&gt; &lt;br /&gt;&lt;br /&gt;(Ironically, DEC used this to justify their patent infringement&lt;br /&gt;lawsuit against Intel circa 1997 -- but when I created these slides I&lt;br /&gt;had IBM in mind, not DEC, since I freely admit that much of my work at&lt;br /&gt;Intel was built upon a foundation of IBM work on RISC and Tomasulo&lt;br /&gt;out-of-order.  Not DEC Alpha. Perhaps I should never have created&lt;br /&gt;those slides, but they put the case pithily, and they helped justify the&lt;br /&gt;creation of Intel Labs.  And *I* did not quote them to the WSJ.)&lt;br /&gt;&lt;br /&gt;Ref: http://query.nytimes.com/gst/fullpage.html?res=9F02E3D61F39F937A25756C0A961958260&amp;pagewanted=2&lt;br /&gt;&lt;br /&gt;Matt says "Before the Labs opened, Intel Research was consistently&lt;br /&gt;ranked one of the lowest amongst all major technology companies in&lt;br /&gt;terms of research stature and output."  Well, yes and no.  The lablets&lt;br /&gt;opened in 2001. MRL, the Microprocessor Research Lab I helped start,&lt;br /&gt;opened in 1995, as did some of the other labs.  When I was at AMD in&lt;br /&gt;2002-2004 my AMD coworkers were already saying to me that Intel's MRL&lt;br /&gt;work was the most interesting work being published in computer&lt;br /&gt;architecture conferences like ISCA, HPCA and Micro.  I.e. I think MRL&lt;br /&gt;was picking up steam well before the lablets were created.&lt;br /&gt;&lt;br /&gt;Actually, the lablets were part of a trend to "academify" Intel&lt;br /&gt;Labs. E.g. 
around that time my old lab MRL was taken over by a famous&lt;br /&gt;professor imported from academia, who proceeded to do short term work&lt;br /&gt;on the Itanium - and over the next few years most of the senior&lt;br /&gt;researchers who did not agree with Itanium left or were forced out.&lt;br /&gt;Ironically, the academic created a much shorter term focus at MRL by&lt;br /&gt;betting on VLIW - and then ultimately he moved out of Intel.&lt;br /&gt;&lt;br /&gt;Now, don't get me wrong: the guys left over in the lab formerly known as&lt;br /&gt;MRL do good work.  Chris Wilkerson has published lots of good&lt;br /&gt;papers. Jared Stark accomplished the most successful technology&lt;br /&gt;transfer I am aware of, of branch prediction to SNB.  Chris and Jared&lt;br /&gt;are largely the guys whose work my former AMD coworkers admired.&lt;br /&gt;&lt;br /&gt;But, such work at Intel is largely incremental, evolutionary.  I&lt;br /&gt;mentioned that the famous professor in charge of my old research lab&lt;br /&gt;tried to play politics by favoring Itanium, even though his&lt;br /&gt;researchers were opposed. &lt;br /&gt;&lt;br /&gt;Annoyingly, from when I joined Intel in 1991 to when Intel Labs&lt;br /&gt;started in 1995 computer architecture work inside Intel was pretty&lt;br /&gt;much 10 years ahead of academia.  Out-of-order execution like P6 did&lt;br /&gt;not come from mainstream academia, who were busy following the fads of&lt;br /&gt;RISC and in-order.  (OOO came from academics who were not mainstream at that&lt;br /&gt;time, like Yale Patt and Wen-Mei Hwu, but they became mainstream&lt;br /&gt;as OOO became successful.)&lt;br /&gt;&lt;br /&gt;But companies like Intel rest on their laurels. Having defeated RISC,&lt;br /&gt;Intel did not need to do any serious computer architecture work for,&lt;br /&gt;what, 10 years? 16 years now?&lt;br /&gt;&lt;br /&gt;Matt says: &lt;em&gt;I am very concerned about what happens if we don't have&lt;br /&gt;enough long-range research. 
One model that could evolve is that&lt;br /&gt;universities do the far-out stuff and industry focuses on the shorter&lt;br /&gt;term. &lt;br /&gt;It is hard to justify the Bell Labs model in today's world,&lt;br /&gt;though no doubt it had tremendous impact.&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;I share your concern.  But my experience is that universities are not&lt;br /&gt;necessarily good at doing the far-out stuff.&lt;br /&gt;&lt;br /&gt;About my experience: I'm not an academic, although perhaps I should&lt;br /&gt;have been one. I failed to complete my Ph.D. twice (first marriage,&lt;br /&gt;then when my daughter was born).  I've never had an academic paper&lt;br /&gt;published (although I had a few rejected, that later got built in&lt;br /&gt;successful products).  But I made some useful contributions to the&lt;br /&gt;form of OOO that is in most modern computers.  You are almost 100%&lt;br /&gt;likely to have used some of my stuff, probably in the computer&lt;br /&gt;you are reading this on. At one time I was Intel's most prolific&lt;br /&gt;inventor. I helped start Intel Labs.  &lt;br /&gt;&lt;br /&gt;I wasted too many years of my life on what I think is the next major&lt;br /&gt;step forward in computer architecture to improve single threaded&lt;br /&gt;execution - speculative multithreading (SpMT). I say "wasted" not&lt;br /&gt;because I think that SpMT is a bad idea, but because I spent far too&lt;br /&gt;many of those years seeking funding and approval, rather than just&lt;br /&gt;doing the work.  The actual work was only a few intense months,&lt;br /&gt;embedded in years of PowerPoint and politics.  But even though SpMT has&lt;br /&gt;not proven a success yet, a spin-off idea, Multi-cluster&lt;br /&gt;Multithreading (MCMT), the substrate that I wanted to build SpMT on,&lt;br /&gt;is the heart of AMD's next flagship processor, Bulldozer. 7 years&lt;br /&gt;after I left AMD in 2004.  
13+ years after I came up with the idea of&lt;br /&gt;MCMT, at Wisconsin during my second failed attempt to get a PhD.&lt;br /&gt;&lt;br /&gt;My last major project at Intel 2005-2009 has not yet seen the light of day,&lt;br /&gt;but newsblurbs such as &lt;em&gt;Intel developing security 'game-changer':&lt;br /&gt;Intel CTO says new technology will stop zero-day attacks in their&lt;br /&gt;tracks&lt;/em&gt; suggest that it may.&lt;br /&gt;&lt;br /&gt;Source: http://www.computerworld.com/s/article/9206366/Intel_developing_security_game_changer_&lt;br /&gt;&lt;br /&gt;But in my last year at Intel, this major project, and a couple of&lt;br /&gt;minor projects, were cancelled under me, until I was forced to work on&lt;br /&gt;Larrabee, a project that I was not quite so opposed to as I was to&lt;br /&gt;Itanium. Enough being enough, I left.&lt;br /&gt;&lt;br /&gt;So: I am not an academic, but I have worked at, and hope to remain&lt;br /&gt;working at, the leading edge of technology.  I have tried to create&lt;br /&gt;organizations and teams that do leading edge research, like MRL, but I&lt;br /&gt;am more interested in doing the work myself than in being a manager.&lt;br /&gt;&lt;br /&gt;The question remains: where do we get the ideas for the future?  How&lt;br /&gt;do we fund research, invention, and innovation?&lt;br /&gt;&lt;br /&gt;Matt says &lt;em&gt; One model that could evolve is that universities do the&lt;br /&gt;far-out stuff and industry focuses on the shorter term.&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;"Could evolve"?  Believe me, this is what every academic research&lt;br /&gt;grant proposal said that I saw when I sat on an Intel committee giving&lt;br /&gt;out research grants.  It usually doesn't work - although once in a while it does.&lt;br /&gt;&lt;br /&gt;Matt says: &lt;em&gt;Google takes a very different approach, one in which&lt;br /&gt;there is no division between "research" and "engineering."&lt;/em&gt; This is&lt;br /&gt;an interesting approach. 
Myself, I am not a very good multi-tasker - I&lt;br /&gt;tend to work intensely on a problem for weeks at a time.  I don't know&lt;br /&gt;how well I could manage 20% time, 1 day a week, for new projects.&lt;br /&gt;(Although I am supposedly doing something like this for my current&lt;br /&gt;job, the 90% main job tends to expand to fill all time available to&lt;br /&gt;it.)  But it may work for some people.&lt;br /&gt;&lt;br /&gt;Somebody else posts about industry research: &lt;em&gt;IBM Research and HP&lt;br /&gt;Labs don't really have an academic research mindset and haven't for a&lt;br /&gt;long time thanks to business unit based funding.&lt;/em&gt; Then goes on to say&lt;br /&gt;&lt;em&gt;Even within Intel Research, successful researchers (in terms of&lt;br /&gt;promotion beyond a certain key point) also had to have some kind of&lt;br /&gt;significant internal impact.&lt;/em&gt; Which is, I suspect, why the famous&lt;br /&gt;professor who ran MRL after I left emphasized the VLIW dead-end, over&lt;br /&gt;the objections of his senior researchers.&lt;br /&gt;&lt;br /&gt;My own vision for Intel's Microprocessor Research Labs was that the best&lt;br /&gt;technology transfer is by transferring people.  You can't throw&lt;br /&gt;research over the wall and expect a product group to use it. Instead,&lt;br /&gt;I wanted to have people flow back and forth between product groups and&lt;br /&gt;research. In part I wanted to use the labs as an R&amp;R stop for smart&lt;br /&gt;people in the product groups - give them a place to recharge their&lt;br /&gt;batteries after an exhausting 5 to 7 year product implementation.  A&lt;br /&gt;place to create their own new ideas, and/or borrow ideas from the&lt;br /&gt;academics they might interact with in the labs.  And then go back to a&lt;br /&gt;product group, perhaps dragging a few of the academics along with&lt;br /&gt;them, when they align with the start of a new project. 
"Align with the&lt;br /&gt;start of a new project" - this is important.  Sometimes a project&lt;br /&gt;finishes, and there is no new project for the smart guys coming off&lt;br /&gt;the old project to join, because of the vagaries of project schedules.&lt;br /&gt;All too often people jump ship off an old project too early, because&lt;br /&gt;they want to get onto the sexy new project at the right time for their&lt;br /&gt;career growth.  By providing such a scheduling buffer, this thrashing&lt;br /&gt;may be avoided - as may the even worse happenstance, when a smart guy&lt;br /&gt;leaves the company, because there is no new project for him at his&lt;br /&gt;current employer, while there is at the new company.  And, finally,&lt;br /&gt;once in a while a new project flows out of the lab.&lt;br /&gt;&lt;br /&gt;I am particularly sympathetic to the Anonymous poster who said &lt;em&gt;How&lt;br /&gt;about a totally different alternative model?&lt;/em&gt; and then talks about&lt;br /&gt;memes popular in the Open Source community, such as &lt;em&gt;People do not&lt;br /&gt;need to spend half of their life in formal schooling to start doing&lt;br /&gt;cutting edge work.&lt;/em&gt;  But then he says:&lt;br /&gt;&lt;br /&gt;&lt;em&gt;Most academic research outside of the top 5-10 schools in any field&lt;br /&gt;is not useful, even by academic standards.&lt;/em&gt; I go further: MOST&lt;br /&gt;academic work at MOST schools, even the top 5-10 schools, is not&lt;br /&gt;useful.  But often the best academic work is at some little known&lt;br /&gt;third or fourth tier school, and has trouble getting published.&lt;br /&gt;&lt;br /&gt;&lt;em&gt;Code is (far) more useful than papers.&lt;/em&gt; I am very sympathetic to&lt;br /&gt;this.  But (a) most engineers and programmers are not free to publish&lt;br /&gt;either code or papers, limited by their employment agreements. 
And&lt;br /&gt;(b) occasionally the papers are a useful summary of the good ideas.&lt;br /&gt;&lt;br /&gt;I look forward to the day when we can have Open Source computer&lt;br /&gt;hardware.  I don't say this facetiously: some of my best friends are&lt;br /&gt;working on it.  I would also, if I did not need an income.&lt;br /&gt;&lt;br /&gt;&lt;em&gt;Many of the people capable of contributing at a high level in&lt;br /&gt;academia have the ability to start significant companies and create&lt;br /&gt;genuine wealth.&lt;/em&gt; Many, but not most.  Not every good technical person&lt;br /&gt;is a good business person.&lt;br /&gt;&lt;br /&gt;Which leads into my closing point: Not every good technical person is&lt;br /&gt;a good business person.  Not every inventor is capable of building a&lt;br /&gt;company around his inventions.  Many of the most useful inventions&lt;br /&gt;cannot justify a completely new and independent company: they need the&lt;br /&gt;ecosystem of an existing product line, and the support of a larger&lt;br /&gt;organization.&lt;br /&gt;&lt;br /&gt;For many years my resume said that my goal was to re-create Thomas&lt;br /&gt;Edison's Invention Factory in the modern world.  In 2009 I left Intel&lt;br /&gt;for the second time, and joined Intellectual Ventures. (With whom I&lt;br /&gt;had earlier worked on some inventions I had made in 2004, in the short&lt;br /&gt;time between my leaving AMD and rejoining Intel, the only time in my&lt;br /&gt;career that my ideas belonged to me, and not my employer.)&lt;br /&gt;&lt;br /&gt;I'm not authorized to speak for Intellectual Ventures, but I can refer&lt;br /&gt;you to some of the things on the IV website,&lt;br /&gt;http://www.intellectualventures.com:&lt;br /&gt;&lt;br /&gt;“An industry dedicated to financing inventors and monetizing their&lt;br /&gt;creations could transform the world.”  Nathan Myhrvold, Founder and&lt;br /&gt;CEO   &lt;br /&gt;&lt;br /&gt;&lt;ul&gt;We believe ideas are valuable. 
At Intellectual Ventures, we invest both expertise and capital in the development of inventions. We collaborate with leading inventors and partner with pioneering companies. ... We are creating an active market for invention... We do this by:&lt;br /&gt;* Employing talented inventors here at Intellectual Ventures who work on new inventions to help solve some of the world’s biggest problems.&lt;br /&gt;* Purchasing inventions from individual inventors and businesses ...&lt;br /&gt;* Partnering with our international network of more than 3,000 inventors and helping them to monetize their inventions.&lt;/ul&gt;&lt;br /&gt;Most of the posters, the original blogger and the authors of the comments,&lt;br /&gt;on this topic are interested in promoting research, invention, and&lt;br /&gt;innovation.  Sometimes you need to create a new economic or business&lt;br /&gt;model.  I hope it works.&lt;br /&gt;&lt;br /&gt;Matt says: &lt;em&gt;It is hard to justify the Bell Labs model in today's&lt;br /&gt;world, though no doubt it had tremendous impact.&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;Somebody else once said to me that Bell Labs could have been kept running, avoiding its decline, based solely on its patent royalties.&lt;br /&gt;&lt;br /&gt;Would that have been worth it?  These were the people that gave us the&lt;br /&gt;transistor. Information Theory. UNIX.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Finally, I must make the following disclaimer:&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The content of this message is my personal opinion only. Although I am&lt;br /&gt;an employee (currently of Quantum Intellectual Property Services,&lt;br /&gt;working for Intellectual Ventures; in the past of companies such as&lt;br /&gt;Intel, AMD, Motorola, and Gould), I reveal this only so that the&lt;br /&gt;reader may account for any possible bias I may have towards my&lt;br /&gt;employer's products. 
The statements I make here in no way represent my&lt;br /&gt;employer's position on the issue, nor am I authorized to speak on&lt;br /&gt;behalf of my employer.&lt;/b&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-6504811479818128769?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://matt-welsh.blogspot.com/2011/04/death-of-intel-labs-and-what-it-means.html' title='New models for Industrial Research (in reply to: The death of Intel Labs and what it means for industrial research)'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/6504811479818128769/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=6504811479818128769' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/6504811479818128769'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/6504811479818128769'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/04/new-models-for-industrial-research-in.html' title='New models for Industrial Research (in reply to: The death of Intel Labs and what it means for industrial research)'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' 
src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-3456979555875189705</id><published>2011-04-05T19:41:00.000-07:00</published><updated>2011-04-05T19:41:50.440-07:00</updated><title type='text'>Why_saying_"You_need_N-way_associativity_in_the_TLB_to_guarantee_forward_progress"_is_bogus_and_reflective_of_an_in-order_mindset</title><content type='html'>http://semipublic.comp-arch.net/wiki/Why_saying_%22You_need_N-way_associativity_in_the_TLB_to_guarantee_forward_progress%22_is_bogus_and_reflective_of_an_in-order_mindset&lt;br /&gt;[[Category:Virtual Memory]]&lt;br /&gt;&lt;br /&gt;= The Annoying Quote =&lt;br /&gt;&lt;br /&gt;One occasionally hears (and reads in computer architecture textbooks) statements such as&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;"Instruction set XXX can access N memory locations in a single instruction, so therefore requires a TLB with at least N-way associativity."&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;This statement is wrong in several ways, both as an underestimate and as an overestimate.&lt;br /&gt;It reflects ignorance of out-of-order processors, and even for in-order processors it reflects ignorance of other implementations.&lt;br /&gt;&lt;br /&gt;= Basic Assumption =&lt;br /&gt;&lt;br /&gt;The basic assumption behind this statement is something like this:&lt;br /&gt;&lt;br /&gt;Assume you have an instruction whose operation may be described in pseudocode or microcode as:&lt;br /&gt;&lt;blockquote&gt;tL1 := load(M[A1])&lt;br /&gt;...&lt;br /&gt;tLN := load(M[AN])&lt;br /&gt;&lt;br /&gt;tC1 := f1(tL1..tLN)&lt;br /&gt;...&lt;br /&gt;tCN := fN(tL1..tLN)&lt;br /&gt;&lt;br /&gt;store( M[A1] := tC1 )&lt;br /&gt;...&lt;br /&gt;store( M[AN] := tCN )&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;Assume that you have a [[restart versus resume instruction exception|restart and not a 
resume instruction fault architecture]] - i.e. assume that you must access all of M[A1] .. M[AN], eventually, without a fault or TLB miss.&lt;br /&gt;Then you need N TLB entries to hold all of the [[translation]]s for M[A1]..M[AN].&lt;br /&gt;&lt;br /&gt;Or, equivalently, assume that you are not allowed to take a fault or TLB miss in the "[[commit phase]]" of the instruction.&lt;br /&gt;Then, once again, you need N TLB entries to hold all of the [[translation]]s for M[A1]..M[AN].&lt;br /&gt;&lt;br /&gt;Sounds simple, eh?&lt;br /&gt;&lt;br /&gt;= Out-of-order with Speculative TLB miss handling =&lt;br /&gt;&lt;br /&gt;This betrays an in-order mindset.  It does not necessarily work on an [[out-of-order]] machine that does not block on a TLB miss,&lt;br /&gt;i.e. which can perform ALU operations, memory references, and TLB misses out-of-order.&lt;br /&gt;&lt;br /&gt;It doesn't work because, even though an operation such as load, store, or a [[tickle]] in the pseudocode or microcode for an instruction may load a TLB entry,&lt;br /&gt;this TLB entry may be thrashed out of the TLB by (a) a later operation in the same instruction, (b) an earlier operation in the same instruction (remember, out-of-order), or (c) a TLB use from a different instruction (remember, out-of-order and non-blocking: other instructions may be executing at the same time).&lt;br /&gt;&lt;br /&gt;In particular, note that there is nothing that says that the operations within a single instruction will be performed in-order. Indeed, on an out-of-order machine like the Intel P6, the microinstructions within an instruction were performed out-of-order, so you can't necessarily make assumptions about the order of accesses, how they will affect LRU, etc.&lt;br /&gt;&lt;br /&gt;== Kluges to Make It Work ==&lt;br /&gt;&lt;br /&gt;* Allow out-of-order between instructions, but impose ordering restrictions within an instruction - e.g. 
by implementing every instruction with a strictly in-order state machine.&lt;br /&gt;&lt;br /&gt;* Handle TLB misses in-order, at commit time&lt;br /&gt;&lt;br /&gt;= Other Implementations =&lt;br /&gt;&lt;br /&gt;== Save Translations ==&lt;br /&gt;Allow a "translation" to be saved in a register.&lt;br /&gt;&lt;blockquote&gt;tT1 := save_translation_and_permission(M[A1])&lt;br /&gt;...&lt;br /&gt;tTN := save_translation_and_permission(M[AN])&lt;br /&gt;&lt;br /&gt;tL1 := load_using_saved_translation_and_permission(M[phys_tr=tT1])&lt;br /&gt;...&lt;br /&gt;tLN := load_using_saved_translation_and_permission(M[phys_tr=tTN])&lt;br /&gt;&lt;br /&gt;tC1 := f1(tL1..tLN)&lt;br /&gt;...&lt;br /&gt;tCN := fN(tL1..tLN)&lt;br /&gt;&lt;br /&gt;store_using_saved_translation_and_permission( M[phys_tr=tT1] := tC1 )&lt;br /&gt;...&lt;br /&gt;store_using_saved_translation_and_permission( M[phys_tr=tTN] := tCN )&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;Issues: &lt;br /&gt;* such a "saved translation" should contain not only the physical address corresponding to a virtual address, but also permissions.&lt;br /&gt;* it is relatively easy to provide such an operation to microcode. It is harder to make it available to software.&lt;br /&gt;** it obviously cannot be provided to user code&lt;br /&gt;** virtualizing such a saved translation may be a challenge - even the OS may not be allowed to see the true physical address or permissions&lt;br /&gt;&lt;br /&gt;== Who Cares? So What? ==&lt;br /&gt;&lt;br /&gt;One could take the viewpoint of "Who Cares?": allow multiple TLB misses in the same instruction, don't restart.&lt;br /&gt;&lt;br /&gt;Issue:&lt;br /&gt;* there arises the possibility of [[intra-instruction translation inconsistency]] - different accesses to the same virtual address may receive different translations, different physical addresses, or, perhaps worse, a check may pass but a subsequent access fault.&lt;br /&gt;&lt;br /&gt;Again, one may say "Who cares? 
The OS should not be changing translations while an instruction may be in flight."&lt;br /&gt;&lt;br /&gt;But&lt;br /&gt;* Saying that the OS should not do something does not always mean that it will not&lt;br /&gt;* An OS implemented on top of a VMM may lead to issues: the VMM may not be tracking where the OS keeps its page tables&lt;br /&gt;* While this strategy may be acceptable, it may have lousy performance because of the necessity of stopping multiple processors for a [[TLB shootdown]] while changing a translation, e.g. in a [[page table]] in memory.&lt;br /&gt;&lt;br /&gt;= Conclusion =&lt;br /&gt;&lt;br /&gt;Saying&lt;br /&gt;&lt;blockquote&gt;"Instruction set XXX can access N memory locations in a single instruction, so therefore requires a TLB with at least N-way associativity."&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;* underestimates the TLB entries required for an out-of-order processor with non-blocking TLB miss handling&lt;br /&gt;&lt;br /&gt;* overestimates the TLB entries required for several reasonable implementation strategies, such as "Save Translations" and "Who Cares?"&lt;br /&gt;&lt;br /&gt;More accurately, one might say&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;"Instruction set XXX can access N memory locations in a single instruction, &lt;br /&gt;and for a certain set of microarchitecture assumptions&lt;br /&gt;may require a TLB with at least N-way associativity."&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;But there are other ways...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-3456979555875189705?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://semipublic.comp-arch.net/wiki/Why_saying_%22You_need_N-way_associativity_in_the_TLB_to_guarantee_forward_progress%22_is_bogus_and_reflective_of_an_in-order_mindset' 
title='Why_saying_&quot;You_need_N-way_associativity_in_the_TLB_to_guarantee_forward_progress&quot;_is_bogus_and_reflective_of_an_in-order_mindset'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/3456979555875189705/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=3456979555875189705' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/3456979555875189705'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/3456979555875189705'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/04/hysayingyouneedn-wayassociativityinthet.html' title='Why_saying_&quot;You_need_N-way_associativity_in_the_TLB_to_guarantee_forward_progress&quot;_is_bogus_and_reflective_of_an_in-order_mindset'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-6679106806336284243</id><published>2011-04-01T23:03:00.000-07:00</published><updated>2011-04-01T23:03:24.195-07:00</updated><title type='text'>Reset: Hard, Soft, Cold, Warm</title><content type='html'>http://semipublic.comp-arch.net/wiki/Reset:_Hard,_Soft,_Cold,_Warm&lt;br /&gt;= The need for RESET after  REST at power-on =&lt;br /&gt;&lt;br /&gt;In the beginning there was [[RESET]]: a signal asserted while powering on, deasserted when power and logic levels were stable, so that the circuit could be initialized.&lt;br /&gt;&lt;br /&gt;Soon thereafter, 
or even before, there was POWEROK:&lt;br /&gt;* !POWEROK and RESET =&gt; do not attempt operation&lt;br /&gt;* POWEROK and RESET =&gt; power good, now do initialization&lt;br /&gt;* POWEROK and !RESET =&gt; initialization done, running.&lt;br /&gt;&lt;br /&gt;But let us not get obsessed by the details of signalling: I'm okay, you're okay, I've initialized and you are ready.&lt;br /&gt;Moving forward: ...&lt;br /&gt;&lt;br /&gt;Eventually people realized that they wanted to be able to restore the state of the system to that right after RESET, without going through the full power on sequence.&lt;br /&gt;Hence the concept of [[soft reset]] was born.&lt;br /&gt;On Intel x86 systems the INIT pin or message can be considered to be approximately a [[soft reset]].&lt;br /&gt;With the old reset and power on constituting [[hard reset]].&lt;br /&gt;&lt;br /&gt;But [[soft reset]]s like INIT cannot recover from all errors.  Sometimes a computer is truly hung, and a [[power cycle]] is necessary.  &lt;br /&gt;Or, if not a power cycle, an assertion of the RESET signal&lt;br /&gt;so that all of the rest of the system is initialized?&lt;br /&gt;&lt;br /&gt;Did I mention that, perhaps before [[soft reset]], people would build circuits that could interrupt power with a relay, and thus provoke a power on [[hard reset]] under software control?&lt;br /&gt;&lt;br /&gt;Trouble is, after a [[hard reset]] system state might be unreliable, or might be initialized to a true reset state, such as all zeros.&lt;br /&gt;How can you then distinguish a true [[power on reset]] from a [[hard reset]] invoked under software control?&lt;br /&gt;Perhaps used to recover from a system hang?&lt;br /&gt;&lt;br /&gt;What you need is a softer hard reset - a [[warm reset]] that acts like a [[hard reset]], asserting the RESET signal to the rest of the system, I/O devices et al,&lt;br /&gt;so that the state of the machine is as close as possible to a true power on [[cold reset]].&lt;br /&gt;But 
where power persists across the [[warm reset]],&lt;br /&gt;so that at least certain status registers can be reliably read.&lt;br /&gt;&lt;br /&gt;At first the state that persisted across such a [[warm reset]] might reside only in a battery backed up unit near to the reset state machine.&lt;br /&gt;But when the amount of state that you want to persist grows large,&lt;br /&gt;e.g. the [[MCA (Machine Check Architecture)]] error status log registers,&lt;br /&gt;state  inside the [[CPU]] may be allowed, indeed, required, to persist across such a [[warm reset]].&lt;br /&gt;&lt;br /&gt;= A possible sequence of progressively harder RESETs for error recovery =&lt;br /&gt;&lt;br /&gt;I am sure that you can see that [[cold reset]] versus [[warm reset]] and [[hard reset]] versus [[soft reset]] are not necessarily discrete points but,&lt;br /&gt;as usual, may be points on a spectrum,&lt;br /&gt;a not necessarily 1D ordered list of possible reset mechanisms.&lt;br /&gt;&lt;br /&gt;If an error is detected, &lt;br /&gt;e.g. 
if a processor stops responding&lt;br /&gt;* first you might try sending it a normal interrupt, of progressively higher priority&lt;br /&gt;* then you might try sending it an [[NMI (Non-Maskable Interrupt)]]&lt;br /&gt;** then any of the flavors of [[even less maskable non-maskable interrupt]]s, such as Intel's [[SMI (System Management Interrupt)]], or some sort of [[VMM]] or [[hypervisor]] interrupt&lt;br /&gt;* then you might try sending a [[soft reset]] message like INIT to the apparently hung processor&lt;br /&gt;* this failing, you may try to do a [[warm reset]] of the hung processor&lt;br /&gt;** although by this time you probably want to reset all of the processors and I/O that are tightly bound to the hung processor&lt;br /&gt;** exactly how you define such a "reset domain" is system specific, although it is often the same as a [[shared memory cache coherency domain]].&lt;br /&gt;* failing this, you might try to use a [[hard reset]] under the control of an external circuit, e.g. at the power supply, that can trip a relay and then untrip it after a time has elapsed&lt;br /&gt;** heck, this can be done at power supply points increasingly distant: inside the PC or blade, at the rack, in the datacenter...&lt;br /&gt;* all of these failing, you can try to notify the user, although by this point that probably is impossible; or you may rely on some external mechanism, such as a user or a watchdog, to try to use ever more extreme forms of resetting the system.&lt;br /&gt;&lt;br /&gt;The above tends to imply that there is a linear order of reset mechanisms.  This is not necessarily true.  You may reset subsystems in heuristic order.&lt;br /&gt;&lt;br /&gt;= RESET is a splitting concept =&lt;br /&gt;&lt;br /&gt;I.e. 
overall [[RESET]] is one of those concepts that inevitably splits,&lt;br /&gt;whenever you look too closely at it.&lt;br /&gt;Of course, create no more flavors of reset than you need, because each brings complexity.&lt;br /&gt;But inevitably, if your system is successful and lasts for several years,&lt;br /&gt;you will need one or two more flavors of RESET than were originally anticipated.&lt;br /&gt;&lt;br /&gt;= Similar =&lt;br /&gt;&lt;br /&gt;[[Watchdog timer]]s are a concept that splits similarly to RESET.&lt;br /&gt;Indeed, each level of progressive reset for error recovery of a hung system is often driven&lt;br /&gt;by a new level of [[watchdog timer]].&lt;br /&gt;Or [[sanity timer]].&lt;br /&gt;&lt;br /&gt;(Or [[neurosis timer]], or [[psychosis timer]] ... no, I am writing this on April Fool's.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-6679106806336284243?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://semipublic.comp-arch.net/wiki/Reset:_Hard,_Soft,_Cold,_Warm' title='Reset: Hard, Soft, Cold, Warm'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/6679106806336284243/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=6679106806336284243' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/6679106806336284243'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/6679106806336284243'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/04/reset-hard-soft-cold-warm.html' title='Reset: Hard, Soft, Cold, Warm'/><author><name>Andy "Krazy" 
Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-9212065163076488845</id><published>2011-03-31T21:28:00.001-07:00</published><updated>2011-03-31T21:28:53.962-07:00</updated><title type='text'>Hardware data structures</title><content type='html'>Computer software education (at least, if not the whole field) was greatly influenced by Niklaus Wirth's 1976 book&lt;br /&gt;"Algorithms + Data Structures = Programs".&lt;br /&gt;This popularized the importance of a programmer or software practitioner having a toolkit of datastructures,&lt;br /&gt;and algorithms that employ or operate on such datastructures.&lt;br /&gt;&lt;br /&gt;The situation in computer hardware is somewhat inchoate, less well defined.&lt;br /&gt;Nevertheless the same [[hardware data structures]]&lt;br /&gt;crop up over and over again in different applications.&lt;br /&gt;&lt;br /&gt;I am not really sure what "algorithms" would be in the hardware version of "Algorithms + Data Structures = Programs".&lt;br /&gt;Nor, for that matter, what "programs" would be.&lt;br /&gt;Modules?  
Devices?&lt;br /&gt;Sure, hardware implements algorithms, but it is hard to tease apart the concept of algorithms and data structures.&lt;br /&gt;Moderately hard for software; especially so for hardware.&lt;br /&gt;&lt;br /&gt;However, we should always keep in mind that hardware "algorithms" and data structures are just like their software counterparts,&lt;br /&gt;except that implementation in hardware has different constraints:&lt;br /&gt;* size is limited - software often treats program size and data memory size as infinite&lt;br /&gt;* communication is restricted: whereas in software any module can talk to any other module or object, in hardware only a limited number of nearby communications are cheap.&lt;br /&gt;&lt;br /&gt;== List of [[hardware data structures]] == &lt;br /&gt;* [[Arrays]]&lt;br /&gt;** [[RAMs]] versus [[CAMs]]&lt;br /&gt;** [[Encoded CAMs versus Decoded CAMs]]&lt;br /&gt;-&lt;br /&gt;* [[CAM]]s&lt;br /&gt;** [[Equality match CAM]]s&lt;br /&gt;** [[Priority CAM]]s&lt;br /&gt;** [[Greater than CAM]]s&lt;br /&gt;** [[Range CAM]]s&lt;br /&gt;** [[Bitmask CAM]]s, a more general case of [[Ternary CAM]]s&lt;br /&gt;-&lt;br /&gt;* [[Pipelines]] - I'm not sure that these should be considered a [[hardware data structure]]&lt;br /&gt;** [[Stalling pipelines]]&lt;br /&gt;** [[Stall-free pipelines]]&lt;br /&gt;*** [[Draining pipelines]]&lt;br /&gt;*** [[Replay pipelines]]&lt;br /&gt;** [[Bubble collapsing pipelines]]&lt;br /&gt;-&lt;br /&gt;* [[Queues and Buffers]]&lt;br /&gt;** [[Collapsing queues]]&lt;br /&gt;-&lt;br /&gt;* [[Circular arrays]]&lt;br /&gt;-&lt;br /&gt;* [[FIFOs]]&lt;br /&gt;** [[True data movement FIFOs versus array-based circular buffers]]&lt;br /&gt;** [[Synchronizers (asynchronous logic)]]&lt;br /&gt;*** [[Synchronization and  clock domain crossing FIFOs]]&lt;br /&gt;**** [[AMD synchronization FIFO]]&lt;br /&gt;**** [[Intel Bubble Generating FIFO (BGF)]]&lt;br /&gt;-&lt;br /&gt;** [[Alignment issues for queues, buffers, and 
pipelines]]&lt;br /&gt;** [[Holes in queues, buffers, and pipelines]]&lt;br /&gt;-&lt;br /&gt;[[Caches]]&lt;br /&gt;* [[Fully associative in-array CAM tag matching]]&lt;br /&gt;* [[Separate tag and data arrays]]&lt;br /&gt;* [[N-way associative tag matching outside array]]&lt;br /&gt;** [[index sequential]] - tag match, and then index data array&lt;br /&gt;** [[index parallel]] - read out tags and data in parallel, then do a late mux&lt;br /&gt;** [[in-array tag matching for N-way associative arrays]]&lt;br /&gt;-&lt;br /&gt;* [[Bloom Filters]]&lt;br /&gt;** [[M of N Bloom filter]]&lt;br /&gt;** [[N x 1 of M Bloom filter]]&lt;br /&gt;** [[Pipelined Bloom filter]]&lt;br /&gt;** [[Counted Bloom filter]]&lt;br /&gt;-&lt;br /&gt;* [[Arrays of 2-bit saturating counters]] crop up again and again, in predictors and elsewhere&lt;br /&gt;-&lt;br /&gt;* [[Permutation bit matrix]]es&lt;br /&gt;-&lt;br /&gt;* Schedulers&lt;br /&gt;** [[Pickers versus packers]]&lt;br /&gt;** [[Prioritizers]]&lt;br /&gt;*** [[Carry chain prioritizer]]&lt;br /&gt;**** [[Pick most extreme on one side]] - [[pick oldest (or youngest)]]&lt;br /&gt;**** [[Pick most extreme on both sides]] - [[pick oldest and youngest]]&lt;br /&gt;*** [[CAM matching prioritizer]]&lt;br /&gt;**** [[Pick N-th]] - [[pick N-th oldest]]&lt;br /&gt;*** [[Prioritizers after permutation bit matrix]]&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;In the aftermath of [[Mead and Conway]]'s "VLSI revolution"&lt;br /&gt;there was a flurry of work on arbitrary hardware datastructures,&lt;br /&gt;often 1:1 corresponding to software datastructures.&lt;br /&gt;These include&lt;br /&gt;* [[hardware sorting networks]]&lt;br /&gt;* [[hardware heap sorting structures]]&lt;br /&gt;* [[true hardware stack]]s&lt;br /&gt;* [[true hardware FIFO]]s&lt;br /&gt;Unfortunately, many of these have not proved practical in most computer applications.&lt;br /&gt;&lt;br /&gt;== Duality between Hardware and Software Data Structures ==&lt;br 
/&gt;&lt;br /&gt;We have long observed a regular pattern of [[duality between hardware and software data structures]].&lt;br /&gt;&lt;br /&gt;This can be observed&lt;br /&gt;* When implementing a hardware microarchitecture inside a software simulator.  &lt;br /&gt;* When converting an algorithm implemented in software into hardware&lt;br /&gt;Oftentimes the most direct implementation of a hardware data structure in software is stupidly slow, but there are regular patterns of good equivalences.&lt;br /&gt;&lt;br /&gt;Such duals include&lt;br /&gt;&lt;br /&gt;&lt;TABLE border="1"&gt;&lt;TR&gt;&lt;TH&gt;Hardware&lt;/TH&gt;&lt;TH&gt;Software&lt;/TH&gt;&lt;TH&gt;Comments&lt;/TH&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;br /&gt;Equality match CAM&lt;br /&gt;&lt;/TD&gt;&lt;TD&gt;&lt;br /&gt;Hash table&lt;br /&gt;&lt;/TD&gt;&lt;TD&gt;&lt;br /&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;br /&gt;Range or Greater Than CAM&lt;br /&gt;&lt;/TD&gt;&lt;TD&gt;&lt;br /&gt;No good SW equivalent&lt;br /&gt;&lt;/TD&gt;&lt;TD&gt;&lt;br /&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;br /&gt;[[PLA]]s&lt;br /&gt;&lt;/TD&gt;&lt;TD&gt;&lt;br /&gt;[[Lookup table (LUT)]]&lt;br /&gt;&lt;/TD&gt;&lt;TD&gt;&lt;br /&gt;[[HW]] can handle irregular lookup tables:&lt;br /&gt;different granularity of LUTs for different ranges.&lt;br /&gt;[[SW]] must&lt;br /&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TABLE&gt;&lt;br /&gt;The duality is useful,&lt;br /&gt;not just for implementing HW in SW or vice versa,&lt;br /&gt;but also because trends in VLSI scaling have meant, more and more, that the "traditional" hardware datastructures&lt;br /&gt;must be forgone.&lt;br /&gt;E.g. 
limiting the number of bits on a bit-line in a [[RAM array]] or a [[CAM match bitline]]&lt;br /&gt;often means that hash-table-like approaches should be used.&lt;br /&gt;&lt;br /&gt;== Dual or equivalent implementations for [[hardware data structures]] ==&lt;br /&gt;&lt;br /&gt;* [[Arrays are equivalent to multiple flopped registers outputting to a MUX]]&lt;br /&gt;&lt;br /&gt;== Other [[hardware data structures]] ==&lt;br /&gt;&lt;br /&gt;The ShuffleBox: a rectangular array of bits that can store, XOR and rotate all bits in all directions. &lt;br /&gt;* Hala A. Farouk &lt;br /&gt;* An efficient hardware data structure for cryptographic applications &lt;br /&gt;* http://www.universitypress.org.uk/journals/cc/cc-10.pdf&lt;br /&gt;&lt;br /&gt;=== Shifters ===&lt;br /&gt;&lt;br /&gt;Various shift implementations - [[barrel shifter]]s,&lt;br /&gt;[[cascaded multistage shifter]]s&lt;br /&gt;- "feel" to me like the hardware instantiation of an algorithm more than a data structure,&lt;br /&gt;since they are stateless.&lt;br /&gt;&lt;br /&gt;Perhaps I should limit the concept of a [[hardware data structure]] to something that can store state for retrieval later.&lt;br /&gt;&lt;br /&gt;= References =&lt;br /&gt;* http://en.wikipedia.org/wiki/Algorithms_%2B_Data_Structures_%3D_Programs&lt;br /&gt;* Wirth, Niklaus (1976) (in English). Algorithms + Data Structures = Programs. Prentice Hall. ISBN 978-0-13-022418-7. 
0130224189.&lt;br /&gt;&lt;br /&gt;Intel BGF mentioned in: &lt;br /&gt;* ftp://download.intel.com/design/processor/datashts/320835.pdf&lt;br /&gt;&lt;br /&gt;* http://en.wikipedia.org/wiki/Mead_%26_Conway_revolution&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-9212065163076488845?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://semipublic.comp-arch.net/wiki/Hardware_data_structures#List_of_hardware_data_structures' title='Hardware data structures'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/9212065163076488845/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=9212065163076488845' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/9212065163076488845'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/9212065163076488845'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/03/hardware-data-structures.html' title='Hardware data structures'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-4412585697917678346</id><published>2011-03-02T21:51:00.000-08:00</published><updated>2011-03-02T21:51:10.175-08:00</updated><title type='text'>Dynamic dead code elimination and hardware futures</title><content type='html'>[[Category:BS]]&lt;br 
/&gt;http://semipublic.comp-arch.net/wiki/Dynamic_dead_code_elimination_and_hardware_futures&lt;br /&gt;= [[Dynamic dead code elimination]] =&lt;br /&gt;&lt;br /&gt;It has been shown that often more than 1/3 of all computations, data movement, etc., are dynamically dead - they will never be used by the program, on its current path of execution.  &lt;br /&gt;&lt;br /&gt;There have been hardware proposals that delay computations, and elide some such dynamically dead code.  Something like the flip side of [[hardware futures]].&lt;br /&gt;&lt;br /&gt;; [[Semi-static optimization]]&lt;br /&gt;&lt;br /&gt;Dynamic dead code elimination may be done as a [[semi-static optimization]] in an [[optimized trace cache]].&lt;br /&gt;Basically, one applies classic compiler algorithms to the code to be cached,&lt;br /&gt;augmented by reachability predicates.&lt;br /&gt;&lt;br /&gt;Or, another classic way of doing hardware optimization is to examine the post-retirement instruction stream, eliminate dead code, and install that in the [[optimized trace cache]].&lt;br /&gt;&lt;br /&gt;; [[Rename time optimization]]&lt;br /&gt;&lt;br /&gt;Dynamic dead code elimination may also be done in the [[renamer]] pipestage.  The basic idea is, instead of emitting uops as they are decoded, one assembles blocks of uops in the renamer.&lt;br /&gt;Essentially, large expressions, potentially with multiple inputs and outputs.&lt;br /&gt;One does not emit such a block of uops from the renamer to execution until a usage of the block is detected.&lt;br /&gt;&lt;br /&gt;Typically, a store that may be visible to another processor constitutes a usage that will cause uops to be emitted.&lt;br /&gt;(Although certain stores can be elided, e.g. push/pop/push [[stack optimizations]] and [[temporary variable optimizations]].)&lt;br /&gt;&lt;br /&gt;Similarly, the evaluation of a branch condition may cause uops to be emitted. 
But one only need emit the subset of the buffered uops necessary to evaluate the branch.&lt;br /&gt;&lt;br /&gt;Exceptions and other situations where the entire program state is needed may cause uops to be emitted.&lt;br /&gt;&lt;br /&gt;As may overflow of the renamer buffer.&lt;br /&gt;&lt;br /&gt;If the output of an expression buffered in the renamer is overwritten, that part of the expression can be eliminated as dead code.&lt;br /&gt;&lt;br /&gt;; [[Dataflow optimization]]&lt;br /&gt;&lt;br /&gt;[[Dynamic dead code elimination]] can also be done in the OOO dataflow scheduling part of the machine.&lt;br /&gt;Essentially, one assembles the dataflow graph, but one does not necessarily start execution as soon as inputs are ready.&lt;br /&gt;&lt;br /&gt;The renamer adds an additional [[WAW]] arc to the dataflow graph, for a new uop that overwrites an old uop's output.&lt;br /&gt;If that old uop has no [[RAW]] dependents, i.e. no consumers of its output, then it can "fire", cancelling itself,&lt;br /&gt;and signalling that cancellation to its own input dependencies in turn.&lt;br /&gt;&lt;br /&gt;This is essentially [[reverse dataflow execution]].&lt;br /&gt;It is straightforward, although it does require hardware that is not normally present.&lt;br /&gt;It is easiest to accomplish with a [[bit matrix]] scheduler, where the [[input bitvector]] becomes the [[output bitvector]] of a cancelled uop.&lt;br /&gt;&lt;br /&gt;It is not at all clear how useful this form of dataflow dead code elimination is:&lt;br /&gt;it is buildable, but how much does it help?&lt;br /&gt;It is probably most useful if it can be remembered in an [[optimized trace cache]].&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;= [[Hardware futures]] =&lt;br /&gt;&lt;br /&gt;[[Hardware futures]] are the dual of [[dynamic dead code elimination]].&lt;br /&gt;Instead of eliminating code that does not need to be executed,&lt;br /&gt;they delay execution and evaluation of code until it is known to be needed.&lt;br /&gt;This delay naturally 
accomplishes [[dynamic dead code elimination]].&lt;br /&gt;It also provides increased scope for [[hardware CSE]],&lt;br /&gt;and other optimizations.&lt;br /&gt;But it potentially increases latency.&lt;br /&gt;&lt;br /&gt;As for [[dynamic dead code elimination]],&lt;br /&gt;it is fairly easy to see how to do this at the renamer:&lt;br /&gt;have the renamer accumulate expressions.&lt;br /&gt;Only release an expression to be evaluated when a usage that demands evaluation is detected.&lt;br /&gt;&lt;br /&gt;We can go further, and avoid the buffering at the renamer becoming an issue by emitting such buffered instructions for a hardware future&lt;br /&gt;into a sort of side car storage, rather like that of Akkary's [[CFP]].&lt;br /&gt;When evaluation is demanded, we issue  a token to cause insertion of those instructions.&lt;br /&gt;&lt;br /&gt;Issue: are the buffered instructions pre or post rename?  It is easiest to buffer them post rename,&lt;br /&gt;using dataflow single assignment registers and memory locations.&lt;br /&gt;&lt;br /&gt;Also as above, this can be deferred into the OOO dataflow scheduler.&lt;br /&gt;But, again, that is using a fairly critical resource.&lt;br /&gt;&lt;br /&gt;To make hardware futures less dynamic, more [[semi-static]] at the [[optimized trace cache]]&lt;br /&gt;involves dividing the future into two parts:&lt;br /&gt;the first part captures any live inputs,&lt;br /&gt;that must be captured lest they change;&lt;br /&gt;the second part does the actual evaluation on demand.&lt;br /&gt;Typically memory load values must be live inputs, as well as registers,&lt;br /&gt;and the future is constrained from doing stores that may be visible to other threads.&lt;br /&gt;&lt;br /&gt;I.e. the hardware future is constrained to be a chunk of code that has no internal memory references. &lt;br /&gt;Or at least only sharply limited internal memory references.&lt;br /&gt;&lt;br /&gt;It is unclear how useful such a future would be.  
It would have to be saving considerable register based computation&lt;br /&gt;- complicated instructions such as transcendentals - or be a complicated calculation on a small amount of memory.&lt;br /&gt;&lt;br /&gt;Such hardware futures might be much more useful if we were released from the constraints of the [[Von Neumann memory model]],&lt;br /&gt;e.g. if we had classic dataflow style [[single assignment memory]].&lt;br /&gt;But in 2011 that does not seem likely any time soon.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-4412585697917678346?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://semipublic.comp-arch.net/wiki/Dynamic_dead_code_elimination_and_hardware_futures' title='Dynamic dead code elimination and hardware futures'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/4412585697917678346/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=4412585697917678346' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/4412585697917678346'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/4412585697917678346'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/03/dynamic-dead-code-elimination-and.html' title='Dynamic dead code elimination and hardware futures'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' 
src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-7683833427491707663</id><published>2011-03-02T21:02:00.000-08:00</published><updated>2011-03-02T21:50:17.349-08:00</updated><title type='text'>Using Redundancies to Find Errors</title><content type='html'>Cleaning out old journals from my bookshelf.&lt;br /&gt;&lt;br /&gt;Came across&lt;br /&gt;&lt;br /&gt;Yichen Xie, Dawson Engler, "Using Redundancies to Find Errors," IEEE Transactions on Software Engineering, pp. 915-928, October, 2003 &lt;br /&gt;&lt;br /&gt;(actually, a slightly different version, in Software Engineering Notes / SIGSOFT 2002/FSE 10).&lt;br /&gt;&lt;br /&gt;Very interesting paper:  used analysis tools that detected dead code, redundant conditionals, idempotent operations such as x&amp;x to detect bugs.&lt;br /&gt;&lt;br /&gt;Causes me to wonder about applying similar techniques to look for hardware/RTL/HDL bugs, e.g. in Verilog or VHDL.&lt;br /&gt;&lt;br /&gt;For that matter, it caused me to think about similar hardware techniques.  For example, it has been shown that often more than 1/3 of all computations, data movement, etc., are dynamically dead - they will never be used by the program, on its current path of execution.  There have been hardware proposals that delay computations, and elide some such dynamically dead code.  Something like the flip side of hardware futures.&lt;br /&gt;&lt;br /&gt;Now, such dynamic dead code detection *might* help locate bugs - but it suffers because it is dynamic, bound to the current path of execution.  
Whereas Xie's work looks at reachable code, including paths that may not be currently taken but which might use the values that are dead on the current path.&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;&lt;br /&gt;Done in xgcc, so code should  be publicly available.&lt;br /&gt;&lt;br /&gt;I wish I had had such tools in my last project.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-7683833427491707663?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.154.1175&amp;rep=rep1&amp;type=pdf' title='Using Redundancies to Find Errors'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/7683833427491707663/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=7683833427491707663' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/7683833427491707663'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/7683833427491707663'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/03/using-redundancies-to-find-errors.html' title='Using Redundancies to Find Errors'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' 
src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-739673364568088110</id><published>2011-03-01T08:33:00.000-08:00</published><updated>2011-03-01T08:34:45.816-08:00</updated><title type='text'>Stream Processors, specifically Stream Descriptor Processors</title><content type='html'>http://semipublic.comp-arch.net/wiki/Stream_processor&lt;br /&gt;{{Terminology Term}}&lt;br /&gt;[[Category:ISA]]&lt;br /&gt;[[Category:Memory]]&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;= [[Stream processor]] is an overloaded term =&lt;br /&gt;&lt;br /&gt;The term [[stream processor]] is fashionable and overloaded.&lt;br /&gt;It often means just a processor that is "optimized" for a [[stream workload]].&lt;br /&gt;Where "optimized" may mean just "has little cache".&lt;br /&gt;(E.g. the use of the term "stream processor" in [[GPU]]s such as those by AMD/ATI and Nvidia.)&lt;br /&gt;&lt;br /&gt;= What is a [[stream]]? 
=&lt;br /&gt;&lt;br /&gt;A [[stream workload]] operates on sets of data,&lt;br /&gt;operating on one or a few data elements&lt;br /&gt;before moving on,&lt;br /&gt;not to return to those data elements for a long time.&lt;br /&gt;Because of this access pattern,&lt;br /&gt;[[LRU replacement]] is often ineffective, and may even be pessimal,&lt;br /&gt;for [[stream workload]]s.&lt;br /&gt;Keeping cache entries around that are not going to be reused is wasteful.&lt;br /&gt;Sometimes a stream is restarted, in which case an MRU policy that keeps the head of the stream in cache is better than LRU.&lt;br /&gt;Sometimes there is no reuse at all.&lt;br /&gt;&lt;br /&gt;= [[Stream descriptor processor]] =&lt;br /&gt;&lt;br /&gt;By [[stream processor]], I mean processors with [[stream descriptor]]s&lt;br /&gt;- processors that have specific features, whether ISA or microarchitecture,&lt;br /&gt;that support streaming.&lt;br /&gt;(I would prefer to reserve this term for ISA features, but [[many good ISA features become microarchitecture]].)&lt;br /&gt;Possibly should call them [[stream descriptor processor]]s,&lt;br /&gt;to distinguish them from the overloaded term [[stream processor]].&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;= [[Stream descriptor]] =&lt;br /&gt;&lt;br /&gt;A [[stream descriptor]] is a description, in a register or memory data structure, that describes a memory access pattern.&lt;br /&gt;&lt;br /&gt;For example, a [[1D stream descriptor]] might include&lt;br /&gt;* data item description - size, type&lt;br /&gt;* base&lt;br /&gt;* stride&lt;br /&gt;* length &lt;br /&gt;* current position&lt;br /&gt;&lt;br /&gt;A [[2D stream descriptor]] might include&lt;br /&gt;* data item description - size, type&lt;br /&gt;* base&lt;br /&gt;* horizontal stride, length &lt;br /&gt;* vertical stride, length&lt;br /&gt;* current position, row and column&lt;br /&gt;&lt;br /&gt;Obviously extendable to N dimensions.&lt;br /&gt;&lt;br /&gt;In addition to arrays, there may be other types 
of stream descriptor&lt;br /&gt;&lt;br /&gt;A scatter/gather descriptor&lt;br /&gt;* a 1D descriptor of a vector of addresses&lt;br /&gt;* with an indication that each address should be dereferenced&lt;br /&gt;&lt;br /&gt;A linked list descriptor&lt;br /&gt;* Base&lt;br /&gt;* a skeleton [[structure descriptor]]&lt;br /&gt;** payload to be accessed while traversing the list - offset, type&lt;br /&gt;** position within a list element of pointer to next element&lt;br /&gt;&lt;br /&gt;A tree or B-tree descriptor&lt;br /&gt;* Base&lt;br /&gt;* a skeleton [[structure descriptor]]&lt;br /&gt;** payload &lt;br /&gt;** key&lt;br /&gt;** pointers&lt;br /&gt;&lt;br /&gt;For the case of a tree or B-tree descriptor,&lt;br /&gt;the access pattern may &lt;br /&gt;(a) traverse all of a tree or subtree&lt;br /&gt;or (b) traverse only a path seeking an individual element&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;= Why [[stream processor]]s? =&lt;br /&gt;&lt;br /&gt;There are two main motivations for stream processors:&lt;br /&gt;* memory access patterns&lt;br /&gt;* data manipulation overhead&lt;br /&gt;I think there is an additional property, probably not considered important by most, but [[IMHO]] quite important:&lt;br /&gt;* decoupling and ease of programming&lt;br /&gt;&lt;br /&gt;; Memory Access Patterns&lt;br /&gt;&lt;br /&gt;Support for streaming memory access patterns is probably the most important.&lt;br /&gt;As mentioned above, many [[stream workload]]s do not fit in the cache.&lt;br /&gt;But that could be handled by a simple [[cache bypass hint]] instruction.&lt;br /&gt;&lt;br /&gt;Of more long term interest is the possibility that [[stream descriptor]]s, etc., may allow &lt;br /&gt;streams that fit in the cache to be managed.&lt;br /&gt;As cache sizes grow, [[memory data structures]] that did not fit in an old, smaller, cache, may fit in a new, larger, cache.&lt;br /&gt;E.g. 
a 1000x1000x32b array does not fit in a 32K [[L1$]], but may fit in an 8M [[L3$]].&lt;br /&gt;Of course, this cannot be managed in isolation:&lt;br /&gt;a workload that accessed only one 2MB 2D array might fit in an 8MB cache, but one that accessed 4 such would not.&lt;br /&gt;Nevertheless, explicitly indicating the data structures provides more information to whoever is trying to manage the cache,&lt;br /&gt;whether hardware or software.&lt;br /&gt;&lt;ul&gt;The [[stream descriptor]] may have a bit associated with it, either inside or [[non-adjacent]], that indicates which cache levels to use. &lt;/UL&gt;Furthermore, even in the absence of cache control in the stream descriptor, the stream descriptor provides guidance for [[cache prefetch]].&lt;br /&gt;Guidance that may be more accurate than [[SW prefetch using cache control instructions]].&lt;br /&gt;A stream descriptor may be associated with a prefetch control data structure, that seeks always to avoid delays accessing a stream,&lt;br /&gt;without excessive prefetch.&lt;br /&gt;&lt;br /&gt;; Data Manipulation Overhead&lt;br /&gt;&lt;br /&gt;Users of [[SIMD packed vector]] instruction sets quickly come to realize&lt;br /&gt;that the actual memory access and computation is only a small part of the task of programming code to use such instructions.&lt;br /&gt;Often more than half of the instructions in such kernels are&lt;br /&gt;data manipulation overhead &lt;br /&gt;* unpacking 8 bit numbers into 16 bit numbers, or, worse, even fancier unpackings such as 5:6:5&lt;br /&gt;* transposing rows and columns&lt;br /&gt;* [[chunky versus planar]]  data, i.e. 
[[SOA versus AOS]]&lt;br /&gt;&lt;br /&gt;Explicit support for streams via [[stream descriptor]]s can eliminate much of this overhead, avoiding explicit instructions.&lt;br /&gt;&lt;br /&gt;[[The CSI Multimedia Architecture]] goes further: not only does it hide the data packing and unpacking,&lt;br /&gt;but it also seeks to hide the [[SIMD packed vector]] nature of the hardware datapath.&lt;br /&gt;Instead of the programmer loading from a stream into a register of N bits, operating, and then storing,&lt;br /&gt;this paper proposes complex streaming instructions that are implemented N bits at a time&lt;br /&gt;in such a way that the code does not need to be changed when the microarchitecture changes the width of N.&lt;br /&gt;&lt;br /&gt;Others have attempted to go even further, providing ways of expressing streaming kernels using simple instructions&lt;br /&gt;that automatically get mapped onto a fixed length datapath.&lt;br /&gt;&lt;br /&gt;; Decoupling and ease of programming&lt;br /&gt;&lt;br /&gt;Streams are the equivalent of UNIX pipes.&lt;br /&gt;&lt;br /&gt;If you had an architecture that allowed [[lightweight thread forking]],&lt;br /&gt;some code could be written by connecting two library function kernels together by a stream&lt;br /&gt;- instead of &lt;br /&gt;* writing two separate loops that store the intermediate value to memory&lt;br /&gt;* or merging the two operations into a single loop for a single pass&lt;br /&gt;&lt;br /&gt;PROBLEM: allocating buffer space for such inter-thread streams. It may be necessary to spill to memory in some situations.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;= If streams are such a good idea, why aren't they popular yet? 
=&lt;br /&gt;&lt;br /&gt;TBD.&lt;br /&gt;&lt;br /&gt;Proper [[stream descriptor processor]]s have not taken off by 2010.&lt;br /&gt;&lt;br /&gt;Instead, [[SIMD packed vector]] has dominated.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;= See Also =&lt;br /&gt;&lt;br /&gt;;[[The CSI Multimedia Architecture]], IEEE T. VLSI, Jan 2005.&lt;br /&gt;: [[Stream descriptor]]s, plus optimizations to allow implicit SIMDification.&lt;br /&gt;* [[iWarp]] - stream descriptors bound to registers&lt;br /&gt;* [[Wm]] - William Wulf's processor, first instance I know of stream descriptors bound to registers&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-739673364568088110?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://semipublic.comp-arch.net/wiki/Stream_processor' title='Stream Processors, specifically Stream Descriptor Processors'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/739673364568088110/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=739673364568088110' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/739673364568088110'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/739673364568088110'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/03/stream-processors-specifically-stream.html' title='Stream Processors, specifically Stream Descriptor Processors'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' 
src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-8204861895496401703</id><published>2011-02-27T18:06:00.001-08:00</published><updated>2011-02-27T18:06:41.065-08:00</updated><title type='text'>Unified_versus_Split_or_Typed_Register_Files</title><content type='html'>http://semipublic.comp-arch.net/wiki/Unified_versus_Split_or_Typed_Register_Files&lt;br /&gt;&lt;br /&gt;There used to be a big debate in computer architecture about whether&lt;br /&gt;one should have [[Unified versus Split Integer and Floating Point]].&lt;br /&gt;Both ISA or macroarchitecturally, in the programmer exposed register&lt;br /&gt;file.  Or microarchitecturally, in the implementation.&lt;br /&gt;&lt;br /&gt;E.g. the [[Intel P6 microarchitecture]] implemented the [[Intel&lt;br /&gt;x86/x87 ISA]], with separate integer and floating point.  But the&lt;br /&gt;original P6 microarchitecture implemented this as a unified datapath,&lt;br /&gt;in which both integer and floating point use the same scheduler, and&lt;br /&gt;occupy the same [[PRF]] in the form of the unified [[ROB]], each entry&lt;br /&gt;of which can hold a full 80+ bit x87 FP value.  But they are split at&lt;br /&gt;the [[RRF]].&lt;br /&gt;&lt;br /&gt;Over time, Intel added [[SIMD packed vector extensions]]. [[MMX]]&lt;br /&gt;shared the x86 registers, both architecturally and&lt;br /&gt;microarchitecturally (in most implementations).  
But wider SIMD, such&lt;br /&gt;as SSE with 128 bit [[XMM]] registers, and AVX with 256 bit [[YMM]]&lt;br /&gt;registers, introduced yet another split architectural register file.&lt;br /&gt;Microarchitecturally, they may have kept the P6 unified structure, but&lt;br /&gt;the wastage of storing an 8, 16, 32, or even 64 bit scalar in a 128 or&lt;br /&gt;256 bit wide PRF entry becomes more and more of a concern.&lt;br /&gt;&lt;br /&gt;AMD, by contrast, has typically had separate integer and FP/SIMD&lt;br /&gt;clusters, with separate schedulers and PRFs and ...&lt;br /&gt;&lt;br /&gt;Intel's 2011 processor, Sandybridge,  has split integer and SIMD PRFs,&lt;br /&gt;a unified scheduler, but split scalar and SIMD clusters.&lt;br /&gt;&lt;br /&gt;It is becoming increasingly obvious that any desire to split datapaths&lt;br /&gt;should be based not on type, not on integer versus floating point, but&lt;br /&gt;should be based on width, integer or scalar versus SIMD packed&lt;br /&gt;vectors.  Narrow data types, 8, 16, 32, 64 bits, versus wide packed&lt;br /&gt;data types, 128, 256, possibly even 512 bits wide. Latency versus&lt;br /&gt;throughput.&lt;br /&gt;&lt;br /&gt;Currently it is pretty standard to separate&lt;br /&gt;&lt;br /&gt;* integer, scalar, 8, 16, 32, and 64 bit data.  Addresses and branches. Latency sensitive&lt;br /&gt;&lt;br /&gt;* SIMD packed vector, both integer and FP.  128, 256, and possibly 512 bit data. Bandwidth.&lt;br /&gt;&lt;br /&gt;The only real question is whether one should support scalar FP on a&lt;br /&gt;datapath optimized for low latency, and interaction with branches and&lt;br /&gt;addresses.  Or whether scalar FP should only be handled as a single&lt;br /&gt;lane in a vector register, as in SSE, in a throughput manner.&lt;br /&gt;&lt;br /&gt;Now, there will always be workloads where FP scalar latency will&lt;br /&gt;matter.  For that matter, there will always be workloads where vector&lt;br /&gt;latency matters. 
Nevertheless, this division into narrow latency and&lt;br /&gt;wide throughput datapaths seems to be a trend.&lt;br /&gt;&lt;br /&gt;It will always be desirable to be able to access the bits of an FP&lt;br /&gt;number, whether scalar or vector, in an ad-hoc manner.  It will always&lt;br /&gt;be desirable to branch and form addresses based on such accesses.  But&lt;br /&gt;these desires will not always outweigh the separation into narrow&lt;br /&gt;latency and wide throughput clusters.&lt;br /&gt;&lt;br /&gt;Power concerns now are paramount. Split datapaths naturally save&lt;br /&gt;power; but unified datapaths may be able to save power in an even&lt;br /&gt;finer grained manner.  Much depends on the granularity possible with&lt;br /&gt;[[clock gating]] (which can be quite fine grained, e.g. to bytes of a&lt;br /&gt;datapath) and [[power gating]] (which in 2011 is quite coarse grained,&lt;br /&gt;unlikely to be done within the width of a unified datapath).  For that&lt;br /&gt;matter, separate narrow/latency and wide/throughput clusters would&lt;br /&gt;probably be easier to clock at different frequencies, with different&lt;br /&gt;voltages - although as of 2011 this does not seem yet to be done.&lt;br /&gt;&lt;br /&gt;I can make no hard and fast conclusion: although there is a trend&lt;br /&gt;towards split narrow/latency and wide/throughput datapaths, there will&lt;br /&gt;still be occasional winning attempts to unify.  
Much depends on the&lt;br /&gt;evolution of power, frequency, and voltage management, in terms of&lt;br /&gt;granularity.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-8204861895496401703?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://semipublic.comp-arch.net/wiki/Unified_versus_Split_or_Typed_Register_Files' title='Unified_versus_Split_or_Typed_Register_Files'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/8204861895496401703/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=8204861895496401703' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/8204861895496401703'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/8204861895496401703'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/02/unifiedversussplitortypedregisterfiles.html' title='Unified_versus_Split_or_Typed_Register_Files'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-2489365523708955684</id><published>2011-02-27T17:30:00.000-08:00</published><updated>2011-02-27T17:30:10.461-08:00</updated><title type='text'>Why non-inclusive/non-exclusive caches?</title><content type='html'>Sheesh! 
I can't usually take all the time this article needed to write in one sitting.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;http://semipublic.comp-arch.net/wiki/Why_non-inclusive/non-exclusive_caches&lt;br /&gt;&lt;br /&gt;[[Category:Cache]]&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;[[Non-inclusive/non-exclusive (NI/NE)]] caches are used in&lt;br /&gt;[[multilevel cache hierarchies]] that are, duh, neither [[inclusive]]&lt;br /&gt;nor [[exclusive]].&lt;br /&gt;&lt;br /&gt;I.e. there is no enforcement of either the [[cache inclusion&lt;br /&gt;property]] or the [[cache exclusion property]].  A cache line in an&lt;br /&gt;inner cache may or may not be in an outer cache.  You can't be sure.&lt;br /&gt;If necessary, you must check.&lt;br /&gt;&lt;br /&gt;Because there is no [[cache inclusion]], both the inner and outer&lt;br /&gt;caches must be checked, i.e. snooped or probed, when responding to&lt;br /&gt;requests from other processors.&lt;br /&gt;&lt;br /&gt;= Historical Background - Accidental Inclusion in the Original [[P6]] =&lt;br /&gt;&lt;br /&gt;Probably the most widely known processor that has an [[NI/NE]] cache&lt;br /&gt;is the [[Intel P6 family]].&lt;br /&gt;&lt;br /&gt;The original P6 had an external L2 cache, on a [[back-side bus]].  
In&lt;br /&gt;an ASCII diagram (TBD, I need drawing in this wiki):&lt;br /&gt;&lt;br /&gt;   [[L2$]] &lt;---&gt;  P([[I$,L1$]]) &lt;---&gt; [[FSB]]&lt;br /&gt;&lt;br /&gt;In more detail&lt;br /&gt;&lt;br /&gt;   [[L2$]] &lt;---------&gt; [[SQ]] &lt;---&gt; [[FSB]]&lt;br /&gt;                   ^&lt;br /&gt;                   |&lt;br /&gt;                   [[FB]]&lt;br /&gt;                   |&lt;br /&gt;                   +-----+----+&lt;br /&gt;                   |     |    |&lt;br /&gt;                   v     v    v&lt;br /&gt;                   [[I$]]    [[P]]    [[L1$]]&lt;br /&gt;&lt;br /&gt;Most of the symbols used in the diagrams above are standard; as for&lt;br /&gt;those that are not:&lt;br /&gt;&lt;br /&gt;* [[SQ (super queue)]] - on the [[P6 Pentium Pro]] die, coordinated interaction with the [[Front side bus (FSB)]],  the back side bus and the off-chip [[L2]], and [[DCU]], mainly interacting with the [[fill buffers (FB)]].&lt;br /&gt;&lt;br /&gt;* [[FB (fill buffers)]] - a staging area under the control of the [[data cache unit (DCU)]] of the P6 cpu core, talking to the [[bus cluster]] and hence the [[SQ]], [[L2$]], and the outside world via the [[FSB]], and the caches of the [[processor core]], the [[IFU]] and [[DCU]] arrays.  Also talked directly to the processor core, to minimize latency.&lt;br /&gt;&lt;br /&gt;The original P6 had a completely external [[L2$]].  Not just the&lt;br /&gt;[[cache data array]]s, but also the [[cache tag array]]s.&lt;br /&gt;&lt;br /&gt;Now, there were several reasons why the original P6 had [[NI/NE]] or&lt;br /&gt;[[accidental inclusion]]:&lt;br /&gt;&lt;br /&gt;== Backside L2$ ==&lt;br /&gt;&lt;br /&gt;Because the L2$ was on a backside bus, it was not necessarily faster&lt;br /&gt;to access than the L1 caches, the [[IFU]] and [[DCU]].  
In fact, you&lt;br /&gt;can easily imagine circumstances where it might have been slower to&lt;br /&gt;[[snoop]] or [[probe]].&lt;br /&gt;&lt;br /&gt;&lt;UL&gt;Note: having the L2$ tags on-chip might have made it possible to probe the L2$ faster than the L1$.  &lt;/UL&gt;&lt;br /&gt;==  [[Deterministic Snoop Latency]] ==&lt;br /&gt;&lt;br /&gt;One of the design goals of the original P6 was to have [[deterministic&lt;br /&gt;snoop latency]] - to be able to respond within a relatively small&lt;br /&gt;fixed number of cycles to a snoop from remote processors on the&lt;br /&gt;[[FSB]].&lt;br /&gt;&lt;br /&gt;[[Cache inclusion]] would get in the way of this. You would first&lt;br /&gt;have to probe the L2$, and then, if a hit, probe the L1$.  This&lt;br /&gt;naturally leans towards a [[variable snoop latency]], and slows the&lt;br /&gt;snoop down.&lt;br /&gt;&lt;br /&gt;In the long run I think that [[deterministic snoop latency]] must be&lt;br /&gt;considered a worthy goal, but a dead-end.  I think even the original&lt;br /&gt;P6 gave up on it, providing it in most cases, but also provided a&lt;br /&gt;snoop stall as a way of handling cases where it could not be provided.&lt;br /&gt;Nevertheless, the goal of [[deterministic snoop latency]] influenced&lt;br /&gt;the design.&lt;br /&gt;&lt;br /&gt;==  L2$ relatively large compared to the L1$ ==&lt;br /&gt;&lt;br /&gt;The external, off-chip but inside package, L2$ for the original P6&lt;br /&gt;Pentium Pro was 512KiB or 1MiB, with some versions as small as 256KiB.&lt;br /&gt;&lt;br /&gt;The DCU or L1 data cache was die dieted over the life of the project&lt;br /&gt;from 64KiB in our dreams to 16 or 8KiB.&lt;br /&gt;&lt;br /&gt;Thus, the L2 was typically 32x bigger than the L1 data cache,&lt;br /&gt;and similarly wrt the instruction cache.&lt;br /&gt;&lt;br /&gt;Although Intel has experience with [[exclusive cache]]s from earlier&lt;br /&gt;processors, with a 32:1 or 16:1 size ratio, there is little point in&lt;br /&gt;using 
an exclusive cache to try to increase cache hit rate.  The&lt;br /&gt;increase in the hit rate from having circa 3-6% more data is small.&lt;br /&gt;Most of the time the data in the L1$ is included in the L2$ by&lt;br /&gt;accident.&lt;br /&gt;&lt;br /&gt;This property Wen-Hann Wang called [[accidental inclusion]].  Wen-Hann&lt;br /&gt;Wang was the leader of the P6 cache protocol design, and the inventor&lt;br /&gt;(in his PhD) of [[cache inclusion]].&lt;br /&gt;&lt;br /&gt;==  Simplifying Design ==&lt;br /&gt;&lt;br /&gt;In truth I must admit that we started P6 expecting to build an&lt;br /&gt;[[inclusive cache]] hierarchy.  However, Wen-Hann's performance data&lt;br /&gt;showed that [[accidental inclusion]] because of the relatively large&lt;br /&gt;L2$ removed most of the performance incentive for inclusion.  And&lt;br /&gt;moving to [[NI/NE]] considerably simplified the cache protocol,&lt;br /&gt;removing many boundary conditions.&lt;br /&gt;&lt;br /&gt;For example, how do you handle cases where a cache line is being made&lt;br /&gt;dirty in the L1 exactly as it is being evicted from the L2?  ... It is&lt;br /&gt;not impossible to handle these cases.  But [[accidental inclusion]]&lt;br /&gt;made it just plain easier to leave them decoupled.&lt;br /&gt;&lt;br /&gt;&lt;UL&gt;By the way, at the time on P6 we did not really have a term for the cache inclusion properties of the P5 L2, or lack thereof.  Wen-Hann called it [[accidental inclusion]], but that term has stuck only in Glew's memory.  The term [[non-inclusive/non-exclusive (NI/NE)]] was created years later.&lt;/UL&gt;&lt;br /&gt;==  [[Silent eviction of clean data]] == &lt;br /&gt;&lt;br /&gt;It is, of course, always necessary to perform a [[writeback]] when&lt;br /&gt;[[evicting dirty data]].&lt;br /&gt;&lt;br /&gt;When [[evicting clean data]], however, one can silently drop the clean&lt;br /&gt;line.  
It is not always necessary to inform the outer cache.&lt;br /&gt;&lt;br /&gt;Now, this isn't 100% linked to [[cache inclusion]] or [[NI/NE]].  One&lt;br /&gt;can perform [[silent eviction of clean data]] in an [[inclusive&lt;br /&gt;cache]].  However, doing so means that any [[inclusion tracking]] that&lt;br /&gt;the outer cache is performing, so that it knows which cachelines in the&lt;br /&gt;outer cache are inside which inner cache(s), may be inaccurate.&lt;br /&gt;I.e. [[possibly inaccurate inclusion tracking]].&lt;br /&gt;&lt;br /&gt;[[Accurate inclusion tracking]] is a [[slippery slope]].  It is not&lt;br /&gt;necessary, but it has several nice properties.  Worse, some people&lt;br /&gt;implicitly assume [[accurate inclusion tracking]] in any [[inclusive&lt;br /&gt;cache]] design.&lt;br /&gt;&lt;br /&gt;[[Accidental inclusion]] or [[NI/NE]] avoids this slippery slope&lt;br /&gt;altogether.  Which not only simplifies design, but also avoids the&lt;br /&gt;sort of incorrect assumptions that often arise on such a [[slippery slope]].&lt;br /&gt;&lt;br /&gt;== Allocation in L1$ without L2$ interaction ==&lt;br /&gt;&lt;br /&gt;[[Accidental inclusion]] enabled or made easier certain cache protocol&lt;br /&gt;optimizations, in particular [[eliminating unnecessary RFOs]] for&lt;br /&gt;full-line cache writes, without requiring L2 involvement.&lt;br /&gt;Especially as used in [[P6 fast strings]].&lt;br /&gt;&lt;br /&gt;This probably was not a design goal for most of the people involved,&lt;br /&gt;but it was something I (Andy Glew) was happy to take advantage of.&lt;br /&gt;&lt;br /&gt;The basic observation is that a sizeable fraction of all cache lines&lt;br /&gt;are completely overwritten, without any of the old data being read.&lt;br /&gt;Therefore, reading the old data via an [[RFO (read for ownership)]] &lt;br /&gt;bus request is wasted bandwidth.&lt;br /&gt;&lt;br /&gt;In operations such as [[block fill]], e.g. 
zeroing a page of memory,&lt;br /&gt;50% of the data bus bandwidth is wasted by such unnecessary RFOs.  In&lt;br /&gt;[[block copy]], 33%. (Assuming all accesses miss the cache - not that bad an&lt;br /&gt;assumption for an 8 or 16KiB L1$, but also not unreasonable even on&lt;br /&gt;many larger caches, for many [[block memory operations]].)&lt;br /&gt;&lt;br /&gt;Even in non-block memory code, unnecessary RFOs occupy a significant&lt;br /&gt;fraction of data bus bandwidth.  I have measured speedups of 10-20% on&lt;br /&gt;many simulated workloads, and even on a few [[FIB-edited chips]].&lt;br /&gt;&lt;br /&gt;&lt;UL&gt;The biggest challenge with [[eliminating unnecessary RFOs]] is maintaining [[memory ordering]], in particular [[store ordering]].  It is easy to do on a [[weakly ordered]] system.  I have invented several techniques for maintaining memory ordering, some of which are [[NDA]], but some of which can be discussed here.  When I get around to writing them up. In my [[copious free time]].  &lt;/UL&gt;&lt;br /&gt;The original P6 eliminated unnecessary RFOs using the [[deferred&lt;br /&gt;invalidation cache protocol]] for [[fast strings]] [[block memory&lt;br /&gt;operations]]. The bandwidth benefit was clear; unfortunately, the&lt;br /&gt;overhead of microcode, in the absence of microcode branch prediction,&lt;br /&gt;made fast strings not always the best approach.  Since the original P6,&lt;br /&gt;[[fast strings]] have been hit and miss: when somebody in microcode&lt;br /&gt;gets around to tuning them they can be a win, but if untuned they may&lt;br /&gt;be a loss compared to the best hand rolled assembly code.&lt;br /&gt;&lt;br /&gt;P6's [[WC memory type]] also eliminated unnecessary RFOs, but was not&lt;br /&gt;allocated in the cache.  
Similarly the [[SSE streaming stores]] added&lt;br /&gt;in later processors.&lt;br /&gt;&lt;br /&gt;The [[deferred invalidate cache protocol]] continued in use all the&lt;br /&gt;way to Intel's Nehalem processor, the last of the P6 family.  Nehalem,&lt;br /&gt;however, added an inclusive [[LLC]] with snoop filtering, killing the&lt;br /&gt;deferred invalidate cache protocol.&lt;br /&gt;&lt;br /&gt;So much for history.  The point remains: not having an inclusive cache&lt;br /&gt;enables certain cache protocol optimizations, such as [[eliminating&lt;br /&gt;unnecessary RFOs]], that are much harder to build in an inclusive cache&lt;br /&gt;system.&lt;br /&gt;&lt;br /&gt;== Optionally Removing the L2$ ==&lt;br /&gt;&lt;br /&gt;The [[NI/NE]] cache structure makes it easier to optionally disable or&lt;br /&gt;remove the [[L2$]].  This option has been explored several times.&lt;br /&gt;&lt;br /&gt;Now, it is possible to do this with an inclusive cache structure by&lt;br /&gt;"faking out" the absent L2$, always returning "Yes, the cache line is&lt;br /&gt;included in all possible internal caches", such as the instruction and&lt;br /&gt;data and other caches.&lt;br /&gt;&lt;br /&gt;But it is very easy, in an inclusive cache hierarchy, to slip down&lt;br /&gt;the slippery slope and assume that the L2$ is present.  And then to&lt;br /&gt;take advantage of the L2$ being present in a way that breaks when the&lt;br /&gt;L2$ is absent. [[NI/NE]] seems to avoid this.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;== Enabling [[Funky Caches]] and Decoupling Associativity ==&lt;br /&gt;&lt;br /&gt;In an [[inclusive cache]] system, one tends to want the associativity&lt;br /&gt;of the outer, inclusive, cache to be greater than or equal to the sum&lt;br /&gt;of the associativities of the inner caches it "covers".&lt;br /&gt;&lt;br /&gt;E.g. 
if you have a 16 way associative outer cache that is inclusive of&lt;br /&gt;8 inner caches, all of 4 way associativity, and if the usual indexing&lt;br /&gt;of the cache using bitfields is performed (which usually implies a&lt;br /&gt;1:many relationship between inner sets and outer sets), then it is&lt;br /&gt;possible that for some set So of the outer cache mapping to sets Si of&lt;br /&gt;each of the inner caches, each inner cache may want to have a distinct&lt;br /&gt;line in each of the 4 ways.  Implying that the inner caches may "want"&lt;br /&gt;to have 8*4 = 32 distinct lines.  Which exceeds the 16 way associativity of&lt;br /&gt;the outer cache.&lt;br /&gt;&lt;br /&gt;[[NI/NE]] caches do not have this constraint on associativity.&lt;br /&gt;&lt;br /&gt;Furthermore, when playing the [[Funky Cache Rag]], exploring&lt;br /&gt;structures such as [[victim caches]] or [[prefetch caches]] or&lt;br /&gt;specialized caches for different data types, [[cache inclusion]] once&lt;br /&gt;again gets in the way.  Often these [[Funky Caches]] are fully&lt;br /&gt;associative.  They do a great job of reducing glass jaws related to&lt;br /&gt;cache associativity - e.g. a 16 or 32 entry victim cache makes a 4 way&lt;br /&gt;cache look like it has a much higher effective associativity.  But a&lt;br /&gt;32 entry fully associative victim or prefetch cache may only be able&lt;br /&gt;to use 16 entries, if all entries map to the same set in the 16 way outer&lt;br /&gt;cache.  [[Cache inclusion]] prevents some of the worst [[glass jaws]]&lt;br /&gt;that these caches can fix, from being fixed.&lt;br /&gt;&lt;br /&gt;Now, these interactions are not necessarily large, at least in average&lt;br /&gt;effect, although they can be a fertile source of [[glass jaws]].&lt;br /&gt;&lt;br /&gt;However, they do couple the design of the outer cache and the inner&lt;br /&gt;cache.  
They mean, for example, that the outer cache design team&lt;br /&gt;cannot simply change associativity without checking with the inner&lt;br /&gt;cache design team, which may be making assumptions based on the&lt;br /&gt;earlier outer cache design team's [[POR]] for associativity.  It means&lt;br /&gt;that the outer and inner caches are more tightly coupled.  Which can&lt;br /&gt;lead to more efficient designs.  But which can also make it slower and&lt;br /&gt;harder to innovate.&lt;br /&gt;&lt;br /&gt;== Backwards Invalidation Traffic ==&lt;br /&gt;&lt;br /&gt;There are two main methods to maintain [[cache inclusion]]:&lt;br /&gt;[[backwards invalidation]], and the considerably less practical&lt;br /&gt;[[outer cache victim identification by inner cache]].  I will ignore&lt;br /&gt;the latter.&lt;br /&gt;&lt;br /&gt;[[Backwards invalidation]]s are simple and straightforward.  But they&lt;br /&gt;also consume power, and affect performance on an outer to inner cache&lt;br /&gt;interconnect that may be saturated.  And, in any case, they are an&lt;br /&gt;incremental source of complexity.&lt;br /&gt;&lt;br /&gt;[[NI/NE]] cache hierarchies eliminate [[backwards invalidation]]s.&lt;br /&gt;&lt;br /&gt;Now, in the 2011 Intel processors that motivated this article,&lt;br /&gt;Sandybridge, with an inclusive snoop filtering [[LLC]] but a [[NI/NE]]&lt;br /&gt;[[MLC]], backwards invalidations are still required for LLC&lt;br /&gt;evictions.  But not for MLC evictions.&lt;br /&gt;&lt;br /&gt;Let's talk through some scenarios:  &lt;br /&gt;&lt;br /&gt;In workloads that use mainly read-only data, [[NI/NE]] allows [[silent eviction of clean data]], but [[cache inclusion]] requires backwards invalidations (BIs).&lt;br /&gt;&lt;br /&gt;In workloads that use a lot of shared data, there may be multiple BIs for every inclusive cache eviction.&lt;br /&gt;&lt;br /&gt;Etc. 
etc.&lt;br /&gt;&lt;br /&gt;It is by no means certain that the power saved by [[probe filtering]] is not outweighed by the power saved by eliminating BIs.&lt;br /&gt;It all depends on the type of multiprocessing in the system.&lt;br /&gt;If the coherency traffic from beyond the outer cache is small relative to the cache thrashing inside the system, &lt;br /&gt;e.g. if you have 16 cores crunching on large data sets, but no other multicore CPU chips and relatively little I/O,&lt;br /&gt;BI traffic may outweigh snoop filtering.&lt;br /&gt;But if you are building a very large shared memory system with many other CPUs and much cache coherent I/O traffic,&lt;br /&gt;then inclusion may win.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;= Are [[NI/NE]] caches still important? =&lt;br /&gt;&lt;br /&gt;Obviously, since [[NI/NE]] caches are still in use in the Intel&lt;br /&gt;Sandybridge processors being released at the time of writing in 2011,&lt;br /&gt;they are still important in a practical sense.  But is this just&lt;br /&gt;[[legacy computer architecture]] dating back to the original P6?  If&lt;br /&gt;they were designed from scratch, would [[NI/NE]] still be built?&lt;br /&gt;&lt;br /&gt;Maybe... lacking access to full simulation results, it is hard to say.&lt;br /&gt;&lt;br /&gt;However, many of the advantages of [[NI/NE]] relate to making design&lt;br /&gt;easier, decoupling inner and outer cache design.  This is probably one&lt;br /&gt;source of the longevity of the [[P6 microarchitecture]], which is now&lt;br /&gt;coming to an end with Sandybridge.&lt;br /&gt;&lt;br /&gt;Certainly, the old advantage of [[eliminating unnecessary RFOs]] no&lt;br /&gt;longer stands.  But that was always one of the most minor advantages of&lt;br /&gt;[[NI/NE]].&lt;br /&gt;&lt;br /&gt;Much depends on the system.  [[Probe filtering]] [[coherency traffic]]&lt;br /&gt;makes a big difference in large multiprocessor systems with many outer&lt;br /&gt;caches.  
It also helps with large amounts of high bandwidth cache&lt;br /&gt;coherent I/O, as sometimes arises with ill-tuned [[discrete GPU]]s.&lt;br /&gt;(Most discrete GPUs are tuned to use [[non-cache coherent I/O]], if&lt;br /&gt;accessing [[UMA]] or processor memory at all.)&lt;br /&gt;&lt;br /&gt;But as multicore systems get more cores,&lt;br /&gt;and GPUs are largely integrated,&lt;br /&gt;there may be less bandwidth on the other side of the [[outermost cache]].&lt;br /&gt;&lt;br /&gt;== Where does cache inclusion win? ==&lt;br /&gt;&lt;br /&gt;=== Snoop or probe filtering ===&lt;br /&gt;&lt;br /&gt;One big win for cache inclusion is [[snoop filtering]].&lt;br /&gt;&lt;br /&gt;However, such [[probe filtering]] does not necessarily require that&lt;br /&gt;the cache data be inclusive.  A data-less [[probe filter]] or&lt;br /&gt;[[ownership cache]], a structure that allows ownership to be tracked,&lt;br /&gt;and which permits lines to be exclusively owned even though in main&lt;br /&gt;memory, may provide the benefits of [[probe filtering]] without the&lt;br /&gt;costs of cache inclusion on a relatively small data cache.  At the&lt;br /&gt;cost of a new [[hardware data structure]], the [[ownership cache]].&lt;br /&gt;&lt;br /&gt;Basically, the ownership cache maintains the cache inclusion property,&lt;br /&gt;not necessarily the data cache.  Since the ownership cache maintains&lt;br /&gt;only a few state bits per cache line, the ownership cache can cover&lt;br /&gt;much more memory than an ordinary data cache.  
It is therefore less&lt;br /&gt;liable to the issues of cache associativity.&lt;br /&gt;&lt;br /&gt;=== Power management ===&lt;br /&gt;&lt;br /&gt;Probably the most important win for [[cache inclusion]] is with respect to power&lt;br /&gt;management - really just an elaboration of [[snoop filtering]].&lt;br /&gt;&lt;br /&gt;If you have [[cache inclusion]] you can power down all of the inner&lt;br /&gt;caches, putting them into a low power mode that maintains state, but&lt;br /&gt;which does not snoop.&lt;br /&gt;&lt;br /&gt;Whereas if you have a [[NI/NE]] cache, you either must keep snooping&lt;br /&gt;the inner caches, or you must flush the inner caches when going into&lt;br /&gt;such a [[no-snoop cache mode]].&lt;br /&gt;&lt;br /&gt;Now, you always need to be able to do such cache flushes.  But they&lt;br /&gt;take time, and if they are not required you can enter the low power&lt;br /&gt;mode more quickly, and more often, and thereby save more power.&lt;br /&gt;&lt;br /&gt;Note that an inclusive LLC in combination with a NI/NE [[MLC]] and [[FLCs]]&lt;br /&gt;gives the best of both worlds in this regard.&lt;br /&gt;Note also that a dataless [[snoop filter]] also works.&lt;br /&gt;&lt;br /&gt;= Conclusion =&lt;br /&gt;&lt;br /&gt;Many people, taught in school about [[inclusive caches]], are&lt;br /&gt;surprised to learn how common and important [[exclusive caches]] and&lt;br /&gt;[[non-inclusive/non-exclusive caches]] are, or at least have been, in&lt;br /&gt;history.&lt;br /&gt;&lt;br /&gt;It is by no means clear that any approach is always best.  
All 3&lt;br /&gt;approaches have pros and cons, and win in some situations.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-2489365523708955684?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://semipublic.comp-arch.net/wiki/Why_non-inclusive/non-exclusive_caches' title='Why non-inclusive/non-exclusive caches?'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/2489365523708955684/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=2489365523708955684' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/2489365523708955684'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/2489365523708955684'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/02/why-non-inclusivenon-exclusive-caches.html' title='Why non-inclusive/non-exclusive caches?'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-4678022993508129134</id><published>2011-02-27T10:31:00.001-08:00</published><updated>2011-02-27T10:31:48.678-08:00</updated><title type='text'>Cache replacement - tweaks and variations</title><content type='html'>http://semipublic.comp-arch.net/wiki/Cache_replacement&lt;br /&gt;{{Terminology Term}}&lt;br /&gt;[[Category:Cache]]&lt;br /&gt;&lt;br /&gt;[[Cache 
replacement]] or [[cache victim selection]] is the method one uses to choose which cache line will be overwritten or replaced, i.e. victimized, when a new cache line is pulled in.&lt;br /&gt;&lt;br /&gt;Common algorithms include&lt;br /&gt;* [[Random replacement]]&lt;br /&gt;* [[LRU (least recently used)]]&lt;br /&gt;** approximations to [[LRU]]:&lt;br /&gt;*** [[pseudo-LRU]] or [[tree LRU]]&lt;br /&gt;*** [[the clock algorithm]]&lt;br /&gt;&lt;br /&gt;= Minor Tweaks =&lt;br /&gt;&lt;br /&gt;== When is the cache [[LRU information]] or [[cache usage information]] updated? ==&lt;br /&gt;&lt;br /&gt;E.g. do you update it speculatively, for instructions that may be on a [[branch misprediction wrong path]], thereby corrupting the LRU?&lt;br /&gt;&lt;br /&gt;Should you update at retirement?  (Probably not, but just in case.)&lt;br /&gt;:By the way, here is a reason to update at retirement: so that you get more [[deterministic behavior]], which eases validation.  However, this only works if you do not do [[speculative cache misses]].&lt;br /&gt;&lt;br /&gt;Perhaps you should only update it for slightly speculative instructions:&lt;br /&gt;* e.g. do not update for highly speculative [[SpMT]] threads?&lt;br /&gt;* e.g. update only for instructions that have not retired, but for which all earlier branches have been resolved.  (Such instructions are still speculative - there may be a page fault or exception or interrupt - but they are only [[slightly speculative]]).&lt;br /&gt;&lt;br /&gt;== What requests update the LRU information? 
==&lt;br /&gt;&lt;br /&gt;Should all requests, reads and writes, update the LRU information equally?&lt;br /&gt;&lt;br /&gt;Many have proposed [[non-temporal hint bits]] in instructions, to say "ignore this access".&lt;br /&gt;&lt;br /&gt;It is an open issue whether prefetches, whether initiated by [[prefetch instruction]]s or by [[hardware prefetchers]],&lt;br /&gt;should update the LRU.&lt;br /&gt;&lt;br /&gt;Multilevel caches:&lt;br /&gt;* update the LRU on all accesses&lt;br /&gt;* on misses&lt;br /&gt;* on [[LRU leak-through]] from the inner cache&lt;br /&gt;* only on capacity evictions from the inner cache (suggested in comp.arch by EricP on 2/25/2011)&lt;br /&gt;&lt;br /&gt;By the way, the idea of updating the cache usage information only on capacity evictions from the inner cache&lt;br /&gt;exposes several issues:&lt;br /&gt;&lt;br /&gt;&lt;UL&gt;Updating the LRU information and advancing the LRU pointer are two different issues.  Invalidation traffic may result in several coherency cache misses between capacity misses.  It would be bad to keep choosing the same victim.  Oftentimes coherency misses do not need a victim chosen: they fill into the empty or stale or non-present line left behind by the invalidation.  (May not happen in all systems.)  Dirty writebacks naturally notify the outer cache of a capacity replacement.  However, replacing a clean line may not naturally require such notification: i.e. we may have [[silent replacement of clean cache lines]].  This may require [[LRU leakthrough]] so that the outer cache can track. &lt;/UL&gt;&lt;br /&gt;== When is the victim chosen? ==&lt;br /&gt;&lt;br /&gt;Should you choose the victim at the time the cache is missed,&lt;br /&gt;or at the time the data to be placed in the cache line has returned?&lt;br /&gt;&lt;br /&gt;It may not matter on an in-order machine.  
However, on an out-of-order machine, choosing the victim early may raise issues, such as what should happen if too many cache misses to the same set occur &lt;br /&gt;- i.e. if the victim itself needs to be victimized before the first victim's replacement has arrived.&lt;br /&gt;&lt;br /&gt;One advantage of choosing the victim early is that you may be able to send the returning data for the [[cache line fill]]&lt;br /&gt;directly to where it belongs in the cache - you may not have to stage it through a [[fill buffer]],&lt;br /&gt;and you may be able to avoid fairly expensive [[fill buffer forwarding]] logic.&lt;br /&gt;I.e. you may be able to have [[data-less fill buffers]].&lt;br /&gt;&lt;br /&gt;&lt;UL&gt;Half baked Idea:  choose the victim early.  But also allocate a fill buffer.  At [[fill time]], determine if the early victim choice is still accurate.  If not, write to the fill buffer.  (I call this a half baked idea because it doesn't really solve the problem of wanting to avoid data fill buffers.)  Elaboration: choose the victim early.  
Allocate an [[address and control fill buffer]], but do not allocate a [[data fill buffer]].  If the victim is thrashed, allocate a [[data fill buffer]] (which may simply be a [[spill buffer]] allocated circularly).  At fill time, choose.&lt;/UL&gt;&lt;br /&gt;== Biasing the victim choice ==&lt;br /&gt;&lt;br /&gt;* Prefer invalid lines, rather than replacing valid lines.&lt;br /&gt;* Prefer clean lines to dirty lines &lt;br /&gt;:: Thereby avoiding [[dirty writeback]] traffic.&lt;br /&gt;&lt;br /&gt;* in multilevel caches, prefer lines that are in none of, or the fewest, inner caches&lt;br /&gt;&lt;br /&gt;* prefer to replace lines containing data rather than lines containing instructions&lt;br /&gt;** or vice versa - although oftentimes [[I$]] misses are more expensive than [[D$]] misses&lt;br /&gt;&lt;br /&gt;* extend the above to more data types - integer versus FP (FP is often accessed in a cache thrashing manner)&lt;br /&gt;&lt;br /&gt;= Better cache replacement algorithms =&lt;br /&gt;&lt;br /&gt;There has been much work on trying to adjust these algorithms.&lt;br /&gt;&lt;br /&gt;Ideally, we know that [[Belady's algorithm]] is optimal under many assumptions.&lt;br /&gt;It amounts to "replacing the cache line that will be used furthest in the future".&lt;br /&gt;It must be adjusted when there are non-uniform costs.&lt;br /&gt;&lt;br /&gt;&lt;UL&gt;There have been attempts to [[approximate Belady cache replacement using speculation and lookahead]].  E.g. 
do not replace a line that is used in an instruction window.  This works better when you have a really large instruction window, such as might be provided by [[SpMT]].  It also may not be necessary if the [[LRU bits]] or [[cache usage information]] is updated speculatively.  Others have attempted to unroll memory access pattern predictors, to get a list of predicted accesses, against which a Belady query can be made.&lt;/UL&gt;&lt;br /&gt;See [[Victim Choice for Multilevel and Shared Caches]] for a discussion of issues in multilevel cache victim selection.&lt;br /&gt;One of the main issues is that in a [[multilevel cache hierarchy]] the LRU bits of the outer caches are not adjusted&lt;br /&gt;by accesses to the inner caches, so choosing a victim based purely on LRU bits updated only by accesses sent to the outer cache&lt;br /&gt;is often not good.&lt;br /&gt;Many proposals [[leak-through LRU]] information, to allow the outer cache to track the inner LRU.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;It is well known that certain access patterns are not well suited to LRU cache replacement.&lt;br /&gt;For example, circularly accessing N lines, in a cache of M lines, M &lt; N, is better served by MRU cache replacement than LRU.&lt;br /&gt;Many have proposed to exploit this, e.g. 
by [[non-temporal]] hint bits attached to instructions,&lt;br /&gt;or by predictors that attempt to identify such non-temporal cache access patterns.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-4678022993508129134?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://semipublic.comp-arch.net/wiki/Cache_replacement' title='Cache replacement - tweaks and variations'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/4678022993508129134/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=4678022993508129134' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/4678022993508129134'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/4678022993508129134'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/02/cache-replacement-tweaks-and-variations.html' title='Cache replacement - tweaks and variations'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-58561746949793521</id><published>2011-02-27T08:26:00.000-08:00</published><updated>2011-02-27T08:26:11.674-08:00</updated><title type='text'>Reading List for Computer  Architecture (seed)</title><content type='html'>http://semipublic.comp-arch.net/wiki/Reading_List_for_Computer_Architecture&lt;br /&gt;&lt;br 
/&gt;TBD: &lt;br /&gt;&lt;br /&gt;I am often asked for a reading list for computer architecture.&lt;br /&gt;&lt;br /&gt;Unfortunately, there aren't many good books in computer architecture.&lt;br /&gt;&lt;br /&gt;= [[Hennessy and Patterson]] =&lt;br /&gt;&lt;br /&gt;[[Hennessy and Patterson]] has been the must-read ever since its first publication,&lt;br /&gt;but it is not really a comprehensive book or survey of the field &lt;br /&gt;so much as a textbook suitable for classes, espousing a quantitative design philosophy.&lt;br /&gt;&lt;br /&gt;In fact, one of the reasons I am trying to create the comp.arch wiki is as an antidote to Hennessy and Patterson.&lt;br /&gt;I want to collect a survey or grab bag of many techniques not mentioned in [[H&amp;P]],&lt;br /&gt;or potentially deprecated by them.&lt;br /&gt;Many students raised on H&amp;P, particularly the early versions, developed a common attitude&lt;br /&gt;- they were [[RISC bigot]]s, and deprecated any advanced microarchitecture such as [[OOO]],&lt;br /&gt;since they felt sure that the [[simple 5-stage pipeline]] was the be-all and end-all.&lt;br /&gt;It is not fair to ascribe these viewpoints to Professors Hennessy and Patterson&lt;br /&gt;- their views are much more nuanced&lt;br /&gt;- but nevertheless this viewpoint was common.&lt;br /&gt;&lt;br /&gt;= Random Grab-bag =&lt;br /&gt;&lt;br /&gt;This list was assembled from a recent email to an undergraduate (2011).&lt;br /&gt;TBD: collect past recommended reading lists.&lt;br /&gt;&lt;br /&gt;Hennessy and Patterson is required reading.  
However, I don't much like any of the textbooks - which is why I am working on my wiki, sort of as an antidote.&lt;br /&gt;&lt;br /&gt;You pretty much have to read the conferences and journals:&lt;br /&gt;* [[ISCA]]&lt;br /&gt;* [[ASPLOS]]&lt;br /&gt;* [[Micro]]&lt;br /&gt;* [[HPCA]]&lt;br /&gt;* [[Hotchips]]&lt;br /&gt;* [[ISSCC]] - usually lower level than computer architecture, but oftentimes the only information you can get about industry chips&lt;br /&gt;&lt;br /&gt;I suggest reading every issue of [[Microprocessor Report]] you can, although I warn you that there was a period, when MPR was owned by EE Times, when it was not very good.  Back in the 80s and 90s, MPR was the bible, by installments; and since The Linley Group bought it back it has become much better than it was when EE Times owned it.&lt;br /&gt;&lt;br /&gt;* the old [[Microprocessor Forum]] conference&lt;br /&gt;&lt;br /&gt;As for reading the conferences and journals, a good place to start would be the "Best of" issues:&lt;br /&gt;&lt;br /&gt;* 25 Years of ISCA, Selected Papers, ed Guri Sohi&lt;br /&gt;&lt;br /&gt;* 20 years of ACM SIGPLAN conference on PLDI, 1979 - 1999, ed McKinley.&lt;br /&gt;&lt;br /&gt;Other books on my shelf:&lt;br /&gt;&lt;br /&gt;* Mead Conway VLSI - old, but classic.  Read it.&lt;br /&gt;&lt;br /&gt;* Betty Prince, Semiconductor Memories - old; I wish there was a better book on DRAM.  Probably better to read papers.&lt;br /&gt;&lt;br /&gt;* Bell and Newell, &lt;br /&gt;* Siewiorek, Bell, and Newell,&lt;br /&gt;:: Computer Structures&lt;br /&gt;::- this was the classic when I was in school. Very old now.&lt;br /&gt;&lt;br /&gt;* Mike Flynn, Computer Architecture&lt;br /&gt;* Blaauw and Brooks, Computer Architecture&lt;br /&gt;&lt;br /&gt;* Pfister - In search of clusters.&lt;br /&gt;&lt;br /&gt;Many many instruction set manuals&lt;br /&gt;&lt;br /&gt;esp. 
&lt;br /&gt;* the IBM 360 Principles of Operation&lt;br /&gt;&lt;br /&gt;* Hacker's Delight&lt;br /&gt;&lt;br /&gt;* Inside the AS/400&lt;br /&gt;&lt;br /&gt;Organick's books on computer hardware&lt;br /&gt;* TBD: list&lt;br /&gt;&lt;br /&gt;* The Soul of a New Machine&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Use a library and used book store.   I bought, and still buy, a lot of 2nd hand books.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;* read manuals on http://bitsavers.org&lt;br /&gt;&lt;br /&gt;= Best of Lists and Collections =&lt;br /&gt;&lt;br /&gt;* papers that won the ACM SIGARCH and IEEE-CS TCCA ISCA Influential Paper Award, http://www.sigarch.org/influential_paper.html&lt;br /&gt;&lt;br /&gt;= Bibliography Surfing =&lt;br /&gt;&lt;br /&gt;Many students, especially graduate students, assemble personal bibliographies and reference lists of papers they encountered in their research.&lt;br /&gt;The bravest students may annotate the references, with discussions of value.&lt;br /&gt;Many such lists can be found on the web.&lt;br /&gt;Use them shamelessly for inspiration - but beware that your opinions may not agree with the bibliographer's.&lt;br /&gt;&lt;br /&gt;Unfortunately, more experienced professionals, professors, etc.&lt;br /&gt;have usually learned to only provide positive recommendations,&lt;br /&gt;and to never make public anti-recommendations &lt;br /&gt;about authors who may eventually be recommending funding for the reviewer.&lt;br /&gt;&lt;br /&gt;Still more experienced professionals may recommend papers that they dislike,&lt;br /&gt;because not mentioning them would be an obvious dis.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-58561746949793521?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://semipublic.comp-arch.net/wiki/Reading_List_for_Computer_Architecture' title='Reading List for Computer  Architecture 
(seed)'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/58561746949793521/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=58561746949793521' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/58561746949793521'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/58561746949793521'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/02/reading-list-for-computer-architecture.html' title='Reading List for Computer  Architecture (seed)'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-4496684722084119199</id><published>2011-02-21T14:04:00.000-08:00</published><updated>2011-02-21T14:04:42.928-08:00</updated><title type='text'>Pattern History Invariant Branch Prediction</title><content type='html'>&lt;UL&gt;Taking a vacation day - actually, a so-called "floating holiday" - today, USA Presidents' Day - so that I could be with my daughter, who is off school, and so that I could pick up my wife at the airport.&lt;/UL&gt;&lt;br /&gt;&lt;br /&gt;https://semipublic.comp-arch.net/wiki/Pattern_History_Invariant_Branch_Prediction&lt;br /&gt;{{Terminology Term}}&lt;br /&gt;[[Category:Branch Prediction]]&lt;br /&gt;&lt;br /&gt;= Terminology =&lt;br /&gt;&lt;br /&gt;Many branch predictors have a property that I call [[Pattern History Invariant Branch Prediction]].&lt;br /&gt;By this I mean 
that they cannot change their prediction for a branch, e.g. from taken to not-taken, or vice versa, without having an intervening misprediction.&lt;br /&gt;&lt;br /&gt;&lt;UL&gt;Okay, I admit: "Pattern History Invariant" is a bad name.  It is pompous, and not very intuitive.  I welcome proposals for a different name.  Something like "Branch predictor that can't change its mind without a misprediction"?  &lt;UL&gt;== Bad Joke == Oooh, how about "Can't change its mind without a fight"?  I'm tempted to call it a Scottish, Irish, or a "My Wife" branch predictor - but I'll get in trouble for any and all of these, and only mention them because I am humor impaired.&lt;/UL&gt;&lt;/UL&gt;&lt;br /&gt;= Why does this matter? =&lt;br /&gt;&lt;br /&gt;== [[Timing Invariant Branch Prediction]] ==&lt;br /&gt;&lt;br /&gt;Pattern history invariance can be a useful property, because it makes it easier to build&lt;br /&gt;[[Timing Invariant Branch Prediction]].&lt;br /&gt;Which has several good properties, such as making val&lt;br /&gt;&lt;br /&gt;In particular, it means that delaying updating the pattern tables used to make predictions until retirement will NOT result in any more mispredictions than would otherwise occur&lt;br /&gt;- any such mispredictions would already be on the wrong path, younger than a misprediction that would occur no matter what.&lt;br /&gt;&lt;br /&gt;Delaying the pattern table update from, e.g. 
branch execution to retirement has the possible benefit that [[wrong path]] execution will not corrupt the branch predictor.&lt;br /&gt;&lt;br /&gt;However, delaying the update may cost performance, because speculatively updating a branch pattern history table on the multiply speculative wrong path can bring benefits of its own.&lt;br /&gt;But to get this, you must have several nested effects:&lt;br /&gt;* there must be a first branch misprediction pending&lt;br /&gt;* a younger branch may update the predictor tables&lt;br /&gt;* this may result in a misprediction underneath the first branch misprediction&lt;br /&gt;* which may itself update the predictor tables&lt;br /&gt;** possibly eliminating a misprediction when execution resumes after the first branch misprediction is repaired&lt;br /&gt;** possibly producing instruction prefetch and other [[good wrong path effects]]&lt;br /&gt;&lt;br /&gt;I.e. with [[Pattern History Invariant Branch Prediction]] there must be a misprediction to result in a change of prediction.&lt;br /&gt;However, that misprediction may itself occur, and be repaired, speculatively.&lt;br /&gt;And that misprediction may have other [[good wrong path effects]].&lt;br /&gt;&lt;br /&gt;== [[BTB Unrolling]] ==&lt;br /&gt;&lt;br /&gt;[[Pattern History Invariant Branch Prediction]]&lt;br /&gt;also enables [[BTB Unrolling]]:&lt;br /&gt;given a current (IP,history) tuple&lt;br /&gt;and a separate pattern table used to make a prediction,&lt;br /&gt;one can "unroll" the predictor to produce a trace&lt;br /&gt;of multiple, N, branch (from,to) addresses.&lt;br /&gt;And such a trace unrolling is guaranteed to be as accurate as if the predictor were accessed one branch at a time.&lt;br /&gt;&lt;br /&gt;= Examples of Pattern History Invariance =&lt;br /&gt;&lt;br /&gt;= Examples of non-Pattern History Invariance =&lt;br /&gt;&lt;br /&gt;I was tempted to say&lt;br /&gt;&lt;UL&gt;The most powerful example of 
a non-pattern history invariant branch predictor is a [[loop predictor]].  E.g. it may predict that the loop closing branch is taken 99 times in a row, and then not-taken the 100th time.  Even more advanced loop predictors, such as predicting that if the last few loops were executed 67, 66, 65 times, then the next will be executed 64 times, are even less pattern history invariant.&lt;/UL&gt;But this is not true: such a loop predictor will always make the same prediction, if the current loop count&lt;br /&gt;is considered to be part of the history that is used, along with the branch in question, to index &lt;br /&gt;the pattern tables.&lt;br /&gt;&lt;br /&gt;It is necessary to work harder to contrive branch predictors that are not pattern history invariant.&lt;br /&gt;&lt;br /&gt;TBD: I believe that certain McFarling-style branch predictors may break the pattern history invariance property.&lt;br /&gt;TBD: get the details right.&lt;br /&gt;&lt;br /&gt;In particular, in a multi-predictor system such as McFarling,&lt;br /&gt;if you update the predictors and choosers that are NOT being used as well as the predictors and choosers that are being used,&lt;br /&gt;you can "mask" a misprediction in a second predictor,&lt;br /&gt;if a first predictor was being used.&lt;br /&gt;If the choice of which predictor should be used can then change in the pattern table so that&lt;br /&gt;when the same (IP,history) pair is used at a later time the second predictor wins...&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;= See Also =&lt;br /&gt;&lt;br /&gt;* [[Branch predictor state update]]&lt;br /&gt;* [[Timing Invariant Branch Prediction]]&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-4496684722084119199?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='https://semipublic.comp-arch.net/wiki/Pattern_History_Invariant_Branch_Prediction' title='Pattern History 
Invariant Branch Prediction'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/4496684722084119199/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=4496684722084119199' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/4496684722084119199'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/4496684722084119199'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/02/pattern-history-invariant-branch.html' title='Pattern History Invariant Branch Prediction'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-6990099515689769808</id><published>2011-02-21T07:39:00.000-08:00</published><updated>2011-02-21T07:39:11.963-08:00</updated><title type='text'>Timing Invariant Branch Prediction</title><content type='html'>http://semipublic.comp-arch.net/wiki/Timing_Invariant_Branch_Prediction&lt;br /&gt;{{Terminology Term}}&lt;br /&gt;[[Category:Branch Prediction]]&lt;br /&gt;&lt;br /&gt;By [[Timing Invariant Branch Prediction]] I mean branch prediction that is independent of timing.&lt;br /&gt;&lt;br /&gt;E.g. a branch predictor design that, if you slowed down the clock or changed the pipeline, e.g. 
by inserting idle pipestages,&lt;br /&gt;would not vary.&lt;br /&gt;&lt;br /&gt;Such invariance is very convenient&lt;br /&gt;(1) for validation&lt;br /&gt;(2) for separation of concerns - it allows you to change the pipeline without worrying about its effect on branch prediction accuracy, etc.&lt;br /&gt;&lt;br /&gt;However, many microarchitectures do not adhere to this design principle.&lt;br /&gt;&lt;br /&gt;Tweaks and adjustments to the branch prediction microarchitecture may be  necessary to attain [[Timing Invariant Branch Prediction]].&lt;br /&gt;For example&lt;br /&gt;* Per-branch history often leads to timing dependent branch prediction, which can be remedied by [[path history]]&lt;br /&gt;* Updating the branch history tables can lead to timing dependent branch prediction&lt;br /&gt;&lt;br /&gt;= Per-branch history often leads to timing dependent branch prediction =&lt;br /&gt;&lt;br /&gt;For example, if you are using a [[per-branch history]] such as a [[TNT branch history]],&lt;br /&gt;in many pipelines there are several clocks between instruction fetch,&lt;br /&gt;i.e. 
delivering an instruction pointer to the instruction cache and branch predictor,&lt;br /&gt;and instruction decode.&lt;br /&gt;This means that at instruction fetch/branch prediction time you do not know where the branch instructions are;&lt;br /&gt;you only know where the branch instructions are at decode time.&lt;br /&gt;&lt;br /&gt;Therefore, unless you stall,&lt;br /&gt;branch prediction may be using a [[per-branch history]] that is several cycles out of date,&lt;br /&gt;and which may be missing several branches&lt;br /&gt;compared to what it would ideally be using if branches could be identified instantaneously.&lt;br /&gt;Moreover, how many branches are missing may depend on timing, such as [[I$]] misses, pipeline stalls, etc.&lt;br /&gt;&lt;br /&gt;== Path history enables timing independent branch prediction ==&lt;br /&gt;&lt;br /&gt;If it is a problem to have timing dependent branch prediction &lt;br /&gt;caused by per-branch history, this can be assuaged by [[path history]] branch prediction.&lt;br /&gt;&lt;br /&gt;Instruction fetch does not necessarily know where the branches are.  However, it necessarily does know the sequence of instruction fetch addresses.&lt;br /&gt;If it is possible to create a path history suitable for use in a branch predictor,&lt;br /&gt;e.g. 
by XORing the instruction fetch pointers,&lt;br /&gt;then this [[path history]] is accurate and timing invariant.&lt;br /&gt;Since XOR hashing is fast, this probably can be achieved.&lt;br /&gt;&lt;br /&gt;However, XORing all fetch IPs may not be the best [[path history]] to use in branch prediction.&lt;br /&gt;Creating a hash, probably XOR based, of [[branch from/to addresses]],&lt;br /&gt;suffices to describe the path - although it loses information about branches between instruction fetch blocks.&lt;br /&gt;Hashing such a [[branch from/to addresses]] [[path history]] with the current instruction fetch pointer&lt;br /&gt;is about as good as you can do at instruction fetch,&lt;br /&gt;without identifying individual instructions.&lt;br /&gt;&lt;br /&gt;== Combining ... ==&lt;br /&gt;&lt;br /&gt;[[Path history]] based branch predictors are usually reported as being more accurate than [[per-branch history TNT]],&lt;br /&gt;so the pipeline adjustments above may help performance as well as providing [[Timing Invariant Branch Prediction]].&lt;br /&gt;&lt;br /&gt;However, if they do not, you can obtain a hybrid that provides many of the benefits of [[timing invariant branch prediction]]&lt;br /&gt;along with the possible improved accuracy of [[per-branch history]]:&lt;br /&gt;* use [[path history]] at instruction fetch&lt;br /&gt;* use [[per-branch history]] at the decoder, in a form of [[late-pipestage branch prediction]]&lt;br /&gt;&lt;br /&gt;This gives you timing invariance,&lt;br /&gt;but it also gives the [[decoder branch predictor]] the chance to make corrections to the earlier branch prediction.&lt;br /&gt;&lt;br /&gt;= Branch History Update Time =&lt;br /&gt;&lt;br /&gt;Q: when should the prediction tables, the [[pattern history table (PHT)]], also sometimes called the [[branch histogram]] or [[branch history table (BHT)]],&lt;br /&gt;be updated?&lt;br /&gt;&lt;br /&gt;At execution time, or at retirement time.&lt;br /&gt;&lt;br /&gt;Updating at retirement time 
enables [[Timing Invariant Branch Prediction]].&lt;br /&gt;&lt;br /&gt;Updating at execution time may cause only minor issues on an in-order machine.&lt;br /&gt;On an out-of-order machine, however, updates may be done out of order.&lt;br /&gt;In either case they may be done speculatively,&lt;br /&gt;for branches that will not actually be retired because of earlier misspeculations.&lt;br /&gt;&lt;br /&gt;Furthermore there arises the question of what history or [[stew]] is used to update the pattern history table.&lt;br /&gt;If every branch at execution carries its history or stew with it, no problem.&lt;br /&gt;But if a big complicated history is maintained only at retirement,&lt;br /&gt;some processor designs have updated the pattern table for branches at execution&lt;br /&gt;with a history corresponding to a position in the instruction stream several cycles before the branch.&lt;br /&gt;Not necessarily a consistent number of cycles, either.&lt;br /&gt;&lt;br /&gt;Updating the prediction tables at retirement time seems to avoid these issues.&lt;br /&gt;&lt;br /&gt;TBD: performance cost&lt;br /&gt;&lt;br /&gt;TBD: latencies of table update - immaterial if [[pattern history invariant branch prediction]].&lt;br /&gt;&lt;br /&gt;= Conclusion =&lt;br /&gt;&lt;br /&gt;Is [[Timing Invariant Branch Prediction]] an absolutely vital design goal?&lt;br /&gt;&lt;br /&gt;Not necessarily - if performance is increased by timing variant branch prediction, so be it.&lt;br /&gt;&lt;br /&gt;However, [[Timing Invariant Branch Prediction]] is definitely a nice thing to have:&lt;br /&gt;* it makes validation much easier&lt;br /&gt;* it makes the design more robust, less fragile, less likely to break if you have to add a pipestage or stall late in the design cycle.&lt;br /&gt;* it is usually associated with higher performance rather than lower performance branch prediction algorithms&lt;br /&gt;* it can usually be achieved by a hybrid predictor design.&lt;br /&gt;&lt;br /&gt;It is my 
experience that most timing dependent branch predictors&lt;br /&gt;happened by accident, rather than by design:&lt;br /&gt;* naive designers building a [[per-branch history]] out of a textbook&lt;br /&gt;* naive extension of in-order designs to out-of-order exacerbating unnecessary timing dependence&lt;br /&gt;etc.&lt;br /&gt;&lt;br /&gt;[[Timing Invariant Branch Prediction]] is not necessarily a must-have,&lt;br /&gt;but it is always a good thing to keep in mind when designing&lt;br /&gt;your branch predictor and your microarchitecture/pipeline.&lt;br /&gt;It is a pity to lose its benefits due to ignorance rather than deliberation.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-6990099515689769808?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://semipublic.comp-arch.net/wiki/Timing_Invariant_Branch_Prediction' title='Timing Invariant Branch Prediction'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/6990099515689769808/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=6990099515689769808' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/6990099515689769808'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/6990099515689769808'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/02/timing-invariant-branch-prediction.html' title='Timing Invariant Branch Prediction'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' 
src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-8131117133079760485</id><published>2011-02-18T20:55:00.001-08:00</published><updated>2011-02-18T20:55:40.074-08:00</updated><title type='text'>CSE (Common Subexpressions) in HW  andSW</title><content type='html'>http://semipublic.comp-arch.net/wiki/CSE_(common_sub-expression)&lt;br /&gt;&lt;br /&gt;CSE is an optimization that recognizes common sub-expressions, and saves them and reuses them rather than recalculating them.&lt;br /&gt;&lt;br /&gt;E.g.&lt;br /&gt;&lt;br /&gt;   var1 := (a*b) + c&lt;br /&gt;   var2 := (a*b) + d&lt;br /&gt;&lt;br /&gt;can be rewritten as&lt;br /&gt;&lt;br /&gt;   tmp := a*b&lt;br /&gt;   var1 := tmp + c&lt;br /&gt;   var2 := tmp + d&lt;br /&gt;&lt;br /&gt;This is obvious, right?&lt;br /&gt;&lt;br /&gt;It is less obvious when you have, e.g. complex addressing modes such as (base+index*scale+offset).&lt;br /&gt;Should you CSE part of address computation?&lt;br /&gt;&lt;br /&gt;Sometimes it is cheaper to recalculate than it is to store, allocating a register that may spill other registers.&lt;br /&gt;&lt;br /&gt;[[Memoization]] is a form of hardware CSE, where an expensive execution unit recognizes recently performed calculations and avoids reperforming them.&lt;br /&gt;&lt;br /&gt;[[1/x inverse instructions]] provide greater opportunity for CSE than do ordinary [[divide instructions]].&lt;br /&gt;&lt;br /&gt;[[Micro-optimization_primitive_instructions]] provide more opportunity for [[CSE]]ing.&lt;br /&gt;&lt;br /&gt;[[Hardware common-sub-expression elimination]] has been proposed - by me (Andy Glew), in my abandoned Ph.D. on [[instruction refinement]],&lt;br /&gt;if by no one else.  
In fact [[nearly all Dragon-book compiler optimizations can be done in hardware]].&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;[[Instruction reuse]] is a hardware technique that extends CSE dynamically over moderate to large distances.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-8131117133079760485?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://semipublic.comp-arch.net/wiki/CSE_(common_sub-expression)' title='CSE (Common Subexpressions) in HW  andSW'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/8131117133079760485/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=8131117133079760485' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/8131117133079760485'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/8131117133079760485'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/02/cse-common-subexpressions-in-hw-andsw.html' title='CSE (Common Subexpressions) in HW  andSW'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-4019919407459124260</id><published>2011-02-18T20:42:00.001-08:00</published><updated>2011-02-18T20:42:45.972-08:00</updated><title type='text'>Micro-optimization primitive instructions</title><content 
type='html'>http://semipublic.comp-arch.net/wiki/Micro-optimization_primitive_instructions&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;By [[micro-optimization primitive instructions]] I mean instructions that expose the internal steps and partial results of what are otherwise considered to be primitive instructions.&lt;br /&gt;&lt;br /&gt;One of the best academic and published examples is in the Dally paper referenced at the bottom: &lt;br /&gt;[[Micro-optimization of floating-point operations]].&lt;br /&gt;However, other examples have been found in actual instruction sets, particularly before transistor counts increased to the point where fully pipelined floating point [[FADD]] and [[FMUL]] instructions were common.&lt;br /&gt;&lt;br /&gt;= Examples of Floating-Point Micro-Optimization Instructions (from Dally paper) =&lt;br /&gt;&lt;br /&gt;Examples from Dally paper:  (transcribed not quite exactly)&lt;br /&gt;&lt;br /&gt;;Automatic Block Exponent&lt;br /&gt;:Identifying the largest exponent in cascaded additions, aligning all mantissae and adding without normalization.&lt;br /&gt;* Saves: power of intermediate normalizations and roundings, unnecessary shifts&lt;br /&gt;* Cost: Precision, unless extra mantissa width. See also [[block floating-point]], [[superaccumulator]].&lt;br /&gt;&lt;br /&gt;;Shift Combining&lt;br /&gt;:An alternative to automatic block exponent used in cases where order of operations must be preserved: avoids repeatedly shifting left for normalization, and then shifting right for alignment.&lt;br /&gt;:I.e. combines normalization and alignment shifts.&lt;br /&gt;* Saves: power in redundant shifts.&lt;br /&gt;* Cost: precision, sometimes&lt;br /&gt;&lt;br /&gt;;Post Multiply Normalization&lt;br /&gt;:Maintain extra guard bits to the right of mantissa, to avoid normalization shift, since multiplication of normalized numbers can denormalize by at most one bit position. 
(Denorm*norm can produce greater shifts.)&lt;br /&gt;&lt;br /&gt;;Conventional Optimizations&lt;br /&gt;:E.g.&lt;br /&gt; (A + B) * (A - B)&lt;br /&gt;can [[CSE (common sub-expression)]] the alignment of A and B.&lt;br /&gt;* Saves: power&lt;br /&gt;* Cost: the exposed micro-optimization, possibly requiring a few extra bits of mantissa and exponent.&lt;br /&gt;&lt;br /&gt;;Scheduling&lt;br /&gt;:Exposing primitive operations permits more effective optimization by compiler (or, for that matter, by a dynamic or [[OOO]] scheduler).&lt;br /&gt;&lt;br /&gt;Dally suggests a fairly horizontal instruction format for these microoptimizations, consisting of &lt;br /&gt;* Exponent Ops&lt;br /&gt;** E+, E- - exponent add/subtract (EC &lt;- EA op EB)&lt;br /&gt;** FF1 - returns shift required to normalize mantissa MA (EC &lt;- FF1 MA)&lt;br /&gt;** LDE, STE - load or store exponent as integer&lt;br /&gt;* Mantissa Ops&lt;br /&gt;** M+, M-, M* - mantissa add, subtract, multiply (MC &lt;- MA op MB)&lt;br /&gt;** SHR, SHL - mantissa shift left or right; supports [[shifting by a negative count]] in either direction&lt;br /&gt;** ABS, NEG - zeroes and complements the mantissa sign bit&lt;br /&gt;** LDM, STM - loads and stores mantissa as integer&lt;br /&gt;** LDF, STF - load and store mantissa and exponent together as standard floating point number&lt;br /&gt;* Branch Ops&lt;br /&gt;** BNEG - [[branch on exponent negative]]&lt;br /&gt;** Bcond - [[branch on exponent and mantissa compare]] (EA,MA) relop (EB,MB)&lt;br /&gt;&lt;br /&gt;= Other Possible Micro-optimizations =&lt;br /&gt;&lt;br /&gt;== Booth Encoding ==&lt;br /&gt;&lt;br /&gt;Many processors use Booth encoding and multiplier arrays, for both integer and floating point.&lt;br /&gt;&lt;br /&gt;Booth encoding involves&lt;br /&gt;(a) preparing multiples of one operand.  
Unfortunately, the actual multiples depend on how advanced the multiplier is - {-1,0,+1}, {-2,-1,0,+1,+2}, 3X, 4X, etc.&lt;br /&gt;(b) using the second operand to select which multiples of the first operand are to be fed into the array for each digit group.&lt;br /&gt;&lt;br /&gt;We can certainly imagine [[CSE]]ing the work in preparing such multiples, for either operand.&lt;br /&gt;Unfortunately, the multiples needed change with the exact multiplier used - one microarchitecture might require producing -1X and 3X multiples (1X and 2X are "free", being just shifts), while another might require 5X, etc.&lt;br /&gt;Therefore, the number of bits needed might change from chip to chip.&lt;br /&gt;&lt;br /&gt;We can imagine storing these large intermediate results in a SIMD packed vector, via an instruction that looks like:&lt;br /&gt;  vreg256b &lt;- booth_encode(64b)&lt;br /&gt;Heck - that might be a useful operation to provide even if not doing microoptimization. It has uses.&lt;br /&gt;The details of the Booth encoding used might be hidden.&lt;br /&gt;&lt;br /&gt;However, perhaps an easier approach might be to do this microoptimization in microarchitecture:&lt;br /&gt;e.g. have a [[dynamic scheduler]] recognize several multiplications that share a common first operand,&lt;br /&gt;and then decide to skip the [[Booth encoding]] pipestage.&lt;br /&gt;The skip might shorten the pipeline - or it might just be used to save power.&lt;br /&gt;&lt;br /&gt;This amounts to [[memoizing]] the Booth encoding.&lt;br /&gt;&lt;br /&gt;== Virtual Memory Translations ==&lt;br /&gt;&lt;br /&gt;There are many, many, repeated translations of nearby virtual addresses that result in the same physical page address.&lt;br /&gt;&lt;br /&gt;; Cache Translation with Base Registers&lt;br /&gt;E.g. Austin and Sohi suggested caching the translations next to the base register, and thereby avoiding TLB lookup, and supporting more translations on a low ported TLB.&lt;UL&gt;&lt;br /&gt;Todd M. Austin and Gurindar S. Sohi. 1996. High-bandwidth address translation for multiple-issue processors. In Proceedings of the 23rd annual international symposium on Computer architecture (ISCA '96). ACM, New York, NY, USA, 158-167. 
DOI=10.1145/232973.232990 http://doi.acm.org/10.1145/232973.232990&lt;br /&gt;&lt;/UL&gt;Equivalently, power can be saved.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;; Instructions to Save Translations&lt;br /&gt;We can imagine that this might be exposed to software as instructions that look like&lt;br /&gt;  treg := [[translate_virtual_address]]( virtual address )&lt;br /&gt;  dreg := load_using_physical_address( treg + offset )&lt;br /&gt;  store_using_physical_address( treg + offset )&lt;br /&gt;&lt;br /&gt;The first instruction saves the translation, and any permissions bits required, in treg.&lt;br /&gt;treg might be an ordinary address sized integer register, or a special register to hold such a translation.&lt;br /&gt;&lt;br /&gt;Note: this instruction, [[translate_virtual_address]], is often requested even outside the context of micro-optimization.&lt;br /&gt;See [[uses of an instruction to save virtual to physical address translation]].&lt;br /&gt;&lt;br /&gt;The load and store operations use a saved translation. 
I would imagine that they would trap if the memory location specified by offset and size fell outside the saved translation.&lt;br /&gt;&lt;br /&gt;NOTE: in a [[virtual machine]] environment, a Guest OS executing this instruction would get at most a partial benefit.&lt;br /&gt;&lt;br /&gt;;User Mode Instructions to Save Translations&lt;br /&gt;Instructions such as [[load_using_physical_address]] can only be used by privileged code - if software can construct arbitrary physical addresses.&lt;br /&gt;&lt;br /&gt;But if the saved translation lives in a special treg, translation register, that can only be written by certain instructions,&lt;br /&gt;then unprivileged code could employ this instruction.&lt;br /&gt;Equivalently, this could be a [[privilege tagged capability register]] - possibly an ordinary register, with an extra bit that &lt;br /&gt;gets set by the translate instruction.&lt;br /&gt;The physical memory accesses would require that bit to be set.&lt;br /&gt;Other operations on such a register might clear the tag bit.&lt;br /&gt;&lt;br /&gt;This is how [[capability registers]] work for other capabilities;&lt;br /&gt;physical addresses are just an extra capability.&lt;br /&gt;&lt;br /&gt;Except...  physical addresses can change, e.g. during [[virtual memory]] operations such as [[page fault]]s and [[COW]].&lt;br /&gt;&lt;br /&gt;(1) we could expose the [[translation registers]] to the OS, which can treat them as an extended TLB, changing them as necessary.&lt;br /&gt;: This is not unlike what old OSes needed to do when [[multitasking using base registers]].&lt;br /&gt;&lt;br /&gt;(2) However, we might then need to re-translate.  This could be accomplished by storing the original virtual address inside the treg [[translation register]].&lt;br /&gt;&lt;br /&gt;I.e. 
we have circled back to Austin and Sohi - except that instead of the physical address being held in a cache, it is more exposed.&lt;br /&gt;&lt;br /&gt;Not completely exposed - it is a [[covert channel security hole]] to expose physical addresses.&lt;br /&gt;But it is an explicitly acknowledged, albeit opaque, data quantity.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;== Division Step, etc. ==&lt;br /&gt;&lt;br /&gt;Any "step" operation, such as divide-step, multiply-step, sqrt-step,&lt;br /&gt;can be considered as a micro-optimization primitive.&lt;br /&gt;&lt;br /&gt;With the concomitant problem that the algorithm may vary between versions of a processor.&lt;br /&gt;&lt;br /&gt;== TBD: other micro-optimizations ==&lt;br /&gt;&lt;br /&gt;TBD&lt;br /&gt;&lt;br /&gt;= References =&lt;br /&gt;W. J. Dally. 1989. [[Micro-optimization of floating-point operations]]. SIGARCH Comput. Archit. News 17, 2 (April 1989), 283-289. DOI=10.1145/68182.68208 http://doi.acm.org/10.1145/68182.68208&lt;br /&gt;&lt;br /&gt;W. J. Dally. 1989. Micro-optimization of floating-point operations. In Proceedings of the third international conference on Architectural support for programming languages and operating systems (ASPLOS-III). ACM, New York, NY, USA, 283-289. 
DOI=10.1145/70082.68208 http://doi.acm.org/10.1145/70082.68208&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-4019919407459124260?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://semipublic.comp-arch.net/wiki/Micro-optimization_primitive_instructions' title='Micro-optimization primitive instructions'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/4019919407459124260/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=4019919407459124260' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/4019919407459124260'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/4019919407459124260'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/02/micro-optimization-primitive.html' title='Micro-optimization primitive instructions'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-7004295341044102133</id><published>2011-02-12T12:11:00.001-08:00</published><updated>2011-02-12T12:11:47.759-08:00</updated><title type='text'>RISC versus CISC</title><content type='html'>http://semipublic.comp-arch.net/wiki/RISC_versus_CISC&lt;br /&gt;&lt;br /&gt;* [[RISC (Reduced  Instruction Set Computer)]]&lt;br /&gt;* [[CISC (Complicated Instruction Set Computer)]]&lt;br 
/&gt;&lt;br /&gt;[[RISC]] and [[CISC]] have been discussed elsewhere, outside this wiki, in great detail.&lt;br /&gt;I will not try to add much to what has already been written.&lt;br /&gt;&lt;br /&gt;However, although I am a RISC sympathizer, I may take a contrarian stance, talking about some of the issues and problems with RISC.&lt;br /&gt;The RISC philosophy was certainly influential.  However, in many ways it failed.&lt;br /&gt;I may try to talk about those failures.&lt;br /&gt;And how the so-called [[RISC Revolution]] warped a generation of computer architects.&lt;br /&gt;&lt;br /&gt;= Conclusion =&lt;br /&gt;&lt;br /&gt;I did not really want to write this article.  There is not much to add to all that has been written about the so-called [[RISC Wars]] except to say&lt;br /&gt;that they caused a lot of thrashing and amounted to less than promised, although they did bring some improvements.&lt;br /&gt;&lt;br /&gt;However, I could not really write a wiki on computer architecture without mentioning [[RISC versus CISC]].&lt;br /&gt;&lt;br /&gt;I would prefer, however, to discuss interesting aspects of RISC computer architecture that may not be discussed in many other places,&lt;br /&gt;rather than rehash this old debate:&lt;br /&gt;&lt;br /&gt;* [[Breaking out of the 32-bit RISC instruction set straitjacket]]&lt;br /&gt;** [[variable length RISC instruction sets]]&lt;br /&gt;** [[16-bit RISC instruction sets]]&lt;br /&gt;** [[RISC instruction sets with prefix instructions]]&lt;br /&gt;&lt;br /&gt;* [[post-RISC proliferation of instructions]]&lt;br /&gt;&lt;br /&gt;* [[How do you count instructions in an instruction set architecture?]]&lt;br /&gt;&lt;br /&gt;* [[Microcoded instructions and RISC]]&lt;br /&gt;* [[Hardwired instructions and RISC]]&lt;br /&gt;* [[Hardware state-machine sequenced instructions and RISC]]&lt;br /&gt;&lt;br /&gt;Many of these latter issues are of the form "XXX and RISC", and really should be only of the form "XXX",&lt;br /&gt;except that one 
of the leftovers of the [[RISC era]] is that it is considered obligatory to explain&lt;br /&gt;how a feature supports or opposes the [[RISC philosophy]].&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;= Background =&lt;br /&gt;&lt;br /&gt;The late 1970s and early 1980s were an era of increasing capability and complexity in computers.&lt;br /&gt;The mainframe IBM 360 family was reaching the peak of its influence.&lt;br /&gt;The relatively simple PDP-11 minicomputer &lt;br /&gt;was supplanted by the more complex DEC VAX.&lt;br /&gt;Microprocessors were marching forward: the Intel 8086 started along the road to world domination in 1978,&lt;br /&gt;the Motorola 68000 started along the road to elegant failure in 1979.&lt;br /&gt;Intel had been talking about the &lt;br /&gt;[http://en.wikipedia.org/wiki/Intel_432 Intel iAPX 432]&lt;br /&gt;microprocessor for a while, with features such as bit granular instructions,&lt;br /&gt;garbage collection in hardware and microcode,&lt;br /&gt;a software object model supported by hardware, etc.&lt;br /&gt;People were trying to solve the so-called software gap with hardware, by building computers that more closely mapped to&lt;br /&gt;whatever language implementation they were targeting.&lt;br /&gt; &lt;br /&gt;&lt;br /&gt;Complexity seemed to be increasing.&lt;br /&gt;In actuality, much of the software complexity was moved into microcode.&lt;br /&gt;Woe betide languages and operating systems that did not match the language and operating system use cases the computer was designed to support.&lt;br /&gt;&lt;br /&gt;= The RISC Revolution =&lt;br /&gt;&lt;br /&gt;Into this came a bolt of sanity:&lt;br /&gt;&lt;UL&gt;David A. Patterson and David R. Ditzel. 1980. The case for the reduced instruction set computer. SIGARCH Comput. Archit. News 8, 6 (October 1980), 25-33. 
DOI=10.1145/641914.641917 http://doi.acm.org/10.1145/641914.641917&lt;/UL&gt;Heck - it was not even published in a refereed journal!&lt;br /&gt;&lt;br /&gt;One must also mention the [[IBM 801 minicomputer]], arguably the first RISC.&lt;br /&gt;&lt;br /&gt;The late 1980s and early 1990s were full of RISC computers, especially microprocessors: [[Power]], [[PowerPC]], [[MIPS]], [[SPARC]], [[Motorola 88K]]. &lt;br /&gt;Not to mention many more failed companies.&lt;br /&gt;All seemed destined to replace the IBM mainframe, the VAX minicomputer and,&lt;br /&gt;then, as the importance of the PC market grew, the PC microprocessor.&lt;br /&gt;But only the 68000 Apple PC fell to the PowerPC RISC onslaught.&lt;br /&gt;The VAX and other minicomputers died out.&lt;br /&gt;But the IBM mainframe, and the Intel x86, continued, the latter spectacularly.&lt;br /&gt;&lt;br /&gt;Now, by 2010,&lt;br /&gt;* IBM is slowly transitioning to [[Power]], but the IBM Z-series mainframe stays strong&lt;br /&gt;* the PC world is still ruled by Intel [[x86]]&lt;br /&gt;** Intel's RISC and VLIW projects fell by the wayside&lt;br /&gt;* [[PowerPC]] and [[MIPS]] have been relegated to [[embedded computing]]&lt;br /&gt;* [[ARM]] is the biggest RISC success story, in embedded and, particularly, in low power, cell phones, etc.&lt;br /&gt;** [[ARM]] is likely to be Intel's big competition&lt;br /&gt;* Sun [[SPARC]] survives, barely - but Sun itself got swallowed by Oracle&lt;br /&gt;** Sun/Oracle x86 boxes predominate even here&lt;br /&gt;* DEC is long dead, as is [[Alpha]]&lt;br /&gt;&lt;br /&gt;= What Happened? 
=&lt;br /&gt;&lt;br /&gt;Many RISC companies went after the high priced, high profit margin, workstation or server markets.&lt;br /&gt;But those markets got killed by [[The March of the Killer Micros]], specifically, the x86 PC microprocessor.&lt;br /&gt;It is telling that ARM is the most successful "RISC", and that ARM targeted embedded and low power, the low end,&lt;br /&gt;lower than the PC market, rather than the high end.&lt;br /&gt;&lt;br /&gt;[[PowerPC]] and [[MIPS]] made a concerted attack on the Intel x86 PC franchise.&lt;br /&gt;Microsoft even provided Windows NT support.&lt;br /&gt;But when Intel proved that they could keep the x86 architecture competitive, &lt;br /&gt;and stay the best semiconductor manufacturer, the "RISC PC" withered.&lt;br /&gt;&lt;br /&gt;Intel proved they could keep the x86 PC microprocessor competitive in several stages:&lt;br /&gt;* the i486 proved that the x86 could be pipelined.  &lt;br /&gt;** Up until then one of the pro-RISC arguments was that CISCs were too complicated to pipeline.  But, see the next section&lt;br /&gt;** I was thinking about Motorola 88Ks about this time, when the i486 started being talked about, and I realized - RISC had no fundamental advantage&lt;br /&gt;* the Intel P5/Pentium did in-order superscalar&lt;br /&gt;* the Intel P6, first released as the Pentium Pro, then Pentium II, did out-of-order&lt;br /&gt;** briefly, the fastest microprocessor in the world, beating even DEC Alpha&lt;br /&gt;&lt;br /&gt;Some say that the P6 killed RISC.&lt;br /&gt;&lt;br /&gt;A more nuanced view is that RISC was a response to a short term issue:&lt;br /&gt;the transition from board level to chip level integration.&lt;br /&gt;When only a small fraction of a board level computer could fit on a chip,&lt;br /&gt;RISC made more sense.  When more could fit on a chip, RISC made less sense.&lt;br /&gt;&lt;br /&gt;Not no sense at all. 
The RISC principles always have value.&lt;br /&gt;But less sense, less of a competitive advantage.&lt;br /&gt;Unnecessary complexity is always wasteful.&lt;br /&gt;&lt;br /&gt;Moving on...&lt;br /&gt;&lt;br /&gt;= The CISCs that survived were not that CISCy =&lt;br /&gt;&lt;br /&gt;The CISCs that failed - DEC VAX and the Motorola 68000 - were the most CISCy.&lt;br /&gt;Most instructions were variable length.&lt;br /&gt;Some frequently used instructions could be very long.&lt;br /&gt;Many instructions had microcode.&lt;br /&gt;Many operations had side effects.&lt;br /&gt;They had complicated addressing modes - elegant in their generality, but complicated, sometimes necessitating microcode just to calculate an address.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;The CISCs that survived - the IBM 360 mainframe family, and the Intel x86 - were not that CISCy.&lt;br /&gt;&lt;br /&gt;Sure, they had some complicated, almost impossible to implement without microcode, instructions.&lt;br /&gt;Just consider x86 FAR CALL through a CALL GATE.&lt;br /&gt;&lt;br /&gt;However, most of their instructions were simple: &lt;br /&gt;  ADD dest_reg += src_reg_or_mem.&lt;br /&gt;True, not [[load-store]].&lt;br /&gt;But [[load-op]] pipelines were not that hard for the IBM mainframes, and the Intel i486 and P5, to implement.&lt;br /&gt;And the Intel P6 showed that the microarchitecture could be [[load-store]], even though the ISA was not.&lt;br /&gt;&lt;br /&gt;The IBM 360 family has relatively simple instruction decoding.&lt;br /&gt;&lt;br /&gt;The Intel x86 has painful instruction encodings, ranging from a single byte up to 15 bytes.&lt;br /&gt;But the most frequently used instructions are small.&lt;br /&gt;The really complicated instruction encodings, with prefixes, proved (a) possible to implement in hardware, but (b) at a significant performance cost, on the order of 1 cycle per prefix originally.&lt;br /&gt;&lt;br /&gt;Most important, these instruction sets had few side effects.&lt;br 
/&gt;Yes, the x86 has condition codes.&lt;br /&gt;But most x86 instructions overwrite all of the arithmetic condition codes (INC and DEC not affecting the carry flag being the notable exception).&lt;br /&gt;This avoided the sort of RAW hazard on the condition codes that would have been required for an OOO implementation of, say, the Motorola 68000.&lt;br /&gt;&lt;br /&gt;= Didn't RISC win anyway? =&lt;br /&gt;&lt;br /&gt;Did not RISC win anyway?  After all, doesn't a modern CISC processor translate to RISC uops internally?&lt;br /&gt;&lt;br /&gt;Well, yes and no.  Let's look at some of the RISC principles proposed in early papers&lt;br /&gt;&lt;br /&gt;* fixed length instructions - 32 bit&lt;br /&gt;** at the [[macroinstruction set level]], not  so much&lt;br /&gt;** at the [[microinstruction set]] or [[UISA]] level, maybe&lt;br /&gt;*** maybe- but definitely not "small".  Is it a RISC if the (micro)instructions are 160 bits wide&lt;br /&gt;*** even here, the recent trend is to compressing microcode.  Some are small, some take more bits.&lt;br /&gt;** the most popular surviving RISC instruction sets have 16 bit subsets to increase code density&lt;br /&gt;&lt;br /&gt;* simple instruction decoding&lt;br /&gt;** ISA level, no&lt;br /&gt;** UISA level - undocumented. Probably.  But, again, very wide!!&lt;br /&gt;&lt;br /&gt;* software floating point&lt;br /&gt;** nope!!&lt;br /&gt;&lt;br /&gt;* large uniform register set&lt;br /&gt;** in the early days, not so much&lt;br /&gt;** over  time, the register set has grown. As has the complexity of the ISA encoding, [[REX bytes]] etc.&lt;br /&gt;&lt;br /&gt;* small number of instructions&lt;br /&gt;** definitely not!!!&lt;br /&gt;** microcode instruction sets have long been full of many instructions, particularly widget instructions&lt;br /&gt;** even macroinstruction sets have increased dramatically in size since 1990. 
More than quadrupled.&lt;br /&gt;&lt;br /&gt;Some have said that the point of RISC was not reduced instruction count,&lt;br /&gt;but reduced instruction complexity.&lt;br /&gt;This may be true - certainly, this was always the argument that I used *against* rabid RISC enthusiasts who were trying to reduce the number of instructions in the instruction set.&lt;br /&gt;But nevertheless, there were many, many, RISC zealots and managers who evaluated proposals by counting instructions.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;= What's left? =&lt;br /&gt;&lt;br /&gt;The most important intellectual survivor of the RISC philosophy has been the aversion to complicated microcode sequences.&lt;br /&gt;Expanding instructions to 1 to 4 [[uop]]s may be okay,&lt;br /&gt;but taking a cycle or more to branch into a [[microcode sequencer]], perform the operation, and branch back is a performance penalty nobody wants to take.&lt;br /&gt;&lt;br /&gt;The number of instructions in instruction sets has increased dramatically over the past decade.&lt;br /&gt;But the vast majority of these are instructions that can be implemented directly by hardware,&lt;br /&gt;by combinatorial logic,&lt;br /&gt;or by simple state machines in the case of operations such as divide.&lt;br /&gt;&lt;br /&gt;One may conjecture that the original RISC aversion to floating point&lt;br /&gt;was actually an aversion to microcoded floating point:&lt;br /&gt;when the most important floating point operations like [[FADD]] and [[FMUL]] became pipelined,&lt;br /&gt;capable of being started one or more per clock&lt;br /&gt;even though the operation took several cycles of latency,&lt;br /&gt;the objections to FP on a RISC died away.&lt;br /&gt;&lt;br /&gt;== Bad Effects ==&lt;br /&gt;&lt;br /&gt;We are still paying the cost of certain RISC-era design decisions.&lt;br /&gt;&lt;br /&gt;For example, Intel MMX is irregular. 
&lt;br /&gt;It does not have all the reasonable combinations of &lt;br /&gt;datasize={8,16,32},&lt;br /&gt;unsaturated, signed and unsigned saturation.&lt;br /&gt;This irregularity was NOT because of hardware complexity,&lt;br /&gt;but because management was trying to follow RISC principles by counting instructions.&lt;br /&gt;Even when providing all regular combinations would have made the hardware simpler rather than harder.&lt;br /&gt;(Aside: validation complexity is often used as an argument here, against regularity. The complexity of validating all regular combinations grows combinatorically.)&lt;br /&gt;&lt;br /&gt;AMD x86-64 is not a bad instruction set extension.&lt;br /&gt;But life might be easier in the  future, if x86 does not die away, if it had been more regular.&lt;br /&gt;More RISC-like.&lt;br /&gt;But in this way RISC was  its own enemy: &lt;br /&gt;RISC did not achieve the often hoped for great performance improvements over CISC.&lt;br /&gt;RISC reduces complexity, which does not directly improve performance.&lt;br /&gt;So people went chasing after  a Holy Grail of VLIW performance, which also did not pan out.&lt;br /&gt;&lt;br /&gt;= Conclusion =&lt;br /&gt;&lt;br /&gt;I did not really want to write this article.  
There is not much to add to all that has been written about the so-called [[RISC Wars]] except to say&lt;br /&gt;that they caused a lot of thrashing, amounted to less than promised, although they did bring some improvements.&lt;br /&gt;&lt;br /&gt;However, I could not really write a wiki on computer architecture without mentioning [[RISC versus CISC]].&lt;br /&gt;&lt;br /&gt;I would prefer, however, to discuss interesting aspects of RISC computer architecture, that may not be discussed in many other places,&lt;br /&gt;than to rehash this old debate:&lt;br /&gt;&lt;br /&gt;* [[Breaking out of the 32-bit RISC instruction set straitjacket]]&lt;br /&gt;** [[variable length RISC instruction sets]]&lt;br /&gt;** [[16-bit RISC instruction sets]]&lt;br /&gt;** [[RISC instruction sets with prefix instructions]]&lt;br /&gt;&lt;br /&gt;* [[post-RISC proliferation of instructions]]&lt;br /&gt;&lt;br /&gt;* [[How do you count instructions in an instruction set architecture?]]&lt;br /&gt;&lt;br /&gt;* [[Microcoded instructions and RISC]]&lt;br /&gt;* [[Hardwired instructions and RISC]]&lt;br /&gt;* [[Hardware state-machine sequenced instructions and RISC]]&lt;br /&gt;&lt;br /&gt;Many of these latter issues are of the form "XXX and RISC", and really should be only of the form "XXX",&lt;br /&gt;except that one of the leftovers of the [[RISC era]] is that it is considered obligatory to explain&lt;br /&gt;how a feature supports or opposes the [[RISC philosophy]].&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-7004295341044102133?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://semipublic.comp-arch.net/wiki/RISC_versus_CISC' title='RISC versus CISC'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/7004295341044102133/comments/default' title='Post Comments'/><link rel='replies' type='text/html' 
href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=7004295341044102133' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/7004295341044102133'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/7004295341044102133'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/02/risc-versus-cisc.html' title='RISC versus CISC'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-5808680738424039051</id><published>2011-02-09T21:01:00.001-08:00</published><updated>2011-02-09T21:01:36.917-08:00</updated><title type='text'>Non-speculative or less speculative versus more speculative</title><content type='html'>http://semipublic.comp-arch.net/wiki/Non-speculative_or_less_speculative_versus_more_speculative&lt;br /&gt;&lt;br /&gt;When discussing [[speculative execution]] techniques such as [[speculative multithreading]] (or [[skipahead]], or [[slipahead]], or ... )&lt;br /&gt;we often need to talk about less speculative and more speculative threads or instructions.&lt;br /&gt;&lt;br /&gt;E.g. a later, younger, more speculative load instruction may be blocked by an earlier, older, less speculative store instruction to the same address.&lt;br /&gt;&lt;br /&gt;E.g. 
in [[SpMT]] or [[SkMT]] a less speculative thread may fork a more speculative thread.&lt;br /&gt;&lt;br /&gt;Often in early work we only talk about non-speculative threads forking,&lt;br /&gt;and do not discuss the possibility of a speculative thread forking.&lt;br /&gt;Later, it is realized that it is entirely reasonable for speculative threads to fork.&lt;br /&gt;Thus, often we need to read "non-speculative" in early work,&lt;br /&gt;and implicitly substitute "less speculative".&lt;br /&gt;&lt;br /&gt;E.g. a common [[speculative thread management policy]] is to always keep the least speculative threads,&lt;br /&gt;and cancel any more speculative threads,&lt;br /&gt;when there is an opportunity to spawn a less speculative thread.&lt;br /&gt;&lt;br /&gt;---&lt;br /&gt;&lt;br /&gt;Note that "less speculative" is not necessarily the same as "older".&lt;br /&gt;E.g. in a [[fork-on-call]] thread, certain instructions may be independent of the skipped code,&lt;br /&gt;and be guaranteed to be executed;&lt;br /&gt;whereas other instructions, inside a skipped function,&lt;br /&gt;are older in the [[Von Neumann instruction sequence]],&lt;br /&gt;but are actually more speculative, as in less likely to be executed.&lt;br /&gt;However, it is often too hard to track this, so often discussions&lt;br /&gt;will implicitly assume "less speculative" is "older".&lt;br /&gt;&lt;br /&gt;---&lt;br /&gt;&lt;br /&gt;Note that it is not always possible to determine the order of speculative threads or instructions.&lt;br /&gt;E.g. 
one may fork loop bodies out of order, with no known order,&lt;br /&gt;and later string them together as memory items iterated on by the loop are encountered.&lt;br /&gt;However, arbitrarily imposing an order simplifies many speculative algorithms,&lt;br /&gt;even at the cost of potential performance.&lt;br /&gt;&lt;br /&gt;---&lt;br /&gt;&lt;br /&gt;* See[[spawning versus forking a thread]].&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-5808680738424039051?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://semipublic.comp-arch.net/wiki/Non-speculative_or_less_speculative_versus_more_speculative' title='Non-speculative or less speculative versus more speculative'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/5808680738424039051/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=5808680738424039051' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/5808680738424039051'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/5808680738424039051'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/02/non-speculative-or-less-speculative.html' title='Non-speculative or less speculative versus more speculative'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' 
src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-4139868735794198385</id><published>2011-02-09T21:00:00.000-08:00</published><updated>2011-02-09T21:00:04.438-08:00</updated><title type='text'>Skipahead Multithreading</title><content type='html'>http://semipublic.comp-arch.net/wiki/SkMT&lt;br /&gt;&lt;br /&gt;[[Skipahead multithreading (SkMT)]] is a form of [[speculative multithreading (SpMT)]]&lt;br /&gt;characterized by "skipping ahead" at certain points in &lt;br /&gt;[[non-speculative or less speculative versus more speculative|non-speculative or less speculative execution to more speculative execution]].&lt;br /&gt;&lt;br /&gt;Typically these "certain points" in the code are places&lt;br /&gt;where there is a well characterized [[control independence or convergence]] point:&lt;br /&gt;* the instruction after a CALL instruction&lt;br /&gt;* end of loop&lt;br /&gt;* later iterations of loop&lt;br /&gt;* IF convergence&lt;br /&gt;&lt;br /&gt;I, Andy Glew, coined the term [[SkMT]]&lt;br /&gt;when it became evident that the term [[SpMT]],&lt;br /&gt;which was itself coined by Antonio Gonzales and promoted by me,&lt;br /&gt;was more generic.&lt;br /&gt;I.e. you can imagine creating speculative threads&lt;br /&gt;that do not really skip that far ahead,&lt;br /&gt;but which, e.g. execute past a place where execution would be blocked,&lt;br /&gt;either an in-order blockage, or where an OOO window would be full.&lt;br /&gt;See [[non-skipahead speculative multithreading]].&lt;br /&gt;&lt;br /&gt;&lt;UL&gt;Oooo.... 
I just created a new term: [[slip-ahead multithreading]]. It rather nicely encapsulates what I just described, and is consistent with published project names such as [[slipstream]].&lt;/UL&gt;&lt;br /&gt;In much the same way, I had earlier used the term [[implicit multithreading (IMT)]],&lt;br /&gt;and replaced it by [[speculative multithreading (SpMT)]],&lt;br /&gt;which I am now (after, 10 years ago, 2000) specializing to [[skipahead multithreading (SkMT)]].&lt;br /&gt;&lt;br /&gt;The term [[skipahead]] is intended to be contrasted with [[lookahead]],&lt;br /&gt;a term which was once used to characterize all [[out-of-order (OOO)]] execution.&lt;br /&gt;&lt;br /&gt;&lt;UL&gt;Look-ahead processors. Robert M. Keller, Princeton. ACM Computing Surveys, Vol 7, Issue 4, Dec 1975. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.88.5118&amp;rep=rep1&amp;type=pdf&lt;/UL&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-4139868735794198385?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://semipublic.comp-arch.net/wiki/SkMT' title='Skipahead Multithreading'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/4139868735794198385/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=4139868735794198385' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/4139868735794198385'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/4139868735794198385'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/02/skipahead-multithreading.html' title='Skipahead Multithreading'/><author><name>Andy "Krazy" 
Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-6318722310745611319</id><published>2011-02-06T16:40:00.001-08:00</published><updated>2011-02-06T16:40:17.247-08:00</updated><title type='text'>Managing branch prediction history: copying short history versus pointing to large history</title><content type='html'>http://semipublic.comp-arch.net/wiki/Managing_branch_prediction_history:_copying_short_history_versus_pointing_to_large_history&lt;br /&gt;&lt;br /&gt;When [[branch prediction history]] was short, 8-12 bits, and global,&lt;br /&gt;then it was not unreasonable to manage the history by copying it.&lt;br /&gt;&lt;br /&gt;E.g. in one simulator I actually arranged so that a branch [[uop]]&lt;br /&gt;wrote &amp;nbsp;back, as its result&lt;br /&gt;* an indication of whether it was mispredicted or not&lt;br /&gt;* the taken [[target IP]]&lt;br /&gt;* the branch predictor history to be restored on a branch misprediction.&lt;br /&gt;&lt;br /&gt;I.e. 
the branch prediction history was propagated, in this simulator, from the [[instruction fetch front end]]&lt;br /&gt;across the scheduler to execution, and back again.&lt;br /&gt;&lt;br /&gt;While simple, this involves a lot of unnecessary data movement - both for the history and for the [[target IP]].&lt;br /&gt;Most machines of my acquaintance create a [[branch information table (BIT)]], holding information for branches in flight.&lt;br /&gt;This avoids copying the history from the front end to execution and back again,&lt;br /&gt;but nevertheless may involve making copies of the history.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;I confess an ulterior motive for copying the history and other branch information around: this naturally leads to the number of branches in flight scaling with the window size. Many early OOO designs were crippled by supporting too few branches in flight. &lt;/ul&gt;&lt;br /&gt;&lt;br /&gt;Making copies of the history seems silly if, as in a TNT history, they differ only by 2 bits:&lt;br /&gt;&amp;nbsp;new_history := (old_history &amp;lt;&amp;lt; 1) | new_branch_taken_or_not&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;(Note: when shifting a branch history, we will often say h&amp;lt;&amp;lt;n rather than (h &amp;lt;&amp;lt; n)&amp;amp;sizemask, as we would have to do in [[C]] [[simulator coding]]. I.e. we leave implicit the loss of bits shifted out of a finite size register.)&lt;/ul&gt;&lt;br /&gt;&lt;br /&gt;But making copies seems to be required if we want to be able to restore to any mispredicted branch point, i.e. 
if we want to do [[instantaneous versus incremental branch misprediction repair]].&lt;br /&gt;&lt;br /&gt;Making copies works well enough if the branch prediction history is small enough - 8 bits, etc.&lt;br /&gt;But by the time we are talking about 16 bit histories and 32 or 64 bit IPs, we are talking about quite a few bits.&lt;br /&gt;&lt;br /&gt;Furthermore, in the late 1990s and early 2000s branch predictors arose with hitherto unseen long histories,&lt;br /&gt;such as Seznec's OGEHL predictor (with multiple history lengths 0, 2, 4, 8, 16, 32, 64, 128, ...).&lt;br /&gt;Copying around a 128 bit history, even to a [[BIT]], is wasteful;&lt;br /&gt;copying around the even larger 1000+ bit histories that have been proposed is even worse.&lt;br /&gt;&lt;br /&gt;Hence the interest in &lt;b&gt;pointing&lt;/b&gt; to a position in a branch prediction history,&lt;br /&gt;rather than copying the entire history.&lt;br /&gt;On a branch misprediction one would restore the pointer, rather than overwriting the history with a saved copy.&lt;br /&gt;&lt;br /&gt;This would be straightforward for a long [[TNT]] history, since only 1 bit depends on any branch.&lt;br /&gt;You would simply keep a history of total length TL=PHL+BIF, the sum of the predictor history length and the number of branches in flight.&lt;br /&gt;On a misprediction you would restore the pointer in this circular buffer.&lt;br /&gt;(Or equivalently shift the buffer - I suspect that shifting is too power hungry.)&lt;br /&gt;&lt;br /&gt;This could scale to almost any length of history, potentially thousands of bits.&lt;br /&gt;&lt;br /&gt;Unfortunately, modern [[stew]] histories are more complicated than [[TNT]] histories.&lt;br /&gt;The [[branch IP]], or even both [[from IP]] and [[to IP]], may be [[hashed, e.g. 
XORed]] into the stew.&lt;br /&gt;This means that several of the youngest bits in the history may change on every branch.&lt;br /&gt;Simply restoring a pointer will not suffice.&lt;br /&gt;&lt;br /&gt;TBD: explain stew management in more detail.&lt;br /&gt;&lt;br /&gt;Simple strategy: &amp;nbsp;constrain the stew to have only the N youngest bits affected by the most recent branch. &amp;nbsp;Bits TL..N are unaffected &amp;nbsp;by the most recent branch, except for shifting.&lt;br /&gt;One can then keep a copy of the parts of the history affected by recent branches, the N youngest bits, and a pointer that locates the older bits.&lt;br /&gt;&lt;br /&gt;= See Also =&lt;br /&gt;&lt;br /&gt;* [[How to use a really long predictor history]]&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-6318722310745611319?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://semipublic.comp-arch.net/wiki/Managing_branch_prediction_history:_copying_short_history_versus_pointing_to_large_history' title='Managing branch prediction history: copying short history versus pointing to large history'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/6318722310745611319/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=6318722310745611319' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/6318722310745611319'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/6318722310745611319'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/02/managing-branch-prediction-history.html' title='Managing branch prediction 
history: copying short history versus pointing to large history'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-1653066225073485186</id><published>2011-02-06T00:21:00.000-08:00</published><updated>2011-02-06T00:21:25.859-08:00</updated><title type='text'>Branch Prediction Stew</title><content type='html'>{{Terminology Term}}&lt;br /&gt;http://semipublic.comp-arch.net/wiki/Stew&lt;br /&gt;&lt;br /&gt;The [[stew]] is a form of history used by certain branch predictors.&lt;br /&gt;&lt;br /&gt;See, for example, US patent 7143273,&lt;br /&gt;Method and apparatus for dynamic branch prediction utilizing multiple stew algorithms for indexing a global history,&lt;br /&gt;Mile, Slade, and Jourdan,&lt;br /&gt;filed March 31, 2003,&lt;br /&gt;assignee Intel.&lt;br /&gt;&lt;br /&gt;A simple [[branch predictor history]] might be a simple [[TNT]] history,&lt;br /&gt;with 0s corresponding to non-taken and 1s corresponding to taken.&lt;br /&gt;Such a simple TNT history cannot distinguish some converging paths,&lt;br /&gt;and indirect branches.&lt;br /&gt;&lt;br /&gt;US7143273&lt;br /&gt;describes one embodiment of a stew as&lt;br /&gt;&amp;nbsp;stew = ((stew &amp;lt;&amp;lt; 1)|new_bit ^ &amp;nbsp;ip)&lt;br /&gt;&lt;br /&gt;A stew formed in this way can distinguish converging paths such as ("if true" means a condition that evaluates as true, not an unconditional branch):&lt;br /&gt;&amp;nbsp;&amp;nbsp; L1: if true goto L2&lt;br /&gt;&amp;nbsp;&amp;nbsp; L2: if true goto L99&lt;br /&gt;&amp;nbsp;&amp;nbsp; L10: if true goto L11&lt;br /&gt;&amp;nbsp;&amp;nbsp; L11: if true 
&amp;nbsp;goto L&lt;br /&gt;&amp;nbsp;&amp;nbsp; L: &amp;nbsp;if ?? goto L99&lt;br /&gt;&lt;br /&gt;However, it does not distinguish multiple &amp;nbsp;branch targets and paths out of an indirect branch, such as&lt;br /&gt;&amp;nbsp;&amp;nbsp; L1: Reg:= IL1; if true goto L2&lt;br /&gt;&amp;nbsp;&amp;nbsp; L2: if true goto L99&lt;br /&gt;&amp;nbsp;&amp;nbsp; L10: Reg:=IL2; if true goto L11&lt;br /&gt;&amp;nbsp;&amp;nbsp; L11: if true &amp;nbsp;goto L&lt;br /&gt;&amp;nbsp;&amp;nbsp; L: &amp;nbsp;if ?? goto [Reg]&lt;br /&gt;&amp;nbsp;&amp;nbsp; IL1: ...&lt;br /&gt;&amp;nbsp;&amp;nbsp; IL2: ...&lt;br /&gt;It can be seen that mixing in arc information as well as node information remedies this situation,&lt;br /&gt;and distinguishes different paths so long as the hashes do not collide:&lt;br /&gt;&lt;br /&gt;&amp;nbsp;stew &amp;lt;&amp;lt;= number_of_bits_to_discard&lt;br /&gt;&amp;nbsp;stew = hash( stew, from_IP, to_IP, taken/not_taken, ...)&lt;br /&gt;&lt;br /&gt;Issue: how many bits to use? &amp;nbsp;Which may vary as a function of the type of branch: e.g. 
a direct conditional branch&lt;br /&gt;may not need as many to_ip bits to be hashed in&lt;br /&gt;as a completely random indirect branch.&lt;br /&gt;Similarly, indirect calls and returns may be handled separately.&lt;br /&gt;&lt;br /&gt;(TBD-IP)&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-1653066225073485186?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://semipublic.comp-arch.net/wiki/Stew' title='Branch Prediction Stew'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/1653066225073485186/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=1653066225073485186' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/1653066225073485186'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/1653066225073485186'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/02/branch-prediction-stew.html' title='Branch Prediction Stew'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-7818425878073886383</id><published>2011-01-31T21:10:00.003-08:00</published><updated>2011-01-31T21:13:30.468-08:00</updated><title type='text'>Slow :-(</title><content type='html'>Minor bitching: writing in the comp-arch wiki is so 
slow.&lt;br /&gt;&lt;br /&gt;Hard to get big chunks of time.&lt;br /&gt;&lt;br /&gt;Hard to write big chunks when I don't have big &amp;nbsp;chunks of time.&lt;br /&gt;&lt;br /&gt;But at least my present employer allows me to write this. &amp;nbsp;And does not forbid me, like my past employer.&lt;br /&gt;&lt;br /&gt;I wish that I could average 1 new page a day. &amp;nbsp;Instead, I am averaging less than half a page a day, and probably less than 1 substantial page a week.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-7818425878073886383?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/7818425878073886383/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=7818425878073886383' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/7818425878073886383'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/7818425878073886383'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/01/slow.html' title='Slow :-('/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-1645047701712363586</id><published>2011-01-31T21:03:00.001-08:00</published><updated>2011-01-31T21:04:47.640-08:00</updated><title 
type='text'>OOO_versus_Runahead_versus_Log-based_replay_for_runahead</title><content type='html'>http://semipublic.comp-arch.net/wiki/OOO_versus_Runahead_versus_Log-based_replay_for_runahead&lt;br /&gt;&lt;br /&gt;In the era of Willamette I suffered a moral dilemma - or, rather, a dilemma of morale.&lt;br /&gt;I had evangelized [[OOO execution]] with P6.&lt;br /&gt;But I had been introduced to runahead.&lt;br /&gt;And runahead execution seemed to be asymptotically as good as OOO execution,&lt;br /&gt;as memory latencies got longer - i.e. as we entered the era of [[MLP (memory level parallelism)]], which I was then starting to evangelize.&lt;br /&gt;&lt;br /&gt;OOO execution is ... well, I'll assume you know what OOO is, or can read about it [[OOO|elsewhere]].&lt;br /&gt;&lt;br /&gt;= What Runahead Is =&lt;br /&gt;&lt;br /&gt;[[Runahead]] consists essentially of&lt;br /&gt;* executing&lt;br /&gt;* when an unpredictable, long-latency, event such as a cache miss occurs&lt;br /&gt;** take a checkpoint&lt;br /&gt;** mark the result invalid&lt;br /&gt;* continue executing past the checkpoint in a speculative manner&lt;br /&gt;** saving speculative values into registers&lt;br /&gt;** saving speculative values into a store buffer, or similar [[memory speculation datastructure]]&lt;br /&gt;* when the cache miss returns&lt;br /&gt;** discard all speculative state, registers and memory&lt;br /&gt;** restore from the checkpoint.&lt;br /&gt;&lt;br /&gt;= Runahead and OOO are asymptotically equal =&lt;br /&gt;&lt;br /&gt;Runahead is a comparatively simple action.&lt;br /&gt;And, in some sense, it gets nearly all of the benefit of OOO.&lt;br /&gt;I.e. 
as memory latencies increase to infinity and execution latencies become infinitesimal by comparison,&lt;br /&gt;runahead execution will approach OOO execution in performance.&lt;br /&gt;&lt;br /&gt;Normally, barring funny effects, runahead will always be lower performance than OOO,&lt;br /&gt;for the same size of instruction window.&lt;br /&gt;But runahead is much cheaper in hardware, and can potentially run ahead much farther than OOO for the same hardware cost.&lt;br /&gt;Both runahead and OOO are limited by the [[memory speculation datastructure]];&lt;br /&gt;but runahead need build no large physical register file, etc., save only for a number of RF checkpoints.&lt;br /&gt;&lt;br /&gt;= [[log-based]] =&lt;br /&gt;&lt;br /&gt;Confounded by OOO/runahead, I looked for something better than runahead in some asymptotic sense.&lt;br /&gt;Hence [[log-based]] execution - specifically [[log-based verify re-execution]].&lt;br /&gt;&lt;br /&gt;Although subsequently I applied [[log-based]] to [[speculative multithreading]],&lt;br /&gt;at first I thought about it only wrt the same [[single sequencer microarchitecture]]&lt;br /&gt;as OOO or runahead:&lt;br /&gt;&lt;br /&gt;Imagine that you are executing in a runahead manner.&lt;br /&gt;However, instead of re-executing all of the instructions after a cache miss returns and a checkpoint is restored,&lt;br /&gt;you (1) record all of the results of runahead execution in a log,&lt;br /&gt;and (2) execute instructions out of the log.&lt;br /&gt;&lt;br /&gt;This raises several questions:&lt;br /&gt;Why would this be any faster than runahead?&lt;br /&gt;Why would it be any "larger" than OOO or as large as runahead?&lt;br /&gt;Why would it be as cheap as runahead?&lt;br /&gt;&lt;br /&gt;Let's answer them out of order.&lt;br /&gt;&lt;br /&gt;;Why would this be as cheap as runahead? 
&lt;br /&gt;&lt;br /&gt;Well, if you had a dedicated log [[hardware datastructure]] it would not be.&lt;br /&gt;However, the log is accessed sequentially during [[verify re-execution]].&lt;br /&gt;Sequential access patterns can be easily prefetched.&lt;br /&gt;Therefore, you could put the log in slow memory - potentially in main memory.&lt;br /&gt;You might need some fast hardware log to "start off" the verify re-execution,&lt;br /&gt;sufficient to tolerate the latency to main memory.&lt;br /&gt;Equivalently, you might manage cache replacement to ensure that the oldest entries in the log are not displaced from a cache of main memory,&lt;br /&gt;as they normally would be in LRU.&lt;br /&gt;&lt;br /&gt;;Why would it be any "larger" than OOO or as large as runahead?&lt;br /&gt;&lt;br /&gt;As explained above, the log need not be a small hardware datastructure. &amp;nbsp;It could potentially scale in size with main memory.&lt;br /&gt;&lt;br /&gt;;Why would this be any faster than runahead?&lt;br /&gt;&lt;br /&gt;Two reasons.&lt;br /&gt;&lt;br /&gt;First, [[verify re-execution]] is inherently more parallel than normal execution.&lt;br /&gt;&lt;br /&gt;I like to explain this using repeated increments of a register as an example:&lt;br /&gt;&lt;br /&gt;&amp;nbsp;r9 := load(cache miss)&lt;br /&gt;&amp;nbsp;INC r9&lt;br /&gt;&amp;nbsp;INC r9&lt;br /&gt;&amp;nbsp;...&lt;br /&gt;&amp;nbsp;INC r9&lt;br /&gt;&lt;br /&gt;If INC is unit latency, then runahead re-execution would take N cycles to re-execute N such INCrements of a register.&lt;br /&gt;&lt;br /&gt;However, in [[log-based verify re-execution]],&lt;br /&gt;the example code above would be rewritten in the log as&lt;br /&gt;&lt;br /&gt;&amp;nbsp;r9 := load(cache miss)&lt;br /&gt;&amp;nbsp;assert(r9 = 1042)&lt;br /&gt;&amp;nbsp;INC r9 // assert(r9=1042); r9:= 1043&lt;br /&gt;&amp;nbsp;INC r9 // assert(r9=1043); r9:= 1044&lt;br /&gt;&amp;nbsp;INC r9 // assert(r9=1044); r9:= 1045&lt;br /&gt;&amp;nbsp;...&lt;br 
/&gt;&amp;nbsp;INC r9 // assert(r9=1042+N); r9:= 1042+N+1&lt;br /&gt;&lt;br /&gt;i.e. the instructions rewritten for storage in the log,&lt;br /&gt;and for [[verify re-execution]] out of the log, consist of nothing except moves of a constant to a register.&lt;br /&gt;Rather than N cycles, these would take N/IF cycles, where IF is the instruction width.&lt;br /&gt;&lt;br /&gt;Furthermore, there are many optimizations that can be applied to the log:&lt;br /&gt;removing redundant asserts and redundant overwrites, etc.&lt;br /&gt;In general, for a block of register based code,&lt;br /&gt;[[log-based verify re-execution]] has one assert for every live-in,&lt;br /&gt;and one constant to register move for every live-out.&lt;br /&gt;Assuming the assertions are met.&lt;br /&gt;(And if the assertions are not met, fall back to normal (re-)execution.)&lt;br /&gt;&lt;br /&gt;The second reason for log-based verify re-execution being faster is this:&lt;br /&gt;&lt;br /&gt;Normal runahead re-execution is limited by cache associativity.&lt;br /&gt;If you have N independent memory references that were cache misses prefetched by the runahead speculative execution epoch,&lt;br /&gt;they are normally independent, and all prefetched.&lt;br /&gt;EXCEPT when they happen to collide in the cache, e.g. 
in a limited-associativity cache.&lt;br /&gt;In which case, another runahead/checkpoint/re-execution would begin.&lt;br /&gt;&lt;br /&gt;Log-based verify re-execution suffers this not at all.&lt;br /&gt;If a cache-missing load is thrashed out of the cache,&lt;br /&gt;its value is still present in the log,&lt;br /&gt;and dependent instructions can still be executed - or, rather, verify re-executed.&lt;br /&gt;You will need to pay the memory latency to fetch it and verify, but that can be overlapped.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;= Conclusion =&lt;br /&gt;&lt;br /&gt;Log-based verify re-execution, or replay, for runahead&lt;br /&gt;is asymptotically faster than ordinary runahead in two senses:&lt;br /&gt;non-cache miss execution latency,&lt;br /&gt;and cache thrashing.&lt;br /&gt;&lt;br /&gt;Ultimately, the fashion or fad of runahead petered out.&lt;br /&gt;Runahead did not replace OOO, although it caused much heartburn.&lt;br /&gt;&lt;br /&gt;Runahead did not replace OOO because, although it is asymptotically as good as OOO, we aren't at the asymptote.&lt;br /&gt;Execution unit latencies are not 0, and cache miss latencies are not infinite.&lt;br /&gt;OOO handles both.&lt;br /&gt;&lt;br /&gt;Also, runahead may consume more power than OOO. &amp;nbsp;This is somewhat debatable:&lt;br /&gt;OOO's hardware datastructures may waste power, e.g. 
leakage,&lt;br /&gt;whereas runahead may consume dynamic power, since it executes instructions more than once.&lt;br /&gt;&lt;br /&gt;Probably the biggest reason why runahead did not supplant OOO in the late 1990s and early 2000s is this:&lt;br /&gt;runahead is only as good as OOO.&lt;br /&gt;It doesn't do anything new.&lt;br /&gt;OOO was an already established and well understood technology by the time runahead appeared on the scene.&lt;br /&gt;Although runahead has putative benefits, they were not enough to warrant a switch-over.&lt;br /&gt;&lt;br /&gt;Be this as it may, log-based verify re-execution is a technique that arose out of the [[Runahead versus OOO]] debate.&lt;br /&gt;It is better than runahead in some asymptotic senses. &amp;nbsp;And it has application to other areas, such as [[SpMT]].&lt;br /&gt;&lt;br /&gt;= Hybrids =&lt;br /&gt;&lt;br /&gt;Of course it is possible to create hybrids of OOO, runahead, and log-based.&lt;br /&gt;&lt;br /&gt;For example, one need not record all instruction results in the log. &amp;nbsp;One can re-execute that which is fast,&lt;br /&gt;and only record in the log that which is slow.&lt;br /&gt;&lt;br /&gt;Similarly, an OOO machine need not record all speculative results in the instruction window.&lt;br /&gt;&lt;br /&gt;Onur Mutlu and others have proposed hybrids that fill an OOO instruction window, and then use runahead.&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-1645047701712363586?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://semipublic.comp-arch.net/wiki/OOO_versus_Runahead_versus_Log-based_replay_for_runahead' title='OOO_versus_Runahead_versus_Log-based_replay_for_runahead'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/1645047701712363586/comments/default' title='Post Comments'/><link rel='replies' type='text/html' 
href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=1645047701712363586' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/1645047701712363586'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/1645047701712363586'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/01/oooversusrunaheadversuslog.html' title='OOO_versus_Runahead_versus_Log-based_replay_for_runahead'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-5238851412630233919</id><published>2011-01-29T17:05:00.000-08:00</published><updated>2011-01-29T17:08:44.958-08:00</updated><title type='text'>Cygwin X multidisplay bug</title><content type='html'>Today I figured out a problem that has been plaguing me on Cygwin for a year or two, since &amp;nbsp;things broke in an update. &amp;nbsp; (I probably figured it out before, but neglected to record or remember it.)&lt;br /&gt;&lt;br /&gt;When I start up Xwin -multiwindow, using &amp;nbsp;windows manipulated by MS Windows rather than a desktop in an MS Window window (try saying that fast), the initial xterm was always invisible.&lt;br /&gt;&lt;br /&gt;Eventually I realized that if I maximized it from the MS Windows task bar, I could see it.&lt;br /&gt;&lt;br /&gt;Apparently, this window was being created at location (0,0) in a bounding box for my multi-display configuration. &amp;nbsp; (That's multiple physical displays, not multiple X &amp;nbsp;displays.) 
&amp;nbsp;However, location (0,0) was invisible, since my multi-display configuration has a big tall portrait mode display in the middle for reading PDFs, with flat landscape displays on either side for webpages. The portrait display rises above the landscape displays, so the top-left corner of the bounding box is not covered by any display.&lt;br /&gt;&lt;br /&gt;By the way, it is actually +1+1 as the default location, not +0+0. Minor difference.&lt;br /&gt;&lt;br /&gt;I need to play around and see if I can find a guaranteed visible place.&lt;br /&gt;&lt;br /&gt;Removing the geometry spec seems to work okay for me.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-5238851412630233919?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/5238851412630233919/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=5238851412630233919' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/5238851412630233919'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/5238851412630233919'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/01/cygwin-x-multidisplay-bug.html' title='Cygwin X multidisplay bug'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' 
src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-1977792670223634260</id><published>2011-01-28T12:25:00.001-08:00</published><updated>2011-01-28T12:25:54.802-08:00</updated><title type='text'>Why_full-speed_denormal_handling_makes_more_sense_for_CPUs_than_CPUs</title><content type='html'>http://semipublic.comp-arch.net/wiki/Why_full-speed_denormal_handling_makes_more_sense_for_CPUs_than_CPUs&lt;br /&gt;&lt;br /&gt;To a collector of computer architecture trivia like me,&lt;br /&gt;it was quite remarkable&lt;br /&gt;that the Nvidia Fermi GPU&lt;br /&gt;provides full-speed support for denormal operands and results,&lt;br /&gt;whereas Intel and AMD CPUs, at the time of writing, do not.&lt;br /&gt;&lt;br /&gt;By the way, terminology: [[full-speed support for denormals]] is not quite the same as [[hardware support for denormals]].&lt;br /&gt;&lt;br /&gt;The first question is whether you provide hardware support,&lt;br /&gt;or whether it is necessary to [[trap to microcode or software]], to provide the denormal support.&lt;br /&gt;&lt;br /&gt;The second question is,&lt;br /&gt;if you do provide hardware support for denormals and do not need to trap to microcode or software,&lt;br /&gt;whether that support runs at full speed.&lt;br /&gt;Even if denormals are handled in hardware, there may be a performance cost:&lt;br /&gt;it may cost latency, e.g. 5 cycles for an FADD rather than 4 cycles;&lt;br /&gt;or it may cost bandwidth.&lt;br /&gt;The latency and bandwidth impacts are related but may be decoupled:&lt;br /&gt;e.g. it is possible that a scheduler can arrange so that throughput is not reduced even if latency is increased,&lt;br /&gt;so long as there is sufficient [[ILP]].&lt;br /&gt;&lt;br /&gt;By [[full-speed denorms]] we mean mainly that throughput or bandwidth is not affected.&lt;br /&gt;E.g. 
you may arrange to add to the latency of all FP ops,&lt;br /&gt;to avoid [[writeback port collisions for operations with different latencies]].&lt;br /&gt;Since GPUs are throughput machines, this is usually a reasonable tradeoff;&lt;br /&gt;on some GPUs even integer arithmetic is stretched out to 40 cycles.&lt;br /&gt;&lt;br /&gt;So, why would a GPU like Nvidia Fermi provide full-speed denorms,&lt;br /&gt;whereas x86 CPUs from Intel and AMD do not?&lt;br /&gt;&lt;br /&gt;Let's skip the possibility of a marketing bullet.&lt;br /&gt;&lt;br /&gt;Remember [[GPU-style SIMD or SIMT coherent threading]]?&lt;br /&gt;&lt;br /&gt;If a GPU takes a trap, or otherwise takes a hiccup, to handle denorms, then not only is the current thread impacted:&lt;br /&gt;all 16 or 64 threads in the [[wavefront or warp]] are impacted.&lt;br /&gt;And if only one of the [[spatial threads]] in the [[warp]] is taking the trap,&lt;br /&gt;the efficiency of the machine decreases by at least 16-64X in that period.&lt;br /&gt;Let alone the possibility that the denorm handling code is also not well suited to a GPU.&lt;br /&gt;&lt;br /&gt;I.e. the relative cost of [[denorm handling via trapping]] is much higher on a GPU than on a CPU.&lt;br /&gt;Even denorm handling in hardware, in a manner that impacts latency or throughput,&lt;br /&gt;is relatively more expensive and/or harder to deal with.&lt;br /&gt;&lt;br /&gt;Hence: there are good technical reasons to do [[full-speed denorm handling]] in GPUs.&lt;br /&gt;These reasons are not quite so compelling for CPUs&lt;br /&gt;- although I predict that ultimately CPUs will be compelled to follow.&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;Anecdote: on one of my few trips to Pixar, I talked to Bruce Perens about denorm handling. Apparently Pixar would be happily rendering a movie at high speed, and then Bam! 
they would have a scene like a night sky, black, full of denorms, and performance would fall off a cliff. We talked about flush-to-zero modes, which Intel x86 CPUs did not have at that time. I suggested biasing the numbers, e.g. adding 1 or similar, to prevent getting into denorm range. Now that x86 has flush-to-zero the need for such [[kluge]]s is greatly reduced. But denorms exist for a reason. Flush-to-zero can introduce artifacts, and always introduces [[FUD (Fear, Uncertainty, and Doubt)]]. [[Full-speed support for denorms]] may just be the coming thing, a place where GPUs lead the way for CPUs.&lt;/ul&gt;&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-1977792670223634260?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://semipublic.comp-arch.net/wiki/Why_full-speed_denormal_handling_makes_more_sense_for_CPUs_than_CPUs' title='Why_full-speed_denormal_handling_makes_more_sense_for_CPUs_than_CPUs'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/1977792670223634260/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=1977792670223634260' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/1977792670223634260'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/1977792670223634260'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/01/whyfull-speeddenormalhandlingmakesmores.html' title='Why_full-speed_denormal_handling_makes_more_sense_for_CPUs_than_CPUs'/><author><name>Andy "Krazy" 
Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-1231973617875866872</id><published>2011-01-27T18:35:00.000-08:00</published><updated>2011-01-27T18:35:58.526-08:00</updated><title type='text'>ISO mediawaiki macro</title><content type='html'>I'm still using mediawiki for my comp-arch wiki.&lt;br /&gt;&lt;br /&gt;I want a "macro", something like&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp; {{acronym_and_term SMT "simultaneous multithreading"}}&lt;br /&gt;&lt;br /&gt;that will create pages&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp; SMT&lt;br /&gt;&amp;nbsp;&amp;nbsp; simultaneous multithreading&lt;br /&gt;&amp;nbsp;&amp;nbsp; SMT (simultaneous multithreading)&lt;br /&gt;&lt;br /&gt;and redirect them all to&lt;br /&gt;&amp;nbsp;&amp;nbsp; simultaneous multithreading (SMT)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&lt;br /&gt;&lt;br /&gt;Come to think of it, a primitive&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;{{create_redirect_to_current_page &amp;nbsp;Redirect_From}}&lt;br /&gt;would be nice.&lt;br /&gt;&lt;br /&gt;I imagine that it would first see if the Redirect_From page already exists. &amp;nbsp;If not, it would create&lt;br /&gt;it, redirecting to the current page. 
&amp;nbsp;If so, it would probably not do anything - which might be a minor lossage if&lt;br /&gt;the Redirect_From page doesn't link to the current page, but which allows things like disambiguation pages.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Perfunctory attempts to find this on mediawiki fail; I'm not sure where to go ask (and I confess to not liking the wikimedia community).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-1231973617875866872?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://comp-arch.net' title='ISO mediawaiki macro'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/1231973617875866872/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=1231973617875866872' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/1231973617875866872'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/1231973617875866872'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/01/iso-mediawaiki-macro.html' title='ISO mediawaiki macro'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-8826676233605861038</id><published>2011-01-27T12:46:00.001-08:00</published><updated>2011-01-27T12:46:32.912-08:00</updated><title type='text'>Processor redundancy: 
FRC/TMR/QMR RAS</title><content type='html'>http://semipublic.comp-arch.net/wiki/Processor_redundancy&lt;br /&gt;&lt;br /&gt;---&lt;br /&gt;&lt;br /&gt;[[FRC]] - [[failure redundant computation]]. &amp;nbsp;A fairly generic term.&lt;br /&gt;However, at Intel prior to 1990 or so it often referred to [[master-checker]] pairs&lt;br /&gt;- microprocessors wired together at the pins,&lt;br /&gt;one chip driving the pins,&lt;br /&gt;the other comparing what it would drive, were it not configured in FRC checker mode,&lt;br /&gt;to what is actually being driven.&lt;br /&gt;If a difference is detected, an error is asserted.&lt;br /&gt;&lt;br /&gt;[[TMR]] - [[three module redundancy]] - three processors, voting to choose outcome.&lt;br /&gt;The loser may be deactivated, failing down to [[FRC]].&lt;br /&gt;&lt;br /&gt;[[QMR]] - [[quad module redundancy]] - usually 2 [[FRC]] pairs. &amp;nbsp;NOT a voting scheme.&lt;br /&gt;One [[master-checker]] pair is designated active, and its outputs are actually used.&lt;br /&gt;The other [[master-checker]] pair is designated inactive.&lt;br /&gt;If the active pair exhibits a difference,&lt;br /&gt;it is failed, and the other pair continues the computation.&lt;br /&gt;&lt;br /&gt;The inactive pair follows the computation so that its state will be "hot".&lt;br /&gt;It probably makes sense to compare the inactive pair's results to the active pair's results,&lt;br /&gt;although if there is such a difference between the pairs&lt;br /&gt;but not within the pairs, it is not clear which can be trusted.&lt;br /&gt;&lt;br /&gt;---&lt;br /&gt;&lt;br /&gt;[[TMR]] and other voting schemes require somewhat challenging external voting logic.&lt;br /&gt;&lt;br /&gt;[[FRC]] [[master-checker]] pairs require much less external logic: most logic is within the CPU chip.&lt;br /&gt;&lt;br /&gt;[[QMR]] is built out of [[FRC]] [[master-checker]] pairs.&lt;br /&gt;It requires no 
voting logic.&lt;br /&gt;The comparison logic is within the chip, as in [[FRC]] pairs.&lt;br /&gt;You might imagine needing external logic to select which pair's outputs should be used;&lt;br /&gt;however, this may not be necessary&lt;br /&gt;if you trust the internal logic of an FRC pair to disable its outputs.&lt;br /&gt;I.e. if asserting FRCERR from a checker can reliably disable the master's outputs, then no external logic may be needed.&lt;br /&gt;&lt;br /&gt;However, such multiple-drivers-per-signal configurations are now deprecated (circa 2010),&lt;br /&gt;so external muxes may be necessary.&lt;br /&gt;&lt;br /&gt;---&lt;br /&gt;&lt;br /&gt;Above we have been talking about FRC/TMR/QMR between chips.&lt;br /&gt;However, it can be applied to any logic block, potentially within the same chip&lt;br /&gt;(although then chip failures might corrupt both).&lt;br /&gt;&lt;br /&gt;Similarly, we have talked about doing FRC/TMR/QMR RAS for processors,&lt;br /&gt;but these techniques can be applied to non-processor logic.&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-8826676233605861038?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://semipublic.comp-arch.net/wiki/Processor_redundancy' title='Processor redundancy: FRC/TMR/QMR RAS'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/8826676233605861038/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=8826676233605861038' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/8826676233605861038'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/2425290326823263574/posts/default/8826676233605861038'/><link rel='alternate' type='text/html' href='http://andyglew.blogspot.com/2011/01/processor-redundancy-frctmrqmr-ras.html' title='Processor redundancy: FRC/TMR/QMR RAS'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-6348959074825586888</id><published>2011-01-16T16:26:00.003-08:00</published><updated>2011-01-16T16:26:52.387-08:00</updated><title type='text'>ublic_comp-arch_wiki_shut_down_because_of_attack_corruption</title><content type='html'>The title pretty much says it all:&lt;br /&gt;the public comp-arch wiki was shut down because of an attack&lt;br /&gt;that caused corruption&lt;br /&gt;- specifically, pages redirecting to what can only be assumed to be malware-infested websites.&lt;br /&gt;&lt;br /&gt;See [[WikiAdminLog: Description of attack on wiki.public.comp-arch.net, November 2011]].&lt;br /&gt;&lt;br /&gt;My hope has long been to have a wiki site for computer architecture discussions.&lt;br /&gt;I set up a semipublic area, writeable by me and readable by the world,&lt;br /&gt;and a public area, read/write by the world&lt;br /&gt;(with whatever security mediawiki provides, e.g. 
Captchas).&lt;br /&gt;&lt;br /&gt;The public site has been attacked twice.&lt;br /&gt;Can't really say that the security was broken,&lt;br /&gt;just that the attackers or spammers took advantage &amp;nbsp;of the openness of wiki.&lt;br /&gt;&lt;br /&gt;Shutting it down.&lt;br /&gt;Jan 16, 2011.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;http://wiki.public.comp-arch.net/ - public wiki - now shut down&lt;br /&gt;&lt;br /&gt;semipublic wiki still okay:&lt;br /&gt;http://comp-arch.net&lt;br /&gt;http://semipublic.comp-arch.net&lt;br /&gt;https://www.semipublic.comp-arch.net/wiki/index.php?title=Main_Page&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Description of attack&lt;br /&gt;https://www.semipublic.comp-arch.net/wiki/WikiAdminLog:_Description_of_attack_on_wiki.public.comp-arch.net,_November_2011&lt;br /&gt;&lt;br /&gt;This page&lt;br /&gt;https://www.semipublic.comp-arch.net/wiki/WikiAdmin:_public_comp-arch_wiki_shut_down_because_of_attack_corruption&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2425290326823263574-6348959074825586888?l=andyglew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='https://www.semipublic.comp-arch.net/wiki/WikiAdmin:_public_comp-arch_wiki_shut_down_because_of_attack_corruption' title='ublic_comp-arch_wiki_shut_down_because_of_attack_corruption'/><link rel='replies' type='application/atom+xml' href='http://andyglew.blogspot.com/feeds/6348959074825586888/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2425290326823263574&amp;postID=6348959074825586888' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/6348959074825586888'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2425290326823263574/posts/default/6348959074825586888'/><link rel='alternate' type='text/html' 
href='http://andyglew.blogspot.com/2011/01/ubliccomp-archwikishutdownbecauseofatta.html' title='ublic_comp-arch_wiki_shut_down_because_of_attack_corruption'/><author><name>Andy "Krazy" Glew</name><uri>http://www.blogger.com/profile/08442494949914217568</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://4.bp.blogspot.com/_hCuRWttJDHY/SqMKSXNl_pI/AAAAAAAAAC4/I9YJzgA6xxg/S220/AndyGlew-as-RomanEmperor.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2425290326823263574.post-2548270507761430875</id><published>2011-01-15T22:51:00.001-08:00</published><updated>2011-01-15T22:51:59.921-08:00</updated><title type='text'>Dynamic Instruction Rewriting</title><content type='html'>http://semipublic.comp-arch.net/wiki/Dynamic_instruction_rewriting&lt;br /&gt;&lt;br /&gt;[[Dynami
