<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Scali&#039;s blog</title>
	<atom:link href="http://scalibq.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://scalibq.wordpress.com</link>
	<description>Programming, graphics, hardware, maths, and that sort of thing</description>
	<lastBuildDate>Thu, 23 Feb 2012 01:45:47 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='scalibq.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Scali&#039;s blog</title>
		<link>http://scalibq.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://scalibq.wordpress.com/osd.xml" title="Scali&#039;s blog" />
	<atom:link rel='hub' href='http://scalibq.wordpress.com/?pushpress=hub'/>
		<item>
		<title>CPUs and pipelines, how do they work?</title>
		<link>http://scalibq.wordpress.com/2012/02/19/cpus-and-pipelines-how-do-they-work/</link>
		<comments>http://scalibq.wordpress.com/2012/02/19/cpus-and-pipelines-how-do-they-work/#comments</comments>
		<pubDate>Sun, 19 Feb 2012 22:20:05 +0000</pubDate>
		<dc:creator>Scali</dc:creator>
				<category><![CDATA[Software development]]></category>
		<category><![CDATA[optimize]]></category>
		<category><![CDATA[C]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[tutorial]]></category>
		<category><![CDATA[learning]]></category>
		<category><![CDATA[beginner]]></category>
		<category><![CDATA[assembly]]></category>
		<category><![CDATA[ASM]]></category>
		<category><![CDATA[CPU]]></category>
		<category><![CDATA[pipeline]]></category>
		<category><![CDATA[superscalar]]></category>
		<category><![CDATA[in-order]]></category>
		<category><![CDATA[out-of-order]]></category>
		<category><![CDATA[ooo]]></category>
		<category><![CDATA[oooe]]></category>
		<category><![CDATA[execution]]></category>

		<guid isPermaLink="false">http://scalibq.wordpress.com/?p=591</guid>
		<description><![CDATA[After the tutorial on C that I re-published recently, it is now time for another old article that is still remarkably educational and useful today. It describes the internals of the Pentium and Pentium Pro CPUs. Ironically enough, today&#8217;s Intel &#8230; <a href="http://scalibq.wordpress.com/2012/02/19/cpus-and-pipelines-how-do-they-work/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>After the <a title="A tutorial on programming C" href="http://scalibq.wordpress.com/2012/02/09/a-tutorial-on-programming-c/">tutorial on C</a> that I re-published recently, it is now time for another old article that is still remarkably educational and useful today.</p>
<p>It describes the internals of the Pentium and Pentium Pro CPUs. Ironically enough, today&#8217;s Intel CPUs are still remarkably similar to these oldies. The Atom is a superscalar two-pipeline in-order CPU, closely related to the classic Pentium. The Core architectures are out-of-order execution CPUs, which still work very similarly to the Pentium Pro I described. Likewise, AMD&#8217;s x86 CPUs are all out-of-order execution CPUs as well, and work according to the same principles.</p>
<p>An execution pipeline is basically a conveyor belt for instructions. Instructions pass through various stages, and at each stage a part of the execution of the instruction is done. Normally each stage will take 1 clock cycle (clk), that is, an instruction will stay in a stage for 1 clk, and then proceed on to the next. Basically there are 3 main operations:</p>
<ul>
<li>Fetch: Load instruction from memory.</li>
<li>Decode: Set up operands (load from memory if necessary), and find out what the instruction is supposed to do. Basically prepare to dispatch operands to the correct logic unit, for execution.</li>
<li>Execute: Take the operands, dispatch to the correct logic unit, perform the desired operation, and store the result.</li>
</ul>
<p>A simplified pipeline could have these three stages: fetch &#8211; decode &#8211; execute. The instruction passes through them in that sequence:</p>
<pre>        Fetch -&gt; Decode -&gt; Execute
       +--------+--------+--------+
clk 0: | INSTR  |        |        |
       +--------+--------+--------+

       +--------+--------+--------+
clk 1: |        | INSTR  |        |
       +--------+--------+--------+

       +--------+--------+--------+
clk 2: |        |        | INSTR  |
       +--------+--------+--------+</pre>
<p>Now if the instruction is being fetched, it cannot be decoded or executed yet. Does this mean that only 1 stage of the pipeline is active, while the rest is idle? The answer is no. Why not? Because while you execute the first instruction, you can decode the second, and fetch the third at the same time. The different stages do not share any hardware resources, so each stage can work independently of the others, at the same time:</p>
<pre>        Fetch -&gt; Decode -&gt; Execute
        +--------+--------+--------+
 clk 0: | INSTR0 |        |        |
        +--------+--------+--------+

        +--------+--------+--------+
 clk 1: | INSTR1 | INSTR0 |        |
        +--------+--------+--------+

        +--------+--------+--------+
 clk 2: | INSTR2 | INSTR1 | INSTR0 |
        +--------+--------+--------+</pre>
<p>So the stages can work in parallel on sequential instructions. This in effect reduces the execution time of a sequence of instructions. If you have for example a piece of code of 10 instructions, it doesn&#8217;t take 10 x 3 = 30 clks, but instead, it takes 2 clks for the first instruction to reach the execute stage, and from there on, at every clk a new instruction arrives at the execute stage, so you execute the 10 instructions in 10 clks. That&#8217;s a total of only 2 + 10 = 12 clks. There is a sort of &#8216;telescoping&#8217;, or &#8216;waterfall&#8217; effect.</p>
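<p>The arithmetic above can be written down as a small sketch. This is only an idealized model: it assumes every stage takes exactly 1 clk and that there are no stalls or flushes. The function names are made up for this illustration.</p>
<pre>def pipelined_clks(num_instructions, num_stages=3):
    """Clks until the last instruction has passed the final stage of an ideal pipeline."""
    if num_instructions == 0:
        return 0
    # The first instruction needs (num_stages - 1) clks to reach the last stage.
    # From there on, one instruction completes on every clk.
    return (num_stages - 1) + num_instructions


def unpipelined_clks(num_instructions, num_stages=3):
    """Clks when each instruction has to pass all stages before the next one starts."""
    return num_instructions * num_stages


print(pipelined_clks(10))    # 12
print(unpipelined_clks(10))  # 30</pre>
<p>The longer the sequence of instructions, the less the 2 clks of startup cost matter, and the closer you get to 1 instruction per clk.</p>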
<p>The different instructions in the pipeline can be dependent however. It could be that the instruction that is being decoded, requires an operand that is the result of the instruction being executed. To solve this problem, operand forwarding was introduced. This means that the decode stage does not have to wait for the result to be available. The result is forwarded back into the pipeline as a new operand, so execution can continue without &#8220;stalling&#8221; (waiting):</p>
<p>Consider an add instruction of the form: add destination, source1, source2</p>
<pre>INSTR0: add r0, r1, r2
INSTR1: add r4, r0, r3

        Fetch -&gt; Decode -&gt; Execute
       +--------+--------+--------+
clk 0: | INSTR2 | INSTR1 | INSTR0 |-&gt;-+
       +--------+--------+--------+   |
                                      |
                         +------&lt;-----+ r0
                         |
       +--------+--------+--------+
clk 1: | INSTR3 | INSTR2 | INSTR1 |
       +--------+--------+--------+</pre>
<p>Another problem arises when a jump is made. When you jump to a different location, the instructions that were already in the pipeline are now invalid. So the pipeline needs to be flushed, and start over at the right location. With unconditional jumps, modern CPUs will spot the jumps in time, and continue fetching instructions at the new location right away, without stalling. With conditional jumps however, this is not possible. It is not clear whether the jump should be taken or not until the condition has been tested, which at worst is done by the last instruction before the conditional jump in the pipeline. So instead of waiting, a modern CPU will try to predict whether the jump will be taken or not. It judges from a hint in the code, and from previous iterations. The usual rule for the hint is: if the jump is to a lower address, it is probably a loop, so take the jump; else, do not take it. Some CPUs even have hint bits inside the opcode to indicate the behaviour. For the previous iterations, the CPU stores whether the jump was taken or not on the last few occasions. With this strategy, the code can usually continue without stalling. Only when the prediction turns out to be wrong will the pipeline have to be flushed, and execution will stall for a few clks until the first instruction has reached the execution stage again.</p>
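<p>To make the prediction strategy a bit more concrete, here is a toy model of it. It is a sketch under assumptions, not how any particular CPU does it: the class name, the history length of 4, the majority vote and the addresses are all invented for this example. With no history it falls back to the static rule (a jump to a lower address is taken), otherwise it follows the majority of the last few outcomes.</p>
<pre>from collections import deque


class BranchPredictor:
    """Toy predictor: static rule first, then a vote over the last few outcomes."""

    def __init__(self, history_length=4):
        self.history_length = history_length
        self.history = {}  # branch address -: recent outcomes (True means taken)

    def predict(self, branch_addr, target_addr):
        outcomes = self.history.get(branch_addr)
        if not outcomes:
            # No history yet. A jump to a lower address is probably a loop,
            # so predict taken when the target is not the larger of the two.
            return target_addr != max(target_addr, branch_addr)
        taken = outcomes.count(True)
        not_taken = outcomes.count(False)
        # Predict taken when taken outcomes are at least as common as not-taken ones.
        return max(taken, not_taken) == taken

    def update(self, branch_addr, taken):
        outcomes = self.history.setdefault(
            branch_addr, deque(maxlen=self.history_length))
        outcomes.append(taken)


# A loop of 10 iterations: the jump back is taken 9 times, then falls through.
predictor = BranchPredictor()
mispredictions = 0
for iteration in range(10):
    actual = iteration != 9
    if predictor.predict(0x120, 0x100) != actual:
        mispredictions += 1
    predictor.update(0x120, actual)

print(mispredictions)  # 1</pre>
<p>Only the final iteration, where the loop exits, is mispredicted. That is the one case where the pipeline would have to be flushed.</p>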
<p>A superscalar CPU is a CPU which has more than one execution pipeline, working in parallel. So you can execute more than one instruction per clk. However, operand forwarding cannot solve all dependency problems anymore. It could be that there are 2 instructions being executed in parallel, but the second one requires the result of the first one. The CPU will then have to put the second pipeline on hold. It can only execute the first instruction, then forward the result into the second pipeline. Then the second pipeline can continue executing on the next clk. We refer to this as a stall in the second pipeline. The penalty was 1 clk. It is the responsibility of the programmer to avoid these stalls.</p>
<p>What happens when the clockspeed goes up? All stages in the pipeline consist of sequences of logic gates. A gate has a certain propagation time: the time it takes to settle to a consistent state after a change of input signals. So the sequence of gates in a pipeline stage can have at most a propagation time of 1 clk. As clockspeed goes up, the time that 1 clk takes goes down, so the maximum length of the logic sequence in a pipeline stage decreases as well. There are 2 solutions to this problem:</p>
<ul>
<li>Reduce the logic required to fetch, decode and execute instructions</li>
<li>Split the pipeline up in more stages</li>
</ul>
<p>In order to reduce the logic required to fetch and decode instructions, the instructions should be encoded in a less complex fashion. Reduced Instruction Set Computing (RISC) is a CPU design philosophy which aims for small and simple instructionsets. Complex instructions which can be replaced by a sequence of 1 or more simple instructions without significantly increasing total execution time, will not be implemented in hardware. Instructions should also be encoded in an orthogonal (uniform) manner, to reduce decoding complexity. They should all be the same size, and layout. The old paradox of &#8220;less is more&#8221; is a nice description of RISC. To give a name to the previous, more complex designs, the name Complex Instruction Set Computing (CISC) was chosen. These more complex designs date from an earlier age, when memory was small and expensive, and clock speeds were low. There was no pipelining yet, let alone superscalar designs. Code was mostly written by hand, rather than by compilers. So the designs were made to minimize the code footprint, and maximize the operations per cycle. This meant a lot of instructions with implicit operands, and performing common sequences of operations.</p>
<p>A few results of this philosophy are that virtually all instructions can only operate on registers, and virtually all instructions have the same number of operands. Loading registers from memory, or storing to memory is done with a few special instructions. At the time of RISC implementation, manufacturers were able to put millions of gates on a chip, but because of RISC, far fewer were needed than ever before. The extra gates could now be used for more registers, extra cache or other features.</p>
<p>Splitting the pipeline up in order to increase clockspeed is a poor man&#8217;s solution. Namely, if there are interlocks because of dependent instructions for example, the penalties can increase from 1 to several stages. But, what if you cannot redesign your instructionset, because you want to keep supporting legacy code?</p>
<p>There are again 2 solutions:</p>
<ul>
<li>Design a new RISC CPU, and support legacy code with a software emulation layer.</li>
<li>Design a new RISC CPU, and support legacy code with a hardware emulation layer.</li>
</ul>
<p>It is interesting to see that Motorola chose the first option, when going from 68k to PowerPC. The Apple computers were powered by 68k CPUs. The new PowerPC-powered machines had 68k emulation built into the OS. At that time, the choice was not such a bad one. The new RISC CPU could be manufactured at much higher clockspeeds, and with more cache and more registers. This gave a leap in performance, which could make emulated legacy code perform acceptably, while native code would run at the highest possible speed.</p>
<p>Motorola did however release a 68060, which was designed with the second option. This was the same choice that Intel made for their x86 family. The instructionset was not redesigned totally. They did use some ideas taken from the RISC philosophy however. In short, their CPUs now translate the complex x86 instructions internally, and then issue them to an execution pipeline which could be seen as a sort of RISC CPU. Since the Pentium PRO, it executes what Intel calls &#8220;micro-operations&#8221; or µOps (Pentium PRO, Celeron, PII and PIII all have the same P6 core). And surely, some of the instructions which would have been removed in a RISC instructionset are now constructed from a sequence of 2 or more µOps. The Pentium has no names for the &#8216;atomic operations&#8217; that it builds the instructions from&#8230; But it clearly does just that. Their pipeline is still longer than that of a true RISC CPU, but it is shorter than it would be if all instructions were to be decoded and executed directly, rather than translated to µOps. Intel managed to keep the implementation of a complex instructionset within reasonable bounds, and managed to get the clockspeed up to impressive rates. At the time of writing they have a Pentium 4 CPU out, which runs at 1.5 GHz. This makes the Pentium 4 the highest clocked CPU currently available. The high clockspeed comes at a cost however. The instruction throughput is relatively low, because the pipeline has a lot of stages: at 1.5 GHz, the stages have to be very short. On top of that, its decoding scheme is still more complex than that of CPUs with a RISC instructionset.</p>
<p>But let&#8217;s look at the x86 family more closely. Because at the backend there is the RISC-like µOp-execution unit, the CPU also gets RISC-like traits. What does this mean? The translation from x86 instructions to µOps is not always an easy one. Only a part of the x86 instructionset can be considered orthogonal. But the x86 instructions have complex addressing modes. Intel decided to map register-only operations 1:1 to µOps, and add additional µOps for the complex addressing modes. Pentium PRO can handle simple addressing modes ([reg]) in 1 µOp forms as well.</p>
<p>A few Pentium examples:</p>
<pre>add eax, edx</pre>
<p>is 1 clk.</p>
<pre>add eax, [edx]</pre>
<p>will translate to:</p>
<pre>mov tmp, [edx]
add eax, tmp</pre>
<p>where tmp is a temporary internal register. Takes 2 clks.</p>
<pre>add [eax], edx</pre>
<p>is three operations on Pentium:</p>
<pre>mov tmp, [eax]
add tmp, edx
mov [eax], tmp</pre>
<p>This is exactly the style in which RISC code is written, since add would only be able to operate on registers, not on memory directly. So in short, the CPU seems to translate ordinary x86 code to RISC code. The downside to having the CPU translate your code, is that the pipeline will have to wait for the sequence of instructions to finish, before issuing the next pair of instructions. When you write the RISC-like code yourself, you are working with 1 clk operations only, and you can get 2 independent instructions through per clk. A few instructions are even more interesting on Pentium, because translating them by hand will actually make them faster.</p>
<p>For example:</p>
<pre>cdq</pre>
<p>takes 3 clks.</p>
<pre>mov edx, eax
sar edx, 31</pre>
<p>takes 2 clks.</p>
<pre>stosd</pre>
<p>takes 2 clks.</p>
<pre>mov [edi], eax
add edi, 4</pre>
<p>takes 1 clk.</p>
<pre>loop label</pre>
<p>takes 5 clks.</p>
<pre>dec ecx
jnz label</pre>
<p>takes 1 clk.</p>
<p>Basically there are a few extra rules for instructions to execute in 1 clk on Pentium. The Pentium has 2 pipelines, but they are not identical: the primary pipeline is called U, the secondary pipeline is called V. Some instructions can only be executed in U (e.g. shift/rotate instructions), others only in V (e.g. call/jump instructions). There are also some instructions which will not pair at all. They will only execute in U, and V will stall. The decoders also have limits. Each pipeline can decode instructions up to 7 bytes per clk. If a longer instruction is encountered, there will be a stall. Finally, the Address Generation Unit (AGU) is a stage earlier in the pipeline. What this means is that if you have an instruction that has to address memory with the result of a previous instruction, then it will have to stall for 1 clk. Namely, the operand has to be forwarded not one, but 2 stages back. This is known as the Address Generation Interlock (AGI) stall:</p>
<pre>    add eax, ebx
    mov edx, [eax]

        Decode -&gt;  AGU -&gt; Execute
       +--------+--------+--------+
clk 0: | INSTR2 | INSTR1 | INSTR0 |-&gt;-+
       +--------+--------+--------+   |
                                      |
                +----------&lt;----------+ eax
                |
       +--------+--------+--------+
clk 1: | INSTR2 | INSTR1 |        |
       +--------+--------+--------+</pre>
<p>We say the latency of the instruction is 2 clks. The result has to be available 2 clks ahead. Most instructions have a latency of 1 clk. Operand forwarding can avoid stalls on 1 clk latency instructions, as long as there are no 2 dependent instructions in the pipeline at the same time, as we&#8217;ve seen earlier.</p>
<p>The Pentium PRO (and other P6 core CPUs) is a completely different beast however. Instead of translating in-place, as the Pentium does, the Pentium PRO decodes the x86 instructions into µOps and stores them in a buffer. A scheduling unit (Reservation Station) will then dispatch µOps to the units as the operands and the unit become available. This means it can execute instructions out-of-order (ooo). The results are then stored in a reorder buffer (ROB), so the result is guaranteed to be equivalent to the original sequence of instructions. In other words, the out-of-order execution is transparent to the program. Why is this interesting? A stall occurring in the early part of the pipeline will not affect the backend. The decoder might not issue new µOps during the stall, but while there are still µOps in the ooo buffer, the backend can continue execution. The ooo execution can also minimize the cost of latencies/stalls. While 1 µOp has to wait on a previous one to complete, the CPU can still dispatch other µOps which are not dependent, even if they were originally sequenced after the dependent instruction.</p>
<p>The Pentium PRO has 3 decoders, d0, d1 and d2. Decoder d0 can decode an instruction with a maximum length of 7 bytes, and a maximum complexity of 4 µOps per clk. Decoders d1 and d2 can decode an additional instruction of 1 µOp per clk each. So the ooo buffer can be filled with up to 6 µOps per clk.</p>
<p>The ooo execution pipeline has 5 different ports, port 0 through 4. Each port handles a specific class of µOps. Port 0 handles integer and floating point arithmetic, and address generation operations. Port 1 handles simple integer arithmetic instructions (not shift, multiply or divide). Port 2 handles memory load operations. Port 3 handles the calculation of memory write addresses. Port 4 handles memory write operations.</p>
<p>Each port can take 1 µOp per clk. This means that the CPU can dispatch a maximum of 5 (independent) µOps per clk. Since the decoders decode into the ooo buffer, they are now no longer dependent on the execution backend. This means that the decoders no longer have to stall when the execution takes more than 1 clk. In fact, the ports on a PPRO themselves are pipelined as well. Which means that even multiple-clk instructions such as multiply can be issued every clk, and they pass through the stages subsequently. With PPRO, you don&#8217;t specify execution speed of instructions by simply counting the clks, instead, you specify latency and throughput. For example, mul is specified as: latency 4 clks, throughput 1 per clk. This means that when we issue a mul, it takes 4 clks to complete in total. But you can issue 1 mul per clk. If you issue 2 subsequent muls, then the second one will follow one clk after the first. So we see the same &#8216;telescoping&#8217; or &#8216;waterfall&#8217; effect&#8230;</p>
<pre>1 mul takes 4 clks
2 muls take 5 clks
3 muls take 6 clks
etc.</pre>
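<p>The same numbers follow from a simple formula. The sketch below assumes the operations are independent of each other, so that a new one can be issued every clk. A chain of muls that each need the result of the previous one would not overlap like this. The function name and parameters are chosen for this example only.</p>
<pre>def total_clks(count, latency=4, issue_interval=1):
    """Clks to complete `count` independent operations on a pipelined unit."""
    if count == 0:
        return 0
    # The first operation takes the full latency. Every further one
    # finishes issue_interval clks after the one before it.
    return latency + (count - 1) * issue_interval


for count in (1, 2, 3):
    print(count, total_clks(count))  # 1 4, then 2 5, then 3 6</pre>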
<p>After execution, the µOps wait in the ROB for &#8220;retirement&#8221;. The retire stage will update all operands (register or memory) in order, at a maximum of 3 µOps per clk. It will also take care of the operand forwarding.</p>
<p>This rather complex and very interesting execution scheme shifts the focus from instruction scheduling to issuing µOps to the ooo backend. Scheduling instructions for dependency is now less important, because the CPU can schedule the µOps itself. There should not be so many dependencies that the ooo-buffer fills up faster than the backend can handle, but not each and every dependency results directly in a stall anymore. The ooo buffer acts like a cushion for dependencies and latencies.</p>
<p>You could see the CPU as a funnel for instructions&#8230; You can add up to 6 µOps per clk, then it can dispatch up to 5 µOps, and finally retire up to 3 µOps per clk. When writing code for PPRO CPUs, the main focus should be on using as few µOps as possible, rather than getting as many µOps per clk into the ooo buffer. Decoding them efficiently is important as well. Instructions that consist of more than 1 µOp should be scheduled to go into d0. Try to get both d1 and d2 to decode an instruction as well.</p>
<p>One part we&#8217;ve not touched yet, is the PPRO&#8217;s register renaming scheme. While on the outside it appears that the PPRO has only 8 integer registers, it has in fact 40 temporary registers inside. It uses these temporary registers internally, for a special kind of dependency such as this one:</p>
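<p>The funnel can be put into numbers with a rough sketch. It takes the per-clk limits from the text and assumes that the sustained rate cannot exceed the narrowest stage. The result is only a lower bound on the number of clks: it ignores dependencies, latencies and the time to fill the pipeline. The names are made up for this example.</p>
<pre>DECODE_UOPS_PER_CLK = 6    # decoders d0 + d1 + d2: 4 + 1 + 1
DISPATCH_UOPS_PER_CLK = 5  # one per port, ports 0 through 4
RETIRE_UOPS_PER_CLK = 3    # retirement from the ROB


def sustained_uops_per_clk():
    """The narrowest part of the funnel limits the sustained rate."""
    return min(DECODE_UOPS_PER_CLK, DISPATCH_UOPS_PER_CLK, RETIRE_UOPS_PER_CLK)


def minimum_clks(num_uops):
    """Lower bound on clks for num_uops, rounded up to whole clks."""
    rate = sustained_uops_per_clk()
    return -(-num_uops // rate)  # ceiling division


print(sustained_uops_per_clk())  # 3
print(minimum_clks(30))          # 10
print(minimum_clks(31))          # 11</pre>
<p>This is why the number of µOps matters more than how fast they can be decoded: every µOp has to go through retirement sooner or later.</p>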
<pre>mov eax, [a]
mul [b]
mov [c], eax
mov eax, [d]</pre>
<p>We saw earlier that a mul takes 4 clks. If the CPU used the physical registers directly, then the last mov would have to wait until the previous store operation was dispatched. But instead the first three instructions get one temporary register assigned for eax, and the last instruction gets a new temporary register assigned again. The eax register has been &#8216;renamed&#8217; to a temporary register. Now the load operation can be dispatched at any time, it does not have to wait for the previous instructions to finish. The CPU can rename 3 registers per clk. Each temporary register has its own dependency chain. Try to keep these chains short, so that the CPU can allocate new temporary registers early, which will reduce dependencies, and allow the CPU to dispatch more independent µOps. (Note: the old tricks of &#8216;xor reg, reg&#8217; or &#8216;sub reg, reg&#8217; to set a register to 0 will not be recognized as independent. A new independent chain will not be created.)</p>
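<p>The idea of renaming can be illustrated with a small sketch. It is a simplification and not the actual PPRO mechanism: here every write to a register gets a fresh temporary, whereas the description above groups the first three instructions under one temporary for eax. The function, the t0, t1&#8230; names and the way instructions are encoded as (written registers, read registers) are all invented for this example.</p>
<pre>def rename(instructions):
    """Map architectural registers to temporaries.

    instructions: list of (written_regs, read_regs) tuples.
    Returns the same list with every register replaced by a temporary.
    """
    current = {}  # architectural register -: temporary holding its latest value
    next_temp = 0
    renamed = []
    for written, read in instructions:
        # Reads use whichever temporary currently holds the register.
        renamed_read = [current.get(reg, reg) for reg in read]
        renamed_written = []
        for reg in written:
            # Each write gets a fresh temporary, which starts a new chain.
            current[reg] = "t%d" % next_temp
            next_temp += 1
            renamed_written.append(current[reg])
        renamed.append((renamed_written, renamed_read))
    return renamed


program = [
    (["eax"], []),              # mov eax, [a]
    (["eax", "edx"], ["eax"]),  # mul [b]  (reads eax, writes edx:eax)
    ([], ["eax"]),              # mov [c], eax
    (["eax"], []),              # mov eax, [d]
]

for written, read in rename(program):
    print(written, read)</pre>
<p>The last instruction ends up writing a temporary that none of the earlier instructions read or write, so it no longer depends on them and can be dispatched early.</p>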
]]></content:encoded>
			<wfw:commentRss>http://scalibq.wordpress.com/2012/02/19/cpus-and-pipelines-how-do-they-work/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2547913ebf910c8aa2c632619be46e93?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">scalibq</media:title>
		</media:content>
	</item>
		<item>
		<title>The myth of CMT (Cluster-based Multithreading)</title>
		<link>http://scalibq.wordpress.com/2012/02/14/the-myth-of-cmt-cluster-based-multithreading/</link>
		<comments>http://scalibq.wordpress.com/2012/02/14/the-myth-of-cmt-cluster-based-multithreading/#comments</comments>
		<pubDate>Tue, 14 Feb 2012 11:18:14 +0000</pubDate>
		<dc:creator>Scali</dc:creator>
				<category><![CDATA[Hardware news]]></category>
		<category><![CDATA[AMD]]></category>
		<category><![CDATA[CMT]]></category>
		<category><![CDATA[fake]]></category>
		<category><![CDATA[HyperThreading]]></category>
		<category><![CDATA[nonsense]]></category>
		<category><![CDATA[SMT]]></category>
		<category><![CDATA[x86]]></category>

		<guid isPermaLink="false">http://scalibq.wordpress.com/?p=580</guid>
		<description><![CDATA[The first time I heard someone use the term &#8216;CMT&#8217;, I was somewhat surprised. Was there a different kind of CPU multithreading technology that I somehow missed? But when I looked it up, things became quite clear. If you google &#8230; <a href="http://scalibq.wordpress.com/2012/02/14/the-myth-of-cmt-cluster-based-multithreading/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>The first time I heard someone use the term &#8216;CMT&#8217;, I was somewhat surprised. Was there a different kind of CPU multithreading technology that I somehow missed? But when I looked it up, things became quite clear. If you google the term, you&#8217;ll mainly land on AMD marketing material, explaining &#8216;cluster-based multithreading&#8217; (or sometimes also &#8216;clustered multithreading&#8217;):</p>
<p><a href="http://scalibq.files.wordpress.com/2012/02/3663732_9bc35365d1_l.png"><img class="aligncenter size-full wp-image-581" title="AMD Multithreading" src="http://scalibq.files.wordpress.com/2012/02/3663732_9bc35365d1_l.png?w=640&#038;h=480" alt="" width="640" height="480" /></a></p>
<p>This in itself is strange, as one page you will also find is this: <a href="http://dl.acm.org/citation.cfm?id=640477.640525">http://dl.acm.org/citation.cfm?id=640477.640525</a></p>
<blockquote><p>Triggered by the ever increasing advancements in processor and networking technology, a cluster of PCs connected by a high-speed network has become a viable and cost-effective platform for the execution of computation intensive parallel multithreaded applications.</p></blockquote>
<p>So apparently the term &#8216;cluster-based multithreading&#8217; has been used before AMD&#8217;s CMT, and is a lot less confusing: it just speaks of conventional clustering of PCs to build a virtual supercomputer.</p>
<p>So CMT is just an &#8216;invention&#8217; by AMD&#8217;s marketing department. They invented a term that sounds close to SMT (<a href="http://en.wikipedia.org/wiki/Simultaneous_multithreading">Simultaneous Multithreading</a>), in an attempt to compete with Intel&#8217;s HyperThreading. Now clearly, HyperThreading is just a marketing-term as well, but it is Intel&#8217;s term for their implementation of SMT, which is a commonly accepted term for a multithreading approach in CPU design, and was in use long before Intel implemented HyperThreading (IBM started researching it in 1968, to give you an idea of the historical perspective here).</p>
<p>Now the problem I have with CMT is that people are actually buying it. They seem to think that CMT is just as valid a technology as SMT. And worse, they think that the two are closely related, or even equivalent. As a result, they are comparing CMT with SMT in benchmarks, as I found in this Anandtech review a few days ago: <a href="http://www.anandtech.com/show/5279/the-opteron-6276-a-closer-look/6">http://www.anandtech.com/show/5279/the-opteron-6276-a-closer-look/6</a></p>
<blockquote><p>AMD claimed more than once that Clustered Multi Threading (CMT) is a much more efficient way to crunch through server applications than Simultaneous Multi Threading (SMT), aka Hyper-Threading (HTT).</p></blockquote>
<p>Now, I have a problem with comparisons like these&#8230; Let&#8217;s compare the benchmarked systems here: <a href="http://www.anandtech.com/show/5279/the-opteron-6276-a-closer-look/2">http://www.anandtech.com/show/5279/the-opteron-6276-a-closer-look/2</a></p>
<p>Okay, so all systems have two CPUs. So let&#8217;s look at the CPUs themselves:</p>
<ul>
<li>Opteron 6276: 8-module/16-thread, which has two Bulldozer dies of 1.2B transistors each, total 2.4B transistors</li>
<li>Opteron 6220: 4-module/8-thread, one Bulldozer die of 1.2B transistors</li>
<li>Opteron 6174: 12-core/12-thread, which has two dies of 0.9B transistors each, total 1.8B transistors</li>
<li>Xeon X5650: 6-core/12-thread, 1.17B transistors</li>
</ul>
<p>Now, it&#8217;s obvious where things go wrong here, by just looking at the transistor count: The Opteron 6276 is more than twice as large as the Xeon. So how can you have a fair comparison of the merits of CMT vs SMT? If you throw twice as much hardware at the problem, it&#8217;s bound to be able to handle more threads better. The chip is already at an advantage anyway, since it can handle 16 simultaneous threads, where the Xeon can only handle 12.</p>
<p>But if we look at the actual benchmarks, we see that the reality is different: AMD actually NEEDS those two dies to keep up with Intel&#8217;s single die. And even then, Intel&#8217;s chip excels in keeping response times short. The new CMT-based Opterons are not all that convincing compared to the smaller, older Opteron 6174 either, which can handle only 12 threads instead of 16, and just uses vanilla SMP for multithreading.</p>
<p>Let&#8217;s inspect things even closer&#8230; What are we benchmarking here? A series of database scenarios, with MySQL and MSSQL. This is integer code. Well, that *is* interesting. Because, what exactly was it that CMT did? Oh yes, it didn&#8217;t do anything special for integers! Each module simply has two dedicated integer cores. It is the FPU that is shared between two threads inside a module. But we are not using it here. Well, lucky AMD, best case scenario for CMT.</p>
<p>But let&#8217;s put that in perspective&#8230; Let&#8217;s have a simplified look at the execution resources, looking at the integer ALUs in each CPU.</p>
<p>The Opteron 6276 with CMT disabled has:</p>
<ul>
<li>8 modules</li>
<li>8 threads</li>
<li>4 ALUs per module</li>
<li>2 ALUs per thread (the ALUs can not be shared between threads, so disabling CMT disables half the threads, and as a result also half the ALUs)</li>
<li>16 ALUs in total</li>
</ul>
<p>With CMT enabled, this becomes:</p>
<ul>
<li>8 modules</li>
<li>16 threads</li>
<li>4 ALUs per module</li>
<li>2 ALUs per thread</li>
<li>32 ALUs in total</li>
</ul>
<p>So nothing happens, really. Since CMT doesn&#8217;t share the ALUs, it works exactly the same as the usual SMP approach. So you would expect the same scaling, since the execution units are dedicated per thread anyway. Enabling CMT just gives you more threads.</p>
<p>The Xeon X5650 with SMT disabled has:</p>
<ul>
<li>6 cores</li>
<li>6 threads</li>
<li>3 ALUs per core</li>
<li>3 ALUs per thread</li>
<li>18 ALUs in total</li>
</ul>
<p>With SMT enabled, this becomes:</p>
<ul>
<li>6 cores</li>
<li>12 threads</li>
<li>3 ALUs per core</li>
<li>3 ALUs per 2 threads, effectively ~1.5 ALUs per thread</li>
<li>18 ALUs in total</li>
</ul>
<p>So here the difference between CMT and SMT becomes quite clear: With single-threading, each thread has more ALUs with SMT than with CMT (3 against 2). With multithreading, each thread (effectively) has fewer ALUs with SMT than with CMT (~1.5 against 2).</p>
<p>And that&#8217;s why SMT works, and CMT doesn&#8217;t: AMD&#8217;s previous CPUs also had 3 ALUs per thread. But in order to reduce the size of the modules, AMD chose to use only 2 ALUs per thread now. It is a case of cutting off your nose to spite your face: CMT is struggling in single-threaded scenarios, compared to both the previous-generation Opterons and the Xeons.</p>
<p>At the same time, CMT is not actually saving a lot of die space: a module still contains 4 ALUs in total. And yes, obviously, when a module has more resources for its two threads, and the single-threaded performance is poor to begin with, you would expect it to scale better than SMT.</p>
<p>But what does CMT bring, effectively? Nothing. AMD&#8217;s chips are much larger than the competition&#8217;s, or even their own previous generation. And since the Xeon is so much better at single-threaded performance, it can stay ahead in heavily multithreaded scenarios, despite the fact that SMT does not scale as well as CMT or SMP. But the real advantage that SMT brings is that it is a very efficient solution: it takes up very little die space. Intel could do the same as AMD does, and put two dies in a single package. But that would result in a chip with 12 cores, running 24 threads, and it would absolutely devour AMD&#8217;s CMT in terms of performance.</p>
<p>So I&#8217;m not sure where AMD thinks that CMT is &#8216;more efficient&#8217;, since they need a much larger chip, which also consumes more power, to get the same performance as a Xeon, which is not even a high-end model. The Opteron 6276 tested by Anandtech is the top of the line. The Xeon X5650 on the other hand is a midrange model clocked at 2.66 GHz. The top model of that series is the X5690, clocked at 3.46 GHz. Which shows another advantage of smaller chips: better clockspeed scaling.</p>
<p>So, let&#8217;s not pretend that CMT is a valid technology, comparable to SMT. Let&#8217;s just treat it as what it is: a hollow marketing term. I don&#8217;t take CMT seriously, or people who try to use the term in a serious context, for that matter.</p>
]]></content:encoded>
			<wfw:commentRss>http://scalibq.wordpress.com/2012/02/14/the-myth-of-cmt-cluster-based-multithreading/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2547913ebf910c8aa2c632619be46e93?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">scalibq</media:title>
		</media:content>

		<media:content url="http://scalibq.files.wordpress.com/2012/02/3663732_9bc35365d1_l.png" medium="image">
			<media:title type="html">AMD Multithreading</media:title>
		</media:content>
	</item>
		<item>
		<title>A tutorial on programming C</title>
		<link>http://scalibq.wordpress.com/2012/02/09/a-tutorial-on-programming-c/</link>
		<comments>http://scalibq.wordpress.com/2012/02/09/a-tutorial-on-programming-c/#comments</comments>
		<pubDate>Thu, 09 Feb 2012 14:18:28 +0000</pubDate>
		<dc:creator>Scali</dc:creator>
				<category><![CDATA[Software development]]></category>
		<category><![CDATA[beginner]]></category>
		<category><![CDATA[C]]></category>
		<category><![CDATA[function]]></category>
		<category><![CDATA[learning]]></category>
		<category><![CDATA[pointer]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[struct]]></category>
		<category><![CDATA[tutorial]]></category>

		<guid isPermaLink="false">http://scalibq.wordpress.com/?p=573</guid>
		<description><![CDATA[Over the years I have written various things related to programming and hardware. They were published on websites that no longer exist and/or have disappeared into oblivion. I have found a few of them, which I think may still be &#8230; <a href="http://scalibq.wordpress.com/2012/02/09/a-tutorial-on-programming-c/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=scalibq.wordpress.com&amp;blog=16171350&amp;post=573&amp;subd=scalibq&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Over the years I have written various things related to programming and hardware. They were published on websites that no longer exist and/or have disappeared into oblivion. I have found a few of them, which I think may still be relevant today, so I will re-publish them here.</p>
<p>I will start with a tutorial on programming in C. I think programming in bare C is still relevant today, even with fancy languages based on C, such as C++, C#, Objective-C and Java. It pays to have a solid grasp of the basics in C, and to be fluent with pointer manipulation and such. It can help you to make your code simpler and more efficient in more modern languages as well.</p>
<p>C: The beginning<br />
&#8212;&#8212;&#8212;&#8212;&#8212;-</p>
<p>Well, C is a relatively simple language, in that it does not have too many<br />
language constructs. That&#8217;s because the language is relatively low-level,<br />
it&#8217;s quite close to the machine, so to speak. And the machine is a thing of<br />
anarchy and chaos. C was developed to bring some structure to this, without<br />
sacrificing too much performance or size.</p>
<p>We have:</p>
<pre>- Data declarations
- Functions
- Operators
- Flow control statements
- Type definitions</pre>
<p>That is what you have to work with. In short, code consists of functions,<br />
containing (flow control) statements, with operations on data of specific<br />
types.<br />
That may seem a bit too much to grasp at this point, but we&#8217;ll take it one<br />
step at a time.</p>
<p>Data<br />
&#8212;-</p>
<p>Of course we need data&#8230; like text or numbers. On data, we can perform<br />
operations, like adding, subtracting, multiplying and such.<br />
So first, let&#8217;s see how we can give our programs some data.</p>
<p>We have 2 kinds of data: initialized data, and uninitialized data. They are<br />
declared in much the same way, except that initialized data gets a value<br />
assigned at declaration, and uninitialized data does not. So the initial<br />
value of an uninitialized variable is undefined.</p>
<p>First you give the type of your data variable, then you give the name:</p>
<pre>char myChar;</pre>
<p>This is an uninitialized data variable. Initialized data works by assigning an<br />
initial value to the variable:</p>
<pre>char myChar = 'a';</pre>
<p>Well, char is just 1 primitive data type of C. I will give you the complete<br />
list and their sizes in memory here:</p>
<pre>- char          1 byte integer, also used for characters.
- int           2 or 4 bytes integer, depending on the system architecture.
- float         4 bytes floating point number.
- double        8 bytes floating point number.
- (pointers)    depending on the system architecture.</pre>
<p>As you see, some data types are dependent on the system. So to make things<br />
easier, I will choose the popular x86 system in 32 bit mode from now on.<br />
Note that other systems may vary.</p>
<p>There are also 2 &#8216;size modifier&#8217; directives:</p>
<pre>- short         2 bytes
- long          4 bytes</pre>
<p>These directives can be prefixed to int, and will determine the size. You<br />
can also omit the int, and just use long and short as if they were<br />
primitive types themselves.</p>
<p>So for 2 byte integers, both these declarations are correct:</p>
<pre>short int a;
short a;</pre>
<p>Similarly for 4 byte integers:</p>
<pre>long int a;
long a;</pre>
<p>And if there&#8217;s no size modifying directive for an int, the compiler will use<br />
the default size, which is platform-dependent.</p>
<p>On the x86 system in 32 bit mode, ints are as large as longs, 4 bytes, and so are pointers. Pointers are a<br />
special group of data types, which I will cover later.<br />
(In 16 bit realmode OSes, ints used to default to short, and pointers could<br />
be either near or far. This had to do with the segmented memory model on old<br />
x86 processors (8086, 8088, 80186 and 80286). This legacy system is beyond<br />
the scope of this text, since modern x86 systems use 32 bit addressing. But<br />
when using a realmode OS such as DOS, you have to pay attention to this.)</p>
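<p>If you want to check these sizes on your own system, the sizeof operator (which this text has not covered so far) yields the size of a type in bytes. Here is a minimal sketch; printSizes() and sizesAreOrdered() are names I made up for illustration, and the numbers printed depend on your compiler and platform:</p>

```c
#include <stdio.h>

/* Print the size in bytes of each primitive type on this platform. */
void printSizes(void)
{
    printf("char:   %lu\n", (unsigned long)sizeof(char));
    printf("short:  %lu\n", (unsigned long)sizeof(short));
    printf("int:    %lu\n", (unsigned long)sizeof(int));
    printf("long:   %lu\n", (unsigned long)sizeof(long));
    printf("float:  %lu\n", (unsigned long)sizeof(float));
    printf("double: %lu\n", (unsigned long)sizeof(double));
}

/* Return 1 if the ordering you would expect holds here:
   a char is 1 byte, and short <= int <= long. */
int sizesAreOrdered(void)
{
    return sizeof(char) == 1 &&
           sizeof(short) <= sizeof(int) &&
           sizeof(int) <= sizeof(long);
}
```

<p>On x86 in 32 bit mode you should see the sizes from the lists above. On a 64 bit system, long may well turn out to be 8 bytes.</p>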
<p>The integer types short, int and long are seen as signed numbers by default. Signed means that the<br />
number can be both positive and negative. Unsigned variables can not have<br />
negative values. (Whether a plain char is signed or unsigned depends on the compiler.)<br />
This is interesting, because a char is only 1 byte, or 8 bits big. It can take<br />
on 2^8, or 256 values. With signed, this would be -128 up to and including 127.<br />
With unsigned, it would be 0 up to and including 255.<br />
You can control this behavior with the signed and unsigned directives:</p>
<pre>signed int a;
unsigned int b;</pre>
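<p>The exact ranges on your system can be read from the standard header limits.h. The sketch below checks the char ranges mentioned above, and shows what happens when an unsigned char runs past its maximum. The function names are my own, and the second function assumes an 8 bit char:</p>

```c
#include <limits.h>

/* Return 1 if a char has 8 bits and the ranges from the text hold here. */
int charRangesMatchText(void)
{
    return CHAR_BIT == 8 &&
           SCHAR_MIN == -128 && SCHAR_MAX == 127 &&
           UCHAR_MAX == 255;
}

/* An unsigned char that is already at its maximum of 255 wraps
   around to 0 when 1 is added (assuming an 8 bit char). */
int wrapAround(void)
{
    unsigned char value = 255;

    value = value + 1;
    return value;
}
```

<p>Wrapping around like this is well-defined for unsigned types only. Overflowing a signed variable is something you should avoid.</p>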
<p>It is also legal to define multiple variables of the same type on one line,<br />
even intermixing initialized and uninitialized data.<br />
It works by simply separating all variables by commas, like this:</p>
<pre>unsigned int myVar1, myVar2, myVar3 = 50, myVar4;</pre>
<p>Functions<br />
&#8212;&#8212;&#8212;</p>
<p>Functions are the core of any C program. They contain the actual code, and<br />
therefore provide the functionality of the program. A function can receive<br />
parameters, the data it will process. And a function can return a primitive<br />
data type variable. You declare a function in the sequence of return type,<br />
function name, and parameter list (in parentheses):</p>
<pre>int MyFunction(int param1, unsigned char param2, signed short param3)</pre>
<p>Functions also have the possibility to not return anything. In that case we<br />
have the special void data type. We will see this type again later with<br />
pointers. This data type can also indicate that we want no parameters.<br />
So if you don&#8217;t need any return value, and no parameters, then you can do:</p>
<pre>void MyFunction(void)</pre>
<p>(note: there&#8217;s old-style and new-style for functions with no parameters.<br />
Official ANSI C wants MyFunction(void), but before the ANSI C standard was<br />
introduced, MyFunction() was used. For most compilers, both styles should<br />
work, but some (e.g. Borland) may enforce the ANSI C (void) style.)</p>
<p>Blocks of code are always between curly braces: {}. So the code that goes<br />
into our function is no different. The code block immediately follows our<br />
first line which declared the function prototype.</p>
<p>This might also be a good time to explain how to add comments to your program.<br />
A C comment is prefixed by /* and postfixed by */. Anything between those<br />
symbols is considered as comment, and will not be looked at by the compiler.</p>
<p>A small example:</p>
<pre>#include &lt;stdio.h&gt;  /* provides the prototype of puts() */

int main(void)
{
    /* Print some text to the screen, using a library function */
    puts("Hello world!");

    /* Exit function with return value */
    return 0;
}</pre>
<p>Here we have a function calling the puts() function with a text string as<br />
a parameter (puts() will &#8216;put&#8217; the &#8216;s&#8217;tring on screen; we will look at these<br />
strings later, as well as the puts function), and then returning a signed<br />
integer value of 0 (this is an immediate operand).<br />
Note also that each statement in C is terminated by a semicolon (;).</p>
<p>Now, to look at the calling of functions more closely&#8230;</p>
<p>You can use functions from your own source, but you can also import functions<br />
from earlier compiled modules of code, or libraries. Libraries are made up of<br />
a number of modules of code. ANSI C comes with quite a few libraries of code,<br />
which you can use in your programs. With these libraries, you also get header<br />
files, which include the function prototypes, among other things. We will<br />
look at these header files more closely later on, when we are actually going<br />
to write a program.</p>
<p>Before you can call a function, the compiler needs to know how many parameters<br />
are to be passed to the function, and what types they are. This is done via a<br />
prototype of the function.</p>
<p>If a function is defined in your own source code, above the line where you<br />
want to call it, then the compiler already knows the prototype, since it has<br />
seen the actual function before, and you won&#8217;t have to do anything.<br />
If a function is below your call, or imported from a code library or module,<br />
then the compiler won&#8217;t know the function, so we have to provide a prototype<br />
before using the function.</p>
<p>A prototype looks much like the first line of a function, except that the<br />
parameter names are optional, and are usually omitted.<br />
An example of a prototype:</p>
<pre>int MyFunction(int x, char y, short z);</pre>
<p>Or, omitting the names:</p>
<pre>int MyFunction(int, char, short);</pre>
<p>Then you&#8217;re all set to use the function later on in your source.</p>
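<p>To make this concrete, here is a small sketch in which a function is called before its definition appears in the source. The names square() and sumOfSquares() are just examples of mine:</p>

```c
/* Prototype: tells the compiler about square() before it is defined. */
int square(int);

/* sumOfSquares() can call square() thanks to the prototype above. */
int sumOfSquares(int a, int b)
{
    return square(a) + square(b);
}

/* The actual definition only comes after its first use. */
int square(int x)
{
    return x * x;
}
```

<p>Without the prototype, the compiler would reach the calls in sumOfSquares() without knowing what square() looks like.</p>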
<p>Calling a function is as simple as filling in the blanks, basically. All you<br />
have to do is fill in a value for each parameter in the prototype, and the function will<br />
be called, and its return value will be yielded.</p>
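<p>For example, if we have a function like this (note that the standard sqrt() from math.h<br />
works on doubles; its float variant is called sqrtf()):</p>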
<pre>float sqrt(float);</pre>
<p>which will return the square root of a float we give it, we can make a small<br />
piece of code like this:</p>
<pre>float x = 25, sqrtOfX;

sqrtOfX = sqrt(x);</pre>
<p>As you can see, you can treat a function call like a number. In this example,<br />
the parameter x will be passed to the function, the function will do its work,<br />
and return the square root of x. And this result will be assigned to the<br />
sqrtOfX variable.</p>
<p>Operators<br />
&#8212;&#8212;&#8212;</p>
<p>So now that we know how and where to put our code, the next question of course<br />
will be: &#8220;How do I write code?&#8221;. That is not a trivial question, so we will<br />
break code down into some subsets. Our first subset is &#8220;operations on data&#8221;.</p>
<p>As with most language constructs, C does not have too many operations on data.<br />
Here&#8217;s the list:</p>
<p>Mathematical operators:</p>
<pre>- +  : addition.
- -  : subtraction.
- *  : multiplication.
- /  : division.
- %  : division remainder/modulus.</pre>
<p>Bitwise operators:</p>
<pre>- &amp;  : AND
- |  : OR
- ^  : XOR
- ~  : NOT
- &gt;&gt; : shift right
- &lt;&lt; : shift left</pre>
<p>They all work the same, in that you specify a target variable, then the first<br />
operand, the operator, and then the second operand. The only exception is ~,<br />
which takes just one operand, written after the operator.</p>
<p>I will give a small example, with some data:</p>
<pre>int destination;
int operand1 = 10;
int operand2 = 20;

destination = operand1 * operand2;</pre>
<p>This will assign the value of the expression 10 * 20 to the destination<br />
variable.<br />
Well, to be more precise, the right hand side is an expression, which yields<br />
a result. You could just write this in C:</p>
<pre>operand1 * operand2;</pre>
<p>This would yield the result, but it never does anything with it. In these<br />
first examples, we will assign the result to a variable, but we will see that<br />
there are other things we can do with expressions, such as combining them to<br />
larger expressions. You could say that the above expression is a &#8216;primitive<br />
expression&#8217;.</p>
<p>You can also use immediate operands instead of variables:</p>
<pre>int destination;
int operand1 = 15;

destination = operand1 / 3;</pre>
<p>This will assign the result of 15 divided by 3 to destination.</p>
<p>There is also shorthand notation for the case where one of the operands is<br />
also the destination variable. The shorthand notation works with putting the<br />
operator directly in front of the equals-sign, and specifying only the other<br />
operand. So:</p>
<pre>destination = destination ^ 10;</pre>
<p>can be written as:</p>
<pre>destination ^= 10;</pre>
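<p>The bitwise operators and the shorthand notation are often used together to manipulate single bits. Here is a minimal sketch of that idea. The function names are made up for this example, and n is assumed to be smaller than the number of bits in an unsigned int:</p>

```c
/* Bit number 0 is the least significant bit.
   The literal 1u is simply the number 1 as an unsigned int. */

/* Switch bit n on, leaving all other bits alone. */
unsigned int setBit(unsigned int value, unsigned int n)
{
    value |= 1u << n;
    return value;
}

/* Switch bit n off, leaving all other bits alone. */
unsigned int clearBit(unsigned int value, unsigned int n)
{
    value &= ~(1u << n);
    return value;
}

/* Flip bit n: on becomes off, off becomes on. */
unsigned int toggleBit(unsigned int value, unsigned int n)
{
    value ^= 1u << n;
    return value;
}
```

<p>For example, toggleBit(10, 1) turns binary 1010 into 1000, and toggling the same bit again restores the original value.</p>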
<p>There&#8217;s another shorthand case, namely when you want to increase or decrease<br />
the value of a variable by 1 unit. Why do I call it a unit? We&#8217;ll see that<br />
later, when discussing pointers. For numbers, the unit is simply the number<br />
1. These are the operators for it:</p>
<pre>- ++ : increase
- -- : decrease</pre>
<p>They work slightly differently from the normal operators. You just prefix or<br />
postfix them to a variable, there is no equals-sign involved. When postfixing<br />
the operator, the value is used in an expression, and afterwards its value is<br />
increased. When prefixing it, the value is increased first, then used in the<br />
expression.</p>
<p>Some examples:</p>
<pre>int destination;
int operand1 = 30;
int operand2 = 19;

destination = operand1 - ++operand2;</pre>
<p>This will result in the following values:</p>
<pre>destination = 30 - 20 = 10
operand1    = 30
operand2    = 20</pre>
<p>Postfixing the operator:</p>
<pre>destination = operand1 - operand2++;</pre>
<p>Gives the following results:</p>
<pre>destination = 30 - 19 = 11
operand1    = 30
operand2    = 20</pre>
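<p>If you want to try both variants yourself, they can be wrapped in two small functions. This is just a sketch with function names of my own, each starting from the same values of 30 and 19:</p>

```c
/* Prefix: operand2 is increased to 20 first, then used: 30 - 20. */
int prefixExample(void)
{
    int operand1 = 30;
    int operand2 = 19;

    return operand1 - ++operand2;
}

/* Postfix: the old value 19 is used: 30 - 19.
   Only afterwards is operand2 increased to 20. */
int postfixExample(void)
{
    int operand1 = 30;
    int operand2 = 19;

    return operand1 - operand2++;
}
```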
<p>You can also make more complex expressions. The compiler will follow standard<br />
operator precedence, and resolve expressions in parentheses first. You can<br />
recursively compose expressions, yielding the results of each &#8216;primitive<br />
expression&#8217; in the sequence the parentheses and operator precedence have defined.</p>
<p>For example, this is possible:</p>
<pre>int destination;
int var1 = 10;
int var2 = 20;
int var3 = 30;

destination = 27 * ((var2 + --var3)/(var1 * 2));</pre>
<p>This will give destination the value of (remember that dividing two ints<br />
throws away the fractional part):</p>
<pre>27 * ((20 + 29)/(10 * 2)) =
27 * (49/20) =
27 * 2 = 54</pre>
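<p>The step from 49/20 to 2 is worth a closer look. With int operands, / only gives the whole part of the division, and % gives what is left over. A small sketch, with function names chosen by me, and assuming positive numbers:</p>

```c
/* Whole part of a divided by b, e.g. 49 / 20 gives 2. */
int wholePart(int a, int b)
{
    return a / b;
}

/* What is left over after the division, e.g. 49 % 20 gives 9. */
int remainderPart(int a, int b)
{
    return a % b;
}

/* The expression from the text, evaluated step by step by the compiler. */
int exampleFromText(void)
{
    int var1 = 10;
    int var2 = 20;
    int var3 = 30;

    return 27 * ((var2 + --var3) / (var1 * 2));
}
```

<p>If you need the fractional part as well, you will have to use float or double operands instead.</p>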
<p>And finally, you can also use the results of functions in expressions:</p>
<pre>destination = 25 + sqrt(var1);</pre>
<p>Now, on to another kind of expression&#8230;</p>
<p>Flow-control statements<br />
&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8211;</p>
<p>Here comes the interesting part of code, namely controlling the flow of our<br />
program, on specific conditions.</p>
<p>We do this with boolean expressions: a condition is either TRUE or FALSE.<br />
Since a computer can only represent numbers, we express these boolean values<br />
in integers (either char, short, int or long). A 0 stands for false,<br />
everything else stands for true. But as a rule, when assigning the true value<br />
to a variable, 1 is used.</p>
<p>For boolean expressions we have some operators:</p>
<pre>- == : equals
- != : not-equals
- &amp;&amp; : logical and
- || : logical (inclusive) or
- !  : logical not
- &gt;  : greater than
- &lt;  : less than
- &gt;= : greater than or equal
- &lt;= : less than or equal</pre>
<p>Any boolean expression should yield the value 0 for false and 1 for true. They<br />
work much the same as the previous expressions, namely with first operand,<br />
operator, second operand. We will usually not store them in new variables<br />
however, but use the result directly in a statement.</p>
<p>A simple example of a boolean expression would be:</p>
<pre>short var;

var == 15;</pre>
<p>The result of this expression is true, or 1, if var equals 15, and false, or 0<br />
if var is any other value than 15.</p>
<p>Another example:</p>
<pre>var1 &amp;&amp; var2;</pre>
<p>This expression results in true if, and only if both var1 and var2 are true<br />
(non-zero).</p>
<pre>var1 || !var2;</pre>
<p>This expression is true when var1 is true and/or var2 is false.</p>
<p>Boolean expressions can also be combined:</p>
<pre>var1 &amp;&amp; !(var2 &gt;= 10 || var3);</pre>
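<p>Since a boolean expression is just an integer value, a function can return it directly. The following sketch illustrates this with two example functions of my own:</p>

```c
/* Return 1 if value lies between low and high (inclusive), else 0. */
int isInRange(int value, int low, int high)
{
    return value >= low && value <= high;
}

/* Any non-zero value counts as true. Applying ! twice turns
   such a value into exactly 1, and leaves 0 as it is. */
int toBoolean(int value)
{
    return !!value;
}
```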
<p>We have only a handful of flow-control statements:</p>
<pre>- if-else
- do-while
- for
- break
- continue
- switch-case
- goto</pre>
<p>The if-statement is simple, but effective&#8230;<br />
&#8216;If (this expression is true) then run this block of code&#8217;. And you can<br />
optionally run an alternative block of code if the expression was not true,<br />
using the else-statement after the first block of code.</p>
<p>Going something like this:</p>
<pre>int var1 = 3, var2 = 0;

if (var1 == 3)
{
    var2 += 10;
}
else
{
    var1 /= 5;
}</pre>
<p>do and while can be used to run a block of code in a loop, while (&#8216;as long<br />
as&#8217;) the boolean expression is true. do-while() is a special form of while(),<br />
where the expression is checked after the code block is executed, instead of<br />
before, with the normal while(). As a result, the code block is always run<br />
at least once.</p>
<p>As an example, I shall show the code for a power function:</p>
<pre>int base = 5, exponent = 3, result = 1;

while (exponent--)
    result *= base;</pre>
<p>At the end of the loop, result will contain 5 to the power of 3, which is 125.<br />
(And as you see, when there&#8217;s only 1 line of code in the loop, there&#8217;s no need<br />
to put it in between the {} braces).</p>
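<p>Wrapped in a function, the loop above could look like the sketch below. The name power() is my own choice. Note one deliberate difference: I test for exponent-- &gt; 0, so that a negative exponent simply yields 1, where the plain while (exponent--) would keep on looping:</p>

```c
/* Raise base to the given exponent by repeated multiplication.
   Only meant for small, non-negative exponents: the result has
   to fit in an int. */
int power(int base, int exponent)
{
    int result = 1;

    while (exponent-- > 0)
        result *= base;

    return result;
}
```

<p>So power(5, 3) multiplies result by 5 three times, and power(7, 0) never enters the loop at all.</p>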
<p>Now for an example with do-while:</p>
<pre>int result, var1 = 3, var2 = 7;

do
{
    result = var1 * var2;
    var1 += var2;
} while (result &lt; 5000);</pre>
<p>Here you can&#8217;t check result before entering the loop, because result is not<br />
initialized yet. So basically it has just the value that the last program left<br />
there when using that piece of memory. Checking its value would be irrelevant.<br />
We could make it an initialized value, but we know that the result is less<br />
than 5000 the first time anyway, so this way, we save 1 check, and we save<br />
the trouble of initializing the result value.</p>
<p>The for-loop is similar to a while-loop, but it has a special construct, where<br />
you can not only specify a boolean expression which must be true to loop, but<br />
you can also initialize some variables before entering the loop, and you<br />
can specify some expressions which will be carried out after each loop. These<br />
expressions are usually used to update the variables that are used for the<br />
loop. It goes like this:</p>
<p>for (&lt;initialize variables&gt;; &lt;boolean expression&gt;; &lt;update expressions&gt;)</p>
<p>Or less abstract (an example which calculates the factorial of x):</p>
<pre>int i, x, y;

for (i = 1, x = 5, y = 1; i &lt;= x; i++)
{
    y *= i;
}</pre>
<p>In fact, it can be shortened to this:</p>
<pre>int i, x, y;

for (i = 2, x = 5, y = 1; i &lt;= x; y *= i++);</pre>
<p>(Here we see that when there&#8217;s no code block following a statement, we can<br />
just terminate the line of code, and with that the loop, with a semicolon.)</p>
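<p>As a complete function, the factorial loop might look like this. It is only a sketch: the name factorial() is mine, and I have used unsigned types because a factorial is never negative. Keep in mind that the result grows very quickly, so with a 4 byte long it only fits up to x = 12:</p>

```c
/* Calculate the factorial of x: 1 * 2 * 3 * ... * x.
   For x = 0 and x = 1 the loop never runs, and the result is 1. */
unsigned long factorial(unsigned int x)
{
    unsigned long y = 1;
    unsigned int i;

    for (i = 2; i <= x; i++)
        y *= i;

    return y;
}
```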
<p>The expressions in a for-loop are optional. For an endless loop, you can<br />
simply omit all expressions:</p>
<pre>for (;;)
{
    /* Put code in endless loop here */
}</pre>
<p>Which brings us to the next question&#8230; What if we want to exit a loop such as<br />
this endless loop, when a certain condition occurs?<br />
That&#8217;s where the break statement comes in.</p>
<p>The break statement will exit the current loop, and continue the rest of the<br />
program.<br />
You will usually use this with an if (condition) break; construction.</p>
<p>A small example:</p>
<p>Let&#8217;s say you&#8217;ve got a program that reacts to the user&#8217;s input (a menu or<br />
something).<br />
It should stay in the loop until the user chooses &#8216;quit&#8217;:</p>
<pre>for (;;)
{
    /* Put code to display the options here */

    if (getUserInput() == QUIT)
        break;
    else
        doSomething();
}</pre>
<p>Question: Do we really need to put doSomething() in an else-statement?</p>
<p>&lt;Author gets some coffee, giving you time to think over the question&gt;</p>
<p>Answer: In this case we don&#8217;t. Notice that when the if is true, it will only<br />
do the break (and therefore breaks out of the loop). When the if is false, it<br />
will do doSomething().</p>
<p>The continue statement is similar to the break statement, but it only exits<br />
the current cycle of the loop and enters the next.</p>
<p>Going something like this:</p>
<pre>int i, n, m;

/* user inputs n */

for (i = 0; i &lt; 100; i++)
{
    m = i % n;

    if (m == 0)
        continue;

    n /= m;
}</pre>
<p>We see here how we can skip the rest of the current cycle on a certain condition. In this<br />
case we skip if m is 0, to avoid a division-by-zero exception.</p>
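<p>Here is another small sketch of continue, this time as a complete function. It adds up all odd numbers below a limit, by skipping the even ones. The function name is just an example:</p>

```c
/* Add up all odd numbers from 0 up to, but not including, limit. */
int sumOfOddBelow(int limit)
{
    int i, sum = 0;

    for (i = 0; i < limit; i++)
    {
        if (i % 2 == 0)
            continue;   /* even number: go straight to the next cycle */

        sum += i;       /* only reached for odd numbers */
    }

    return sum;
}
```

<p>Note that the i++ of the for-loop is still carried out after a continue, so the loop does move on to the next number.</p>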
<p>But, now back to our menu-example of earlier&#8230;<br />
OK&#8230; so now you&#8217;re probably wondering why you&#8217;d want a menu with only a<br />
quit-option.<br />
Well&#8230; You don&#8217;t.</p>
<p>Now, there are two ways to add new options to the menu.<br />
You could of course do it by adding if&#8217;s:</p>
<pre>int userinput;

for (;;)
{
    /* Put code to display the options here */

    userinput = getUserInput();

    if (userinput == FILE)
        doFile();
    else if (userinput == EDIT)
        doEdit();
    else if (userinput == VIEW)
        doView();
    else if (userinput == QUIT)
        break;
    else doBadInput();
}</pre>
<p>OK, this&#8217;ll work, but it&#8217;s not the ideal solution.<br />
The compiler only sees a couple of if&#8217;s, and it may not be able to tell whether<br />
they have anything in common or not (compilers aren&#8217;t always that smart),<br />
so the code it produces may not be as efficient as it could be.</p>
<p>The best way to do it is to use the switch-case statement. The switch works<br />
by testing one integer value against several constant values.</p>
<p>The menu example from above, using switch:</p>
<pre>int userinput, userquit;

for (userquit = 0; userquit == 0;)
{
    /* Put code to display the options here */

    userinput = getUserInput();

    switch (userinput)
    {
        case FILE:
            doFile();
            break;
        case EDIT:
            doEdit();
            break;
        case VIEW:
            doView();
            break;
        case QUIT:
            userquit = 1;
            break;
        default:
            doBadInput();
    }
}</pre>
<p>Some comments on this structure: Firstly, the code looks much better <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Secondly, for the compiler, it&#8217;s very easy to see that the cases are related<br />
to each other, because they all depend on the same variable, by definition.<br />
Hence the compiler can make optimizations that a normal block of if&#8217;s won&#8217;t<br />
allow.</p>
<p>And thirdly: see the break at the end of each case? That is there to stop the<br />
program from running on into the next case. Leave the break out, and execution<br />
simply falls through. You can use this mechanism if you want to<br />
make the program do the same thing for some of the cases. For example, suppose<br />
you&#8217;re on a diet that prescribes that you can only eat meat on Tuesday,<br />
Thursday and Friday, and you want a program to print (when given a day-number<br />
between 1 and 7) whether that day is a meat-day.</p>
<pre>/* day 1 is monday */
int day = getUserInt();

switch (day)
{
    case 2:
    case 4:
    case 5:
        puts("Meatday!!");
        break;
    case 1:
    case 3:
    case 6:
    case 7:
        puts("No meat");
        break;
    default:
        puts("This day doesn't exist, no meat for you, pal!");
}</pre>
<p>Let&#8217;s say we enter day = 5. It&#8217;ll start executing from case 5: until it hits<br />
a break.<br />
See how good this looks? You can instantly see which cases do what&#8230;<br />
Much better than:</p>
<pre>if (day == 1 || day == 3 || day == 6 || day == 7)
    puts("No meat");

else if (day &gt; 1 &amp;&amp; day &lt; 8)
    puts("Meatday!!");

else puts("The day doesn't exist");</pre>
<p>The compiler can&#8217;t really optimize this in terms of speed, but here it just<br />
looks better.<br />
So there!</p>
<p>And now onto our last control flow statement: goto.</p>
<p>This statement can be used to make jumps to and fro in the code. It can be<br />
quite useful in some cases.</p>
<p>The destination will be marked by a label. Labels are defined by a name,<br />
followed by a colon:</p>
<pre>myLabel:</pre>
<p>Among other things, goto is useful for exiting multiple nested loops at once, which break<br />
cannot do.</p>
<p>That would look something like this:</p>
<pre>int i, j;

for (i = 25; i &gt;= 0; i--)
    for (j = 0; j &lt;= 25; j++)
            if (i == j * j)
                    goto endOfLoop;

endOfLoop:
/* program continues here */</pre>
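<p>The same idea as a complete function: the sketch below searches for the largest perfect square that is not larger than a given limit, and jumps out of both loops as soon as it finds one. The function name and the label name are mine, and the function is only meant for small limits:</p>

```c
/* Return the largest perfect square that is <= limit,
   or -1 if there is none (which only happens for a negative limit). */
int largestSquareAtMost(int limit)
{
    int i, j;

    for (i = limit; i >= 0; i--)
        for (j = 0; j * j <= i; j++)
            if (i == j * j)
                goto found;     /* leaves both loops at once */

    return -1;

found:
    return i;
}
```

<p>A break in the same spot would only leave the inner loop, and the outer loop would happily carry on with the next i.</p>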
<p>And that about wraps up our control-flow. Now on to the next challenge&#8230;</p>
<p>Arrays and pointers<br />
&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-</p>
<p>Okay, here comes an interesting part of C. We will get into direct contact<br />
with a part of the machine here, namely the memory.<br />
The variables we used earlier were stored in memory as well, but we didn&#8217;t get<br />
to see where and how exactly. The compiler took care of that for us, we could<br />
just use the variables by the name and type we had given them.<br />
Now we are going to use linear sets of data, called arrays. And to understand<br />
how they work exactly, we have to look at how the machine uses its memory.</p>
<p>So how does the machine use its memory?<br />
You could picture it as a giant cupboard of drawers, and each drawer has its<br />
own unique number. In every drawer we can store one byte.<br />
Picture it like this:</p>
<pre>0: [  ]
1: [  ]
2: [  ]
3: [  ]
4: [  ]
5: [  ]
...</pre>
<p>Or perhaps you prefer a horizontal look at it:</p>
<pre>0:  1:  2:  3:  4:  5:  ...
[  ][  ][  ][  ][  ][  ]...</pre>
<p>So to get a byte in memory, all we have to know is the number of its<br />
container. We call this number the address.</p>
<p>It&#8217;s also interesting to look at how we store larger variables in memory.<br />
This is another platform-dependent matter. An int, for example, is 4 bytes on a<br />
typical 32 bit platform. Now, there are 2 ways to store those 4 bytes in memory.<br />
The way we store multi-byte numbers in memory is called Endianness. There is<br />
Big Endian, and Little Endian.</p>
<p>Let&#8217;s say that the 4 bytes of our int are AA, BB, CC and DD, from most<br />
significant part to least significant part.<br />
So our int looks like AABBCCDD.<br />
In Big Endian, we store it like humans write numbers: from left to right, most<br />
significant to least significant, so it will look like this:</p>
<pre>[AA][BB][CC][DD]</pre>
<p>Little Endian stores it from least significant to most significant instead.<br />
So we get the reverse:</p>
<pre>[DD][CC][BB][AA]</pre>
<p>Both byte orders are in use today. Big Endian is the &#8216;network byte<br />
order&#8217;, used by the standard internet protocols, and it is the native order<br />
of architectures such as the Motorola 68000. The x86, which goes back to the<br />
original 8086 from 1978, uses Little Endian, so that is the order you will<br />
find on a typical PC.</p>
<p>So, in short, the address of a multibyte variable will always point to the<br />
most significant byte in Big Endian, and it will always point to the least<br />
significant byte in Little Endian.</p>
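<p>If you want to know which of the two your own machine uses, you can look at the byte at the lowest address of a known value. This is only a sketch: it uses a pointer and a cast, both of which are explained further on, and the function names are made up for this example.</p>

```c
/* Hypothetical helper: returns the byte stored at the lowest address
   of value, by looking at value through an unsigned char pointer. */
unsigned char lowestAddressByte(unsigned int value)
{
    return *(unsigned char *)&value;
}

/* Returns 1 on a Little Endian machine and 0 on a Big Endian one.
   The value 1 has its only non-zero byte at the least significant end,
   so that byte sits at the lowest address only in Little Endian. */
int isLittleEndian(void)
{
    return lowestAddressByte(1) == 1;
}
```

<p>On an x86 PC, isLittleEndian() should return 1, and lowestAddressByte(0xAABBCCDD) should give DD rather than AA.</p>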
<p>Well, what is a pointer? That is simply the address or &#8216;reference&#8217; of a<br />
variable in memory. A pointer has a type, and contains an address, which is<br />
a 32 bit number on a 32 bit platform. So like a normal variable, it actually<br />
contains a number, and it is stored in memory (storage is also affected by the<br />
Endianness of the architecture, as with the normal multibyte variables).</p>
<p>Declaring a pointer of some type is as simple as giving the type, and then the<br />
name, with an asterisk (*) prefixed to it:</p>
<pre>int *myPointer;</pre>
<p>You can use pointers to any of the primitive types, and also to user-defined<br />
types, which we will see later.</p>
<p>But now, how do we assign an address value to it?<br />
One option is to use the &#8216;address-of&#8217; operator, which gets the address of a<br />
variable. It is as simple as prepending &#8216;&amp;&#8217; to the variable name:</p>
<pre>char myChar = 25;</pre>
<p>&#8216;&amp;myChar&#8217; will then resolve to the address of the char in memory.</p>
<p>Assigning a value to a pointer works the same as with the other variables:</p>
<pre>char *myPointer, myChar = 25;

myPointer = &amp;myChar;</pre>
<p>Now myPointer contains a reference to myChar. There is also a dereferencing<br />
operator in C, this will get the value of the variable at the address that is<br />
being pointed to. It works by prepending a &#8216;*&#8217; to the variable name. You could<br />
say that it is the opposite of the &amp;-operator. The &amp;-operator turns a variable<br />
into a pointer, and the *-operator turns a pointer into a variable.</p>
<p>So we could store the char that myPointer is pointing to, back into a char<br />
like this:</p>
<pre>char *myPointer, myChar = 25, newChar;

myPointer = &amp;myChar;
newChar = *myPointer;</pre>
<p>Now newChar has a value of 25.</p>
<p>We could also assign a value to an address by dereferencing a pointer. Take a<br />
careful look at this:</p>
<pre>char *myPointer, myChar;

myPointer = &amp;myChar;
*myPointer = 25;</pre>
<p>We store 25 at the address that myPointer is pointing to. But, pay attention<br />
here! Look what happened&#8230; myPointer contained the address of myChar. So by<br />
storing 25 to *myPointer, we stored it at the address of myChar, and<br />
therefore myChar now has the value of 25!</p>
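<p>A classic use of this is a function that changes variables belonging to its caller. Here is a minimal sketch (swapChars is a made-up name) that exchanges two chars through their addresses:</p>

```c
/* Hypothetical helper: exchanges the values of the two chars that the
   arguments point to. */
void swapChars(char *first, char *second)
{
    char temp = *first;   /* read the value that first points to */

    *first = *second;     /* overwrite it with the value second points to */
    *second = temp;       /* and store the saved value the other way */
}
```

<p>You would call it as swapChars(&amp;a, &amp;b), so that the function receives the addresses of a and b rather than copies of their values.</p>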
<p>Now from pointers on to arrays&#8230;</p>
<p>An array in the field of programming is basically what the word means: an<br />
array, a row.<br />
A row of variables of the same type, more specifically, stored in one linear<br />
piece of memory. You can picture it like this:</p>
<p>An array of 6 chars (1 byte), starting at address 521:</p>
<pre>Address: 521: 522: 523: 524: 525: 526:
Index:   [ 0] [ 1] [ 2] [ 3] [ 4] [ 5]</pre>
<p>Note that we start indexing at 0, not 1. So, the array of 6 chars has indices<br />
0 to 5 for all the elements there. More generally speaking:</p>
<p>N elements of an array are indexed from 0 to (N-1)</p>
<p>Also note that we can get the address of each individual element by:</p>
<pre>starting address + index</pre>
<p>Let&#8217;s look at arrays of bigger types. For example, an array of long ints.</p>
<p>An array of 4 long ints (4 bytes), starting at address 64:</p>
<pre>Address: 64:   68:   72:   76:
Index:   [   0][   1][   2][   3]</pre>
<p>We see here that the correct addressing formula for all types is:</p>
<pre>starting address + (index*sizeof(type))</pre>
<p>Incidentally, sizeof(type) is recognized by C. If we put:</p>
<pre>sizeof(int)</pre>
<p>it will evaluate to a value of 4, which is the number of bytes in an int.<br />
More generally speaking, sizeof(x) will evaluate to the total number of bytes<br />
of memory used by x.<br />
This is applicable to all primitive and user-defined types, and as we will<br />
see later, also to arrays and structures.</p>
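<p>The addressing formula can be written out by hand, which makes a nice check of the theory. This is just a sketch for an array of ints (elementAddress is a made-up name); the cast to char * makes the addition count in bytes, and casts are covered further on:</p>

```c
/* Hypothetical helper: returns the address of element index of an int
   array, computed by hand as:
   starting address + (index * sizeof(type)) */
int *elementAddress(int *start, unsigned int index)
{
    return (int *)((char *)start + index * sizeof(int));
}
```

<p>For any valid index, the result should be the same address that the compiler computes for that element itself.</p>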
<p>How do we define arrays and use arrays in C?</p>
<p>There are basically 2 variations&#8230; We have statically allocated arrays and<br />
dynamically allocated arrays.</p>
<p>First we will look at the statically allocated ones. They are defined much<br />
like a single variable: type, name. But after the variable name we put the<br />
number of elements we want, in square brackets []. Looking like this:</p>
<pre>char myArray[256];</pre>
<p>Now, we have set up an array of 256 (uninitialized) elements. myArray is the<br />
pointer to the first element in the array.</p>
<p>To access an element in the array, you can simply use the subscript operator<br />
([]), as it is called: name[index].<br />
For example, we want to test whether element 5 equals 25.<br />
You can manipulate myArray[5] like a normal char, so we can just create a<br />
normal boolean expression with it:</p>
<pre>if (myArray[5] == 25)
{
    /* Do something */
}</pre>
<p>Assigning values and doing operations on array elements are also done like<br />
we&#8217;ve seen before.</p>
<p>Here&#8217;s a small example that multiplies every element of an array by 23:</p>
<pre>short numbers[36], i;

for (i = 0; i &lt; (sizeof(numbers)/sizeof(numbers[0])); i++)
{
    numbers[i] *= 23;
}</pre>
<p>Let&#8217;s also take a closer look at this expression:</p>
<pre>(sizeof(numbers)/sizeof(numbers[0]));</pre>
<p>This yields the number of elements in the array. So i will go from 0 to 35.<br />
Namely, what happens here, is this:</p>
<p>sizeof(numbers) will give us the total amount of memory used for the array<br />
numbers. This will be in bytes. But a short int is 2 bytes, so we need to<br />
correct for that.</p>
<p>sizeof(numbers[0]) will give us the size of element 0 of the array. Element 0<br />
is 1 short int, so this will give us 2 bytes (of course all elements in the<br />
array are of equal size).</p>
<p>So if we divide the two, we get this:</p>
<pre>(sizeof(numbers)/sizeof(numbers[0])) = 72/2 = 36</pre>
<p>Which is our arraysize. This little trick is very useful. If for example we<br />
would change the type of the array to long int later, this line can remain<br />
untouched, as it would still yield the correct arraysize. And if we decide to<br />
change the size of the array to say 50, then all we have to do is to change<br />
the definition of the array:</p>
<pre>short numbers[50];</pre>
<p>and the for-loop will still loop through all elements.</p>
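<p>Put together into a small function, the trick could look like the sketch below. The function name and the choice to fill the array with 1, 2, 3 and so on first are assumptions of mine, made only so that the result can be checked:</p>

```c
/* Hypothetical example: fills a local array with 1, 2, 3, ..., multiplies
   every element by 23, and returns the sum of the results. */
long sumOfMultiples(void)
{
    short numbers[36];
    unsigned int i, count = sizeof(numbers) / sizeof(numbers[0]);
    long sum = 0;

    /* Give every element a known value first */
    for (i = 0; i < count; i++)
        numbers[i] = (short)(i + 1);

    /* The loop from the text, with the element count derived by sizeof */
    for (i = 0; i < count; i++)
    {
        numbers[i] *= 23;
        sum += numbers[i];
    }

    return sum;
}
```

<p>Note that the trick only works where the array definition itself is visible. Once an array has been passed to a function as a pointer, sizeof gives the size of the pointer instead.</p>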
<p>It&#8217;s also possible to put initialized data into an array. In that case, the<br />
size does not need to be specified, because the compiler can derive that from<br />
the number of elements it needs to add.</p>
<p>We give a list of elements, in {} brackets, and separate each element with a<br />
comma:</p>
<pre>int myArray[] = { 10, -642342, 12321, 213122, 1231 };</pre>
<p>For text, there is a special array definition. Text is stored as an array of<br />
chars, terminated by a 0. For example:</p>
<pre>char myText[] = {'H','e','l','l','o',0};</pre>
<p>But, C provides a special construct for such text strings. The following is<br />
equivalent:</p>
<pre>char myText[] = "Hello";</pre>
<p>The 0-terminator will be appended automatically.<br />
This construct will also allow you to append strings together. If you put 2<br />
or more strings after one another, they will be appended to form 1 string.<br />
This makes it possible to split long strings up in multiple lines, or even<br />
add comments in between. A few examples:</p>
<pre>char myText[] = "Hi" " there!" " How are you doing?";

char myText[] = "Hello, "
                "how are you?";

char myText[] = "This is version "
/* edit this */ "0.1 beta"
                " of the software";</pre>
<p>It&#8217;s also possible to initialize some elements, and specify a size. The<br />
elements that you leave out are set to 0. This can be useful when for example<br />
the first element needs to be 0 (for a zero-terminated list, for example):</p>
<pre>short myArray[32] = { 0 };</pre>
<p>myArray is a pointer, in that it evaluates to a memory address, but it&#8217;s a bit<br />
different from the ones we&#8217;ve seen earlier. Namely, this pointer is not stored<br />
in memory, but the address is a constant rather, which you can use, but not<br />
modify. The pointers we&#8217;ve seen earlier, are stored in memory, and act like<br />
variables. You can also use operations on them.</p>
<p>For example, we want to fill an array with successive powers of 7 (the values<br />
wrap around once they no longer fit in an unsigned int):</p>
<pre>unsigned int myArray[256], *myPointer, i, power = 1;

myPointer = myArray;

for (i = 0; i &lt; (sizeof(myArray)/sizeof(myArray[0])); i++)
{
    *myPointer++ = (power *= 7);
}</pre>
<p>Now, a few notes here&#8230; First, this part:</p>
<pre>*myPointer++</pre>
<p>The ++ postfixed to myPointer basically does what you would expect it to do.<br />
The operation is performed, and after that, the variable is increased.<br />
In this case, the operation is a dereference. So we assign a value to the<br />
address that myPointer points to, and then we increase its value by 1.<br />
&#8220;1 what?&#8221; you may ask. The answer to that, is &#8220;1 element&#8221;.<br />
If you want to increase the pointer by say 6 elements, then you can just add<br />
6 to it, like this:</p>
<pre>myPointer += 6;</pre>
<p>You can add integers to a pointer or subtract them from it, and you can<br />
subtract 2 pointers into the same array to get the number of elements between<br />
them. Adding 2 pointers together is not allowed.<br />
Note however, that += 6 adds 6 elements. The actual address that myPointer<br />
contains, will be increased by 6 * sizeof(type), so in this case, that is<br />
6 * 4 = 24. We will look into that some more, when we get to typecasts.</p>
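<p>The difference between counting in elements and counting in bytes can be made visible by subtracting pointers. A small sketch, with made-up function names, and using the cast to char * that is explained under typecasting:</p>

```c
/* Hypothetical helper: returns how many elements lie between two
   pointers into the same int array. */
long elementsBetween(int *from, int *to)
{
    return (long)(to - from);
}

/* Hypothetical helper: returns the same distance measured in bytes,
   by treating both addresses as char pointers. */
long bytesBetween(int *from, int *to)
{
    return (long)((char *)to - (char *)from);
}
```

<p>After myPointer += 6, the first function reports a distance of 6 from the old position, and the second reports 6 * sizeof(int).</p>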
<p>Now to the second part:</p>
<pre>(power *= 7);</pre>
<p>This is a nice shorthand notation. The statement in brackets is executed, and<br />
its value is evaluated, and can be assigned to our array-element.<br />
So, power gets multiplied by 7, and the new value of power is then assigned<br />
to *myPointer.</p>
<p>On to dynamically allocated arrays.</p>
<p>Dynamically allocated arrays can be useful when you don&#8217;t know the size of<br />
the array beforehand, because it is dependent on user input, for example.<br />
Or, when you don&#8217;t need the array for the duration of the entire program, and<br />
you would like your memory deallocated after you&#8217;re done with it.</p>
<p>For the allocation of memory, we have the following function:</p>
<pre>void *malloc( size_t size );</pre>
<p>Well, we see here that it returns a void *. A type we haven&#8217;t seen before.<br />
Basically, it is a typeless pointer, and it cannot be dereferenced, since we<br />
don&#8217;t know the type, and therefore we don&#8217;t know how large an element would<br />
be. But, we assign its value to a pointer of the type of the array that we<br />
need, so there is no trouble. Converting a value of one type to another type<br />
is called typecasting. Let&#8217;s look deeper into that before we continue.</p>
<p>Typecasting<br />
&#8212;&#8212;&#8212;&#8211;</p>
<p>This is a very simple and short subject&#8230; Typecasting a variable is<br />
basically forcing the compiler to interpret the data of the variable as if<br />
it were of another type. This is done by putting the desired type before the<br />
variable (or expression), in brackets, looking like this:</p>
<pre>(int)myVariable;</pre>
<p>Here&#8217;s an example on how a cast can affect variables:</p>
<pre>short i;
char j = -1;

i = (unsigned char)j;</pre>
<p>Now, instead of what you would expect to happen, i is not -1 now, but it is<br />
255. What happened is this: j is cast to an unsigned char. The bitpattern for<br />
-1 is 11111111 in 2s complement notation. Now, we interpret that same bit<br />
pattern as if it were not signed. In that case, 11111111 is equal to 255.<br />
And that is the value assigned to i.</p>
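<p>To see the two conversions side by side, here is a small sketch with made-up function names. I use signed char rather than plain char here, because whether a plain char is signed depends on the compiler:</p>

```c
/* Hypothetical helper: converts directly, so the sign is kept. */
short viaSigned(signed char value)
{
    return value;
}

/* Hypothetical helper: casts to unsigned char first, as in the example
   above, so negative values come out as values in the range 128..255. */
short viaUnsigned(signed char value)
{
    return (unsigned char)value;
}
```

<p>For an input of -1, the first function gives -1 and the second gives 255.</p>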
<p>You could also use a cast to add a certain amount of bytes to a pointer,<br />
instead of a certain amount of elements. A char is 1 byte, so we could cast<br />
our pointer to a char pointer temporarily, and add the number of bytes we<br />
want.</p>
<p>For example:</p>
<pre>int *myPointer, myBytes;

myPointer = (int *)((char *)myPointer + myBytes);</pre>
<p>We temporarily treat myPointer as a char *, then we add the number of bytes<br />
we want, which is stored in myBytes in this case. The result is cast back to<br />
int * before we store it in myPointer again. A cast only affects the<br />
expression it appears in; myPointer itself remains an int *.</p>
<p>In most cases, a conversion is implicit. For example, void * can always be<br />
converted to other data pointer types. In some cases though, the compiler might<br />
give a warning, or you want to change the behavior to that of another type<br />
temporarily. That&#8217;s when you use a cast.</p>
<p>Now, back to the malloc() function&#8230;</p>
<p>There was this other new thing&#8230; the size parameter has type size_t&#8230; That&#8217;s<br />
odd&#8230; we haven&#8217;t seen size_t in the variable types.<br />
Well, this is because size_t is not a primitive type, but a user-defined one.<br />
Basically, it&#8217;s just a primitive type, but given another name, so that it<br />
makes more sense when reading the source.</p>
<p>When we search through the header files (we will look at header files more<br />
closely later on), we find that size_t is defined in a file called stddef.h,<br />
on a typical 32 bit compiler by the following line:</p>
<pre>typedef unsigned int size_t;</pre>
<p>So, here size_t is basically nothing but an unsigned int (other platforms may<br />
use a different unsigned type).<br />
The typedef directive is followed by the primitive type, and then a list of<br />
new names for that type, separated by commas.<br />
For example:</p>
<pre>typedef unsigned int colour, serial, uint;</pre>
<p>Now you can declare variables like this:</p>
<pre>colour red, green, blue;
serial nr1, nr2, nr3;
uint a, b, c;</pre>
<p>An interesting application for these typedefs is portability. You could write<br />
a program that uses only user-defined types, and to port it from one system<br />
to another, all you would have to do is adapt the type-definitions to the<br />
new architecture.</p>
<p>For example, you need a 32 bit signed integer type, and on architecture X, you<br />
would need a long int, and on architecture Y, you would need an __int32.<br />
You would choose a usertype to represent the 32 bit signed integer, let&#8217;s take<br />
sint32 in this example.</p>
<p>Then all you have to do to make it work correctly on architecture X, is this:</p>
<pre>typedef long int sint32;</pre>
<p>And to make it work on architecture Y, you would use:</p>
<pre>typedef __int32 sint32;</pre>
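<p>Compilers that follow the C99 standard ship a header called stdint.h, which already contains typedefs of this kind, such as int32_t. A minimal sketch, assuming your compiler has that header (sint32Bits is just a made-up helper to check the result):</p>

```c
#include <limits.h>   /* CHAR_BIT: the number of bits in a byte */
#include <stdint.h>   /* int32_t: a signed type of exactly 32 bits (C99) */

/* Our own portable name, now defined in terms of the standard one. */
typedef int32_t sint32;

/* Hypothetical check: returns the width of sint32 in bits. */
int sint32Bits(void)
{
    return (int)(sizeof(sint32) * CHAR_BIT);
}
```

<p>The rest of the program keeps using sint32, so nothing else has to change when you move to another architecture.</p>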
<p>Then there&#8217;s also the compound datatype in C, the data structure. Structures<br />
contain a set of variables, and can be very useful for adding logic and<br />
structure to your program. Namely, when you deal with real-world entities for<br />
example, and they have certain attributes, you can group them together into<br />
1 compound datatype.</p>
<p>Let&#8217;s say you want to store data about cars. For example, you want to store<br />
model, year and colour of a car.<br />
First you use the &#8216;struct&#8217; keyword. Then you give your type a name. And then<br />
you group some types together, and give them names, just like you would do<br />
with separate variables.<br />
Put them in between {} brackets.</p>
<p>Looking like this:</p>
<pre>struct Car { unsigned int model; unsigned short year; unsigned int colour; };</pre>
<p>Then when you want to define a variable of this type, you have to mention that<br />
it is a structure, by using the struct keyword.<br />
Looking like this:</p>
<pre>struct Car myCar;</pre>
<p>You can also initialize the struct, which would look like this:</p>
<pre>struct Car myCar = { 911, 1989, 500 };</pre>
<p>You could make a type of &#8216;struct Car&#8217;, by using a typedef. This would save you<br />
from typing &#8216;struct&#8217; before each new variable definition.<br />
So you would define a type like this:</p>
<pre>typedef struct Car sCar;</pre>
<p>And then define your variable like:</p>
<pre>sCar myCar = { 911, 1989, 500 };</pre>
<p>The most convenient way is to combine the struct definition with the typedef.<br />
You don&#8217;t have to give the struct a name then, since you won&#8217;t be needing the<br />
name of the actual struct, but only the name of the newly defined type.<br />
The only exception is when the struct has to refer to its own type, since<br />
the new name is not known until the typedef, which takes place after the<br />
struct definition.</p>
<p>The entire line would look like:</p>
<pre>typedef struct { unsigned int model; unsigned short year; unsigned int colour; } Car;</pre>
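<p>If you would reference a struct inside a struct, you would do this:</p>
<pre>typedef struct tagSelf { struct tagSelf *next; } Self;</pre>
<p>You use a temporary name, or &#8216;tag&#8217;, to be able to have the struct refer to<br />
itself. Note that the member is a pointer: a struct cannot contain a complete<br />
copy of itself, but it can contain a pointer to another struct of its own type.</p>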
<p>It is also common to write each member of the struct on a new line, to<br />
increase readability:</p>
<pre>typedef struct {
    char *processor;
    unsigned int memory;
    unsigned int diskspace;
} Computer, *pComputer;</pre>
<p>Note also that we defined 2 names here, Computer and *pComputer.<br />
The * works just like it does in a pointer declaration, so this<br />
makes pComputer a pointer to a Computer struct<br />
(the &#8216;p&#8217; stands for pointer. For improved readability, sometimes variable<br />
names are prefixed with abbreviations of their type, like this &#8216;p&#8217; for<br />
pointer. This is called Hungarian notation. The inventor was a Microsoft<br />
programmer from Hungary, by the name of Charles Simonyi. The code looked like<br />
some weird foreign language at first sight, and since Simonyi was Hungarian<br />
by birth, they decided to call this convention Hungarian. The Windows API<br />
uses this notation extensively. It can be very convenient.)</p>
<p>So basically we combined this line:</p>
<pre>typedef Computer *pComputer;</pre>
<p>together with the definition of the Computer structure itself.</p>
<p>To access members in a struct, we use the dot (.) operator. The syntax is:</p>
<pre>struct.member</pre>
<p>This will resolve to a &#8216;normal&#8217; variable, which can be used in expressions<br />
just as usual.</p>
<p>An example:</p>
<pre>Computer myComputer;

myComputer.processor = "MC68000";
myComputer.memory = 1048576;
myComputer.diskspace = 30234234;</pre>
<p>When you would have a pointer to a struct, you could do this:</p>
<pre>Computer theComputer;
pComputer myComputer = &amp;theComputer;

(*myComputer).processor = "MC7400";</pre>
<p>But, there is a special arrow (-&gt;) operator for pointers to structs, which is<br />
the preferred method:</p>
<pre>Computer theComputer;
pComputer myComputer = &amp;theComputer;

myComputer-&gt;memory = 655360;</pre>
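<p>The arrow operator is also what you would use to walk along a chain of structs that point to each other, using the tag trick shown earlier. Below is a rough sketch of such a chain, better known as a linked list. The names Node, tagNode and sumList are made up for this example:</p>

```c
#include <stddef.h>   /* NULL */

/* A struct that refers to its own type through a pointer. */
typedef struct tagNode {
    int value;
    struct tagNode *next;   /* NULL marks the end of the list */
} Node;

/* Hypothetical helper: adds up the values of all nodes reachable
   from first. An empty list (NULL) gives 0. */
int sumList(Node *first)
{
    int sum = 0;

    while (first != NULL)
    {
        sum += first->value;   /* same as (*first).value */
        first = first->next;   /* step to the next node */
    }

    return sum;
}
```

<p>Each node only knows where the next one is, so the nodes do not have to be stored next to each other in memory like the elements of an array.</p>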
<p>It is also possible to create arrays of structs, and even initialize them.<br />
For example:</p>
<pre>Computer myComputers[2] = { { "MC68000", 1048576, 30234234 },
                            { "Pentium", (48*1048576), (3072U*1048576U) } };</pre>
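<p>The sizeof operator also works on user-defined types and structures, as I said<br />
earlier. So sizeof(Computer) will return the combined size of all the members<br />
of the Computer struct, plus any padding that the compiler inserts between or<br />
after the members to keep them aligned. With 4-byte pointers and ints, no<br />
padding is needed here:</p>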
<pre>sizeof(char *)
sizeof(unsigned int)
sizeof(unsigned int) +
----------------------
sizeof(Computer)</pre>
<p>In other words:</p>
<pre> 4
 4
 4 +
----
12</pre>
<p>sizeof(myComputers) would return the total size of the array, which will be<br />
2 * 12 = 24 bytes, in our example.</p>
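<p>The Car struct from earlier is a case where the size of the struct and the sum of its members can differ, because of the short in the middle. The sketch below lets you compare the two on your own compiler; the function names are made up, and the exact numbers depend on the compiler and platform:</p>

```c
struct Car { unsigned int model; unsigned short year; unsigned int colour; };

/* Hypothetical helper: the sum of the member sizes, without padding. */
unsigned int carMemberBytes(void)
{
    return (unsigned int)(sizeof(unsigned int) +
                          sizeof(unsigned short) +
                          sizeof(unsigned int));
}

/* Hypothetical helper: what the compiler actually reserves for one Car. */
unsigned int carStructBytes(void)
{
    return (unsigned int)sizeof(struct Car);
}
```

<p>With 4-byte ints and 2-byte shorts, the members add up to 10 bytes, but many compilers will reserve 12, so that colour starts at an address that is a multiple of 4.</p>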
<p>There is another type, very similar to the struct. That&#8217;s the union. A union<br />
is used exactly like a struct, but with one difference: instead of having<br />
all members, you can pick one member, and use that.<br />
An example should clarify it:</p>
<pre>union Number { int i; double d; };

union Number myNumber;</pre>
<p>Now if we want to use this variable, we can use either the int:</p>
<pre>myNumber.i = 245;</pre>
<p>or the double:</p>
<pre>myNumber.d = 26.32345;</pre>
<p>You can also use the typedef-combinations like we saw earlier with the struct:</p>
<pre>typedef union { int i; double d; } Number;

Number myNumber;</pre>
<p>The union will always take up as much memory as necessary for the largest<br />
member. sizeof(&lt;union&gt;) will also return that size.<br />
sizeof(&lt;union.member&gt;) will return the size of the type of that member.</p>
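<p>A union does not remember which of its members was stored last, so in practice it is often wrapped in a struct that keeps track of that. This is only a sketch of that idea; TaggedNumber, isDouble and asDouble are names I made up:</p>

```c
typedef union { int i; double d; } Number;

/* A Number together with a flag that says which member is in use. */
typedef struct {
    int isDouble;    /* 0: value.i is valid, 1: value.d is valid */
    Number value;
} TaggedNumber;

/* Hypothetical helper: returns the stored number as a double,
   whichever member holds it. */
double asDouble(TaggedNumber n)
{
    if (n.isDouble)
        return n.value.d;

    return (double)n.value.i;
}
```

<p>The code that stores a value is responsible for setting isDouble to match the member it wrote.</p>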
<p>That about covers the user-defined types, now back to our malloc() call.<br />
We now know that we can simply specify the number of bytes as an argument of<br />
the malloc() function, and we get a void * back.</p>
<p>So, we can have it cast implicitly to a pointer, and we have a pointer of the<br />
type we want, to a piece of memory of the size we want.</p>
<p>This can be useful to dynamically allocate variables, and keep the memory<br />
usage under control. We allocate an array when we need it, and deallocate it<br />
again, when we&#8217;re done with it, and save the memory for other uses and<br />
applications.</p>
<p>For example, to allocate 1 car structure dynamically, we can do this:</p>
<pre>Car *myCar;

myCar = malloc(sizeof(Car));
</pre>
<p>We can also allocate an array dynamically&#8230; simply by multiplying the size of<br />
1 element by the number of elements we want.<br />
For example, an array of 25 ints:</p>
<pre>int *myArray;

myArray = malloc(25*sizeof(int));</pre>
<p>Now, to use this array, we can simply apply the subscript operator to the<br />
pointer, just like with the statically allocated arrays:</p>
<pre>myArray[20] = -15;</pre>
<p>To deallocate the memory again, we can use the free(void *) function. As you<br />
can see, it takes a pointer as its argument. Since it&#8217;s a void *, any pointer<br />
will be implicitly cast, so we can just feed it any type of pointer directly.</p>
<p>To deallocate our int-array, we simply type:</p>
<pre>free(myArray);</pre>
<p>and our memory is regained.</p>
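<p>One thing to keep in mind is that malloc() returns NULL when it cannot find enough memory, so it is wise to check the result before using it. A minimal sketch of how that could look, with a made-up helper name:</p>

```c
#include <stdlib.h>   /* malloc, free, NULL */

/* Hypothetical helper: allocates an array of count ints and sets every
   element to value. Returns NULL if the allocation fails; otherwise the
   caller must pass the result to free() when done with it. */
int *makeFilledArray(unsigned int count, int value)
{
    unsigned int i;
    int *array = malloc(count * sizeof(int));

    if (array == NULL)
        return NULL;   /* out of memory: nothing to fill */

    for (i = 0; i < count; i++)
        array[i] = value;

    return array;
}
```

<p>The caller would check the returned pointer against NULL, use the array with the subscript operator, and call free() on it afterwards.</p>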
<p>Our first program<br />
&#8212;&#8212;&#8212;&#8212;&#8212;&#8211;</p>
<p>Well, we now covered just about everything that C can do. Now, let&#8217;s look at<br />
how we create programs with the language constructs we&#8217;ve seen. How to group<br />
it together to a sourcefile, and how to create a binary executable from it.</p>
<p>A sourcefile is a simple ASCII file, and can be written with any texteditor<br />
you like.<br />
There are 2 types of sourcefiles in C:</p>
<pre>- normal sourcecode, text files usually with the .c extension.
- headers, text files usually with the .h extension.
</pre>
<p>There is no physical difference between .c and .h files. Both can contain any<br />
form of C statements, directives and code. It&#8217;s more of a habit of the C<br />
programmers to make the distinction, etiquette rather than syntax.</p>
<p>Headers are a special kind of sourcefile, which usually come with code<br />
libraries. They contain the necessary function prototypes, type definitions,<br />
constants and macros for use with the library. They don&#8217;t normally contain<br />
code.</p>
<p>When you use code from a library, you include its header file in the source:</p>
<pre>#include "header.h"</pre>
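<p>The exact search rules depend on the compiler, but typically it will search<br />
the directory of the current source file for header.h first, and if it&#8217;s not<br />
found there, it will continue to search the include path (specified by the<br />
INCLUDE environment variable on Microsoft&#8217;s compiler, for example).</p>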
<p>There is also this form:</p>
<pre>#include &lt;header.h&gt;</pre>
<p>This will not search the current directory, but will start with the include<br />
path immediately.<br />
By convention, the &lt;&gt; form is used for the headers of the standard library<br />
and other installed libraries, and the "" form for the headers of your own<br />
project.</p>
<p>Right, now for our first real program, the infamous &#8220;Hello world&#8221; example&#8230;</p>
<p>We use the puts() function to print a null-terminated string to the console<br />
output stream.<br />
Your API reference will tell you which header and which library to use.<br />
On *nix systems, you can type &#8220;man puts&#8221;, and on most other systems, there<br />
will be some help function implemented into the IDE, where you can search for<br />
&#8220;puts&#8221;.<br />
You will find that we need the stdio.h (&#8216;standard I/O&#8217;) header file, so we write:</p>
<pre>#include &lt;stdio.h&gt;</pre>
<p>This will basically paste the code from stdio.h into your current source file<br />
at the place of the #include statement.</p>
<p>A C program starts execution from the main() function, which is defined to<br />
return an int, which will be the exit-code to the OS.<br />
There are 2 versions:</p>
<pre>- no arguments: int main(void)
- Commandline passed from the OS: int main(int argc, char *argv[])</pre>
<p>argc is the argument-count, the number of commandline parameters passed to the<br />
program.<br />
argv is an array of argc null-terminated strings, followed by a NULL pointer.<br />
Note that argv[0] is the program name itself, so if for example you had a<br />
commandline like this:</p>
<pre>myprogram one two three</pre>
<p>Then you would get argc = 4, and:</p>
<pre>argv[0] = "myprogram"
argv[1] = "one"
argv[2] = "two"
argv[3] = "three"</pre>
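<p>As a sketch of how a program might go through these parameters, here is a small function that looks for one particular parameter. The name findArgument is made up; strcmp() is the standard library function that compares two strings, and returns 0 when they are equal:</p>

```c
#include <string.h>   /* strcmp */

/* Hypothetical helper: returns the index of the first commandline
   parameter equal to wanted, or -1 if there is none. argv[0], the
   program name, is skipped. */
int findArgument(int argc, char *argv[], const char *wanted)
{
    int i;

    for (i = 1; i < argc; i++)
        if (strcmp(argv[i], wanted) == 0)
            return i;

    return -1;
}
```

<p>You would call it from main() by simply passing argc and argv along, for example findArgument(argc, argv, "two").</p>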
<p>For our &#8220;Hello world&#8221; example, we could use the main(void) version, since we<br />
do not require any commandline parameters to be passed to this program.</p>
<p>(Note: The main() function is called by a piece of code known as the &#8216;stub&#8217;.<br />
The stub contains the raw entrypoint of the executable, and sets up the<br />
environment for running a C program (such as parsing the commandline,<br />
setting up standard input and output streams, and the OS environment<br />
variables). It then calls your main() function. When main() returns, the<br />
stub will clean up the environment again, and pass the return value of main()<br />
back to the OS and exit.)</p>
<p>Actually, I already gave the code for the program as an example for the<br />
functions.</p>
<p>The entire program would look like this:</p>
<pre>#include &lt;stdio.h&gt;

int main(void)
{
    /* Print some text to the screen, using a library function */
    puts("Hello world!");

    /* Exit function with return value */
    return 0;
}</pre>
<p>Save it to a text file called &#8220;hello.c&#8221;, and after discussing the compiling<br />
process, we will make an executable out of it.</p>
<p>Now that we have our first source, it is time to compile it to a running<br />
program. This is a bit problematic to explain, since each compiler has its own<br />
commandline options and special behavior. I will just tell a bit about the<br />
process of compiling and linking, so you will know what to look for in the<br />
documentation of your C compiler.</p>
<p>There are several levels in the transition from C source code to a binary<br />
executable.<br />
Technically, we have these levels:</p>
<pre>4. C source code
3. Instruction listing (assembly source)
2. Linkable code object
1. Executable image</pre>
<p>A compiler will take us from level 4 to 3. Then an assembler will take us from<br />
3 to 2, and finally, a linker will take us from 2 to the final level of 1.</p>
<p>But in practice, we find that many compilers will go from 4 to 2 immediately<br />
these days, and use an intermediate format internally, rather than a true<br />
assembly instruction listing. They often still have the option to output an<br />
assembly listing though, should the programmer so require.<br />
And compilers automatically invoke the linker these days, so that we can go<br />
from C source code to an executable in one go.<br />
Compiling and linking will always be separate steps though, since you may have<br />
some precompiled code which you want to link to the newly compiled code, such<br />
as imported library functions, or you might want to create some precompiled<br />
libraries, so you will not link them to a binary executable at all.</p>
<p>So, I can&#8217;t really help you with the commandline options for your compiler.<br />
But now that you know what the compiling process is about, you should be able<br />
to figure it out yourself. I can give you 2 examples though, for both Microsoft&#8217;s<br />
compiler and the GNU C Compiler (gcc).</p>
<p>Microsoft Compiler:</p>
<pre>CL hello.c</pre>
<p>This will compile hello.c into hello.obj, and then link in the necessary<br />
standard C libraries. CL.EXE invokes LINK.EXE itself. It has libc.lib as a<br />
standard library, so you do not need to specify any libraries for standard C<br />
functions. You only need to specify libraries for non-C functions, like<br />
Windows API or third party libraries.<br />
CL will give the executable the same name as the source file which contains<br />
the main() function, so in our case, we will get hello.exe as a filename.<br />
In this case, this line is all you need to have hello.exe generated.<br />
If you need other libraries or objects, you can simply add them to the<br />
commandline: CL hello.c blah.obj foo.lib<br />
So, try to execute your first program now!</p>
<p>GNU C compiler:</p>
<pre>gcc -o hello hello.c</pre>
<p>This compiler also has a C library as standard, so no need to specify it. The<br />
-o switch specifies the name of the executable. If no name is given, then it<br />
defaults to a.out instead.</p>
<p>Some example programs<br />
&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;</p>
<p>Here is a small program that I think will be useful. It will show you how to<br />
do some simple interaction with the user. It will also allow you to play with<br />
some bitwise operations, and make you get used to playing with hexadecimal<br />
values. This might come in handy later on, as hexadecimal numbers and bitwise<br />
operations can be quite useful in speeding your code up.</p>
<p>We will see puts() here again. The puts() function puts a string onto the<br />
standard output stream (stdout), which is normally the screen. And I say<br />
&#8216;normally&#8217; here, because streams may be redirected to other devices, such as<br />
printers for example. We will also see gets(), which can read a line from the<br />
standard input stream (stdin), which is normally the keyboard input.<br />
It will put this line into a char array. The user must pass a pointer to this<br />
array to the gets() function. Be aware that gets() has no way of knowing how<br />
large the array is, so a long input line will overflow it; for that reason<br />
it was removed from the language in C11, and fgets() is the safer alternative.<br />
The getchar() function is similar, but it returns the first character from the<br />
stdin stream immediately, and does not require a buffer to be passed.</p>
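<p>We also get to meet fflush(), which &#8216;flushes&#8217; a file stream. The stdin and<br />
stdout streams are considered as files as well in C, so we can use it on these<br />
streams too. In our case we flush the stdin stream, to remove any<br />
superfluous input that we would rather ignore. Note that the C standard only<br />
defines fflush() for output streams: flushing stdin is an extension that some<br />
C libraries support, but it is not portable and may do nothing with your compiler.</p>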
<p>The last function we get to meet, is printf(). This is a function to print<br />
formatted text (hence the &#8216;f&#8217;) onto stdout. It is quite an interesting<br />
function, because you can pass as many arguments as you wish. It has ellipsis<br />
(&#8230;) in its prototype, to make this possible. I will give an example of how<br />
to use them later.<br />
The printf() function takes a format string as its first argument, which will basically<br />
define what to do with the next arguments. How to format these, to be more<br />
specific.<br />
It will use a %, followed by some formatting rules, at the place where you<br />
want the respective argument to be printed.<br />
In our case we will see %s, which will print the argument as a string.<br />
So for example, you could do this:</p>
<pre>char name[] = "Jopie";

printf("Hello %s!", name);</pre>
<p>This will result in:</p>
<pre>Hello Jopie!</pre>
<p>We will also see %d, which will print a signed decimal number. For unsigned<br />
numbers, there is also %u.</p>
<p>And lastly we will see %08X&#8230; This formatting is a bit more advanced.<br />
The 0 indicates that we want 0s prefixed to our number. The 8 indicates that<br />
we want to have 8 digits in total. And the X tells printf() to print the<br />
argument as a hexadecimal number, with uppercase alphas. There is also<br />
%x for lowercase alphas, but I prefer uppercase.</p>
<p>There are a lot more options with printf(), but it would be a bit much to<br />
cover them all here. I suggest you use your reference for that instead.<br />
I will leave you with one last example, then we go to the actual program:</p>
<pre>unsigned int age = 16;
char name[] = "Jopie";

printf("Congratulations %s, today is your %uth birthday!", name, age);</pre>
<p>Result of this would be:</p>
<pre>Congratulations Jopie, today is your 16th birthday!</pre>
<p>Note also that puts() always does a newline after printing the string, while<br />
printf() does not. So in some cases, you might want to use printf() instead of<br />
puts() to avoid that newline. Be careful though, if your string contains %,<br />
printf() will see this as formatting. The standard way around this is to write %%,<br />
which prints a single %. Another workaround is using %c, which prints 1 character:</p>
<pre>printf("I can use %c in my strings with printf()", '%');</pre>
<p>Anyway, here is the program:</p>
<pre>#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt; /* for atoi() */

char name[32];
int x = 1, y = 1;

/* Print menu, allow the user to choose an option, and process that option.
   Returns 0 if user chose to quit, else 1 */
int doMenu()
{
    char buffer[32];

    printf( "\nHello %s, here is the menu for today:\n\n"
            "1) Enter X\n"
            "2) Enter Y\n"
            "3) X + Y\n"
            "4) X - Y\n"
            "5) X * Y\n"
            "6) X / Y\n"
            "7) X &amp; Y\n"
            "8) X | Y\n"
            "9) X ^ Y\n"
            "A) ~X\n"
            "B) ~Y\n"
            "C) Quit\n\n"
            "Current X = %d\n"
            "Current Y = %d\n\n"
            "What is your choice? ",
            name, x, y);

    /* remove additional input from stdin stream */
    fflush(stdin);

    /* Get a character from stdin stream and process the command */
    switch(getchar())
    {
        case '1':
            /* remove additional input from stdin stream */
            fflush(stdin);      

            puts("Please enter new value for X.");
            x = atoi(gets(buffer));
            break;
        case '2':
            /* remove additional input from stdin stream */
            fflush(stdin);      

            puts("Please enter new value for Y.");
            y = atoi(gets(buffer));
            break;
        case '3':
            printf("%d + %d = %d\n", x, y, x + y);
            break;
        case '4':
            printf("%d - %d = %d\n", x, y, x - y);
            break;
        case '5':
            printf("%d * %d = %d\n", x, y, x * y);
            break;
        case '6':
            if(y == 0)
                puts("Y = 0. Division is not defined.");
            else
                printf("%d / %d = %d\n", x, y, x / y);
            break;

        /* The following are bitwise operations, let's also print out
           the hexadecimal representations for clarity */
        case '7':
            printf("%d &amp; %d = %d || "
                   "%08X &amp; %08X = %08X\n",
                   x, y, x &amp; y,
                   x, y, x &amp; y);
            break;
        case '8':
            printf("%d | %d = %d || "
                   "%08X | %08X = %08X\n",
                   x, y, x | y,
                   x, y, x | y);
            break;
        case '9':
            printf("%d ^ %d = %d || "
                   "%08X ^ %08X = %08X\n",
                   x, y, x ^ y,
                   x, y, x ^ y);
            break;
        case 'a':
        case 'A':
            printf("~%d = %d || "
                   "~%08X = %08X\n",
                   x, ~x,
                   x, ~x);
            break;
        case 'b':
        case 'B':
            printf("~%d = %d || "
                   "~%08X = %08X\n",
                   y, ~y,
                   y, ~y);
            break;
        case 'c':
        case 'C':
            /* The user wants to leave, we return 0, to break out of the
               while() loop in main() */
            puts("Thanks, and have a nice day.");
            return 0;
        default:
            /* Since we have arrived here, apparently none of the valid
               choices were reached */
            puts("Invalid choice.");
            break;
    }

    printf("Press enter to continue.");
    getchar();

    return 1;
}

int main(void)
{
    /* Input the username */
    puts("What is your name?");
    gets(name);

    /* Keep returning to menu until user chooses to quit */
    while(doMenu());

    return 0;
}</pre>
<p>This might be a good time to greet some people.<br />
First of all, of course my Diamond Crew people, ewald and Maybird.<br />
And of course also all my Phrozen Crew mates, you know who you are <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /><br />
And the #Win32ASM guys, hutch, llama, Iczelion, nuu, dowap (or whatever <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> ,<br />
and the rest.<br />
Of course Kalms, and last but not least, the ladies (in no particular order,<br />
you know how women are <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> .<br />
Sara`, Tracy, blorght, MoonDawn, Nitallica, CandyII, jessca, embla, Baudie,<br />
flipgrrrl, and yes, even taylor^ <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>X-Calibre</p>
]]></content:encoded>
			<wfw:commentRss>http://scalibq.wordpress.com/2012/02/09/a-tutorial-on-programming-c/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2547913ebf910c8aa2c632619be46e93?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">scalibq</media:title>
		</media:content>
	</item>
		<item>
		<title>Porting BHM3DSample to Android: Some&#8230; well&#8230; a lot of&#8230; stressful development</title>
		<link>http://scalibq.wordpress.com/2012/02/04/porting-bhm3dsample-to-android-some-well-a-lot-of-stressful-development/</link>
		<comments>http://scalibq.wordpress.com/2012/02/04/porting-bhm3dsample-to-android-some-well-a-lot-of-stressful-development/#comments</comments>
		<pubDate>Sat, 04 Feb 2012 17:38:35 +0000</pubDate>
		<dc:creator>Scali</dc:creator>
				<category><![CDATA[OpenGL]]></category>
		<category><![CDATA[Software development]]></category>
		<category><![CDATA[Android]]></category>
		<category><![CDATA[C]]></category>
		<category><![CDATA[development]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[horrible]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[JNI]]></category>
		<category><![CDATA[NDK]]></category>
		<category><![CDATA[nightmare]]></category>
		<category><![CDATA[OpenGL ES]]></category>
		<category><![CDATA[SDK]]></category>
		<category><![CDATA[smartphone]]></category>

		<guid isPermaLink="false">http://scalibq.wordpress.com/?p=566</guid>
		<description><![CDATA[As you may know, I have ported my OpenGL rendering framework to iPhone a few months ago. Originally I was not all that interested in Android, since it apparently uses Java for its apps. While it has OpenGL ES support &#8230; <a href="http://scalibq.wordpress.com/2012/02/04/porting-bhm3dsample-to-android-some-well-a-lot-of-stressful-development/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=scalibq.wordpress.com&amp;blog=16171350&amp;post=566&amp;subd=scalibq&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>As you may know, I have <a title="Porting BHM3DSample to iPhone: some Objective-C++ and OpenGL ES development" href="http://scalibq.wordpress.com/2011/09/07/porting-bhm3dsample-to-iphone-some-objective-c-and-opengl-es-development/">ported my OpenGL rendering framework to iPhone</a> a few months ago. Originally I was not all that interested in Android, since it apparently uses Java for its apps. While it has OpenGL ES support in Java, it would require me to rewrite the entire framework from C++ to Java. With the iPhone I had a similar situation where Apple wants you to write things in Objective-C. However, as it turned out, there was a way to use regular C/C++ on the iPhone, so in the end I could just use most of my OpenGL framework as-is.</p>
<p>Looking a bit further into Android, I found that Android not only has the <a href="http://developer.android.com/sdk/index.html">SDK for Java-based applications</a>, but also an <a href="http://developer.android.com/sdk/ndk/index.html">NDK, for more low-level things</a>. The NDK allows you to use C/C++ and even assembly. Now that&#8217;s more like it! So I checked out the NDK, and spotted an example using OpenGL ES from C++. Exactly what the doctor ordered!</p>
<p>Things quickly went downhill from there though&#8230; Where I was used to having a proper IDE for the iPhone and for J2ME, the Android SDK and NDK had a distinct 1970s feel to them: just a set of headers, libraries and some commandline tools slapped together. There was not too much in the way of documentation. It&#8217;s mostly learn-by-example, apparently. It didn&#8217;t look good. What&#8217;s worse, there are a few snafus in the SDK and NDK. For example, the Windows version of the SDK installs itself in your Program Files by default. However, writing to Program Files requires administrator rights. But the SDK manager is not set up to run with these rights by default. So if you start it, and try to download and install any packages, it will fail to write them. And the NDK has some build/make scripts, which can&#8217;t handle spaces in pathnames. So if you put the NDK in Program Files together with the SDK, it won&#8217;t work.</p>
<p>Google does offer an <a href="http://developer.android.com/sdk/eclipse-adt.html">Android plugin for Eclipse</a> though, so I figured I&#8217;d give that a try. I never liked Eclipse, as I&#8217;ve always found it to be slow, cumbersome and generally unstable. For my Java development (including J2ME) I generally use <a href="http://netbeans.org/">Netbeans</a>. But well, this time I had no choice, so Eclipse it is&#8230;</p>
<p>Eclipse did not disappoint&#8230; It was still as slow and cumbersome as I remembered it&#8230; and from time to time it would also crash&#8230; One time it even corrupted the workspace, so it would hang the next time it tried to open it. All I could do was delete the metadata in the workspace and start over.</p>
<p>However, at least the Android plugin was reasonably nice. At least I had a simple tool to build and deploy Android apps in the emulator. Oh yes&#8230; the emulator. Another part of the success-story that is the Android SDK. The emulator is extremely slow, and takes up lots of memory. Even on a modern high-end machine like an Intel Core i7, it still takes ages to boot the emulator up. And it actually runs considerably slower than a real phone.</p>
<p>Right, well&#8230; now that I had familiarized myself with the SDK somewhat, it was time to actually develop something. It quickly became apparent that you could not get rid of Java completely: the NDK can produce libraries, but they will have to be called from a Java app through JNI. I&#8217;m no stranger to JNI myself, I have used it before to add a fullscreen option to the Java demo system used for <a href="http://www.pouet.net/prod.php?which=10808">Croissant 9</a> on Windows. It involved a simple DLL that set up DirectDraw, and the Java application could pass its pixelbuffer to it through a JNI call.</p>
<p>But JNI has always been rather obscure, and quite archaic in nature. It&#8217;s difficult to find good documentation on it, especially now that Oracle has taken over Java, and the old Java.sun.com webpages are gone. A lot of dead links. After some digging, I managed to find a reasonable <a href="http://docs.oracle.com/javase/1.4.2/docs/guide/jni/spec/functions.html">overview of the JNI calls</a> though. Looks like nothing changed in the roughly 10 years since I wrote my JNI DLL. The Java side is quite simple: just declare a function with &#8216;native&#8217;, and call System.loadLibrary() in the static constructor to load your native library. The system works entirely via reflection, so all linking is done dynamically at runtime.</p>
<p>The C++ side is not so pretty. Firstly you have to give your functions very specific long names in order to make the reflection work in the JVM. Secondly, you also need to use reflection yourself, when you want to call methods on Java objects. What makes it even more cumbersome to use is that you cannot use any of the data directly. You need to perform marshaling via the JNIEnv object in order to access strings, arrays and such in a form that C++ can use. And since the NDK has only limited functionality, you will need to use Java objects and functions in C++ from time to time.</p>
<p>To make matters worse, it does not seem to be possible to debug the native code in the emulator. The only thing I managed to do was to output log messages in Android&#8217;s logCat, so I at least had some idea of what was going on.</p>
<p>And no, the horror does not stop there&#8230; No&#8230; Once I got some basic C++ code running inside a Java app, I tried to compile the actual C++ framework for OpenGL ES, which I had used on the iPhone earlier. Problem #1: My code was for OpenGL ES 2.0, but the emulator only supports 1.0. So I could not use shaders and things, and had to go back to fixed function, and using CPU-emulation for the skinning. And for some reason, the GPU emulation is disabled by default&#8230; So I first had to reconfigure my emulator to get it to work&#8230; Problem #2: There was no STL support in the NDK.</p>
<p>Or at least, that&#8217;s what it looked like originally. Googling for some information turned out to be a wild goose chase. People trying to port <a href="http://ustl.sourceforge.net/">uSTL</a> or <a href="http://www.stlport.org/">STLPort</a> to Android. That&#8217;s what I tried originally, but that didn&#8217;t quite work. Luckily I later found out that Google has since added its own limited STLPort to the SDK. It is just disabled by default. But apparently if you add an Application.mk to your project, you can put the following statement in there to enable STL:</p>
<blockquote><p>APP_STL := stlport_static</p></blockquote>
<p>Now I could finally get my code compiled. Next step was to actually make it work. The biggest problem here is that you can&#8217;t just access files. I wanted to bundle the data files in the Application Package (.apk) itself, much like I did with the iPhone version. But this was not easy&#8230; like pretty much everything else in Android development. Eventually I figured out that I had to put it in the assets-folder. And then you can open it from Java, and get an inputstream. But that is still quite useless, since I needed to access the file in C++, not in Java. So I again had to pull some nasty JNI trickery to finally get things to work.</p>
<p>There were a lot of other minor things that delayed the port further&#8230; but in the end I did more or less get it to work:<a href="http://scalibq.files.wordpress.com/2012/02/androidbhm.png"><img class="aligncenter size-full wp-image-567" title="AndroidBHM" src="http://scalibq.files.wordpress.com/2012/02/androidbhm.png?w=640&#038;h=461" alt="" width="640" height="461" /></a></p>
<p>Okay, so the textures still don&#8217;t work&#8230; But that should just be a formality at this point. I already managed to get the BHM file loaded from the assets, so getting a JPG to load shouldn&#8217;t be much harder. There are some Java helper functions for that, which I will be using.</p>
<p>And here it is, running on an actual phone:<br />
<span style="text-align:center; display: block;"><a href="http://scalibq.wordpress.com/2012/02/04/porting-bhm3dsample-to-android-some-well-a-lot-of-stressful-development/"><img src="http://img.youtube.com/vi/hfGe2wImPOI/2.jpg" alt="" /></a></span><br />
All in all, it wasn&#8217;t a very pleasant experience. Google may have this geeky/nerdy/linux/developer-like image, but the Android development tools are horribly archaic, immature and just generally unprofessional. Apple has done a much better job on the development tools for iPhone and iPad. And this coming from a guy who <a title="Just keeping it real… old skool style" href="http://scalibq.wordpress.com/2011/11/23/just-keeping-it-real-old-skool-style/">likes to code classic Amiga and 16-bit DOS with Hercules/CGA/EGA for fun</a>! This was an absolute nightmare compared to that. I&#8217;d almost be inclined to say that the extra money you spend on the iPhone and on getting an iPhone developer account is worth it. Decent tools make your life so much easier!</p>
]]></content:encoded>
			<wfw:commentRss>http://scalibq.wordpress.com/2012/02/04/porting-bhm3dsample-to-android-some-well-a-lot-of-stressful-development/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2547913ebf910c8aa2c632619be46e93?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">scalibq</media:title>
		</media:content>

		<media:content url="http://scalibq.files.wordpress.com/2012/02/androidbhm.png" medium="image">
			<media:title type="html">AndroidBHM</media:title>
		</media:content>
	</item>
		<item>
		<title>Another root exploit for linux</title>
		<link>http://scalibq.wordpress.com/2012/01/24/another-root-exploit-for-linux/</link>
		<comments>http://scalibq.wordpress.com/2012/01/24/another-root-exploit-for-linux/#comments</comments>
		<pubDate>Tue, 24 Jan 2012 11:16:55 +0000</pubDate>
		<dc:creator>Scali</dc:creator>
				<category><![CDATA[Software development]]></category>
		<category><![CDATA[Software news]]></category>
		<category><![CDATA[bug]]></category>
		<category><![CDATA[exploit]]></category>
		<category><![CDATA[insecure]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[root]]></category>

		<guid isPermaLink="false">http://scalibq.wordpress.com/?p=561</guid>
		<description><![CDATA[A few days ago, the following exploit was published: http://blog.zx2c4.com/749 Another small step in debunking the myth of linux security. What is also interesting is that this bug was introduced only recently: In 2.6.39, the protections against unauthorized access to &#8230; <a href="http://scalibq.wordpress.com/2012/01/24/another-root-exploit-for-linux/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=scalibq.wordpress.com&amp;blog=16171350&amp;post=561&amp;subd=scalibq&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>A few days ago, the following exploit was published: <a href="http://blog.zx2c4.com/749">http://blog.zx2c4.com/749</a></p>
<p>Another small step in debunking the myth of linux security. What is also interesting is that this bug was introduced only recently:</p>
<blockquote><p>In 2.6.39, the protections against unauthorized access to <tt>/proc/<em>pid</em>/mem</tt> were deemed sufficient, and so the prior <tt>#ifdef</tt> that prevented write support for writing to arbitrary process memory <a href="http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=198214a7">was removed</a>.</p></blockquote>
<p>Well, those linux kernel developers sure are geniuses when it comes to writing secure code, aren&#8217;t they? And all those eyes that are allegedly inspecting the source code all the time&#8230; well, this code was submitted in March 2011. So it took many months to find the bug, and it is now widespread. A fix is available now, but there will obviously be tons of unpatched systems out there (people with a false sense of security&#8230; after all, they&#8217;re running linux, right?)</p>
<p>Another interesting tidbit is this:</p>
<blockquote><p>It turns out that <tt>su</tt> on the vast majority of distros is <em>not compiled with <a href="http://en.wikipedia.org/wiki/Position-independent_code">PIE</a></em>, disabling ASLR for the .text section of the binary!</p></blockquote>
<p>Yes, really! Which is interesting actually. I recall when I ported some of my <a title="CPUInfo goes multi-platform" href="http://scalibq.wordpress.com/2009/10/20/cpuinfo-goes-multi-platform/">CPUInfo code</a> to OS X, that I ran into a problem. It did not allow me to use the EBX register freely. This was because the default build options in OS X are to compile everything position-independent. That is probably related to this: position-independent code enables ASLR. I didn&#8217;t have the problem on linux, because I used Ubuntu, which is one of the many distros that do not force PIC. As an aside, Windows relocates code in a <a href="http://www.symantec.com/connect/articles/dynamic-linking-linux-and-windows-part-one">slightly different way</a>. Windows calculates the addresses with the PE loader, and patches them into memory. This takes slightly more time during loading of an executable, but it saves time during execution. An interesting difference in tradeoffs between Windows and linux.</p>
<p>So, a few minus points for linux security again: both in the quality of the kernel code, and in the quality of the default configuration of most linux distros.</p>
]]></content:encoded>
			<wfw:commentRss>http://scalibq.wordpress.com/2012/01/24/another-root-exploit-for-linux/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2547913ebf910c8aa2c632619be46e93?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">scalibq</media:title>
		</media:content>
	</item>
		<item>
		<title>Intel Medfield vs ARM</title>
		<link>http://scalibq.wordpress.com/2012/01/23/intel-medfield-vs-arm/</link>
		<comments>http://scalibq.wordpress.com/2012/01/23/intel-medfield-vs-arm/#comments</comments>
		<pubDate>Mon, 23 Jan 2012 12:28:22 +0000</pubDate>
		<dc:creator>Scali</dc:creator>
				<category><![CDATA[Hardware news]]></category>
		<category><![CDATA[Software news]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[ARM]]></category>
		<category><![CDATA[CISC]]></category>
		<category><![CDATA[Intel]]></category>
		<category><![CDATA[RISC]]></category>
		<category><![CDATA[Windows 8]]></category>
		<category><![CDATA[x86]]></category>

		<guid isPermaLink="false">http://scalibq.wordpress.com/?p=559</guid>
		<description><![CDATA[I did an article on ARM and x86 about a year ago, since multicore ARMs were up and coming, and Windows 8 would allow them to be used in netbooks, notebooks or even desktops, as competing solutions to x86. That &#8230; <a href="http://scalibq.wordpress.com/2012/01/23/intel-medfield-vs-arm/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=scalibq.wordpress.com&amp;blog=16171350&amp;post=559&amp;subd=scalibq&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>I did an article on ARM and x86 <a title="RISC vs CISC, round #213898234" href="http://scalibq.wordpress.com/2011/02/21/risc-vs-cisc-round-213898234/">about a year ago</a>, since multicore ARMs were up and coming, and Windows 8 would allow them to be used in netbooks, notebooks or even desktops, as competing solutions to x86. That was how I saw things at the time: ARM moving towards x86.</p>
<p>But Intel has recently introduced a new generation of Atom processors, codenamed <a href="http://www.anandtech.com/show/5365/intels-medfield-atom-z2460-arrive-for-smartphones">Medfield</a>. This is showing a move in the opposite direction: x86 moving towards tablets and smartphones. The move in itself is not new, as Intel has been trying to get Atom processors in embedded and mobile devices for a while now. But the difference is that Intel is actually succeeding this time.</p>
<p>As I mentioned last time, x86 bears some legacy which makes it inherently larger and more complex than a more modern architecture, such as ARM. However, this extra overhead is more or less a constant factor, where the rest of the CPU design will grow larger over time as more transistors can be fitted on a single die, because of manufacturing progress. For regular desktop, workstation and server systems, the x86 overhead became a non-issue years ago. With the number of execution units, caches and everything you find in a modern CPU, the legacy overhead of the x86 becomes insignificant. An added factor is that Intel has always had the most advanced manufacturing. This allowed them to fit more transistors in a smaller space, using less power, which could somewhat compensate for the extra cost of x86 legacy.</p>
<p>For Atom however, the small scale of things meant that Intel could not quite get the upper hand over the competition yet. An important issue with early Atoms was that they consisted of the CPU and a separate chipset, where the competing ARM solutions were a System-on-a-Chip (SoC). As a result, Atoms were not small and energy-efficient enough for mobile devices such as tablets, let alone smartphones.</p>
<p>Then Intel introduced Oak Trail, the first SoC version of Atom. This made it more compact and more power-efficient. It was still too power-hungry for phones, but <a href="http://arstechnica.com/business/news/2011/04/intels-oak-trail-headed-for-tablet-limbo.ars">tablets were more or less within Intel&#8217;s reach now</a>. But since ARM solutions still had better performance and battery life, Atom-powered tablets never took off.</p>
<p>The new Intel Medfield is an SoC as well, but it looks a lot better than Oak Trail. Oak Trail was built on a 45nm process, but Medfield uses a 32nm process. This puts Intel in its favourite position: a step ahead of the competition, who are still on a 40nm process. The result is that <a href="http://www.anandtech.com/show/5262/intel-shows-off-competitive-medfield-x86-android-power-performance">x86 can now hide its legacy very well</a>. And this time we will actually be seeing this Atom SoC in a number of tablets and smartphones.</p>
<p>So where I was expecting ARM to invade Intel&#8217;s turf, Intel is now doing the opposite. I was not expecting Intel to close the gap just yet. But it seems that they are well on their way. And 22nm is just around the corner for Intel, which may allow Atom to take yet another step forward compared to ARM.</p>
<p>This newfound competition is also interesting for ARM itself, because Intel throws HyperThreading into the mix, which ARM does not have. Intel&#8217;s SIMD extensions may also be slightly more mature than ARM&#8217;s. So we will have to see how ARM is going to respond to this.</p>
]]></content:encoded>
			<wfw:commentRss>http://scalibq.wordpress.com/2012/01/23/intel-medfield-vs-arm/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2547913ebf910c8aa2c632619be46e93?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">scalibq</media:title>
		</media:content>
	</item>
		<item>
		<title>Why I don&#8217;t use linux (and why you shouldn&#8217;t either)</title>
		<link>http://scalibq.wordpress.com/2012/01/16/why-i-dont-use-linux-and-why-you-shouldnt-either/</link>
		<comments>http://scalibq.wordpress.com/2012/01/16/why-i-dont-use-linux-and-why-you-shouldnt-either/#comments</comments>
		<pubDate>Mon, 16 Jan 2012 20:28:29 +0000</pubDate>
		<dc:creator>Scali</dc:creator>
				<category><![CDATA[Software news]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[FUD]]></category>
		<category><![CDATA[GPL]]></category>
		<category><![CDATA[secure]]></category>
		<category><![CDATA[UEFI]]></category>
		<category><![CDATA[Windows 8]]></category>
		<category><![CDATA[Microsoft]]></category>

		<guid isPermaLink="false">http://scalibq.wordpress.com/?p=553</guid>
		<description><![CDATA[As you may know, I have nothing against open source software. In fact, I am both a user of FreeBSD, and a developer of open source projects. But linux never sat well with me. It&#8217;s not so much the software &#8230; <a href="http://scalibq.wordpress.com/2012/01/16/why-i-dont-use-linux-and-why-you-shouldnt-either/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>As you may know, I have nothing against open source software. In fact, I am both a <a title="The big FreeBSD server upgrade" href="http://scalibq.wordpress.com/2009/11/06/the-big-freebsd-server-upgrade/">user of FreeBSD</a>, and a developer of <a title="CPUInfo, an open source multiplatform library for determining CPU features" href="http://scalibq.wordpress.com/2009/10/10/cpuinfo-an-open-source-multiplatform-library-for-determining-cpu-features/">open source</a> <a title="BHM file format and GLUX projects updated" href="http://scalibq.wordpress.com/2011/07/08/bhm-file-format-and-glux-projects-updated/">projects</a>. But linux never sat well with me. It&#8217;s not so much the software itself, as it is the culture. The GPL is not my idea of free software. I think the BSD license offers considerably more freedom. The GPL is more of a political manifesto if anything. And I am interested in software development, not politics. Aside from that, the <a title="Why do people think that using linux makes them an expert?" href="http://scalibq.wordpress.com/2011/06/26/why-do-people-think-that-using-linux-makes-them-an-expert/">attitude of the linux community</a> does not appeal to me.</p>
<p>Now, back in late September/early October, there was some buzz when word got out that Microsoft wanted to have UEFI secure boot enabled by default for Windows 8 systems. I did not bother to blog about it at that time&#8230; But as new ARM-based devices for Windows 8 are being introduced, the issue is being recycled by the linux community. You get over-the-top articles like <a href="http://hothardware.com/News/Microsoft-Locks-Out-Linux-On-ARM-Systems-Shipping-Windows-8/">this one</a>, where, as usual, linux is trying to play the victim and blame everything on evil Microsoft.</p>
<p>Excuse me? Linux is doing it to itself. A far more balanced article, released earlier, can be found <a href="http://www.tech-faq.com/linux-licensing-in-conflict-with-secure-boot-support.html">here</a>. The short version is that the GPL (specifically version 3) doesn&#8217;t allow any kind of binary code to be distributed without source code. This restriction means that the secure key for booting cannot be kept a secret. So the GPL is locking linux out from participating in UEFI&#8217;s trusted boot sequence, which is meant to prevent rootkits from installing on your system unnoticed (is that such an evil thing?).</p>
<p>Now, the simple solution would be to create a license that is compatible with UEFI, so linux too can support secure booting. But no. Pragmatic as always, the linux community feels that their license is holy, and that the rest of the world is wrong, and has to adapt to their ways (which has worked just great so far, hasn&#8217;t it?).</p>
<p>And this is the sort of thing that drives me away from linux. I simply don&#8217;t want to be associated with these people and their crazy ideas and conduct in any way. I don&#8217;t need their software. Choice is important, right? Well good, because I can choose alternatives, such as FreeBSD. It&#8217;s free, it&#8217;s open source, and it does everything I need it to. And the community seems nicer. It&#8217;s just one coherent project, instead of tons of distributions-on-distributions, and the focus is on <a href="http://www.freebsd.org/releng/index.html">developing quality software</a>. There are clear goals, a clear vision. If you are a linux user, I suggest you check out <a href="http://www.freebsd.org">FreeBSD</a>. For most types of linux installations, FreeBSD will make a fine alternative, as it is a true UNIX derivative, and most software that is available for linux is also available for FreeBSD (Apache, MySQL, PostgreSQL, KDE, GNOME, VLC, Firefox, Chromium, Thunderbird etc). Under the right circumstances, FreeBSD will even perform better. And you will no longer be involved in all the political nonsense, FUD and distro-wars of the linux community (try asking a question&#8230; no matter what the topic, one of the first answers is always going to be: &#8220;But distro X sucks. You should try distro Y!&#8221; As if that matters&#8230;).</p>
]]></content:encoded>
			<wfw:commentRss>http://scalibq.wordpress.com/2012/01/16/why-i-dont-use-linux-and-why-you-shouldnt-either/feed/</wfw:commentRss>
		<slash:comments>48</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2547913ebf910c8aa2c632619be46e93?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">scalibq</media:title>
		</media:content>
	</item>
		<item>
		<title>Just keeping it real, part 5.1</title>
		<link>http://scalibq.wordpress.com/2012/01/06/just-keeping-it-real-part-5-1/</link>
		<comments>http://scalibq.wordpress.com/2012/01/06/just-keeping-it-real-part-5-1/#comments</comments>
		<pubDate>Fri, 06 Jan 2012 14:14:49 +0000</pubDate>
		<dc:creator>Scali</dc:creator>
				<category><![CDATA[Software development]]></category>
		<category><![CDATA[Amiga]]></category>
		<category><![CDATA[bitplane]]></category>
		<category><![CDATA[CGA]]></category>
		<category><![CDATA[demoscene]]></category>
		<category><![CDATA[DOS]]></category>
		<category><![CDATA[EGA]]></category>
		<category><![CDATA[Mode X]]></category>
		<category><![CDATA[oldskool]]></category>
		<category><![CDATA[optimize]]></category>
		<category><![CDATA[PC]]></category>
		<category><![CDATA[polygon]]></category>
		<category><![CDATA[subpixel]]></category>
		<category><![CDATA[unchained]]></category>
		<category><![CDATA[VGA]]></category>

		<guid isPermaLink="false">http://scalibq.wordpress.com/?p=538</guid>
		<description><![CDATA[Picking up where I left off in part 5, the subpixel-corrected polygons on Amiga: The subpixel-correction in itself appeared to work. My analysis of the input terms for the blitter&#8217;s linedrawing appeared to be correct: you can specify the error-term &#8230; <a href="http://scalibq.wordpress.com/2012/01/06/just-keeping-it-real-part-5-1/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Picking up where I left off in <a title="Just keeping it real, part 5" href="http://scalibq.wordpress.com/2011/12/28/just-keeping-it-real-part-5/">part 5</a>, the subpixel-corrected polygons on Amiga:<br />
<span style="text-align:center; display: block;"><a href="http://scalibq.wordpress.com/2012/01/06/just-keeping-it-real-part-5-1/"><img src="http://img.youtube.com/vi/Bo666nq_sN0/2.jpg" alt="" /></a></span></p>
<p>The subpixel-correction in itself appeared to work. My analysis of the input terms for the blitter&#8217;s linedrawing appeared to be correct: you can specify the error-term in bltapt. Let us look at this formula again:</p>
<blockquote><p>bltapt  = (APTR) (2*Sdelta-Ldelta);</p></blockquote>
<p>The Ldelta-term here represents Denominator/2, and can be replaced by a subpixel prestep, something like this:</p>
<blockquote><p>initialNominator = fraction*2*Ldelta;<br />
bltapt  = (APTR) (2*Sdelta-initialNominator);</p></blockquote>
<p>In other words:</p>
<blockquote><p>initialNominator = fraction*Denominator;<br />
bltapt  = (APTR) (Nominator-initialNominator);</p></blockquote>
<p>Where fraction would be the distance from the start of the line to the hotspot inside that pixel (so a value between 0 and 1). For completeness I&#8217;d like to point out that you have to calculate the Sdelta and Ldelta values from the coordinates at the higher resolution, rather than screen resolution, and you will need to also pre-step the x and y coordinates for your starting point before passing them on to the blitter. Important to note: the blitter can render lines in any direction (I already mentioned that in <a title="Just keeping it real, part 3" href="http://scalibq.wordpress.com/2011/12/04/just-keeping-it-real-part-3/">part 3</a>, when I said I wanted to sort my lines to always render top-down, so that left and right edges would fit properly). This means that you need to perform the subpixel correction in the proper direction as well.</p>
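<p>To make the prestep concrete, here is a minimal C sketch of the error-term calculation. It assumes the subpixel fraction is stored as a 4-bit value (0..15, in sixteenths of a pixel). The names <code>initial_error</code> and <code>frac16</code> are made up for illustration, only the formula itself comes from the text above. This is not the actual routine, just the arithmetic:</p>

```c
#include <assert.h>
#include <stdint.h>

/* Initial error term for a blitter line with a subpixel prestep.
 * sdelta/ldelta: the short and long delta of the line.
 * frac16: hypothetical 4-bit fraction (0..15), distance from the start
 *         of the line to the hotspot of the pixel, in 1/16 pixel units. */
static int32_t initial_error(int32_t sdelta, int32_t ldelta, int32_t frac16)
{
    assert(frac16 >= 0 && frac16 < 16);

    int32_t numerator   = 2 * sdelta;   /* "Nominator" in the formulas above */
    int32_t denominator = 2 * ldelta;   /* "Denominator" in the formulas above */

    /* fraction * Denominator, with fraction = frac16 / 16 */
    int32_t initial_numerator = (frac16 * denominator) / 16;

    return numerator - initial_numerator;
}
```

<p>With <code>frac16 = 8</code> (exactly half a pixel) this reduces to the standard 2*Sdelta-Ldelta, which is a quick sanity check that the prestep generalises the original formula.</p>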
<p>As an aside: I already mentioned that the Amiga Hardware Reference Manual and the System Programmers Guide had slightly different formulas. For some reason the HRM scaled everything up by a factor of 2, compared to the SPG. I can see why the SPG uses 2*Sdelta and 2*Ldelta terms: this means you can just use Ldelta as your Denominator/2 value, without losing any precision. I am not sure why the HRM uses 4*dx and 4*dy terms. However, it clearly shows that applying a scale factor to all terms does not affect linedrawing.</p>
<p>Since I use 4-bit subpixel information, my terms are scaled up by 16 already. Therefore I chose not to use the scaled-up versions from the SPG or HRM, but just use the dx and dy values as-is. I do not use a Denominator/2 term, since I replace it with the subpixel prestep, and I already work at higher precision, so for me there is no real advantage there.</p>
<h3>But it is still broken&#8230;</h3>
<p>So far, nothing new. I have just given a slightly more in-depth explanation of how the idea of subpixel-correction on blitter lines works. For line-drawing, this works well enough. However, as we&#8217;ve seen above, polygons are still buggy. So what is wrong here?</p>
<p>Well, this again has to do with the blitter rendering lines in different directions. As long as the blitter renders them top-down (the cases where abs(dx) &lt; abs(dy)), things work as expected for polygon edges. However, for the cases where it renders them left-to-right or right-to-left (abs(dy) &lt; abs(dx)), there is that problem again of making the edges meet properly. The line may not end exactly on the scanline you intended, and as a result, there is either a pixel too few or a pixel too many on the scanline, causing the filler to overshoot and fill the entire scanline.</p>
<p>At first I tried to mess about with the line setup to try and fix this problem, but so far I have not come up with a working variation. So then I decided to make a hybrid routine instead: for the cases where abs(dy) &lt; abs(dx), I use a CPU routine, which still draws it top-down, and puts pixels in all the right places. Luckily, these are generally the lines that require the fewest pixels (since we always draw top-down, abs(dy) is the number of pixels we draw), so the blitter still takes care of most of the workload. The CPU routine can also work in parallel with the blitter, so it is not even all that bad.</p>
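<p>As a rough illustration of what such a CPU routine has to guarantee, here is a simplified C sketch that walks an edge top-down and produces exactly one x coordinate per scanline. It works on whole-pixel coordinates and 16.16 fixed point, so it leaves out the subpixel prestep. The names <code>edge_is_x_major</code> and <code>walk_edge</code> are hypothetical, and non-negative screen coordinates are assumed:</p>

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* The case that goes to the CPU in the hybrid scheme: abs(dy) < abs(dx). */
static int edge_is_x_major(int dx, int dy)
{
    return abs(dy) < abs(dx);
}

/* Walk an edge from (x0,y0) down to (x1,y1), storing one x per scanline
 * in xs[]. The last scanline (y1) is left to the next edge.
 * Returns the number of scanlines written (0 for a horizontal edge). */
static int walk_edge(int x0, int y0, int x1, int y1, int *xs)
{
    assert(xs != NULL);
    assert(x0 >= 0 && x1 >= 0);

    int dy = y1 - y0;
    if (dy <= 0)
        return 0;

    int32_t x    = (int32_t)x0 * 65536 + 32768;        /* start at pixel centre */
    int32_t step = ((int32_t)(x1 - x0) * 65536) / dy;  /* dx/dy in 16.16 */

    for (int i = 0; i < dy; i++) {
        xs[i] = (int)(x >> 16);
        x += step;
    }
    return dy;
}
```

<p>Because the loop is driven by y rather than by the major axis, an x-major edge still yields a single pixel on every scanline, which is what the filler needs.</p>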
<p>The only problem-case here is when the polygon is less than 2 pixels wide. If both edges draw at the same pixel, the filler will not work properly, as we&#8217;ve seen in <a title="Just keeping it real, part 3" href="http://scalibq.wordpress.com/2011/12/04/just-keeping-it-real-part-3/">part 3</a>. Back then I proposed the solution of using XOR-mode when drawing pixels. This way, when two pixels are drawn on top of each other, the second pixel will turn the first pixel back off, so the filler will not do anything there.</p>
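<p>The interaction between XOR plotting and the filler can be mimicked in a few lines of C. This is only a model: it uses one byte per pixel and fills left-to-right, whereas the real blitter works on bitplane words and has its own fill direction and modes. The function names are made up for this sketch:</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toggle a pixel: drawing the same pixel twice turns it back off. */
static void xor_plot(uint8_t *row, size_t width, size_t x)
{
    assert(x < width);
    row[x] ^= 1;
}

/* Simplified inclusive area fill: every set pixel toggles the fill state,
 * and the edge pixels themselves are included in the output. */
static void fill_row(const uint8_t *row, uint8_t *out, size_t width)
{
    uint8_t inside = 0;
    for (size_t x = 0; x < width; x++) {
        inside ^= row[x];
        out[x] = inside | row[x];
    }
}
```

<p>With edge pixels at x=2 and x=5 the fill covers 2..5. With both edges landing on the same x, the second XOR clears the pixel and the scanline stays empty, instead of a single stray pixel switching the filler on for the rest of the line.</p>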
<p>This solution works perfectly for our hybrid subpixel-correct renderer, since we now render exactly 2 pixels on every scanline. So we use the blitter to draw in XOR-mode, and we also use an XOR operation to draw the pixels with the CPU. We do not need any other tricks, like throwing away the first pixel of a line. And there we have it then: a blitter-accelerated subpixel-correct polygon drawer on the custom hardware of a 1985 home computer:<br />
<span style="text-align:center; display: block;"><a href="http://scalibq.wordpress.com/2012/01/06/just-keeping-it-real-part-5-1/"><img src="http://img.youtube.com/vi/f4a7rxGL7b0/2.jpg" alt="" /></a></span></p>
<p>I am getting 4-bit subpixel precision here, which is as good as early 3D accelerators from the mid-90s on PC. Quite bizarre actually. Is this just an undocumented feature? I don&#8217;t recall ever having seen subpixel-correct lines or polygons on a regular Amiga. But as usual, Amiga makes it possible!</p>
<h3>On to some other old junk</h3>
<p>Before I end this post, I would like to share some other small things that I have made in the meantime. Namely, on PC I had made CGA, EGA and VGA-optimized polygon fillers. But there are more early graphics standards. One of them is Hercules, which is actually the first graphics standard I ever used on PC. My first PC came with a Plantronics onboard adapter, which was compatible with both Hercules and CGA, and also had a special 16-colour mode. At first I only had a monochrome monitor, so Hercules was all I could use. It wasn&#8217;t even that bad, really. Sure, it was monochrome, but the resolution was 720&#215;348 pixels, which was incredibly high at the time. CGA could only do 640&#215;200, EGA did 640&#215;350, and VGA did 640&#215;480.</p>
<p>Anyway, I decided to give it a go. I tried to look at the ah=0 int10h setvideomode function to see which mode it would be&#8230; Shock! Horror! There *is* no mode for Hercules. Apparently Hercules does not have any BIOS API, so the only way to set a videomode is to manually reprogram all registers. Luckily I found the right register settings on the internet somewhere. And before long I could switch to graphics mode and back.</p>
<p>Then I had to figure out how to address each pixel in memory. Hercules is quite quirky that way. The scanlines are stored in a 4-way interleaved arrangement. Each scanline is just as you expect: 720 pixels packed into bytes, giving a total of 720/8 = 90 bytes. But the addressing of the scanlines is like this:</p>
<blockquote><p>Y MOD 4 == 0 at B000:0000 + (Y/4)*90<br />
Y MOD 4 == 1 at B000:2000 + (Y/4)*90<br />
Y MOD 4 == 2 at B000:4000 + (Y/4)*90<br />
Y MOD 4 == 3 at B000:6000 + (Y/4)*90</p></blockquote>
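<p>The table above boils down to a small offset calculation. Here is a sketch in C, assuming the usual convention that the leftmost pixel of each byte is the most significant bit. The function names are just for illustration, and the result is an offset relative to segment B000:</p>

```c
#include <assert.h>
#include <stdint.h>

#define HERC_WIDTH         720
#define HERC_HEIGHT        348
#define HERC_BYTES_PER_ROW (HERC_WIDTH / 8)   /* 90 bytes per scanline */

/* Byte offset of pixel (x,y) within the first Hercules page. */
static uint16_t herc_offset(int x, int y)
{
    assert(x >= 0 && x < HERC_WIDTH);
    assert(y >= 0 && y < HERC_HEIGHT);

    /* (y % 4) selects the 0x2000-byte bank, (y / 4) the row within it */
    return (uint16_t)((y % 4) * 0x2000 + (y / 4) * HERC_BYTES_PER_ROW + x / 8);
}

/* Bit mask of pixel x within its byte (assumed: leftmost pixel = bit 7). */
static uint8_t herc_mask(int x)
{
    return (uint8_t)(0x80 >> (x & 7));
}
```

<p>The very last pixel, (719,347), ends up at offset 32405, so the whole screen stays inside one 32 KB page.</p>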
<p>So, now that the addressing is worked out, it&#8217;s time for the final details. Hercules uses a Motorola 6845 CRT controller, just like CGA (and EGA/VGA are near-100% compatible extensions of the 6845). The main difference is that monochrome adapters have their I/O ports based at 3B0h rather than at 3D0h for colour adapters (so that both can co-exist in the same system). Hercules comes with 64 KB of memory, which means it supports 2 pages of memory. A single screen takes 720*348/8 = 31320 bytes, which fits in a 32 KB page. The second page is at segment B800. This is the same segment that is normally used by CGA, which means that you can use the second page, but only if you do not have a CGA-compatible card in your system as well (the first page is always available, so dual monitor setups are possible, as long as one is MDA/Hercules and the other is CGA/EGA/VGA-compatible).</p>
<p>Assuming that the Hercules card is the only one in the system, we can use double-buffering in video memory, just like on EGA and unchained VGA modes. Porting my polygon routine was quite straightforward from here on in. There was a slight problem however: the routine only supported flatshading, and Hercules has only two shades: black and white (or amber, green, or whichever other colour your monochrome display may use). So I decided to implement a simple dithering scheme, so that you could discern the individual faces:<br />
<span style="text-align:center; display: block;"><a href="http://scalibq.wordpress.com/2012/01/06/just-keeping-it-real-part-5-1/"><img src="http://img.youtube.com/vi/6KM9tsiWYjY/2.jpg" alt="" /></a></span></p>
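<p>Which dithering scheme is used is not spelled out here, so the following is only one common way to do it: ordered dithering with a 4x4 Bayer matrix, sketched in C. The shade range of 0..16 and the name <code>dither_pixel</code> are assumptions for this example, not taken from the actual routine:</p>

```c
#include <assert.h>
#include <stdint.h>

/* Standard 4x4 Bayer threshold matrix, values 0..15. */
static const uint8_t bayer4[4][4] = {
    {  0,  8,  2, 10 },
    { 12,  4, 14,  6 },
    {  3, 11,  1,  9 },
    { 15,  7, 13,  5 }
};

/* Returns 1 if pixel (x,y) should be lit for a given shade.
 * shade 0 gives all black, shade 16 gives all white. */
static int dither_pixel(int x, int y, int shade)
{
    assert(x >= 0 && y >= 0);
    assert(shade >= 0 && shade <= 16);
    return shade > bayer4[y & 3][x & 3];
}
```

<p>Since the pattern repeats every 4 pixels and a byte holds 8 pixels, a filler could precompute one pattern byte per shade per scanline (y &amp; 3) and still write whole bytes at a time.</p>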
<p>Yes, it&#8217;s rather flickery, because the vsync does not appear to work correctly. I&#8217;m not sure if that&#8217;s dosbox&#8217;s fault, or if the vsync bit on the 6845&#8217;s status register does not work on real Hercules hardware. But it will have to do.</p>
<h3>PCjr and Tandy</h3>
<p>Although I have now REALLY covered every single graphics card I ever owned, there was still one graphics standard that was reasonably popular in the early days: the enhanced 16-colour mode of IBM&#8217;s PCjr, and the clones made by Tandy.  Okay, I have no support for the Plantronics mode on my first PC, but I no longer have that PC, and I don&#8217;t think dosbox is compatible with it&#8230; It seems easy enough to add support for it though: It is like CGA, but with two extra even/odd bitplanes at segment BC00h. It combines the 2-bit pixels from B800 and BC00 to a 4-bit pixel.</p>
<p>Right, now onto PCjr/Tandy, because that mode IS supported by dosbox. This is yet another 16-colour mode, which works neither like EGA nor like Plantronics. Instead, it uses a packed-pixel format like CGA, so now there are two 4-bit pixels packed into each byte. And where CGA has even/odd planes at segments B800 and BA00, PCjr/Tandy has 4 scanline-interleaved planes, much like Hercules, at segments B800, BA00, BC00 and BE00.</p>
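<p>For comparison with Hercules, here is a C sketch of the pixel addressing for this mode. It assumes the 320x200 16-colour mode (so 160 bytes per scanline), four banks that are 0x2000 bytes apart starting at segment B800, and the left pixel of each pair in the high nibble. The function names are hypothetical, and <code>vram</code> stands in for a pointer to the 32 KB of video memory:</p>

```c
#include <assert.h>
#include <stdint.h>

#define TANDY_WIDTH         320
#define TANDY_HEIGHT        200
#define TANDY_BYTES_PER_ROW (TANDY_WIDTH / 2)   /* two 4-bit pixels per byte */

/* Byte offset of pixel (x,y), relative to the start of video memory. */
static uint16_t tandy_offset(int x, int y)
{
    assert(x >= 0 && x < TANDY_WIDTH);
    assert(y >= 0 && y < TANDY_HEIGHT);
    return (uint16_t)((y % 4) * 0x2000 + (y / 4) * TANDY_BYTES_PER_ROW + x / 2);
}

/* Write one 4-bit pixel, preserving its neighbour in the same byte. */
static void tandy_put_pixel(uint8_t *vram, int x, int y, uint8_t colour)
{
    uint8_t *p = vram + tandy_offset(x, y);
    colour &= 0x0F;
    if (x & 1)
        *p = (uint8_t)((*p & 0xF0) | colour);          /* odd x: low nibble */
    else
        *p = (uint8_t)((*p & 0x0F) | (colour << 4));   /* even x: high nibble */
}
```

<p>The read-modify-write on every single pixel is exactly why a filler wants to handle aligned pixel pairs as whole bytes and only touch the nibbles at the left and right ends of a span.</p>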
<p>So PCjr/Tandy does not lend itself very well to fast polygon filling. With just 2 pixels per byte, and no special trickery to fill multiple planes at a time, it is not going to be all that efficient. But I&#8217;ve implemented it anyway, just to complete the whole set of graphics adapters:<br />
<span style="text-align:center; display: block;"><a href="http://scalibq.wordpress.com/2012/01/06/just-keeping-it-real-part-5-1/"><img src="http://img.youtube.com/vi/HMyKn86OJ1Q/2.jpg" alt="" /></a></span></p>
<p>And well, that&#8217;s it for now. I am not sure what I am going to do next. As I already mentioned in <a title="Just keeping it real… old skool style" href="http://scalibq.wordpress.com/2011/11/23/just-keeping-it-real-old-skool-style/">part 1</a>, I may explore the graphics capabilities (or lack thereof) of the Commodore 64, or I may evolve these simple polygon routines into a more complete engine, allowing some simple objects to be animated on screen.</p>
]]></content:encoded>
			<wfw:commentRss>http://scalibq.wordpress.com/2012/01/06/just-keeping-it-real-part-5-1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2547913ebf910c8aa2c632619be46e93?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">scalibq</media:title>
		</media:content>
	</item>
		<item>
		<title>nVidia&#8217;s shader use for tessellation is NOT, I repeat NOT different from AMD&#8217;s</title>
		<link>http://scalibq.wordpress.com/2012/01/03/nvidias-shader-use-for-tessellation-is-not-i-repeat-not-different-from-amds/</link>
		<comments>http://scalibq.wordpress.com/2012/01/03/nvidias-shader-use-for-tessellation-is-not-i-repeat-not-different-from-amds/#comments</comments>
		<pubDate>Tue, 03 Jan 2012 07:26:37 +0000</pubDate>
		<dc:creator>Scali</dc:creator>
				<category><![CDATA[Direct3D]]></category>
		<category><![CDATA[Hardware news]]></category>
		<category><![CDATA[OpenGL]]></category>
		<category><![CDATA[Software development]]></category>
		<category><![CDATA[Software news]]></category>
		<category><![CDATA[DirectX 11 Direct3D OpenGL Tessellation AMD nVidia Radeon GeForce shader]]></category>

		<guid isPermaLink="false">http://scalibq.wordpress.com/?p=529</guid>
		<description><![CDATA[For some reason I keep reading the same misinformed nonsense about tessellation. A lot of people seem to think that nVidia somehow &#8216;emulates&#8217; tessellation on their shaders while AMD has a fixed-function unit. And for some reason they think that &#8230; <a href="http://scalibq.wordpress.com/2012/01/03/nvidias-shader-use-for-tessellation-is-not-i-repeat-not-different-from-amds/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>For some reason I keep reading the same misinformed nonsense about tessellation. A lot of people seem to think that nVidia somehow &#8216;emulates&#8217; tessellation on their shaders while AMD has a fixed-function unit. And for some reason they think that this will bottleneck nVidia&#8217;s hardware, while this will not happen on AMD (which is rather ironic, since as we all know, <a title="AMD and tessellation: A difficult relationship" href="http://scalibq.wordpress.com/2011/12/24/amd-and-tessellation-a-difficult-relationship/">AMD is the one being bottlenecked</a> in tessellation). Let me try to make it clear:</p>
<p><strong>There is no such thing! nVidia&#8217;s shader use for tessellation is NOT, I repeat NOT different from AMD&#8217;s!!!</strong></p>
<p>I have no idea where this nonsense originally came from, but it is rather annoying that so many people keep repeating it. The simple truth is that ALL vendors use shaders for tessellation, since that is simply how the pipeline in Direct3D 11 is designed. See Microsoft&#8217;s explanation for more detail: <a href="http://msdn.microsoft.com/en-us/library/windows/desktop/ff476882(v=vs.85).aspx">http://msdn.microsoft.com/en-us/library/windows/desktop/ff476882(v=vs.85).aspx</a></p>
<p>The short version is this:</p>
<blockquote><p>Vertex shader -&gt; <strong>Hull shader</strong> -&gt; Tessellator -&gt; <strong>Domain Shader</strong> -&gt; Geometry Shader -&gt; Pixel shader</p></blockquote>
<p>In bold, you see the two new types of shaders that were added in the Direct3D 11 pipeline. These types of shaders have been added to the pipeline for the simple reason that the tessellation is programmable. So any vendor implementing a Direct3D 11-compatible GPU will be using shaders for tessellation. Since shaders have been unified since Direct3D 10, the hull and domain shader will be executed by the same shader units as all the other types of shaders. That is simply how Direct3D 11 works, regardless of brand.</p>
<p>The difference between AMD and nVidia is in the part between the hull and domain shader stages: the tessellator. The tessellator itself is a fixed-function unit. The difference between AMD and nVidia here is that nVidia has implemented the tessellator in a parallel way. nVidia calls this <a href="http://www.nvidia.com/object/IO_86775.html">PolyMorph</a>. In short, what happens is this:</p>
<p>The hull shader gets the source geometry, and does some calculations to decide how many new triangles to add (the magical tessellation factors). The tessellator then adds the extra triangles, and the domain shader can do some final calculations to position the new triangles correctly. The bottleneck in AMD&#8217;s approach is that it is implemented as a conventional pipeline. Where you&#8217;d normally pass a single triangle through the entire pipeline, you now get an &#8216;explosion&#8217; of triangles at the tessellation stage. All these extra triangles need to be handled by the same pipeline that was only designed to handle single triangles. As a result, the rasterizer and pixel shaders get bottlenecked: they can only handle a single triangle at a time. This problem was already apparent in Direct3D 10, where the geometry shader could do some very basic tessellation as well, adding extra triangles on-the-fly. This was rarely used in practice, because it was often slower than just feeding a more detailed mesh through the entire pipeline.</p>
<p>nVidia decided to tackle this problem head-on: their tessellator is not just a single unit that tries to stuff all the triangles through a single pipeline. Instead, nVidia has added 16 geometry engines. There is now extra hardware to handle the &#8216;explosion&#8217; of triangles that happens through tessellation, so that the remaining stages will not get bottlenecked. There are extra rasterizers to set up the triangles, and feed the pixel shaders efficiently.</p>
<p>With AMD it is very clear just how much they are bottlenecked: the tessellator is the same on many of their cards. A <a href="http://www.geeks3d.com/20100826/tessmark-opengl-4-gpu-tessellation-benchmark-comparative-table/">Radeon 5770 will perform roughly the same as a 5870</a> under high tessellation workloads. The Radeon 5870 may have a lot more shader units than the 5770, but the bottlenecking that occurs at the tessellator means that they cannot be fed. So the irony is that things work exactly the opposite of what people think: AMD is the one whose shaders get bottlenecked at high tessellation settings. nVidia&#8217;s hardware scales so well with tessellation because they have the extra hardware that allows them to *use* their shaders efficiently, i.e. NOT bottlenecked.</p>
]]></content:encoded>
			<wfw:commentRss>http://scalibq.wordpress.com/2012/01/03/nvidias-shader-use-for-tessellation-is-not-i-repeat-not-different-from-amds/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2547913ebf910c8aa2c632619be46e93?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">scalibq</media:title>
		</media:content>
	</item>
		<item>
		<title>Just keeping it real, part 5</title>
		<link>http://scalibq.wordpress.com/2011/12/28/just-keeping-it-real-part-5/</link>
		<comments>http://scalibq.wordpress.com/2011/12/28/just-keeping-it-real-part-5/#comments</comments>
		<pubDate>Wed, 28 Dec 2011 22:56:28 +0000</pubDate>
		<dc:creator>Scali</dc:creator>
				<category><![CDATA[Software development]]></category>
		<category><![CDATA[Amiga]]></category>
		<category><![CDATA[bitplane]]></category>
		<category><![CDATA[CGA]]></category>
		<category><![CDATA[demoscene]]></category>
		<category><![CDATA[DOS]]></category>
		<category><![CDATA[EGA]]></category>
		<category><![CDATA[Mode X]]></category>
		<category><![CDATA[oldskool]]></category>
		<category><![CDATA[optimize]]></category>
		<category><![CDATA[PC]]></category>
		<category><![CDATA[polygon]]></category>
		<category><![CDATA[subpixel]]></category>
		<category><![CDATA[unchained]]></category>
		<category><![CDATA[VGA]]></category>

		<guid isPermaLink="false">http://scalibq.wordpress.com/?p=514</guid>
		<description><![CDATA[Ah, there it is at last, part 5! In case you were wondering why the previous two posts were not just part 5 and 6 already, well that is because they were not planned originally, and they were just things &#8230; <a href="http://scalibq.wordpress.com/2011/12/28/just-keeping-it-real-part-5/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=scalibq.wordpress.com&amp;blog=16171350&amp;post=514&amp;subd=scalibq&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Ah, there it is at last, part 5! In case you were wondering why the <a title="Just keeping it real, part 4.5" href="http://scalibq.wordpress.com/2011/12/17/just-keeping-it-real-part-4-5/">previous</a> <a title="Just keeping it real, part 4.6" href="http://scalibq.wordpress.com/2011/12/21/just-keeping-it-real-part-4-6/">two</a> posts were not just part 5 and 6 already, well that is because they were not planned originally, and they were just things that I discovered as I went along. They were just addenda, or even errata. I wanted to do part 5 as an encore-piece that improves the rendering quality a bit. I&#8217;m not sure if this is going to be the last piece in the series&#8230; that depends on how much further I would like to take it. There are tons of little tricks and things to make 3d rendering faster, especially on such old and low-spec machines, which I may or may not cover in the future.</p>
<h3>Corrections</h3>
<p>As I mentioned earlier, my 486 donut has subpixel-correction. If you are not familiar with that, I&#8217;ll try to explain it in a nutshell, but I think it&#8217;s one of those things that are difficult to explain, yet easy once you get it. The name &#8216;subpixel&#8217; already suggests that there is something &#8216;below&#8217; the pixel, smaller than pixel resolution. And indeed, it is perfectly possible to have fractional coordinates. When you are doing rotations, scaling and other transforms (either in 2d or 3d), you will usually get coordinates with fractions, which are mapped onto integer coordinates for the pixel grid.</p>
<p>It is this mapping onto integer coordinates that may cause a problem. Namely, you should not just look at pixels as being infinitely small dots. You should consider each pixel to occupy a rectangular area. To demonstrate what I mean, I will show two different lines drawn on a pixel grid:</p>
<p><a href="http://scalibq.files.wordpress.com/2011/12/subpixel1.png"><img class="aligncenter size-full wp-image-517" title="Subpixel1" src="http://scalibq.files.wordpress.com/2011/12/subpixel1.png?w=640" alt=""   /></a></p>
<p><a href="http://scalibq.files.wordpress.com/2011/12/subpixel2.png"><img class="aligncenter size-full wp-image-518" title="Subpixel2" src="http://scalibq.files.wordpress.com/2011/12/subpixel2.png?w=640" alt=""   /></a></p>
<p>The blue line shows the actual line at 16 times higher resolution (so we have 4 bits of &#8216;subpixel&#8217; information in this line). The red pixels show the pixels that the line passes through, so this is how the line looks when rendered at the actual pixel resolution. Now the key thing to note here is that the blue lines start and end in the same pixel in both images, but the different subpixel positions within the pixel result in different pixels being traversed by the line on the screen.</p>
<p>The naive approach would be to just round the fractional coordinates to the nearest integer. Effectively you&#8217;re just snapping every coordinate to the center of each pixel. The result is that both lines will be rendered like this:</p>
<p><a href="http://bohemiq.scali.eu.org/NoSubpixel.png"><img class="aligncenter" src="http://bohemiq.scali.eu.org/NoSubpixel.png" alt="" width="241" height="76" /></a></p>
<p>The second line could be a result of the first line being rotated a little. Small movements are where subpixel-correct rendering is most obvious. Instead of lines or edges just hopping from one pixel to the next, you will see a more &#8216;flowing&#8217; movement as the line changes even when the endpoints are still within the same pixels.</p>
<p>Subpixel correction is an extra step that you do before you rasterize the pixels. You correct for the subpixel position of your starting point. Instead of just &#8216;snapping&#8217; pixels to the center by rounding the coordinate, you define a &#8216;hotspot&#8217; inside your pixel, which will be the exact point in subpixel coordinates that will be projected to the pixels on screen. Since you generally rasterize from left-to-right and from top-to-bottom, the most common choice is to take the bottom-right corner of each pixel as the hotspot. You then calculate the distance to the hotspot from your fractional coordinates, and take a &#8216;pre-step&#8217; to land exactly on the hotspot coordinate. Once you are on the hotspot, each next hotspot is exactly one pixel away, both horizontally and vertically. This means that you can just rasterize with integer steps as usual. So apart from the pre-step, there is no extra cost in subpixel-correct rendering. The bulk of the work is still the actual rasterizing. So I was wondering if it would be realistic to add subpixel correction to my low-end polygon routines.</p>
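<p>To make the pre-step a bit more concrete, here is a minimal sketch in C. It assumes coordinates with 4 subpixel bits (as in the figures above), non-negative on-screen coordinates, and a hotspot that sits on the integer pixel boundary you reach by rounding up. The function names are made up for this illustration, and it uses a double for the slope purely for readability; it is not the code from my renderer.</p>

```c
#include <stdint.h>

#define SUB_BITS 4                    /* 4 subpixel bits: 16 steps per pixel */
#define SUB_ONE  (1 << SUB_BITS)

/* Round a fixed-point coordinate up to the next hotspot, in whole pixels.
   Assumes v >= 0 (on-screen coordinates). */
int32_t ceil_fixed(int32_t v)
{
    return (v + SUB_ONE - 1) >> SUB_BITS;
}

/* Distance from v to that hotspot, in subpixel units (0..15). */
int32_t prestep_fixed(int32_t v)
{
    return (ceil_fixed(v) << SUB_BITS) - v;
}

/* x position (in pixels) of an edge on its first scanline.
   All arguments are in subpixel units; y1 > y0 is assumed. */
double edge_start_x(int32_t x0, int32_t y0, int32_t x1, int32_t y1)
{
    double slope   = (double)(x1 - x0) / (double)(y1 - y0); /* dx per scanline */
    double prestep = prestep_fixed(y0) / (double)SUB_ONE;   /* in pixels */
    return x0 / (double)SUB_ONE + prestep * slope;
}
```

<p>For an edge starting at (1.0, 2.25), the first hotspot is on scanline 3, so the pre-step is 0.75 of a scanline. After that one correction, every following scanline is exactly one integer step away.</p>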
<h3>To rasterize, or not to rasterize</h3>
<p>Part of the fun in this little endeavour, for me at any rate, was to experiment with different kinds of rasterizers. It must have been some 10 years since I last wrote a rasterizer. And for such low-end hardware, I wanted to try something out of the ordinary. (For PC I not only targeted CGA/EGA/VGA, but also the original 8088 CPU, which means 16-bit registers in real mode. The title of these blogs is a play on that.) I started out with a classic Bresenham-based approach, as I mentioned in <a title="Just keeping it real, part 4" href="http://scalibq.wordpress.com/2011/12/13/just-keeping-it-real-part-4/">part 4</a>.</p>
<p>Bresenham has the advantage that you don&#8217;t need any divisions or multiplies at all. However, the Bresenham routine is normally used for lines, where you will always rasterize along the biggest delta. So if |x1 &#8211; x0| &lt; |y1 &#8211; y0|, you will rasterize vertically, else you will rasterize horizontally. Bresenham then works because you know that you will never step more than 1 pixel in the other direction (the &#8216;fraction&#8217;, so to say). When you rasterize a polygon however, you will always rasterize vertically. So now you have the following dilemma for lines that are more horizontal:</p>
<ol>
<li>Will I just loop through the Bresenham routine horizontally until it takes a step vertically?</li>
<li>Or will I pre-calculate how many pixels it always steps horizontally, and modify the Bresenham step to only decide on one extra pixel to step, to account for the fraction?</li>
</ol>
<p>Originally I went with 1. because it does not require any divisions. It works well in the average case, but there are extreme cases of near-horizontal lines, where the number of iterations can get relatively costly. I then tried 2., which requires a division and a modulo operation during the setup (which is just one division on x86 CPUs, since you get the modulo for free). However, the rasterization itself becomes more straightforward now, with just one conditional jump per edge.</p>
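<p>Option 2 could look roughly like the sketch below. This is an illustration under a few assumptions rather than my actual routine: endpoints are integers, edges run from top to bottom (so dy is positive), and the struct and function names are invented for the example. The C library&#8217;s div() stands in for the single x86 division that yields quotient and remainder together.</p>

```c
#include <stdlib.h>

typedef struct {
    int x;            /* current x on this scanline */
    int step;         /* whole pixels stepped every scanline: floor(dx/dy) */
    int numerator;    /* dx mod dy: the fractional part of the slope */
    int denominator;  /* dy */
    int error;        /* accumulated fraction, in units of 1/dy */
} Edge;

/* Set up an edge from (x0,y0) to (x1,y1); y1 > y0 is assumed. */
void edge_setup(Edge *e, int x0, int y0, int x1, int y1)
{
    int dx = x1 - x0;
    int dy = y1 - y0;
    div_t qr = div(dx, dy);          /* quotient and remainder in one go */

    /* C division truncates toward zero; turn it into a floor for dx < 0. */
    if (qr.rem < 0) {
        qr.quot--;
        qr.rem += dy;
    }
    e->x = x0;
    e->step = qr.quot;
    e->numerator = qr.rem;
    e->denominator = dy;
    e->error = dy / 2;               /* classic Bresenham: start 'halfway' */
}

/* Advance the edge by one scanline. */
void edge_step(Edge *e)
{
    e->x += e->step;
    e->error += e->numerator;
    if (e->error >= e->denominator) {   /* the one conditional per edge */
        e->error -= e->denominator;
        e->x++;
    }
}
```

<p>For the edge from (0,0) to (7,3), three calls to edge_step() visit x = 2, 5 and 7, without any loop over the horizontal pixels in between.</p>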
<p>But how does one perform subpixel-correction on such a routine? Apparently it is not a common subject with Bresenham algorithms. It seems they are generally seen as integer-only solutions. It is possible to do, however. Namely, you are still processing a fraction in Bresenham. You have the numerator, the denominator, and the error term. You keep adding the numerator to the error term until it reaches or exceeds the denominator. At this point your fraction has stepped over the next integer boundary, so you take one extra pixel step and subtract the denominator from the error term. With classic Bresenham, the error term is initialized with denominator/2. This means that we start our fraction &#8216;halfway&#8217;, which is effectively snapping the starting point to the center of the pixel.</p>
<p>So the initial value of the error term is the key to subpixel-correcting a Bresenham algorithm. During the &#8216;pre-step&#8217;, you need to calculate the proper error term at the pixel hotspot. Calculating the numerator and denominator from the fractional coordinates does the rest. If you want to know more about this approach, you might want to read <a href="http://chrishecker.com/Miscellaneous_Technical_Articles">Chris Hecker&#8217;s articles on perspective texture mapping</a>. He uses a Bresenham-derived rasterization approach with subpixel correction.</p>
<p>Well, I decided to implement it, just to finish the routine. I managed to get mine a bit more efficient than Chris Hecker&#8217;s. I used 12.4 fixed-point coordinates, and managed to fit everything in 32/16-bit div/mod and just 16-bit terms for the fractional stepping. But I wasn&#8217;t happy with it. Namely, to get rid of the iterative nature of Bresenham during rasterizing, you already needed to use a div/mod operation per edge. And the subpixel-correction needed another div/mod per edge (both could be replaced with iterative solutions as well, but those are still relatively expensive). This completely defeated the original point of using Bresenham to avoid costly divisions and multiplies. So I figured I might as well write another rasterizer, one that is closer to the one I used in my 486 renderer, using regular 16.16 fixed-point. It only needs one division per edge to set up, and the subpixel prestepping is more straightforward as well. Aside from that, it doesn&#8217;t need as many variables as a Bresenham-style rasterizer. You need 32-bit precision for 16.16, but it splits perfectly over two 16-bit registers, which means you can access the integer portion of the coordinates immediately. So in retrospect this seems to be the best choice, even on low-spec machines.</p>
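<p>As a rough sketch of that idea (not Hecker&#8217;s code, and not my own routine either), the setup below derives the starting pixel and the initial error term with one floor division, and the per-scanline step with a second one. The assumptions: 12.4 fixed-point input, edges running top to bottom, a hotspot on the integer boundary reached by rounding y up, and x rounded down. All names are invented for the example, and it uses plain 32-bit arithmetic, so it only holds for coordinates small enough not to overflow.</p>

```c
#include <stdint.h>

#define SUB_BITS 4
#define SUB_ONE  (1 << SUB_BITS)

/* Division that rounds toward negative infinity; den must be positive. */
void floor_div_mod(int32_t num, int32_t den, int32_t *quot, int32_t *rem)
{
    *quot = num / den;
    *rem  = num % den;
    if (*rem < 0) {
        (*quot)--;
        *rem += den;
    }
}

typedef struct {
    int32_t y;            /* first scanline covered by the edge */
    int32_t x;            /* integer pixel on the current scanline */
    int32_t error;        /* fractional part of x, in units of 1/denominator */
    int32_t step;         /* whole pixels per scanline */
    int32_t numerator;    /* fractional part of the slope */
    int32_t denominator;  /* 16 * dy */
} SubpixelEdge;

/* Endpoints are 12.4 fixed point; y1 > y0 and y0 >= 0 are assumed. */
void subpixel_edge_setup(SubpixelEdge *e,
                         int32_t x0, int32_t y0, int32_t x1, int32_t y1)
{
    int32_t dx = x1 - x0;
    int32_t dy = y1 - y0;
    int32_t y_first = (y0 + SUB_ONE - 1) >> SUB_BITS;   /* ceil(y0) */
    int32_t prestep = (y_first << SUB_BITS) - y0;       /* 0..15 subpixels */

    e->y = y_first;
    e->denominator = dy * SUB_ONE;
    /* x on the first scanline is (x0*dy + prestep*dx) / (16*dy) pixels. */
    floor_div_mod(x0 * dy + prestep * dx, e->denominator, &e->x, &e->error);
    /* The slope is (16*dx) / (16*dy) pixels per scanline. */
    floor_div_mod(dx * SUB_ONE, e->denominator, &e->step, &e->numerator);
}

/* Advance to the next scanline, exactly as in the integer-only version. */
void subpixel_edge_step(SubpixelEdge *e)
{
    e->y++;
    e->x += e->step;
    e->error += e->numerator;
    if (e->error >= e->denominator) {
        e->error -= e->denominator;
        e->x++;
    }
}
```

<p>Note that the inner loop is unchanged compared to the integer-only routine. The only difference is that the error term no longer starts at denominator/2, but at whatever fraction the pre-step leaves behind.</p>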
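<p>For comparison, a 16.16 edge setup with pre-step might look like the sketch below. Again this is an illustration with invented names, assuming 12.4 input coordinates, edges running top to bottom and non-negative on-screen coordinates. It leans on a 64-bit intermediate for brevity, where a real 8088 routine would use a 32/16-bit division instead.</p>

```c
#include <stdint.h>

typedef struct {
    int32_t y;     /* first scanline covered by the edge */
    int32_t x;     /* 16.16: integer pixel in the high word, fraction in the low */
    int32_t dxdy;  /* 16.16 slope: change in x per scanline */
} FixedEdge;

/* Endpoints are 12.4 fixed point; y1 > y0 and y0 >= 0 are assumed. */
void fixed_edge_setup(FixedEdge *e,
                      int32_t x0, int32_t y0, int32_t x1, int32_t y1)
{
    int32_t dx = x1 - x0;
    int32_t dy = y1 - y0;
    int32_t y_first = (y0 + 15) >> 4;      /* ceil(y0) in whole pixels */
    int32_t prestep = (y_first << 4) - y0; /* 0..15 subpixels */

    e->y = y_first;
    /* The one division per edge: slope as a 16.16 value. */
    e->dxdy = (int32_t)(((int64_t)dx * 65536) / dy);
    /* Convert x0 from 12.4 to 16.16, then move it to the first hotspot. */
    e->x = x0 * 4096 + (int32_t)(((int64_t)e->dxdy * prestep) / 16);
}

/* Advance to the next scanline: a single 32-bit add. */
void fixed_edge_step(FixedEdge *e)
{
    e->y++;
    e->x += e->dxdy;
}

/* Integer pixel of the current x: simply the high 16 bits. */
int fixed_edge_pixel(const FixedEdge *e)
{
    return (int)(e->x >> 16);
}
```

<p>There is no error term and no conditional in the step at all, and fixed_edge_pixel() corresponds to just reading the upper of the two 16-bit registers.</p>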
<p>To give you an example, let&#8217;s go back to the video I made with the transparent polygon. This one was still not subpixel-corrected, but it already rotates at the low speed I wanted to test the subpixel correction at:<br />
<span style="text-align:center; display: block;"><a href="http://scalibq.wordpress.com/2011/12/28/just-keeping-it-real-part-5/"><img src="http://img.youtube.com/vi/1GL8XbhViik/2.jpg" alt="" /></a></span></p>
<p>And this is what it looks like with subpixel correction:<br />
<span style="text-align:center; display: block;"><a href="http://scalibq.wordpress.com/2011/12/28/just-keeping-it-real-part-5/"><img src="http://img.youtube.com/vi/Rgjb-AGJcU4/2.jpg" alt="" /></a></span></p>
<p>As you can see, you get that smoothly &#8216;flowing&#8217; effect of the edges, making it look far less choppy than the all-integer variation.</p>
<p>Another example is from an <a href="http://bohemiq.scali.eu.org/subpixel.zip">application that Mikael Kalms made</a> long ago (I believe it was to accompany an article on polygon rasterizing, but I cannot find it). This little app allows you to pick the number of subpixel bits to use, and also demonstrates the effect on lighting and texturing. Even 1 bit already makes a difference, and about 3-4 bits give excellent results:<br />
<span style="text-align:center; display: block;"><a href="http://scalibq.wordpress.com/2011/12/28/just-keeping-it-real-part-5/"><img src="http://img.youtube.com/vi/pQMCDpomTK8/2.jpg" alt="" /></a></span></p>
<p>If you played some 3d games or watched 3d demos back in the early 90s, you&#8217;ll probably recognize that &#8216;choppy&#8217; look. Subpixel correction wasn&#8217;t commonplace until the mid-to-late 90s (although various demos, including Crystal Dream, would just spin their objects fast enough that it was not very obvious). At this time the 3d accelerators came into swing as well, which may have had something to do with that. Even early 3dfx Voodoo accelerators already had subpixel-correction (with 3-bit accuracy I believe). Most accelerators focused on accurate, high quality rendering. This also included perspective-correct texturing and texture filtering. The Sony PlayStation was the exception to the rule, with rather unstable rasterizing and distorted, unfiltered textures. Occasionally you will also find early accelerated software, where the coders had apparently not designed their geometry pipeline for subpixel-correct rasterization. As a result, they only pass integer data to the accelerator, making it look as choppy as a non-subpixel corrected renderer. Well-known offenders are Gods, with <a href="http://www.pouet.net/prod.php?which=710">Toys</a> and <a href="http://www.pouet.net/prod.php?which=2852">Incoming Future</a>.</p>
<h3>Windows NT does what?</h3>
<p>While we&#8217;re on the subject of subpixel-correct rendering&#8230; Chris Hecker mentions in his articles that Windows NT is capable of rendering subpixel-correct lines and polygons as well. He does not go into detail however&#8230; But I was curious. So I looked into the API reference to see how that would work. My guess was that the <a href="http://msdn.microsoft.com/en-us/library/dd145104(v=VS.85).aspx">SetWorldTransform()</a> function would be the answer, as the drawing functions only took integer coordinates. If you would specify a scaling matrix, then all your integer coordinates would be scaled down, and apparently the subpixel-information would be used for rendering. So I made a small test application to try out that theory. By scaling everything down to 1/16, I would effectively have the 4-bit subpixel accuracy that Chris Hecker mentioned. As it turns out, you also need to use <a href="http://msdn.microsoft.com/en-us/library/dd162977(v=VS.85).aspx">SetGraphicsMode()</a> to set the GM_ADVANCED mode, or else the world transform will not be enabled.</p>
<p>Lo and behold: indeed, the GDI <a href="http://msdn.microsoft.com/en-us/library/dd145029(v=VS.85).aspx">LineTo()</a> and <a href="http://msdn.microsoft.com/en-us/library/dd162814(v=VS.85).aspx">Polygon()</a> functions now gave me subpixel-correct results! And apparently this has been part of Windows for a long time, seeing as Hecker&#8217;s article is from 1995. Doing some digging through old MSDN resources showed that apparently it has been supported since NT 3.1 (the current MSDN just reports &#8216;Windows 2000&#8217; for any old functionality, because Microsoft no longer supports the older versions. Kind of a shame, in a way they have rewritten history by doing that). It does not work on any Win9x-derivative however. Although there is an alternative way to use scaled coordinates (by using <a href="http://msdn.microsoft.com/en-us/library/dd145100(v=VS.85).aspx">SetWindowExtEx()</a> and <a href="http://msdn.microsoft.com/en-us/library/dd145098(v=VS.85).aspx">SetViewportExtEx()</a>), which does work on Win9x as well, this does not result in subpixel-correct rendering. So it seems that only the NT-derivatives support it (which must be why Chris Hecker specifically mentions NT).</p>
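<p>The gist of such a test can be sketched as follows. This is not my original test application: the helper names are invented, and the drawing part is wrapped in an <code>#ifdef</code> so that the file still compiles on other platforms. The GDI calls themselves (SetGraphicsMode(), SetWorldTransform(), MoveToEx() and LineTo()) are the ones discussed above.</p>

```c
#define SUBPIXEL_SCALE 16   /* 4 bits of subpixel accuracy */

/* Convert a fractional pixel coordinate to the integer units passed to GDI. */
int to_subpixel_units(double pixels)
{
    return (int)(pixels * SUBPIXEL_SCALE + (pixels >= 0.0 ? 0.5 : -0.5));
}

#ifdef _WIN32
#include <windows.h>

/* Draw a line whose endpoints are given in 1/16 pixel units. */
BOOL draw_subpixel_line(HDC hdc, int x0, int y0, int x1, int y1)
{
    XFORM scale;

    /* Scaling matrix that maps 16 logical units onto 1 pixel. */
    scale.eM11 = 1.0f / SUBPIXEL_SCALE;
    scale.eM12 = 0.0f;
    scale.eM21 = 0.0f;
    scale.eM22 = 1.0f / SUBPIXEL_SCALE;
    scale.eDx  = 0.0f;
    scale.eDy  = 0.0f;

    /* The world transform is only honoured in GM_ADVANCED mode. */
    if (!SetGraphicsMode(hdc, GM_ADVANCED))
        return FALSE;
    if (!SetWorldTransform(hdc, &scale))
        return FALSE;
    if (!MoveToEx(hdc, x0, y0, NULL))
        return FALSE;
    return LineTo(hdc, x1, y1);
}
#endif
```

<p>A call such as draw_subpixel_line(hdc, to_subpixel_units(1.25), to_subpixel_units(2.5), to_subpixel_units(40.75), to_subpixel_units(9.0)) would then pass endpoints with fractional pixel positions, even though GDI itself only ever sees integers.</p>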
<p>Which makes me wonder: Did some early Windows accelerator cards already have subpixel-correct rendering? Or even better: was this a requirement? In which case, a flatshaded subpixel-corrected polygon 3d engine could probably be accelerated even before the Voodoo cards arrived. But I have never seen it done. Perhaps because it&#8217;s just too obscure a feature? Or is it because it was not widely supported, or just not fast enough? Oh well. It&#8217;s an interesting feature to know about.</p>
<h3>But what of the Amiga?</h3>
<p>The polygon routine on the Amiga was also an integer-only one. But that one used the built-in hardware to draw the polygons. Can we make it subpixel-correct as well?</p>
<p>The easy way would be to use the CPU to rasterize the edges, and then use the blitter to fill it. So that was my first try. I just used the same rasterizer as the PC version, but instead of filling entire scanlines, I just plotted the endpoints. Then the blitter would fill as usual. So it is still halfway hardware-accelerated, but it does take quite a bit more CPU time. Something you do not have an awful lot of to begin with, on a classic Amiga.<br />
Anyway, that first attempt resulted in this:<br />
<span style="text-align:center; display: block;"><a href="http://scalibq.wordpress.com/2011/12/28/just-keeping-it-real-part-5/"><img src="http://img.youtube.com/vi/C9Dowbi75OE/2.jpg" alt="" /></a></span></p>
<p>But can we do any better? How does that hardware linedrawer really work? It is some kind of Bresenham algorithm, and as we&#8217;ve seen above, there are ways to subpixel-correct them. From the Amiga Hardware Reference Manual, we get these semi-obscure formulas to initialize a blitter line:</p>
<blockquote><p>bltapt  = (APTR) (4*dy-2*dx);<br />
bltamod = (UWORD)(4*(dy-dx));<br />
bltbmod = (UWORD)(4*dy);</p></blockquote>
<p>(Note however that dx and dy are not necessarily dx and dy here. You first determine the octant for your line, and after that, you sort dx and dy so that dy is always the smallest delta and dx is always the largest).</p>
<p>The System Programmer&#8217;s Guide gives slightly different formulas:</p>
<blockquote><p>bltapt  = (APTR) (2*Sdelta-Ldelta);<br />
bltamod = (UWORD)(2*Sdelta-2*Ldelta);<br />
bltbmod = (UWORD)(2*Sdelta);</p></blockquote>
<p>They replaced dx and dy with Sdelta and Ldelta (<strong>S</strong>hort and <strong>L</strong>ong respectively). Other than that, apparently they divided all terms by a factor of 2, and the resulting line is still equivalent. Now, I was wondering&#8230; why are there three values? For a regular integer Bresenham routine, you would only need a numerator and denominator value, as mentioned before. The error term can be initialized with denominator/2, so it would not have to be specified explicitly. So by the looks of things, these three terms are some form of numerator, denominator and initial error term.</p>
<p>2*Sdelta is the smallest value, so that one is likely to be the numerator. 2*Sdelta-2*Ldelta would be a negative value, since Ldelta is larger than Sdelta by definition. If we disregard the 2*Sdelta terms for a moment, then -2*Ldelta appears to be a denominator, and -Ldelta is denominator/2. For some reason these values are negative (possibly because it counts from -denominator towards 0, since checking for error &lt;= 0 may be easier to implement than checking for error &gt;= denominator in hardware). The 2*Sdelta then would be an offset of 1*numerator, for some reason.</p>
<p>So I started to experiment with the values a bit, and indeed the value stored in bltapt is the initial error term. By replacing the Ldelta term with some other factor, I could control the starting point of the line. I have not quite perfected the routine yet: once again the problem of line endings rears its ugly head, and the blitter fill will leak at certain points. Nevertheless, I think it might be possible to make it work 100%. Currently it looks something like this:<br />
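<p>Written out as plain C, the three terms from the System Programmer&#8217;s Guide formulas come down to the sketch below. It only computes the numbers. The struct and function names are made up for the example, the values are kept as signed 16-bit integers for readability, and the octant selection and the rest of the blitter register setup are left out entirely.</p>

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    int16_t bltapt;   /* 2*Sdelta - Ldelta: the initial error term */
    int16_t bltamod;  /* 2*Sdelta - 2*Ldelta */
    int16_t bltbmod;  /* 2*Sdelta */
} BlitLineTerms;

/* Compute the three line terms for an integer line from (x0,y0) to (x1,y1). */
BlitLineTerms blit_line_terms(int x0, int y0, int x1, int y1)
{
    int dx = abs(x1 - x0);
    int dy = abs(y1 - y0);
    int ldelta = dx > dy ? dx : dy;   /* the Long delta */
    int sdelta = dx > dy ? dy : dx;   /* the Short delta */
    BlitLineTerms t;

    t.bltapt  = (int16_t)(2 * sdelta - ldelta);
    t.bltamod = (int16_t)(2 * sdelta - 2 * ldelta);
    t.bltbmod = (int16_t)(2 * sdelta);
    return t;
}
```

<p>For a line from (0,0) to (10,3) this gives Ldelta = 10 and Sdelta = 3, so the terms are -4, -14 and 6. The -4 is the value that a subpixel-corrected version would have to replace.</p>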
<span style="text-align:center; display: block;"><a href="http://scalibq.wordpress.com/2011/12/28/just-keeping-it-real-part-5/"><img src="http://img.youtube.com/vi/Bo666nq_sN0/2.jpg" alt="" /></a></span></p>
<p>I may get it right one day&#8230; but it is going to take some experimenting to figure out exactly what the blitter is doing when.</p>
]]></content:encoded>
			<wfw:commentRss>http://scalibq.wordpress.com/2011/12/28/just-keeping-it-real-part-5/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2547913ebf910c8aa2c632619be46e93?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">scalibq</media:title>
		</media:content>

		<media:content url="http://scalibq.files.wordpress.com/2011/12/subpixel1.png" medium="image">
			<media:title type="html">Subpixel1</media:title>
		</media:content>

		<media:content url="http://scalibq.files.wordpress.com/2011/12/subpixel2.png" medium="image">
			<media:title type="html">Subpixel2</media:title>
		</media:content>

		<media:content url="http://bohemiq.scali.eu.org/NoSubpixel.png" medium="image" />
	</item>
	</channel>
</rss>
