GNU/GCC aren't really my preference.
A system can have multiple stacks, especially with an RTOS, so making stackchk-type calls isn't always viable. You need to analyze your code and its call trees, manually or mechanically, to determine the worst-case stack depth, then allocate a stack of suitable size. You can also check this at run time by filling the stack memory with a pattern you can view from your debugger, and you can add guard zones under the stack and monitor those for containment breaches.
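The pattern-fill and guard-zone idea can be sketched roughly like this. It's only a sketch: the names (stack_paint, stack_unused_words, stack_guard_intact) and the 0xDEADBEEF fill value are my own picks for illustration, and it assumes a full-descending stack, so the lowest addresses are the last to get touched.

```c
#include <stddef.h>
#include <stdint.h>

/* Arbitrary fill pattern (an assumption): pick something unlikely in real data. */
#define STACK_FILL 0xDEADBEEFu

/* Fill a stack region with the pattern. Call this before the stack is in use. */
void stack_paint(uint32_t *base, size_t words)
{
    for (size_t i = 0; i < words; i++)
        base[i] = STACK_FILL;
}

/* Count untouched words, scanning up from the lowest address.
 * Assumes a descending stack, so the low end is used last. */
size_t stack_unused_words(const uint32_t *base, size_t words)
{
    size_t n = 0;
    while (n < words && base[n] == STACK_FILL)
        n++;
    return n;
}

/* Returns nonzero while every word of the guard zone still holds the pattern. */
int stack_guard_intact(const uint32_t *guard, size_t words)
{
    return stack_unused_words(guard, words) == words;
}
```

Paint each stack and its guard zone early in startup, then call stack_unused_words() from a debug hook or idle task to get the high-water mark, and check stack_guard_intact() periodically. A word on the stack that happens to equal the pattern will make the estimate slightly optimistic, so leave some margin.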
Even when you define specific stack sizes, like in Keil, you can still have the stack crash into the heap or the initialized statics if it is too small. I've certainly seen plenty of examples where the stack is insufficient to sustain hogs like scanf/printf.
About halfway down this page is an exclusion example:
http://www.johanforrer.net/BLACKFIN/index.html
I think WinARM has some more complex scripts, the AVR examples have some, and the web is full of assorted stuff. O'Reilly probably has a tome on linker scripts; the topic is complex. Finding some on-point examples is probably the best way to start, but looking at a broad selection will help give you a sense of the scope of the topic.
Remember that AT91BootStrap is in SRAM, so most of the code/image you want to load needs to go into the SDRAM first. You might want to look at what the C startup code is doing (i.e. before it gets to main): clearing the heap, initializing the statics, etc.
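As a rough picture of what that startup code does before main, here is a minimal sketch. The function names are hypothetical, and real startup code is usually assembler or toolchain-supplied; on a target the pointers would come from symbols defined in your linker script rather than being passed in like this.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy the initialized statics from their load image to their run address. */
void init_data(uint32_t *dst, const uint32_t *src, size_t words)
{
    for (size_t i = 0; i < words; i++)
        dst[i] = src[i];
}

/* Zero a region: the uninitialized statics, and the heap if the runtime clears it. */
void zero_region(uint32_t *start, uint32_t *end)
{
    while (start < end)
        *start++ = 0;
}
```

The point to notice is that both steps write into the final run addresses, so when the image is meant to run from SDRAM, the SDRAM has to be configured and working before any of this executes.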
Again, consider how booting is staged: AT91BootStrap goes into SRAM, U-Boot gets planted at the end of SDRAM, and it loads the Linux kernel into the front part of the SDRAM. If you are using AT91BootStrap to load a generic application out of NAND into SDRAM, the front would be the place to start. The application would then need to shadow the (interrupt) vectors into the first SRAM, and make sure that SRAM is remapped at zero.
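The vector shadowing part could look something like the sketch below. shadow_vectors is a hypothetical helper, not something out of AT91BootStrap. The word count is an assumption too: classic ARM has 8 vector slots, and if they use LDR PC,[PC,#offset] the address table that follows has to be copied along with them, which typically makes 16 words. The remap itself is a chip-specific register write, so it is left out here; check the datasheet for your part.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy the exception vector block into SRAM word by word.
 * volatile keeps the compiler from optimizing the stores away. */
void shadow_vectors(volatile uint32_t *sram, const uint32_t *image, size_t words)
{
    for (size_t i = 0; i < words; i++)
        sram[i] = image[i];
}
```

Do the copy with interrupts disabled, and only perform the remap after the copy is complete, otherwise an exception could land on a half-written table.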