Kernel trouble


halofreak1990
Posts:92
Joined:Thu May 27, 2010 8:54 pm
Location:Netherlands
Kernel trouble

Post by halofreak1990 » Wed Feb 09, 2011 8:50 pm

Lately, I've been having trouble with triple-faulting when the bootloader jumps to my kernel.
Since real hardware doesn't give any info on its registers, I installed Bochs and ran my kernel again.
These are the results:

Code:

Booting from 0000:7c00
interrupt(): gate descriptor is not valid sys seg (vector=0x08)

interrupt(): gate descriptor is not valid sys seg (vector=0x0d)

interrupt(): gate descriptor is not valid sys seg (vector=0x08)

CPU is in protected mode (active)
CS.d_b = 32 bit
SS.d_b = 32 bit
EFER   = 0x00000000
| RAX=0000000000000000  RBX=0000000000000000
| RCX=0000000000000000  RDX=00000000000003d5
| RSP=000000000009fbdb  RBP=00000000c0000400
| RSI=00000000c00000c2  RDI=0000000000000bb1
|  R8=0000000000000000   R9=0000000000000000
| R10=0000000000000000  R11=0000000000000000
| R12=0000000000000000  R13=0000000000000000
| R14=0000000000000000  R15=0000000000000000
| IOPL=0 id vip vif ac vm RF nt of df IF tf sf zf AF PF cf
| SEG selector     base    limit G D
| SEG sltr(index|ti|rpl)     base    limit G D
|  CS:0008( 0001| 0|  0) 00000000 ffffffff 1 1
|  DS:0010( 0002| 0|  0) 00000000 ffffffff 1 1
|  SS:0010( 0002| 0|  0) 00000000 ffffffff 1 1
|  ES:0010( 0002| 0|  0) 00000000 ffffffff 1 1
|  FS:0010( 0002| 0|  0) 00000000 ffffffff 1 1
|  GS:0010( 0002| 0|  0) 00000000 ffffffff 1 1
|  MSR_FS_BASE:0000000000000000
|  MSR_GS_BASE:0000000000000000
| RIP=00000000c0000449 (00000000c0000449)
| CR0=0xe0000011 CR2=0x0000000000000000
| CR3=0x0009c000 CR4=0x00000000
0008:00000000c0000449 (unk. ctxt): push 0x00000000          ; 6a00
exception(): 3rd (13) exception with no resolution, shutdown status is 00h, resetting
bx_pc_system_c::Reset(HARDWARE) called
cpu hardware reset
Seems like there are three interrupts firing that aren't handled properly?
Or is there some other cause I might have missed?

accelleon
Posts:15
Joined:Mon Dec 07, 2009 3:41 am

Re: Kernel trouble

Post by accelleon » Mon Feb 21, 2011 3:06 am

From http://wiki.osdev.org/Bochs:
interrupt(): gate descriptor is not valid sys seg
You have not loaded an IDT, or the IDT is corrupt
OK, since protected mode is active, interrupt 8 is a double fault and interrupt 0x0D (13) is a general protection fault (GPF).
So your kernel is double faulting, then raising a GPF, then double faulting again :/

So from here you need to debug your IDT and find out why it's double faulting in the first place.
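To make the vector numbers in the Bochs log easier to read, here is a minimal sketch of a lookup for the three vectors that appear in this thread. The function name `exception_name` is made up for illustration and is not part of Bochs or of the kernel being discussed.

```cpp
#include <cstdint>

// Maps the exception vectors seen in this thread's Bochs logs to their
// architectural names. Only these three are listed; anything else
// falls through to "other".
inline const char* exception_name(std::uint8_t vector)
{
    switch (vector) {
    case 0x08: return "#DF double fault";
    case 0x0D: return "#GP general protection fault";
    case 0x0E: return "#PF page fault";
    default:   return "other";
    }
}
```

Read this way, the first log shows the CPU failing to deliver a double fault, then a GPF, then a double fault again, each time because the IDT gate for that vector was not valid.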

halofreak1990
Posts:92
Joined:Thu May 27, 2010 8:54 pm
Location:Netherlands

Re: Kernel trouble

Post by halofreak1990 » Tue Feb 22, 2011 8:49 pm

accelleon wrote:...so from here you need to debug your IDT and find out why it's double faulting in the first place.
Now there's a problem: the IDT isn't set up until the kernel registers its interrupt handlers, which it can't get to because the errors take the system down first. As it crashes when the 2nd stage jumps to the kernel, I assume the 2nd stage bootloader is the problem here.

<EDIT>Code snippet removed due to it being unnecessary</EDIT>
Last edited by halofreak1990 on Wed Feb 23, 2011 9:29 pm, edited 1 time in total.

Mike
Site Admin
Posts:465
Joined:Sat Oct 20, 2007 7:58 pm

Re: Kernel trouble

Post by Mike » Wed Feb 23, 2011 7:38 pm

Hello,

Because your post mentioned that these issues just started occurring lately, I assume that your bootloader has worked fine in the past? Were there any modifications to the bootloader software? Please do note that the bootloader provided by the series was not designed for kernels that exceed 64K. These systems might not load properly.

Please verify that it's getting into the kernel -- ie, by placing a CLI+HLT at the entry point, or, if you have multiprocessing support, a CLI and an infinite loop. If it is, and this (the first instruction of the entry point) is the instruction that causes the crash, you will need to verify that the location of your stack and the in-memory kernel image are intact. If it isn't, please provide more information on where the faulting instruction is, with source code.

It appears to me from your log that it gets into the kernel fine, but crashes soon after (possibly on entry.) Need the above information to be sure of this.
Lead Programmer for BrokenThorn Entertainment, Co.
Website: http://www.brokenthorn.com
Email: webmaster@brokenthorn.com

halofreak1990
Posts:92
Joined:Thu May 27, 2010 8:54 pm
Location:Netherlands

Re: Kernel trouble

Post by halofreak1990 » Wed Feb 23, 2011 9:26 pm

Mike wrote:Because your post mentioned that these issues just started occurring lately, I assume that your bootloader has worked fine in the past?
Yes, it did.
Mike wrote:Were there any modifications to the bootloader software? Please do note that the bootloader provided by the series was not designed for kernels that exceed 64K. These systems might not load properly.
I added a line that displays "Searching for Operating System..." before calling the Load function. I also changed the stack to start at 9FBFF, because the previous 9FFF would have put the stack inside the kernel. With the temporary GDT in place, I should be able to access that address. That can't be the problem, though, because it crashed even with the stack set at 9FFF.
Mike wrote:Please verify that it's getting into the kernel -- ie, by placing a CLI+HLT at the entry point, or, if you have multiprocessing support, a CLI and an infinite loop. If it is, and this (the first instruction of the entry point) is the instruction that causes the crash, you will need to verify that the location of your stack and the in-memory kernel image are intact. If it isn't, please provide more information on where the faulting instruction is, with source code.

It appears to me from your log that it gets into the kernel fine, but crashes soon after (possibly on entry.) Need the above information to be sure of this.
You were right, a cli/hlt combo at entry into the kernel works, which means the code after it is either faulty, or there's another problem.
By the way, should the screen go black as it hits the cli/hlt?

halofreak1990
Posts:92
Joined:Thu May 27, 2010 8:54 pm
Location:Netherlands

Re: Kernel trouble

Post by halofreak1990 » Thu Feb 24, 2011 8:33 pm

<EDIT>
I've moved the HAL init to the very first thing after the kernel entry sets up the segment registers.
Now, instead of crashing due to exceptions being thrown that aren't handled because the IDT and Exception handlers weren't in place,
Bochs just hangs. It clears my screen to blue, like it should in case of an exception, but doesn't display any text.
The Bochs debugger gives the following line:

Code:

check_cs(0x098e): not a valid code segment !
Which, I guess, means I have a faulty code segment somewhere. The problem is, this error occurs when the kernel is in the DebugUpdateCur() function, which looks like this:

Code:

void DebugUpdateCur(int x, int y)
{
    // get location
    uint16_t cursorLocation = y * 80 + x;

    // send location to vga controller to set cursor
    disable();
    outportb(0x3D4, 14);
    outportb(0x3D5, cursorLocation >> 8); // Send the high byte.
    outportb(0x3D4, 15);
    outportb(0x3D5, cursorLocation);      // Send the low byte.
    enable();
}
Since the kernel_panic function uses this function to set the cursor before printing the Exception details, it simply does nothing.</EDIT>
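As a rough aid for reading the `check_cs(0x098e)` message, the sketch below splits a 16-bit selector into its index, table indicator and RPL fields. The `Selector` struct and `decode_selector` function are hypothetical helpers written for this illustration, not code from the kernel in question.

```cpp
#include <cstdint>

// Fields of a 16-bit x86 segment selector.
struct Selector {
    std::uint16_t index; // bits 15..3: descriptor index within the table
    bool          ldt;   // bit 2 (TI): false = GDT, true = LDT
    std::uint8_t  rpl;   // bits 1..0: requested privilege level
};

// Splits a raw selector value into its three fields.
inline Selector decode_selector(std::uint16_t raw)
{
    return Selector{
        static_cast<std::uint16_t>(raw >> 3),
        (raw & 0x4) != 0,
        static_cast<std::uint8_t>(raw & 0x3)
    };
}
```

Decoded this way, 0x0008 is GDT entry 1 with RPL 0, while 0x098e would be entry 0x131 of an LDT with RPL 2. Since the register dumps only show GDT entries 1 and 2 in use, the value looks more like stray data than a selector the kernel ever set up.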

Andyhhp
Moderator
Posts:387
Joined:Tue Oct 23, 2007 10:05 am
Location:127.0.0.1

Re: Kernel trouble

Post by Andyhhp » Mon Feb 28, 2011 7:23 pm

If you are in an exception handler, then that means that you are not cleaning the stack up correctly and attempting to iretd junk off the stack.
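One common way to end up executing iretd on junk is to leave the CPU-pushed error code on the stack. The sketch below lists the classic IA-32 vectors that push one; the function name `pushes_error_code` is invented for this example, and newer CPUs define a few additional vectors that are not covered here.

```cpp
#include <cstdint>

// True for the classic IA-32 exception vectors where the CPU pushes an
// error code that the handler must remove before executing iretd.
inline bool pushes_error_code(std::uint8_t vector)
{
    switch (vector) {
    case 8:  // #DF double fault
    case 10: // #TS invalid TSS
    case 11: // #NP segment not present
    case 12: // #SS stack-segment fault
    case 13: // #GP general protection fault
    case 14: // #PF page fault
    case 17: // #AC alignment check
        return true;
    default:
        return false;
    }
}
```

If a handler for one of these vectors returns without popping the error code, iretd takes the error code as EIP and the saved EIP as CS, which can produce an invalid code segment error.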

I think your solution will be to fix whichever interrupt is being serviced when the kernel_panic is issued.

~Andrew

halofreak1990
Posts:92
Joined:Thu May 27, 2010 8:54 pm
Location:Netherlands

Re: Kernel trouble

Post by halofreak1990 » Wed Mar 02, 2011 6:42 pm

Hmmm, I replaced a couple of

Code:

for(;;);
instances with

Code:

__asm
{
    cli
    hlt
}
blocks in the exception handlers, and recompiled my kernel, and now I get the following two errors when my kernel executes (one error less than before)

Code:

interrupt(): gate descriptor is not valid sys seg (vector=0x0e)
interrupt(): gate descriptor is not valid sys seg (vector=0x08)
I don't get what's wrong here.
Bochs says my segments are set up as follows

Code:

SEG sltr(index|ti|rpl)     base    limit G D
 CS:0008( 0001| 0|  0) 00000000 ffffffff 1 1
 DS:0010( 0002| 0|  0) 00000000 ffffffff 1 1
 SS:0010( 0002| 0|  0) 00000000 ffffffff 1 1
 ES:0010( 0002| 0|  0) 00000000 ffffffff 1 1
 FS:0010( 0002| 0|  0) 00000000 ffffffff 1 1
 GS:0010( 0002| 0|  0) 00000000 ffffffff 1 1
and finally, at the bottom of the CPU registers dump, there's this intriguing line

Code:

(0).[63410608] ??? (physical address not available)

Andyhhp
Moderator
Posts:387
Joined:Tue Oct 23, 2007 10:05 am
Location:127.0.0.1

Re: Kernel trouble

Post by Andyhhp » Wed Mar 02, 2011 7:20 pm

That means that you have your IDT correctly.

You have an invalid segment set as the target code segment. It should be 0x08 which is your ring0 code segment.

~Andrew

halofreak1990
Posts:92
Joined:Thu May 27, 2010 8:54 pm
Location:Netherlands

Re: Kernel trouble

Post by halofreak1990 » Wed Mar 02, 2011 10:22 pm

Andyhhp wrote:That means that you have your IDT correctly.
Good to hear I'm at least doing something right.
Andyhhp wrote: You have an invalid segment set as the target code segment. It should be 0x08 which is your ring0 code segment.
No kidding. I just looked up the instruction pointer: it's at 0xE001FEA6, which is about 3.5 GB.
What register is the target code segment? CS, right? But that's already set to 0x08.

Just in case, here's the full CPU register dump from Bochs

Code:

00020405696e[CPU0 ] interrupt(): gate descriptor is not valid sys seg (vector=0x0e)
00020405696e[CPU0 ] interrupt(): gate descriptor is not valid sys seg (vector=0x08)
00020405696i[CPU0 ] CPU is in protected mode (active)
00020405696i[CPU0 ] CS.d_b = 32 bit
00020405696i[CPU0 ] SS.d_b = 32 bit
00020405696i[CPU0 ] EFER   = 0x00000000
00020405696i[CPU0 ] | RAX=000000002badb002  RBX=0000000000000000
00020405696i[CPU0 ] | RCX=0000000000000010  RDX=0000000000000044
00020405696i[CPU0 ] | RSP=000000000009facb  RBP=00000000e001fea6
00020405696i[CPU0 ] | RSI=0000000000000000  RDI=0000000000000b89
00020405696i[CPU0 ] |  R8=0000000000000000   R9=0000000000000000
00020405696i[CPU0 ] | R10=0000000000000000  R11=0000000000000000
00020405696i[CPU0 ] | R12=0000000000000000  R13=0000000000000000
00020405696i[CPU0 ] | R14=0000000000000000  R15=0000000000000000
00020405696i[CPU0 ] | IOPL=0 id vip vif ac vm RF nt of df if tf SF zf af PF CF
00020405696i[CPU0 ] | SEG selector     base    limit G D
00020405696i[CPU0 ] | SEG sltr(index|ti|rpl)     base    limit G D
00020405696i[CPU0 ] |  CS:0008( 0001| 0|  0) 00000000 ffffffff 1 1
00020405696i[CPU0 ] |  DS:0010( 0002| 0|  0) 00000000 ffffffff 1 1
00020405696i[CPU0 ] |  SS:0010( 0002| 0|  0) 00000000 ffffffff 1 1
00020405696i[CPU0 ] |  ES:0010( 0002| 0|  0) 00000000 ffffffff 1 1
00020405696i[CPU0 ] |  FS:0010( 0002| 0|  0) 00000000 ffffffff 1 1
00020405696i[CPU0 ] |  GS:0010( 0002| 0|  0) 00000000 ffffffff 1 1
00020405696i[CPU0 ] |  MSR_FS_BASE:0000000000000000
00020405696i[CPU0 ] |  MSR_GS_BASE:0000000000000000
00020405696i[CPU0 ] | RIP=00000000e001fea6 (00000000e001fea6)
00020405696i[CPU0 ] | CR0=0xe0000011 CR2=0x00000000e001fea6
00020405696i[CPU0 ] | CR3=0x0009c000 CR4=0x00000000
(0).[20405696] ??? (physical address not available)
00020405696e[CPU0 ] exception(): 3rd (13) exception with no resolution, shutdown status is 00h, resetting
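Two values in this dump are worth decoding: CR0=0xe0000011 and the fact that CR2 equals RIP. The sketch below uses the architectural CR0 bit positions; the helper names `paging_enabled` and `fault_on_fetch` are made up for illustration.

```cpp
#include <cstdint>

// Architectural CR0 bit positions.
constexpr std::uint32_t CR0_PE = 1u << 0;  // protected mode enable
constexpr std::uint32_t CR0_ET = 1u << 4;  // extension type
constexpr std::uint32_t CR0_NW = 1u << 29; // not write-through
constexpr std::uint32_t CR0_CD = 1u << 30; // cache disable
constexpr std::uint32_t CR0_PG = 1u << 31; // paging enable

// True when paging is switched on in the given CR0 value.
constexpr bool paging_enabled(std::uint32_t cr0)
{
    return (cr0 & CR0_PG) != 0;
}

// True when the page-fault address in CR2 is the instruction pointer
// itself, meaning the fault happened while fetching an instruction.
constexpr bool fault_on_fetch(std::uint32_t cr2, std::uint32_t eip)
{
    return cr2 == eip;
}
```

With the values from the dump, paging is enabled and the page fault address is the instruction pointer itself. That suggests execution had already jumped to an unmapped address before the fault, so the interesting question is what transferred control to 0xe001fea6.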

Andyhhp
Moderator
Posts:387
Joined:Tue Oct 23, 2007 10:05 am
Location:127.0.0.1

Re: Kernel trouble

Post by Andyhhp » Thu Mar 03, 2011 1:12 am

Very sorry - I meant to say that your IDT was set up incorrectly.

The physical entry in the IDT contains (amongst other flags) a CS:EIP pair so that whenever/wherever an interrupt occurs, the processor knows where to jump to service the interrupt.
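To show where that CS:EIP pair lives, here is a small sketch that decodes the 8 raw bytes of a 32-bit IDT gate. The `Gate` struct and `decode_gate` function are hypothetical names for this example; running something like it over the IDT memory in a debugger dump is one way to see whether the selector field really holds 0x08.

```cpp
#include <cstdint>

// Decoded view of one 8-byte 32-bit IDT gate.
struct Gate {
    std::uint32_t offset;   // handler address (EIP)
    std::uint16_t selector; // code segment selector loaded into CS
    std::uint8_t  flags;    // present bit, DPL and gate type
};

// Reassembles the fields from the gate's in-memory byte layout:
// bytes 0-1 offset low, 2-3 selector, 4 reserved, 5 flags, 6-7 offset high.
inline Gate decode_gate(const std::uint8_t raw[8])
{
    Gate g;
    g.offset = static_cast<std::uint32_t>(raw[0])
             | static_cast<std::uint32_t>(raw[1]) << 8
             | static_cast<std::uint32_t>(raw[6]) << 16
             | static_cast<std::uint32_t>(raw[7]) << 24;
    g.selector = static_cast<std::uint16_t>(raw[2] | (raw[3] << 8));
    g.flags = raw[5];
    return g;
}
```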

The vast majority of the times you receive an interrupt, you will be executing Ring3 code. As all interrupts can only usefully be serviced in Ring0, you can't assume that the CS of the currently running code is the same as the CS with which you want to service the interrupt. In fact they will almost always be different, as a Ring3 selector necessarily has the bottom two bits set and a Ring0 selector has the bottom two bits clear.

As for why the IDT entry is corrupt, the most likely answer is that you are trying to be clever with a class/struct to make the IDT entry, and are forgetting to tell it to be a packed structure.
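A cheap way to catch a missing pack directive is to assert the structure sizes at compile time. The sketch below uses illustrative names (`IdtEntry`, `IdtPointer`) rather than the ones from the actual kernel, and assumes a compiler that accepts `#pragma pack`, which MSVC, GCC and Clang all do.

```cpp
#include <cstddef>
#include <cstdint>

#pragma pack(push, 1)
// One 32-bit IDT gate as the CPU expects it in memory.
struct IdtEntry {
    std::uint16_t offset_lo; // handler address bits 0-15
    std::uint16_t selector;  // code segment selector
    std::uint8_t  zero;      // reserved, must be 0
    std::uint8_t  flags;     // present bit, DPL, gate type
    std::uint16_t offset_hi; // handler address bits 16-31
};

// Operand of the lidt instruction: 16-bit limit followed by 32-bit base.
struct IdtPointer {
    std::uint16_t limit;
    std::uint32_t base;
};
#pragma pack(pop)

// If packing were lost, these would fail at compile time instead of
// producing a corrupt table at run time.
static_assert(sizeof(IdtEntry) == 8, "IDT gate must be 8 bytes");
static_assert(sizeof(IdtPointer) == 6, "IDTR operand must be 6 bytes");
static_assert(offsetof(IdtPointer, base) == 2, "base must follow limit directly");
```

The gate structure happens to have no padding even without packing, but the pointer structure typically grows to 8 bytes with `base` at offset 4, which would make lidt load a wrong base address.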

Can you post your code which declares the IDT, and the code which fills it with the relevant service routines?

~Andrew

halofreak1990
Posts:92
Joined:Thu May 27, 2010 8:54 pm
Location:Netherlands

Re: Kernel trouble

Post by halofreak1990 » Sat Mar 05, 2011 1:56 pm

Andyhhp wrote:Can you post your code which declares the IDT, and the code which fills it with the relevant service routines?
Sure, here's the IDT descriptor declaration:

Code:

#ifdef _MSC_VER
#pragma pack (push, 1)
#endif

//! interrupt descriptor
struct idt_descriptor
{
	//! bits 0-15 of interrupt routine (ir) address
	uint16_t		baseLo;

	//! code selector in gdt
	uint16_t		sel;

	//! reserved, should be 0
	uint8_t			reserved;

	//! bit flags. Set with flags above
	uint8_t			flags;

	//! bits 16-31 of ir address
	uint16_t		baseHi;
};

#ifdef _MSC_VER
#pragma pack (pop)
#endif
And the IDTR struct:

Code:

#ifdef _MSC_VER
#pragma pack (push, 1)
#endif

//! describes the structure for the processors idtr register
struct idtr
{
	//! size of the interrupt descriptor table (idt)
	uint16_t		limit;

	//! base address of idt
	uint32_t		base;
};

#ifdef _MSC_VER
#pragma pack (pop)
#endif
Followed by:

Code:

//! interrupt descriptor table
static struct idt_descriptor	_idt [I86_MAX_INTERRUPTS];

//! idtr structure used to help define the cpu's idtr register
static struct idtr				_idtr;
and

Code:

//! installs idtr into processors idtr register
static void idt_install()
{
#ifdef _MSC_VER
	_asm lidt [_idtr]
#endif
}
This routine sets up an Interrupt Handler:

Code:

//! installs a new interrupt handler
int x86_install_ir(uint32_t i, uint16_t flags, uint16_t sel, I86_IRQ_HANDLER irq)
{
	//! valid indices are 0..I86_MAX_INTERRUPTS-1
	if (i >= I86_MAX_INTERRUPTS)
		return 0;

	if (!irq)
		return 0;

	//! get base address of interrupt handler
	uint64_t		uiBase = (uint64_t)&(*irq);

	//! store base address into idt
	_idt[i].baseLo		=	uint16_t(uiBase & 0xffff);
	_idt[i].baseHi		=	uint16_t((uiBase >> 16) & 0xffff);
	_idt[i].reserved	=	0;
	_idt[i].flags		=	uint8_t(flags);
	_idt[i].sel			=	sel;

	return	0;
}
and this one sets up the IDT:

Code:

//! initialize idt
int x86_idt_initialize(uint16_t codeSel)
{
	//! set up idtr for processor
	_idtr.limit = sizeof(struct idt_descriptor) * I86_MAX_INTERRUPTS -1;
	_idtr.base	= (uint32_t)&_idt[0];

	//! null out the whole idt (every byte of every entry)
	memset((void*)&_idt[0], 0, sizeof(idt_descriptor) * I86_MAX_INTERRUPTS);

	//! register default handlers
	for (int i=0; i<I86_MAX_INTERRUPTS; i++)
		x86_install_ir(i, I86_IDT_DESC_PRESENT | I86_IDT_DESC_BIT32,
			codeSel, (I86_IRQ_HANDLER)x86_default_handler);

	//! install our idt
	idt_install();

	return 0;
}
And in case you'd like to know the constants:

Code:

//! x86 defines 256 possible interrupt handlers (0-255)
#define I86_MAX_INTERRUPTS		256

//! must be in the format 0D110, where D is descriptor type
#define I86_IDT_DESC_BIT16		0x06	//00000110
#define I86_IDT_DESC_BIT32		0x0E	//00001110
#define I86_IDT_DESC_RING1		0x20	//00100000
#define I86_IDT_DESC_RING2		0x40	//01000000
#define I86_IDT_DESC_RING3		0x60	//01100000
#define I86_IDT_DESC_PRESENT	0x80	//10000000

//! interrupt handler w/o error code
//! Note: interrupt handlers are called by the processor. The stack setup may change
//! so we leave it up to the interrupt's implementation to handle it and properly return
typedef void (_cdecl *I86_IRQ_HANDLER)(void);
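For reference, the flags byte of a gate is laid out as present (bit 7), DPL (bits 6-5), a zero bit (bit 4) and the gate type (bits 3-0). The sketch below composes it from a privilege level; `GATE_PRESENT`, `GATE_INT32` and `gate_flags` are names invented for this example and do not appear in the posted code.

```cpp
#include <cstdint>

constexpr std::uint8_t GATE_PRESENT = 0x80; // bit 7: descriptor is present
constexpr std::uint8_t GATE_INT32   = 0x0E; // bits 3-0: 32-bit interrupt gate

// Builds the flags byte for a present 32-bit interrupt gate with the
// given descriptor privilege level (0-3), which occupies bits 6-5.
constexpr std::uint8_t gate_flags(unsigned dpl)
{
    return static_cast<std::uint8_t>(GATE_PRESENT | ((dpl & 0x3u) << 5) | GATE_INT32);
}

static_assert(gate_flags(0) == 0x8E, "ring 0 interrupt gate");
static_assert(gate_flags(3) == 0xEE, "ring 3 interrupt gate");
```

The ring 0 value 0x8E is the same byte that `I86_IDT_DESC_PRESENT | I86_IDT_DESC_BIT32` produces in the initialization routine above.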

Mike
Site Admin
Posts:465
Joined:Sat Oct 20, 2007 7:58 pm

Re: Kernel trouble

Post by Mike » Mon Mar 07, 2011 4:28 pm

Hello,

The triple fault is caused by a page fault exception at e001fea6. Knowing that this results in a GPF implies that you might have corruption (invalid IDT or exceptions). Knowing that those issues began recently implies that your IDT code itself must be fine if it has been used in the past.

The display should not go black after a cli+hlt is hit; however, emulators and virtual machines can handle this situation in different ways. In Bochs it should not. If it does, it might imply a buffer overrun error.

If you are using the bootloader provided by the series, please note that it was not designed to load images larger than 64K. If your image is larger, it might not load completely. A simple way to test this is to try a different bootloader (ours, GRUB, LILO, etc.) that was designed for it.

If you are able to narrow down where it happens, you can single step in the Bochs debugger to help find the cause.

halofreak1990
Posts:92
Joined:Thu May 27, 2010 8:54 pm
Location:Netherlands

Re: Kernel trouble

Post by halofreak1990 » Mon Mar 07, 2011 7:33 pm

Mike wrote:The display should not go black after a cli+hlt is hit; however, emulators and virtual machines can handle this situation in different ways. In Bochs it should not. If it does, it might imply a buffer overrun error.
I think you meant that the display should not turn blue.
Anyway, when my kernel enters an exception handler, it clears the screen to blue, before putting up some error details in white text, like the BSODs in Windows.
Since the errors occur when setting the cursor, the kernel is unable to put any text on the screen, so I made the code halt as soon as the screen was turned to blue, so I at least knew it hit an exception handler. I removed the hlt in the exception handler sometime later, so now it gets an exception inside an exception handler.
But the fact that the CPU jumps to addresses outside my kernel means that it's getting corrupted somewhere.
Mike wrote:If you are using the bootloader provided by the series, please note that it was not designed to load images larger than 64K. If your image is larger, it might not load completely. A simple way to test this is to try a different bootloader (ours, GRUB, LILO, etc.) that was designed for it.
My kernel image is about 35 KB, so it should load completely.
Mike wrote:If you are able to narrow down where it happens, you can single step in the Bochs debugger to help find the cause.
The error happens when the kernel does some port I/O to the video adapter to set the cursor in the function shown below

Code:

//! Updates hardware cursor
void DebugUpdateCur(int x, int y)
{
    // get location
    uint16_t cursorLocation = y * 80 + x;

    // send location to vga controller to set cursor
    disable();
    outportb(0x3D4, 14);
    outportb(0x3D5, cursorLocation >> 8); // Send the high byte.
    outportb(0x3D4, 15);
    outportb(0x3D5, cursorLocation);      // Send the low byte.
    enable();
}

Andyhhp
Moderator
Posts:387
Joined:Tue Oct 23, 2007 10:05 am
Location:127.0.0.1

Re: Kernel trouble

Post by Andyhhp » Mon Mar 07, 2011 7:37 pm

I think what might help is if you identify which exception is happening.

That will give a clue as to the problem. The way I debugged this when it was happening to me was to assign each exception handler a character on a red background that it would print as soon as the exception was started.
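A minimal sketch of that marker technique, assuming the standard 80x25 colour text mode: each screen cell is a 16-bit value with the character in the low byte and the attribute (foreground in bits 3-0, background in bits 6-4) in the high byte. The names `vga_cell`, `VGA_RED` and `VGA_WHITE` are made up for this example.

```cpp
#include <cstdint>

constexpr std::uint8_t VGA_RED   = 4;  // standard text-mode colour index
constexpr std::uint8_t VGA_WHITE = 15; // bright white foreground

// Builds one text-mode cell: character in the low byte, attribute in
// the high byte. The background is masked to 3 bits because bit 7 of
// the attribute is the blink bit by default.
constexpr std::uint16_t vga_cell(char ch, std::uint8_t fg, std::uint8_t bg)
{
    return static_cast<std::uint16_t>(
        static_cast<std::uint8_t>(ch) |
        (((fg & 0x0Fu) | ((bg & 0x07u) << 4)) << 8));
}

// In a handler, the cell would be stored to the text buffer, e.g.
//   *reinterpret_cast<volatile std::uint16_t*>(0xB8000) =
//       vga_cell('E', VGA_WHITE, VGA_RED);
// assuming the buffer at physical 0xB8000 is mapped at that address.
```

Giving each handler its own character makes the first exception visible even when the rest of the panic path, such as the cursor update, never completes.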

~Andrew
