Linux Audio

Check our new training course

Embedded Linux Audio

Check our new training course
with Creative Commons CC-BY-SA
lecture materials

Bootlin logo

Elixir Cross Referencer

Loading...
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
The (New) Linux Kernel Driver Model

Version 0.04

Patrick Mochel	<mochel@osdl.org>

03 December 2001


Overview
~~~~~~~~

This driver model is a unification of all the current, disparate driver models
that are currently in the kernel. It is intended is to augment the
bus-specific drivers for bridges and devices by consolidating a set of data
and operations into globally accessible data structures.

Current driver models implement some sort of tree-like structure (sometimes
just a list) for the devices they control. But, there is no linkage between
the different bus types.

A common data structure can provide this linkage with little overhead: when a
bus driver discovers a particular device, it can insert it into the global
tree as well as its local tree. In fact, the local tree becomes just a subset
of the global tree.

Common data fields can also be moved out of the local bus models into the
global model. Some of the manipulation of these fields can also be
consolidated. Most likely, manipulation functions will become a set
of helper functions, which the bus drivers wrap around to include any
bus-specific items.

The common device and bridge interface currently reflects the goals of the
modern PC: namely the ability to do seamless Plug and Play, power management,
and hot plug. (The model dictated by Intel and Microsoft (read: ACPI) ensures
us that any device in the system may fit any of these criteria.)

In reality, not every bus will be able to support such operations. But, most
buses will support a majority of those operations, and all future buses will.
In other words, a bus that doesn't support an operation is the exception,
instead of the other way around.


Drivers
~~~~~~~

The callbacks for bridges and devices are intended to be singular for a
particular type of bus. For each type of bus that has support compiled in the
kernel, there should be one statically allocated structure with the
appropriate callbacks that each device (or bridge) of that type share.

Each bus layer should implement the callbacks for these drivers. It then
forwards the calls on to the device-specific callbacks. This means that
device-specific drivers must still implement callbacks for each operation.
But, they are not called from the top level driver layer.

This does add another layer of indirection for calling one of these functions,
but there are benefits that are believed to outweigh this slowdown.

First, it prevents device-specific drivers from having to know about the
global device layer. This speeds up integration time incredibly. It also
allows drivers to be more portable across kernel versions. Note that the
former was intentional, the latter is an added bonus.

Second, this added indirection allows the bus to perform any additional logic
necessary for its child devices. A bus layer may add additional information to
the call, or translate it into something meaningful for its children.

This could be done in the driver, but if it happens for every object of a
particular type, it is best done at a higher level.

Recap
~~~~~

Instances of devices and bridges are allocated dynamically as the system
discovers their existence. Their fields describe the individual object.
Drivers - in the global sense - are statically allocated and singular for a
particular type of bus. They describe a set of operations that every type of
bus could implement, the implementation following the bus's semantics.


Downstream Access
~~~~~~~~~~~~~~~~~

Common data fields have been moved out of individual bus layers into a common
data structure. But, these fields must still be accessed by the bus layers,
and sometimes by the device-specific drivers.

Other bus layers are encouraged to do what has been done for the PCI layer.
struct pci_dev now looks like this:

struct pci_dev {
	...

	struct device device;
};

Note first that it is statically allocated. This means only one allocation on
device discovery. Note also that it is at the _end_ of struct pci_dev. This is
to make people think about what they're doing when switching between the bus
driver and the global driver; and to prevent against mindless casts between
the two.

The PCI bus layer freely accesses the fields of struct device. It knows about
the structure of struct pci_dev, and it should know the structure of struct
device. PCI devices that have been converted generally do not touch the fields
of struct device. More precisely, device-specific drivers should not touch
fields of struct device unless there is a strong compelling reason to do so.

This abstraction is prevention of unnecessary pain during transitional phases.
If the name of the field changes or is removed, then every downstream driver
will break. On the other hand, if only the bus layer (and not the device
layer) accesses struct device, it is only those that need to change.


User Interface
~~~~~~~~~~~~~~

By virtue of having a complete hierarchical view of all the devices in the
system, exporting a complete hierarchical view to userspace becomes relatively
easy. This has been accomplished by implementing a special purpose virtual
file system named driverfs. It is hence possible for the user to mount the
whole driverfs on a particular mount point in the unified UNIX file hierarchy.

This can be done permanently by providing the following entry into the
/dev/fstab (under the provision that the mount point does exist, of course):

none     	/devices	driverfs    defaults		0	0

Or by hand on the command line:

~: mount -t driverfs none /devices

Whenever a device is inserted into the tree, a directory is created for it.
This directory may be populated at each layer of discovery - the global layer,
the bus layer, or the device layer.

The global layer currently creates two files - 'status' and 'power'. The
former only reports the name of the device and its bus ID. The latter reports
the current power state of the device. It also be used to set the current
power state.

The bus layer may also create files for the devices it finds while probing the
bus. For example, the PCI layer currently creates 'wake' and 'resource' files
for each PCI device.

A device-specific driver may also export files in its directory to expose
device-specific data or tunable interfaces.

These features were initially implemented using procfs. However, after one
conversation with Linus, a new filesystem - driverfs - was created to
implement these features. It is an in-memory filesystem, based heavily off of
ramfs, though it uses procfs as inspiration for its callback functionality.

Each struct device has a 'struct driver_dir_entry' which encapsulates the
device's directory and the files within.

Device Structures
~~~~~~~~~~~~~~~~~

struct device {
	struct list_head 	bus_list;
	struct iobus		*parent;
	struct iobus		*subordinate;

	char    		name[DEVICE_NAME_SIZE];
	char    		bus_id[BUS_ID_SIZE];

	struct driver_dir_entry	* dir;

	spinlock_t		lock;
	atomic_t		refcount;

	struct device_driver 	*driver;
	void            	*driver_data;
	void    		*platform_data;

	u32             	current_state;
	unsigned char 		*saved_state;
};

bus_list:
	List of all devices on a particular bus; i.e. the device's siblings

parent:
	The parent bridge for the device.

subordinate:
	If the device is a bridge itself, this points to the struct io_bus that is
	created for it.

name:
	Human readable (descriptive) name of device. E.g. "Intel EEPro 100"

bus_id:
	Parsable (yet ASCII) bus id. E.g. "00:04.00" (PCI Bus 0, Device 4, Function
	0). It is necessary to have a searchable bus id for each device; making it
	ASCII allows us to use it for its directory name without translating it.

dir:
	Driver's driverfs directory.

lock:
	Driver specific lock.

refcount:
	Driver's usage count.
	When this goes to 0, the device is assumed to be removed. It will be removed
	from its parent's list of children. It's remove() callback will be called to
	inform the driver to clean up after itself.

driver:
	Pointer to a struct device_driver, the common operations for each device. See
	next section.

driver_data:
	Private data for the driver.
	Much like the PCI implementation of this field, this allows device-specific
	drivers to keep a pointer to a device-specific data.

platform_data:
	Data that the platform (firmware) provides about the device.
	For example, the ACPI BIOS or EFI may have additional information about the
	device that is not directly mappable to any existing kernel data structure.
	It also allows the platform driver (e.g. ACPI) to a driver without the driver
	having to have explicit knowledge of (atrocities like) ACPI.


current_state:
	Current power state of the device. For PCI and other modern devices, this is
	0-3, though it's not necessarily limited to those values.

saved_state:
	Pointer to driver-specific set of saved state.
	Having it here allows modules to be unloaded on system suspend and reloaded
	on resume and maintain state across transitions.
	It also allows generic drivers to maintain state across system state
	transitions.
	(I've implemented a generic PCI driver for devices that don't have a
	device-specific driver. Instead of managing some vector of saved state
	for each device the generic driver supports, it can simply store it here.)



struct device_driver {
        int     (*probe)        (struct device *dev);
        int     (*remove)       (struct device *dev);

        int     (*suspend)      (struct device *dev, u32 state, u32 level);
        int     (*resume)       (struct device *dev, u32 level);
}

probe:
	Check for device existence and associate driver with it.

remove:
	Dissociate driver with device. Releases device so that it could be used by
	another driver. Also, if it is a hotplug device (hotplug PCI, Cardbus), an
	ejection event could take place here.

suspend:
	Perform one step of the device suspend process.

resume:
	Perform one step of the device resume process.

The probe() and remove() callbacks are intended to be much simpler than the
current PCI correspondents.

probe() should do the following only:

- Check if hardware is present
- Register device interface
- Disable DMA/interrupts, etc, just in case.

Some device initialisation was done in probe(). This should not be the case
anymore. All initialisation should take place in the open() call for the
device.

Breaking initialisation code out must also be done for the resume() callback,
as most devices will have to be completely reinitialised when coming back from
a suspend state.

remove() should simply unregister the device interface.


Device power management can be quite complicated, based exactly what is
desired to be done. Four operations sum up most of it:

- OS directed power management.
  The OS takes care of notifying all drivers that a suspend is requested,
  saving device state, and powering devices down.
- Firmware controlled power management.
  The OS only wants to notify devices that a suspend is requested.
- Device power management.
  A user wants to place only one device in a low power state, and maybe save
  state.
- System reboot.
  The system wants to place devices in a quiescent state before the system is
  reset.

In an attempt to please all of these scenarios, the power management
transition for any device is broken up into several stages - notify, save
state, and power down. The disable stage, which should happen after notify and
before save state has been considered and may be implemented in the future.

Depending on what the system-wide policy is (usually dictated by the power
management scheme present), each driver's suspend callback may be called
multiple times, each with a different stage.

On all power management transitions, the stages should be called sequentially
(notify before save state; save state before power down). However, drivers
should not assume that any stage was called before hand. (If a driver gets a
power down call, it shouldn't assume notify or save state was called first.)
This allows the framework to be used seamlessly by all power management
actions. Hopefully.

Resume transitions happen in a similar manner. They are broken up into two
stages currently (power on and restore state), though a third stage (enable)
may be added later.

For suspend and resume transitions, the following values are defined to denote
the stage:

enum{
	SUSPEND_NOTIFY,
	SUSPEND_SAVE_STATE,
	SUSPEND_POWER_DOWN,
};

enum {
	RESUME_POWER_ON,
	RESUME_RESTORE_STATE,
};


During a system power transition, the device tree must be walked in order,
calling the suspend() or resume() callback for each node. This may happen
several times.

Initially, this was done in kernel space. However, it has occurred to me that
doing recursion to a non-bounded depth is dangerous, and that there are a lot
of inherent race conditions in such an operation.

Non-recursive walking of the device tree is possible. However, this makes for
convoluted code.

No matter what, if the transition happens in kernel space, it is difficult to
gracefully recover from errors or to implement a policy that prevents one from
shutting down the device(s) you want to save state to.

Instead, the walking of the device tree has been moved to userspace. When a
user requests the system to suspend, it will walk the device tree, as exported
via driverfs, and tell each device to go to sleep. It will do this multiple
times based on what the system policy is.

[ FIXME: URL pointer to the corresponding utility is missing here! ]

Device resume should happen in the same manner when the system awakens.

Each suspend stage is described below:

SUSPEND_NOTIFY:

This level to notify the driver that it is going to sleep. If it knows that it
cannot resume the hardware from the requested level, or it feels that it is
too important to be put to sleep, it should return an error from this function.

It does not have to stop I/O requests or actually save state at this point.

SUSPEND_DISABLE:

The driver should stop taking I/O requests at this stage. Because the save
state stage happens afterwards, the driver may not want to physically disable
the device; only mark itself unavailable if possible.

SUSPEND_SAVE_STATE:

The driver should allocate memory and save any device state that is relevant
for the state it is going to enter.

SUSPEND_POWER_DOWN:

The driver should place the device in the power state requested.


For resume, the stages are defined as follows:

RESUME_POWER_ON:

Devices should be powered on and reinitialised to some known working state.

RESUME_RESTORE_STATE:

The driver should restore device state to its pre-suspend state and free any
memory allocated for its saved state.

RESUME_ENABLE:

The device should start taking I/O requests again.


Each driver does not have to implement each stage. But, it if it does
implemente a stage, it should do what is described above. It should not assume
that it performed any stage previously, or that it will perform any stage
later.

It is quite possible that a driver can fail during the suspend process, for
whatever reason. In this event, the calling process must gracefully recover
and restore everything to their states before the suspend transition began.

If a driver knows that it cannot suspend or resume properly, it should fail
during the notify stage. Properly implemented power management schemes should
make sure that this is the first stage that is called.

If a driver gets a power down request, it should obey it, as it may very
likely be during a reboot.


Bus Structures
~~~~~~~~~~~~~~

struct iobus {
	struct	list_head 	node;
	struct 	iobus 		*parent;
	struct 	list_head 	children;
	struct 	list_head 	devices;

	struct 	list_head 	bus_list;

	spinlock_t		lock;
	atomic_t		refcount;

	struct 	device 		*self;
	struct	driver_dir_entry * dir;

	char    name[DEVICE_NAME_SIZE];
	char    bus_id[BUS_ID_SIZE];

	struct  bus_driver	*driver;
};

node:
	Bus's node in sibling list (its parent's list of child buses).

parent:
	Pointer to parent bridge.

children:
	List of subordinate buses.
	In the children, this correlates to their 'node' field.

devices:
	List of devices on the bus this bridge controls.
	This field corresponds to the 'bus_list' field in each child device.

bus_list:
	Each type of bus keeps a list of all bridges that it finds. This is the
	bridges entry in that list.

self:
	Pointer to the struct device for this bridge.

lock:
	Lock for the bus.

refcount:
	Usage count for the bus.

dir:
	Driverfs directory.

name:
	Human readable ASCII name of bus.

bus_id:
	Machine readable (though ASCII) description of position on parent bus.

driver:
	Pointer to operations for bus.


struct iobus_driver {
	char    name[16];
	struct  list_head node;

	int     (*scan)         (struct io_bus*);
	int     (*add_device)   (struct io_bus*, char*);
};

name:
	ASCII name of bus.

node:
	List of buses of this type in system.

scan:
	Search the bus for new devices. This may happen either at boot - where every
	device discovered will be new - or later on - in which there may only be a few
	(or no) new devices.

add_device:
	Trigger a device insertion at a particular location.



The API
~~~~~~~

There are several functions exported by the global device layer, including
several optional helper functions, written solely to try and make your life
easier.

void device_init_dev(struct device * dev);

Initialise a device structure. It first zeros the device, the initialises all
of the lists. (Note that this would have been called device_init(), but that
name was already taken. :/)


struct device * device_alloc(void)

Allocate memory for a device structure and initialise it.
First, allocates memory, then calls device_init_dev() with the new pointer.


int device_register(struct device * dev);

Register a device with the global device layer.
The bus layer should call this function upon device discovery, e.g. when
probing the bus.
dev should be fully initialised when this is called.
If dev->parent is not set, it sets its parent to be the device root.
It then does the following:
	- inserts it into its parent's list of children
	- creates a driverfs directory for it
	- creates a set of default files for the device in its directory
	- calls platform_notify() to notify the firmware driver of its existence.


void get_device(struct device * dev);

Increment the refcount for a device.


int valid_device(struct device * dev);

Check if reference count is positive for a device (it's not waiting to be
freed). If it is positive, it increments the reference count for the device.
It returns whether or not the device is usable.


void put_device(struct device * dev);

Decrement the reference count for the device. If it hits 0, it removes the
device from its parent's list of children and calls the remove() callback for
the device.


void lock_device(struct device * dev);

Take the spinlock for the device.


void unlock_device(struct device * dev);

Release the spinlock for the device.



void 	iobus_init(struct iobus * iobus);
struct 	iobus * iobus_alloc(void);
int 	iobus_register(struct iobus * iobus);
void	get_iobus(struct iobus * iobus);
int	valid_iobus(struct iobus * iobus);
void	put_iobus(struct iobus * iobus);
void	lock_iobus(struct iobus * iobus);
void	unlock_iobus(struct iobus * iobus);

These functions provide the same functionality as the device_*
counterparts, only operating on a struct iobus. One important thing to note,
though is that iobus_register() and iobus_unregister() operate recursively. It
is possible to add an entire tree in one call.



int device_driver_init(void);

Main initialisation routine.

This makes sure driverfs is up and running and initialises the device tree.


void device_driver_exit(void);

This frees up the device tree.




Credits
~~~~~~~

The following people have been extremely helpful in solidifying this document
and the driver model.

Randy Dunlap		rddunlap@osdl.org
Jeff Garzik		jgarzik@mandrakesoft.com
Ben Herrenschmidt	benh@kernel.crashing.org