Booting a workstation when its root file system disk is unusable


Condition: You can't use the disk you boot a workstation from

  1. Prepare the file system for diskless operation.

    The files you will be using are exported by the server majorite. These are (relatively) complete systems similar to the ones that diskless systems use, and they are booted in the same way. For this procedure we simply pretend that the workstation's disk isn't there.

    -Sun-4 systems
    The files to be used are:
    root: majorite:/usr2/export/root/generic-4.1.3
    kvm: majorite:/usr2/export/exec/kvm/generic-4.1.3
    usr: majorite:/usr2/export/exec/sun4c-4.1.1
    swap: majorite:/usr2/export/swap/spare
    -Sun-3 systems
    root: majorite:/usr2/export/root/sun3
    kvm: majorite:/usr2/export/exec/kvm/sun3
    usr: majorite:/usr2/export/exec/sun3
    swap: majorite:/usr2/export/swap/spare
    The files are already set up - none of the workstations of this system type are standalone systems with their own disks.
    -linux-sparc systems
    root: garnet:/ld8/linux/diskless
    Linux systems don't need to distinguish file systems any further. Only root is required.
  2. Make sure the system you are going to boot will wake up with the right identity (the system name). Go to the root file system of the machine you will boot (e.g. cd /usr2/export/root/generic-4.1.3) and make sure that
  3. Make sure the IP address of the system you want to boot is in /etc/hosts (a new addition to your site may not yet be in it). If not, add it. If you are running the Yellow Pages/NIS, update the maps right now:
    (cd /var/yp ; make hosts)
  4. Make sure these file systems are exported for use by the target system (xxxxx in the examples below). They probably won't be. The permissions are laid out in the /etc/exports file on majorite and should include:
    
        /usr2/export/swap/spare xxxxx.gly.bris.ac.uk(rw,no_root_squash)
        /usr2/export/root/generic-4.1.3 xxxxx.gly.bris.ac.uk(rw,no_root_squash)
        /usr2/export/exec/kvm/generic-4.1.3 xxxxx.gly.bris.ac.uk(rw,no_root_squash)
        /usr2/export/exec/sun4c-4.1.1 xxxxx.gly.bris.ac.uk(rw,no_root_squash)
        /ld8/linux/diskless -root=xxxxx,access=xxxxx
        
    Use exportfs -a to update these access rules once you have changed the file. Then make sure that rpc.mountd is aware of the changes too: run kill -HUP processid, where processid is the process id of rpc.mountd.
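
    As a rough sketch of this step for the Sun-4 case, the entries for one client can be printed for review before they are added to /etc/exports. The function name export_lines is a hypothetical helper, not something that exists on majorite; the paths and options are copied from the list above.

```shell
# Print the /etc/exports entries needed by one diskless client (Sun-4 case).
# export_lines is a hypothetical helper name; paths and options come from
# the list in this step.
export_lines() {
    host=$1
    for dir in /usr2/export/swap/spare \
               /usr2/export/root/generic-4.1.3 \
               /usr2/export/exec/kvm/generic-4.1.3 \
               /usr2/export/exec/sun4c-4.1.1
    do
        printf '%s %s.gly.bris.ac.uk(rw,no_root_squash)\n' "$dir" "$host"
    done
}

# Show the entries for the placeholder host name used in this document.
export_lines xxxxx

# After adding the lines to /etc/exports (as root on majorite):
#   exportfs -a
#   kill -HUP processid      # processid = process id of rpc.mountd
```

    Printing the lines rather than appending them directly leaves the decision about what goes into /etc/exports with the person editing the file.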
  5. Prepare the network boot daemon for boot requests from xxxxxx. Go to the directory /tftpboot and make sure that xxxxxx's IP address in hexadecimal is a symbolic link (e.g. use the "ln -s" command to make one) to the appropriate boot file for the right machine type. Use the showlinks script in /tftpboot to check.

    An IP address is of the form 137.222.20.128, which in hexadecimal is 89DE1480.
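
    The conversion to hexadecimal can be done with printf. The following is a minimal sketch; ip_to_hex is a hypothetical helper name, and it assumes a well-formed dotted-quad address.

```shell
# Convert a dotted-quad IP address to the 8-digit uppercase hexadecimal
# form used for link names in /tftpboot.
# ip_to_hex is a hypothetical helper name.
ip_to_hex() {
    # Split the address on dots into the positional parameters.
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    # Print each octet as two uppercase hex digits.
    printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
}

ip_to_hex 137.222.20.128    # prints 89DE1480
```

    The result is the name to give the symbolic link when running ln -s in /tftpboot.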

  6. Set up the file /etc/bootparams for xxxxxx. Copy and modify the entry labeled spare in the file to have the name of the system to be booted in the first field. Rewrite the file. If running the Yellow Pages/NIS, update them right now:
    (cd /var/yp ; make bootparams)
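
    Copying the spare entry can be sketched with awk. This assumes the spare entry sits on a single line; an entry continued over several lines with backslashes would have to be edited by hand. bootparams_entry is a hypothetical helper name.

```shell
# Print a copy of the "spare" entry with its first field replaced by the
# name of the system to be booted. Only the entry's first line is handled,
# so single-line entries are assumed.
# bootparams_entry is a hypothetical helper name.
bootparams_entry() {
    # $1 = new client name, $2 = bootparams file (default /etc/bootparams)
    awk -v new="$1" '$1 == "spare" { sub(/spare/, new); print }' \
        "${2:-/etc/bootparams}"
}

# Usage (prints the new line for review; nothing is written):
#   bootparams_entry xxxxxx
```

    The printed line can then be added to /etc/bootparams with an editor before the NIS maps are rebuilt.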
  7. Make sure that xxxxxx's 48-bit Ethernet (MAC) address is listed in /etc/ethers. If it isn't, you can find it in the boot messages in a recent /usr/adm/messages file, or by running the program /etc/dmesg and scanning the early part of its output, where the boot messages appear. If you still can't find it, reboot xxxxxx (the boot won't work) and the workstation will announce its address. If you are running the Yellow Pages/NIS, update the maps right now:
    (cd /var/yp ; make ethers)
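
    A quick way to check for the entry is to look the host up in the file. The sketch below assumes the usual ethers layout of one "address hostname" pair per line; ether_of is a hypothetical helper name.

```shell
# Print the Ethernet address recorded for a host in an ethers-format file.
# Returns non-zero if the host has no entry.
# ether_of is a hypothetical helper name.
ether_of() {
    # $1 = host name, $2 = ethers file (default /etc/ethers)
    awk -v h="$1" '$2 == h { print $1; found = 1 }
                   END { exit found ? 0 : 1 }' "${2:-/etc/ethers}"
}

# Usage:
#   ether_of xxxxxx || echo "xxxxxx is not in /etc/ethers"
```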
  8. Make sure that a RARP daemon is running. The machine presently having this facility is majorite. Log in to it and type:
    /etc/rc.d/init.d/rarpd restart
  9. Boot (SunOS). What you type depends on which PROM monitor program is running; the prompt distinguishes them. The old monitor prompt is ">". The new monitor prompt is "ok". You should type:
    b le() vmunix -s                   (with the old prom monitor)
    boot net vmunix -s                 (with the new prom monitor)

    If you see the message "Waiting for ARP/RARP packet", either /etc/ethers hasn't been updated or no rpc.rarpd is running on your central server system. Check both and retry (see the two previous steps).

    If you see numbers spin by and then a message like

    Line 15 interrupt
    or
    Type help for more information
    and you are dropped back to talking with the bootstrap monitor, your boot file in /tftpboot is bad. Get an appropriate one for your system architecture. You might find one in /usr/stand/boot.sunxx on a running system identical to the one you're trying to boot that you can copy into /tftpboot and then set up the appropriate symbolic link.

    If your system boots and you get messages about "Unable to mount NFS volume" or, worse, "NFS write error 13 on host zzzzz fh xxx y zzzzz ..." your /etc/exports file is not set up properly on your main file server. You must permit root access to the host to be booted up in /etc/exports.

    If your system boots and you get messages like "No bootparam server responding; still trying", then check that your netmask on the rarp server (majorite) is set properly for your network class.

  10. Boot (Linux). Use the new PROM Monitor; its prompt is `ok'. If you see `>', that is the old PROM monitor - type `n' to get to the new one.

    Now type

    boot net vmlinux nfsroot=zz.zz.zz.zz:/ddddd

    where zz.zz.zz.zz is the IP address of the host that makes the root file system available via NFS (presently garnet), and /ddddd is the directory containing the root file system. If you get messages about "Root-NFS: Server returned error n while mounting /ddddd", then either your NFS file system isn't exported properly or you typed the wrong name on the boot command. Check that your /etc/exports file is set up properly on your main file server. You must permit root access to the host to be booted up in /etc/exports.

  11. Check that you are properly booted and are using the network server for your file systems rather than the local disk attached to the workstation. Look at the output of the df command. You should see something like:
    sun1:/export/root/generic-4.1.3 /
    sun1:/export/exec/kvm/generic-4.1.3 /usr/kvm
    sun1:/export/exec/sun4c-4.1.1 /usr

    If you see anything else, you may still be using your workstation's disk. Don't proceed further until you've sorted out the problem.
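
    The same check can be sketched as a small test on the df output. This assumes that a file system mounted from a server is shown in the form server:/path on the first line after the df header; root_is_nfs is a hypothetical helper name.

```shell
# Succeed if df output on standard input names a "server:/path" style
# file system on its first data line (the line after the header).
# root_is_nfs is a hypothetical helper name.
root_is_nfs() {
    awk 'NR == 2 && $1 ~ /:/ { ok = 1 } END { exit ok ? 0 : 1 }'
}

# Check the root file system of the running machine.
if df / | root_is_nfs; then
    echo "root is mounted from a server"
else
    echo "root appears to be on a local disk"
fi
```

    This only looks at the root file system; /usr and /usr/kvm still need to be checked by eye in the full df listing.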