7/25/2021 William Craig - Admin@wrcraig.com        https://wrcraig.com

Objective of these scripts:

On a Linux system with a basic Apache2 webserver already running, install a system written in the bash scripting language to:
    1. Create index.html for all directories of a web site
    2. Increase security
    3. Use html markup to make reading the index files easier
    4. Allow insertion of highly detailed information, perhaps copied from other sources, to better describe each file contained in the web site
    5. Automatically insert the duration, or play length, for Audio/Video files
    6. Make files available for download
    7. Create a second index sorted on the description as well as the primary index sorted on the file name.

Warranty:

    None whatsoever. Use at your own risk. 
    Always back up your system and data, and experiment using a test area.
    Works on my system using Ubuntu 20.04 or 18.04 with Apache2

Table of Contents

Installation

Files included

Description of files

Using the menu

Dependencies

Directory Structure Of Data Files To Be Indexed:

Using .htaccess and multiple index files on a website

Sample of .htaccess file

Samples of description lines for conversion to index files

Automation via cron jobs



#########################################
Installation
        Extract all the files in the zip file archive into a single directory.
        Open a terminal and change to that directory, then enter "bash menu.bash".

The archive contains the following files:

 1. ReplSpacesinSubdirsFilenames.bash
 2. Add-eqsign-ToDirectoryNames.bash
 3. UpdateOwnerAndPermissions.bash
 4. UpdateOwnerAndPermissionsRecursively.bash
 5. htaccess-config.bash
 6. functions.bash
 7. htaccess2index.bash
 8. Run-Htaccess2Index-Recursively.bash
 9. enterdescriptions.xls
10. menu.bash
11. README.html
12. Sample.of.index.html.output.png

#########################################

Description of files

1. README.html

    This file.

2. menu.bash

    Convenient system to access/invoke the scripts that generate web page index files

3. htaccess-config.bash

    This is the config file for variables used in the following four bash scripts.

ALL SCRIPTS MUST RESIDE IN THE SAME DIRECTORY

3a. Run-Htaccess2Index-Recursively.bash

    This script acts in concert with the next script to recursively parse all subdirectories of the directory you specify in the config file.

3b. htaccess2index.bash

    This script creates index.html and index2.html in the directories you specify in the config file.

3c. UpdateOwnerAndPermissionsRecursively.bash

    This script acts in concert with the next script to recursively parse all subdirectories of the directory you specify in the config file.

3d. UpdateOwnerAndPermissions.bash

    Updates file and directory ownerships and permissions in the directories specified above.


---- Other files to make your life easier ----


4. ReplSpacesinSubdirsFilenames.bash (Recursive action)

    Replaces spaces in filenames with a dot for those applications which fail if the filename or directory name has spaces.

5. Add-eqsign-ToDirectoryNames.bash (Recursive action)

    Adds an equal sign to directory names to avoid conflict with similar file names
    (Run number 4 first; this script fails if there are any spaces in the directory name)

6. enterdescriptions.xls

    Spreadsheet (.xls) to make entry and formatting of description data easier.

          Sample of one line of the final output of the description data:
    AddDescription "<b>1931, IMDB 7.8, <a href='https://www.imdb.com/video/vi1168638233?playlistId=tt0021884&ref_=tt_ov_vi'>Trailer</a> </b>Henry Frankenstein is a doctor who is trying to discover a way to make the dead walk. He succeeds and creates a monster that has to deal with living again.<hr>" Frankenstein.1931.720p.BluRay.H264.AAC-RARBG.mp4

7. Sample.of.index.html.output.png

    Like the file name says.

8. functions.bash

    A collection of bash functions common to all the scripts.
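
The two renaming helpers (numbers 4 and 5 above) can be pictured with the rough sketch below. It is an illustration only, not the code shipped in ReplSpacesinSubdirsFilenames.bash or Add-eqsign-ToDirectoryNames.bash; the function names dots_for_spaces and add_eqsign are made up for this example. Try it on a test area first.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the shipped scripts may work differently.

# Replace spaces with dots in every file and directory name below $1.
dots_for_spaces() {
    local base=$1 path paths=()
    # -depth lists children before their parent directory, so each
    # path is still valid when its turn to be renamed comes.
    while IFS= read -r -d '' path; do
        paths+=("$path")
    done < <(find "$base" -mindepth 1 -depth -name '* *' -print0)
    for path in "${paths[@]}"; do
        local dir=${path%/*} name=${path##*/}
        mv -n -- "$path" "$dir/${name// /.}"
    done
}

# Append an equals sign to every directory name below $1 that lacks one.
add_eqsign() {
    local base=$1 path paths=()
    while IFS= read -r -d '' path; do
        paths+=("$path")
    done < <(find "$base" -mindepth 1 -depth -type d ! -name '*=' -print0)
    for path in "${paths[@]}"; do
        mv -n -- "$path" "$path="
    done
}
```

The base directory itself is never renamed here (-mindepth 1), which matches the note elsewhere in this file that the highest level must be changed by hand.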


#########################################

Using the menu
    Menu items are generally intended to be used in sequence

----------------

Menu item 1
    Display this README.html file

----------------

Menu item 2
    Edit the configuration file for all the scripts

----------------

Menu item 3
    Replace spaces with dots in Subdir names and File names
    -To avoid failure of some scripts

----------------

Menu item 4
    Add equals sign to Directory Names
    -to differentiate Directories from similar File Names
    -use item 3 first or this item will fail

----------------

Menu item 5
    Add user id and password
    If you want to restrict access to any directories
    this will allow you to grant access to them for specific users

----------------

Menu item 6
    Delete user id and password
    If you want to restrict access to any directories
    this will remove a specific user who was previously granted access

----------------

Menu item 7
    Enter descriptions
    Edit a spreadsheet to enter descriptive data and export formatted descriptions for conversion to index files
    -Spreadsheet is included in .xls format

----------------

Menu item 8
    UpdateOwnerAndPermissions
    -Ownership and permissions may be wrong when new files are added to a website which can affect the creation of the index files. For this reason, you may want to run this script to update all files and directories before running the index creation scripts.

    -UpdateOwnerAndPermissionsRecursively.bash calls UpdateOwnerAndPermissions.bash to run in each subdirectory.

    -In the config file you may choose whether the scripts also run in the BaseDirectory itself, in addition to the default of all of its subdirectories.

    -MUST RUN AS A SUPERUSER/Root. You will be asked to enter an administrator or sudoer password.
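
A rough sketch of what this kind of update involves is shown here. It is not the contents of UpdateOwnerAndPermissions.bash: the function name fix_perms, the www-data owner and the 755/644 modes are all assumptions chosen for illustration, and this sketch recurses on its own rather than being called once per subdirectory.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: owner, group and modes below are assumed values.

# $1 = directory to update, $2 = owner:group (assumed default www-data:www-data)
fix_perms() {
    local dir=$1 owner=${2:-www-data:www-data}
    # Changing ownership needs root; skip that step otherwise.
    if [ "$(id -u)" -eq 0 ]; then
        chown -R -- "$owner" "$dir"
    else
        echo "not root: ownership of $dir left unchanged" >&2
    fi
    # Directories need the execute bit to be entered; plain files do not.
    find "$dir" -type d -exec chmod 755 {} +
    find "$dir" -type f -exec chmod 644 {} +
}
```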

----------------

Menu item 9
    Create html index files
    -Run as a normal user, the script will read all the subdirectories below the base directory specified in the config file, extracting the filename (and the play-length of any A/V files therein). It also reads an Apache2 style "FancyIndex" .htaccess file, or a similarly formatted file, containing descriptive info about the filename (the spreadsheet is included to help you create the FancyIndex files).

    -It then produces html download links to the files and combines the description data from .htaccess (or similar) with the user's headers and footers into files named "index.html" and "index2.html", the first sorted by filename and the second sorted by description. Styles are not used, just plain old html.

    -Once the index files are created you may disable Apache's directory views as a security measure (Options -Indexes). This will help prevent evildoers from browsing freely where there are no index files.

----------------

Menu item 10
    View the error log.



#########################################

Dependencies:

1. Uses the bash scripting language included in most Linux flavors

    1a. Linux bash scripts can now be run on Windows 10, see: https://www.howtogeek.com/249966/how-to-install-and-use-the-linux-bash-shell-on-windows-10/ (I have not verified this.)

2. Uses "mediainfo" to determine the play-length of any audio or video files (sudo apt install mediainfo). If mediainfo is not installed the duration of media files will not be included in the indexes.

3. ASSUMES that there are no spaces in the filenames or directory names. Use menu item 3 to recursively replace any spaces with dots in the source directory, its subdirectories, and all filenames.

4. Be sure to differentiate directory names from partial filenames. You can ensure this by adding a tag to the end of directory names. Menu item #4 will make mass changes to append an equals sign to all directory names from subdirectory level 1 or 2 to the lowest subdirectories. If needed, you will have to manually add an equals sign to the BaseDirectory (the highest level)
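
For dependency 2, the sketch below shows one way a play length could be read with mediainfo and formatted. It is an assumption-laden illustration, not the code in htaccess2index.bash: the function names are invented, and it relies on mediainfo's --Inform option reporting the General duration in milliseconds. If mediainfo is missing or reports nothing usable, the helper prints nothing, which mirrors the behavior described above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names are illustrative only.

# Convert a duration in milliseconds to H:MM:SS.
ms_to_hms() {
    local ms=${1%%.*}                 # drop any fractional part
    local s=$(( 10#$ms / 1000 ))      # 10# forces base 10
    printf '%d:%02d:%02d\n' $(( s / 3600 )) $(( s % 3600 / 60 )) $(( s % 60 ))
}

# Print the play length of a media file, or nothing if it cannot be determined.
play_length() {
    local file=$1 ms re='^[0-9]+([.][0-9]+)?$'
    command -v mediainfo >/dev/null 2>&1 || return 0
    ms=$(mediainfo --Inform='General;%Duration%' "$file" 2>/dev/null)
    [[ $ms =~ $re ]] || return 0
    ms_to_hms "$ms"
}
```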



###############################################

Directory Structure Of Data Files To Be Indexed:

    
 >>> BaseDirectory specified in the config file
             |
             |
             |--->> Level 1 Subdirectory
             |     |
             |     |---> Level 2 Subdirectory
             |          | 
             |          |---> Level 3 Subdirectory
             |          |
             |         Etc.
             |
             |
             |--->> Another Level 1 Subdirectory
             |
            Etc.
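
The "recursive wrapper calls a per-directory script" pattern over this tree can be sketched as follows. The function names are hypothetical and the real Run-Htaccess2Index-Recursively.bash may walk the tree differently; the second argument stands in for the config option that decides whether the BaseDirectory itself is processed. Names are read line by line, which relies on the no-spaces assumption stated under Dependencies.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the recursive wrapper pattern.

# List the directories to process, one per line, parents before children.
# $1 = base directory, $2 = "yes" to include the base directory itself.
list_target_dirs() {
    local base=$1 include_base=${2:-no} min=1
    [ "$include_base" = yes ] && min=0
    find "$base" -mindepth "$min" -type d | LC_ALL=C sort
}

# Run a command once per target directory, passing the directory
# as the command's last argument.
for_each_dir() {
    local base=$1 include_base=$2 dir
    shift 2
    # File descriptor 3 keeps the command's own stdin free.
    while IFS= read -r dir <&3; do
        "$@" "$dir"
    done 3< <(list_target_dirs "$base" "$include_base")
}
```

For example, for_each_dir /path/to/BaseDirectory no echo would simply print every subdirectory that would be visited.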

#########################################

Using .htaccess and multiple index files on a website

-- Why do we want an index file and/or an .htaccess file on a webserver?

Without an index file, or a properly configured .htaccess file, browsers might be able to freely browse any directory on your server. That is not an optimum way to run a server.

.htaccess is a file that the Apache webserver can use to control the access and display of all directories that do not have a file named index.xxx (where xxx can be htm, html, php or other apache2-acceptable extension).

An index file uses HTML coding to display a more presentable and flexible web page than the .htaccess file.

You can also have both files. The hidden .htaccess file (a hidden file is indicated by the leading dot) can act as a backup to any missing index file just in case some evildoer finds a way to get around your carefully crafted net of web pages with index files.



#######################################

Below is an example of an .htaccess file which, if other configuration options are completed, will frustrate most evildoers and require a userid and password to open the directory (even if you have an index file):

AuthName "Restricted Area"

AuthType Basic

# REQUIRE AN ID AND PASSWORD TO OPEN THIS AND ALL SUBDIRECTORIES (see the Apache2 documentation for details)

AuthUserFile /xxxx/xxx/.htpasswd

require valid-user

# STRONG HTACCESS PROTECTION

<Files ~ "^.*\.([Hh][Tt][Aa])">

order allow,deny

deny from all

</Files>

# Deny access to evil robots site rippers, offline browsers, and other nasty scum

RewriteEngine On

RewriteBase /

RewriteCond %{HTTP_USER_AGENT} ^Anarchie [OR]

RewriteCond %{HTTP_USER_AGENT} ^ASPSeek [OR]

RewriteCond %{HTTP_USER_AGENT} ^attach [OR]

RewriteCond %{HTTP_USER_AGENT} ^autoemailspider [OR]

RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]

RewriteCond %{HTTP_USER_AGENT} ^Xenu [OR]

RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]

RewriteCond %{HTTP_USER_AGENT} ^Zeus

RewriteRule ^.* - [F,L]

# Disable Directory views. Absence of index.html in a directory will now result in a "Forbidden" message:

Options -Indexes

# Make some file extensions download only, not in-line play:

<FilesMatch "\.(mov|mp3|mp4|jpg|pdf|mkv)$">

ForceType application/octet-stream

</FilesMatch>


# Security Headers
# Important: Before adding this code section to your site, make sure to learn about the meaning and use of each header. There may be important notes and information that you need to understand regarding each particular directive included in this code snippet.

# <IfModule mod_headers.c>
	Header set X-XSS-Protection "1; mode=block"
	Header set X-Frame-Options "SAMEORIGIN"
	Header set X-Content-Type-Options "nosniff"
	Header always set Strict-Transport-Security "max-age=63072000; includeSubDomains"
	# Header set Content-Security-Policy ... (This is one you REALLY need to study to complete)
	Header set Referrer-Policy "same-origin"
	Header set Feature-Policy "geolocation 'self'; vibrate 'none'"
# </IfModule>

# This code may be added to your site via .htaccess or Apache config.
# If used in .htaccess, leave the if statement commented out or deleted.
# If used in Apache2 config, uncomment the two lines: <IfModule mod_headers.c> and: </IfModule>

Understand that this technique includes commonly used configurations for each of the included headers. You can (and should) go through each one to make sure that the configuration matches the requirements and goals of your site. Also remember to test thoroughly before going live.



(Descriptions of files can be entered in the .htaccess file at this location in the format shown below, but Apache needs to be told to use FancyIndex displays.

For better and easier directory listings we recommend using a separate index.html file as created and processed by these scripts)





#########################################

DESCRIPTION SAMPLES

* A spreadsheet is included to help create file descriptions in the proper format which can be copied into .htaccess, or used by these scripts to create the index files.

* Since double quotes are required around the file description, any additional double quotes within the description may cause very unexpected results.



****** If NOT using the spreadsheet to create the description file

#### Simplest template for file descriptions #####

(note the quotes and the two spaces required: one after AddDescription and another before the file.name)

AddDescription "This is the actual description portion, may include spaces and html." filename_cannot_include.spaces
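
To make the template concrete, here is a small sketch of how one such line could be split into its description and file name in bash. The function name and the DESC and FNAME variables are invented for this example, and the shipped scripts may parse the line another way. The pattern takes everything between the first and the last double quote as the description.

```shell
#!/usr/bin/env bash
# Hypothetical parser; the shipped scripts may split lines differently.

# Split one AddDescription line into the variables DESC and FNAME.
# Returns non-zero if the line does not match the template.
parse_adddescription() {
    local re='^AddDescription[[:space:]]+"(.*)"[[:space:]]+([^[:space:]]+)[[:space:]]*$'
    [[ $1 =~ $re ]] || return 1
    DESC=${BASH_REMATCH[1]}
    FNAME=${BASH_REMATCH[2]}
}

# Usage:
#   parse_adddescription 'AddDescription "<b>Notes- </b>Some text.<hr>" Notes.txt' &&
#       echo "$FNAME: $DESC"
```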



##### Normal files example #####

AddDescription "<b>My favorite book- </b>How to pack a picnic lunch without a basket.<hr>" Picnic.txt



##### Audio or Video files examples #####

# Note the equals sign at the end of a directory name to differentiate from partial file names

AddDescription "<b>2020, IMDB 6.4, Drama, Horror, Sci-Fi <a href='URLofVideoTrailer'>Trailer</a> </b>At the height of ... it becomes clear that ...<hr>" movie.mp4

AddDescription "<b>1931, IMDB 7.8, <a href='https://www.imdb.com/video/vi1168638233?playlistId=tt0021884&ref_=tt_ov_vi'>Trailer</a> </b>Henry Frankenstein is a doctor who is trying to discover a way to make the dead walk. He succeeds and creates a monster that has to deal with living again.<hr>" Frankenstein.1931.720p.BluRay.H264.AAC-RARBG.mp4

AddDescription "<b>2017, TV Series, </b>A series about...<hr>" Name.of.Directory=

AddDescription "<b>2018, IMDB 6.1, </b>Three schoolgirls and their governesses mysteriously disappear on Valentines Day in 1900.<hr>" Picnic.Directory=

AddDescription "<b>2014, ArtistName </b>anything you want to describe the music file <hr>" MusicFileName.mp3


#Notice that index.html will be sorted on the file names and index2.html will be sorted on the descriptions
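
The two sort orders can be illustrated with the sketch below. It is a simplified stand-in for what htaccess2index.bash produces, not its actual code: the function names and the one-line-per-file markup are assumptions, and headers, footers and play lengths are left out. Because descriptions are sorted as written, any leading markup such as <b> takes part in the comparison.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; markup and function names are illustrative only.

# Read AddDescription lines on stdin, print "filename<TAB>description" pairs.
# Lines that do not match the template (comments, other directives) are skipped.
to_pairs() {
    local line re='^AddDescription[[:space:]]+"(.*)"[[:space:]]+([^[:space:]]+)[[:space:]]*$'
    while IFS= read -r line; do
        [[ $line =~ $re ]] && printf '%s\t%s\n' "${BASH_REMATCH[2]}" "${BASH_REMATCH[1]}"
    done
    return 0
}

# Read pairs on stdin, sort on column $1 (1 = filename, 2 = description),
# and print one plain html line per entry with a link to the file.
to_rows() {
    local col=$1 fname desc
    LC_ALL=C sort -f -t $'\t' -k "$col,$col" |
    while IFS=$'\t' read -r fname desc; do
        printf '<a href="%s">%s</a> %s\n' "$fname" "$fname" "$desc"
    done
}

# Usage, assuming the descriptions live in ./.htaccess:
#   to_pairs < .htaccess | to_rows 1 > index.html
#   to_pairs < .htaccess | to_rows 2 > index2.html
```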


#######################################

Automation via cron jobs

Once your system is configured and working to your satisfaction you may automate the update of owner and permissions (UpdateOwnerAndPermissionsRecursively.bash) and the generation of index files (Run-Htaccess2Index-Recursively.bash).

UpdateOwnerAndPermissionsRecursively.bash must be run as a root cron job, while Run-Htaccess2Index-Recursively.bash must be run as a user cron job.
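
As a rough illustration, the snippet below prints one crontab line for each job. The install path /opt/indexer and the nightly times are placeholders, not values from this package, and the "cd" step assumes the scripts expect to be started from the directory that holds them all. Paste the first printed line into root's crontab (sudo crontab -e) and the second into the normal user's crontab (crontab -e).

```shell
#!/usr/bin/env bash
# Hypothetical sketch: the install path and schedule are assumptions.

# Print one crontab entry that changes to the script directory first.
# $1 = schedule (five cron fields), $2 = script directory, $3 = script name
cron_entry() {
    printf '%s cd %s && bash %s\n' "$1" "$2" "$3"
}

# For root's crontab: fix ownership and permissions first.
cron_entry '15 3 * * *' /opt/indexer UpdateOwnerAndPermissionsRecursively.bash

# For the normal user's crontab: rebuild the index files a little later.
cron_entry '45 3 * * *' /opt/indexer Run-Htaccess2Index-Recursively.bash
```

Scheduling the permissions job ahead of the index job follows the order recommended under menu item 8.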