Maintain database

The backend of the software is based on Django REST Framework.

Requirements

Python (3.6, 3.7, 3.8, 3.9, 3.10)
Django (2.2, 3.0, 3.1, 3.2, 4.0)

It is recommended to use the latest patch release of each Python and Django series, since only those are officially supported.

Configure REST API

Before we start, go through the installation of the Django REST Framework.

Make sure that "rest_framework" is added to the INSTALLED_APPS setting in /backend/backend/settings/base.py, together with "rest_framework.authtoken", which token authentication requires:

INSTALLED_APPS = [
    ...
    # external apps
    "rest_framework",
    "rest_framework.authtoken",
    ...
]

Since we are using the browsable API, the URL paths for the API are configured in a separate file, /backend/backend/urls.py:

from accounts.views import UserViewSet
from cds.views import LectureDocumentView, LectureViewSet
from django.contrib import admin
from django.urls import include, path
from rest_framework import routers

# Default router
router = routers.DefaultRouter()

# Register users
router.register(r"users", UserViewSet)

# Register lectures
router.register(r"lectures", LectureViewSet)

# This one is for search
router.register(r"search/lectures", LectureDocumentView, basename="lecturedocument")

The urlpatterns list routes URLs to views. The last entry (api-auth/) provides the REST framework's login and logout views.

urlpatterns = [
    path("admin/", admin.site.urls),
    path("api/v1/", include(router.urls)),
    path("api-auth/", include("rest_framework.urls", namespace="rest_framework_auth")),
]

To access information about our project's users, we create a read-write API. All global settings of the REST framework API are kept in a single configuration dictionary named REST_FRAMEWORK, which can be found in /backend/backend/settings/base.py.

REST_FRAMEWORK = {
    ...
    "DEFAULT_PERMISSION_CLASSES": ("rest_framework.permissions.IsAuthenticated",),
    "DEFAULT_AUTHENTICATION_CLASSES": [
        "rest_framework.authentication.TokenAuthentication",
        "rest_framework.authentication.SessionAuthentication",
    ],
    ...
}

As can be seen, token authentication is used to access the API, so a token must be generated for every user who needs access.
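One way to generate a token is the management command that ships with rest_framework.authtoken (python manage.py drf_create_token followed by the username). The sketch below illustrates how a client could then present that token: TokenAuthentication expects it in the Authorization header, prefixed with the word Token. The host, the token value and the helper name build_request are assumptions made for illustration only; the api/v1/ prefix and the lectures route come from the urls.py shown above.

```python
import urllib.request

# Hypothetical values: replace with the real host and a generated token.
API_ROOT = "http://localhost:8000/api/v1/"
TOKEN = "0123456789abcdef0123456789abcdef01234567"


def build_request(endpoint: str, token: str) -> urllib.request.Request:
    """Return a GET request carrying the token in the Authorization header."""
    return urllib.request.Request(
        API_ROOT + endpoint,
        headers={"Authorization": f"Token {token}"},
    )


request = build_request("lectures/", TOKEN)
print(request.full_url)
print(request.get_header("Authorization"))

# Sending the request needs a running backend, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

A request without this header is rejected by the IsAuthenticated permission class unless a valid session exists.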

Creating fields in our database

The required fields are defined in /backend/cds/models.py in the following way:

from django.db import models


class Lecture(models.Model):
    lecture_id = models.IntegerField(unique=True, db_index=True)
    title = models.CharField(max_length=250)
    date = models.DateField(null=True, blank=True)
    ...  # remaining fields omitted

If the fields are added, removed or modified in any way, the document definition in /backend/cds/documents.py must be updated accordingly:

from django.conf import settings
from django_opensearch_dsl import Document, KeywordField, TextField
from django_opensearch_dsl.registries import registry
from opensearch_dsl import analyzer

from .models import Lecture

@registry.register_document
class LectureDocument(Document):

    names_analyzer = analyzer(
        "name_analyzer",
        tokenizer="letter",
    )

    class Index:
        name = f"{settings.OPENSEARCH_INDEX_PREFIX}-lectures"
        settings = {"number_of_shards": 1, "number_of_replicas": 0}

    # Fields declared explicitly; multi=True marks fields that contain multiple elements.
    files = TextField(multi=True)
    type = KeywordField(multi=True)
    keywords = KeywordField(multi=True)
    series = TextField()
    sponsor = TextField(analyzer=names_analyzer)
    speaker = TextField(analyzer=names_analyzer)
    subject_category = TextField()

    class Django:
        model = Lecture
        fields = [
            "lecture_id",
            "title",
            "date",
            "corporate_author",
            "abstract",
            "speaker_details",
            "event_details",
            "thumbnail_picture",
            "language",
            "lecture_note",
            "imprint",
            "license",
            # add here, if needed
        ]

The serializers in /backend/cds/serializers.py should be updated too:

from django_elasticsearch_dsl_drf.serializers import DocumentSerializer
from rest_framework import serializers

from .documents import LectureDocument
from .models import Lecture


class LectureSerializer(serializers.ModelSerializer):
    class Meta:
        model = Lecture
        fields = "__all__"


class LectureDocumentSerializer(DocumentSerializer):
    class Meta:
        document = LectureDocument
        fields = (
            "lecture_id",
            "title",
            "date",
            "corporate_author",
            "abstract",
            "series",
            "speaker",
            "speaker_details",
            "event_details",
            "thumbnail_picture",
            "language",
            "subject_category",
            "lecture_note",
            "imprint",
            "license",
            "sponsor",
            "keywords",
            "type",
            "files",
            # add here, if needed
        )
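Because the same field names have to be kept in sync across several files, a forgotten entry is easy to miss. The following is a minimal sketch of a consistency check, assuming the field names are copied by hand from the snippets above; the helper missing_from_serializer and the three constants are hypothetical and not part of the project.

```python
# Field names copied from the documents.py and serializers.py snippets above.
DOCUMENT_MODEL_FIELDS = [
    "lecture_id", "title", "date", "corporate_author", "abstract",
    "speaker_details", "event_details", "thumbnail_picture", "language",
    "lecture_note", "imprint", "license",
]
DOCUMENT_DECLARED_FIELDS = [
    "files", "type", "keywords", "series", "sponsor", "speaker",
    "subject_category",
]
SERIALIZER_FIELDS = (
    "lecture_id", "title", "date", "corporate_author", "abstract", "series",
    "speaker", "speaker_details", "event_details", "thumbnail_picture",
    "language", "subject_category", "lecture_note", "imprint", "license",
    "sponsor", "keywords", "type", "files",
)


def missing_from_serializer(document_fields, serializer_fields):
    """Return the document fields that the serializer does not expose."""
    return sorted(set(document_fields) - set(serializer_fields))


missing = missing_from_serializer(
    DOCUMENT_MODEL_FIELDS + DOCUMENT_DECLARED_FIELDS, SERIALIZER_FIELDS
)
print(missing)
```

An empty list means every document field is also exposed by the serializer; any name that is printed still has to be added to LectureDocumentSerializer.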

Then create a new migration with the following command:

$ python manage.py makemigrations cds

This only creates the migration. To actually change the database, you have to apply it with:

$ python manage.py migrate

Configure CORS headers

You might experience authorization issues (blocked cross-origin requests) when the web pages retrieve data from the API. Make sure you complete these steps:

Install django-cors-headers on the server side:

$ poetry add django-cors-headers

Then make sure that the following is added to /backend/settings.py:

INSTALLED_APPS = [
    ...
    "corsheaders",
]

MIDDLEWARE = [
    ...
    # Place this as high as possible, in particular before CommonMiddleware.
    "corsheaders.middleware.CorsMiddleware",
    ...
]
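Registering the app and the middleware does not by itself allow any origin. django-cors-headers reads the list of permitted origins from the CORS_ALLOWED_ORIGINS setting. The fragment below is a sketch of what such an entry could look like in the same settings file; the origin shown is a hypothetical placeholder for wherever the web pages are served from.

```python
# Settings fragment. The origin below is a hypothetical placeholder:
# use the scheme, host and port that actually serve the web pages.
CORS_ALLOWED_ORIGINS = [
    "http://localhost:3000",
]
```

Each entry consists of scheme, host and optional port, without a path or trailing slash.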

Bleach the abstract

The abstract of a lecture must be cleansed before it is displayed on the UI. CDS records may contain formatting (such as HTML tags) that was originally meant for CDS and still exists in the metadata, where it disturbs the current UI.

First, navigate to the /backend folder in a terminal and add bleach:

$ poetry add "bleach~=4.1.0"

Then make sure that the following is imported in /backend/cds/models.py:

from bleach import clean

The cleansing then happens in the model's save() method in the following way:

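def save(self, *args, **kwargs):
    try:
        # Keep only the allowed tags and strip everything else.
        self.abstract = clean(
            self.abstract,
            strip=True,
            tags=["p", "div", "strong", "span", "ul", "li"],
            attributes={"a": ["href"]},
            strip_comments=True,
        )
    except Exception:
        # If cleaning fails (for example, the abstract is not a string), keep it as is.
        pass
    super().save(*args, **kwargs)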

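As can be seen, clean() takes the abstract and keeps only the elements listed in tags and the attributes listed in attributes. With strip=True, disallowed markup is removed instead of being escaped, and strip_comments=True does the same for HTML comments. Note that "a" is not listed in tags, so links are stripped and the href entry in attributes has no effect unless "a" is added to tags.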
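To illustrate the effect of these arguments, here is a small sketch that applies the same clean() call to a made-up abstract. The input string is hypothetical and only serves as an example; the call assumes the bleach version pinned above is installed.

```python
from bleach import clean

# Hypothetical abstract, for illustration only.
raw_abstract = (
    '<p>Intro <b>bold</b> <a href="https://example.org">link</a>'
    "<!-- internal note --></p>"
)

cleaned = clean(
    raw_abstract,
    strip=True,
    tags=["p", "div", "strong", "span", "ul", "li"],
    attributes={"a": ["href"]},
    strip_comments=True,
)

# The allowed <p> tag survives; the disallowed <b> and <a> tags and the
# comment are dropped, while the text inside the tags is kept.
print(cleaned)
```

Only the markup is removed here: the words inside the stripped tags remain part of the abstract.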